The digital library use case is a typical scenario in which a publisher distributes web content meant for both human and machine consumption: humans browse the descriptions of electronic documents available online, and machines crawl these descriptions to build indexes, navigation and search pages.
In this example we focus on automating the construction of indexes. The idea is to crawl GRDDL source documents and extract the embedded RDFa to feed an RDF store. SPARQL queries are then evaluated against this store, and their results are rendered as web pages to generate up-to-date indexes automatically.
This example relies on XSLT 1.0 for encoding GRDDL transformations. The GRDDL source documents use the RDFa syntax to express metadata in XHTML. The GRDDL source documents are linked to their transformations using the simplest method: a link element in the head of the document referencing the appropriate XSLT stylesheet.
Here is a short example of what a GRDDL source document using RDFa looks like:
    00  <?xml version="1.0" encoding="UTF-8"?>
    01  <html xmlns="http://www.w3.org/1999/xhtml"
    02    xml:base="http://www.dc4plus.com/references/rdf_sem.html"
    03    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    04    xmlns:dc="http://purl.org/dc/elements/1.1/"
    05    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    06  >
    07   <head profile="http://ns.inria.fr/grddl/rdfa/">
    08    <title>Biblio description</title>
    09   </head>
    10   <body>
    11    <h1>Biblio description</h1>
    12    <dl about="http://www.w3.org/TR/2004/REC-rdf-mt-20040210/">
    13     <dt>Title</dt>
    14     <dd property="dc:title">RDF Semantics - W3C Recommendation 10 February 2004</dd>
    15     <dt>Author</dt>
    16     <dd rel="dc:creator" href="#a1">
    17      <span id="a1">
    18       <link rel="rdf:type" href="[foaf:Person]" />
    19       <span property="foaf:name">Patrick Hayes</span>
    20       see <a rel="foaf:homepage" href="http://www.ihmc.us/users/user.php?UserID=42">homepage</a>
    21      </span>
    22     </dd>
    23    </dl>
    24   </body>
    25  </html>
To turn the XHTML document into a GRDDL source document, the publisher adds a profile attribute to the head element (Line 7) to denote that the document contains RDFa metadata.
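A crawler can use this attribute to decide whether a page is worth processing. The following is a minimal sketch in Python, assuming the page is well-formed XHTML; the function name `head_profiles` and the trimmed `SAMPLE` page are illustrative and not part of the use case.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
RDFA_PROFILE = "http://ns.inria.fr/grddl/rdfa/"

# Trimmed copy of the example page (hypothetical test input).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://ns.inria.fr/grddl/rdfa/">
    <title>Biblio description</title>
  </head>
  <body/>
</html>"""


def head_profiles(xhtml_text):
    """Return the profile URIs declared on the XHTML head element."""
    root = ET.fromstring(xhtml_text)
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None:
        return []
    # The profile attribute may carry several whitespace-separated URIs.
    return head.get("profile", "").split()


profiles = head_profiles(SAMPLE)
uses_rdfa_profile = RDFA_PROFILE in profiles
print(profiles, uses_rdfa_profile)
```

A page whose head carries no profile attribute yields an empty list, so the crawler can skip it without attempting any transformation.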
Using the generic GRDDL transformation specified in this profile document (see the RDFa2RDFXML.xsl stylesheet referenced in the GRDDL RDFa profile), the RDF is extracted from this file; it corresponds to the following triples:
    http://www.w3.org/TR/2004/REC-rdf-mt-20040210/
        dc:title "RDF Semantics - W3C Recommendation 10 February 2004"

    http://www.w3.org/TR/2004/REC-rdf-mt-20040210/
        dc:creator http://www.dc4plus.com/references/rdf_sem.html#a1

    http://www.dc4plus.com/references/rdf_sem.html#a1
        rdf:type http://xmlns.com/foaf/0.1/Person

    http://www.dc4plus.com/references/rdf_sem.html#a1
        foaf:name "Patrick Hayes"
        foaf:homepage http://www.ihmc.us/users/user.php?UserID=42
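To give an intuition of how literal-valued triples such as dc:title and foaf:name can be read off the markup, here is a deliberately simplified sketch in Python. It is not the logic of RDFa2RDFXML.xsl: it only handles the property attribute, assumes that an about attribute or an id sets the subject for the enclosed elements, ignores rel and href entirely, and hard-codes the prefix table and base URI. The names `expand` and `literal_triples` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: prefixes and base URI are hard-coded rather than read from the page.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
BASE = "http://www.dc4plus.com/references/rdf_sem.html"

# Trimmed copy of the example page body (hypothetical test input).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<dl about="http://www.w3.org/TR/2004/REC-rdf-mt-20040210/">
<dt>Title</dt>
<dd property="dc:title">RDF Semantics - W3C Recommendation 10 February 2004</dd>
<dt>Author</dt>
<dd rel="dc:creator" href="#a1">
<span id="a1">
<span property="foaf:name">Patrick Hayes</span>
</span>
</dd>
</dl>
</body>
</html>"""


def expand(curie):
    """Expand a prefixed name such as dc:title into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def literal_triples(element, subject=BASE):
    """Yield (subject, predicate, literal) for each property attribute."""
    # Simplifying assumption: about or id changes the subject for the subtree.
    if "about" in element.attrib:
        subject = element.get("about")
    elif "id" in element.attrib:
        subject = BASE + "#" + element.get("id")
    if "property" in element.attrib:
        text = "".join(element.itertext()).strip()
        yield (subject, expand(element.get("property")), text)
    for child in element:
        yield from literal_triples(child, subject)


triples = list(literal_triples(ET.fromstring(SAMPLE)))
for triple in triples:
    print(triple)
```

The resource-valued triples (dc:creator, rdf:type, foaf:homepage) are left out on purpose, since they depend on the rel and href handling that this sketch does not attempt.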
The part concerning the technical report corresponds to the following graph:
Queries can then be evaluated against this RDF. For instance, to build an index of all the documents with their titles and the names of their authors, one can use the following SPARQL query:
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    select ?doc ?title ?name display xml
    where
    {
      ?doc dc:title ?title .
      ?doc dc:creator ?c .
      ?c foaf:name ?name
    }
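The three triple patterns of this query form a join: ?doc links the title to the creator, and ?c links the creator to its name. The sketch below illustrates that join in plain Python over a handful of the extracted triples. It is only an illustration of the matching, not how an RDF store evaluates SPARQL; the helper names `match` and `solve` are hypothetical, and the `display xml` clause, which is not part of standard SPARQL, has no counterpart here.

```python
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DOC = "http://www.w3.org/TR/2004/REC-rdf-mt-20040210/"
AUTHOR = "http://www.dc4plus.com/references/rdf_sem.html#a1"

# Triples taken from the example, with prefixes expanded.
TRIPLES = [
    (DOC, DC + "title", "RDF Semantics - W3C Recommendation 10 February 2004"),
    (DOC, DC + "creator", AUTHOR),
    (AUTHOR, RDF_TYPE, FOAF + "Person"),
    (AUTHOR, FOAF + "name", "Patrick Hayes"),
]

# The where clause, one tuple per triple pattern; "?x" marks a variable.
QUERY = [
    ("?doc", DC + "title", "?title"),
    ("?doc", DC + "creator", "?c"),
    ("?c", FOAF + "name", "?name"),
]


def match(pattern, triple, binding):
    """Return binding extended so that pattern equals triple, or None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def solve(patterns, triples):
    """Join the patterns one at a time, keeping only consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


rows = [(b["?doc"], b["?title"], b["?name"]) for b in solve(QUERY, TRIPLES)]
print(rows)
```

The rdf:type triple matches none of the patterns and is simply ignored, which is why only documents that have both a title and a named creator end up in the index.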
Among other results, this query retrieves the initial document used for this example; here is an extract of the result obtained, for instance, using Corese:
    <result>
      <binding name='doc'><uri>http://www.w3.org/TR/2004/REC-rdf-mt-20040210/</uri></binding>
      <binding name='title'><literal datatype='http://www.w3.org/2001/XMLSchema#string'>RDF Semantics - W3C Recommendation 10 February 2004</literal></binding>
      <binding name='name'><literal datatype='http://www.w3.org/2001/XMLSchema#string'>Patrick Hayes</literal></binding>
    </result>
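The last step of the scenario, rendering such results as an index page, could look like the following Python sketch. It assumes the result elements are wrapped in a single root element (called `results` here, a hypothetical wrapper) and carry no XML namespace, as in the extract above; a complete result document may declare a namespace, in which case the tag names would need to be qualified. The function names are illustrative.

```python
import xml.etree.ElementTree as ET
from html import escape

# The extract from the example, wrapped in an assumed <results> root.
RESULT_XML = """<results>
<result>
<binding name='doc'><uri>http://www.w3.org/TR/2004/REC-rdf-mt-20040210/</uri></binding>
<binding name='title'><literal datatype='http://www.w3.org/2001/XMLSchema#string'>RDF Semantics - W3C Recommendation 10 February 2004</literal></binding>
<binding name='name'><literal datatype='http://www.w3.org/2001/XMLSchema#string'>Patrick Hayes</literal></binding>
</result>
</results>"""


def rows_from_results(xml_text):
    """Turn each result element into a dict of variable name to value."""
    rows = []
    for result in ET.fromstring(xml_text).iter("result"):
        row = {}
        for binding in result.findall("binding"):
            # Each binding wraps a single uri or literal child element.
            row[binding.get("name")] = binding[0].text
        rows.append(row)
    return rows


def render_index(rows):
    """Render the rows as an XHTML list linking each title to its document."""
    items = [
        '<li><a href="{doc}">{title}</a>, {name}</li>'.format(
            doc=escape(row["doc"], quote=True),
            title=escape(row["title"]),
            name=escape(row["name"]),
        )
        for row in rows
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


rows = rows_from_results(RESULT_XML)
index_html = render_index(rows)
print(index_html)
```

Rerunning the crawl, the query and this rendering step on a schedule is what keeps the generated index up to date as publishers add or change descriptions.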