Fabien GANDON,
Fabien.Gandon@sophia.inria.fr
The Semantic Web is often viewed as heavy, AI-driven ontologies
Where's the data on the Semantic Web?
Microformats are often thought of as a competing alternative to the Semantic Web
<span class="vevent"> <p><abbr class="dtstart" title="2006-12-05">December 5-</abbr> <abbr class="dtend" title="2006-12-07">7th</abbr> At <b><span class="summary">XML 2006</span></b> (<span class="location">Boston, MA USA</span>) for a presentation on "Social Semantic Mashups".</p></span>
<li><a href="http://www.w3.org/People/Connolly/" rel="colleague met">Dan Connolly</a></li> <li> <a href="http://seanmcgrath.blogspot.com/" rel="colleague met">Sean McGrath</a></li> <li><a href="http://www.jclark.com/" rel="colleague">James Clark</a></li>
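As a rough illustration of how relationship data like this can be read mechanically, the sketch below collects each rel token from links modeled on the XFN snippet above. The `<ul>` wrapper, the `SAMPLE` string and the function name `xfn_relations` are assumptions made for this example; it is not the transformation a GRDDL agent would actually run.

```python
import xml.etree.ElementTree as ET

# Sample modeled on the XFN list items above; the <ul> wrapper is added here
# only so that the fragment is well-formed XML.
SAMPLE = """<ul>
<li><a href="http://www.w3.org/People/Connolly/" rel="colleague met">Dan Connolly</a></li>
<li><a href="http://www.jclark.com/" rel="colleague">James Clark</a></li>
</ul>"""


def xfn_relations(markup):
    """Return one (href, relation, name) tuple per rel token found on a link."""
    root = ET.fromstring(markup)
    found = []
    for link in root.iter("a"):
        # rel holds a space-separated list of relationship values
        for token in link.get("rel", "").split():
            found.append((link.get("href"), token, (link.text or "").strip()))
    return found


for row in xfn_relations(SAMPLE):
    print(row)
```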
<div class="vcard"><p> <span class="fn n"> <span class="given-name">Harry </span> <span class="additional-name">Reeves</span> <span class="family-name"> Halpin</span></span></p> <p><span class="tel">+44-131-650-4421</span></p> <table><tr><td> <span class="street-address">2 Buccleuch Place</span> </td></tr><tr><td> <span class="locality">Edinburgh</span> <span class="postal-code">EH8 9LW</span> </td></tr><tr><td> <span class="region">Scotland</span> <span class="country-name">UK</span> </td></tr></table></div>
vCard Extraction
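A minimal sketch of what extracting contact data from such markup might look like, assuming a shortened, well-formed version of the hCard above. The `FIELDS` tuple and the function name `extract_hcard` are hypothetical and cover only a handful of hCard class names.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed variant of the hCard above (an assumption for the demo).
SAMPLE = """<div class="vcard">
<span class="fn n"><span class="given-name">Harry</span>
<span class="family-name">Halpin</span></span>
<span class="tel">+44-131-650-4421</span>
<span class="locality">Edinburgh</span>
</div>"""

# Only a few hCard class names are handled in this sketch.
FIELDS = ("given-name", "family-name", "tel", "locality")


def extract_hcard(markup):
    """Map each known hCard class name to the text of the element carrying it."""
    root = ET.fromstring(markup)
    card = {}
    for el in root.iter():
        classes = el.get("class", "").split()
        for name in FIELDS:
            if name in classes:
                card[name] = "".join(el.itertext()).strip()
    return card


print(extract_hcard(SAMPLE))
```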
With GRDDL, microformat data can be viewed as Semantic Web data.
So the first step of Semantic Web deployment is already happening...
The small, light and agile Working Group has produced:
Jane is trying to coordinate a meeting with friends. She uses GRDDL to extract data from each of their calendar pages and combine it in a single model. She then writes a query to filter the events down to those dates when all of them are in the same city.
The Technical University of Marcilly decides to use a wiki with metadata embedded in its pages to tag, structure, navigate and query the resources of the wiki. GRDDL is used to extract these metadata as RDF to feed the different tools of the system.
The Open Archives Initiative (OAI) publishes an XML schema that universities can use to publish their archived documents. This schema also identifies a GRDDL transform to apply to all its instance documents in order to extract their Creative Commons license.
Whenever he gets in touch with someone, Steffen starts a simple script that tries to gather as much metadata about this person as possible. Because most of the web pages it visits are not even valid HTML, the script calls an HTML-tidying tool; if the tidying is complex, some of the metadata may no longer be coherent.
Oceanic also wishes to publish RDF descriptions of its parts, reusing the AirPartML documents produced for an arrangement with a consortium of airlines. The AirPartML schemas are strict, so Oceanic cannot alter its XML documents to specify a transformation. Yet by using HTTP headers, Oceanic can specify links and profiles for transformation when serving its AirPartML documents.
xmlns:grddl='http://www.w3.org/2003/g/data-view#' grddl:transformation="glean_title.xsl"
<head profile="http://www.w3.org/2003/g/data-view"> <link rel="transformation" href="glean_title.xsl" />
<book xmlns="http://example.org/book/"
xmlns:grddl='http://www.w3.org/2003/g/data-view#'
grddl:transformation="glean_title.xsl
http://example.org/book/getAuthor.xsl" >
<title>The man who mistook his wife for a hat</title>
...
</book>
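The following sketch shows one way a client could read the grddl:transformation attribute from the root element of an XML document like the one above and resolve each space-separated reference against the document's URI. The base URI `http://example.org/docs/book.xml` and the function name `root_transformations` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

SAMPLE = """<book xmlns="http://example.org/book/"
      xmlns:grddl="http://www.w3.org/2003/g/data-view#"
      grddl:transformation="glean_title.xsl
          http://example.org/book/getAuthor.xsl">
  <title>The man who mistook his wife for a hat</title>
</book>"""


def root_transformations(xml_text, base_uri):
    """List the transformation URIs named on the root element, made absolute."""
    root = ET.fromstring(xml_text)
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    # The attribute is a whitespace-separated list of URI references.
    return [urljoin(base_uri, ref) for ref in value.split()]


# Hypothetical location of the document, used only to resolve relative references.
print(root_transformations(SAMPLE, "http://example.org/docs/book.xml"))
```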
<html xmlns="http://www.w3.org/1999/xhtml"> <head profile="http://www.w3.org/2003/g/data-view" > <title>The man who mistook his wife for a hat</title> <link rel="transformation" href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" /> <link rel="transformation" href="glean_title.xsl" /> <meta name="DC.Subject" content="clinical tales" /> ... </head> ... </html>
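For XHTML, the equivalent discovery step can be sketched as follows: check that the head carries the GRDDL profile, then collect the link elements whose rel includes "transformation". This is a simplified sketch that only looks at link elements in the head; the base URI and the function name `xhtml_transformations` are assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML = "{http://www.w3.org/1999/xhtml}"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://www.w3.org/2003/g/data-view">
  <title>The man who mistook his wife for a hat</title>
  <link rel="transformation" href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
  <link rel="transformation" href="glean_title.xsl" />
  <meta name="DC.Subject" content="clinical tales" />
</head><body/></html>"""


def xhtml_transformations(text, base_uri):
    """Return absolute transformation URIs, or [] if the GRDDL profile is absent."""
    root = ET.fromstring(text)
    head = root.find(XHTML + "head")
    if head is None or GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    hrefs = []
    for link in head.iter(XHTML + "link"):
        if "transformation" in link.get("rel", "").split() and link.get("href"):
            hrefs.append(urljoin(base_uri, link.get("href")))
    return hrefs


# Hypothetical document location, used only to resolve the relative href.
print(xhtml_transformations(SAMPLE, "http://example.org/books/hat.html"))
```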
XML namespace document (or XHTML profile document)
<head profile="http://purl.org/NET/erdf/profile">
<a rel="profileTransformation" href="http://purl.org/NET/erdf/extract-rdf">GRDDL transform</a>
Example referencing GRDDL transformations directly in the head of the HTML.
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <title>Robin's Schedule</title> </head> <body> <ol class="schedule"> <li>2007 <ol> <li class="vevent"> <strong class="summary">Web Design Conference</strong> in <span class="location">Edinburgh, UK</span>: <abbr class="dtstart" title="2007-01-08">Jan 8</abbr> to <abbr class="dtend" title="2007-01-11">10</abbr> </li> <li class="vevent"> <strong class="summary">Board Review</strong> in <span class="location">New York, USA</span>: <abbr class="dtstart" title="2007-02-23">Feb 23</abbr> to <abbr class="dtend" title="2007-02-25">24</abbr> </li> </ol> </li> </ol> </body> </html>
<html xmlns="http://www.w3.org/1999/xhtml"
xml:lang="en" lang="en">
<head profile="http://www.w3.org/2003/g/data-view" >
<title>Robin's Schedule</title>
</head>
<body> ...
<html xmlns="http://www.w3.org/1999/xhtml"
xml:lang="en" lang="en">
<head profile="http://www.w3.org/2003/g/data-view" >
<title>Robin's Schedule</title>
<link rel="transformation" href="http://www.w3.org/2002/12/cal/glean-hcal"/>
</head>
<body> ...
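To give a feel for the data such a transformation pulls out of the schedule, here is a small sketch that gathers the hCalendar fields of each vevent. It is a Python approximation written for this example, not the glean-hcal transformation itself; the trimmed `SAMPLE` and the function name `extract_events` are assumptions.

```python
import xml.etree.ElementTree as ET

# Trimmed version of Robin's schedule, keeping only the event list.
SAMPLE = """<ol xmlns="http://www.w3.org/1999/xhtml">
<li class="vevent"><strong class="summary">Web Design Conference</strong> in
 <span class="location">Edinburgh, UK</span>:
 <abbr class="dtstart" title="2007-01-08">Jan 8</abbr> to
 <abbr class="dtend" title="2007-01-11">10</abbr></li>
<li class="vevent"><strong class="summary">Board Review</strong> in
 <span class="location">New York, USA</span>:
 <abbr class="dtstart" title="2007-02-23">Feb 23</abbr> to
 <abbr class="dtend" title="2007-02-25">24</abbr></li>
</ol>"""

FIELDS = ("summary", "location", "dtstart", "dtend")


def extract_events(markup):
    """Return one dict per element with class 'vevent'."""
    root = ET.fromstring(markup)
    events = []
    for el in root.iter():
        if "vevent" not in el.get("class", "").split():
            continue
        event = {}
        for child in el.iter():
            for name in child.get("class", "").split():
                if name in FIELDS:
                    # <abbr> keeps the machine-readable value in its title attribute
                    event[name] = child.get("title") or "".join(child.itertext()).strip()
        events.append(event)
    return events


for event in extract_events(SAMPLE):
    print(event)
```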
Example referencing GRDDL transformations in a profile document referenced in the head of the HTML.
From 7 October, 2006 to 12 October, 2006 I will be attending the National Tiddlywinks Championship in Bognor Regis, UK.
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <title>Where Am I</title> <link rel="schema.cal" href="http://www.w3.org/2002/12/cal#" /> </head> <body> <p class="-cal-Vevent" id="tiddlywinks"> From <span class="cal-dtstart" title="2006-10-07">7 October, 2006</span> to <span class="cal-dtend" title="2006-10-13">12 October, 2006</span> I will be attending the <span class="cal-summary">National Tiddlywinks Championship</span> in <span class="cal-location">Bognor Regis, UK</span>. </p> </body> </html>
<head profile="http://purl.org/NET/erdf/profile">
<title>Where Am I</title>
<link rel="schema.cal" href="http://www.w3.org/2002/12/cal#" />
</head>
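The conventions visible in this example (a schema.cal link declaring a prefix, class names such as cal-dtstart naming properties, a leading hyphen marking a type, and id naming the subject) can be sketched in a few lines. This is a simplified reading of those conventions under stated assumptions, not a complete eRDF processor; the document URI `http://example.org/whereami.html` and the function name `erdf_triples` are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://purl.org/NET/erdf/profile">
  <title>Where Am I</title>
  <link rel="schema.cal" href="http://www.w3.org/2002/12/cal#" />
</head>
<body><p class="-cal-Vevent" id="tiddlywinks">From
<span class="cal-dtstart" title="2006-10-07">7 October, 2006</span> to
<span class="cal-dtend" title="2006-10-13">12 October, 2006</span> I will be attending the
<span class="cal-summary">National Tiddlywinks Championship</span> in
<span class="cal-location">Bognor Regis, UK</span>.</p></body></html>"""


def erdf_triples(text, doc_uri):
    """Return (subject, predicate, object) tuples under the simplified rules above."""
    root = ET.fromstring(text)
    # schema.<prefix> links declare which vocabulary a class-name prefix stands for
    schemas = {}
    for link in root.iter(XHTML + "link"):
        rel = link.get("rel", "")
        if rel.startswith("schema."):
            schemas[rel[len("schema."):]] = link.get("href")
    triples = []

    def walk(el, subject):
        if el.get("id"):
            subject = doc_uri + "#" + el.get("id")
        for token in el.get("class", "").split():
            is_type = token.startswith("-")
            prefix, _, local = token.lstrip("-").partition("-")
            if prefix not in schemas or not local:
                continue
            if is_type:
                triples.append((subject, RDF_TYPE, schemas[prefix] + local))
            else:
                value = el.get("title") or "".join(el.itertext()).strip()
                triples.append((subject, schemas[prefix] + local, value))
        for child in el:
            walk(child, subject)

    walk(root, doc_uri)
    return triples


for triple in erdf_triples(SAMPLE, "http://example.org/whereami.html"):
    print(triple)
```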
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head profile="http://www.w3.org/2003/g/data-view"> <title>Embedded RDF HTML Profile</title> <link rel="transformation" href="http://www.w3.org/2003/g/glean-profile" /> </head> <body> <p> <a rel="profileTransformation" href="http://purl.org/NET/erdf/extract-rdf">GRDDL transform</a> </p> </body> </html>
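The indirection through a profile document can be sketched as a two-step lookup: read the profile URIs from the page's head, then look in each profile document for links marked profileTransformation. A real GRDDL agent would obtain this by applying GRDDL to the profile document itself; the shortcut below scans the markup directly. The `fake_fetch` function stands in for an HTTP GET and, like `transformations_via_profiles`, is an assumption for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML = "{http://www.w3.org/1999/xhtml}"

PROFILE_URI = "http://purl.org/NET/erdf/profile"
PROFILE_DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://www.w3.org/2003/g/data-view">
  <title>Embedded RDF HTML Profile</title>
  <link rel="transformation" href="http://www.w3.org/2003/g/glean-profile" />
</head>
<body><p><a rel="profileTransformation"
   href="http://purl.org/NET/erdf/extract-rdf">GRDDL transform</a></p></body>
</html>"""


def fake_fetch(uri):
    """Stand-in for an HTTP GET; it only knows the single profile above."""
    return {PROFILE_URI: PROFILE_DOC}[uri]


def transformations_via_profiles(profile_attr, fetch):
    """Follow each profile URI and collect its profileTransformation links."""
    found = []
    for profile_uri in profile_attr.split():
        doc = ET.fromstring(fetch(profile_uri))
        for el in doc.iter():
            if el.tag not in (XHTML + "a", XHTML + "link"):
                continue
            if "profileTransformation" in el.get("rel", "").split() and el.get("href"):
                found.append(urljoin(profile_uri, el.get("href")))
    return found


# The value of the head's profile attribute in the "Where Am I" page.
print(transformations_via_profiles("http://purl.org/NET/erdf/profile", fake_fetch))
```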
Example referencing GRDDL transformations in an XML document.
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:voc="urn:hl7-org:v3/voc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" templateId="2.16.840.1.113883.3.27.1776"> ... <author> <time value="20000407"/> <assignedAuthor> <id extension="KP00017" root="2.16.840.1.113883.3.933"/> <assignedPerson> <name> <given>Robert</given> <family>Dolin</family> <suffix>MD</suffix> </name> </assignedPerson> </assignedAuthor> </author> <recordTarget> <patientRole> <patientPatient> <name> <given>Henry</given> <family>Levin</family> <suffix>the 7th</suffix> </name> <administrativeGenderCode code="M" codeSystem="2.16.840.1.113883.5.1"/> <birthTime value="19320924"/> </patientPatient> </patientRole> </recordTarget> </ClinicalDocument>
<ClinicalDocument
xmlns="urn:hl7-org:v3"
xmlns:voc="urn:hl7-org:v3/voc"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
templateId="2.16.840.1.113883.3.27.1776"
xmlns:grddl="http://www.w3.org/2003/g/data-view#"
grddl:transformation="glean-HL7-CDA.xslt">
Current web pages, written in HTML, contain significant inherent structured data.
RDFa = RDF in attributes to ...
... allow publishers to express this data and structure more completely.
... allow tools to read this data easily.
... allow users to transfer structured data between applications and web sites.
... support a new world of user functionality.
Examples from the RDFa Primer:
RDFa vs. microformats
This document is licensed under a <a href="http://cc.org/licenses/by/3.0/"> CC License </a> and was written by TimBL.
The idea that this implicit structure can be marked up is not new: see microformats.
What's new:
This document is licensed under a <a href="http://cc.org/licenses/by/3.0/" xmlns:cc="http://cc.org/ns#" rel="cc:license"> CC License </a> and was written by TimBL.
use existing HTML attributes whenever possible: rel.
"Bridging the Clickable and Semantic Webs": there's already a clickable link, now we type it.
self-contained:
This document ... <div rel="dc:creator" class="foaf:Person" xmlns:dc="http://..." xmlns:foaf="http://..."> and was written by <span property="foaf:nickname"> TimBL </span>. </div>
yields
<> dc:creator [a foaf:Person ; foaf:nickname "TimBL"] .
slightly expand the use of HTML attributes: rel on any element to introduce a new RDF bnode (striping).
use the inherent semantics of HTML: class attribute is type information.
RDFa is a syntax for expressing structured data in XHTML.
The rendered, hypertext data of XHTML is reused by the RDFa markup, so...
The underlying abstract representation already is RDF, so...
RDFa can be parsed using GRDDL. There are Python, PHP, Java, and JavaScript implementations.
RDFa is specified:
Good News: RDFa does not interfere with existing HTML.
It does not validate, but the document remains conformant and all browsers render it just fine.
<?xml version="1.0" encoding="UTF-8"?> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:foaf="http://xmlns.com/foaf/0.1/" > <head profile="http://www.w3.org/2003/g/data-view" > <link rel="transformation" href="RDFa2RDFXML.xsl"/> <title>Biblio description</title> </head> <body> <h2>Biblio description</h2> <dl about="http://www.w3.org/TR/2004/REC-rdf-mt-20040210/"> <dt>Title</dt> <dd property="dc:title">RDF Semantics - W3C Recommendation 10 February 2004</dd> <dt>Author</dt> <dd rel="dc:creator" href="#a1"> <span about="#a1"> <link rel="rdf:type" href="[foaf:Person]" /> <span property="foaf:name">Patrick Hayes</span> see <a rel="foaf:homepage" href="http://www.ihmc.us/users/user.php?UserID=42">homepage</a> </span> </dd> </dl> </body> </html>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
<http://www.w3.org/TR/2004/REC-rdf-mt-20040210/> dc:title "RDF Semantics - W3C Recommendation 10 February 2004"
<http://www.w3.org/TR/2004/REC-rdf-mt-20040210/> dc:creator <#a1>
<#a1> rdf:type <http://xmlns.com/foaf/0.1/Person>
<#a1> foaf:name "Patrick Hayes"
<#a1> foaf:homepage <http://www.ihmc.us/users/user.php?UserID=42>
Demo microformats+eRDF+RDFa+GRDDL
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:dc="http://purl.org/dc/elements/1.1/"> <body> This photo was taken by <span class="author" about="photo1.jpg" property="dc:creator">Mark Birbeck</span>. </body> </html>
Rendered: This photo was taken by Mark Birbeck.
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:dc="http://purl.org/dc/elements/1.1/"> <head profile="http://www.w3.org/2003/g/data-view"> <link rel="transformation" href="http://www-sop.inria.fr/acacia/soft/RDFa2RDFXML_v_0_8.xsl"/> </head> <body> This photo was taken by <span class="author" about="photo1.jpg" property="dc:creator">Mark Birbeck</span>. </body> </html>
<http://www.w3.org/2006/07/SWD/RDFa/testsuite/testcases/photo1.jpg> dc:creator "Mark Birbeck"^^rdf:XMLLiteral
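How the about and property attributes lead to a triple like the one above can be sketched in a few lines. The sketch handles only the case where both attributes sit on the same element, treats prefix declarations as global, and returns the literal as a plain string; the document name `example.html` in the base URI and the function name `rdfa_literals` are assumptions, and this is not the RDFa2RDFXML transformation.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
<body>This photo was taken by
<span class="author" about="photo1.jpg" property="dc:creator">Mark Birbeck</span>.
</body></html>"""


def rdfa_literals(text, base_uri):
    """Return (subject, predicate, literal) for elements with about and property."""
    prefixes = {}
    triples = []
    source = io.BytesIO(text.encode("utf-8"))
    # "start-ns" events expose the xmlns declarations that ElementTree otherwise hides.
    for event, payload in ET.iterparse(source, events=("start-ns", "end")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
            continue
        el = payload
        if el.get("property") and el.get("about"):
            prefix, _, local = el.get("property").partition(":")
            if prefix in prefixes:
                triples.append((
                    urljoin(base_uri, el.get("about")),
                    prefixes[prefix] + local,
                    "".join(el.itertext()).strip(),
                ))
    return triples


# Hypothetical document name under the test suite directory mentioned above.
BASE = "http://www.w3.org/2006/07/SWD/RDFa/testsuite/testcases/example.html"
print(rdfa_literals(SAMPLE, BASE))
```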
PREFIX dc: <http://purl.org/dc/elements/1.1/> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> SELECT ?x WHERE { ?x dc:creator "Mark Birbeck"^^rdf:XMLLiteral . }
<sparql xmlns='http://www.w3.org/2005/sparql-results#' xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' > <head><var name='x'/></head> <results distinct='false' sorted='false' > <result> <binding name='x'> <uri>http://www.w3.org/2006/07/SWD/RDFa/testsuite/testcases/photo1.jpg</uri> </binding> </result> </results> </sparql>
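A result document in this format is straightforward to consume. The sketch below reads each result into a dictionary keyed by variable name; the trimmed `SAMPLE` and the function name `read_bindings` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SR = "{http://www.w3.org/2005/sparql-results#}"

SAMPLE = """<sparql xmlns='http://www.w3.org/2005/sparql-results#'>
<head><var name='x'/></head>
<results>
 <result><binding name='x'>
  <uri>http://www.w3.org/2006/07/SWD/RDFa/testsuite/testcases/photo1.jpg</uri>
 </binding></result>
</results></sparql>"""


def read_bindings(text):
    """Return one {variable: value} dict per <result> element."""
    root = ET.fromstring(text)
    rows = []
    for result in root.iter(SR + "result"):
        row = {}
        for binding in result.findall(SR + "binding"):
            value = binding[0]  # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows


print(read_bindings(SAMPLE))
```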
Don't bury your data in an HTML page: when you publish a document that contains data,
reference GRDDL profiles and/or transformations so that the data can be extracted.