
Bootstrapping the Semantic Web with GRDDL, Microformats, and RDFa


Fabien GANDON (Acacia), Fabien.Gandon@sophia.inria.fr

Harry HALPIN (Edinburgh), hhalpin@ibiblio.org

Ben Adida, ben@adida.net

Introduction: Data and Documents

The Web 2.0 and the Semantic Web

The Semantic Web is often viewed as heavy, AI-driven ontologies

Where's the data on the Semantic Web?

Microformats are often thought of as a competing alternative to the Semantic Web

The Limits of Microformats

From Microformats to the Semantic Web

With GRDDL, microformat data can be viewed as Semantic Web data.

So the first step of Semantic Web deployment is already happening...

GRDDL Working Group

The small, light and agile Working Group has produced:

Test Cases and Implementations

Next Steps: Micromodels

Guided tour of GRDDL use cases

Overview of motivating scenarios for Gleaning Resource Descriptions from Dialects of Languages.

Scheduling

Jane is trying to coordinate a meeting.

Jane is trying to coordinate a meeting with friends. She uses GRDDL to extract data from each of their calendar pages and combine it in a single model. She then writes a query to filter the events down to those dates when all of them are in the same city.
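
As a rough sketch of the merge-then-query idea, the Python below uses plain tuples in place of RDF triples and a simple filter in place of a SPARQL query. The names, dates, and cities are hypothetical, and the GRDDL extraction step is assumed to have already run on each calendar page.

```python
from collections import defaultdict

# Hypothetical output of GRDDL runs over three calendar pages:
# (person, date, city) facts, standing in for RDF triples.
extracted = [
    ("jane", "2007-05-08", "Banff"),
    ("jane", "2007-05-20", "Boston"),
    ("david", "2007-05-08", "Banff"),
    ("david", "2007-05-20", "Edinburgh"),
    ("robin", "2007-05-08", "Banff"),
    ("robin", "2007-05-20", "Boston"),
]


def common_dates(facts):
    """Return {date: city} for dates on which everyone is in one city."""
    people = {person for person, _, _ in facts}
    # Merge step: group the combined model by date.
    by_date = defaultdict(dict)
    for person, date, city in facts:
        by_date[date][person] = city
    # Query step: keep dates where all people appear and share a city.
    result = {}
    for date, where in by_date.items():
        if set(where) == people and len(set(where.values())) == 1:
            result[date] = next(iter(where.values()))
    return result


meeting_options = common_dates(extracted)
```

In a real deployment the grouping and filtering would be expressed as a SPARQL query over the merged RDF graph; the dictionary here only mimics that step.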

[Figure: calendars, data in RDF, SPARQL]

Health Care

Kayode wants to query clinical data.

[Figure: patient files, data in RDF, schemas, reports]
Kayode uses a single-purpose XML vocabulary as the main representation format for a computer-based patient record. He uses GRDDL to be able to query these records both in their XML vocabulary and as RDF, without managing a dual representation.

Aggregating data

Stephan wants a synthetic review before buying a guitar.

[Figure: reviews, data in RDF, query; hCard creator and hCard example]
Stephan wishes to buy a guitar and visits a site offering a review service. He uses GRDDL to aggregate reviews and profiles of the reviewers in order to select the reviews he can trust.

Querying sites and digital libraries

DC4Plus Corp. wants to automate the publication of its electronic documents.

[Figure: documents, data in RDF, reports and indexes]
Adeline designs a system to allow her company to streamline the publication of Technical Reports. The system relies on shared templates for publishing documents and a GRDDL transformation to build an up-to-date RDF index used to create an authoritative repository.

Wikis and e-learning

The Technical University of Marcilly decided to use wikis to foster knowledge exchanges between lecturers and students.

The Technical University of Marcilly decides to use a wiki with metadata embedded in its pages to tag, structure, navigate and query the resources of the wiki. GRDDL is used to extract these metadata as RDF to feed the different tools of the system.

[Figure: wiki pages, data in RDF, schemas, SPARQL (demo)]

Web syndication

Extracting form descriptions to push entries to Voltaire's blog.

[Figure: documents, data in RDF, reports and indexes]
Voltaire has set up a weblog engine that uses XForms for editing entries. He also provides a GRDDL transformation that extracts an RDF description of the XForms, which other client applications can use to update existing entries through the identified service URIs and to perform other such services.

Validated Documents

The OAI would like to be able to specify document licenses in the schema they share.

The Open Archives Initiative (OAI) publishes an XML schema that universities can use to publish their archived documents. This schema also identifies a GRDDL transform to apply to all its instance documents in order to extract their Creative Commons license.

[Figure: wiki pages, data in RDF, schemas, SPARQL]

Pulling Data from the Web

Steffen wants to build a directory of the people he works with.

Whenever he gets in touch with someone, Steffen starts a simple script that tries to gather as much metadata about this person as possible from the web. Because most of the web pages involved are not even valid HTML, the script calls an HTML-tidying tool; if the tidying is complex, some of the metadata is likely to be no longer coherent.

[Figure: documents, data in RDF, reports and indexes]

Pushing a transformation

Oceanic Consortium wants to provide transformations for their files without altering them or their schema.

Oceanic also wishes to publish RDF descriptions of its parts, reusing the AirPartML documents produced for an arrangement with a consortium of airlines. The AirPartML schemas are strict, so Oceanic cannot alter its XML documents to specify a transformation. Yet using HTTP headers, Oceanic can specify links and profiles for transformations when serving its AirPartML documents.
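
To make the header-based approach concrete, here is a hedged Python sketch of a client picking transformation URIs out of an HTTP `Link`-style header. The header layout (`<URI>; rel="transformation"`), the function name, and the example.org URIs are all assumptions for illustration; the slides do not state the exact header syntax Oceanic would use.

```python
import re

# Matches one link-value: <URI> followed by ;-separated parameters.
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^,;]+)*)')
# Matches the rel parameter, with or without quotes.
REL_PARAM = re.compile(r'rel\s*=\s*"?([^";,]+)"?', re.IGNORECASE)


def transformations_from_link_header(header_value):
    """Return URIs whose rel tokens include 'transformation'."""
    found = []
    for uri, params in LINK_VALUE.findall(header_value):
        match = REL_PARAM.search(params)
        if match and "transformation" in match.group(1).split():
            found.append(uri)
    return found


# Hypothetical header value served alongside an AirPartML document.
header = ('<http://example.org/airpart2rdf.xsl>; rel="transformation", '
          '<http://example.org/style.css>; rel="stylesheet"')
links = transformations_from_link_header(header)
```

The point of the design is that the AirPartML document itself stays byte-for-byte unchanged: everything the GRDDL client needs travels in the response headers.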

[Figure: documents; header tells its GRDDL source; get transforms; data in RDF]

GRDDL technical overview

Overview of some of the technical aspects for Gleaning Resource Descriptions from Dialects of Languages.

GRDDL enables you to...

Direct Reference of GRDDL Transformations

Examples of declarations

Indirect Reference of GRDDL Transformations

XML namespace document (or XHTML profile document)

Provide a faithful rendition

Example with Microformats

Example referencing GRDDL transformations directly in the head of the HTML.
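
As an illustration of a direct reference, the Python sketch below scans an XHTML page for the GRDDL profile URI on `head` and then collects `rel="transformation"` links. The page content and the `hcal2rdf.xsl` URL are hypothetical; only the profile URI and the `transformation` link relation come from the GRDDL specification. It uses the standard-library HTML parser rather than a real GRDDL client.

```python
from html.parser import HTMLParser

GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


class GrddlLinkFinder(HTMLParser):
    """Collect rel="transformation" links from a GRDDL-profiled page."""

    def __init__(self):
        super().__init__()
        self.has_profile = False
        self.transformations = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            # The profile attribute is a space-separated list of URIs.
            profiles = (attrs.get("profile") or "").split()
            self.has_profile = GRDDL_PROFILE in profiles
        elif tag in ("link", "a") and self.has_profile:
            rels = (attrs.get("rel") or "").split()
            if "transformation" in rels and attrs.get("href"):
                self.transformations.append(attrs["href"])


# Hypothetical page carrying microformat markup in its body.
page = """
<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Jane's calendar</title>
    <link rel="transformation"
          href="http://example.org/hcal2rdf.xsl" />
  </head>
  <body><p class="vevent">...</p></body>
</html>
"""

finder = GrddlLinkFinder()
finder.feed(page)
```

After `feed`, `finder.transformations` holds the stylesheet URIs a GRDDL-aware agent would fetch and apply to the page to obtain RDF.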

Example with embedded RDF

Example referencing GRDDL transformations in a profile document referenced in the head of the HTML.

Example XML

Example referencing GRDDL transformations in an XML document.
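
For a generic XML document, the reference is an attribute on the root element. The Python sketch below reads that attribute with the standard library; the `parts` vocabulary and the example.org stylesheet URLs are made up for illustration, while the `grddl:transformation` attribute and its namespace are the ones defined by GRDDL.

```python
import xml.etree.ElementTree as ET

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

# Hypothetical instance document naming two transformations.
document = """
<parts xmlns="http://example.org/airpart#"
       xmlns:grddl="http://www.w3.org/2003/g/data-view#"
       grddl:transformation="http://example.org/airpart2rdf.xsl
                             http://example.org/extra2rdf.xsl">
  <part id="p1">Flap actuator</part>
</parts>
"""


def xml_transformations(xml_text):
    """Return the transformation URIs listed on the root element."""
    root = ET.fromstring(xml_text)
    # The attribute value is a whitespace-separated list of URIs.
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    return value.split()


declared = xml_transformations(document)
```

A document with no such attribute simply yields an empty list, which is where the indirect route through a namespace or profile document would take over.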

Zoom on RDFa

Current web pages, written in HTML, contain significant inherent structured data.

RDFa = RDF in attributes to ...

... allow publishers to express this data and structure more completely.

... allow tools to read this data easily.

... allow users to transfer structured data between applications and web sites.

... support a new world of user functionality.

RDFa and microformats

Examples from the RDFa Primer:

RDFa vs. microformats

HTML Contains Implicit Structure

This document is licensed under a

<a href="http://cc.org/licenses/by/3.0/">
   CC License
</a>

and was written by TimBL.

The idea that this implicit structure can be marked up is not new: microformats.

What's new:

Basics: Typing a Link

This document is licensed under a

<a href="http://cc.org/licenses/by/3.0/"
   xmlns:cc="http://cc.org/ns#" rel="cc:license">
   CC License
</a>

and was written by TimBL.

use existing HTML attributes whenever possible: rel.

"Bridging the Clickable and Semantic Webs": there's already a clickable link, now we type it.

self-contained:
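
To show what "typing a link" gives a consumer, the Python sketch below reads the anchor from this slide and emits one (subject, predicate, object) triple. It is a toy reader, not a conforming RDFa parser: it only looks at prefixes declared on the same element, and the base URI `http://example.org/doc` is an assumption standing in for the address of the page.

```python
from html.parser import HTMLParser


class TypedLinkExtractor(HTMLParser):
    """Turn <a rel="prefix:term" href="..."> into (s, p, o) triples."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel, href = attrs.get("rel"), attrs.get("href")
        if not rel or not href:
            return
        # Prefixes declared on this element only (no inheritance here).
        prefixes = {name[len("xmlns:"):]: value
                    for name, value in attrs.items()
                    if name.startswith("xmlns:") and value}
        for curie in rel.split():
            prefix, sep, term = curie.partition(":")
            if sep and prefix in prefixes:
                predicate = prefixes[prefix] + term
                self.triples.append((self.base_uri, predicate, href))


snippet = """
This document is licensed under a
<a href="http://cc.org/licenses/by/3.0/"
   xmlns:cc="http://cc.org/ns#" rel="cc:license">
   CC License
</a>
and was written by TimBL.
"""

extractor = TypedLinkExtractor("http://example.org/doc")
extractor.feed(snippet)
```

The same link that a reader clicks now also states, for a program, that the page's license is the linked resource.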

More Complex Structure: RDFa goes Deep

This document 
...
<div rel="dc:creator" class="foaf:Person"
   xmlns:dc="http://..." xmlns:foaf="http://...">
   and was written by
   <span property="foaf:nickname">
      TimBL
   </span>.
</div>

yields

<> dc:creator [a foaf:Person ; foaf:nickname "TimBL"] .

slightly expand the use of HTML attributes: rel on any element to introduce a new RDF bnode (striping).

use the inherent semantics of HTML: class attribute is type information.

RDFa syntax

RDFa is a syntax for expressing structured data in XHTML.

The rendered, hypertext data of XHTML is reused by the RDFa markup, so...

The underlying abstract representation already is RDF, so...

State of RDFa

RDFa can be parsed using GRDDL. There are Python, PHP, Java, and JavaScript implementations.

RDFa is specified:

Good News: RDFa does not interfere with existing HTML.
It does not validate, but the document remains conformant and all browsers render it just fine.

Biblio example.

Complete example and demo

Demo microformats+eRDF+RDFa+GRDDL

References.

Don't bury your data in some HTML page: when you publish a document that contains data,
do reference GRDDL profiles and/or transformations for their extraction.

GRDDL Source