
GRDDL Use Cases: Scenarios of extracting RDF data from XML documents

acacia Fabien GANDON, Fabien.Gandon@sophia.inria.fr

Introduction: Data and Documents

Biblio example.

Scheduling

Jane is trying to coordinate a meeting.

Jane is trying to coordinate a meeting with friends. She uses GRDDL to extract data from each of their calendar pages and combine it in a single model. She then writes a query to filter the events down to those dates when all of them are in the same city.

calendars Data in RDF SPARQL
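
As a rough illustration of Jane's scenario, the sketch below merges event data from several calendars and keeps only the dates when everyone is in the same city. It is a minimal stand-in written in plain Python rather than RDF and SPARQL. The friends' names, dates, cities and the `common_dates` helper are all hypothetical, and the tuples only imitate what a GRDDL transformation might extract from each calendar page.

```python
from collections import defaultdict

# Hypothetical data standing in for what GRDDL transformations might
# extract from three calendar pages: (person, date, city).
jane = [("jane", "2007-01-08", "Boston"), ("jane", "2007-01-12", "Paris")]
david = [("david", "2007-01-08", "Boston"), ("david", "2007-01-12", "Paris")]
robin = [("robin", "2007-01-08", "Lyon"), ("robin", "2007-01-12", "Paris")]


def common_dates(*calendars):
    """Return {date: city} for dates on which everyone is in the same city."""
    # Combine all calendars into a single model, as Jane does.
    merged = [event for calendar in calendars for event in calendar]
    people = {person for person, _, _ in merged}

    where = defaultdict(dict)  # date -> {person: city}
    for person, date, city in merged:
        where[date][person] = city

    # Keep a date only if every person has an entry and all cities agree.
    return {
        date: next(iter(cities.values()))
        for date, cities in where.items()
        if set(cities) == people and len(set(cities.values())) == 1
    }


print(common_dates(jane, david, robin))  # {'2007-01-12': 'Paris'}
```

In a real setting the merge would be a union of RDF graphs and the filter a SPARQL query; the dictionary comprehension plays that role here.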

Health Care

Kayode wants to query clinical data.

patient files Data in RDF Schemas Reports
Kayode uses a single-purpose XML vocabulary as the main representation format for a computer-based patient record. He uses GRDDL to be able to query these records both in their XML vocabulary and as RDF, without managing a dual representation.
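
One way an XML document can point at its own transformation is a `grddl:transformation` attribute on the root element, in the namespace `http://www.w3.org/2003/g/data-view#`. The sketch below assumes Kayode's records carry that attribute and reads it with Python's standard library. The patient vocabulary, the stylesheet URI and the `transformations` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

# Hypothetical patient record; the vocabulary and URIs are made up.
RECORD = """<record xmlns="http://example.org/patient#"
        xmlns:grddl="http://www.w3.org/2003/g/data-view#"
        grddl:transformation="http://example.org/patient2rdf.xsl">
  <patient id="p1"/>
</record>"""


def transformations(xml_text):
    """Return the transformation URIs declared on the root element."""
    root = ET.fromstring(xml_text)
    # The attribute holds a whitespace-separated list of URIs.
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    return value.split()


print(transformations(RECORD))  # ['http://example.org/patient2rdf.xsl']
```

The record stays in its XML vocabulary; an RDF view is produced only when a client chooses to apply the listed transformation, so no dual representation has to be stored.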

Aggregating data

Stephan wants a synthetic review before buying a guitar.

reviews Data in RDF query
Stephan wishes to buy a guitar and visits a site offering a review service. He uses GRDDL to aggregate reviews and profiles of the reviewers in order to select the reviews he can trust.

Querying sites and digital libraries

DC4Plus Corp. wants to automate the publication of its electronic documents.

documents Data in RDF reports and indexes
Adeline designs a system to allow her company to streamline the publication of Technical Reports. The system relies on shared templates for publishing documents and a GRDDL transformation to build an up-to-date RDF index used to create an authoritative repository.

Wikis and e-learning

The Technical University of Marcilly decided to use wikis to foster knowledge exchanges between lecturers and students.

The Technical University of Marcilly decides to use a wiki with metadata embedded in its pages to tag, structure, navigate and query the resources of the wiki. GRDDL is used to extract these metadata as RDF to feed the different tools of the system.

wiki pages Data in RDF schemas SPARQL
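
For XHTML pages such as wiki pages, a document can signal GRDDL support with the profile `http://www.w3.org/2003/g/data-view` on its `head` element and name its transformations in links with `rel="transformation"`. The sketch below looks for both using Python's standard HTML parser. It assumes the university's wiki emits this markup; the page content, the stylesheet URI and the names `GrddlLinkFinder` and `find_transformations` are hypothetical.

```python
from html.parser import HTMLParser

GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

# Hypothetical wiki page; the stylesheet URI is made up.
PAGE = """<html><head profile="http://www.w3.org/2003/g/data-view">
<title>Course notes</title>
<link rel="transformation" href="http://example.org/wiki2rdf.xsl" />
</head><body></body></html>"""


class GrddlLinkFinder(HTMLParser):
    """Collect transformation links and note whether the profile is present."""

    def __init__(self):
        super().__init__()
        self.has_profile = False
        self.transformations = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            profiles = (attrs.get("profile") or "").split()
            self.has_profile = GRDDL_PROFILE in profiles
        elif tag in ("link", "a"):
            rels = (attrs.get("rel") or "").split()
            if "transformation" in rels and attrs.get("href"):
                self.transformations.append(attrs["href"])


def find_transformations(html_text):
    """Return transformation URIs, or [] if the GRDDL profile is absent."""
    finder = GrddlLinkFinder()
    finder.feed(html_text)
    return finder.transformations if finder.has_profile else []


print(find_transformations(PAGE))  # ['http://example.org/wiki2rdf.xsl']
```

Applying the returned stylesheets to the page would then yield the RDF that feeds the tagging, navigation and query tools.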

Web syndication

Extracting form descriptions to push entries to Voltaire's blog.

documents Data in RDF reports and indexes
Voltaire has set up a weblog engine that utilizes XForms for editing entries. He also provides a GRDDL transformation that extracts an RDF description of the XForms that other client applications can use to update existing entries using the identified service URIs, and perform other such services.

Validated Documents

The OAI would like to be able to specify document licenses in the schema they share.

The Open Archives Initiative (OAI) publishes an XML schema that universities can use to publish their archived documents. This schema also identifies a GRDDL transform to apply to all its instance documents in order to extract their Creative Commons license.

wiki pages Data in RDF schemas SPARQL

Pulling Data from the Web

Steffen wants to build a directory of the people he works with.

Whenever he gets in touch with someone, Steffen starts a simple script that aims at gathering as much metadata about this person as possible from the web pages it finds. Because most of these web pages are not even valid HTML, the script first calls an HTML-tidying tool; when the tidying is complex, some of the metadata is likely to lose its coherence.

documents Data in RDF reports and indexes

Pushing a transformation

Oceanic Consortium wants to provide transformations for their files without altering them or their schema.

Oceanic also wishes to publish RDF descriptions of their parts, reusing the AirPartML documents produced for an arrangement with a consortium of airlines. The AirPartML schemas are strict, and therefore Oceanic cannot alter their XML documents to specify a transformation. Yet, using HTTP headers, Oceanic can specify links and profiles for transformations when serving their AirPartML documents.

documents Header tells its GRDDL source Get transforms Data in RDF
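
To make the header-based approach concrete, the sketch below shows how a client might pick transformation URIs out of a `Link` response header. The use of a `Link` header with `rel="transformation"` is an assumption made for illustration, as are the URIs and the `transformation_links` helper; the slide only states that HTTP headers carry the link and profile information.

```python
import re

# Hypothetical header value as Oceanic's server might send it.
HEADER = ('<http://example.org/airpart2rdf.xsl>; rel="transformation", '
          '<http://example.org/style.css>; rel="stylesheet"')


def transformation_links(link_header):
    """Return URIs from a Link header value whose rel includes 'transformation'.

    Simplified parsing: commas or semicolons inside quoted parameter
    values are not handled.
    """
    uris = []
    # Each entry looks like: <uri>; param=value; param=value
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*[^,;]+)*)', link_header):
        uri, params = match.group(1), match.group(2)
        rel = re.search(r'rel\s*=\s*"?([^";]*)"?', params)
        if rel and "transformation" in rel.group(1).split():
            uris.append(uri)
    return uris


print(transformation_links(HEADER))  # ['http://example.org/airpart2rdf.xsl']
```

Because the hint travels in the response headers, the AirPartML documents and their schemas remain byte-for-byte unchanged.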

References.