GRDDL Use Cases

Editor's Draft 11 Aug 2006

This version:
Latest version:
Fabien Gandon, INRIA
B, ABC Corp.
Also see Acknowledgements.


This document collects a number of use cases, together with their goals and requirements, for extracting RDF data from XML documents. These are motivating use cases for GRDDL (Gleaning Resource Descriptions from Dialects of Languages), a mechanism for getting RDF data out of XML documents, and in particular XHTML pages, using explicitly associated transformation algorithms, typically represented in XSLT.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is an editor's draft, subject to change without notice.
This collection started in August 2006.
Please send review comments and feedback to public-grddl-wg@w3.org, the mailing list of the GRDDL Working Group; the mailing list has a public archive.




Use case #1 - Scheduling : Jane is trying to coordinate a meeting.

Jane is trying to coordinate a meeting with her friends Robin, David and Kate. They each live in separate cities but often bump into each other at different conferences throughout the year. Jane wants to find a time when all of her friends are in the same city. Robin publishes his schedule on his home page using the hCalendar microformat. David publishes his in Embedded RDF using some RDF calendar properties, and Kate uses a blog engine that encodes her diary as RDFa. Jane uses an online calendaring service that publishes an RSS 1.0 feed of her schedule. Jane writes a SPARQL query that includes the three web pages and her own RSS feed in the FROM clause. The query looks for dates when all four friends are in the same city. Jane runs this query using her GRDDL-aware SPARQL engine, which fetches each web page and uses GRDDL to extract triples from each one, combining them into a single model against which the query is evaluated. Jane is delighted to find that all four of them will be at conferences in LA at the beginning of September, and she immediately starts looking for restaurants to book for their night out.
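The merge-then-query step in this scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triples stand in for what a GRDDL-aware engine might glean from each source, the predicate name `inCityOn` and all data values are invented, and a plain Python function takes the place of the SPARQL query.

```python
from collections import defaultdict

# Hypothetical triples, standing in for what GRDDL might extract from each
# source. The predicate name and every value below are invented.
SOURCES = {
    "robin-hcalendar": [
        ("robin", "inCityOn", ("Los Angeles", "2006-09-02")),
        ("robin", "inCityOn", ("Paris", "2006-10-10")),
    ],
    "david-embedded-rdf": [
        ("david", "inCityOn", ("Los Angeles", "2006-09-02")),
        ("david", "inCityOn", ("Paris", "2006-10-10")),
    ],
    "kate-rdfa": [
        ("kate", "inCityOn", ("Los Angeles", "2006-09-02")),
    ],
    "jane-rss": [
        ("jane", "inCityOn", ("Los Angeles", "2006-09-02")),
        ("jane", "inCityOn", ("Boston", "2006-11-01")),
    ],
}


def merge(sources):
    """Combine the triples from every source into one set (the merged model)."""
    model = set()
    for triples in sources.values():
        model.update(triples)
    return model


def common_city_dates(model, people):
    """Return the sorted (city, date) pairs at which every person is present."""
    present = defaultdict(set)
    for subject, predicate, obj in model:
        if predicate == "inCityOn":
            present[obj].add(subject)
    wanted = set(people)
    return sorted(place for place, who in present.items() if who >= wanted)


if __name__ == "__main__":
    model = merge(SOURCES)
    print(common_city_dates(model, ["jane", "robin", "david", "kate"]))
```

The point of the sketch is that, once every source has been reduced to triples, the query does not need to know which format each friend originally published in.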

Use case #2 - Structuring wikis : MyCorp wants a wiki effectively structured and seamlessly restructured.

MyCorp wants a wiki to foster knowledge exchanges between its employees. The wiki must be effectively and easily structured, so MyCorp decides to set up a social tagging system on wiki pages. Ideally the information structuring the wiki must be:

As a result, MyCorp designed a wiki that stores its pages directly in XHTML and uses RDF annotations to represent the wiki structure. The wiki structure can then be refactored by editing the RDF annotations and the RDFS schemas they are based on.

Using RDFa and GRDDL in wikis

RDF annotations are embedded in the wiki pages themselves using RDFa. This embedded RDF is extracted using a GRDDL XSLT stylesheet available online, which provides semantic annotations directly to the semantic search engine or to any other application that needs to extract the embedded metadata.
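As a rough illustration of how an application might discover which stylesheet to apply, the following Python sketch looks for the GRDDL profile on the `head` element and collects the `rel="transformation"` links, as described in the GRDDL draft. The wiki page and the stylesheet URL are placeholders, and the sketch stops at discovery: actually running the XSLT is left out because the Python standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

# Hypothetical wiki page; the stylesheet URL is a placeholder.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Wiki page</title>
    <link rel="transformation" href="http://example.org/rdfa2rdfxml.xsl"/>
  </head>
  <body><p>Page content</p></body>
</html>"""


def find_transformations(xhtml_text):
    """Return the transformation URLs a GRDDL-aware agent would apply."""
    root = ET.fromstring(xhtml_text)
    head = root.find(XHTML_NS + "head")
    # Without the GRDDL profile, the links carry no GRDDL meaning.
    if head is None or GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    hrefs = []
    for element in root.iter():
        if element.tag not in (XHTML_NS + "link", XHTML_NS + "a"):
            continue
        # rel is a space-separated list of link types.
        if "transformation" in element.get("rel", "").split():
            href = element.get("href")
            if href:
                hrefs.append(href)
    return hrefs


if __name__ == "__main__":
    print(find_transformations(PAGE))
```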

Use case #3 - XForms-based Webapps: Tom wants to extract transport semantics from an online form used to edit blog entries.

Using GRDDL for XForm extraction

Tom has developed a weblog engine that utilizes XForms for editing entries remotely using the Atom Publishing Protocol. Tom has found the use of XForms for authoring fragments of Atom quite useful for a variety of reasons. In particular, the Atom Publishing Protocol's use of HTTP and POX (Plain Old XML) as the primary remote messaging mechanism allows Tom to easily author various XForms documents that use XForms submission elements to dispatch operations on web resources.

As a result, the XForms for dispatching these operations each contain a rather rich set of information about transport-level services in the form of service URLs, media-types and HTTP methods. These are completely encapsulated in an XForms submission element. It so happens that there is an RDF vocabulary for expressing transport metadata called RDF Forms.

Tom wishes to write a general GRDDL profile that extracts an RDF Form graph from the XForms submission elements employed in the various web forms for editing, deleting, and updating Atom entries on his weblog. Such a profile can uniformly extract an RDF description of the transport mechanisms for a software agent to interpret. The software agent can automatically retrieve an Introspection Document (via the Atom Publishing Protocol), update existing entries using the identified service URLs, and perform other such services without the necessity of a top-heavy web service stack to capture the service endpoints available at Tom's weblog.
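The kind of extraction Tom has in mind can be approximated in Python, shown here in place of the XSLT a real GRDDL profile would use. The form below is hypothetical, the attribute names (`action`, `method`, `mediatype`) follow one reading of the XForms submission element, and the `ex:` predicates are invented stand-ins, not the actual RDF Forms vocabulary.

```python
import xml.etree.ElementTree as ET

XFORMS_NS = "{http://www.w3.org/2002/xforms}"

# Hypothetical form for editing one weblog entry; all URLs are placeholders.
FORM = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xf="http://www.w3.org/2002/xforms">
  <head>
    <xf:model>
      <xf:submission id="update-entry" method="put"
                     action="http://example.org/blog/entries/42"
                     mediatype="application/atom+xml"/>
      <xf:submission id="delete-entry" method="delete"
                     action="http://example.org/blog/entries/42"/>
    </xf:model>
  </head>
  <body/>
</html>"""


def submission_triples(xforms_text):
    """Describe each submission element as (subject, predicate, object) triples.

    The ex: predicate names are illustrative only.
    """
    root = ET.fromstring(xforms_text)
    triples = []
    for submission in root.iter(XFORMS_NS + "submission"):
        subject = "#" + submission.get("id", "")
        if submission.get("action"):
            triples.append((subject, "ex:serviceUrl", submission.get("action")))
        if submission.get("method"):
            triples.append((subject, "ex:httpMethod", submission.get("method").upper()))
        if submission.get("mediatype"):
            triples.append((subject, "ex:mediaType", submission.get("mediatype")))
    return triples


if __name__ == "__main__":
    for triple in submission_triples(FORM):
        print(triple)
```

A software agent reading these triples would know which URL to address and which HTTP method to use without parsing the form itself.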

Use case #4 - Aggregating data: Stephan wants a synthetic review before buying a guitar.

Using GRDDL for hReview extraction

Stephan wishes to buy a guitar, so decides to check reviews. There are various special interest publications online which feature musical instrument reviews. There are also blogs which contain reviews by individuals. Among the reviewers there may be friends of Stephan, people whose opinion Stephan values (e.g. well-known musicians and people whose reviews Stephan has found useful in the past). There may also be reviews planted by instrument manufacturers which offer very biased views.

Stephan visits a site offering a review service and enters his preference for guitar reviews which gave a high rating for the instrument. This initial request is answered with a list of all the relevant review titles/summaries together with information about the reviewers.

From this list Stephan chooses only the reviewers he trusts, and on submitting these preferences is finally presented with a set of full reviews which match his criteria.

Reviews published using the hReview microformat can be discovered using existing search services. The documents can be GRDDL'd into RDF and aggregated together in a store. Information about the reviewers can also be aggregated from various sources, including hCard and XFN microformats and autodiscovered FOAF profiles (perhaps also gathered by a scutter starting from Stephan's own profile). The filtering may be achieved by running SPARQL queries against the aggregated data, presented to the user through regular HTML form interfaces.
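The two filtering steps Stephan goes through (first by rating, then by trusted reviewer) can be sketched as follows. This is a simplified stand-in for the SPARQL queries mentioned above: the `Review` record, the reviewer names, and the ratings are all invented, and the aggregated store is just a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    item: str
    reviewer: str
    rating: float  # assumed scale of 0 to 5 for this illustration
    summary: str


# Hypothetical aggregated data, as it might look after GRDDL extraction.
REVIEWS = [
    Review("guitar-x", "alice", 4.5, "Warm tone, solid build."),
    Review("guitar-x", "maker-pr", 5.0, "Simply the best guitar ever."),
    Review("guitar-x", "bob", 2.0, "Frets needed work."),
    Review("guitar-y", "alice", 4.8, "Great for fingerstyle."),
]


def high_rated(reviews, item, minimum):
    """Step 1: keep reviews of `item` rated at least `minimum`."""
    return [r for r in reviews if r.item == item and r.rating >= minimum]


def by_trusted(reviews, trusted):
    """Step 2: keep only reviews written by reviewers the user trusts."""
    return [r for r in reviews if r.reviewer in trusted]


if __name__ == "__main__":
    candidates = high_rated(REVIEWS, "guitar-x", 4.0)
    print(sorted({r.reviewer for r in candidates}))  # list shown to Stephan
    for review in by_trusted(candidates, {"alice", "bob"}):
        print(review.reviewer, review.rating, review.summary)
```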

Use case #5 - Querying web sites: MyVendor wants to ensure that its online catalogue can be used to answer structured queries.

MyVendor wants to use SPARQL to query the RDF embedded in the XHTML documents of its catalogue directly.

Use case #6 - Select items in a page: Linda wants to extract one meeting from a schedule of meetings.

One goal is to put the RDF inline so that, for example, a calendar icon appears next to calendar items. One way to do that would be for GRDDL transformations to output XHTML with RDFa. Suppose Linda wants just one meeting from a schedule of meetings; this suggests a way to get the RDF data that corresponds to just one part of the page.
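To make the idea of gleaning data from just one part of a page concrete, here is a minimal Python sketch that selects a single hCalendar event by its `id` and reports only that event's data. The schedule page is hypothetical, the `ex:` predicate names are invented, and only the `vevent`, `summary` and `dtstart` class names come from hCalendar.

```python
import xml.etree.ElementTree as ET

# Hypothetical schedule page marked up with hCalendar class names.
SCHEDULE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Meetings</title></head>
  <body>
    <div class="vevent" id="kickoff">
      <span class="summary">Project kickoff</span>
      <abbr class="dtstart" title="2006-09-04">4 September</abbr>
    </div>
    <div class="vevent" id="review">
      <span class="summary">Design review</span>
      <abbr class="dtstart" title="2006-09-18">18 September</abbr>
    </div>
  </body>
</html>"""


def has_class(element, name):
    """True if `name` is one of the space-separated class tokens."""
    return name in element.get("class", "").split()


def event_triples(xhtml_text, event_id):
    """Return triples for the one event whose id is `event_id`, else []."""
    root = ET.fromstring(xhtml_text)
    for element in root.iter():
        if element.get("id") == event_id and has_class(element, "vevent"):
            subject = "#" + event_id
            triples = []
            # Only descend into the selected event, not the whole page.
            for child in element.iter():
                if has_class(child, "summary"):
                    text = "".join(child.itertext()).strip()
                    triples.append((subject, "ex:summary", text))
                elif has_class(child, "dtstart"):
                    triples.append((subject, "ex:start", child.get("title", "")))
            return triples
    return []


if __name__ == "__main__":
    print(event_triples(SCHEDULE, "review"))
```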

Use case #7 - Digital libraries and focused indexes: W3C wants to automate the publication of Technical Reports.

Using GRDDL for W3C TR

The most visible part of W3C work is its Technical Reports published by the working groups. These reports are published following a well-defined process. TR Automation is a project based on the use of Semantic Web technologies to allow W3C to streamline the publication paper trail of W3C Technical Reports, to maintain an RDF-formalized index of these specifications and to create a number of tools using these newly available data. This project includes the following deliverables:

These deliverables all rely on a shared XSLT stylesheet to extract metadata about Technical Reports in RDF (GRDDL).

Use case #8 - Drive document transformation: Franck wants to type sections of his handouts to dynamically generate material for the practical sessions of his students.

Franck is a lecturer interested in e-Learning, and he always saves his handouts as web pages available on the web site of his university. He found that every year there is a lot of redundancy between the handouts for his lectures, the material for practical sessions, etc.

Ideally he would like to type sections of his handouts and then use these annotations to offer dynamic views supporting slideshows for his lectures, handouts, support for practical sessions, interfaces for students to revise and test themselves, etc.

Using RDFa and GRDDL in elearning

What Franck imagines is editing both the data (i.e. the handout) and the metadata (i.e. the annotation) at the same time, for instance by linking the style of a title to a pedagogical notion (e.g. a definition, an example, a counter-example, an exercise, etc.).

A possible solution for Franck is to include RDFa data in the XHTML file of his handout. Then, using GRDDL, applications can build precise indexes of available elements and extract them to generate dynamic views (e.g. extract all definitions to generate a synthetic revision list).

Use case #9 - Clinical data: Anna wants to identify a patient population.

Imagine a fellow assigned to determine the search criteria to identify a patient population for a particular study. He might have a set of classifications specific to the study that he could express as logical rules (N3 rules). Then, he could write a client (that understood GRDDL) that speculatively picked a few patient records at random from a remote server (as XML documents), each of which would be associated (by GRDDL profile) to a transform to extract the clinical data as RDF (expressed in a universally supported vocabulary for CPR, such as the HL7 OWL ontology that Helen Chen from Agfa has been working on), and ask his speculative questions of the resulting RDF graph.

Or (to take the scenario a step further), he could apply the study-specific rules on the resulting RDF to classify the patient data according to his domain of interest (specific diagnoses, pathological observations, etc.).

Use case #X: live bookmarks


[Automating TR]
Automating the publication of Technical Reports, Dominique Hazaël-Massieux, 2006/01/05 20:34:13, http://www.w3.org/2002/01/tr-automation/.
[GRDDL Draft]
Gleaning Resource Descriptions from Dialects of Languages (GRDDL), Dominique Hazaël-Massieux, Dan Connolly, Authors' draft, 2006/03/09 15:45:31, http://www.w3.org/2004/01/rdxh/spec. Latest version available at http://www.w3.org/TR/grddl/.
[OWL Overview]
OWL Web Ontology Language Overview, Deborah L. McGuinness and Frank van Harmelen, Editors, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-owl-features-20040210/. Latest version available at http://www.w3.org/TR/owl-features/.
Resource Description Framework (RDF) Model and Syntax Specification, Ora Lassila, Ralph R. Swick, Editors. World Wide Web Consortium Recommendation, 1999,
Latest version available at http://www.w3.org/TR/REC-rdf-syntax/.
RDF/A Syntax, A collection of attributes for layering RDF on XML languages, Mark Birbeck, Steven Pemberton, Ben Adida, Editors. Editor's Draft 27 October 2005,
RDF Vocabulary Description Language 1.0: RDF Schema, Dan Brickley and R.V. Guha, Editors. W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-schema-20040210/. Latest version available at http://www.w3.org/TR/rdf-schema/.
SPARQL Query Language for RDF, Eric Prud'hommeaux and Andy Seaborne, Editors. W3C Candidate Recommendation 6 April 2006, http://www.w3.org/TR/2006/CR-rdf-sparql-query-20060406/. Latest version available at http://www.w3.org/TR/rdf-sparql-query/.


The editor would like to thank the following Working Group members for their contributions to this document: A, B, C.

This document is a product of the GRDDL Working Group.
