W3C GRDDL Recommendation Bridges HTML/Microformats and the Semantic Web. |
The World Wide Web Consortium has announced the publication of Gleaning Resource Descriptions from Dialects of Languages (GRDDL) as a W3C Recommendation, together with a separate GRDDL Test Cases Recommendation. The GRDDL specification represents "an important link between Semantic Web and microformats communities. With GRDDL (pronounced 'griddle'), software can automatically extract information from structured Web pages to make it part of the Semantic Web. Those accustomed to expressing structured data with microformats in XHTML can thus increase the value of their existing data by porting it to the Semantic Web, at very low cost."
Tim Berners-Lee, W3C Director, compared GRDDL to style specifications: "Sometimes one line of code can make a world of difference. Just as stylesheets make Web pages more readable to people, GRDDL makes Web pages, microformat tags, XML documents, and data more readable to Semantic Web applications, opening more data to new possibilities and creative reuse."
The GRDDL specification "introduces markup based on existing standards for declaring that an XML document includes data compatible with the Resource Description Framework (RDF) and for linking to algorithms (typically represented in XSLT), for extracting this data from the document.
The markup includes a namespace-qualified attribute for use in general-purpose XML documents and a profile-qualified link relationship for use in valid XHTML documents. The GRDDL mechanism also allows an XML namespace document (or XHTML profile document) to declare that every document associated with that namespace (or profile) includes gleanable data and for linking to an algorithm for gleaning the data."
GRDDL "provides a relatively inexpensive set of mechanisms for bootstrapping RDF content from uniform XML dialects in such a way as to shift the burden of formulating RDF to transformation algorithms written specifically for these dialects. XML Transformation languages such as XSLT are quite versatile in their ability to process, manipulate, and generate XML and the use of XSLT to generate XHTML from single-purpose XML vocabularies is historically celebrated as a powerful idiom for separating structured content from presentation.
GRDDL shifts this idiom to a different end: separating structured content from its authoritative meaning (or semantics). The way in which GRDDL empowers authors of web content can be considered somewhat analogous to allowing a non-native speaker to learn the spoken form of a new language first, before attempting to master its written form — rather than trying to learn both simultaneously.
GRDDL works through associating transformations with an individual document either through direct inclusion of references or indirectly through profile documents. Content authors can nominate the transformations for producing RDF from their content and use GRDDL to refer to them. For XML formats the transformations are commonly expressed using XSLT 1.0, although other methods are permissible. Generally, if the transformation can be fully expressed in XSLT 1.0 then it is preferable to use that format since all GRDDL processors should be capable of interpreting an XSLT 1.0 document."
The GRDDL specification "is a concise technical specification of the GRDDL mechanism and its XML syntax. It specifies the GRDDL syntax to use in valid XHTML and well-formed XML documents, as well as how to encode GRDDL into namespaces and HTML profiles. Discussions of the GRDDL transformation link and security issues are also covered. Appendices provide links to extended examples and existing software and services that employ GRDDL."
Supporting documents in the GRDDL release include GRDDL Use Cases: Scenarios of Extracting RDF Data from XML Documents, GRDDL Primer, and a GRDDL Implementation Report.
Testimonials for the issuance of GRDDL as a W3C Recommendation have been supplied by several industry groups, including DCMI, INRIA, microformats.org, OpenLink Software, and Talis Group Ltd.
Recommendation documents:
Gleaning Resource Descriptions from Dialects of Languages (GRDDL). W3C Recommendation. 11-September-2007. Edited by Dan Connolly. With contributions from Ann Navarro, Lee Jonas, Joseph Reagle, Tim Berners-Lee, Dominique Hazaël-Massieux, Norm Walsh, Noah Mendelsohn, Ben Adida, Murray Maloney, Brian McBride, Ian Davis, Harry Halpin, Jeremy Carroll, Chimezie Ogbuji, Fabien Gandon, Brian Suda, and Rachel Yager. This Version URI: http://www.w3.org/TR/2007/REC-grddl-20070911/. Latest Version URI: http://www.w3.org/TR/grddl/. Previous Version URI: http://www.w3.org/TR/2007/PR-grddl-20070716/.
GRDDL Test Cases. W3C Recommendation. 11-September-2007. Edited by Chimezie Ogbuji (Cleveland Clinic Foundation). Contributions from: Dan Connolly (W3C), Danny Ayers (Independent), Jeremy J. Carroll (Hewlett Packard), Brian McBride (Hewlett Packard), Fabien Gandon (INRIA), John Clark (Cleveland Clinic Foundation), Dominique Hazaël-Massieux (W3C), and Harry Halpin (University of Edinburgh). This version URI: http://www.w3.org/TR/2007/REC-grddl-tests-20070911/. Latest version URI: http://www.w3.org/TR/grddl-tests/. Previous version URI: http://www.w3.org/TR/2007/PR-grddl-tests-20070716/.
Supporting documents:
GRDDL Use Cases: Scenarios of Extracting RDF Data from XML Documents. W3C Working Group Note. 6-April-2007. "The GRDDL Use Cases document collects a number of use cases with their goals and requirements for GRDDL. These use cases also illustrate how XML and XHTML documents can be decorated with microformats, Embedded RDF or RDFa statements to support GRDDL transformations in charge of extracting valuable data that can then be used to automate a variety of tasks."
GRDDL Primer. W3C Working Draft. 2-October-2006. Edited by Ian Davis (Talis). With contributions from Brian Suda, Fabien Gandon (INRIA), and Chimezie Ogbuji (Cleveland Clinic Foundation). This version URI: http://www.w3.org/TR/2006/WD-grddl-primer-20061002/. Latest version URI: http://www.w3.org/TR/grddl-primer/. "The GRDDL Primer is a step-by-step tutorial on the GRDDL mechanism. It develops a number of examples from the GRDDL Use Cases document to illustrate GRDDL techniques for associating documents with transformations for extracting RDF."
GRDDL Implementation Report. Maintained by Dan Connolly and John Clark for the GRDDL WG. The Working Group's implementation report demonstrates that the goals for interoperable implementations, set in the May 2007 Candidate Recommendation draft of this document, were achieved. The relationship between the features of GRDDL and the normative tests is given in appendix A Test Coverage of GRDDL Test Cases. Test results in EARL/RDF are available from four implementations: (1) Jena 2007-06-25 update; (2) Raptor 1.4.16 pre-release Subversion r12368 2007-06-29 update; (3) GRDDL.py 2007-06-26 update; (4) OpenLink Virtuoso Sponger. The criteria in the May 2007 Candidate Recommendation draft of the GRDDL specification were achieved as follows: [1] There are at least two interoperable implementations of the specification that each implement all the features of the specification, as detailed above. This is determined by passing the approved tests in the GRDDL Test Cases. [2] A minimum of four weeks of the CR period has elapsed. [3] GRDDL is deployed in popular forums such as the XTech 2006 website and the Semantic Technologies website calendars.
Problem statement: "There are many domain-specific languages ('dialects') used in practice among the many XML documents on the web. There are dialects of XHTML, XML and RDF that are used to represent everything from poetry to prose, purchase orders to invoices, spreadsheets to databases, schemas to scripts, and linked lists to ontologies.
While this breadth of expression is quite liberating, inspiring new dialects to represent information, it can be a barrier to understanding across different domains or fields. For example, how does software discover the author of a poem, a spreadsheet and an ontology? And how can software determine whether authors of each are in fact the same? [...] The same musical work might be described in different XML dialects, [using different markup] encodings of the same information, but there remains no clear mechanism through which computer software might be able to determine this connection.
The Resource Description Framework (RDF) provides a standard for making statements about resources in the form of a subject-predicate-object expression; the entities (subject and object resources) and relationships (predicates) are identified using unambiguous URIs...
[One option is representation using FOAF-namespaced RDF/XML], but markup could also provide the same data in RDF using RDF/XML or one of the other RDF syntaxes. GRDDL provides a relatively inexpensive mechanism for bootstrapping RDF content from uniform XML dialects, shifting the burden from formulating RDF to creating transformation algorithms specifically for each dialect.
GRDDL works by associating transformations for an individual document, either through direct inclusion of references or indirectly through profile and namespace documents. Content authors can nominate the transformations for producing RDF from their content and use GRDDL to refer to them.
By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document.
Likewise, by specifying a GRDDL namespace transformation or profile transformation, the creator of that namespace or profile states that the transformation will provide a faithful RDF rendition of a class of source documents which relate to that namespace or profile. A namespace document or a profile document also provide a means for their authors to explain in prose the purpose of the transformation or any policy statements..."
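The problem statement's question of how software can tell that two documents share an author can be illustrated once each dialect has been gleaned into RDF. The following is a minimal sketch, assuming triples are modelled as plain Python tuples rather than through an RDF library; the example.org URIs and the works_by helper are hypothetical, and Dublin Core's creator property stands in for whatever vocabulary a real transformation would emit.

```python
# Dublin Core "creator" property IRI, used here as the shared predicate.
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# Hypothetical GRDDL results gleaned from two different XML dialects.
poem_triples = {
    ("http://example.org/poem/1", DC_CREATOR, "http://example.org/people/alice"),
}
sheet_triples = {
    ("http://example.org/sheets/q3", DC_CREATOR, "http://example.org/people/alice"),
    ("http://example.org/sheets/q4", DC_CREATOR, "http://example.org/people/bob"),
}

# RDF graphs merge by set union because every term is a global URI.
merged = poem_triples | sheet_triples

def works_by(graph, author):
    """Return the sorted subjects whose creator is the given author URI."""
    return sorted(s for s, p, o in graph if p == DC_CREATOR and o == author)

alice_works = works_by(merged, "http://example.org/people/alice")
print(alice_works)
```

Because both results identify the author with the same URI, the merged graph answers the question directly, regardless of which dialect each source document used.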
Adding GRDDL to well-formed XML: "The general form of associating a GRDDL transformation link with a well-formed XML document is adding to the root element a grddl namespace declaration and a grddl:transformation attribute whose value is an IRI reference, or list of IRI references, that refer to executable scripts or programs which are expected to transform the source document into RDF. This method is suitable for use with any XML dialects that can accommodate an extra namespace-qualified attribute on the root element. There are other ways to add GRDDL to HTML documents, especially designed to leverage HTML's existing capabilities and thereby overcome constraints imposed by the XML DTDs for some dialects of HTML."
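The root-element mechanism described above can be sketched from the consumer's side. This is an illustrative example rather than a conforming GRDDL processor: it only discovers the transformation links and does not run them. It assumes the attribute value is a whitespace-separated list of IRI references; the sample document, the example.org URIs, and the transformation_links helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Namespace IRI of the GRDDL vocabulary.
GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

def transformation_links(xml_text, base_uri):
    """Return absolute IRIs listed in grddl:transformation on the root element."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes as "{namespace}localname".
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    # Assumption: the value is a whitespace-separated list of IRI references.
    return [urljoin(base_uri, ref) for ref in value.split()]

# Hypothetical source document in a made-up purchase-order dialect.
SAMPLE = """<order xmlns="http://example.org/po#"
       xmlns:grddl="http://www.w3.org/2003/g/data-view#"
       grddl:transformation="po2rdf.xsl http://example.org/xsl/extra.xsl">
  <item>widget</item>
</order>"""

links = transformation_links(SAMPLE, "http://example.org/docs/order1.xml")
print(links)
```

Relative references are resolved against the document's own URI, so the first link above resolves into the same directory as the source document.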
Using GRDDL with XML Namespace Documents: "Transformations can be associated not only with individual documents but also with whole dialects that share an XML namespace. Any resource available for retrieval from a namespace URI is a namespace document. For example, a namespace document may have an XML Schema representation or an RDF Schema representation, or perhaps both, using content negotiation. To associate a GRDDL transformation with a whole dialect, include a grddl:namespaceTransformation property in a GRDDL result of the namespace document..."
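The namespace-level lookup can be sketched as follows, under simplifying assumptions: the GRDDL result of the namespace document is taken as already available and is modelled as a set of plain tuples, and the namespace URI itself is assumed to be the subject of the grddl:namespaceTransformation statement. The example.org URIs and helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Property IRI in the GRDDL vocabulary.
NS_TRANSFORMATION = "http://www.w3.org/2003/g/data-view#namespaceTransformation"

def root_namespace(xml_text):
    """Return the namespace name of the root element, or None if unqualified."""
    tag = ET.fromstring(xml_text).tag  # "{namespace}local" when qualified
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None

def namespace_transformations(namespace, namespace_doc_triples):
    """Return transformations the namespace document declares for the dialect."""
    return sorted(
        obj for subj, pred, obj in namespace_doc_triples
        if subj == namespace and pred == NS_TRANSFORMATION
    )

# Hypothetical GRDDL result gleaned from the namespace document.
ns_doc_result = {
    ("http://example.org/po#", NS_TRANSFORMATION, "http://example.org/xsl/po2rdf.xsl"),
}

# The source document itself carries no grddl:transformation attribute.
doc = '<order xmlns="http://example.org/po#"><item>widget</item></order>'
ns = root_namespace(doc)
found = namespace_transformations(ns, ns_doc_result)
print(ns, found)
```

The design point is that an individual document needs no GRDDL markup at all: the declaration made once in the namespace document covers every document in the dialect.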
Note: Henry S. Thompson (HCRC Language Technology Group, University of Edinburgh) prepared a new GRDDL-enabled namespace document for the XML Schema namespace, along with a representation of the corresponding RDF. The NS document contains a directory of links to these related resources using Resource Directory Description Language. It is 'GRDDL-enabled' in that a GRDDL-compliant processor can extract useful RDF representations of the information.
Using GRDDL with valid XHTML: "To accommodate the DTD-based syntax of XHTML, which precludes using attributes from foreign namespaces, we use GRDDL Data Views as a metadata profile. The general form of adding a GRDDL assertion to a valid XHTML document is by specifying the GRDDL profile in the profile attribute of the head element, and transformation as the value of the rel attribute of a link element or an a (anchor) element whose href attribute value is an IRI reference that refers to an executable script or program which is expected to transform the source document into RDF. This method is suitable for use with valid XHTML documents which are constrained by an XML DTD."
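The XHTML form can be sketched in the same way. This is a minimal illustration, not a conforming processor: it checks the head element for the GRDDL profile URI and then collects href values from link and a elements whose rel attribute includes the token transformation. The sample page, the example.org URIs, and the xhtml_transformations helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Profile URI that marks a document as using GRDDL Data Views.
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

def xhtml_transformations(xhtml_text, base_uri):
    """Return absolute IRIs of transformations declared in an XHTML document."""
    root = ET.fromstring(xhtml_text)
    head = root.find("{%s}head" % XHTML_NS)
    # Without the GRDDL profile, rel="transformation" carries no GRDDL meaning.
    if head is None or GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    wanted = ("{%s}link" % XHTML_NS, "{%s}a" % XHTML_NS)
    found = []
    for el in root.iter():
        # rel is a space-separated token list, so test membership, not equality.
        if el.tag in wanted and "transformation" in el.get("rel", "").split():
            if el.get("href"):
                found.append(urljoin(base_uri, el.get("href")))
    return found

# Hypothetical page; only the two rel="transformation" links should be found.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Agenda</title>
    <link rel="transformation" href="glean.xsl" />
    <link rel="stylesheet" href="style.css" />
  </head>
  <body><p><a rel="transformation" href="http://example.org/xsl/cal.xsl">data</a></p></body>
</html>"""

result = xhtml_transformations(PAGE, "http://example.org/pages/agenda.html")
print(result)
```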
GRDDL for HTML Profiles: "XHTML provides the profile mechanism to link to the meaning of properties and the set of legal values for those properties. As with namespace documents, a profile document can effectively be written using XHTML with embedded RDF statements and a GRDDL transformation to extract the definition of terms that are applicable. Those terms can then be used in an XHTML document to convey profile-dependent meaning. As discussed in Using GRDDL with valid XHTML, the GRDDL profile can be used with XHTML documents to apply GRDDL semantics over link elements where the value of rel attribute is transformation. This very powerful and flexible mechanism integrates well with microformat profiles which overlay the normally semantically-poor HTML markup..."
Today, the World Wide Web Consortium completed an important link between Semantic Web and microformats communities. With Gleaning Resource Descriptions from Dialects of Languages, or GRDDL (pronounced "griddle"), software can automatically extract information from structured Web pages to make it part of the Semantic Web. Those accustomed to expressing structured data with microformats in XHTML can thus increase the value of their existing data by porting it to the Semantic Web, at very low cost.
Getting Data Into and Out of the Web: How Is It Happening Today?
One aspect of recent developments some people call "Web 2.0" involves applications based on combining — in "mashups" — various types of data that are spread all around on the Web. A number of active communities innovating on the Web share the goal of sharing data such as calendar information, contact information, and geopositioning information. These communities have developed diverse social practices and technologies that satisfy their particular needs. For instance, search engines have had great success using statistical methods while people who share photos have found it useful to tag their photos manually with short text labels. Much of this work can be captured via "microformats". Microformats refer to sets of simple, open data formats built upon existing and widely adopted standards, including HTML, CSS, and XML.
This wave of activity has direct connections to the essence of the Semantic Web. The Semantic Web-based communities have pursued ways to improve the quality and availability of data on the Web, making it possible for more intensive data-integration and more diverse applications that can scale to the size of the Web and allow even more powerful mashups. The Web-based set of standards that supports this work is known as the Semantic Web stack. The foundations of the Semantic Web stack meet the requirements for formality of some applications such as managing bank statements, or combining volumes of medical data.
Each approach to "getting your data out there" has its place. But why limit yourself to just one approach if you can benefit, at low cost, from more than one? As microformats users consider more uses that require data modelling, or validation, how can they take advantage of their existing data in more formal applications?
A Bridge from Flexible Web Applications to the Semantic Web
GRDDL is the bridge for turning data expressed in an XML format (such as XHTML) into Semantic Web data. With GRDDL, authors transform the data they wish to share into a format that can be used and transformed again for more rigorous applications.
GRDDL Use Cases provides insight into why this is useful through a number of real-world scenarios, including scheduling a meeting, comparing information from various retailers before making a purchase, and extracting information from wikis to facilitate e-learning. Once data is part of the Semantic Web, it can be merged with other data (for example, from a relational database, similarly exposed to the Semantic Web) for queries, inferences, and conversion to other formats.
The Working Group has reported on implementation experience, and its members have come forward with statements of support and commitments to implement GRDDL.
GRDDL Test Cases, also published today, describes and includes test cases for software agents that support GRDDL. The Working Group has produced a GRDDL service that allows users to input a GRDDL'd file and extract the important data.
Testimonials for GRDDL
These testimonials are in support of W3C's issuance of GRDDL as a W3C Recommendation, in English (DCMI, INRIA, microformats.org, OpenLink Software, Talis Group Ltd.) and in French (INRIA).
The Dublin Core Metadata Initiative congratulates the W3C on the finalization of GRDDL and welcomes it as an important addition to the Web metadata infrastructure.
GRDDL is an essential tool in bridging the various expressions of Dublin Core metadata, and DCMI is creating GRDDL transforms that expose Dublin Core metadata expressed in XML and HTML to the Semantic Web.
By standardizing the transformation mechanisms, GRDDL allows for syntactic choices while enabling semantic interoperability — both important needs in the metadata community — and as such is fundamental to the future evolution of the Web.
— Mikael Nilsson and Thomas Baker, DCMI Architecture Forum, Dublin Core Metadata Initiative
INRIA is proud to have contributed to the specification and design of GRDDL and is already promoting and integrating it in several projects and tools. Bridging the gap between the traditional Web and the Semantic Web is a seminal step in the deployment of semantic web technologies and applications. By allowing applications to automatically glean resources from the wealth of XML documents available online, this recommendation is opening a new highway for knowledge mashups and composition of application through web resources.
— Pierre Paradinas, Head of Technological Development, INRIA
Microformats provide an easy way for many people to contribute semantic data to the web. With GRDDL all of that data is made available for RDF Semantic Web tools. Microformats and GRDDL can work together to build a better web.
— Ryan King, an active member of microformats.org community
GRDDL is one of several initiatives from the W3C that seeks to unobtrusively evolve the current Web of Documents to a Web of interlinked Data.
— Kingsley Idehen, CEO, OpenLink Software
Talis believes that GRDDL represents one of the most important steps along the road to the Semantic Web. It provides a very simple yet extraordinarily powerful mechanism to uplift documents into the web of data. Talis intends to fully support GRDDL in our Semantic Web Platform, allowing our customers to automatically extract searchable RDF metadata from their existing content with very little effort.
— Ian Davis, CTO, Talis Group Ltd.
L'INRIA est fier d'avoir contribué aux spécifications et à la conception de GRDDL et intègre déjà cette technologie dans plusieurs projets et outils. Créer des passerelles entre le Web traditionnel et le Web sémantique est une étape critique dans le déploiement des technologies et des applications du Web sémantique. En permettant à des applications d'extraire automatiquement des données de toute la variété de documents XML accessibles en ligne, cette recommandation ouvre une nouvelle voie pour l'intégration de connaissances et la composition d'applications à travers les ressources du Web.
— Pierre Paradinas, Directeur du Développement Technologique, INRIA
- Announcement:
- Announcement 2007-09-11: "W3C Completes Bridge Between HTML/Microformats and Semantic Web. GRDDL Gives Web Content Hooks to Powerful Reuse and Data Integration." Source in English.
- Testimonials. Statements in support of W3C's issuance of GRDDL as a W3C Recommendation.
- W3C News Item
- Other GRDDL Resources:
- GRDDL Service: Extracting RDF from XHTML/XML Using GRDDL. This service uses a Python CGI script that takes an HTTP URI and returns the RDF/XML extracted from it through GRDDL, built on top of python-librdf. One supplies the URI of the document from which RDF is to be extracted; output options include RDF/XML as application/rdf+xml, RDF/XML as text/xml, Turtle as application/x-turtle, Turtle as text/rdf+n3, and Turtle as text/plain.
- Demonstration of an RDF in XHTML processor. This system was a (non-fully compliant) XSLT-based implementation of GRDDL, first presented to the RDF in XHTML Task Force on November 24, 2003, and substantially updated since. It has been superseded by a more robust GRDDL service.
- GRDDL Implementations. Last edited 2007-07-20 or later.
- Profile for latest GRDDL transformation for RDFa. By Fabien Gandon. "This document is a profile for GRDDL source documents using the latest GRDDL RDFa transform. The use of this profile licenses RDF data extracted by 2007/09/12/RDFa2RDFXML.xsl from an RDFa source; available as freeware and under the non viral open-source licence CeCILL-C."
- Custom RDF Dialects. "RDF has a standard XML syntax which has nice merging properties but isn't so nice for authors and doesn't integrate well with DTDs, XML Schemas, and the like. GRDDL is a technique for using XML/XHTML dialects (especially microformats) as custom RDF syntaxes by having each document point, directly or indirectly, to a transformation to an RDF graph. RDFa is a design for mixing RDF syntax into HTML. GRDDL accommodates a wider variety of dialects at the expense of asking consumers to execute potentially untrusted code. RDFa allows one parser to work for data from a variety of domains and provides a direct relationship between the RDF data and the HTML document structure."
- triplr, by Dave Beckett. Stuff in, triples out. The output-format is one of html, json, ntriples, rdf, rss or turtle.
- Jena GRDDL Reader. Jena is a Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS, OWL, and SPARQL, and includes a rule-based inference engine. The Jena GRDDL Reader is an implementation of GRDDL for the Jena Semantic Web Framework, using the Saxon XSLT Processor from Saxonica Limited.
- GRDDL Data Views: Getting Started, Learning More. Updated 2007/08/24 or later.
- GRDDL Quick Reference. Prepared by Danny Ayers.
- W3C Online GRDDL Service. By Danny Ayers. Blog. 2007-09-05. Don't think I've blogged this properly before, but thanks to Dom and Dave, there's now an up-to-date GRDDL service. It supports "inline" translation - a GET on a combined URI - so it's ideal for use with other RESTful services. As a simple example, point it at the GRDDL group weekly agenda, and with the Tabulator you can browse through...
- Embedded RDF HTML Profile. "This stylesheet was created by Ian Davis and is in the public domain. A GRDDL-capable client tool, when visiting a page which contains the profile reference http://purl.org/NET/erdf/profile will first visit this profile document and apply the transformation declared here. The results of that transformation will contain a statement identifying the transformation that should be applied to convert the originating document into RDF/XML. The GRDDL client has everything it needs to get RDF/XML from the original XHTML document, while the author of the originating document need not concern themselves with the details of the profile, merely including the reference to its URI is enough."
- GRDDL-enabled namespace document for the XML Schema namespace