[Cache version from http://www.diffuse.org/S-Web.html; partial link integrity; use the canonical document/URL if possible.]
This brief summary of some of the key points raised during presentations to the IST Semantic Web Technologies Workshop held by the Information Society Directorate of the European Commission in Luxembourg on 22nd/23rd November 2000 covers information gleaned from the presentations summarized below.
During their welcoming remarks the conference organizers, Franco Mastroddi and Hans-Georg Stork from INFSO/D5 of the European Commission, explained that the main purpose of the workshop was to identify areas in which European research projects could help the creation of the semantic web. During 2001 INFSO/D5 will be issuing a call for proposals for research related to semantic web technologies as part of the Information Access, Filtering, Analysis and Handling (IAF) section of the Multimedia Content and Tools key action within the IST Programme. This workshop allowed potential contributors to the programme to explain how their work could impact the development of semantic web technologies.
Rudi Studer, from the University of Karlsruhe, started the keynote presentations by explaining the roles ontologies are expected to play in the development of the semantic web. The World Wide Web as it exists today is designed for the benefit of human readers. It is not optimized for machine comprehension of data.
The Extensible Markup Language (XML) provides only a partial solution to the problems of creating a semantic web. It describes the structure of documents, and provides semantically meaningful tags for the information objects contained within them, but it does not provide a semantic model for the application of the tagged data.
Ontologies provide a formal shared conceptualization of a particular domain of interest. They provide well-defined semantics for concepts and their relationships. Lightweight ontologies simply define a hierarchy of concepts and their relationships. Heavyweight ontologies add cardinality constraints, taxonomies of relations, reified statements and controllable levels of expressiveness. At present most tools only support the development of lightweight ontologies.
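The "lightweight ontology" idea described above, a bare hierarchy of concepts, can be sketched in a few lines. The concept names below are invented for illustration and are not drawn from any ontology discussed at the workshop; the only machinery shown is the transitive subsumption ("is-a") check that even lightweight ontologies support.

```python
# A minimal sketch of a lightweight ontology: a concept hierarchy
# plus a transitive subsumption check. Concept names are invented.

SUBCLASS_OF = {          # child concept -> parent concept
    "Researcher": "Person",
    "Professor": "Researcher",
    "Person": "Agent",
}

def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or is a transitive subclass of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False

print(is_a("Professor", "Agent"))   # a Professor is transitively an Agent
```

Heavyweight ontologies would layer cardinality constraints and relation taxonomies on top of this simple hierarchy, which is exactly the step most tools of the time did not yet support.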
The Resource Description Framework (RDF) provides a hierarchy of languages for describing relationships between classes of document objects that form a domain-specific classification vocabulary. RDF Schemas (RDFS) only provide a small set of modelling primitives. RDF does not itself define a vocabulary.
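The RDF model described above reduces every statement to a (subject, predicate, object) triple. The following sketch, with invented URIs and properties, shows how a set of such triples can be stored and matched against a simple pattern:

```python
# RDF statements as (subject, predicate, object) triples, with a
# wildcard matcher. The URIs and property names are invented.

triples = [
    ("ex:page1", "dc:creator", "ex:alice"),
    ("ex:page1", "dc:title", "Semantic Web Notes"),
    ("ex:page2", "dc:creator", "ex:bob"),
]

def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

print(match(p="dc:creator"))  # every statement using the dc:creator property
```

Note that nothing in the triple model itself says what "dc:creator" means; that is the vocabulary-definition role left to schemas and ontologies.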
The Ontology Inference Layer (OIL) developed by the On-to-Knowledge IST project (see below) provides modelling primitives for the description of logic and frame-based ontologies that provide an extension to RDFS.
The University of Karlsruhe is developing tools for defining semantic patterns that can be used at the epistemological level to link different ontologies. They are also developing tools for the creation of heavyweight ontologies, and for accessing ontologies over the Internet using RDF.
Scalable RDF repositories need to be developed for the storage and high-speed searching of RDF statements. Tools for the annotation of web documents that do not have in-built metadata or semantics, such as HTML and PDF, need to be developed.
Semantic web portals will allow communities of interest to share ontologies, and provide a single point for the identification of resources that are related to these ontologies.
The second keynote presentation, from Johan Hjelm of Ericsson, looked at why the semantic web is important to the development of wireless based information services. When looking for data on a mobile device you do not have time to do Internet searching of the type we are used to today, and will not want to retrieve large data sources, such as large documents and pictures. Much of the data that will be accessed using the wireless web will never be stored within a document. Information will be supplied from databases, which will generate deliverables tailored both to the mobile device you are currently using and to your stated preferences relating to language, display size, payment methods, etc.
The Composite Capability/Preference Profiles (CC/PP) protocol will allow parameterized requests to be sent to a server containing both document and service profiles that can help systems to determine which stylesheet to apply to a resource. The adoption of stylesheet-based presentation formats will be vital for the efficient delivery of data over the wireless web.
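The stylesheet-selection step that CC/PP is meant to support can be sketched as a simple match between a device profile and a table of requirements. The profile keys, capability values and stylesheet names below are all invented for illustration; real CC/PP profiles are RDF-based vocabularies, not Python dictionaries.

```python
# A toy sketch of CC/PP-style content negotiation: match a device's
# capability profile against stylesheet requirements. All names invented.

STYLESHEETS = [
    # (required capabilities, stylesheet to apply)
    ({"display": "large", "images": True}, "full-desktop.xsl"),
    ({"display": "small", "images": True}, "handheld.xsl"),
    ({"display": "small", "images": False}, "text-only.xsl"),
]

def choose_stylesheet(profile):
    """Return the first stylesheet whose requirements the profile meets."""
    for requirements, sheet in STYLESHEETS:
        if all(profile.get(key) == value for key, value in requirements.items()):
            return sheet
    return "default.xsl"

print(choose_stylesheet({"display": "small", "images": False}))
```

The point of the protocol is that this decision happens on the server, before any content is sent, so the wireless link never carries data the device cannot present.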
The WAP Forum User Agent Profiling (UAprof) drafting committee has developed a profile for use within CC/PP. Other relevant profiles are the OpenGIS Geography Markup Language (GML) for describing the position of the wireless device, and Dublin Core for recording information relating to intellectual property rights.
Dieter Fensel of the Free University of Amsterdam discussed their work in the IST On-to-Knowledge project, which provides European input to the DARPA Agent Markup Language (DAML) initiative under development in the US.
A new proposal called OntoWeb will look at how to provide an ontology-based approach to business-to-business electronic commerce.
Ian Horrocks of the University of Manchester explained the role of OIL in the development and interchange of ontologies within On-to-Knowledge and DAML. OIL provides a set of formally defined semantics for defining ontologies. Being a frame-based syntax, OIL is simple and intuitive to use. The semantics of OIL are based on description logic. OIL has been layered on top of RDFS.
OIL includes facilities for reasoning support, which allow users to define relationships between different ontologies. OilEd provides an interactive editor for creating ontologies, including in-built reasoning support. A new project called WonderWeb is being proposed to extend the current work on OIL.
Carole Goble from the University of Manchester explained the role of ontologies within the Conceptual Open Hypermedia (COHSE) project for informed WWW link navigation. The COHSE project concentrates on the management of links by both authors and readers. Links are created and applied when needed ("just-in-time links"). Links should be a permanent record of relationships, rather than being something identified temporarily as the result of a specific discovery process.
A set of predefined "concepts" are used by the distributed link service to identify the metadata that needs to be assigned to a resource. Users can then use these concepts to find relevant documents. The COHSE ontologies are defined using OIL.
Jérôme Euzenat of INRIA discussed how distributed semantic webs can interwork. One solution is to use pivot languages, such as OIL, which other languages map to. Alternatively you can use transformations or patterns to identify relationships between ontologies. Where languages share the same operators a generic modular language can be defined that can be subsetted to map a generic ontology to less fully defined ontologies.
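The pivot-language strategy mentioned above can be illustrated with a small sketch: each local vocabulary is mapped onto a shared pivot vocabulary, and translation between two locals goes through the pivot rather than requiring a direct mapping for every pair. All of the vocabulary terms below are invented for illustration.

```python
# Interworking via a pivot vocabulary: each local vocabulary maps its
# terms onto shared pivot terms; translation goes through the pivot.
# Vocabulary and term names are invented.

TO_PIVOT = {
    "catalogue_a": {"auteur": "creator", "titre": "title"},
    "catalogue_b": {"author": "creator", "name": "title"},
}

def translate(term, source, target):
    """Translate `term` from the `source` vocabulary to `target` via the pivot."""
    pivot = TO_PIVOT[source].get(term)
    if pivot is None:
        return None
    for local_term, shared_term in TO_PIVOT[target].items():
        if shared_term == pivot:
            return local_term
    return None

print(translate("auteur", "catalogue_a", "catalogue_b"))
```

With n local vocabularies, a pivot needs only n mappings instead of the n(n-1) direct pairwise mappings, which is the practical argument for pivot languages such as OIL.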
The Document Semantic Description (DSD) language uses the Mathematical Markup Language (MathML) and the XML Path Language (XPath) to describe the transformations required to map one set of semantics to another.
Pasqualino Assini from the University of Essex pointed out that it is not possible to define metadata descriptions in advance for dynamically generated data, such as the results of a search query on the Internet. Middleware technologies such as DCOM, CORBA and Java were originally postulated as the way in which you would identify the relationships between sharable types of objects. This has not proved to be the case.
The University of Essex Data Archive project has extended RDF to provide a way of recording methods used to generate data dynamically. These additional objects can also be used to record the relationships between methods.
Libby Miller from the University of Bristol explained how they were using SQuish, an SQL-like language for querying RDF records. The query tool is a Java application that uses the JDBC API for database integration. The RDF records used by the project are based on RSS 1.0.
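What makes a SQuish-style query SQL-like is that several triple patterns are matched at once, with shared variables joining them, much as SQL joins rows across tables. The following sketch, with invented data and a "?x" convention for variables, shows that evaluation strategy; it is an illustration of the general technique, not of SQuish's actual syntax or implementation.

```python
# A sketch of conjunctive triple-pattern querying over RDF data:
# variables (written "?x") are bound consistently across patterns,
# like an SQL join. Data and URIs are invented.

triples = [
    ("ex:item1", "rss:title", "Workshop report"),
    ("ex:item1", "dc:creator", "ex:libby"),
    ("ex:item2", "rss:title", "Another story"),
]

def query(patterns, bindings=None, data=triples):
    """Yield variable bindings that satisfy every (s, p, o) pattern."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    (s, p, o), rest = patterns[0], patterns[1:]
    for ts, tp, to in data:
        trial = dict(bindings)
        ok = True
        for pat, val in ((s, ts), (p, tp), (o, to)):
            if pat.startswith("?"):
                if trial.setdefault(pat, val) != val:
                    ok = False       # variable already bound to something else
                    break
            elif pat != val:
                ok = False           # constant term does not match
                break
        if ok:
            yield from query(rest, trial, data)

# Find titles of items created by ex:libby.
for b in query([("?item", "dc:creator", "ex:libby"),
                ("?item", "rss:title", "?title")]):
    print(b["?title"])
```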
The CEN/ISSS Electronic Commerce workshop's working group on Defining and Managing Semantics and Datatypes for Electronic Commerce is currently evaluating existing semantic sets for business data modelling, including the ISO Basic Semantic Register, the UN/EDIFACT data elements and the ebXML core components currently being defined to make existing EDI semantics available to XML users.
ISO 13250 Topic Maps allow multiple names to be assigned to concepts, with names being identified as being relevant to specific languages, industries and user communities. They also allow sets of resources to be associated with a topic, rather than trying to define the metadata to be associated with a single resource, which is the way RDF is typically used.
Business objects typically need to have different names assigned to them in different countries, and may need to serve different roles at different times as they move through a sequence of business processes.
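The topic map idea of one topic with many scoped names, as opposed to RDF's one-name-per-resource habit, can be sketched directly. The topic, scopes and names below are invented to illustrate the point about business objects carrying different names in different countries and communities.

```python
# A toy topic with base names scoped by language or community,
# in the spirit of ISO 13250 Topic Maps. All names are invented.

topic_names = {
    "invoice-topic": {        # one topic, many scoped names
        "en": "invoice",
        "de": "Rechnung",
        "fr": "facture",
        "edi": "INVOIC",      # name used within an EDI community
    },
}

def name_for(topic, scope, default=None):
    """Return the topic's name within the given scope, if one is recorded."""
    return topic_names.get(topic, {}).get(scope, default)

print(name_for("invoice-topic", "de"))
```

The topic itself stays stable as the business object moves through different processes and locales; only the scope used to look up its name changes.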
The Notion System demonstrated by Mr. Poell of TNO is used to classify resources as part of a multilingual semantic network. The Notion System is a distributed multi-server network. It is based on the use of inference rules that can be combined into rule-sets, which are described using metadata.
One of the main problems of automating the process of assigning concepts to resources is name ambiguity. By understanding the context in which users are working you can reduce this ambiguity.
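The context-based disambiguation described above can be sketched as scoring each candidate sense of an ambiguous name by its overlap with the user's current working context. The senses and context terms below are invented; real systems would derive them from the user's documents or task.

```python
# Reduce name ambiguity by picking the sense whose associated context
# terms overlap most with the user's current context. Terms invented.

SENSES = {
    "bank": {
        "financial-institution": {"money", "account", "loan"},
        "river-bank": {"river", "water", "shore"},
    },
}

def disambiguate(name, context):
    """Return the sense of `name` sharing the most terms with `context`."""
    candidates = SENSES.get(name, {})
    if not candidates:
        return None
    return max(candidates, key=lambda sense: len(candidates[sense] & context))

print(disambiguate("bank", {"loan", "interest", "account"}))
```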
The relevance of information typically reduces with time. Therefore you need to derive relevance levels at the time of querying rather than permanently recording them.
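The point above, store a timestamp rather than a fixed relevance score and derive relevance when the query runs, can be sketched with a decay function. The exponential form and the 30-day half-life here are arbitrary illustrations, not anything prescribed at the workshop.

```python
# Derive relevance at query time from a stored timestamp instead of
# storing a fixed score. Exponential decay with an illustrative
# 30-day half-life; both choices are assumptions.

import time

HALF_LIFE_DAYS = 30.0

def relevance(base_score, created, now=None):
    """Halve `base_score` for every HALF_LIFE_DAYS elapsed since `created`.

    `created` and `now` are Unix timestamps in seconds.
    """
    now = time.time() if now is None else now
    age_days = (now - created) / 86400.0
    return base_score * 0.5 ** (age_days / HALF_LIFE_DAYS)
```

Because the score depends on the clock, recomputing it at query time keeps rankings current without ever rewriting the stored records.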
Intelligent agents can be developed to evaluate the accuracy of existing links and to provide new relationships to existing semantics.
Mike Dewar of NAG Ltd explained the role of the IST OpenMath project and the OpenMath Society that has developed from it. Unlike MathML, which is very presentation oriented and only covers mathematics up to high-school level, the OpenMath standard provides a way of exchanging, in an abstract form, all types of mathematics. Using XSL transformations you can convert MathML files into the OpenMath format.
OpenMath allows users to define their own content dictionaries, which define locally meaningful semantic sets. These semantics can be formally defined using the OpenMath syntaxes, which are expressible in XML or a compact binary encoding.
OMDoc is a new framework for defining mathematical proofs that is being developed by members of the OpenMath society.
Wolfgang Klas from the University of Vienna discussed the need to fight the multimedia content engineering (MMCE) crisis when describing multimedia. At present there are not enough people with multimedia skills, or tools for the management of multimedia resources. Today's multimedia presentations are typically prepared in a way that only allows one application of the data set. They need to be split up into small units that can form a "database" of resources that can be combined in many different ways to meet the needs of different user communities.
When setting up a repository of reusable multimedia objects the types of metadata that need to be recorded for each object are different from those that need to be applied to a single application of a set of stored objects. Much more detail about the contents of each segment needs to be recorded.
It is important that content description be able to distinguish those parts of the presentation that require specific presentation facilities from those that can be used in more restricted circumstances.
Getting people to create metadata records after capture of the data is difficult to manage. It would be much easier if metadata creation was an integral part of the data creation and editing process, but integrated tools for this do not currently exist.
Lynda Hardman of CWI looked at how Synchronized Multimedia Integration Language (SMIL) files can be integrated with the semantic web. As members of the W3C Audiovisual working group, CWI are exploring how RDF can be used to annotate SMIL and other types of multimedia files using ontologies based on OIL.
The components of a SMIL document are structured based on their temporal relationships. These components are placed into display regions, each of which can have metadata associated with it. SMIL uses links based on XML Pointers to navigate through sets of multimedia objects.
SMIL has elements specifically designed for metadata capture built into it, but only a small set of ontologically uncontrolled fields is currently provided.
The RFML inference rule system developed by DFKI has been used to create a Rule Markup Language (RuleML) that can be used to qualify RDF statements. This open initiative is looking for more European input to develop a standard for knowledge representation that can be integrated into the Internet.
Paolo Avesani of ITC-IRST at the University of Trento discussed the problems associated with using concepts to identify data. Different people assign different meanings to concepts that share the same name, and different names to the same concept. Terms need to be contextualized before they can be unambiguously identified.
Peter Fankhauser from GMD-IPSI discussed the role of semantic annotations in the individualization of courses to provide "teachware on demand". As with other multimedia projects, the key is to annotate small reusable fragments of information (typically a screen full of data), rather than complete compilations that form course units or presentable sequences of information.
Fragments have dependencies, which form clusters that can be serialized for presentation as a course. Annotation will be based on the MPEG-7 Multimedia Content Description Interface and an extended version of the Learning Object Metadata (LOM) ontology.
Luca Botturi of the Università della Svizzera italiana discussed semantic user and content modelling for an adaptive distance learning environment. Different rules apply in different environments. For example, rules applicable for company training differ from those that need to be applied in more generalized higher education situations.
Distance learning requires self assessment and collaboration tools in addition to basic course material and mechanisms for tutors to create materials and evaluate student progress.
The C-Net content net has been developed to allow the course content to be formally modelled. C-Net includes bridges that link sets of concepts that are connected to each other. C-Net is defined as a Bayesian net.
Heidrun Allert of the University of Hannover explained their proposal for setting up a project to study Navigation and Access to Open Learning Repositories, based on the cognitive sciences. The project is part of the Wallenberg Global Learning Network, which includes partners in Germany, Sweden and at Stanford University.
Different cognitive routes are needed to handle different levels of knowledge among users. The project will use RDF descriptions of learning material and LOM descriptions of student needs. The project will look at how best to present this metadata to end-users in a way that will allow them to navigate around sets of services.
The rdfpic tool, which is designed to be used by non-technical users to document their photographs, takes its fields from the Dublin Core, with additional data based on user-defined schemas that describe their specific areas of interest. The fields can be used to annotate part or all of a JPEG image. The RDF annotations are sent to the W3C Jigsaw server using HTTP, from where they can be queried to find relevant photographs.
Zdenek Mikovec from the Czech Technical University described a tool they have created for pictorial content retrieval. The tool uses XML descriptions that are transformed using XSL into displayable subsets of navigable data stored within a database that can be queried using the XML Query Language (XQL). Two applications have been defined, one of which is designed to allow web users to have access to a picture repository, and the second of which was specifically designed to allow blind users to determine the meaning of pictures.
Ralph Traphoener from the tecinno GmbH division of the Bertelsmann group discussed the use of topic maps for knowledge-based indexing. Sources need to be mapped to presentable knowledge views that conform to analysis models based on cases that are instances of a knowledge model.
Atanas Kiryakov of Sirma AI Ltd explained how the OntoMap project aims to facilitate easier access, understanding and comparison of upper-level knowledge models such as WordNet and EuroWordNet. The aim is to define a higher-level model that will allow different types of ontologies to be interlinked.
David de Roure from the University of Southampton discussed the role intelligent agents would play in the deployment of semantic webs.
Mobile users need adaptive information systems that provide them with customized views of the information they need access to. The XML Linking Language (XLink) will make such adaptation easier to develop. With the increase in use of mobile devices that are connected to networks on an as-needed basis, systems will need to be able to quickly adapt to new configurations without stopping.
Most web applications of agents are autonomous "weak agency" applications. They are typically personal information assistants that monitor user activities to identify patterns that the agent can then use to anticipate what that user will want to do next.
Multiagent systems are postulated on agents talking to each other using agreed ontologies. At present there are few multiagent systems, partly because there are few widely agreed ontologies. Typically such systems will need agents that can deal with multiple ontologies obtained from a distributed environment.
When links are stored separately from the documents they refer to they themselves can form a concept space.
The Southampton Framework for Agent Research (SoFAR) uses "dim" (Distributed Information Management) agents in a multiagent environment. The next generation systems will need to introduce agents that can do inferencing as part of their processing.
Stanislaw Ambroszkiewicz of the Institute of Computer Science at the Polish Academy of Sciences discussed the problems related to semantic interoperability in agent space. An agent space is a place where autonomous agents can interact. The OMG MASIF protocol provides a standard for agent interfaces.
A generic map architecture using agents based on the Java Virtual Machine, called Pegaz, has been developed in Poland. Tarskian set constraints provide a theory for the interoperability of semantics. The Entish interface language developed in Poland contains an interaction layer, a representation layer and a communication language layer.
Patrick de Causmaecker from KaHo Sint-Lieven discussed the possibility of setting up a project for using non-holistic agents which only deal with one type of data for semantic analysis, rather than creating general purpose agents able to cope with a wide range of problems. Non-holistic agents are designed to be exported to another environment within which they have not been supplied with specific hooks, so they first have to determine how to interface with the local applications. Such agents have to act as "learning machines".
Mikael Nilsson from the Royal Institute of Technology in Sweden discussed the role of visual modelling of the web. The Conzilla ontology-based browser developed at the RIT Center for user-oriented information technology design (CID) allows the development of dynamically configurable topologies for learning materials. One of the problems with such systems is that the context in which resources are encountered changes over time.
To create reusable content without losing context clarity you need to separate the contextual information from the content, and then use the context to navigate around the document. By developing concept maps that provide an ontological model for contexts, users can be provided with a consistent user interface based on visual representations of the map.
Eric van der Vlist of Dyomedea introduced the RDF Site Summary (RSS), which was developed by Netscape to allow users to describe their web sites. Development of the specification has now been taken over by a set of individuals who form an editorial group. In RSS 1.0 RDF can be used to assign metadata to a site summary.
Copies of the Powerpoint presentations used by speakers at the workshop can be found at http://www.cordis.lu/ist/ka3/iaf/presentation.htm#Semantic
Martin Bryan, Technical Manager, The Diffuse Project
Report last updated:
The Diffuse Project is funded under the European Commission's Information Society Technologies programme. Diffuse publications are maintained by TIEKE (the Finnish IT Development Centre), IC Focus and The SGML Centre.