Cover Pages: XML Daily Newslink: Thursday, 15 March 2007

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
SAP AG http://www.sap.com

Headlines

A Relational View of the Semantic Web
Implementation of Business Rules and Business Processes in SOA
The Intrusion Detection Message Exchange Format (IDMEF)
National BIM Standard Version 1.0 Part 1: Out for Industry Review
XML Pipelining with Chunks for the Internet Registry Information Service
IBM Puts Its SOA Where Its Virtualization Is
Tutorial: Specializing DITA Conditional Attributes
Update: Google to Make Search Logs Anonymous

A Relational View of the Semantic Web
Andrew Newman, XML.com

As people are increasingly coming to believe, Web 2.0 and the Semantic Web have a lot in common: both are concerned with allowing communities to share and reuse data. In this way, the Semantic Web and Web 2.0 can both be seen as attempts at providing data integration and presenting a web of data or information space. As Tim Berners-Lee wrote in 'Weaving the Web': "If HTML and the Web made all the online documents look like one huge book, RDF, schema and inference languages will make all the data in the world look like one huge database..." RDF is at the core of W3C's Semantic Web architectural layers. It is the standard specifically designed to provide a way to produce and consume data on the Web. It sits on top of standards such as XML, URIs, and Unicode and is used as a basis for schemas and ontologies. It consists of a set of statements, each composed of a subject, predicate, and object, that form propositions of fact. How are queries performed on this "one huge database"? Up until recently, manipulating or retrieving RDF data has been done through vendor-specific query languages or imperatively through APIs in languages such as Java, PHP, and Ruby. The W3C's proposed standard, SPARQL, is set to provide a declarative language to query and manipulate Semantic Web data. The current suite of existing technologies, such as SQL and the relational model, was devised without the specific requirements of disparate, uncontrolled, large-scale integration. It is unclear whether they are flexible enough to adapt to this new set of requirements in order to enable this idea of a global database. One of the goals of the Semantic Web is to be able to achieve querying of disparate data sources across the web. The proposed standard for querying the Semantic Web, SPARQL, can be seen as an extension of an existing formalization, the relational model. The use of the relational model provides a way to use previous work in query distribution, optimization, and formulation. The standard relational model is not sufficient, however, and must be extended to support untyped relations and operations in order to integrate these data sources.
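
The relational reading of RDF described above can be made concrete with a small sketch. It is not taken from the article: it assumes RDF statements are held as one three-column relation, treats a triple pattern as a selection that returns variable bindings, and combines two patterns with a natural join. The data, prefixes, and the helper names match and join are all hypothetical.

```python
# Illustrative sketch only: RDF statements held as one three-column relation.
# All data and helper names here are hypothetical, not from the article.

triples = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:carol", "foaf:name", "Carol"),
}


def match(pattern, data):
    """Select triples matching a pattern; terms starting with '?' are variables.

    Returns a list of bindings (dicts), i.e. a relation whose columns are
    the variable names used in the pattern.
    """
    rows = []
    for triple in data:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable would bind to two different values
                binding[term] = value
            elif term != value:
                break  # constant term does not match
        else:
            rows.append(binding)
    return rows


def join(left, right):
    """Natural join: merge bindings that agree on every shared variable."""
    return [
        {**a, **b}
        for a in left
        for b in right
        if all(a[k] == b[k] for k in a.keys() & b.keys())
    ]


# Roughly: SELECT ?name WHERE { ex:alice foaf:knows ?p . ?p foaf:name ?name }
knows = match(("ex:alice", "foaf:knows", "?p"), triples)
names = match(("?p", "foaf:name", "?name"), triples)
result = sorted(row["?name"] for row in join(knows, names))
print(result)
```

Because the bindings carry no declared column types, the join works on whatever values the triples happen to hold, which hints at why the abstract speaks of extending the relational model with untyped relations.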

Implementation of Business Rules and Business Processes in SOA
Boris Lublinsky and Didier Le Tien, InfoQ

In this article we outline commonalities and differences between business rules and business processes and present some guidelines on positioning business rules in SOA implementation and appropriate usage of each technology. Business rules can be viewed as a collection of business practices, defining the actual implementations -- business logic. Implementation of such logic can often be simplified through the usage of specialized tooling: business rules languages and business rules engines. A rules language is a domain-specific language containing constructs for defining business rules. These constructs can vary greatly depending on the business requirements. The scope of possibility ranges from a textual description (using a rule-specific language or plain English) to the use of decision tables or decision trees. Service orchestration typically deals with long-running and asynchronous invocation of external activities/services. Today's business rules engines do not support these capabilities. A business process language/engine, designed specifically to define and execute long-running coordinations with asynchronous invocations, is the most appropriate paradigm for service orchestration. A prevalent pattern in SOA implementation is the employment of a rules engine as part of a service implementation and the use of business process engines for service orchestration. In cases when rules control activities whose coordination is complex enough, or change much faster than the process itself, so that they require usage of a rules engine, these rules are usually externalized as a special rules service, invoked by the business process engine. Because this service invocation can be potentially expensive (network calls), some business process engines, for example BizTalk 2004 from Microsoft, WebSphere Process Server from IBM, Smart BPM Suite from PegaSystems and others, incorporate both a business process engine and a rules engine in a single application. Business processes and business rules should not be treated as competing but rather as complementary technologies. Business processes define a set of activities that need to be performed in order to carry out a business function. Business rules provide a value-added approach to implementing those activities by offering a superior level of flexibility and configurability to adapt to rapidly changing business environments.
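
As a rough illustration of the division of labor described above, the following sketch expresses a pricing decision as a decision table and keeps it behind a single function that stands in for an externalized rules service, while a separate function stands in for the process step that invokes it. Nothing here comes from the article: the rule names, thresholds, discount rates, and function names are invented for the example.

```python
# Hypothetical example: rule names, thresholds, and discounts are invented.
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[dict], bool]
    discount: float


# The decision table: the first matching row wins, so order is significant.
DISCOUNT_TABLE = [
    Rule("gold-large-order", lambda o: o["tier"] == "gold" and o["total"] >= 1000, 0.15),
    Rule("gold", lambda o: o["tier"] == "gold", 0.10),
    Rule("large-order", lambda o: o["total"] >= 1000, 0.05),
    Rule("default", lambda o: True, 0.0),
]


def rules_service(order: dict) -> float:
    """Stand-in for an externalized rules service: returns the discount rate."""
    for rule in DISCOUNT_TABLE:
        if rule.condition(order):
            return rule.discount
    raise LookupError("no rule matched")  # unreachable while a default row exists


def process_order(order: dict) -> dict:
    """Stand-in for a process step: the sequence of activities is fixed here,
    while the pricing decision is delegated to the rules service."""
    rate = rules_service(order)
    payable = round(order["total"] * (1 - rate), 2)
    return {**order, "discount": rate, "payable": payable}


print(process_order({"tier": "gold", "total": 1200.0}))
```

Changing a threshold or adding a row touches only the table, not process_order, which is the flexibility argument the abstract makes for keeping fast-changing rules outside the process definition.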

The Intrusion Detection Message Exchange Format (IDMEF)
Herve Debar (et al., eds), Experimental RFC

The IETF announced that a new Request for Comments is now available online in the RFC libraries. The memo defines an Experimental Protocol for the Internet community. The purpose of the Intrusion Detection Message Exchange Format (IDMEF) is to define data formats and exchange procedures for sharing information of interest to intrusion detection and response systems and to the management systems that may need to interact with them. This document describes a data model to represent information exported by intrusion detection systems and explains the rationale for using this model. An implementation of the data model in the Extensible Markup Language (XML) is presented, an XML Document Type Definition is developed, and examples are provided. The most obvious place to implement the IDMEF is in the data channel between an intrusion detection analyzer (or "sensor") and the manager (or "console") to which it sends alarms. The diversity of uses for the IDMEF needs to be considered when selecting its method of implementation. The IDMEF data model is an object-oriented representation of the alert data sent to intrusion detection managers by intrusion detection analyzers. The data model defines extensions to the basic Document Type Definition (DTD) that allow carrying both simple and complex alerts. Extensions are accomplished through subclassing or association of new classes.
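
To give a feel for the XML representation, here is a simplified sketch of how an analyzer might assemble an alert before sending it to a manager. The element and attribute names and the namespace follow the IDMEF XML examples as best recalled, so treat them as assumptions; the output is not validated against the DTD, required details such as the ntpstamp attribute on CreateTime are omitted, and the identifiers and addresses are made up (the addresses come from documentation ranges).

```python
# Simplified sketch, not validated against the IDMEF DTD.
# Identifiers, timestamps, and addresses below are made up for illustration.
import xml.etree.ElementTree as ET

IDMEF_NS = "http://iana.org/idmef"  # assumed namespace URI
ET.register_namespace("idmef", IDMEF_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the IDMEF namespace."""
    return f"{{{IDMEF_NS}}}{tag}"


def build_alert(analyzer_id: str, created: str, source_ip: str,
                target_ip: str, classification: str) -> ET.Element:
    """Build a minimal alert: who reported it, when, from where, to where, what."""
    message = ET.Element(q("IDMEF-Message"), {"version": "1.0"})
    alert = ET.SubElement(message, q("Alert"), {"messageid": "example-0001"})
    ET.SubElement(alert, q("Analyzer"), {"analyzerid": analyzer_id})
    ET.SubElement(alert, q("CreateTime")).text = created
    for role, ip in (("Source", source_ip), ("Target", target_ip)):
        node = ET.SubElement(ET.SubElement(alert, q(role)), q("Node"))
        address = ET.SubElement(node, q("Address"), {"category": "ipv4-addr"})
        ET.SubElement(address, q("address")).text = ip
    ET.SubElement(alert, q("Classification"), {"text": classification})
    return message


root = build_alert("sensor-01", "2007-03-15T10:01:25Z",
                   "192.0.2.50", "198.51.100.7", "Port scan detected")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The nesting mirrors the object-oriented model the abstract describes: an Alert aggregates Analyzer, Source, Target, and Classification objects, each of which maps to an element.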

National BIM Standard Version 1.0 Part 1: Out for Industry Review
Staff, National Institute of Building Sciences (NIBS)

The first version of the National Building Information Modeling Standard (NBIMS) was released for a two-month industry review period today. The document titled "National Building Information Modeling Standard Version 1.0 - Part 1: Overview, Principles, and Methodologies", issued through the buildingSMART Alliance, provides the capital facilities industry with its first comprehensive look at the full scope of requirements for Building Information Modeling (BIM). The review period will span from March 12, 2007, until May 21, 2007; comments may be submitted through a forum set up by the Open Geospatial Consortium (OGC) to receive comments and initiate discussion. The NBIMS will provide the diverse capital facilities ... inception through design and construction, even past demolition for improved operations, maintenance, facility management, and long-term sustainability. The document was assembled by over thirty subject matter experts from across the capital facilities industry. The NBIMS has six goals: (1) Seek industry-wide agreement, (2) Develop an open and shared standard, (3) Facilitate discovery and requirements for sharing information throughout the facility lifecycle, (4) Develop and distribute knowledge that helps share information that is machine readable, (5) Define a minimum BIM, and (6) Provide for information assurance for BIM throughout the facility lifecycle... One of the critical functions of a BIM is to consistently maintain the semantic meaning of all encoded information throughout the facility's lifecycle. NBIMS is using the IFC [International Alliance for Interoperability, Industry Foundation Classes, IFC2x Edition 3 - ifcXML and EXPRESS-G] data model of buildings as the data model for encoding information exchange and sharing because it constitutes an interoperability-enabling technology that is open, freely available, non-proprietary and extensible, and is also applicable throughout the life of a facility. The IFC data model consists of definitions, rules and protocols that uniquely define data sets which describe capital facilities throughout their lifecycle. These definitions allow industry software developers to write IFC interfaces to their software that enable exchange and sharing of the same data in the same format with other software applications, regardless of each application's internal data structure. The first version of the IFC data model was released in 1997; the latest release is IFC2x3. XML-based implementations of the IFC data model are available as ifcXML; the latest published version is ifcXML2, the implementation of IFC2x2... IFC provides the only available data definitions, rules and protocols for populating any open standards based BIM. Thus, by definition, an open standards based BIM is an IFC-based BIM.

See also: IFC/ifcXML Specifications

XML Pipelining with Chunks for the Internet Registry Information Service
Andrew L. Newton (ed), IETF Internet Draft

The IESG announced the approval of an Internet Draft as an IETF Proposed Standard: "XML Pipelining with Chunks for the Internet Registry Information Service." The specification was produced by members of the IETF Cross Registry Information Service Protocol (CRISP) Working Group. Using S-NAPTR [Domain-Based Application Service Location Using SRV RRs and the Dynamic Delegation Discovery Service (DDDS)], IRIS has the ability to define the use of multiple application transports (or transfer protocols) for different types of registry services, all at the discretion of the server operator. The TCP transfer protocol defined in this document is completely modular and may be used by any registry type. This transfer protocol defines simple framing for sending XML in chunks so that XML fragments may be acted upon (or pipelined) before the reception of the entire XML instance. This document calls this XML pipelining with chunks (XPC) and its use with IRIS as IRIS-XPC. A second CRISP Working Group document, "A Common Schema for Internet Registry Information Service Transfer Protocols," has also been approved. It defines an XML Schema for use by Internet Registry Information Service (IRIS) application transfer protocols that share common characteristics. It describes common information about the transfer protocol, such as version, supported extensions, and supported security mechanisms. As an example, upon initiation of a connection, a server may send version information informing the client of the data models and the security mechanisms supported by the server. The client may then respond appropriately. For example, the client may not recognize any of the data models supported by the server, and therefore close the connection. On the other hand, the client may recognize the data models and the security mechanisms and begin the procedure to initialize a security mechanism with the server before proceeding to query data according to a recognized data model. Both LWZ and XPC provide examples of the usage of the XML Schema defined in this document.
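
The general idea of acting on XML before the whole instance has arrived can be sketched as follows. This is not the XPC wire format: the framing used here (a two-byte big-endian length before each chunk, with a zero-length chunk marking the end) is a made-up stand-in, and the function names and sample document are hypothetical. The sketch only shows chunks being fed to an incremental parser that reports elements as they complete.

```python
# NOT the XPC wire format: the length-prefixed framing below is a made-up
# stand-in used only to illustrate chunked, incremental XML processing.
import struct
import xml.etree.ElementTree as ET


def frame(payload: bytes, chunk_size: int) -> bytes:
    """Split payload into length-prefixed chunks; a zero-length chunk ends it."""
    out = bytearray()
    for i in range(0, len(payload), chunk_size):
        piece = payload[i:i + chunk_size]
        out += struct.pack("!H", len(piece)) + piece
    out += struct.pack("!H", 0)
    return bytes(out)


def read_chunks(stream: bytes):
    """Yield the data of each chunk from a framed byte string."""
    pos = 0
    while True:
        (length,) = struct.unpack_from("!H", stream, pos)
        pos += 2
        if length == 0:
            return
        yield stream[pos:pos + length]
        pos += length


def pipeline(stream: bytes):
    """Feed chunks to an incremental parser and record completed elements."""
    parser = ET.XMLPullParser(events=("end",))
    completed = []

    def drain(label):
        for _event, elem in parser.read_events():
            completed.append((label, elem.tag))

    for n, chunk in enumerate(read_chunks(stream), start=1):
        parser.feed(chunk)
        drain(n)  # elements finished so far can be handled right away
    parser.close()
    drain("close")  # anything the parser held back until end of input
    return completed


doc = b"<response><result id='1'/><result id='2'/><result id='3'/></response>"
events = pipeline(frame(doc, chunk_size=24))
print(events)
```

Each recorded pair notes the chunk after which an element became available. How early elements surface depends on the parser's internal buffering, so the chunk labels can differ between Python versions even though the order of elements does not.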

See also: Common Schema

IBM Puts Its SOA Where Its Virtualization Is
Clint Boulton, InternetNews.com

IT analysts have been saying for the last few years that companies can leverage greater computing efficiencies by plugging virtualization technologies into a service-oriented architecture (SOA). IBM is ushering virtualization to the SOA altar this week by splicing virtualization capabilities from its System p servers with SOA middleware to help customers boost hardware and software performance and ease maintenance pain. Virtualization enables IT administrators to run multiple pieces of software on a single machine. IBM employs virtualization in its System p servers, which allows a single machine to be split into multiple partitions, which can each run different operating systems and multiple applications. This helps customers shift processing resources during periods of peak activity. SOA is a distributed computing paradigm that allows multiple services and software to be reused, making computing more efficient. Plugging virtualization into an SOA, then, makes sense. But that doesn't mean it's easy. IBM will start offering System p Configurations for SOA Entry Points to allow processing resources to be shifted from one partition to another on the fly to meet peak demand needs. IBM's System p machines run IBM's AIX and Linux operating systems on the same box. According to IBM: "SOA is an architectural approach that structures IT assets as a series of reusable, loosely-coupled services that perform business functions using a standard specification. SOA creates the flexibility that successful businesses require. IBM has defined the SOA Foundation, an integrated, open set of software and best practices that provide what is needed to get started with SOA. The software that makes up the SOA Foundation has been carefully selected from the broader IBM software portfolio to support each stage of the SOA lifecycle. To build flexible business on SOA, the SOA Foundation requires a rock solid (i.e., high performance, stable, high availability, with excellent security and reliability features) IT Infrastructure like System p."

See also: the announcement

Tutorial: Specializing DITA Conditional Attributes
W. Eliot Kimber, Blog

A new feature in Darwin Information Typing Architecture (DITA) 1.1 is the ability to specialize from the base= and props= attributes. For conditional processing, this lets you add your own attributes rather than using otherprops=, which can be clearer to authors and implementors. Note that at the time of writing the DITA Open Toolkit does not implement support for specializations of props=, but it should be added soon. This form of specialization is fairly easy to implement. This tutorial shows you how to do it using DTDs; the mechanism using Schemas is essentially the same and if you've stepped up to using the DITA 1.1 schemas I'm going to presume you can figure this out on your own. The specialization requires two things: (1) Modification of any shell DTDs that need to reflect the specialized attribute (e.g., topic.dtd, reference.dtd, or your own specialized topic types' shell DTDs). You integrate the specialization attribute domain through the shell DTDs. (2) For each specialization of props=, a '.ent' declaration set that defines the attribute and a corresponding domain declaration. This is the "attribute domain specialization module". Note that as a rule, any production use of DITA will likely require local versions of the DITA-provided shell DTDs, if only to do configuration of the domains you need, so unless you are using DITA very informally, you should already have local copies of all the DITA-provided shell DTDs... For this tutorial assume we're putting everything in the directory dtd/myspecs within the normal DITA Open Toolkit distribution structure. It can go anywhere as long as you configure the entity resolution catalogs appropriately, but for initial development and testing I find it convenient to use relative paths to the various declaration components as that eliminates a variable from the configuration (resolution via catalogs) that can lead to confusing errors. Once you've established that the declarations are correct you should replace all relative paths with absolute URLs or (if you must) PUBLIC IDs that are resolved via catalogs. For my development work I use the OxygenXML editor, which makes it easy to set up catalog configurations for testing resolution via catalogs, and generally testing the correctness of all the parts...
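
The tutorial itself works in DTDs, which are not reproduced here. As a loosely related sketch, the following shows what a specialized conditional attribute buys at processing time: content can be filtered on a purpose-named attribute instead of on otherprops=. The attribute name region, the sample topic, and the filter_by helper are all hypothetical, and the filter is a bare-bones stand-in for a real DITA processor. It follows the usual exclusion rule that an element is dropped only when every value in the attribute is excluded.

```python
# Hypothetical sketch: "region" stands in for an attribute specialized from
# props=; filter_by is a bare-bones stand-in for a real DITA processor.
import xml.etree.ElementTree as ET

TOPIC = """<topic id="install">
  <title>Installing</title>
  <body>
    <p region="emea">Use the 230 V adapter.</p>
    <p region="apac emea">Register within 30 days.</p>
    <p region="americas">Use the 120 V adapter.</p>
    <p>Keep the receipt.</p>
  </body>
</topic>"""


def filter_by(root: ET.Element, attribute: str, excluded: set) -> ET.Element:
    """Remove elements whose every value of `attribute` is in `excluded`.

    Elements without the attribute are always kept.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            values = child.get(attribute, "").split()
            if values and all(v in excluded for v in values):
                parent.remove(child)
    return root


filtered = filter_by(ET.fromstring(TOPIC), "region", {"emea"})
remaining = [p.text for p in filtered.iter("p")]
print(remaining)
```

Only the paragraph marked solely for emea is removed; the one marked "apac emea" survives because one of its values is still included.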

See also: DITA resources

Update: Google to Make Search Logs Anonymous
Stephen Lawson, InfoWorld

Google will start making its records about users' searches anonymous after 18 to 24 months under a policy announced recently. Until now, the dominant search company has indefinitely retained a log of every search, with identifiers that can associate it with a particular computer. The new policy, to be implemented within the next year, is intended to better protect users' privacy, two executives wrote in a Google Blog entry posted Wednesday. Privacy advocates have raised alarms over search providers and other Internet companies retaining information about users' activities because that data could be subpoenaed by law enforcement, be lost by the provider, or fall into the hands of hackers. Under the new policy, unless Google is legally required to retain them longer, server logs will still be retained but will be "anonymized" after 18 to 24 months so that they can't be identified with individual users, according to the blog entry. It was written by Peter Fleischer, Google's privacy counsel for Europe, and Nicole Wong, the company's deputy general counsel. Engineers are working out the technical details now... Two high-tech civil rights groups have called the move a good first step but said more work needs to be done. "This is a big step in the right direction," said Ari Schwartz, deputy director of the Center for Democracy and Technology. "Keeping the data around forever significantly compromises (Google's) users' privacy" according to Kevin Bankston, a staff attorney at the Electronic Frontier Foundation, in San Francisco. The U.S. government probably has subpoenaed search log data on individuals in criminal investigations, a move it wouldn't necessarily have to reveal; another danger is that an angry spouse or business partner could obtain the information in the course of a lawsuit.

