Cover Pages: XML Daily Newslink: Thursday, 15 May 2008

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
Primeton http://www.primeton.com

Headlines

SEC Proposes Requirement for Use of XBRL in Financial Reporting
YANG: A Data Modeling Language for NETCONF
Yahoo SearchMonkey Is Out of Its Cage
The State of the Service Component Architecture (SCA): An Update
Revision of the PREMIS Data Dictionary for Preservation Metadata
A Survey of Trust and Reputation Systems for Online Service Provision
Defending the XML Angle Bracket Tax

SEC Proposes Requirement for Use of XBRL in Financial Reporting
Staff, U.S. SEC Announcement

The U.S. Securities and Exchange Commission has voted to formally propose using new technology to get important information to investors faster, more reliably, and at a lower cost. At the center of the SEC proposal is "interactive data": computer tags similar in function to bar codes used to identify groceries and shipped packages. The interactive data tags uniquely identify individual items in a company's financial statement so they can be easily searched on the Internet, downloaded into spreadsheets, reorganized in databases, and put to any number of other comparative and analytical uses by investors, analysts, and journalists. The proposed rule would require all U.S. companies to provide financial information using interactive data beginning next year for the largest companies, and within three years for all public companies. According to the SEC's John White, "all of the technology is coming together to make electronic filing a true analytical tool. The staff has gathered valuable experience during the almost three years that public companies have been submitting interactive data in our voluntary filer program..." The SEC's proposed schedule would require companies using U.S. Generally Accepted Accounting Principles with a worldwide public float over $5 billion (approximately the 500 largest companies) to make financial disclosures using interactive data formatted in eXtensible Business Reporting Language (XBRL) for fiscal periods ending in late 2008. If adopted, the first interactive data provided under the new rules would be made public in early 2009. The remaining companies using U.S. GAAP would provide this disclosure over the following two years. Companies using International Financial Reporting Standards as issued by the International Accounting Standards Board would provide this disclosure for fiscal periods ending in late 2010. The disclosure would be provided as additional exhibits to annual and quarterly reports and registration statements.
Companies also would be required to post this information on their websites. The required tagged disclosures would include companies' primary financial statements, notes, and financial statement schedules. Initially, companies would tag notes and schedules as blocks of text, and a year later, they would provide tags for the details within the notes and schedules. XBRL is an XML-based schema that focuses specifically on the requirements of business reporting. XBRL builds upon XML, allowing accountants and regulatory bodies to identify items that are unique to the business reporting environment. The XBRL schema defines how to create XBRL documents and XBRL taxonomies, providing users with a set of business information tags that allows users to identify business information in a consistent way. XBRL is also extensible in that users are able to create their own XBRL taxonomies that define and describe tags unique to a given environment.
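The following minimal Python sketch makes the tagging idea concrete. It reads two tagged facts from a simplified XBRL instance and flattens them into spreadsheet-style rows. `http://www.xbrl.org/2003/instance` is the XBRL 2.1 instance namespace. Everything else is hypothetical: the taxonomy namespace, the concept names `Revenues` and `NetIncomeLoss`, the context ID and the figures. The context and unit definitions that a valid instance requires are also left out.

```python
import xml.etree.ElementTree as ET

# XBRL 2.1 instance namespace.
XBRLI = "http://www.xbrl.org/2003/instance"
# Hypothetical taxonomy namespace standing in for a real U.S. GAAP taxonomy.
GAAP = "http://example.com/hypothetical-us-gaap"

# Simplified instance: the <xbrli:context> and <xbrli:unit> elements that
# contextRef="FY2008" and unitRef="USD" point to are omitted here.
INSTANCE = f"""<xbrli:xbrl xmlns:xbrli="{XBRLI}" xmlns:gaap="{GAAP}">
  <gaap:Revenues contextRef="FY2008" unitRef="USD" decimals="-6">5200000000</gaap:Revenues>
  <gaap:NetIncomeLoss contextRef="FY2008" unitRef="USD" decimals="-6">410000000</gaap:NetIncomeLoss>
</xbrli:xbrl>"""


def extract_facts(xml_text):
    """Map (concept, context) to (numeric value, unit) for each taxonomy-tagged fact."""
    root = ET.fromstring(xml_text)
    facts = {}
    for elem in root:
        # Facts are the children drawn from the taxonomy namespace.
        if not elem.tag.startswith("{" + GAAP + "}"):
            continue
        concept = elem.tag.split("}", 1)[1]
        facts[(concept, elem.get("contextRef"))] = (int(elem.text), elem.get("unitRef"))
    return facts


if __name__ == "__main__":
    # Print CSV-style rows, the kind of output a spreadsheet import could consume.
    for (concept, context), (value, unit) in sorted(extract_facts(INSTANCE).items()):
        print(f"{concept},{context},{value},{unit}")
```

Each fact carries its own meaning through the element name. Consumers therefore need no knowledge of where the number sat in the printed statement.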

See also: the XBRL FAQ document

YANG: A Data Modeling Language for NETCONF
Martin Bjorklund (ed), IETF Internet Draft

A version -00 IETF Internet Draft has been published for "YANG: A Data Modeling Language for NETCONF." Today, the NETCONF protocol (IETF RFC 4741) lacks a standardized way to create data models. Instead, vendors are forced to use proprietary solutions. In order for NETCONF to be an interoperable protocol, models must be defined in a vendor-neutral way. YANG provides the language and rules for defining such models for use with NETCONF. YANG is a data modeling language used to model configuration and state data manipulated by the NETCONF protocol, NETCONF remote procedure calls, and NETCONF notifications. The draft describes the syntax and semantics of the YANG language, how the data model defined in a YANG module is represented in XML, and how NETCONF operations are used to manipulate the data. YANG models the hierarchical organization of data as a tree in which each node has a name, and either a value or a set of child nodes. YANG provides clear and concise descriptions of the nodes, as well as the interaction between those nodes. YANG structures data models into modules and submodules. A module can import data from other external modules, and include data from submodules. The hierarchy can be extended, allowing one module to add data nodes to the hierarchy defined in another module. This augmentation can be conditional, with new nodes appearing only if certain conditions are met. YANG models can describe constraints to be enforced on the data, restricting the appearance or value of nodes based on the presence or value of other nodes in the hierarchy. These constraints are enforceable by either the client or the server, and valid content must abide by them. YANG defines a set of built-in types, and has a type mechanism through which additional types may be defined. Derived types can restrict their base type's set of valid values using mechanisms like range or pattern restrictions that can be enforced by clients or servers...
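A few lines of Python can sketch the tree model and the derived-type restrictions described above. This is neither YANG syntax nor the draft's encoding rules. The `DerivedType` and `Node` classes, the `port-number` and `host-name` types, and the sample data are all hypothetical. The XML namespaces that a real NETCONF encoding would carry are also omitted. The sketch shows three things:

- a tree whose nodes hold either a value or children;
- leaf values checked against range and pattern restrictions before encoding;
- the tree serialized as nested XML elements.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class DerivedType:
    """A type derived from a base type by an optional range and/or pattern restriction."""
    name: str
    base: Callable[[str], object]             # parses the lexical value, e.g. int or str
    bounds: Optional[Tuple[int, int]] = None  # inclusive (low, high) range restriction
    pattern: Optional[str] = None             # regex the whole lexical value must match

    def validate(self, text: str) -> object:
        value = self.base(text)
        if self.bounds is not None:
            low, high = self.bounds
            if not low <= value <= high:
                raise ValueError(f"{self.name}: {value} not in {low}..{high}")
        if self.pattern is not None and not re.fullmatch(self.pattern, text):
            raise ValueError(f"{self.name}: {text!r} does not match {self.pattern}")
        return value


@dataclass
class Node:
    """A data-tree node with a name and either a typed value (leaf) or children."""
    name: str
    value: Optional[str] = None
    leaf_type: Optional[DerivedType] = None
    children: List["Node"] = field(default_factory=list)

    def to_xml(self) -> ET.Element:
        elem = ET.Element(self.name)
        if self.children:
            elem.extend([child.to_xml() for child in self.children])
        else:
            if self.leaf_type is not None:
                self.leaf_type.validate(self.value)  # enforce restrictions before encoding
            elem.text = self.value
        return elem


# Hypothetical derived types and configuration data.
PORT = DerivedType("port-number", int, bounds=(1, 65535))
HOST = DerivedType("host-name", str, pattern=r"[a-z][a-z0-9.-]*")

config = Node("server", children=[
    Node("host-name", "www.example.com", HOST),
    Node("port", "8080", PORT),
])

if __name__ == "__main__":
    print(ET.tostring(config.to_xml(), encoding="unicode"))
    # prints <server><host-name>www.example.com</host-name><port>8080</port></server>
```

In this sketch the same restriction check could run on either side of a NETCONF session. That mirrors the draft's point that constraints are enforceable by client or server.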
YANG strikes a balance between high-level object-oriented modeling and low-level bits-on-the-wire encoding. The reader of a YANG module can easily see the high-level view of the data model while also seeing how the data will be encoded in NETCONF operations. YANG is an extensible language, allowing extension statements to be defined by standards bodies, vendors, and individuals. The statement syntax allows these extensions to coexist with standard YANG statements in a natural way, while making extensions stand out sufficiently for the reader to notice them. YANG modules can be translated into an XML format called YIN (specified in Appendix B of the draft), allowing applications using XML parsers and XSLT scripts to operate on the models. XML Schema files can be generated from YANG modules, giving a precise description of the XML representation of the data modeled in YANG modules.
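The YANG-to-YIN translation is essentially mechanical. In the YIN mapping later published as RFC 6020, each YANG statement becomes an XML element. Its argument becomes either an attribute or, for keywords such as `description`, a child element. The Python sketch below starts from an already-parsed statement tree and handles only the attribute case. The namespace URI and the keyword-to-attribute table are assumptions modeled on RFC 6020 and may not match this -00 draft. The module is hypothetical and omits the `namespace` and `prefix` statements a real module needs.

```python
import xml.etree.ElementTree as ET

# Assumed YIN namespace and argument-attribute names, following RFC 6020;
# the -00 draft discussed here may differ.
YIN_NS = "urn:ietf:params:xml:ns:yang:yin:1"
ARG_ATTR = {"module": "name", "container": "name", "leaf": "name",
            "type": "name", "range": "value", "prefix": "value",
            "namespace": "uri"}


def to_yin(stmt):
    """Convert a parsed (keyword, argument, substatements) tuple to a YIN element."""
    keyword, argument, substatements = stmt
    elem = ET.Element(f"{{{YIN_NS}}}{keyword}")
    if argument is not None:
        # Sketch only: arguments are always attributes here, defaulting to "name".
        elem.set(ARG_ATTR.get(keyword, "name"), argument)
    for sub in substatements:
        elem.append(to_yin(sub))
    return elem


# Parsed form of a hypothetical module fragment:
#   module example { leaf port { type uint16 { range "1..65535"; } } }
MODULE = ("module", "example", [
    ("leaf", "port", [
        ("type", "uint16", [("range", "1..65535", [])]),
    ]),
])

if __name__ == "__main__":
    ET.register_namespace("yin", YIN_NS)
    print(ET.tostring(to_yin(MODULE), encoding="unicode"))
```

Once the module is XML, generic tools such as XPath queries or XSLT can walk it. The hand-written YANG syntax does not allow that directly.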

Yahoo SearchMonkey Is Out of Its Cage
Clint Boulton, eWEEK

Playing off the belief that Web users turn to search engines to get information to complete tasks, Yahoo has released its new open developer platform to let programmers write applications that boost the relevance of search results. SearchMonkey comprises three layers: First, Yahoo partner publishers, such as The New York Times, Yelp, eBay and StumbleUpon, share structured data with Yahoo. Third-party developers then access this content through semantic markup (such as microformats and RDF), standardized XML feeds, Web services APIs, and page extraction, to create widgets. These widgets will include navigational links, reviews, contact information and locations to provide enhanced search listings. Finally, developers make these apps available in a gallery on Yahoo, from which consumers can grab them to customize their searches. According to the online SearchMonkey Guide: Site owners have web sites containing the data retrieved by SearchMonkey applications. To be used by SearchMonkey applications, this data must be structured. Site owners can make structured data available to Yahoo! in any of the following ways: (1) Atom Feeds: Site owners push data to Yahoo! by submitting Atom feeds. (2) Markup: Site owners mark up their web pages with microformats or RDFa/eRDF, which Yahoo! extracts when crawling these URLs. Microformats are the leading established standard for web page markup. RDFa and eRDF are also widely accepted standards. Since these are all open standards, marking up your pages using any of these formats makes your content more easily reusable. (3) Web Services: Site owners create custom Web Services that provide access to their structured data... The Adjunct Syntax Specification (with a RELAX NG Compact Syntax specification) describes a method called DataRSS for embedding arbitrary metadata within feed vocabularies, including RSS, Atom, IDIF, and others.
A "searchmonkey-profile Vocabulary Specification" defines a set of terms (classes, properties and data types) recommended for use in DataRSS feeds and in pages with embedded RDFa and eRDF. It builds on well-established vocabularies such as Dublin Core and the FOAF vocabulary, as well as common RDF vocabularies for microformats such as hCard, hCalendar and hReview. Its set of common terms can help developers get started... The following microformats are supported: hCard, hCalendar, hReview, hFeed and XFN. You may use any RDF or OWL vocabulary. However, if you are publishing data using a custom-made vocabulary, make sure the schema definition is easy to discover so that others can understand your data and build applications on it. The best approach is to follow the recommendations regarding "cool URIs" and serve both textual and machine-processable vocabulary definitions at the locations your URIs point to... Microformats, eRDF and RDFa are different ways of embedding metadata inside Web pages. They represent different trade-offs in terms of ease of authoring versus expressibility. Microformats are the easiest to write and understand, but may not fill all your metadata needs. In particular, you may not find an appropriate vocabulary to represent your information. eRDF and RDFa allow you to work with any RDF or OWL vocabulary, whether you create your own vocabulary or reuse existing ones. eRDF is a subset of the full RDF model; for example, you can only make statements about the current page. RDFa offers all the features of RDF, making it the most complex of the three formalisms but also the most powerful one.
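Microformats reuse ordinary HTML `class` attributes, so extracting them needs no special parser. The Python sketch below uses only the standard library's `html.parser` to pull a few hCard properties (`fn`, `org`, `tel`) out of a page. It is a simplification, not how Yahoo! extracts data. It assumes well-formed markup with explicit end tags and ignores nested cards and the many other hCard properties. The sample markup is hypothetical.

```python
from html.parser import HTMLParser

# A small subset of hCard properties; hCard defines many more (adr, email, url, ...).
HCARD_PROPS = ("fn", "org", "tel")
# HTML void elements never get end tags, so they are kept off the element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HCardParser(HTMLParser):
    """Collect hCard property text found inside elements with class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per vcard, in document order
        self._stack = []  # per open element: (opens_vcard, hcard_property_or_None)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        opens_vcard = "vcard" in classes
        if opens_vcard:
            self.cards.append({})
        prop = next((c for c in classes if c in HCARD_PROPS), None)
        self._stack.append((opens_vcard, prop))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        inside_vcard = any(opens for opens, _ in self._stack)
        props = [p for _, p in self._stack if p]
        if inside_vcard and props and data.strip():
            card, prop = self.cards[-1], props[-1]  # innermost property wins
            card[prop] = (card.get(prop, "") + " " + data.strip()).strip()


def extract_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


if __name__ == "__main__":
    page = ('<div class="vcard"><span class="fn">Jane Doe</span>, '
            '<span class="org">Example Corp</span><br>'
            '<span class="tel">+1-555-0100</span></div><p>Not a card</p>')
    print(extract_hcards(page))
```

This is the "easiest to write" end of the trade-off described above. The vocabulary is fixed by the microformat, which is exactly why it may not cover every metadata need.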

See also: the developer documentation

The State of the Service Component Architecture (SCA): An Update
David Chappell, Blog

"I moderated a panel on Service Component Architecture (SCA) at JavaOne last week. I was also the moderator for last year's SCA panel, and several of the same people were on the panel with me this time. While the things we talked about were broadly similar, two things stand out about what's changed in a year. The first is that SCA is real, or at least part of it is. One of the things the SCA specs define is an XML-based language called the Service Component Definition Language (SCDL). SCDL is meant to provide a vendor-neutral way to describe how components created in various technologies, such as Java, BPEL, and Spring, are configured and wired together to create applications. Vendors were showing SCDL in real products on the JavaOne floor—Oracle had an especially nice demo—and so it's clear that this part of SCA is seeing some success. Whether SCDL will in fact provide much cross-vendor portability remains to be seen. As usual, this depends on how many proprietary extensions vendors add. Still, a standard language for describing the components and assembly of an application is a useful idea, and the signs so far are promising. The second thing that stands out after a year is less promising: It's the confusion around how to write SCA components. Along with SCDL, the SCA specs define how to create components using several different technologies. Yet the various SCA vendors and open source projects can't agree on which of these to implement. SCA support for Spring components, for example, is hit or miss: some SCA offerings support it, some don't. BPEL is much the same: Oracle is a big fan, while the open source Fabric3 currently has no BPEL support. And just as it was a year ago, support for SCA's new programming model for creating Java components is uneven. 
As I've written before, I believe that this aspect of the spec is really important: it unifies the diverse approaches of Java EE much as Microsoft's Windows Communication Foundation (WCF) unified the diverse programming models in the original .NET Framework... The stated goal of SCA is to provide application portability. Widespread support for SCDL is an essential part of this, but so is agreeing on how to create SCA components. For SCA to really improve portability, the vendors and open source projects that support it need to agree on how their customers should create components."
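The Python sketch below illustrates what the SCDL described above captures. It reads a small composite and lists each component's implementation technology and its wires. The following names are assumed to follow SCA 1.0 conventions: the `http://www.osoa.org/xmlns/sca/1.0` namespace, the `component`, `implementation.java` and `implementation.bpel` elements, and the `reference`/`target` pair. The composite, class and process names are hypothetical. Real SCDL also covers services, properties, bindings and promotion.

```python
import xml.etree.ElementTree as ET

# Assumed SCA 1.0 (OSOA) assembly namespace.
SCA = "http://www.osoa.org/xmlns/sca/1.0"

# Hypothetical composite: a Java component wired to a BPEL component.
COMPOSITE = f"""<composite xmlns="{SCA}" name="OrderApp">
  <component name="OrderService">
    <implementation.java class="example.OrderServiceImpl"/>
    <reference name="pricing" target="PricingProcess"/>
  </component>
  <component name="PricingProcess">
    <implementation.bpel process="PricingProcess"/>
  </component>
</composite>"""


def describe(scdl):
    """Return ({component: implementation technology}, [(component, reference, target)])."""
    root = ET.fromstring(scdl)
    technologies, wires = {}, []
    for comp in root.findall(f"{{{SCA}}}component"):
        name = comp.get("name")
        for child in comp:
            local = child.tag.split("}", 1)[1]  # strip the namespace
            if local.startswith("implementation."):
                technologies[name] = local.split(".", 1)[1]  # e.g. "java", "bpel"
            elif local == "reference":
                wires.append((name, child.get("name"), child.get("target")))
    return technologies, wires


if __name__ == "__main__":
    technologies, wires = describe(COMPOSITE)
    print(technologies)  # {'OrderService': 'java', 'PricingProcess': 'bpel'}
    for source, ref, target in wires:
        print(f"{source}.{ref} -> {target}")  # OrderService.pricing -> PricingProcess
```

The assembly description stays the same whichever technology implements each component. The portability gap Chappell describes lies in the `implementation.*` side, where support varies by vendor.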

Revision of the PREMIS Data Dictionary for Preservation Metadata
Brian F. Lavoie, D-Lib Magazine

Released in May 2005, the PREMIS Data Dictionary for Preservation Metadata was the first comprehensive specification for preservation metadata produced from an international, cross-domain consensus-building process. The PREMIS working group, jointly sponsored by OCLC and RLG, consisted of more than 30 experts from 5 countries, representing libraries, archives, museums, government agencies, and the private sector. After about two years, the PREMIS Maintenance Activity felt that enough feedback had accumulated to warrant undertaking the first revision of the Data Dictionary and its XML schema. The revision process began in October 2006 and ended with the release of the PREMIS Data Dictionary 2.0 in April 2008. This article briefly describes the revision process and its outcomes, including a summary of the major changes appearing in the new version of the Dictionary. In the near future, the Maintenance Activity will establish a registry of controlled vocabularies (standard vocabularies for particular semantic units), populated initially by the lists of suggested values for semantic units supplied in PREMIS 2.0. Implementers will be encouraged to contribute other vocabularies they use to the registry. A mechanism is under development to enable the identification of the source of these controlled vocabularies and to validate appropriate values using an XML schema. A registry of controlled vocabularies should be of considerable value to the community, both as a reference to inform implementation decisions, and as a means of encouraging convergence and standardization... The PREMIS schema has been endorsed by the Metadata Encoding and Transmission Standard (METS) Editorial Board as an approved extension schema for METS. The METS schema is widely used by digital repositories as a packaging mechanism for objects and their associated metadata. A number of questions have emerged as to how the PREMIS Data Dictionary and schema should be used in conjunction with METS.
The Maintenance Activity has convened a group of experts to develop a set of guidelines and recommendations for using PREMIS and METS, and a working draft of their findings is now available online.

A Survey of Trust and Reputation Systems for Online Service Provision
A. Josang, R. Ismail, C. Boyd, Decision Support Systems

This article preprint was contributed to the OASIS Open Reputation Management Systems (ORMS) Technical Committee document repository by TC Co-chair Nat Sakimura (Nomura Research Institute, Ltd). The OASIS TC was recently chartered to develop an Open Reputation Management System (ORMS) that provides the ability to use common data formats for representing reputation data, and standard definitions of reputation scores. Document abstract: "Trust and reputation systems represent a significant trend in decision support for Internet mediated service provision. The basic idea is to let parties rate each other, for example after the completion of a transaction, and use the aggregated ratings about a given party to derive a trust or reputation score, which can assist other parties in deciding whether or not to transact with that party in the future. A natural side effect is that it also provides an incentive for good behaviour, and therefore tends to have a positive effect on market quality. Reputation systems can be called collaborative sanctioning systems to reflect their collaborative nature, and are related to collaborative filtering systems. Reputation systems are already being used in successful commercial online applications. There is also a rapidly growing literature around trust and reputation systems, but unfortunately this activity is not very coherent. The purpose of this article is to give an overview of existing and proposed systems that can be used to derive measures of trust and reputation for Internet transactions, to analyse the current trends and developments in this area, and to propose a research agenda for trust and reputation systems."
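The aggregation step in the abstract turns many ratings about a party into one score. The beta reputation score, one of the Bayesian approaches in this literature, makes that step concrete. The Python sketch below illustrates that particular choice and is not presented as the survey's recommended system. With r positive and s negative ratings the score is (r + 1) / (r + s + 2), so a newcomer starts at 0.5. The optional `forgetting` factor, which discounts older ratings, is an added assumption.

```python
from dataclasses import dataclass


@dataclass
class BetaReputation:
    """Running reputation of one party, kept as (discounted) rating counts."""
    positive: float = 0.0  # r: satisfactory transactions
    negative: float = 0.0  # s: unsatisfactory transactions

    def rate(self, satisfied: bool, forgetting: float = 1.0) -> None:
        """Record one rating; forgetting < 1 shrinks the weight of earlier ratings."""
        self.positive *= forgetting
        self.negative *= forgetting
        if satisfied:
            self.positive += 1
        else:
            self.negative += 1

    def score(self) -> float:
        """Expected value of Beta(r + 1, s + 1): (r + 1) / (r + s + 2)."""
        return (self.positive + 1) / (self.positive + self.negative + 2)


if __name__ == "__main__":
    seller = BetaReputation()
    print(seller.score())  # 0.5: no ratings yet
    for outcome in [True] * 8 + [False] * 2:
        seller.rate(outcome)
    print(seller.score())  # 0.75 = (8 + 1) / (8 + 2 + 2)
```

The prior of one pseudo-rating on each side keeps a single early rating from driving the score to 0 or 1. The forgetting factor lets recent behaviour outweigh old history, which is one way such systems keep the incentive for good behaviour alive.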

See also: the OASIS ORMS TC

Defending the XML Angle Bracket Tax
Norm Walsh, Blog

"I've spent a couple of days trying to decide if I want to respond to Jeff Atwood's swipe at XML... Jeff tries to show how much better RFC 822 is for email. There's no question that it's more compact; I could learn to author email in XML, but I'm not anxious to do it. On the other hand, it's pretty obvious that XML is actually better. Jeff summarizes with a perfectly reasonable statement: 'I don't necessarily think XML sucks, but the mindless, blanket application of XML as a dessert topping and a floor wax certainly does. Like all tools, it's a question of how you use it.' I can't really disagree with that. XML may be my hammer of choice, but I don't hang picture hooks with a sledge hammer. If your data is really simple, maybe just a set of key/value pairs, and if both the key and the value are strings, and if the consequences of bad data are negligible, and if there's no possibility that there will ever be any additional complexity, then sure, maybe a flat text file is all you need... RELAX NG has both an XML syntax and a compact (non-XML) syntax. It's possible to author in both of them, and you can translate from one to the other without any loss of data, and with minimal loss of formatting. I author mostly in the compact syntax. Nevertheless, I absolutely rely on the XML syntax because having the XML syntax makes the entire schema amenable to processing with an enormous range of XML tools. General purpose tools that work equally well with RELAX NG and other XML languages. Tools that I did not have to write, test, debug, or document. The lesson, if there's a lesson, is that even if you think a non-XML syntax is better for one purpose or another, the ability to translate into (and back out of) an XML syntax is a good thing. Of course, devising two syntaxes, and making them isomorphic, and making it possible to translate back and forth without destroying one format or the other, is a huge amount of work. It's usually easier to just use XML... 
I don't necessarily think all the alternatives to XML suck, but the mindless, knee-jerk rejection of XML because it contains a small amount of additional syntax certainly does. Like all tools, it's a question of how you use it. Please think twice before subjecting yourself, your fellow programmers, and your users to more fragile, ASCII-only, ad hoc syntaxes."
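Walsh's point about translating a simple syntax into XML and back can be illustrated with the flat key/value case he concedes. The Python sketch below converts hypothetical `key: value` lines into XML and back, then checks the round trip. This is a simplified header-like format, not a full RFC 822 parser; for example, it does not handle header folding. The `properties`/`property` element names are invented for the example. Trimming whitespace around values is a deliberate simplification that would lose leading or trailing spaces.

```python
import xml.etree.ElementTree as ET


def text_to_xml(text):
    """Turn 'key: value' lines into <properties><property name="key">value</property>..."""
    root = ET.Element("properties")
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no data
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key/value line: {line!r}")
        ET.SubElement(root, "property", name=key.strip()).text = value.strip()
    return ET.tostring(root, encoding="unicode")


def xml_to_text(xml_text):
    """Turn the XML form back into 'key: value' lines, preserving document order."""
    root = ET.fromstring(xml_text)
    return "\n".join(f"{p.get('name')}: {p.text or ''}" for p in root.iter("property"))


SAMPLE = "From: alice@example.com\nSubject: Angle brackets & other taxes"

if __name__ == "__main__":
    xml_form = text_to_xml(SAMPLE)
    print(xml_form)  # '&' is escaped as &amp; by the serializer
    assert xml_to_text(xml_form) == SAMPLE  # lossless for this simple format
    # Generic XML tooling now applies, e.g. an ElementPath query:
    print(ET.fromstring(xml_form).find("property[@name='Subject']").text)
```

The final query uses a general-purpose XML facility, not a custom parser, which is the benefit Walsh attributes to keeping an XML form available.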

See also: James Clark on JSON

