Reuters has announced an updated release of the NewsML Toolkit Version 1.0. "The NewsML Toolkit is an open-source Java library for reading and processing NewsML documents. NewsML 1.0 is a news-industry packaging and metadata standard for exchanging multi-part news and information in multiple media. This latest version of the Java-based toolkit follows the alpha testing version released in December 2000. Developed in conjunction with XML specialists, Megginson Technologies Ltd., it is designed to simplify the processing of NewsML documents for subscribers to NewsML Services. NewsML Toolkit v1.0 now offers increased functionality including: (1) Access to all elements and attributes in a NewsML document via a Java Application Programming Interface; (2) Complete metadata extraction; (3) The ability to locate vocabularies which source content values for elements in a NewsML document; (4) Complete JavaDoc documentation. The NewsML Toolkit is available for use under the GNU Lesser General Public License. Reuters is also developing two complementary toolkits that build on NewsML Toolkit v1.0's document content extraction. These tools include: (1) A conformance checking tool, which will verify a NewsML document beyond basic DTD validation; (2) An application level toolkit, which will provide an intuitive level of parsing of a NewsML document. This allows the user to pose queries from a news perspective and 'prune' NewsItems in order to keep only selected content."
"NewsML is an XML-based standard that describes and packages news in various media formats for delivery to any platform. At the heart of NewsML is the concept of the NewsItem, which can contain various media, including text, pictures, graphics and video. NewsML is flexible and extensible and uses standard Internet naming conventions for identifying the news objects in a NewsItem. Content does not have to be embedded in a NewsItem; pointers can be inserted to content held on a publisher's website. This means subscribers retrieve the data only when they need to and makes NewsML bandwidth-efficient. NewsML is a standard of the International Press Telecommunications Council (IPTC)."
From the NewsML Toolkit architectural overview document:
Like any markup specification, NewsML actually contains two parts: (1) a logical model for the structure of a NewsML package; and (2) rules for representing an instance of that model in XML markup. Developers of programs to work with NewsML need to have at least some familiarity with the first part -- they need to know, for example, that a NewsItem contains a NewsComponent, and that a NewsComponent can contain several other types of nodes -- but there is no reason that they need to learn the second part, since it can be handled automatically by an XML-aware Java library like the NewsML Toolkit.
The toolkit contains a collection of Java interfaces for the different structures that can appear in a NewsML document; these interfaces hide all of the details of XML processing, so that a Java programmer with little or no knowledge of XML markup can write programs to extract information from NewsML packages... The toolkit contains many interfaces, but it is designed so that it can be learned incrementally: a developer approaching the toolkit and NewsML for the first time should be able to create useful applications after learning only a few key interfaces such as NewsMLFactory, NewsML, NewsItem, NewsComponent, and ContentItem.
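As a rough illustration of how interfaces can hide the XML layer, the sketch below wraps the JDK's standard DOM parser behind two small interfaces. The names `NewsItem` and `ContentItem` echo those in the overview, but every method signature here (`contentItems()`, `href()`), the `NewsItemSketch` class, and the simplified sample markup are assumptions made for illustration. They are not the toolkit's actual API, for which the project's JavaDoc is the authority.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class NewsItemSketch {

    // Hypothetical interfaces: names echo the overview, signatures are assumed.
    interface ContentItem {
        String href(); // pointer to the content, or "" if the attribute is absent
    }

    interface NewsItem {
        List<ContentItem> contentItems();
    }

    // Parses markup with the JDK DOM parser and exposes the result through
    // NewsItem, so callers never touch org.w3c.dom types directly.
    static NewsItem parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("ContentItem");
        List<ContentItem> items = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            String href = element.getAttribute("Href");
            items.add(() -> href);
        }
        return () -> items;
    }

    // Simplified fragment for the sketch; not a complete or valid NewsML 1.0 document.
    static final String SAMPLE =
        "<NewsML><NewsItem><NewsComponent>"
        + "<ContentItem Href=\"story.xml\"/>"
        + "<ContentItem Href=\"photo.jpg\"/>"
        + "</NewsComponent></NewsItem></NewsML>";

    public static void main(String[] args) throws Exception {
        // The calling code works only with the interfaces, not with XML.
        for (ContentItem item : parse(SAMPLE).contentItems()) {
            System.out.println(item.href());
        }
    }
}
```

Code that calls `parse` sees only `NewsItem` and `ContentItem`; the element and attribute names stay inside one method, which is the separation between logical model and markup that the overview describes.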
Base Interfaces: NewsML is a complex specification, and that complexity is mirrored in the large number of interfaces in the NewsML toolkit itself. The NewsML XML document type contains many structures that are almost but not quite the same, and those slightly-divergent structures can make it difficult to write generalised, reusable code to process NewsML documents. To help alleviate this problem, the NewsML Toolkit contains a series of more abstract interfaces that capture the simple, common patterns that do appear. These base interfaces do not correspond directly with specific markup structures, but they capture similar patterns that make up parts of many different substructures. These interfaces make it possible to write reusable code for common situations, such as navigating through the main structure of a NewsML document or processing a series of comments; they also simplify the process of learning both the NewsML XML document type and the Java interfaces in the toolkit.
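One way to picture the role of such base interfaces is a single abstract node type shared by several concrete structures, so that a traversal routine is written once and reused. The following is a minimal sketch under that assumption: `BaseInterfaceSketch`, `StructureNode`, `SimpleNode`, `name()`, `children()`, and `countNamed` are hypothetical names invented here, not taken from the toolkit.

```java
import java.util.Arrays;
import java.util.List;

public class BaseInterfaceSketch {

    // Hypothetical base interface: the pattern shared by many structures.
    interface StructureNode {
        String name();
        List<StructureNode> children();
    }

    // Minimal in-memory implementation, used only to exercise the sketch.
    static final class SimpleNode implements StructureNode {
        private final String name;
        private final List<StructureNode> children;

        SimpleNode(String name, StructureNode... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        public String name() { return name; }
        public List<StructureNode> children() { return children; }
    }

    // Reusable code: counts nodes with the given name anywhere in the tree,
    // without knowing which concrete structure each node represents.
    static int countNamed(StructureNode root, String name) {
        int count = root.name().equals(name) ? 1 : 0;
        for (StructureNode child : root.children()) {
            count += countNamed(child, name);
        }
        return count;
    }

    // A small tree with comments attached at three different levels.
    static StructureNode sample() {
        return new SimpleNode("NewsItem",
            new SimpleNode("Comment"),
            new SimpleNode("NewsComponent",
                new SimpleNode("Comment"),
                new SimpleNode("ContentItem", new SimpleNode("Comment"))));
    }

    public static void main(String[] args) {
        System.out.println(countNamed(sample(), "Comment")); // prints 3
    }
}
```

Because `countNamed` depends only on the base interface, the same routine handles comments wherever they occur, rather than needing one version per structure.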
Principal references:
- Announcement: "Reuters Launches Latest NewsML Toolkit."
- NewsML Toolkit Project on SourceForge
- Download
- NewsML Toolkit (1.0) Architectural Overview [plain text version]
- Earlier description
- NewsML Website
- "NewsML and IPTC2000" - Main reference page.