Cover Pages: XML Daily Newslink: Wednesday, 11 March 2009

A Cover Pages Publication http://xml.coverpages.org/
Provided by OASIS and Sponsor Members
Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by:
Microsoft Corporation http://www.microsoft.com

Headlines

Antiphishing Group Develops E-Crime Reporting Tool
LISA TMX 2.0 Specification Draft Released for Public Comment
GNU Privacy Guard Version 2.0.11 (GnuPG-2) Released
A Feature Set for the Extensible Messaging and Presence Protocol (XMPP)
Assessing FRBR in Dublin Core Application Profiles
Use XQuery for the Presentation Layer
Can IT Solve the Electronic Health Records Challenge?

Antiphishing Group Develops E-Crime Reporting Tool
Jeremy Kirk, Network World

A group dedicated to fighting phishing scams has developed a way for police and other organizations to report e-crimes in a common data format readable by a Web browser or other application. The challenge facing law enforcement and security organizations is a lack of a coherent reporting system, said Peter Cassidy, secretary general of the Anti-Phishing Working Group (APWG). The APWG is the global pan-industrial and law enforcement association focused on eliminating the fraud and identity theft that result from phishing, pharming and email spoofing of all types. Until now, there was no standard way to file an e-crime report. That makes it hard to coordinate the vast amount of data that is collected on cybercrime... APWG decided to develop a terminal file format for e-crime incidents. APWG wanted reports to have unambiguous time stamps, support for different languages, support for attaching malware and the ability to classify the kind of fraud and the company brand that was being attacked. APWG couldn't find an existing data model that was perfect. But the group did see potential for the XML-based Incident Object Description Exchange Format (IODEF), which was already being used by computer incident response teams to report adverse network events. APWG has created some extensions to IODEF to cover its other needs... For example, if a law enforcement agency wants all the reports for a phishing scam attacking a certain brand, another agency can just do a search in their database and quickly share the data, enabling faster response to e-crime. APWG is also creating tools to let people convert existing reports in a different format to the new format without having to do their own programming..."

Note: the IETF Internet Draft "Extensions to the IODEF-Document Class for Reporting Phishing, Fraud, and Other Crimeware" extends "The Incident Object Description Exchange Format" (RFC 5070) to support the reporting of phishing, fraud, other types of electronic crime, and widespread spam incidents. Deception activities, such as receiving an email purportedly from a bank requesting you to confirm your account information, are an expanding attack type on the Internet. The terms phishing and fraud are used interchangeably in this document to characterize broadly-launched social engineering attacks in which an electronic identity is misrepresented in an attempt to trick individuals into revealing their personal credentials (e.g., passwords, account numbers, personal information, ATM PINs, etc.)... Fraudulent events are reported in a Fraud Activity Report which is an instance of an XML IODEF-Document Incident element with added EventData and AdditionalData elements. The additional fields in the EventData specific to phishing and fraud are enclosed into a PhraudReport XML element. Fraudulent activity may include multiple emails, instant messages, or network messages, scattered over various times, locations, and methodologies. The PhraudReport within an EventData may include information about the email header and body, details of the actual phishing lure, correlation to other attacks, and details of the removal of the web server or credential collector..."
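
The nesting described in the note above can be sketched in a few lines of Python. This is only an illustration of the element hierarchy as the note describes it (IODEF-Document, Incident, EventData, AdditionalData, PhraudReport), not a schema-valid report: XML namespaces, required attributes, and all PhraudReport child fields are left out, and the function name and the incident identifier are made up for the example.

```python
import xml.etree.ElementTree as ET


def build_fraud_report_skeleton(incident_id: str) -> ET.Element:
    """Build the element nesting described in the note, with placeholder content.

    Assumption: namespaces and required attributes are omitted for brevity,
    so the result is NOT valid against the IODEF schema.
    """
    document = ET.Element("IODEF-Document")
    incident = ET.SubElement(document, "Incident")
    # IncidentID is an RFC 5070 element; the value passed in is a placeholder.
    ET.SubElement(incident, "IncidentID").text = incident_id
    event_data = ET.SubElement(incident, "EventData")
    additional = ET.SubElement(event_data, "AdditionalData")
    # Phishing-specific fields would go inside PhraudReport; none are filled in.
    ET.SubElement(additional, "PhraudReport")
    return document


skeleton = build_fraud_report_skeleton("example-0001")
print(ET.tostring(skeleton, encoding="unicode"))
```

A real report would carry the email header and body, lure details, and takedown information inside PhraudReport, using the element names defined in the draft itself.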

LISA TMX 2.0 Specification Draft Released for Public Comment
Rodolfo M. Raya and Arle R. Lommel (eds), OSCAR Public Committee Draft

LISA (Localisation Industry Standards Association) announced the publication of a Committee Draft specification for public review: Feedback is requested by April 10, 2009. OSCAR (Open Standards for Container/content Allowing Reuse) is LISA's open standards body for the translation and localization industry. OSCAR develops and maintains technical standards related to the linguistic needs of international businesses and their partners. OSCAR standards address core technical and business issues, such as translation memory, terminology management, translation-related text processing, word and text volume counts, and multilingual content management. The "TMX 2.0 Specification" defines version 2.0 of the Translation Memory eXchange format (TMX). The purpose of the TMX format is to provide a standard method to describe translation memory data that is being exchanged among tools and/or translation vendors, while introducing little or no loss of critical data during the process. TMX is defined in two parts: (1) A specification of the format of the container (the higher-level elements that provide information about the file as a whole and about entries). In TMX, an entry consisting of aligned segments of text in two or more languages is called a Translation Unit (the 'tu' element). (2) A specification of a low-level meta-markup format for the content of a segment of translation-memory text. In TMX, an individual segment of translation-memory text in a particular language is denoted by a 'seg' element...

TMX is XML-conformant. The TMX vocabulary is defined using an XML Schema, but also uses various third party standards for date/time and language codes. TMX files are intended to be created automatically by export routines and processed automatically by import routines. TMX files are well-formed XML documents that can be processed without explicit reference to the TMX Schema. However, a valid TMX file must conform to the TMX Schema, and any TMX file about which there are concerns should be verified against the TMX Schema using a validating XML parser... TMX provides a mechanism for the exchange of translation memory data, not application-specific features or data. Transferring data alone may not transfer the knowledge of how to process data. As a result, although TMX provides a rich set of elements for exchanging Translation Memory data, sometimes it may be necessary to extend TMX vocabulary using XML Namespaces in order to support functions needed for specific tasks. It is possible to add non-TMX elements, as well as attributes and attribute values, to any TMX document. All foreign elements and attributes added to a TMX file must be defined using an XML Schema. All XML Schemas declared in a TMX document must be made available to permit validation of the foreign constructs included in the file. Although TMX offers this extensibility mechanism, in order to avoid difficulty in processing and increase interoperability between tools, it is strongly recommended to use TMX capabilities whenever possible, rather than to create non-standard user-defined elements or attributes. Applications that depend on the TMX format for exchanging Translation Memory data are not required to understand or support non-TMX elements or attributes. A TMX application can safely ignore foreign elements or attributes present in a TMX document..."
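
As a rough illustration of the two points above (translation units holding aligned segments, and applications being free to ignore foreign elements), the following Python sketch reads a tiny TMX-like document. The 'tu' and 'seg' elements come from the description above; the 'tuv' wrapper and the xml:lang attribute follow earlier TMX versions and are assumed here to carry over to the 2.0 draft. The sample document, the vendor namespace, and the function name are invented for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: one translation unit plus one foreign (non-TMX) element.
SAMPLE = """<tmx version="2.0" xmlns:x="urn:example:vendor-extension">
  <body>
    <tu>
      <x:reviewStatus>approved</x:reviewStatus>
      <tuv xml:lang="en"><seg>Save file</seg></tuv>
      <tuv xml:lang="fr"><seg>Enregistrer le fichier</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_translation_units(tmx_text: str) -> list[dict[str, str]]:
    """Return one {language: segment text} mapping per 'tu' element."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        segments = {}
        # Only 'tuv' children are inspected, so foreign elements are skipped.
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested in inline markup.
                segments[tuv.get(XML_LANG, "")] = "".join(seg.itertext())
        units.append(segments)
    return units


print(read_translation_units(SAMPLE))
```

The reader never looks at the x:reviewStatus element, which is the "safely ignore" behaviour the specification allows. Note that this sketch checks well-formedness only; validation against the TMX Schema would need a validating parser.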

See also: LISA standards

GNU Privacy Guard Version 2.0.11 (GnuPG-2) Released
Werner Koch, GnuPG Announcement

The GnuPG-2 development team announced the availability of a new stable GnuPG-2 release: Version 2.0.11. The GNU Privacy Guard (GnuPG) is GNU's tool for secure communication and data storage. It can be used to encrypt data, create digital signatures, help authenticating using Secure Shell and to provide a framework for public key cryptography. It includes an advanced key management facility and is compliant with the OpenPGP and S/MIME standards. GnuPG-2 has a different architecture than GnuPG-1 (e.g. 1.4.9) in that it splits up functionality into several modules. However, both versions may be installed alongside without any conflict. In fact, the gpg version from GnuPG-1 is able to make use of the gpg-agent as included in GnuPG-2 and allows for seamless passphrase caching. The advantage of GnuPG-1 is its smaller size and the lack of dependency on other modules at run and build time. We will keep maintaining GnuPG-1 versions because they are very useful for small systems and for server based applications requiring only OpenPGP support. GnuPG itself is a command line tool without any graphical stuff. It is the real crypto engine which can be used directly from a command prompt, from shell scripts or by other programs. Therefore it can be considered as a backend for other applications. However, even when used on the command line it provides all functionality needed - this includes an interactive menu system. The set of commands of this tool will always be a superset of those provided by any frontends. GnuPG is distributed under the terms of the GNU General Public License (GPL version 3).

A Feature Set for the Extensible Messaging and Presence Protocol (XMPP)
Peter Saint-Andre (ed), IETF Internet Draft

This XMPP "Feature Set" document defines a protocol feature set for the Extensible Messaging and Presence Protocol (XMPP), in accordance with the concepts and formats proposed by Larry Masinter within the IETF Network Working Group: "Formalizing IETF Interoperability Reporting." The 'formalizing' document suggests another way of reforming the IETF standards process by formalizing the mechanism for interoperability reporting, as a way of facilitating standards development. It establishes two kinds of reports: a 'Protocol Feature Set', which lays out the set of features from IETF specifications that constitute a protocol, and a 'Protocol Implementation Report', which is submitted by an individual or group to report on implementation and interoperability testing... The Extensible Messaging and Presence Protocol (XMPP) is an application profile of the Extensible Markup Language for streaming XML data in close to real time between any two (or more) network-aware entities. XMPP is typically used to exchange messages, share presence information, and engage in structured request-response interactions. The basic syntax and semantics of XMPP were developed originally within the Jabber open-source community, mainly in 1999. In late 2002, the XMPP Working Group was chartered with developing an adaptation of the core Jabber protocol that would be suitable as an IETF instant messaging (IM) and presence technology. As a result of work by the XMPP WG, RFC 3920 and RFC 3921 were published in October 2004, representing the most complete definition of XMPP at that time. The XMPP developer community has garnered extensive implementation and deployment experience with XMPP since 2004. In addition, formal interoperability testing has been carried out under the auspices of the XMPP Standards Foundation (XSF)...

This document defines a protocol feature set for XMPP; it describes the set of specifications and the features defined therein that constitute the Extensible Messaging and Presence Protocol for the purpose of interoperability testing. The specifications considered to define XMPP are rfc3920bis and rfc3921bis. Although the core XML streaming layer specified in rfc3920bis is not necessarily tied to the instant messaging and presence semantics specified in rfc3921bis, this interoperability report treats them as a single protocol, since to date they usually have been implemented and deployed as such. Where appropriate, this interoperability report discusses the relevant feature as specified in RFC 3920 or RFC 3921, experience and testing results related to that feature, and modifications to the feature as specified in rfc3920bis or rfc3921bis...
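
The three kinds of interaction mentioned above (messages, presence, and request-response) correspond to the message, presence, and iq stanzas of RFC 3920 and RFC 3921. The Python sketch below builds one of each as a standalone XML element. It is a simplification: stanzas normally travel inside an open XML stream and in the 'jabber:client' namespace, both of which are omitted here, and the helper function names and the example address are made up.

```python
import xml.etree.ElementTree as ET


def message_stanza(to: str, body: str) -> ET.Element:
    """A chat message addressed to another entity."""
    stanza = ET.Element("message", {"to": to, "type": "chat"})
    ET.SubElement(stanza, "body").text = body
    return stanza


def presence_stanza(show: str) -> ET.Element:
    """A presence update; 'show' is e.g. 'away', 'chat', 'dnd', or 'xa'."""
    stanza = ET.Element("presence")
    ET.SubElement(stanza, "show").text = show
    return stanza


def iq_request(request_id: str, query_ns: str) -> ET.Element:
    """A request half of a request-response exchange (iq of type 'get')."""
    stanza = ET.Element("iq", {"type": "get", "id": request_id})
    # The payload is a namespaced child element that identifies the request.
    ET.SubElement(stanza, f"{{{query_ns}}}query")
    return stanza


for element in (
    message_stanza("juliet@example.com", "Hello"),
    presence_stanza("away"),
    iq_request("r1", "jabber:iq:roster"),
):
    print(ET.tostring(element, encoding="unicode"))
```

The responder to an iq request echoes the same id in an iq of type 'result' or 'error', which is what makes the exchange a structured request-response rather than a one-way message.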

See also: XMPP earlier references

Assessing FRBR in Dublin Core Application Profiles
Talat Chaudhri, Ariadne

This article provides a detailed assessment of the FRBR structure of the Dublin Core Application Profiles funded by JISC (UK Joint Information Systems Committee). FRBR (Functional Requirements for Bibliographic Records) provides a core entity modeling framework used by digital library projects to represent relationships needed to support queries. FRBR addresses three main groups of entities: Group 1 entities are "(Intellectual) Work, Expression, Manifestation, and Item; they represent the products of intellectual or artistic endeavour. Group 2 entities are person and corporate body, responsible for the custodianship of Group 1's intellectual or artistic endeavour. Group 3 entities are subjects of Group 1 or Group 2's intellectual endeavour, and include concepts, objects, events, places." Dublin Core Application Profiles describe how documentary standards relate to standard domain models and Semantic Web foundation standards. Article excerpt: "Efforts to create standard metadata records for resources in digital repositories have hitherto relied for the most part on the simple standard schema published by the Dublin Core Metadata Initiative (DCMI), the Dublin Core Metadata Element Set, more commonly known as 'simple Dublin Core'. While this schema, by and large, met the aim of making metadata interoperable between repositories for purposes such as OAI-PMH, the explicit means by which it achieved this, a drastic simplification of the metadata associated with digital objects to only 15 elements, had the side effect of making it difficult or impossible to describe specific types of resources in detail. A further problem with this 'flat' metadata model is that it does not allow relationships between different versions or copies of a document to be described. The extension of these 15 elements in DCMI Terms, known as 'qualified Dublin Core' was an effective admission that richer metadata was required to describe many types of resources.
Arguably it remains feasible to use 'simple Dublin Core' fields for certain resource types such as scholarly works in cases where especially complex metadata are not required. The problem has been that almost every repository uses these elements to some extent according to local needs and practice rather than adopting a standard. Inevitably this has had a negative impact on interoperability. Consequently, the concept of application profiles (APs) was developed. In essence, these are [metadata] schemas which consist of data elements drawn from one or more namespaces and [are] optimised for a particular local application. As they were originally formulated, they were intended to codify [pragmatic] practices... The engagement of both standards makers and implementors led first to the development of the Scholarly Works Application Profile (SWAP) and subsequently a range of other JISC-funded Dublin Core Application Profiles (DCAPs)... The question remains as to whether FRBR is suitable for Web delivery within repositories, and for which specific resource types and DCAPs. Until this is answered with practical testing, it will be difficult or even impossible to frame a 'core DCAP' and subsequently analyse whether the concept would be of practical use. Metadata need to be flexible and re-usable in the fast-changing world of repositories. In order to make best use of the specific improvements to repository metadata that the DCAPs have provided, it may be to their advantage to re-analyse their entity models..."
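
To make the contrast between a 'flat' record and the FRBR Group 1 hierarchy concrete, here is a small Python sketch. It models only the four Group 1 entities named above (Work, Expression, Manifestation, Item) and is not taken from any of the DCAPs; the class fields, the sample paper, and the repository URLs are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    location: str  # one concrete copy, e.g. a file in a repository


@dataclass
class Manifestation:
    file_format: str
    items: list[Item] = field(default_factory=list)


@dataclass
class Expression:
    version: str
    manifestations: list[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    expressions: list[Expression] = field(default_factory=list)


# A flat 'simple Dublin Core' style record: one resource, with no way to say
# that it is one version or copy among several.
flat_record = {"title": "An Example Paper", "format": "application/pdf"}

# The same paper in the FRBR hierarchy: versions and copies are explicit.
paper = Work(
    title="An Example Paper",
    expressions=[
        Expression("author's draft", [
            Manifestation("application/pdf", [Item("https://repository.example.org/1/draft.pdf")]),
        ]),
        Expression("published version", [
            Manifestation("application/pdf", [Item("https://repository.example.org/1/final.pdf")]),
            Manifestation("text/html", [Item("https://repository.example.org/1/final.html")]),
        ]),
    ],
)


def all_copies(work: Work) -> list[str]:
    """Walk Work -> Expression -> Manifestation -> Item and list every copy."""
    return [
        item.location
        for expression in work.expressions
        for manifestation in expression.manifestations
        for item in manifestation.items
    ]


print(all_copies(paper))
```

The flat record can describe any one of the three files, but only the nested model records that they are copies of two versions of the same work, which is the relationship the excerpt says simple Dublin Core cannot express.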

Use XQuery for the Presentation Layer
Brian M. Carey, IBM developerWorks

Many Web applications use the Model-View-Controller (MVC) pattern to separate the three concerns. These applications frequently use PHP or JavaServer Pages (JSP) technology in the presentation layer. While those technologies are widely accepted and certainly effective, they do not represent a language-independent means of presentation. On the other hand, like Structured Query Language (SQL), XQuery is a lookup specification tied to the XML standard, which is language- and platform-independent. Using XQuery for presentation enables view-side developers to create robust presentation effects without tying the view to any particular underlying application server or programming language... As it stands, XML is the almost universally accepted means of information interchange within the software development community. This is especially true when platform and language independence is a requirement, such as with Web services. It is also used for Web feeds because both RSS and Atom rely on XML. The returned results from Representational State Transfer (REST) invocations are often in XML format. It is even frequently used for software configuration purposes. Given the ubiquitous nature of XML, it makes sense to use it as the model in an MVC pattern. And since the prevailing standard for querying XML is with XQuery, it makes even more sense to use that technology when establishing the view. XQuery further sells itself in this way because it also enables transformation. The developer can extract the necessary information from an XML document and also display it in a manner fitting to the application requirements... This article explores the advantages of XQuery over other view technologies, how XQuery is implemented in the presentation layer, and a realistic example of such an implementation.
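
The idea of querying an XML model and transforming the result into markup in one step can be approximated in Python for readers without an XQuery processor at hand. The sketch below is a stand-in, not XQuery: it uses the limited path support of Python's ElementTree where XQuery would use a single FLWOR expression, and the sample catalog, its element names, and the function name are invented for the example rather than taken from the article.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical XML "model" that the view layer queries.
MODEL = """<catalog>
  <book year="2007"><title>XQuery Basics</title></book>
  <book year="2003"><title>XML &amp; Schemas</title></book>
</catalog>"""


def render_titles(model_xml: str, min_year: int) -> str:
    """Select books from min_year onward and render them as an HTML list."""
    root = ET.fromstring(model_xml)
    rows = [
        # escape() keeps characters such as '&' safe in the HTML output.
        f"<li>{escape(book.findtext('title', default=''))}</li>"
        for book in root.findall("book")
        if int(book.get("year", "0")) >= min_year
    ]
    return "<ul>" + "".join(rows) + "</ul>"


print(render_titles(MODEL, 2005))
```

Selection (the year filter) and presentation (the list markup) sit in the same expression, which is the property the article attributes to XQuery; the difference is that XQuery provides it without depending on a host language such as Python.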

See also: W3C XQuery

Can IT Solve the Electronic Health Records Challenge?
Ephraim Schwartz, InfoWorld

A key initiative in the stimulus plan is addressing the issue of establishing electronic health records to reduce health care costs and improve the care itself by making all medical information for a patient available to anyone who treats her. The federal government has advocated for such a system for a decade, but little visible progress has been made... The good news is health care IT is likely to become a magnet that will surely attract IT professionals in other industries who know how to manage and maintain big enterprise systems, whether used by large hospitals and medical groups, provider collectives, or cloud-based EHR providers. What exists today is an alphabet soup of governing bodies, protocols, standards, near-standards, suggested best practices, and competitors, all well-intentioned but also contributing to the complexity. That situation requires both simplification at the process and standards level, as well as serious integration work at the IT level... Standard communications and networking protocols also have to be agreed on to permit data exchange, as well as a file format or file standard for interoperability... At one time, the solution to all these issues seemed simple enough. The Veterans Administration pioneered electronic medical records with its Vista system, and many lawmakers thought the best idea was to mandate the adoption of Vista as the single, national system, making it available at no charge to health care organizations along with subsidies to speed its adoption....

The policy debate has shifted to a discussion over creating interoperability standards using SOA, middleware, and standard file formats, not unlike the idea behind the Open Document Format (ODF) and the Resource Description Framework (RDF) standards for information mediation based on extensible but structured meaning, says Jeff Bauer, a partner in management consulting for ACS Healthcare Solutions. At the moment, an effort is under way to create the Continuity of Care Document, an XML-based standard intended to become the equivalent of an RDF or ODF file that lets the various EHR vendors write to the same file format... The part-private, part-government CCHIT (Commission for the Certification of Health IT) does the certification, and it creates interoperability and definitional guidelines in three key EHR areas: privacy, format, and content. But agreeing upon an interoperable framework doesn't address another key issue: the creation of a unique glossary of terms to describe both medical procedures done to a patient as well as to describe a diagnosis. Currently, most hospitals and practices use ICD-9, the International Statistical Classification of Diseases and Related Health Problems, 9th Revision, which has a highly limited language of about 17,000 terms. Its successor, ICD-10, has about 155,000 codes and will permit the tracking of many new diagnoses and procedures. But deploying ICD-10 will be yet another challenge for doctors, nurses, and IT personnel... There are two major questions around the reliance on health records from these providers, say industry analysts. One is whether users will trust a for-profit organization to care for the most personal kind of information. The second is whether each of us can be trusted to manage and keep such a life-and-death record up to date or if it's safer to leave that responsibility to organizations whose only job it is to keep the health data updated...

