
NewsML Requirements

NSM0002 (XN-2)

5th IPTC Draft

16 March 2000

Jo Rabin


Copyright © 1999, 2000 Reuters Limited, IPTC
All Rights Reserved

Revision History

5th IPTC Draft


Updated following Washington meeting 2000-01-28

4th IPTC Draft


For Public Comment, edited by D M Allen and Jo Rabin

3rd IPTC Draft


Updated following IPTC Standards Committee meeting at Heathrow 1999-11-22

2nd IPTC Draft


Presented to IPTC meeting at Amsterdam 1999-10-06


The latest version of this document in HTML format can be obtained from http://www.iptc.org/xn-2.htm.


NewsML is a media independent structural framework for representation of news.

Headline Requirements

NewsML will


support the representation of electronic news entities i.e. news-items, parts of news-items, collections of news-items, relationships between news-items and metadata associated with news-items;


be usable throughout the news lifecycle;


allow news-items to consist of arbitrary mixtures of media types, languages and encodings;


be usable either as a replacement for or allow the transport of all existing news formats and encodings;


support a number of different constructions of the same data;


support the management and development of news-items over time;


be simply extensible and flexible;


allow for authentication and signature of metadata and news-item content;


not be unduly verbose;


use XML and other appropriate standards and recommendations;


provide a lightweight facility for the representation of text.





News-item

A unit of news as defined by the editorial or product construction practice of a supplier.


Part

A component of a news-item or of another part, which may consist of another news-item, a piece of text, a picture, another part, etc., and which bears a specific named relationship to its container. Parts may contain alternatives, which fulfil that named role, or complements, which bear subsidiary named relationships to the container.


Development and Explanation of Requirements


support the representation of electronic news entities i.e. news-items, parts of news-items, collections of news-items, relationships between news-items and metadata associated with news-items


Structure: News-items have structure, i.e. are a composite of component parts of arbitrary media types with named relationships to each other.


Links: News-items may have named relationships to other news-items.


Named Entity Identification: The presence of and the location of named entities (i.e. people, places, products etc) within news-items must be supported in a media independent manner.


Specific Features of News: News-items of all media types have specific features such as by-line, dateline, slug, etc.


Metadata: NewsML must be capable of representing metadata necessary to support news as it passes through its lifecycle (see note).


Metadata Standards: NewsML must be capable of representing metadata from both standard and non-standard schemes as well as alternative representations of the same metadata (e.g. organised according to different schemes).


External Metadata: NewsML must support the attachment of metadata that is relevant only to a specific application or production process. This metadata need not be XML formatted.


Electronic News: NewsML does not natively support features required specifically for news production in a particular medium (e.g. print) but does support reference to this information as an external scheme (cf. 152).


Presentation Semantics: NewsML does not natively convey presentation semantics but supports the use of standard mechanisms that do support such semantics, such as style sheets. It covers logical components but does not specify the physical or temporal relationship between components when presented in a publishing medium.


be usable throughout the news life cycle.


Syndication: NewsML must be capable of representing the semantics required of news which passes through an arbitrarily complex chain consisting of original publisher and any number of other integrators, aggregators and distributors. Value may be added at any point in this distribution chain.


Construction Tools: NewsML must be suitable for use with off-the-shelf editors and other applications for manipulation of news without undue configuration and with minimal customisation of those applications.


Display Tools: NewsML must be deliverable to commonly available consumer applications, especially browsers, without undue configuration and with minimal customisation of those applications.


Transformation: It must be possible to transform NewsML into and from a range of formats, especially IIM and NITF. Since NewsML provides for considerably greater richness of structure than other formats, such transformations may involve a loss of information. NewsML must also be transformable into delivery formats such as HTML and WML.


Workflow: NewsML does not natively support workflow semantics. It does support the use of standard mechanisms that do support such semantics to allow routing of news-items through the editorial and production processes. NewsML allows for the capture and retention of some aspects of workflow process information as "production metadata".


allow news-items to consist of arbitrary mixtures of media types, languages and encodings


No Text Bias: NewsML is not to assume that text is the primary vehicle for news. All media types are to be treated equally.


Alternatives: NewsML must support alternative renderings of the same part as equivalents, e.g. text in different languages, text in different encodings, graphics in different encodings. Encoding refers to the application format of the data (e.g. HTML or MSWord) as well as to the type of compression, encryption or technique used to represent the value of the data (base64, binary or whatever). It must be possible to determine the original representation of a part.


Source Language: Where more than one language alternative is available, it must be possible to identify the original alternative from which a translation is derived.
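
The Alternatives and Source Language requirements can be illustrated with a minimal sketch. It assumes a hypothetical in-memory model in which each alternative records the alternative it was derived from; the names Alternative, derived_from and original_of are illustrative only and are not part of NewsML.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Alternative:
    """One rendering of a part (hypothetical model, not NewsML syntax)."""
    media_type: str                     # alternatives need not share a media type
    language: Optional[str] = None      # None for language-neutral media
    derived_from: Optional[int] = None  # index of the source alternative, if any


def original_of(alternatives: List[Alternative], index: int) -> int:
    """Follow derived_from links back to the alternative that has no source."""
    seen = set()
    while alternatives[index].derived_from is not None:
        if index in seen:
            raise ValueError("derivation chain contains a cycle")
        seen.add(index)
        index = alternatives[index].derived_from
    return index


alternatives = [
    Alternative("text/plain", "fr"),                   # the original
    Alternative("text/plain", "en", derived_from=0),   # translation of the original
    Alternative("audio/mpeg", "en", derived_from=1),   # spoken version of the translation
]

print(original_of(alternatives, 2))
```

Recording derivation as a link between alternatives, rather than as a single "original" flag, is one possible design; it also lets a receiver see that the audio alternative was produced from the English text.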


Heterogeneous Parts: When considering relationships between the component parts of news-items, no assumptions are made as to the media types of those parts, nor is it to be assumed that alternative renderings of the same part are of the same media type.


International: All languages that can be represented using Unicode are supported equally.


be usable either as a replacement for or allow the transport of all existing news formats and encodings


Replacement: NewsML must be able to act as a replacement for existing formats (e.g. IIM) where these formats fall within the requirements of NewsML.


Transport: Where existing formats (including IIM) convey semantics beyond those of NewsML such as layout, NewsML must be able to transport objects in such formats as alternative representations.


Transmission Capability: NewsML is required to be suitable for streaming and broadcast (i.e. essentially one-way) environments as well as request/response environments of varying forms.


support a number of different constructions of the same data


Heterogeneous Representation: The constraints of time and space as well as the demands of specific applications and delivery environments make it necessary to support different physical representations of the same news-item. Some applications may demand the delivery of a news-item and its parts as a unitary entity (e.g. in a streaming or broadcast environment), whereas others require that only the minimum information be transferred and that additional information in the news-item be retrievable on demand (e.g. in a mobile WAP environment).


Inclusion/Exclusion: To provide this flexibility NewsML must support the following concepts:


Explicit Inclusion

Information whose values are encoded in a particular representation of a news-item.


Inclusion by Reference

Information about how to obtain the values of the information in question.



Exclusion

Systematic or temporary omission of a part or class of data in a news-item as a feature of an application (e.g. exclusion of certain types of metadata when publishing, or exclusion of categorization and text information when retrieving lists of headlines).
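
The three concepts above can be sketched as follows. This is a minimal illustration that assumes a hypothetical Part record; the field names role, value and href and the example URL are assumptions and do not come from NewsML.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Part:
    """One named part of a news-item in a given representation (hypothetical)."""
    role: str                      # named relationship to the container
    value: Optional[bytes] = None  # present when the part is explicitly included
    href: Optional[str] = None     # present when the part is included by reference

    def inclusion(self) -> str:
        """Classify how this part appears in the representation."""
        if self.value is not None:
            return "explicit"
        if self.href is not None:
            return "by-reference"
        return "excluded"


headline = Part(role="headline", value=b"Markets rally")
photo = Part(role="photo", href="http://example.com/photo.jpg")  # placeholder URL
body = Part(role="body")  # omitted here, e.g. when listing headlines only

for part in (headline, photo, body):
    print(part.role, part.inclusion())
```

In this sketch an excluded part still appears as a placeholder with a role. A representation could equally omit it altogether, since the absence of a part does not imply that the part does not exist.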


Mandatory Features: NewsML will specify a minimal number of mandatory features. The absence of a non-mandatory feature does not imply that the feature does not exist.


Identification: Because the component parts of a news-item may be included and excluded in arbitrary combinations, NewsML must provide an identification mechanism through which it can be determined whether or not two news-items are the same, without referring to their content. The identification is to be unambiguous and unique. The assignment of identity is an editorial decision.


Nomenclature: To assist with the discussion of varying forms of representing news the following terms are used with specific meanings:



Representation

Two entities with corresponding parts included and excluded have the same representation. Representation refers to the physical form something has.



Revision

Two entities that have the same identifier and the same revision label are at the same revision. Revision refers to how up to date something is and does not refer to the physical form of presentation.
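
The distinction between identity, revision and representation can be sketched as three independent comparisons. The NewsItem record below is hypothetical: the identifier format, the use of an integer as the revision label and the set of included part names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class NewsItem:
    identifier: str                              # assigned editorially (format assumed)
    revision: int                                # revision label (integer form assumed)
    included_parts: FrozenSet[str] = frozenset() # names of the parts present


def same_item(a: NewsItem, b: NewsItem) -> bool:
    """Identity is decided by identifier alone, never by content."""
    return a.identifier == b.identifier


def same_revision(a: NewsItem, b: NewsItem) -> bool:
    """Same identifier and same revision label."""
    return same_item(a, b) and a.revision == b.revision


def same_representation(a: NewsItem, b: NewsItem) -> bool:
    """Corresponding parts included and excluded."""
    return a.included_parts == b.included_parts


full = NewsItem("example:item-1", 2, frozenset({"headline", "body", "photo"}))
brief = NewsItem("example:item-1", 2, frozenset({"headline"}))
older = NewsItem("example:item-1", 1, frozenset({"headline"}))

print(same_revision(full, brief), same_representation(full, brief))
print(same_revision(brief, older), same_representation(brief, older))
```

Here full and brief are the same revision in different representations, while brief and older share a representation but are at different revisions.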


support the management and development of news-items over time


Takes: NewsML must support the creation and incremental delivery of partial news-items as well as the consolidated delivery of partial news-items.


News-item Development: It must provide for the changing construction of news-items over time, which may result from the different timeliness of availability of different media types (e.g. text may precede audio may precede video).


Maintenance: NewsML must support news maintenance tasks such as deletions, corrections, overwrites, embargoes and so on.


Revisions: NewsML must provide a deterministic mechanism to allow applications to distinguish earlier revisions of news-items from later revisions.


be simply extensible and flexible.


Evolutionary Approach: The requirements of NewsML will develop over time. It must be possible to develop NewsML to meet those requirements without major discontinuity in the applications that support it.


Include Non News Information: Provide for inclusion of advertisements, sponsorship notices and application-specific information. For example, it must be possible to include processable information from non-news domains (such as stock quotes) for applications that need it. The presence of this data must not present an obstacle to rendering or other processing of NewsML by applications that were not designed with such features in mind. Default behaviour should be established to allow sensible rendering in the absence of an application-specific processor.


Engineering Principles: To facilitate coherent and consistent development of NewsML, it must have stated design principles (which must include naming conventions, see 1020).


Documentation: As far as is practical the rationale behind design decisions made during the course of its development must be available in the future. Changes should be identified through a suitable audit trail.


allow for signature and authentication of metadata and news-item content.


Signed Content: NewsML must support the digital signature and authentication of content. Different parts of the same news-item may have different signatures.


Signed Metadata: NewsML must support the digital signature and authentication of metadata. Since metadata is applied at various points in the development of a news-item multiple signatures must be supported.


not be unduly verbose.


Transmission Capacity: NewsML is to be used in a wide variety of contexts with varying transmission capabilities. NewsML needs to provide the ability to express semantics as concisely as possible while retaining the required flexibility. This is particularly important in respect of the development of news-items (above). It should be possible, at a later time, to deterministically recover such information in a comprehensive manner at a news-item level.


Defaults: Provide for the setting of defaults to minimise duplication and repetition of data.
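
One way to picture the Defaults requirement is a lookup that falls back from a part to its containing news-item. The sketch below assumes metadata held as plain dictionaries; the keys language, provider and role are hypothetical and are not defined by this document.

```python
from collections import ChainMap

# Defaults stated once at the news-item level (hypothetical keys and values).
item_defaults = {"language": "en", "provider": "example-provider"}

# Each part states only what differs from the defaults.
part_metadata = [
    {"role": "headline"},
    {"role": "body", "language": "fr"},
]

# ChainMap consults the part first and the item-level defaults second.
resolved = [dict(ChainMap(part, item_defaults)) for part in part_metadata]

for metadata in resolved:
    print(metadata["role"], metadata["language"], metadata["provider"])
```

The parts carry less repeated data in transmission, and a receiver can still recover the full metadata for each part deterministically.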


Special Case for Text: Since most existing news-items have extremely simple structure (they have a single textual part), the overhead for representing extremely small news-items should be reduced as far as possible.


Binary Data: NewsML must support the inclusion of binary parts in their native form (i.e. in addition to supporting Base64-encoded representations of this data, native binary representations should also be possible).
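
The size cost that motivates native binary support can be shown with a short sketch using Python's standard base64 module. The payload is stand-in data, not a real media object.

```python
import base64

raw = bytes(range(256)) * 3      # 768 bytes of stand-in binary data
encoded = base64.b64encode(raw)  # text-safe representation of the same data

print(len(raw), len(encoded))    # prints "768 1024"

# Base64 maps every 3 input bytes to 4 output characters, so the encoded
# form is one third larger than the native form. Decoding is lossless.
assert base64.b64decode(encoded) == raw
```

Base64 is convenient where content must travel inside a text document, while the native form avoids the size overhead and the decoding step.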


Ease of Processing: So far as is possible the encoding should be designed with ease and efficiency of processing by receivers in mind.


use XML and other appropriate standards and recommendations.


Web Conformance: Be consistent with widely adopted techniques and published international standards and recommendations, both to gain maximum leverage from the intellectual effort that has gone into their creation and to benefit from the utilities, components and tools that are widely available for processing them.


Evolution: Incorporate new standards as they become available.


Use of XML: A consistent naming convention is to be adopted for the naming of elements and attributes. The naming conventions must not be unduly verbose, while maintaining readability.


Universality: Use Unicode where practical.


provide a lightweight facility for the representation of text


Content of news-lines: These "specific features of news" referred to in 120 are inherently textual and need to be represented in an expeditious manner.


Text Structure: Provide a means of identifying the structure and sub-structure within textual parts (paragraphs, tables etc.).


Text Markup: The determination of the scope of this facility will be through the application of the following tests:



Is there a parallel structure in NITF that can be used?



Can this mechanism be applied equally to all media?



Do the structures fall in with the overall NewsML design requirements especially of minimising the number of tags?




Collections


News-items are very frequently found together in collections. NewsML supports collections of news objects that have been assembled with journalistic intent (such as a composite news-item consisting of the top ten news-items of the hour) and collections of news objects that are constructed in a more arbitrary way (such as responses to queries).


Collections are represented as news-items.


The intention behind the construction of collections may be determined from the metadata.

Equivalence of News-items


Because of the equivalence of alternatives and the ability to exclude data from any instance of a news-item, the equivalence or otherwise of news-items cannot be established from their physical form. Hence also there is no standard representation [or canonical form] for a news-item.


NewsML provides a mechanism for providers to mark news-items so that they may be distinguished in a unique and unambiguous way without reference to their representation.



Permissioning


Permissioning is not considered to be part of news management. It will be applied through external applications.

Life Cycle


Reference is made to the notion of a news life cycle. This notion is intended to capture the characteristic processes through which news-items may pass between being created and being consumed. This life cycle includes but is not restricted to assignment, authoring, editing, storage, archiving, distribution, searching and so on.


NewsML does not provide the ability to specify or support workflow. It does provide the ability to capture the creation and distribution history of a news-item.