NewsML Elements and Attributes

XN-5

Jo Rabin

Revision 7

12th November 1999

draft
Accompanies 1999-11-10 DTD

Describes the functions of NewsML

Copyright © Reuters Limited 1999 All Rights Reserved


Introduction

This document describes the elements and attributes of the NewsML DTD.

The requirements of NewsML are described in XN-2 and the intended functionality in XN-3. XN-8 describes various encoding choices that underpin the formulation of the DTD.

Revision History

Revision 7 - 12th November 1999

Changes to accompany 1999-11-10 DTD (remove ambiguous content model).

Revision 6 - 22nd October 1999

Removed material now in XN-3 and XN-8. Corrections and additions to synchronise with 1999-10-12 DTD.

Contents

Introduction
Revision History
Contents
Structural Elements
  newsitem
  newsitempart
  Newslines
  sourcedata
  newsobject
  data
  text
  p
  link
  records
  record
  field
Metadata Elements
  codes
  code
  things
  thing
  altthings
  editdetail
  thinglocation
  name
  dc
News Management Elements
  handling
  slug
  product
  service
  routing
  instructions
  priority
  urgency
  status
  permissions
  cycle
  outcue
  action
Attribute values
  Roles
  Variants
  xml:lang
  things.class
  codes.class
Examples
  Example - Simple Story Encoding
  Example - Multiple Part Story Encoding
  Example - Categorized Story
  Example - Categorization with Corrections
  Example - A Kill
  Example - A picture with separations
References
  Standards
  NewsML References


Structural Elements

newsitem

<!ELEMENT newsitem       (title+,
                          %newslines;,
                          ((newsitempart+ | newsobject | text),
                           %newslines;)?,
                          metadata?,
                          handling*,
                          sourcedata?)>

A newsitem consists of one or more title elements, followed by any number of newslines in any order, optionally followed by some content. The content may itself be followed by any number of further newslines (such as tagline, copyright or citation) in any order. These are followed by optional metadata, any number of handling elements and finally optional sourcedata.

The ordering of these items carries no meaning; it is imposed by the limitations of DTDs. This is why %newslines; appears twice: it allows publishers, for example, to place headlines before the content and copyright notices after it.
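Because the order is fixed by the DTD rather than by meaning, it can help to read the content model as a simple pattern over the sequence of child elements. The Python sketch below (standard library only) maps each child to one letter and checks the newsitem content model with a regular expression. The set of newsline names is an assumption, since the %newslines; entity is not spelled out in this section, and the function names are hypothetical; it illustrates the model and does not replace a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical newsline names: this document mentions headline, tagline,
# copyright and citation; the real %newslines; entity may define others.
NEWSLINES = {"headline", "tagline", "copyright", "citation"}

# One letter per kind of child element, so the content model can be
# written as a regular expression over the sequence of children.
SYMBOLS = {
    "title": "T",
    "newsitempart": "P",
    "newsobject": "O",
    "text": "X",
    "metadata": "M",
    "handling": "H",
    "sourcedata": "S",
}

# (title+, %newslines;, ((newsitempart+ | newsobject | text), %newslines;)?,
#  metadata?, handling*, sourcedata?)
NEWSITEM_MODEL = re.compile(r"T+N*(?:(?:P+|O|X)N*)?M?H*S?")


def child_sequence(elem: ET.Element) -> str:
    """Encode the children of elem as one letter each."""
    letters = []
    for child in elem:
        if child.tag in NEWSLINES:
            letters.append("N")
        else:
            # Unknown elements map to "?", which the model never accepts.
            letters.append(SYMBOLS.get(child.tag, "?"))
    return "".join(letters)


def matches_newsitem_model(elem: ET.Element) -> bool:
    return NEWSITEM_MODEL.fullmatch(child_sequence(elem)) is not None


item = ET.fromstring(
    "<newsitem><title>Rates</title><headline>Rates rise</headline>"
    "<text>Body</text><copyright>(c) 1999</copyright><metadata/></newsitem>"
)
print(child_sequence(item))          # TNXNM
print(matches_newsitem_model(item))  # True
```

Note that the trailing newslines are accepted only after content, exactly as in the DTD: a headline placed after metadata is rejected.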

The content can be:

·        One or more newsitemparts - for composite newsitems.

·        One newsobject - this allows for newsitems that are e.g. a picture alone.

·        One in-line text element - this allows for trivial textual story encodings.

Duplicated newslines are expected to contain different versions of the element for different languages. If two newslines of the same kind specify the same language, the content of the later one takes precedence.
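To make the precedence rule concrete, here is a minimal Python sketch (standard library only) that collects the newslines of one kind by language, letting a later element overwrite an earlier one with the same xml:lang. Falling back to the container's own xml:lang for newslines that do not set one is an assumption based on xml:lang setting the default for the newsitem; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang in Clark notation under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def newslines_by_language(container, tag):
    """Map language -> text for the children of container named tag.

    A later child with the same language overwrites an earlier one.
    Children without xml:lang inherit the container's xml:lang (assumption).
    """
    default_lang = container.get(XML_LANG)
    result = {}
    for child in container.findall(tag):
        result[child.get(XML_LANG, default_lang)] = child.text or ""
    return result


item = ET.fromstring(
    '<newsitem xml:lang="en">'
    '<headline>Rates rise</headline>'
    '<headline xml:lang="fr">Hausse des taux</headline>'
    '<headline xml:lang="en">Rates rise sharply</headline>'
    '</newsitem>'
)
print(newslines_by_language(item, "headline"))
# {'en': 'Rates rise sharply', 'fr': 'Hausse des taux'}
```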

Attributes:

| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| itemid | Required | Any | Uniquely identifies this newsitem in the publisher's domain. |
| date | Required | ISO date | A date associated with the story. The kind of date is not defined: it can be the story creation date, the publication date, etc. |
| id | Optional | ID | Identifies this element. |
| revision | Default 0 | Integer | The higher the number, the later the revision. |
| publisher | Optional | URL | A means of disambiguating the itemid attribute and hence making it unique. Other data about the publisher, if needed, should be encoded as metadata. |
| xml:lang | Optional | RFC 1766 | Sets the default language for the newsitem, and indicates that the story is intended especially for people who wish to read this language. |
| href | Optional | URL | Where to get the story, and hence where to get elements that have not been included. Always provides the latest revision of the story. |
| parts | Default 1 | Integer | The total number of parts in this newsitem. The number of parts actually present may differ from this figure. |
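Because itemid is unique only within a publisher's domain and a higher revision number means a later revision, a receiver that sees several copies of a story can keep just the newest. The sketch below assumes items arrive as parsed ElementTree elements and uses (publisher, itemid) as the key; the function name and the tie-breaking rule are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def latest_revisions(items):
    """Keep the latest revision of each newsitem, keyed on (publisher, itemid)."""
    latest = {}
    for item in items:
        key = (item.get("publisher"), item.get("itemid"))
        revision = int(item.get("revision", "0"))  # revision defaults to 0
        kept = latest.get(key)
        # On a tie the copy seen first is kept (an arbitrary choice).
        if kept is None or revision > int(kept.get("revision", "0")):
            latest[key] = item
    return latest


feed = [
    ET.fromstring('<newsitem itemid="n1" publisher="http://a.example/" date="1999-11-12"/>'),
    ET.fromstring('<newsitem itemid="n1" publisher="http://a.example/" date="1999-11-12" revision="2"/>'),
    ET.fromstring('<newsitem itemid="n1" publisher="http://b.example/" date="1999-11-12" revision="1"/>'),
]
latest = latest_revisions(feed)
print(len(latest))                                          # 2
print(latest[("http://a.example/", "n1")].get("revision"))  # 2
```

The same itemid from two publishers yields two distinct stories, which is exactly the disambiguation the publisher attribute provides.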

newsitempart

<!ELEMENT newsitempart   (%newslines;,
                          ((newsitem | newsobject+ | newsitempart+),
                           %newslines;)?,
                          metadata?,
                          sourcedata?)>

Note: as for newsitem, the same structure of newslines is used, and the same semantics apply to repeated elements.

A newsitempart consists of any number of newslines in any order, optionally followed by some content, which may itself be followed by any number of further newslines (such as tagline, copyright or citation) in any order. These are followed by optional metadata and finally optional sourcedata.

The content can be:

·        One newsitem - this allows the construction of lists of stories.

·        One or more newsobjects - this provides the mechanism by which a number of alternatives fulfilling the same role in the newsitem may be listed.

·        One or more newsitemparts.

| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |
| role | Required | Named role | Some systematic means of identifying the role that a newsitempart plays in a story. There may be a number of schemes for this, in which case some kind of namespace mechanism will be needed to distinguish them. Some thoughts about roles are given at the end of this document. |
| order | Optional | Integer | Gives parts a precedence order. The highest precedence is 0, and precedence decreases as the value increases. Parts that do not specify this attribute have lower precedence than any part that does; among themselves, their precedence follows their order of presentation in the XML encoding. |
| alternatives | Optional | true or false | Affects the interpretation of parts embedded in this part. If true, the embedded parts are considered alternatives to each other, irrespective of the values of their role attributes; if false, they are complements to each other. (Newsobjects embedded in a part are always alternatives to each other.) |
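The order and alternatives rules can be sketched in Python as follows. The function names are hypothetical, and treating a missing alternatives attribute as false is an assumption, since the table gives no default.

```python
import xml.etree.ElementTree as ET


def by_precedence(parts):
    """Order newsitemparts by the order attribute.

    0 is the highest precedence and larger values are lower. Parts with no
    order attribute follow all ordered parts. sorted() is stable, so parts
    with equal keys keep their document order.
    """
    def key(part):
        order = part.get("order")
        return (0, int(order)) if order is not None else (1, 0)
    return sorted(parts, key=key)


def interpret_part(part):
    """Classify the content of a newsitempart."""
    item = part.find("newsitem")
    if item is not None:
        return "newsitem", [item]
    objects = part.findall("newsobject")
    if objects:
        return "alternatives", objects  # newsobjects are always alternatives
    subparts = by_precedence(part.findall("newsitempart"))
    # Assumption: a missing alternatives attribute is treated as false.
    if part.get("alternatives") == "true":
        return "alternatives", subparts
    return "complements", subparts


part = ET.fromstring(
    '<newsitempart role="main">'
    '<newsitempart id="c" role="caption"/>'
    '<newsitempart id="b" role="photo" order="2"/>'
    '<newsitempart id="a" role="body" order="0"/>'
    '</newsitempart>'
)
kind, children = interpret_part(part)
print(kind, [p.get("id") for p in children])  # complements ['a', 'b', 'c']
```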

 

Newslines

All Newslines have PCDATA content and id and xml:lang attributes.

sourcedata

<!ELEMENT sourcedata       (#PCDATA)>

 

This element allows the transport of any XML-compatible data or element structures. It is provided for applications that wish to take advantage of the capabilities of NewsML but require application semantics beyond those developed for news. Sourcedata is not intended to extend NewsML semantics in an ad hoc way; it is for the expression of other semantics (quotes, for example).

With the arrival of namespaces it will not be necessary to have this element and it will be removed in a later version of the spec. You have been warned!

| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| encoding | Optional | MIME type | What encoding has been used. |
| compression | Optional | MIME type | What compression has been applied. |

 

newsobject

<!ELEMENT newsobject     (%newslines;,
                          ((data | text),
                           %newslines;)?,
                          metadata?,
                          sourcedata?)>

Note: once again, the same structure of newslines is used.

A newsobject consists of any number of newslines in any order, optionally followed by some content, which may itself be followed by any number of further newslines (such as tagline, copyright or citation) in any order. These are followed by optional metadata and finally optional sourcedata.

The content can be one of:

·        One data element - this allows the in-line inclusion of non-textual NewsML encoded material.

·        One text element - this provides the means for including NewsML content encoding in-line.

Note that this is a prime area for turning into RDF.

| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |
| mimetype | Required | MIME type | What sort of object this is. If it is NewsML content encoding, this is "text/x-newstext". |
| mediatype | Optional | Enumerated | Identifies what type of object this is; e.g. a GIF may be an animation or a still image. |
| variant | Optional | String | The reason the object is present as an alternative, especially if this is not apparent from the other attributes. |
| xml:lang | Optional | RFC 1766 | Language, and variant if relevant. |
| href | Optional | URL | Where to get the content from if it is not included in-line as data or text. |
| height | Optional | Integer | Vertical space occupied by the object, if relevant. |
| width | Optional | Integer | Horizontal space needed by the object, if relevant. |
| size | Optional | Integer | The size of the object in bytes, if it is referenced by URL. |
| duration | Optional | Integer | The time it takes to experience the object, if relevant. |
| colordepth | Optional | Integer | How many colors. |
| characterset | Optional | String | Which character encoding is used (not which alphabet). |
| bandwidthtostream | Optional | Integer | Minimum sustained throughput, in bits per second, required to stream this object. |
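Since newsobjects in a part are alternatives, a receiver has to pick one, and the attributes above give it something to go on. The following sketch shows one hypothetical selection policy based on bandwidthtostream and xml:lang; the policy, the function name and the example values are illustrative and not part of NewsML.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def choose_alternative(objects, available_bps, lang=None):
    """Hypothetical selection policy for alternative newsobjects.

    Objects whose bandwidthtostream exceeds the available throughput are
    skipped (a missing attribute is treated as no requirement). Among the
    rest, the first in the requested language wins; otherwise the first
    usable object in document order.
    """
    usable = [
        obj for obj in objects
        if int(obj.get("bandwidthtostream", "0")) <= available_bps
    ]
    for obj in usable:
        if lang is not None and obj.get(XML_LANG) == lang:
            return obj
    return usable[0] if usable else None


part = ET.fromstring(
    '<newsitempart role="video">'
    '<newsobject id="hi" mimetype="video/mpeg" bandwidthtostream="1500000"/>'
    '<newsobject id="lo" mimetype="video/mpeg" bandwidthtostream="56000"/>'
    '</newsitempart>'
)
chosen = choose_alternative(part.findall("newsobject"), available_bps=128000)
print(chosen.get("id"))  # lo
```

A real receiver might also weigh variant, mediatype or size; which attributes matter is left to the application.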

 

data

<!ELEMENT data          (#PCDATA)>

 

The data element contains in-line content that has been encoded to meet the requirements of XML in respect of valid characters for PCDATA. The format of the data packaged in this element is described in the containing newsobject element.

The attributes of data describe what compression has been applied, followed by what encoding scheme was applied to the compressed result.

| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |
| encoding | Required | MIME type | The encoding scheme applied; text/plain denotes none. |
| compression | Optional | MIME type | What compression has been used. |
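Decoding a data element reverses the packaging: the encoding is undone first, then the compression. A minimal sketch, assuming base64 encoding and gzip compression; the MIME type strings used as lookup keys (other than text/plain) are hypothetical, since this document does not list the permitted values.

```python
import base64
import gzip
from typing import Optional

# Hypothetical MIME type values: this document only says that text/plain
# denotes no encoding; the other labels here are illustrative.
DECODERS = {
    "text/plain": lambda payload: payload.encode("utf-8"),
    "application/x-base64": base64.b64decode,
}
DECOMPRESSORS = {
    "application/x-gzip": gzip.decompress,
}


def decode_data(payload: str, encoding: str, compression: Optional[str] = None) -> bytes:
    """Recover the original bytes from the content of a data element.

    Packaging applied compression first and encoding second, so decoding
    undoes them in the reverse order.
    """
    raw = DECODERS[encoding](payload)
    if compression is not None:
        raw = DECOMPRESSORS[compression](raw)
    return raw


original = b"GIF89a example bytes"
packed = base64.b64encode(gzip.compress(original)).decode("ascii")
print(decode_data(packed, "application/x-base64", "application/x-gzip") == original)  # True
```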

 

text

<!ELEMENT text          (#PCDATA|p|link|records)* >

 

The text element allows the in-line encoding of textual content. The text can contain an arbitrary mixture of characters and p, link and records elements.

| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |

 

p

<!ELEMENT p          (#PCDATA|link)*>

 

The p element encapsulates text as a paragraph. Link elements can span text in paragraphs.

| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |

 

link

<!ELEMENT link          (#PCDATA)>

 

The link element marks the text it contains as a hyperlink. More on this will follow with the development of XLink.

| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |
| href | Required | URL | Where the link leads to. |
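As a sketch of how a receiver might consume text, p and link together, the Python function below (hypothetical name, standard library only) gathers the paragraph text and the hyperlinks inside it. It deliberately ignores character data placed directly in text and any records element.

```python
import xml.etree.ElementTree as ET


def paragraphs_and_links(text_elem):
    """Return the paragraphs and (anchor text, href) pairs of a text element.

    Link text is kept inside the paragraph it spans. Character data placed
    directly in text (outside any p) and records are ignored in this sketch.
    """
    paragraphs = ["".join(p.itertext()).strip() for p in text_elem.iter("p")]
    links = [(link.text or "", link.get("href")) for link in text_elem.iter("link")]
    return paragraphs, links


text = ET.fromstring(
    '<text><p>Shares in <link href="http://www.example.com/acme">Acme</link> rose.</p>'
    '<p>Trading was light.</p></text>'
)
paragraphs, links = paragraphs_and_links(text)
print(paragraphs)  # ['Shares in Acme rose.', 'Trading was light.']
print(links)       # [('Acme', 'http://www.example.com/acme')]
```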

 

records

 

Records identifies a data structure whose contents can be processed by programs but can also be laid out for display without being interpreted. It is present to meet, minimally, the need for textual data to be organized in some way, without defining layout elements such as table. Records is more general than a table because its rows need not have the same columns. The application attribute allows a receiving program to determine what kind of program to use to interpret the data, although several may be applicable: for example, if the records contain closing prices, a graphing application could be as appropriate as a straightforward tabular layout. The intention is that systems that do not understand the data can still make some attempt at rendering it as a table.

| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |
| application | Optional | String | Some means of identifying what kind of data this is, so it can be rendered appropriately by relevant applications. This may be a stylesheet reference … |

record

<!ELEMENT record       (field+)>

 

A container that groups field elements together.

| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |

 

field

<!ELEMENT field         (#PCDATA)>

 

A container for data.

| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |
| name | Optional | String | A way of distinguishing fields from each other and identifying their data content. |
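The fallback rendering described under records can be sketched as follows: because records need not share the same columns, the column set is taken as the union of field names, in first-seen order. The function name, the positional names given to unnamed fields and the application value in the example are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def records_to_table(records_elem):
    """Lay out a records element as rows of cells.

    Columns are the union of field names in first-seen order, so records
    need not share the same fields; missing fields become empty cells.
    Unnamed fields get a hypothetical positional name (field1, field2, ...).
    """
    columns, rows = [], []
    for record in records_elem.findall("record"):
        row = {}
        for position, field in enumerate(record.findall("field"), start=1):
            name = field.get("name") or f"field{position}"
            if name not in columns:
                columns.append(name)
            row[name] = field.text or ""
        rows.append(row)
    return [columns] + [[row.get(col, "") for col in columns] for row in rows]


records = ET.fromstring(
    '<records application="closing-prices">'
    '<record><field name="symbol">ABC</field><field name="close">101.5</field></record>'
    '<record><field name="symbol">XYZ</field><field name="volume">2000</field></record>'
    '</records>'
)
print(records_to_table(records))
# [['symbol', 'close', 'volume'], ['ABC', '101.5', ''], ['XYZ', '', '2000']]
```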

 

 


Metadata Elements

codes

<!ELEMENT codes         (code*)>

 

Contains code elements, each indicating that the code it contains applies to the content of the entity to which this metadata is attached.

It is intended that no two codes elements have the same combination of class and role values.
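A receiver could check this intent with a sketch like the one below. It assumes that the codes elements sit directly inside one metadata element and that class and role are their attribute names (class appears under codes.class in the contents; role is only implied in this section). The function name and the attribute values in the example are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_codes_keys(metadata_elem):
    """Return (class, role) pairs shared by more than one codes element."""
    counts = Counter(
        (codes.get("class"), codes.get("role"))
        for codes in metadata_elem.findall("codes")
    )
    return [key for key, count in counts.items() if count > 1]


metadata = ET.fromstring(
    '<metadata>'
    '<codes class="topic" role="main"><code/></codes>'
    '<codes class="topic" role="main"><code/></codes>'
    '<codes class="country"><code/></codes>'
    '</metadata>'
)
print(duplicate_codes_keys(metadata))  # [('topic', 'main')]
```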

Attribute Name

Presence

Format

Comment

id

optional

ID

Identifies the element.