NewsML Elements and Attributes
XN-5
Jo Rabin
Revision 7
12th November 1999
draft
Accompanies 1999-11-10 DTD
Describes the functions of NewsML
Copyright © Reuters Limited 1999 All Rights Reserved
This document describes the elements and attributes of the NewsML DTD.
The requirements of NewsML are described in XN-2 and the intended functionality in XN-3. XN-8 describes various encoding choices that underpin the formulation of the DTD.
Revision 7 - 12th November 1999
Changes to accompany 1999-11-10 DTD (remove ambiguous content model).
Revision 6 - 22nd October 1999
Removed material now in XN-3 and XN-8. Corrections and additions to synchronise with 1999-10-12 DTD.
Introduction
Revision History
Contents
Structural Elements
  newsitem
  newsitempart
  Newslines
  sourcedata
  newsobject
  data
  text
  p
  link
  records
  record
  field
Metadata Elements
  codes
  code
  things
  thing
  altthings
  editdetail
  thinglocation
  name
  dc
News Management Elements
  handling
  slug
  product
  service
  routing
  instructions
  priority
  urgency
  status
  permissions
  cycle
  outcue
  action
Attribute values
  Roles
  Variants
  xml:lang
  things.class
  codes.class
Examples
  Example: Simple Story Encoding
  Example - Multiple Part Story Encoding
  Example - Categorized Story
  Example - Categorization with Corrections
  Example - A Kill
  Example - A picture with separations
References
  Standards
  NewsML References
<!ELEMENT newsitem (title+,
%newslines;,
((newsitempart+ | newsobject | text),
%newslines;)?,
metadata?,
handling*,
sourcedata?)>
A newsitem consists of one or more titles, followed by any number of newslines in any order, optionally followed by some content and then any number of further newslines (such as tagline, copyright or citation) in any order, followed by optional metadata, any number of handling elements and finally optional sourcedata.
The ordering of these items is accidental and results from limitations of DTDs. Hence %newslines; appears twice, which allows publishers to have, for example, headlines preceding the content and copyright notices trailing it.
The content can be:
· One or more newsitemparts - for composite newsitems.
· One newsobject - this allows for newsitems that are e.g. a picture alone.
· One in-line text element - this allows for trivial textual story encodings.
Duplicated newslines are expected to contain different versions of the element for different languages. If two elements of this kind specify the same language, the content of the later element takes precedence.
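The precedence rule for duplicated newslines can be sketched as follows. This is only an illustration, assuming Python's standard xml.etree.ElementTree parser; the helper name resolve_newslines, the sample itemid and the sample text are hypothetical, and the sample has not been validated against the DTD.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def resolve_newslines(parent, tag):
    """Return {language: text} for the repeated `tag` children of `parent`.

    A later element overrides an earlier one that declares the same language.
    Elements without xml:lang are grouped under "" (an assumption of this sketch).
    """
    by_lang = {}
    for element in parent.findall(tag):
        lang = element.get(XML_LANG, "")
        by_lang[lang] = (element.text or "").strip()
    return by_lang

# Hypothetical sample: two English copyright lines and one French one.
SAMPLE = """
<newsitem itemid="example-1" date="1999-11-12">
  <title>Example story</title>
  <copyright xml:lang="en">First English notice</copyright>
  <copyright xml:lang="fr">Avis en francais</copyright>
  <copyright xml:lang="en">Second English notice</copyright>
</newsitem>
"""

notices = resolve_newslines(ET.fromstring(SAMPLE), "copyright")
print(notices["en"])  # the later English element wins
```

The sketch ignores any default language inherited from the enclosing newsitem; a fuller implementation would probably need to take that into account.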
Attributes:
| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| itemid | Required | Any | Uniquely identifies this newsitem in the publisher's domain. |
| date | Required | ISO Date | A date associated with the story. The kind of date is not defined; it can be the story creation date, the publication date, etc. |
| id | Optional | ID | Identifies this element. |
| revision | Default 0 | Integer | The higher the number, the later the revision. |
| publisher | Optional | URL | A means of disambiguating the id attribute and hence making it unique. Other data about the publisher, if needed, should be encoded as metadata. |
| xml:lang | Optional | RFC 1766 | Sets the default language for the newsitem and indicates that the story is intended especially for people who wish to read this language. |
| href | Optional | URL | Indicates where to get the story and hence where to get elements that have not been included. Always provides the latest revision of the story. |
| parts | Default 1 | Integer | The total number of parts in this newsitem. The number of parts actually present may be different. |
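As an illustration of the revision attribute, the sketch below picks the latest of several copies of the same story. It assumes Python's standard xml.etree.ElementTree; the function name latest_revision, the itemid value and the titles are hypothetical, and a missing revision attribute is treated as the default 0.

```python
import xml.etree.ElementTree as ET

def latest_revision(newsitems):
    """Return the newsitem with the highest revision number.

    A newsitem without a revision attribute counts as revision 0,
    which is the default given in the attribute table.
    """
    return max(newsitems, key=lambda item: int(item.get("revision", "0")))

# Hypothetical copies of one story, received out of order.
DOCS = [
    '<newsitem itemid="story-42" date="1999-11-10"><title>Draft</title></newsitem>',
    '<newsitem itemid="story-42" date="1999-11-11" revision="2"><title>Update</title></newsitem>',
    '<newsitem itemid="story-42" date="1999-11-10" revision="1"><title>Fix</title></newsitem>',
]

items = [ET.fromstring(doc) for doc in DOCS]
newest = latest_revision(items)
print(newest.get("revision"), newest.findtext("title"))  # prints "2 Update"
```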
<!ELEMENT newsitempart (%newslines;,
((newsitem | newsobject+ | newsitempart+),
%newslines;)?,
metadata? ,
sourcedata?)>
Note: as above, the same structure of Newslines is used and the same semantics are imputed to repeated elements.
A newsitempart consists of any number of Newslines in any order, optionally followed by some content and then any number of further Newslines (such as tagline, copyright or citation) in any order, followed by optional metadata and finally optional sourcedata.
The content can be:
· One newsitem - this allows the construction of lists of stories.
· One or more newsobjects - this provides the mechanism by which a number of alternatives fulfilling the same role in the newsitem may be listed.
· One or more newsitemparts.
| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |
| role | Required | Named role | Some systematic means of identifying the role that a newsitempart can play in a story. There may be a number of schemes for this, in which case it will be necessary to have some kind of namespace mechanism to distinguish them. Some thoughts about roles are detailed at the end of this document. |
| order | Optional | Integer | A precedence order for parts may be given by specifying a number in this attribute. Parts that do not specify this attribute have no explicit precedence; their precedence order among themselves is inferred from their order of presentation in the XML encoding, and they have lower precedence than any part specifying this attribute. Among parts that specify it, the higher the value, the lower the precedence. The highest precedence is 0. |
| alternatives | Optional | true or false | This attribute affects the interpretation of parts embedded in parts. Objects embedded in parts are always alternatives to each other. If this attribute is true, embedded parts are considered alternatives to each other irrespective of the value of their role attribute. If it is false, they are complements to each other. |
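The precedence rule for the order attribute can be sketched as below. This is a minimal illustration using Python's standard xml.etree.ElementTree; the function name parts_by_precedence and the role values are hypothetical. How parts with equal order values rank against each other is not stated above, so the sketch assumes they keep their document order.

```python
import xml.etree.ElementTree as ET

def parts_by_precedence(parent):
    """Return the newsitempart children of `parent`, highest precedence first.

    Parts with an order attribute come first, lowest number first (0 is highest).
    Parts without it follow, in their order of appearance in the XML.
    """
    parts = parent.findall("newsitempart")
    ordered = [p for p in parts if p.get("order") is not None]
    unordered = [p for p in parts if p.get("order") is None]
    # list.sort is stable, so equal order values keep document order (assumption).
    ordered.sort(key=lambda p: int(p.get("order")))
    return ordered + unordered

# Hypothetical sample; the role names are illustrative only.
SAMPLE = """
<newsitem itemid="example-3" date="1999-11-12">
  <title>Example</title>
  <newsitempart role="sidebar"/>
  <newsitempart role="picture" order="1"/>
  <newsitempart role="main" order="0"/>
  <newsitempart role="caption"/>
</newsitem>
"""

roles = [p.get("role") for p in parts_by_precedence(ET.fromstring(SAMPLE))]
print(roles)  # ['main', 'picture', 'sidebar', 'caption']
```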
All Newslines have PCDATA content and id and xml:lang attributes.
<!ELEMENT sourcedata (#PCDATA)>
This element allows the transport of any XML compatible data or element structures. It is provided for applications that wish to take advantage of the capabilities of NewsML but require additional application semantics beyond those developed for news. Sourcedata is not intended to extend NewsML semantics in an ad hoc way; it is for the expression of other semantics (quotes or whatever).
With the arrival of namespaces it will not be necessary to have this element and it will be removed in a later version of the spec. You have been warned!
| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| encoding | Optional | Mimetype | What encoding has been used. |
| compression | Optional | Mimetype | What compression has been applied. |
<!ELEMENT newsobject (%newslines;,
((data|text),
%newslines;)?,
metadata? ,
sourcedata?)>
Note: once again, the same structure of Newslines is used.
A newsobject consists of any number of Newslines in any order, optionally followed by some content and then any number of further Newslines (such as tagline, copyright or citation) in any order, followed by optional metadata and finally optional sourcedata.
The content can be one of:
· One data element - this allows the in-line inclusion of non-textual NewsML encoded material.
· One text element - this provides the means for including NewsML content encoding in-line.
Note that this is a prime area for turning into RDF.
| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |
| mimetype | Required | MIME specification | What sort of object this is. If it is NewsML content encoding then this is "text/x-newstext". |
| mediatype | Optional | Enumerated | Identifies what type of object it is, e.g. a GIF may be an animation or an image. |
| variant | Optional | String | The reason the object is present as an alternative, especially if this is not apparent from the other attributes. |
| xml:lang | Optional | RFC 1766 | Language and variant, if relevant. |
| href | Optional | URL | Where to get the content from if it is not included in-line as data or text. |
| height | Optional | Integer | Vertical space occupied by the object, if relevant. |
| width | Optional | Integer | Horizontal space needed by the object, if relevant. |
| size | Optional | Integer | The size in bytes of the object if it is specified as a URL. |
| duration | Optional | Integer | The time it takes to experience the object, if relevant. |
| colordepth | Optional | Integer | How many colors. |
| characterset | Optional | String | Which character encoding is used (not which alphabet). |
| bandwidthtostream | Optional | Integer | Minimum number of bits per second of sustained throughput required to be able to stream this object. |
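One possible use of these attributes is choosing among alternative newsobjects. The sketch below filters alternatives by the bandwidthtostream attribute, assuming Python's standard xml.etree.ElementTree. The function name streamable, the role value and the example.com URLs are hypothetical, and treating a missing bandwidthtostream as "no streaming requirement" is an assumption of this sketch rather than something stated above.

```python
import xml.etree.ElementTree as ET

def streamable(newsobjects, available_bps):
    """Keep the newsobjects whose streaming requirement fits the available bandwidth.

    Objects without a bandwidthtostream attribute are kept (assumption:
    no stated requirement means no constraint).
    """
    kept = []
    for obj in newsobjects:
        needed = obj.get("bandwidthtostream")
        if needed is None or int(needed) <= available_bps:
            kept.append(obj)
    return kept

# Hypothetical part listing three alternative objects.
SAMPLE = """
<newsitempart role="video">
  <newsobject mimetype="video/mpeg" href="http://www.example.com/clip-high.mpg" bandwidthtostream="1500000"/>
  <newsobject mimetype="video/mpeg" href="http://www.example.com/clip-low.mpg" bandwidthtostream="300000"/>
  <newsobject mimetype="image/gif" href="http://www.example.com/still.gif"/>
</newsitempart>
"""

part = ET.fromstring(SAMPLE)
chosen = streamable(part.findall("newsobject"), available_bps=500_000)
print([obj.get("href") for obj in chosen])
```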
<!ELEMENT data (#PCDATA)>
The data element contains in-line content that has been encoded to meet the requirements of XML in respect of valid characters for PCDATA. The format of the data packaged in this element is described in the containing newsobject element.
The data attributes describe what compression has been applied, followed by what encoding scheme was applied to the compressed result.
| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |
| encoding | Required | Mimetype | text/plain denotes none. |
| compression | Optional | Mimetype | What compression has been used. |
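The order of operations described above (compress first, then encode the compressed result) can be sketched as follows. The choice of zlib for compression and base64 for encoding is an assumption made purely for illustration; this section does not say which schemes or mimetype values are to be used. The function names pack and unpack are hypothetical.

```python
import base64
import zlib

def pack(raw: bytes) -> str:
    """Sender side: compress the bytes, then encode the result as PCDATA-safe text."""
    compressed = zlib.compress(raw)                     # step 1: compression
    return base64.b64encode(compressed).decode("ascii") # step 2: encoding

def unpack(pcdata: str) -> bytes:
    """Receiver side: undo the steps in reverse order."""
    compressed = base64.b64decode(pcdata)               # undo the encoding first
    return zlib.decompress(compressed)                  # then undo the compression

# Arbitrary binary payload standing in for, say, image data.
payload = bytes(range(256)) * 4
pcdata = pack(payload)
restored = unpack(pcdata)
print(restored == payload)  # True
```

A receiver must reverse the steps in the opposite order to the sender, which is why the attributes need to record both the compression and the encoding.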
<!ELEMENT text (#PCDATA|p|link|records)* >
The text element allows the in-line encoding of textual content. The text can contain an arbitrary mixture of characters and p, link and records elements.
| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |
<!ELEMENT p (#PCDATA|link)*>
The p element encapsulates text as a paragraph. Link elements can span text in paragraphs.
| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |
<!ELEMENT link (#PCDATA)>
The link element denotes the text included in it as a hyperlink. More on this with the development of XLink.
| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |
| href | Required | URL | Where the link leads to. |
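To illustrate the mixed content of text, p and link, the sketch below reads a small fragment with Python's standard xml.etree.ElementTree and recovers both the plain text and the hyperlinks. The sample wording and the example.com URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: loose characters, then a paragraph containing a link.
SAMPLE = (
    '<text>Opening words. '
    '<p>See the <link href="http://www.example.com/report">full report</link>'
    ' for details.</p></text>'
)

text = ET.fromstring(SAMPLE)

# itertext() walks the mixed content in document order, ignoring the markup.
plain = "".join(text.itertext())

# Collect every link as an (href, linked text) pair.
links = [(link.get("href"), link.text) for link in text.iter("link")]

print(plain)
print(links)
```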
Records identifies a data structure whose contents can also be laid out without having to be interpreted by a computer. It is present to satisfy, minimally, the need for textual data to be organized in some way without defining layout elements like table. Records is more general than table because it does not require its rows to have the same columns. The application attribute allows a receiving program to determine what kind of program to use to interpret the data present. Several might be applicable: for example, if the records contain data relating to closing prices, a graph application could be as applicable as a straightforward tabular layout. The intention is that systems that do not understand the data are still able to make some attempt at rendering it (as a table structure).
| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |
| application | Optional | String | Some means of identifying what kind of data this is so it can be rendered appropriately by relevant applications. This may be a stylesheet reference … |
<!ELEMENT record (field+)>
A container for field elements to be grouped together.
| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |
<!ELEMENT field (#PCDATA)>
A container for data.
| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |
| name | Optional | String | A way of distinguishing fields from each other and identifying the data content. |
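The fallback rendering of records as a table, where rows need not share the same columns, might look like the sketch below. It assumes Python's standard xml.etree.ElementTree and assumes that a records element directly contains record elements. The function name records_to_table, the application value, the field names and the "fieldN" fallback for unnamed fields are all hypothetical.

```python
import xml.etree.ElementTree as ET

def records_to_table(records):
    """Turn a records element into (columns, rows) for a plain tabular layout.

    Columns are the union of field names in first-seen order; a record that
    lacks a column gets an empty cell. Unnamed fields are labelled by
    position ("field1", "field2", ...), which is an assumption of this sketch.
    """
    columns = []
    parsed = []
    for record in records.findall("record"):
        row = {}
        for position, field in enumerate(record.findall("field"), start=1):
            name = field.get("name", f"field{position}")
            row[name] = field.text or ""
            if name not in columns:
                columns.append(name)
        parsed.append(row)
    rows = [[row.get(column, "") for column in columns] for row in parsed]
    return columns, rows

# Hypothetical sample: the second record has an extra column.
SAMPLE = """
<records application="closing-prices">
  <record><field name="symbol">ABC</field><field name="close">101.5</field></record>
  <record><field name="symbol">XYZ</field><field name="close">17.2</field><field name="note">ex-dividend</field></record>
</records>
"""

columns, rows = records_to_table(ET.fromstring(SAMPLE))
print(columns)
for row in rows:
    print(row)
```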
<!ELEMENT codes (code*)>
Contains code elements that indicate the applicability of the code they contain to the content of the entity to which this metadata is attached.
It is intended that no two codes elements have the same class/role values.
| Attribute Name | Presence | Format | Comment |
|---|---|---|---|
| id | Optional | ID | Identifies the element. |