[This local archive copy is from the official and canonical URL, http://www.mediacenter.org/NML-NITF.htm; please refer to the canonical source document if possible.]


Proposal to incorporate the NML tag initiative into the NITF DTD

Revised March 5 version


February 25, 1999

 

Author:

Glenn Cruickshank, Director, Tribune Solutions

The Salt Lake Tribune, Salt Lake City, UT USA

 


Index

Introduction | History | Methodology

Highlights

Publication tagging | Workflow issues | Element identification | Archiving issues

NML / NITF mapping

Appendix: Tag descriptions

<assignment> <bibliography> <correction> <pubdata> <date.published> <editor-list> <editor> <edition> <first-character> <front-page> <issue> <lead> <size> <summary> <package> <page> <start-page> <publication-title> <publication-number> <rest> <section> <no-run> <volume> <orgid> <zone> <docdata>

 

 


Introduction.

 

In January 1999, the American Press Institute (API) hosted a Media Center Grammar Conference in Dallas, TX, to discuss the need for a news markup language within the newspaper industry. This group, consisting of representatives from a number of US newspaper chains, universities and API, developed a list of 40 tags which they felt were needed to identify journalistic content in news stories.

Members of this first group, called the API Grammarians, enlarged on this tag set and developed a preliminary tag proposal and partial DTD. On February 16, 1999, in Atlanta, GA, they met with a group from the NAA Wire Service Committee and several other systems vendors and discussed whether to merge the NML effort into the NITF DTD.

The author, a member of the NAA committee, was asked to align the NML tags with the NITF standard, and to propose changes to NITF that capture the intent of the NML tags not supported in the current release of NITF. This document is that merger proposal.

 

History

In 1992, under the auspices of the NAA and the International Press Telecommunications Council, representatives from several news organizations began working on "an industry standard for the interchange of textual material between news agencies and their clients" which would replace the aging, non-Y2K-compliant ANPA 1312 wire transmission format. This work continued through the 1990s, and additional news organizations joined the process. While the wire services (AP, Reuters and UPI) led the process domestically, U.S. participants included representatives from RTNDA, the Special Libraries Association, The New York Times, Chicago Tribune, Miami Herald, Apalachicola (FL) Times, Dow Jones, Newark Star Ledger, Lewiston (ID) Tribune, Lexis-Nexis and others, including systems vendors.

This group also worked closely with international news organizations and newspapers, under the umbrella of IPTC, the international standards organization.

The original effort was to create an SGML-based language. In 1998, with the explosive growth of the World Wide Web, the base language was changed from SGML to a derivative, XML. The goal remained to create "a device-independent format for textual and tabular information within the global news industry" and "to mark up text once for a variety of uses, including traditional print publications, broadcast news, and electronic services such as Web sites and archival databases." This markup language was named News Industry Text Format, or NITF.

The US committee working on NITF is named the NAA Wire Committee, and as such, has focused primarily on wire service issues. The underlying NITF framework, however, was designed to allow additional tagging for downstream content creation and identification. According to the NAA's John Iobst, there have always been plans in the NITF development to create adjuncts for archiving, workflow, markup and multimedia.

The emergence of the API Grammarians' work has spurred work on those adjuncts to NITF.

 

Methodology

The Grammarians proposed an original tag set of about 40 tags. The author realized that this tag set was insufficient for a number of workflow and re-purposing uses, so he analyzed a number of standards used by the online industry.

The author reviewed the VISF standard from Lexis-Nexis, the SIF format from MediaStream, Dialog-B, a Dow Jones data transmission standard, a modified ANPA 1312 transmission format created by the Washington Post and America Online's Rainman format. The author also reviewed a number of production system markup tagging schemes from Atex, Harris, CText, ACT, Dewar, SII, DTI and several Word-based systems. The author included data from a news assignment management system and from several photo archive systems. Finally, the list server for the Special Libraries Association News Division was notified of this work, and the roughly 1,000 subscribers to that list were invited to suggest any additional tags they felt had been left out.

The author developed a preliminary tag list consisting of the Grammarians tags and tags uncovered during his other-standards review process. Next, he tried to describe those tags using existing NITF tags and markup. In the majority of cases, NML tags were already supported in NITF.

Where NITF did not have elemental support for certain NML tags, we have proposed structures, elements and attributes to support those NML tags within NITF.

 

Highlights

Not surprisingly, NITF did not overlap NML in a number of instances, most notably in the areas of publication, workflow and archiving-related tagging. The following solutions have been proposed:

Publication tagging.

Print publications use a number of methods to identify a specific publication product or cycle. These include the publication number, volume, issue, publication title, publishing date, edition, zones of distribution and page. A document may be published several times, so a mechanism needs to be created to record publishing information for each instance of publication. The solution outlined by the author uses a <pubdata> container within the <head> of the document to support multiple instances of publication tags. Each <pubdata> instance has an event attribute, which is a number from 1 to n. For each event, users can record the specific volume, issue, edition, publication title, zone, etc.

For print publications, the page number issue can be quite complex if the publication uses "jumps", or continuations of the story from one page to another. In the case of multiple publications, a particular paragraph of text may appear on one page in one usage and on another page in a second usage. The author proposes an optional <page> tag within blocks of text. Key to this scheme is the pubdata.event attribute of the <page> tag (declared as pubdata-element in the appendix), which links the particular block of text to a page identifier, which in turn links to a specific publishing event listed in the header.

The third part of solving the page problem is to create a <start-page> element within the <pubdata> structure. For stories which start and end on the same page, the application needs only to record the page in this tag and can skip writing any <page> tags within blocks.
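As an illustration of the scheme described above, the following sketch shows a story published twice: once starting on page 1A with a jump to 6A, and once wholly on page 3B. Element and attribute names follow the declarations proposed in the appendix (event on <pubdata>, pubdata-element on <page>). The second publication title, the dates, the page numbers and the surrounding <nitf> skeleton are assumptions made for this example.

```xml
<nitf>
  <head>
    <!-- Event 1: the story starts on 1A and jumps to another page. -->
    <pubdata event="1">
      <publication-title value="The Salt Lake Tribune"/>
      <date.published norm="19990225"/>
      <edition value="morning"/>
      <start-page value="1A"/>
    </pubdata>
    <!-- Event 2: a hypothetical second publication. The whole story sits
         on one page, so start-page alone is enough for this event. -->
    <pubdata event="2">
      <publication-title value="Example Weekly"/>
      <date.published norm="19990227"/>
      <start-page value="3B"/>
    </pubdata>
  </head>
  <body>
    <body.content>
      <block>
        <!-- pubdata-element="1" ties this page to event 1 in the head. -->
        <page pubdata-element="1" value="1A"/>
        <p>Opening paragraphs, printed on the front page in event 1.</p>
      </block>
      <block>
        <page pubdata-element="1" value="6A"/>
        <p>Continuation, printed after the jump in event 1.</p>
      </block>
    </body.content>
  </body>
</nitf>
```

Because event 2 carries no <page> tags, an application would fall back to its <start-page> value for every block.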

A second publication issue is size. The NITF DTD does not contain a vehicle for indicating the size of the document. This issue is important to editors and researchers alike. A <size> element has been proposed for the <docdata> group. Users can supply a variety of measure/value pairs to indicate word count, character count, page count, and/or length (depending on certain print parameters).

Workflow issues

NITF has a mechanism for tracking the delivery of the document in the <del-list> container. A requested NML tag is editor, which relates to workflow. Just as it is important to track the flow of a document between delivery agents, it is important in newsrooms to track the editing history of a document through multiple systems. The solution the author proposes is an <editor-list> container within the <docdata> structure. The <editor-list> container can hold an unlimited number of editor names. Each name has a time-stamp attribute that can be used to track the flow of the document.

Photo librarians have long asked to be able to search photo assignment information for background about a photo that is not reflected in any caption. While this could be considered bibliographic material, it is really different. The contact information contained in news assignments has similar value over time. The author proposes a single <assignment> tag in the document header. This tag contains <a.meta> elements, allowing free-form, site-specific storage of assignment information.

In a formal sense, though, there is need for a mechanism to store bibliographic information. The author proposes a <bibliography> tag within the <body.end> framework. This tag can contain <block> information, which would allow the full range of formatting and content identification tags.

 

Element identification

In both the print markup process and the archiving process, it is often important to identify blocks of text by type. Two common identifiers are "lead" and "summary" (also known as the nut graf in the U.S.). Most online archives allow users to search just the lead of a story, and web publishing of menu or index pages often uses the lead of the story. Summary paragraphs often have different markup, and are extracted from the story to build brief digests. In some cases, a document will have blocks of text which have been edited and released for publication, but are not printed in some media for space or other reasons. These blocks of text can be identified as "no-run" blocks. Last, since we can identify some blocks as lead and summary, we have to have a way to identify blocks that aren't. MediaStream has a tag called "rest". That would be the default block type.

A second textual element that is used during the markup process is the "first character" of a block of text. Many publications apply stylistic markup to this character. Others replace the character altogether with a graphic element. While style sheets handle many of these processes, there remains a requirement to tag this element. Consider it similar to the <em> - emphasis tag.
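The block types and the first-character tag described above might combine as in the following sketch. It uses the blocktype values and the <first-character> element proposed in the appendix; the story text is invented for illustration.

```xml
<body.content>
  <block blocktype="lead">
    <p><first-character>T</first-character>he city council voted Tuesday to close the downtown library branch.</p>
  </block>
  <block blocktype="summary">
    <p>The vote ends a two-year debate over the future of the branch.</p>
  </block>
  <!-- No blocktype given: the proposed default, "rest", would apply. -->
  <block>
    <p>Council members heard three hours of public comment before voting.</p>
  </block>
  <!-- Edited and released, but left out of print for space. -->
  <block blocktype="no-run">
    <p>The branch opened in 1962.</p>
  </block>
</body.content>
```

An archive could then index only the lead block for lead searches, and a print system could skip the no-run block without deleting it from the document.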

 

Archiving issues

Corrections to a document can occur throughout its life, from the author to a final resting place in the archive or beyond. The <correction> tag in NITF v2.0 provides only for basic text as an attribute. Corrections are richer than that and often contain important content that is of great value at a later date. Often it is the <person> in the correction who is most likely to litigate on the basis of some earlier error. Corrections should have all the potential attributes that they would have as part of the body of the original document. They should also carry a date attribute that could indicate an after-the-fact correction.
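A sketch of what such a richer correction might look like, assuming the expanded <correction> content model proposed in the appendix (blocks plus an optional <date.release>). The name, the wording and the date are invented for illustration.

```xml
<docdata>
  <correction>
    <block>
      <!-- The person is tagged, so the correction can be found by name. -->
      <p>A story on page 2A Tuesday misidentified the council member who proposed the measure. She is <person>Jane Example</person>.</p>
    </block>
    <!-- Dated after the original story, marking an after-the-fact correction. -->
    <date.release norm="19990302"/>
  </correction>
</docdata>
```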

As editors assemble multiple documents into a news package, there is a need to link those documents together for later retrieval. While the <series> tag works for certain serial documents, it is not general enough to link together a wide variety of package components. The author proposes a <package> tag within the <docdata> component to allow for a package name and a thread number to indicate the relationship of the various components of the package with each other. The thread ID could be a serial number or a document ID number.
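For illustration, the following sketch shows header fragments from two separate documents that belong to the same package. The package name is shared and the thread values distinguish the components. The name and thread numbers are invented, and the use of simple serial numbers for thread is only one of the options mentioned above.

```xml
<!-- Fragment from the main story (one document) -->
<docdata>
  <package name="hurricane coverage" thread="1"/>
</docdata>

<!-- Fragment from a sidebar (a second, separate document) -->
<docdata>
  <package name="hurricane coverage" thread="2"/>
</docdata>
```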

 

 

NML / NITF mapping

The following table illustrates how the NML tags can be expressed as NITF tags. The Apx column refers to the numbered entries in the appendix.

Implementation of NML into NITF

NML field | Apx | NITF markup
Abstract | | <BLOCK.HEAD><ABSTRACT>
Accession Number | | <DOCDATA><DOC-ID REGSRC="private" id-string="accession"/>
Assignment | 1 | <ASSIGNMENT><a.meta>
Bibliography | 2 | <BODY.END><BIBLIOGRAPHY>
Byline/Author | | <BODY.HEAD><BYLINE.PERSON><BYLINE.BYTTL>
Column Name | | <HEAD><DOCDATA><FIXTURE FIX-ID="column name"/>
Company | | <ORG>
Contact (press release) | | <DISTRIBUTOR>
Copyright | | <BODY.CONTENT><COPYRIGHT.HOLDER><COPYRIGHT.YEAR>
Correction | 3 | <HEAD><DOCDATA><CORRECTION><BLOCK><DATE.RELEASE>
Country | | <LOCATION><COUNTRY>
Credit | | <BODY.HEAD><BYLINE.BYTAG> or <DISTRIBUTOR>
Date - Advance | | <HEAD.DOCDATA><DATE.RELEASED>
Date Authored | 7 | <HEAD><PUBDATA><EDITION value="edition"/>
Editor | 6,25 | <EDITOR-LIST><EDITOR NAME="name" NORM="timestamp"/>
Editor's Notes (non-published) | | <HEAD><DOCDATA><NOTE.TYPE><ED-MSG="editors note"/>
Editor's Notes (published) | | <BODY><BODY.CONTENT><NOTE noteclass="editorsnote">
First Character of Text | 8 | <BODY><BODY.CONTENT><BLOCK><FIRST-CHAR>
Front Page | 4,9 | <HEAD><PUBDATA><front-page value="yes"/>
Geographic | | <DOC-SCOPE scope="geographic area"/>
Headlines | | <BODY.HEAD><HEDLINE><HL1><HL2><H1>-<H8>
Illustration Caption | | <BLOCK.CONTENT><PHOTO><CAPTION>
Illustration Creator | | <BLOCK.CONTENT><PHOTO><PRODUCER>
Industry | 23 |
Lead | 11 | <BODY><BODY.CONTENT><BLOCK blocktype="lead">
Length/Word Count | 12,25 | <HEAD><DOCDATA><SIZE measure="words" val="size">
Memo | | <BODY><BODYHEAD><NOTE.TYPE><NOTE.NOTECLASS=editorsnote>
Nut Graf / Summary | 13 | <BODY><BODY.CONTENT><BLOCK blocktype="summary">
Organization | | <ORG>
Package ID | 14,25 | <HEAD><DOCDATA><PACKAGE NAME="package name" thread="thread id">
Page | 15,16 | <BLOCK><PAGE pubdata-element="1" value="2A"/>
Person | | <PERSON><NAME.CONTENT><NAME.GIVEN><NAME.FAMILY><FUNCTION>
Poster Heads/Decks | | <BODY.HEAD><HEDLINE><H1>- <H8>
Priority | | <HEAD><DOCDATA><URGENCY ed-urg=1-9>
Product | 22 | <ORG><ORGID idsrc="PRODUCT" value="name"/>
Publication | | <BODY.HEAD><DISTRIBUTOR.ORGID.VALUE="publication"/>
Publication title, by event | 17 | <HEAD><PUBDATA><pub-title="publication title"/>
Publication Number | 4,18 | <HEAD><PUBDATA><publication-number="number"/>
Pull Quotes | | <BODY><BODY.CONTENT><BQ><heading,block,credit>
Region | |
SIC code | 23 | <ORG><ORGID idsrc="SIC" value="symbol"/>
Slug | | <HEAD><DOC-ID ID-STRING="slug"/>
Source | | <BODY.HEAD><DISTRIBUTOR>
State | | <LOCATION><STATE>
Statistical Code | 23 | <ORG><ORGID idsrc="STATISTIC" value="code"/>
Story Type | | <HEAD><TOBJECT.PROPERTY PROPERTYLIST="type">
SubHeads | | <BODY.HEAD><HEDLINE><H1>- <H8>
Subject | | <HEAD><DOCDATA><TOBJECT.SUBJECTLIST>
Text | | <BODY.CONTENT>
Text that didn't run | 21 | <BODY><BODY.CONTENT><BLOCK blocktype="no-run"/>
Thread ID | 14,25 | <HEAD><DOCDATA><PACKAGE NAME="package name" thread="thread id"/>
Ticker Symbol | 23 | <ORG><ORGID idsrc="NYSE" value="symbol"/>
time stamp | | <DOCDATA><DATE.ISSUE NORM=mm/dd/ccyy hh:mm:ss />
Version | | <HEAD><DOCDATA><DU-KEY generation= part= version=/>
Volume | 4,22 | <HEAD><PUBDATA><volume value="volume"/>
Where | | <BODY><HEAD><DATELINE><LOCATION>
Zones | 4,24 | <HEAD><PUBDATA><ZONE value="zone">?

 

 

 

Appendix

This appendix contains detailed descriptions of additions and changes to NITF v2.0 DTD to incorporate the proposed NML tag set.

1. <assignment> - assignment information

Container to hold source and assignment information related to the creation of the document.

Content model:

The element <assignment> contains zero or more <a.meta> elements.

Attributes:

None

Tag Source

NITF

Usage Example

<assignment><a.meta name="assignmentname" content="City Council Meeting"/></assignment>

XML Element and Attribute Declarations:

<!ELEMENT assignment (a.meta*)>

Parent:

head

<a.meta> assignment meta data

A tag to hold assignment meta information

Content model:

The <a.meta> element is defined as empty, meaning that it contains no content.

Attributes:

name, content

Tag Source

NITF

Usage Example

<a.meta name="assignmentname" content="City Council Meeting"/>

XML Element and Attribute Declarations:

<!ELEMENT a.meta EMPTY>

<!ATTLIST a.meta

name NMTOKEN #IMPLIED

content CDATA #REQUIRED>

Parent:

assignment

2. <bibliography> - Bibliographic data

A method to include general bibliographical data that the author used in creating or researching a story.

Content Model:

The <bibliography> consists of one or more blocks of data.

Attributes:

None

Tag Source:

NITF

Usage Example:

<body.end><bibliography><block><h1>Anatomy of a Wire Story II/Data Transmission Guidelines</h1><p><org>Radio-Television News Directors Association</org></p></block></bibliography></body.end>

XML Element and Attribute Declarations:

<!ELEMENT bibliography (block+)>

Parent:

body.end

 

3. <correction> - Correction information

While wire service data contains correction or clarification data along with the original story, news content corrections may appear at a later time. Corrections can also contain a number of data types, most notably "person". Further, some corrections can have multiple blocks of text.

The <correction> element, which has only an "info" attribute in NITF version 2.0, should be expanded. The block element contains the necessary components for data types.

Content Model:

The <correction> element contains parsed character data, <block> data and optionally a <date.release> tag.

Attributes:

None

Tag Source:

NITF

Usage Example:

<correction><block><p>The name Foo in the headline was misspelled. It should have been Food.</p></block><date.release norm="19990221"/></correction>

XML Element and Attribute Declarations:

<!ELEMENT correction (#PCDATA | block | date.release)*>

Parent:

docdata

4. <pubdata> - general publication data

The initial release of NITF makes little provision for usage-specific distribution information, such as print publication parameters like date of publication, page, issue, volume, etc. This requires creation of a PubData structure within the <HEAD> of the document, similar to the <DOCDATA> area.

 

Content Model:

The <pubdata> element consists of a series of elements which provide the distribution meta data. It contains one event attribute, which would be a sequence of numbers. Each event attribute serves to group a number of pubdata elements.

Attributes

event

Tag Source:

NITF

Usage Example

<pubdata event="1"><issue value="March 1999"/><volume value="5"/><date.published norm="19990225"/><start-page value="2A"/><section value="sports"/><zone value="zone"/></pubdata>

XML Element and Attribute Declarations:

<!ELEMENT pubdata (issue | volume | start-page | publication-title | publication-number | front-page | date.published | section | zone | edition )*>

<!ATTLIST pubdata

event NMTOKEN "0" >

Parent:

head

5. <date.published> date document was published

This element contains the date and time when the information within a document is published. The information should be normalized to UTC. Attribute use is ISO 8601 based (YYYYMMDDThhmmssZ).

Content Model:

The <date.published> element is defined as empty, meaning that it contains no content.

Attributes:

norm

Tag Source:

NITF

Usage Example:

<date.published norm="19990223"/>

XML Element and Attribute Declarations:

<!ELEMENT date.published EMPTY>

<!ATTLIST date.published

norm CDATA #IMPLIED>

Parent:

pubdata
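The usage example above carries only a date. As a sketch of the fuller form described for this element, the following fragment gives a complete UTC timestamp in the YYYYMMDDThhmmssZ pattern. The time of day is invented for illustration.

```xml
<pubdata event="1">
  <!-- 17:30:00 UTC on 23 February 1999 -->
  <date.published norm="19990223T173000Z"/>
</pubdata>
```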

6. <editor-list> - Editor list

Container to hold a list of editors who have been associated with the document.

Content Model:

The element <editor-list> contains zero or more <editor> elements.

Attributes:

None

Tag Source:

NITF

Usage Example

<editor-list>

<editor name="john" norm="19990223 10:33:00"/>

<editor name="betsy" norm="19990223 10:35:00"/>

</editor-list>

XML Element and Attribute Declarations:

<!ELEMENT editor-list (editor*)>

Parent

docdata

 

<editor> - editor

Tag to hold the name of an editor, and a time stamp when the editor worked on the document

Content Model:

The <editor> element is defined as empty, meaning that it contains no content.

Attributes:

name, norm

Usage Example:

<editor-list>

<editor name="john" norm="19990223 10:33:00"/>

<editor name="betsy" norm="19990223 10:35:00"/>

</editor-list>

XML Element and Attribute Declarations:

<!ELEMENT editor EMPTY>

<!ATTLIST editor

name CDATA #IMPLIED

norm CDATA #IMPLIED>

Parent

editor-list

7. <edition> edition of publication

Identification of a publication edition related to an instance of publication.

Content Model:

The <edition> element is defined as empty, meaning that it contains no content.

Attributes

value

Tag Source

NITF

Usage example:

<edition value="late city final"/>

XML Element and Attribute Declarations:

<!ELEMENT edition EMPTY>

<!ATTLIST edition

value CDATA #IMPLIED>

Parent

pubdata

 

8. <first-character> First character

A mechanism to identify the first character of a block of text. Often used to allow application of a specific style or graphical character to a text area.

Content Model:

#PCDATA - simple text composed of parsed character data

Attributes:

id ID #IMPLIED

class NMTOKENS #IMPLIED

style CDATA #IMPLIED

lang NMTOKEN #IMPLIED

dir (ltr | rtl) #IMPLIED

as contained in %attrs;

Tag Source:

NITF

Usage example:

<block><p><first-character>I</first-character>n the beginning, there was the tag.</p></block>

XML Element and Attribute Declarations:

<!ELEMENT first-character (#PCDATA)>

<!ATTLIST first-character

id ID #IMPLIED

class NMTOKENS #IMPLIED

style CDATA #IMPLIED

lang NMTOKEN #IMPLIED

dir (ltr | rtl) #IMPLIED >

 

Parent:

a, body, caption, credit, dt, fig.data, fn, h1, h2, h3, h4, h5, h6, h7, h8, hl1, hl2, note, p, q, tagline, td, th

 

9. <front-page> - front page of a publication

Identification of front-page placement for an instance of publication. The value attribute indicates whether the document appeared on the front page of the publication.

Content Model:

The <front-page> element is defined as empty, meaning that it contains no content.

Attributes

value

Tag Source:

NITF

Usage example

<front-page value="yes"/>

XML Element and Attribute Declarations:

<!ELEMENT front-page EMPTY>

<!ATTLIST front-page

value (yes | no) "no">

Parent:

pubdata

 

10. <issue> - issue of publication

Identification of a publication issue related to the document.

Content Model:

The <issue> element is defined as empty, meaning that it contains no content.

Attributes

value

Tag Source:

NITF

Usage example

<issue value="March 1999"/>

XML Element and Attribute Declarations:

<!ELEMENT issue EMPTY>

<!ATTLIST issue

value CDATA #IMPLIED>

Parent:

pubdata

 

11. Lead (an attribute of a block)

Blocks of text can have certain attributes which have industry-specific meaning, such as lead and summary. The addition of a blocktype attribute to the block tag allows blocks to be identified by type. Leads may stretch for several blocks.

Content Model:

The <block blocktype="lead"> identifies the block as a lead block of text.

Attributes

blocktype, %attrs;

Tag Source:

NITF

Usage example:

<block blocktype="lead"><p>This is the lead of a story</p></block>

XML Element and Attribute Declarations:

<!ELEMENT block ((%block.head;)?, (%block.content;)*, (%block.end;)?)>

<!ATTLIST block

blocktype (lead | summary | no-run | rest) "rest"

%attrs;>

Parent:

body, bq, dd, fig.data, fn, li, note, td, th

 

12. <size> Size of document

Mechanism to provide measurement data of the document.

Content model:

The <size> element is defined as empty, meaning that it contains no content, only attributes.

Attributes:

size.measure, size.value

Tag Source:

NITF

Usage example:

<size size.measure="words" size.value="345"/>

XML Element and Attribute Declaration

<!ELEMENT size EMPTY>

<!ATTLIST size

size.measure CDATA #REQUIRED

size.value NMTOKEN #REQUIRED >

Parent:

docdata

 

13. summary (an attribute of a block)

Blocks of text can have certain attributes which have industry-specific meaning, such as lead and summary. The addition of a blocktype attribute to the block tag allows blocks to be identified by type. Summaries may stretch for several blocks. In the U.S., a summary is also called a "nut graf".

Content Model:

The <block blocktype="summary"> identifies the block as a summary block of text.

Attributes

blocktype, %attrs;

Tag Source:

NITF

Usage example:

<block blocktype="summary"><p>This is the summary of a story</p></block>

XML Element and Attribute Declarations:

<!ELEMENT block ((%block.head;)?, (%block.content;)*, (%block.end;)?)>

<!ATTLIST block

blocktype (lead | summary | no-run | rest) "rest"

%attrs;>

Parent:

body, bq, dd, fig.data, fn, li, note, td, th

 

14. <package> - Package of documents

Mechanism for linking a group of documents together, but documents which would not be considered to be part of a series. A package would have a name, and a thread element which indicates a sequence within the package.

Content Model:

The <package> element is defined as empty, meaning that it has no content.

Attributes:

name, thread

Tag Source:

NITF

Usage Example:

<package name="hurricane coverage" thread="12"/>

XML Element and Attribute Declarations:

<!ELEMENT package EMPTY>

<!ATTLIST package

name CDATA #IMPLIED

thread CDATA #IMPLIED>

Parent:

docdata

15. <page> - the page of the publication the document appeared in

Mechanism to attach page information to blocks of textual data. The pubdata-element attribute of <page> refers to the event number of a <pubdata> instance in the header, so that a block of text can carry a page that is tied to a specific publication instance. A block can carry multiple <page> tags, relating to multiple publication instances.

Content model:

The <page> element is defined as empty, meaning that it contains no content, only attributes. It can occur multiple times within a block

Attributes:

pubdata-element, value

Tag Source:

NITF

Usage Example:

<head><pubdata event="1"><volume value="22"/><start-page value="2B"/></pubdata></head>

<block><page pubdata-element="1" value="3B"/><p>This is some more text</p></block>

XML Element and Attribute Declarations:

<!ELEMENT page EMPTY>

<!ATTLIST page

pubdata-element NMTOKEN #REQUIRED

value CDATA #REQUIRED>

Parent:

block

 

16. <start-page> the starting page of publication

Identification of the starting page that a document appeared on in print form.

Content Model

The <start-page> element is defined as empty, meaning that it contains no content, only attributes.

Attributes:

value

Tag Source:

NITF

Usage Example:

<start-page value="2B"/>

XML Element and Attribute Declarations

<!ELEMENT start-page EMPTY>

<!ATTLIST start-page

value CDATA #REQUIRED>

Parent

pubdata

17. <publication-title> - the title of the publication

Identification of a publication title related to an instance of publication

Content Model:

The <publication-title> element is defined as empty, meaning that it contains no content.

Attributes

value

Tag Source:

NITF

Usage example

<publication-title value="The Salt Lake Tribune"/>

XML Element and Attribute Declarations:

<!ELEMENT publication-title EMPTY>

<!ATTLIST publication-title

value CDATA #IMPLIED>

Parent:

pubdata

18. <publication-number> - the number of the publication

Identification of a publication number related to an instance of publication

Content Model:

The <publication-number> element is defined as empty, meaning that it contains no content.

Attributes

value

Tag Source:

NITF

Usage example

<publication-number value="22"/>

XML Element and Attribute Declarations:

<!ELEMENT publication-number EMPTY>

<!ATTLIST publication-number

value CDATA #IMPLIED>

Parent:

pubdata

 

19. rest (an attribute of a block)

Blocks of text can have certain attributes which have industry-specific meaning, such as lead and summary. The addition of a blocktype attribute to the block tag allows blocks to be identified by type. The rest of the story consists of blocks which do not have a lead, summary or no-run attribute. "Rest" would be the default value of the blocktype attribute.

Content Model:

The <block blocktype="rest"> identifies the block as a rest block of text.

Attributes

blocktype, %attrs;

Tag Source:

NITF

Usage example:

<block blocktype="rest"><p>This is just a paragraph of a story</p></block>

XML Element and Attribute Declarations:

<!ELEMENT block ((%block.head;)?, (%block.content;)*, (%block.end;)?)>

<!ATTLIST block

blocktype (lead | summary | no-run | rest) "rest"

%attrs;>

Parent:

body, bq, dd, fig.data, fn, li, note, td, th

 

20. <section> - the section of the publication the document appeared in

Identification of a publication section related to the document. Sections relate to a physical or logical grouping of stories within a news product.

Content Model:

The <section> element is defined as empty, meaning that it contains no content.

Attributes

value

Tag Source:

NITF

Usage example

<section value="sports"/>

XML Element and Attribute Declarations:

<!ELEMENT section EMPTY>

<!ATTLIST section

value CDATA #IMPLIED>

Parent:

pubdata

 

21. no-run (an attribute of a block)

Blocks of text can have certain attributes which have industry-specific meaning, such as lead and summary. The addition of a blocktype attribute to the block tag allows blocks to be identified by type. Some blocks of text are not run in certain instances, but should remain within the document.

Content Model:

The <block blocktype="no-run"> identifies the block as a no-run block of text.

Attributes

blocktype, %attrs;

Tag Source:

NITF

Usage example:

<block blocktype="no-run"><p>This is extra information that you can run if you have room</p></block>

XML Element and Attribute Declarations:

<!ELEMENT block ((%block.head;)?, (%block.content;)*, (%block.end;)?)>

<!ATTLIST block

blocktype (lead | summary | no-run | rest) "rest"

%attrs;>

Parent:

body, bq, dd, fig.data, fn, li, note, td, th

22. <volume> - publication volume

Identification of a publication volume related to the document.

Content Model:

The <volume> element is defined as empty, meaning that it contains no content.

Attributes

value

Tag Source:

NITF

Usage example

<volume value="55"/>

XML Element and Attribute Declarations:

<!ELEMENT volume EMPTY>

<!ATTLIST volume

value CDATA #IMPLIED>

Parent:

pubdata

23. <orgid> - organization identifier

Usage notes:

The IDSRC attribute is used to identify certain standard organization identifier types. These include SIC, STATISTIC, PRODUCT, ISSC and NAICS.

 

24. <zone> - zone of distribution of publication

Identification of a publication zone related to the document. A zone is a regional distribution of a publication.

Content Model:

The <zone> element is defined as empty, meaning that it contains no content.

Attributes

value

Tag Source:

NITF

Usage example

<zone value="city,suburban"/>

XML Element and Attribute Declarations:

<!ELEMENT zone EMPTY>

<!ATTLIST zone

value CDATA #IMPLIED>

Parent:

pubdata

 

 

25. <docdata> - General document data

A number of additions to the <docdata> structure require additions to the ELEMENT declaration.

XML Element and Attribute Declarations:

<!ELEMENT docdata (envloc | doc-id | del-list | urgency | fixture | date.issue | date.release | date.expire | doc.scope | series | ed-msg | du-key | doc.copyright | doc.rights | key-list | correction | editor-list | size | package )*>
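As a sketch of how the proposed additions might sit together in one <docdata> instance, the following fragment combines an editing history, two size measures and a package link. <editor-list> is included on the basis of entry 6, which names docdata as its parent. All names, times and counts are invented, and "characters" as a measure value is an assumption, since the proposal does not enumerate the allowed measures.

```xml
<docdata>
  <!-- Editing history, oldest first -->
  <editor-list>
    <editor name="alice" norm="19990224 21:05:00"/>
    <editor name="raj" norm="19990224 21:40:00"/>
  </editor-list>
  <!-- One size element per measure/value pair -->
  <size size.measure="words" size.value="345"/>
  <size size.measure="characters" size.value="2010"/>
  <!-- Third component of a named package -->
  <package name="hurricane coverage" thread="3"/>
</docdata>
```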

 


 

About the author

The author has worked 25 years in the newspaper industry in a variety of roles. He has been a photojournalist, reporter, production systems editor, production systems manager, manager of information systems, systems designer and newspaper research director.

He has spent the last 10 years as Director of Tribune Solutions, a research and development department of The Salt Lake Tribune. Tribune Solutions produces several commercial newspaper software applications, including NewsView, a text archiving system, PhotoView, an image archiving system, and Connections32, a digital asset management system. He is the chief architect of all those products, and has designed over 150 conversion filters for nearly every production system in use by the industry today, as well as every on-line vendor. Those products are used by more than 100 publications in 8 countries.

The NewsView product line has been sold and marketed for the last 8 years under the umbrella of several Reed-Elsevier companies, including Lexis-Nexis and Reed Technology and Information Services Inc. The Salt Lake Tribune is a wholly owned subsidiary of Tele Communications Inc, soon to be a wholly owned subsidiary of AT&T.

The author is a frequent speaker at industry gatherings and has conducted seminars on archiving and news production issues at the University of Missouri School of Journalism and Rhodes University, Grahamstown, South Africa.