[Mirrored from: http://www.ile.com/opentag/otspecs.htm, Feb-28-1997]
OpenTag Format Specifications
Draft version 3 - Feb-28-1997
OpenTag is not a finalized format. It is a proposed base that the members of the consortium will modify and build upon, taking into account the experience accumulated by the makers of existing extracted text formats.
This document describes the current OpenTag format element by element. You can find more information on how to support the format in the OpenTag Implementation Notes document. You can also refer to the EBNF grammar and download other useful reference material, such as the SGML DTD for OpenTag, from the Resources page. If you need additional information or have any questions, please send an e-mail to opentag@ile.com.
OpenTag is XML/SGML compliant. The markup rules of an OpenTag file follow the SGML/XML rules. You can find the XML working draft specifications wd-xml-961114 at http://www.textuality.com/sgml-erb/WD-xml.html and http://www.w3.org/pub/WWW/TR/.
The notion of a paragraph in an OpenTag file does not necessarily correspond to the definition of a paragraph in the original file. For example, an OpenTag paragraph generated from RTF can originate from a \row element; it is not restricted to originating from the \par element.
OpenTag uses character entities to specify the characters that do not exist in the default code set of the file. A character entity is composed of the prefix character '&', followed by the SGML name of the character, or its Unicode value (in hexadecimal, with a leading '#' and a trailing ';').
Example: <P id=1/> Lowercase "a grave" can be coded &agrave; or &#e0;
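The rule above can be sketched in Python as follows. This is only an illustration, not part of the specification: the function name to_opentag_entities is hypothetical, the code always emits the hexadecimal form rather than the SGML name, and it does not escape a literal '&' in the input.

```python
def to_opentag_entities(text: str, codeset: str = "iso8859-1") -> str:
    """Replace characters missing from `codeset` with hexadecimal entities.

    Assumption: the entity form is '&#' + lowercase hex code point + ';',
    as in the example above. Note that this differs from the XML
    hexadecimal reference syntax, which uses '&#x...;'.
    """
    out = []
    for ch in text:
        try:
            ch.encode(codeset)          # character exists in the code set
            out.append(ch)
        except UnicodeEncodeError:      # character must be written as an entity
            out.append(f"&#{ord(ch):x};")
    return "".join(out)


if __name__ == "__main__":
    # "a grave" exists in ISO8859-1 but not in ASCII.
    print(to_opentag_entities("voil\u00e0", "ascii"))
```

With an ASCII default code set, "a grave" comes out as the hexadecimal entity from the example; with ISO8859-1 it would be left unchanged because the character exists in that code set.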
The header is composed of the XML prologue that sets the default values for the code set and the locale of the file.
<?XML version='1.0' type='OpenTag1' encoding='ISO8859-1' locale='FR-FR' original='MIF4' ?>
Attribute | Description |
VERSION | XML version, currently 1.0 |
TYPE | OpenTag version, currently OpenTag1 |
ENCODING | Code set of the file. The value should correspond to one of the code set names described in the OpenTag Standard Code Set Names document. In Level 1 implementations, other code sets can be used for specific elements; the code set defined here is then used as the default. |
LOCALE | Locale of the file. The value should correspond to one of the locale names described in the OpenTag Standard Locale Names document. In Level 1 implementations, other locales can be used for specific elements; the locale defined here is then used as the default. |
ORIGINAL | Original format. Specifies the format and version of the file from which this OpenTag file has been extracted. The value of this attribute is not standardized. |
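As an illustration of how a tool might read the header attributes listed above, here is a minimal Python sketch. The function name parse_prologue and the regular expressions are assumptions made for this example; the specification does not prescribe a parsing method, and the sketch only handles quoted attribute values as shown in the sample prologue.

```python
import re

# name = 'value' or name = "value"
ATTR_RE = re.compile(r"""([A-Za-z]+)\s*=\s*(['"])(.*?)\2""")


def parse_prologue(line: str) -> dict:
    """Return the prologue attributes as a dict keyed by upper-case name."""
    m = re.match(r"\s*<\?XML\s+(.*?)\?>", line, re.IGNORECASE | re.DOTALL)
    if m is None:
        raise ValueError("not an OpenTag prologue")
    return {name.upper(): value
            for name, _quote, value in ATTR_RE.findall(m.group(1))}


if __name__ == "__main__":
    header = ("<?XML version='1.0' type='OpenTag1' encoding='ISO8859-1' "
              "locale='FR-FR' original='MIF4' ?>")
    for key, value in parse_prologue(header).items():
        print(key, "=", value)
```

The ENCODING and LOCALE values returned here are the file defaults; a Level 1 reader would still need to honor any CS or LC attribute found on individual elements.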
The body of an OpenTag file or a data stream coded in OpenTag is enclosed by the <OPENTAG> element.
The following table lists all the formatting elements defined in OpenTag:
Tag | Description | Attributes |
B <B id=1>...</B> |
Bold. Indicates bold text. | -- ID=Code ID Optional. Each code ID must be unique within the paragraph. |
I <I id=1>...</I> |
Italics. Indicates italicized text. | -- ID=Code ID Optional. Each code ID must be unique within the paragraph. |
U <U id=1>...</U> |
Underline. Indicates underlined text. | -- ID=Code ID Optional. Each code ID must be unique within the paragraph. |
D <D id=1>...</D> |
Double-underline. Indicates double-underlined text. | -- ID=Code ID Optional. Each code ID must be unique within the paragraph. |
G <G id=1>...</G> |
Generic group. Indicates a group of arbitrary codes. Should be moveable within the paragraph. | -- ID=Code ID Mandatory. Each code ID must be unique within the paragraph. |
P <P id=1/> <P id=1 s="Body"/> <P id=1 cs=EUC-JA/> <P id=1 lc=JA-JP cs=EUC-JA/> <P id=1 seg=23/> |
Paragraph marker. Indicates the start of a paragraph. | -- ID=Paragraph ID Mandatory. Each paragraph must have an ID unique within the file, and the IDs must be sequential. -- S="name" Optional. Specifies the style name for the paragraph. -- SEG=Segment ID Optional. Specifies the segment ID for the paragraph. Use it when the paragraph corresponds to one segment. If this attribute is used, no <S> tags are allowed in the paragraph. -- CS=Codeset Optional, Level 1 only. Specifies the code set for the paragraph. The default code set is specified in the file header. -- LC=Locale Optional, Level 1 only. Specifies the locale for the paragraph. The default locale is specified in the file header. -- COND=Condition name Optional. Specifies the condition for the paragraph. |
PB <PB id=1/> <PB id=1 seg=45/> |
Paragraph break. Indicates a paragraph break inside a definition group. This tag must be used only inside a definition group. | -- ID=Code ID Optional. Each code ID must be unique within the paragraph. -- SEG=Segment ID Optional. Specifies the segment ID for the paragraph. Use it when the paragraph corresponds to one segment. If this attribute is used no <S> tags are allowed until the next paragraph marker or paragraph break. |
X <X id=1/> |
Generic code. | -- ID=Code ID Mandatory. Each code ID must be unique within the paragraph. |
CELL <CELL id=1/> |
Cell break. Indicates the start of a new cell in a table. | -- ID=Code ID Optional. Each code ID must be unique within the paragraph. |
CB <CB id=1/> |
Column break. Indicates the start of a new column. | -- ID=Code ID Optional. Each code ID must be unique within the paragraph. |
OB <OB id=1/> |
Object | -- ID=Code ID Mandatory. Each code ID must be unique within the paragraph. |
OPENTAG <OPENTAG>...</OPENTAG> |
OpenTag group. Indicates the body of an OpenTag file. | None |
FN <FN id=1/> |
Footnote marker. Indicates an inserted footnote. The text of the footnote itself is in the corresponding <FND> tag. | -- ID=Footnote ID Mandatory. Each footnote ID must be unique within the file. |
IX <IX id=1/> |
Index marker. Indicates an index position. The text of the index itself is in the corresponding <IXD> tag. | -- ID=Index ID Mandatory. Each index ID must be unique within the file. |
RF <RF id=1/> |
Reference marker. Indicates a reference. The definition of the reference itself is in the corresponding <RFD> tag. | -- ID=Reference ID Mandatory. Each reference ID must be unique within the file. |
FND <FND id=1 seg=1>...</FND> |
Footnote definition. The real position of the footnote in the text is indicated by the corresponding <FN> tag. | -- ID=Footnote ID Mandatory. Each footnote ID must be unique within the file. -- SEG=Segment ID Optional. Specifies the segment ID for the footnote definition. |
IXD <IXD id=1>...</IXD> |
Index definition. The real position of the index in the text is indicated by the corresponding <IX> tag. | -- ID=Index ID Mandatory. Each index ID must be unique within the file. |
RFD <RFD id=1 seg=1>...</RFD> |
Reference definition. The real position of the reference in the text is indicated by the corresponding <RF> tag. | -- ID=Reference ID Mandatory. Each reference ID must be unique within the file. -- SEG=Segment ID Optional. Specifies the segment ID for the reference definition. |
OC <OC cs=EUC-JA>...</OC> |
Original code set. Indicates a text which, in the original document, is in a code set different from the default original document code set. The text in the OpenTag file is in the default code set of the OpenTag file. | -- CS=Code set name Mandatory. The value should correspond to one of the code set names described in the OpenTag Standard Code Set Names document. -- LC=Locale name Optional. The value should correspond to one of the locale names described in the OpenTag Standard Locale Names document. -- ID=Code ID Optional. Each code ID must be unique within the paragraph. |
CS <CS cs=EUC-JA>...</CS> |
Code set. Level 1 only. Indicates text in a code set different from the default file code set. | -- CS=Code set name Mandatory. The value should correspond to one of the code set names described in the OpenTag Standard Code Set Names document. -- LC=Locale name Optional. The value should correspond to one of the locale names described in the OpenTag Standard Locale Names document. -- ID=Code ID Optional. Each code ID must be unique within the paragraph. |
PR <PR id=1>...</PR> |
Protected text. Indicates a text that should not be translated. | -- ID=Code ID Optional. Each code ID must be unique within the paragraph. |
CT <CT cond="doc" id=1>...</CT> |
Conditional text. Indicates a text that is marked as conditional in the original file. | -- COND=Condition name Mandatory. Specifies the condition. -- ID=Code ID Optional. Each code ID must be unique within the paragraph. |
LVL <LVL id=1 seg=1>...</LVL> |
Index level. Indicates the different entry levels in an index definition (<IXD> element). | -- ID=Code ID Optional. Each code ID must be unique within the paragraph. -- SEG=Segment ID Optional. Specifies the segment ID for the level. |
SO <SO id=1 seg=1>...</SO> |
Sort order. Indicates the text that should be used to sort an index entry. You can use the <SO> element only inside a <LVL> group. | -- ID=Code ID Optional. Each code ID must be unique within the paragraph. -- SEG=Segment ID Optional. Specifies the segment ID for the sort order text. |
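Some of the ID rules in the table above lend themselves to an automated check. The following Python sketch is one possible approach, not a reference implementation. The function name check_ids is hypothetical, "sequential" is assumed to mean that each numeric paragraph ID is the previous one plus one, and the sketch assumes that every marker (<FN>, <IX>, <RF>) and its definition (<FND>, <IXD>, <RFD>) must appear as a pair with the same ID. It scans the markup with regular expressions because the unquoted attribute values used in the examples would not be accepted by a strict XML parser.

```python
import re

# Opening or empty tags whose IDs are checked; \b keeps <PB>, <PR> out.
TAG_RE = re.compile(r"<(P|FN|IX|RF|FND|IXD|RFD)\b([^>]*)>", re.IGNORECASE)
ID_RE = re.compile(r"""\bid\s*=\s*["']?([^\s"'/>]+)""", re.IGNORECASE)


def check_ids(body: str) -> list:
    """Return a list of human-readable problems found in `body`."""
    problems = []
    para_ids = []
    markers = {"FN": set(), "IX": set(), "RF": set()}
    definitions = {"FND": set(), "IXD": set(), "RFD": set()}

    for name, attrs in TAG_RE.findall(body):
        name = name.upper()
        m = ID_RE.search(attrs)
        if m is None:
            problems.append(f"<{name}> has no id")
            continue
        ident = m.group(1)
        if name == "P":
            para_ids.append(ident)
        elif name in markers:
            markers[name].add(ident)
        else:
            definitions[name].add(ident)

    # Paragraph IDs: unique within the file and sequential (assumed +1 steps).
    seen = set()
    previous = None
    for ident in para_ids:
        if ident in seen:
            problems.append(f"duplicate paragraph id {ident}")
        seen.add(ident)
        if (previous is not None and ident.isdigit() and previous.isdigit()
                and int(ident) != int(previous) + 1):
            problems.append(f"paragraph id {ident} does not follow {previous}")
        previous = ident

    # Markers and definitions are assumed to come in pairs with equal IDs.
    for marker, definition in (("FN", "FND"), ("IX", "IXD"), ("RF", "RFD")):
        for ident in sorted(markers[marker] - definitions[definition]):
            problems.append(f"<{marker} id={ident}> has no <{definition}>")
        for ident in sorted(definitions[definition] - markers[marker]):
            problems.append(f"<{definition} id={ident}> has no <{marker}>")
    return problems


if __name__ == "__main__":
    sample = ("<OPENTAG><P id=1/>Text<FN id=1/> more<P id=2/>Next<IX id=7/>"
              "<FND id=1 seg=1>Note</FND></OPENTAG>")
    for problem in check_ids(sample):
        print(problem)
```

The sketch does not check the paragraph-scoped code IDs of elements such as <B> or <G>; doing so would require tracking paragraph boundaries, which is left out to keep the example short.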
The following table lists all the property elements defined in OpenTag. Property elements are used to process information. They are not used for merging back the text in its original form.
Tag | Description | Attributes |
S <S id=1/> |
Segment. Delimits a segment (a sentence). | -- ID=Segment ID Mandatory. Each segment ID must be unique within the file. |
TERM <TERM>...</TERM> |
PROPOSED ONLY - Term. Delimits a terminological unit. For example, this tag could be used to mark expressions that have been found in a glossary, and should be formatted accordingly in the translator interface. | None |
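The <S> element interacts with the SEG attribute of the paragraph marker: a paragraph that carries SEG must not contain <S> tags. The Python sketch below shows one way this could be checked. It is an assumption-laden illustration: the function name is hypothetical, a paragraph is taken to extend from its <P .../> marker to the next <P .../> marker or the end of the body, and the equivalent rule for <PB> is not covered.

```python
import re

P_RE = re.compile(r"<P\b([^>]*)/>", re.IGNORECASE)    # paragraph markers only
SEG_RE = re.compile(r"\bseg\s*=", re.IGNORECASE)      # SEG attribute present
S_RE = re.compile(r"<S\b[^>]*/>", re.IGNORECASE)      # <S id=.../>, not <SO>


def paragraphs_mixing_seg_and_s(body: str) -> list:
    """Return the <P> markers that have SEG and are followed by <S> tags."""
    offenders = []
    markers = list(P_RE.finditer(body))
    for i, marker in enumerate(markers):
        # Paragraph content runs up to the next marker (assumption).
        end = markers[i + 1].start() if i + 1 < len(markers) else len(body)
        content = body[marker.end():end]
        if SEG_RE.search(marker.group(1)) and S_RE.search(content):
            offenders.append(marker.group(0))
    return offenders


if __name__ == "__main__":
    sample = "<P id=1 seg=23/>One.<S id=24/>Two.<P id=2/><S id=25/>Three."
    print(paragraphs_mixing_seg_and_s(sample))
```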