This document records all known errors in the Extensible Markup Language (XML) 1.0 Specification (W3C Recommendation 10 Feb 1998); for updates see the latest version.
The errata are numbered, classified as Substantial, Editorial or Clarification and listed in reverse chronological order of their date of publication. Early errata (1999-02-17 and before) are neither numbered, classified nor dated.
Please email error reports to xml-editor@w3.org.
[6] Names ::= Name (#x20 Name)*
"[8] Nmtokens ::= Nmtoken (#x20 Nmtoken)*"
Note that if the unnormalized attribute value contains a character reference to a whitespace character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9). This contrasts with the case where the unnormalized value contains a whitespace character (not a reference), which is replaced with a space character (#x20) in the normalized value and also contrasts with the case where the unnormalized value contains an entity reference whose replacement text contains a whitespace character; being recursively processed, the whitespace character is replaced with a space character (#x20) in the normalized value.
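The contrast between a literal whitespace character and a character reference can be observed with a conforming parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (backed by expat); the helper name attr_value and the sample documents are hypothetical and serve only as illustration. The entity-reference case is not exercised here.

```python
import xml.etree.ElementTree as ET


def attr_value(document: str, name: str = "x") -> str:
    """Parse a one-element document and return the normalized attribute value."""
    return ET.fromstring(document).get(name)


# A literal tab (#x9) in the unnormalized value is replaced with a space (#x20).
literal = attr_value('<a x="one\ttwo"/>')

# A character reference to the same tab survives as the referenced character.
referenced = attr_value('<a x="one&#9;two"/>')

print(repr(literal))
print(repr(referenced))
```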
There may be any number of Subcode segments; if the Langcode is an ISO639Code, and if the first subcode segment exists and consists of two letters, then it must be a country code from [ISO 3166], "Codes for the representation of names of countries."
In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" should be used for the various encodings and transformations of Unicode / ISO/IEC 10646, the values "ISO-8859-1", "ISO-8859-2", ... "ISO-8859-9" should be used for the parts of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" should be used for the various encoded forms of JIS X-0208-1997. It is recommended that character encodings registered (as charsets) with the Internet Assigned Numbers Authority [IANA], other than those just listed, should be referred to using their registered names; other encodings should use names starting with an "x-" prefix. XML processors should match character encoding names in a case-insensitive way and should either interpret an IANA-registered name as the encoding registered at IANA for that name or treat it as unknown (processors are of course not required to support all IANA-registered encodings).
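The "match case-insensitively, otherwise treat as unknown" behaviour might be sketched as follows. This is an illustration only: it assumes Python's codecs registry as a stand-in for the set of encodings a processor supports, which is not the IANA registry itself, and the function name resolve_encoding is hypothetical.

```python
import codecs
from typing import Optional


def resolve_encoding(declared: str) -> Optional[str]:
    """Return a canonical codec name for a declared encoding, or None if unknown.

    codecs.lookup() ignores case, so "UTF-8" and "utf-8" resolve alike.
    Assumption: Python's codec registry approximates the supported encodings.
    """
    try:
        return codecs.lookup(declared).name
    except LookupError:
        # Not a supported name: the processor treats the encoding as unknown.
        return None


print(resolve_encoding("Shift_JIS"))
print(resolve_encoding("x-no-such-encoding"))
```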
In the column headed "Character", "Not recognized" hyperlinks to "#not recognized" instead of "#not-recognized" (missing dash). In the columns headed "Internal General" and "External Parsed General", "Forbidden" hyperlinks to "#not-recognized" instead of "#forbidden".
choice ::= '(' S? cp ( S? '|' S? cp )* S? ')'
"choice ::= '(' S? cp ( S? '|' S? cp )+ S? ')'"
The second possible case occurs when the XML entity is accompanied by encoding information, as in some file systems and some network protocols. When multiple sources of information are available, their relative priority and the preferred method of handling conflict should be specified as part of the higher-level protocol used to deliver XML. In particular, please refer to [IETF RFC2376], "XML Media Types", which defines the text/xml and application/xml MIME types and provides some useful guidance. In the interests of interoperability, however, the following rule is recommended.
With a Byte Order Mark:
00 00 FE FF: UCS-4, big-endian machine (1234 order)
FF FE 00 00: UCS-4, little-endian machine (4321 order)
FE FF 00 ##: UTF-16, big-endian
FF FE ## 00: UTF-16, little-endian
EF BB BF: UTF-8
Without a Byte Order Mark:
00 00 00 3C: UCS-4, big-endian machine (1234 order)
3C 00 00 00: UCS-4, little-endian machine (4321 order)
00 00 3C 00: UCS-4, unusual octet order (2143)
00 3C 00 00: UCS-4, unusual octet order (3412)
00 3C ## ##, 00 25 ## ##, 00 20 ## ##, 00 09 ## ##, 00 0D ## ## or 00 0A ## ##: Big-endian UTF-16 or ISO-10646-UCS-2. Note that, absent an encoding declaration, these cases are strictly speaking in error.
3C 00 ## ##, 25 00 ## ##, 20 00 ## ##, 09 00 ## ##, 0D 00 ## ## or 0A 00 ## ##: Little-endian UTF-16 or ISO-10646-UCS-2. Note that, absent an encoding declaration, these cases are strictly speaking in error.
3C 3F 78 6D: UTF-8, ISO 646, ASCII, some part of ISO 8859, Shift-JIS, EUC, or any other 7-bit, 8-bit, or mixed-width encoding which ensures that the characters of ASCII have their normal positions, width, and values; the actual encoding declaration must be read to detect which of these applies, but since all of these encodings use the same bit patterns for the ASCII characters, the encoding declaration itself may be read reliably
4C 6F A7 94: EBCDIC (in some flavor; the full encoding declaration must be read to tell which code page is in use)
other: UTF-8 without an encoding declaration, or else the data stream is corrupt, fragmentary, or enclosed in a wrapper of some kind
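The byte patterns listed above can be turned into a small lookup routine. The following is a sketch under stated assumptions, not a normative algorithm: the function name sniff_encoding and the returned labels are hypothetical, and the UTF-16 Byte Order Mark cases are simplified to test only the first two octets rather than the full "FE FF 00 ##" and "FF FE ## 00" patterns.

```python
# Octets that may follow (or precede) a zero octet in BOM-less UTF-16 / UCS-2:
# '<', '%', space, tab, carriage return, line feed.
_FIRST_CHARS = {0x3C, 0x25, 0x20, 0x09, 0x0D, 0x0A}


def sniff_encoding(data: bytes) -> str:
    """Guess an encoding family from the first four octets of an entity."""
    head = data[:4]

    # With a Byte Order Mark (UCS-4 is tested first, since FF FE 00 00
    # also starts with the UTF-16 little-endian mark FF FE).
    if head == b"\x00\x00\xfe\xff":
        return "UCS-4, big-endian (1234)"
    if head == b"\xff\xfe\x00\x00":
        return "UCS-4, little-endian (4321)"
    if head[:2] == b"\xfe\xff":
        return "UTF-16, big-endian"
    if head[:2] == b"\xff\xfe":
        return "UTF-16, little-endian"
    if head[:3] == b"\xef\xbb\xbf":
        return "UTF-8"

    # Without a Byte Order Mark.
    if head == b"\x00\x00\x00\x3c":
        return "UCS-4, big-endian (1234)"
    if head == b"\x3c\x00\x00\x00":
        return "UCS-4, little-endian (4321)"
    if head == b"\x00\x00\x3c\x00":
        return "UCS-4, unusual octet order (2143)"
    if head == b"\x00\x3c\x00\x00":
        return "UCS-4, unusual octet order (3412)"
    if len(head) >= 2 and head[0] == 0x00 and head[1] in _FIRST_CHARS:
        return "UTF-16 or ISO-10646-UCS-2, big-endian"
    if len(head) >= 2 and head[1] == 0x00 and head[0] in _FIRST_CHARS:
        return "UTF-16 or ISO-10646-UCS-2, little-endian"
    if head == b"\x3c\x3f\x78\x6d":
        return "ASCII-compatible; read the encoding declaration"
    if head == b"\x4c\x6f\xa7\x94":
        return "EBCDIC; read the encoding declaration"
    return "UTF-8 without an encoding declaration, or unusable data"


print(sniff_encoding(b"<?xml version='1.0' encoding='ISO-8859-1'?><a/>"))
```

The order of the tests matters: the four-octet patterns are checked before the two-octet UTF-16 patterns that they would otherwise match.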
Add the following to the second paragraph after the list (this also takes care of the previous erratum on UTF-7): "Note: Since external parsed entities in UTF-16 may begin with any character, this autodetection does not always work. Also, because of the overloaded usage it makes of ASCII-valued bytes, the UTF-7 encoding may fail to be reliably detected."
standalone='yes'", they must not process entity declarations or attribute-list declarations encountered after a reference to a parameter entity that is not read, since the entity may have contained overriding declarations."
standalone='yes'"', there is no guarantee that making a document standalone will cause all XML processors to report the same results to the application.
--->'. The following example is not well-formed." and an example: "<!-- B+, B, or B--->"
Before the value of an attribute is passed to the application or checked for validity, but after the end-of-line normalization described in section 2.11 has been performed, the XML processor must normalize the attribute value as follows:
If the attribute type is not CDATA, then the XML processor must further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.
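The further processing for non-CDATA attribute types might be sketched as follows. This is a minimal illustration, assuming the value has already gone through the general normalization step; the function name normalize_non_cdata is hypothetical. Only the space character (#x20) is affected, so a tab that arrived through a character reference is left in place.

```python
import re


def normalize_non_cdata(value: str) -> str:
    """Trim leading/trailing #x20 and collapse runs of #x20 to a single space."""
    trimmed = value.strip(" ")            # only #x20, not other whitespace
    return re.sub(r" {2,}", " ", trimmed)  # sequences of spaces become one


print(repr(normalize_non_cdata("  red   green  blue ")))
```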
"Validity Constraint: Unique Notation Name: only one notation declaration can declare a given Name."
"For interoperability, if a parameter-entity reference appears in a choice, seq, or Mixed construct, its replacement text should not be empty, and neither the first nor last non-blank character of the replacement text should be a connector (| or ,)."
to
"For interoperability, if a parameter-entity reference appears in a choice, seq, or Mixed construct, its replacement text must contain at least one non-blank character, and neither the first nor last non-blank character of the replacement text should be a connector (| or ,)."