SGML. Duplicate tokens in an attribute definition list
Why are duplicate tokens in an attribute definition list forbidden in ISO 8879:1986 SGML?
Question:
Can I use the following (attribute definition list declaration)?
<!ELEMENT para - - (#PCDATA) >
<!ATTLIST para reviewed (yes | no) no
private (yes | no) yes >
Answer: No.
Who says? See Clause 11.3.3 of the Standard: "A token cannot occur more than once in an attribute definition list, even in different groups."
The design of SGML on this point reflects a decision by the ISO committee to make allowance for "minimizing" (hand) markup for SGML attributes: in qualifying cases, the name of the attribute may be omitted from markup if the value is supplied. This minimization rule is found in Clause 7.9.1.2 of the Standard (page 330 of the SGML Handbook):
"If "SHORTTAG YES" is specified on the SGML declaration, the name and vi can be omitted if the attribute value specification is an undelimited name token that is a member of a group specified in the declared value for that attribute. NOTE - A name token can occur in only one group in an attribute definition list (see 11.3.3)."
Under the minimization rule of 7.9.1.2, and the Standard's particular interpretation of attributes and attribute values, the following markup would be judged ambiguous if the declaration in the question were allowed:
<para yes>[some text]</para>
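To make the ambiguity concrete, here is a minimal Python sketch. It is not part of the original discussion: the dictionary model of the declaration and the function name `candidates` are assumptions made purely for illustration. It asks which attributes a bare token could belong to once the attribute name has been omitted.

```python
from collections import defaultdict

# Hypothetical in-memory model of the ATTLIST for <para> in the question:
# attribute name -> tokens of its name token group.
PARA_ATTLIST = {
    "reviewed": ["yes", "no"],
    "private": ["yes", "no"],
}


def candidates(attlist, token):
    """Return every attribute whose group contains the bare token."""
    index = defaultdict(list)
    for attr_name, tokens in attlist.items():
        for tok in tokens:
            index[tok].append(attr_name)
    return index[token]


# <para yes> omits the attribute name, so a parser would have to infer it.
print(candidates(PARA_ATTLIST, "yes"))  # ['reviewed', 'private']
```

With two candidate attributes for the token, `<para yes>` cannot be expanded to a single name/value pair, which is the situation the Standard rules out at declaration time.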
A justification for the ISO committee's design is offered by Charles F. Goldfarb in the SGML Handbook, page 424, as follows: "The requirement in the following paragraph [viz., see clause 11.3.3 cited above] supports the omitted name markup minimization described in 7.9.1.2. It also represents another compromise between humans and computers. While some unambiguous rule could no doubt be created that would allow unminimized specification of an attribute where the tokens are duplicated between attributes, such a rule would not be in the best interests of the user. The intention is that these name tokens be self-describing attribute values. They should imply the attribute name as well as the value. When properly designed, they should be meaningless as values of a different attribute. (I might add, in general, that markup minimization is an appropriate area for making such compromises in favor of people, since markup minimization is there principally to support direct human markup. A computer program would be happier with a fully marked up document and no need to check for minimization.)"
As the following discussion illustrates, not all SGML users are satisfied with the design and its justification. Personally, I do not find the argumentation entirely persuasive. The review committee has received numerous requests to revise this rule.
Examples and Discussion
The good news (Summer 1997) is that the ISO 8879 design motivated by concerns for minimization is now officially approved for revision. See the posting from Steve Pepper.
The following text contains copies of some postings sent to CTS (Usenet News comp.text.sgml) -- examples of users wanting to use a token more than once (and being perplexed by the restrictions imposed by the Standard), together with explanations written to these users. A couple of postings discuss possible solutions to this problem, perhaps to take effect in a revision to the Standard. Your further contributions are welcome.
---------------------------------------------------------------------------
From: kevan@redted.demon.co.uk (Kevan)
Newsgroups: comp.text.sgml
Subject: Attributes question
Date: 26 Aug 1996 12:50:50 GMT
Organization: Home
Lines: 37
Hi,
I have hit a stumbling block while adding an element to my computer
collection DTD. The background is that I want to be able to record
all the cartridges I have for a particular computer or game system.
So I have added the following to my DTD:
<!entity % yesno "yes | no">
<!element cartridge - O (#PCDATA)>
<!attlist cartridge manufacturer CDATA "-"
name CDATA #REQUIRED
partno CDATA "-"
box (%yesno) "no"
manual (%yesno) "no">
With the hope that I will be able to do something like this:
<cartridge manufacturer="Atari" name="Bug Hunt"
partno="RX8087" box="yes" manual="no">
The problem is that nsgmls gives me the following error:
catalogue.dtd:112:19:E: token `YES' occurs more than once in attribute definition list
catalogue.dtd:112:19:E: token `NO' occurs more than once in attribute definition list
If I change box and manual to CDATA then everything works just fine,
but I would like to be able to restrict them to just yes/no values.
Is there any way of doing this?
Many thanks in advance,
--
Kevan
Old Computer Collector <http://www.ftech.net/~kevan/collection/>
--------------------------------------------------------------------------
From: cmsmcq@tigger.cc.uic.edu (C M Sperberg-McQueen)
Newsgroups: comp.text.sgml
Subject: Re: Attributes question
Date: 26 Aug 1996 15:34:16 GMT
Organization: University of Illinois at Chicago
Lines: 51
Kevan (kevan@redted.demon.co.uk) wrote:
: ... I want to be able to record
: all the cartridges I have for a particular computer or game system.
: So I have added the following to my DTD:
:
: <!entity % yesno "yes | no">
:
: <!element cartridge - O (#PCDATA)>
:
: <!attlist cartridge manufacturer CDATA "-"
: name CDATA #REQUIRED
: partno CDATA "-"
: box (%yesno) "no"
: manual (%yesno) "no">
:
: ...
: If I change box and manual to CDATA then everything works just fine,
: but I would like to be able to restrict them to just yes/no values.
: Is there any way of doing this?
There's no way to restrict them to the literal values yes and no; you
can, however, do this:
<!ATTLIST cartridge
manufacturer CDATA "-"
name CDATA #REQUIRED
partno CDATA "-"
box (box | nobox) nobox
manual (manual | nomanual) nomanual >
The reason for the restriction is to ensure that you can say (if
SHORTTAG is YES) something like this:
<cartridge manufacturer="Atari" name="Bug Hunt"
partno="RX8087" box nomanual>
Since the members of enumerated sets don't overlap, omitting the
attribute name and equals sign is unambiguous.
The view is sometimes taken that your way would be simpler, in the
long run, and would allow the declared values of attributes to
represent a more useful set of data types -- which are, after all,
nothing but descriptions of sets of values which can be taken by
variables or attributes. It's a view I happen to share, but for now,
8879 takes the other view.
Perhaps in the revision.
-C. M. Sperberg-McQueen
tei@uic.edu
University of Illinois at Chicago
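
Editorial sketch (not part of the posting above): the declaration-time check behind the nsgmls error can be approximated in a few lines of Python. The dictionary model of an attribute definition list and the name `duplicated_tokens` are assumptions for illustration. Tokens are upper-cased on the assumption that name case folding is in effect, which would explain why the error message reports `YES` and `NO`.

```python
from collections import Counter


def duplicated_tokens(attlist):
    """Return tokens that occur more than once in one attribute definition list.

    `attlist` maps attribute names to their name token groups; attributes
    with other declared values (CDATA, NAME, ...) are left out of the mapping.
    """
    counts = Counter(
        tok.upper() for tokens in attlist.values() for tok in tokens
    )
    return sorted(tok for tok, n in counts.items() if n > 1)


# Kevan's declaration and the workaround suggested in the reply.
original = {"box": ["yes", "no"], "manual": ["yes", "no"]}
workaround = {"box": ["box", "nobox"], "manual": ["manual", "nomanual"]}

print(duplicated_tokens(original))    # ['NO', 'YES']
print(duplicated_tokens(workaround))  # []
```

The check looks only at the declaration, so it rejects the first form whether or not any document ever uses the minimized markup.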
--------------------------------------------------------------------------
Subject:Reply: [Q] name tokens for attribute values
Submitted to: COMP.TEXT.SGML
Submitted by: Erik Naggum (erik@naggum.no)
Date Of Submission: 15 Jul 1995 17:08:54 GMT
Organization: Naggum Software; +47 2295 0313
Newsgroups: comp.text.sgml
Reference: Jay Wood
[Jay Wood]
| There's a problem with the following, but I can't seem to find anything
| in the standard that forbids it. If you want to have an element with
| attributes like
|
| <!ELEMENT para - - (#PCDATA)>
| <!ATTLIST para level (1|2|3|4|5) #REQUIRED
| indent (1|2|3|4|5) #REQUIRED>
|
| you should be able to, right?
the operating keyword is "should". however, you can't. yet.
| The text at the top of page 330 of the _Handbook_ states that it is
| forbidden to have duplicate name tokens as valid attribute values in a
| single ATTLIST...
the definitive word on this is found at [424:12]. page 424 also has a
rationale for this annoying restriction that is not quite on the mark.
unfortunately, when minimization is not used, these arguments still apply,
and that's just plain bad design, as you have discovered. it is hoped that
this restriction will be lifted in the revised version. the statement
under contention is Goldfarb's comment: "While some unambiguous rule could
no doubt be created that would allow unminimized specification of an
attribute where the tokens are duplicated between attributes, such a rule
would not be in the best interests of the user." many disagree with this
statement, and the sentences following it that explain a specific design
philosophy for attributes and values that should have been open to user
choice, instead of being forced upon them. SGML tries to stay out of the
design business everywhere else, so there's no reason it should mandate a
particular usage in this case.
#<Erik 3014816934>
--
NETSCAPISM /net-'sca-,pi-z*m/ n (1995): habitual diversion of the mind to
purely imaginative activity or entertainment as an escape from the
realization that the Internet was built by and for someone else.
-----------------------------------------------------------
Subject: Re: Same attribute values for two different attributes
Date: 1 Mar 1996 16:21:48 GMT
From: lprice@ix.netcom.com (Lynne Price)
Newsgroups: comp.text.sgml
-------------------------------------
Aren't any of the regulars going to answer this?
In <31366D7C.628E@po.cisnet.or.jp> Akihiko Shigemoto
<akihiko@po.cisnet.or.jp> writes:
..
>The SGML book I had consulted says that it is possible to
>have same attribute values for different attributes.
>But validating a document with nsgmls 1.0.1 complains that
>there are attribute values defined more than once. And the
>SGML document instance is supposed to be invalid.
...
>
>THE SAMPLE FILE "TEST.SGM" USED WITH NSGMLS:
>--------------------------------------------
>
><!DOCTYPE test
>[
><!ELEMENT test o o (para) >
><!ELEMENT para - - (level1,(level2)+) >
><!ATTLIST para level1 (Roman|Bullet|Alpha|Normal) Normal
> level2 (Roman|Bullet|Alpha|Normal) Bullet
...
A name token cannot appear in more than one declared value in
the same attribute definition list. As a result, the same attribute
value can be used for different attributes, but not if the
declared values are name token groups. Thus
<!ATTLIST para level1 CDATA Normal
level2 CDATA Normal
>
is OK, but the original example is not.
(The motivation for this unfortunate restriction was to enable
parsing of minimized start-tags such as
<para Normal>
in which the attribute name is omitted. This minimization is
only permitted if the declared value is a group. Forcing the values
to be unique means that the attribute name is uniquely determined
from the value. There's no need to explain that there are other
solutions; WG8 has received numerous requests to ease the restriction.)
--Lynne
--------------------------------------------------------------------------
Subject: Re: Same attribute values for two different attributes
Date: Fri, 01 Mar 1996 10:01:17 -0900
From: "W. Eliot Kimber" <kimber@passage.com>
Newsgroups: comp.text.sgml
References: <31366D7C.628E@po.cisnet.or.jp>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
===========================
Akihiko Shigemoto wrote:
>
> This is a reposting of my problem to this group.
> I was told that the earlier posting was garbled by my
> Netscape2.0's news reader. I hope this will be fine this
> time.
>
> HERE IS THE ORIGINAL POSTING:
>
> The SGML book I had consulted says that it is possible to
> have same attribute values for different attributes.
> But validating a document with nsgmls 1.0.1 complains that
> there are attribute values defined more than once. And the
> SGML document instance is supposed to be invalid.
>
> What's wrong with this? I am confused.
When the value of an attribute is a name group (a list of keyword values),
then the names must be unique across *all* attributes of a given element
type. The reason for this is to allow the attribute name and value indicator
("=") to be omitted, allowing you to specify just the attribute value.
(See pages 330 to 332 in *The SGML Handbook*, clause 7.9.3 in
the standard.)
For example, if you have this declaration:
<!ATTLIST Foo
security (conf | internal | unclas) conf
status (draft | final | approved) draft
>
Then a minimized element can look like this:
<foo internal final>
This rule does have the side effect of disallowing the sort of construction
you wanted, such as a series of attributes representing switches where you'd
like to specify the keywords "yes" or "no" or something.
As Erik said, the choice of minimization over flexibility may not have been
the most appropriate in hindsight. Being intimately familiar with GML, the
primary precursor to SGML, I can understand where this requirement came from,
as GML allows this sort of minimization.
One problem is that the distinction is only meaningful when OMITTAG or
SHORTTAG is YES, so even if you don't use markup minimization, you're still
stuck with the limitation.
One possible solution to this problem would be to allow the choice of
minimization on an element-type basis (but this might introduce too much
parsing complication).
This is an example of an issue where the desire to both simplify SGML parsing
(by elimination of some or all markup minimization features) and fixing what
some consider to be a flaw conflicts with backward compatibility. For
example, if a revised SGML eliminated the ability to minimize attribute
markup, existing minimized instances would not parse against the new
standard. An obvious solution would be to make attribute markup minimization
an option by itself. Of course, you can also argue that migration to a
version of the standard that doesn't support minimization would only require
normalizing any existing documents using tools like SP Add Markup (SPAM). Who
knows at this point what the right answer is?
The point of this is that seemingly small and/or obvious fixes may have
wide-ranging implications. It will be a significant challenge to manage the
dependencies and interrelationships among these changes while trying to
balance and compromise between conflicting requirements. That's why it's
difficult to say that any particular change is, *a-priori* a good idea--each
one has to be thought through carefully in the context of the others.
--
<Address HyTime=bibloc>
W. Eliot Kimber(kimber@passage.com) Sr SGML Consultant and HyTime Specialist
Passage Systems, Inc., 2608 Pinewood Terr., Austin TX 78757 (512)339-1400
10596 N. Tantau Ave, Cupertino CA, 95014, (408) 366-0300
</Address>
"Mr. Thought Policeman, I don't wanna do no wrong..." -- "1984 Blues",
Austin Lounge Lizards (http://www.webcom.com/~yeolde/all/lllhome.html)
-----------------------------------------------------------------------
Subject: Re: Same attribute values for two different attributes
Date: 1 Mar 1996 12:00:37 -0800
From: jenglish@crl.com (Joe English)
Newsgroups: comp.text.sgml
References: <31366D7C.628E@po.cisnet.or.jp>
<3137497D.5CD8@passage.com>
========================
W. Eliot Kimber <kimber@passage.com> wrote:
>This is an example of an issue where the desire to both simplify SGML parsing
>(by elimination of some or all markup minimization features) and fixing what
>some consider to be a flaw conflicts with backward compatibility.
Not necessarily: since duplicate name tokens in attribute declared values
are only disallowed in order to prevent ambiguity in the case of attribute
name minimization, the revised Standard could specify that the
parser check this condition in the document instance instead of
in the <!ATTLIST ...> declaration. For example, it could allow:
<!ATTLIST foo
a1 (v1|v2|v3) #IMPLIED
a2 (v3|v4|v5) #IMPLIED
>
then <foo v1> and <foo v5> would be legal, but <foo v3> would
not be.
This would not break existing DTDs or instances, since they
currently follow stricter rules. The only danger is if
someone decided to add a duplicate attribute value to an existing
DTD when there were document instances which use the minimized
form; but designers face that kind of problem whenever they
change a DTD anyway.
>An obvious solution would be to make attribute markup minimization
>an option by itself.
That would be nice, as would the ability to selectively enable or disable
all the other minimization features individually, AS LONG AS doing so
doesn't silently change delimiter recognition.
--Joe English
jenglish@crl.com
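
Editorial sketch (not part of the posting above): the instance-level check proposed here could look roughly like the following Python. The function name `expand_start_tag` and the list representation of a start-tag's attribute specifications are hypothetical. This is one possible reading of the proposal, not the behaviour of any existing parser.

```python
def expand_start_tag(attlist, specs):
    """Expand bare tokens in a start-tag into (name, value) pairs.

    `specs` holds either (name, value) tuples or bare token strings.
    A bare token is accepted only if exactly one attribute's group
    contains it; unknown and ambiguous tokens both raise ValueError.
    """
    expanded = []
    for spec in specs:
        if isinstance(spec, tuple):
            # Attribute name given explicitly: nothing to infer.
            expanded.append(spec)
            continue
        owners = [name for name, tokens in attlist.items() if spec in tokens]
        if len(owners) != 1:
            raise ValueError(
                f"cannot resolve bare token {spec!r}: candidates {owners}"
            )
        expanded.append((owners[0], spec))
    return expanded


# The declaration from the posting: v3 is shared by a1 and a2.
FOO = {"a1": ["v1", "v2", "v3"], "a2": ["v3", "v4", "v5"]}

print(expand_start_tag(FOO, ["v1"]))          # [('a1', 'v1')]
print(expand_start_tag(FOO, ["v5"]))          # [('a2', 'v5')]
print(expand_start_tag(FOO, [("a2", "v3")]))  # [('a2', 'v3')]
```

Under this reading, `<foo v3>` raises an error, while the unminimized `<foo a2="v3">` remains usable. Existing DTDs never trigger the error because their tokens are already unique.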
------------------------------------------------------------------
Update as of August 1997: the June 1, 1997 WG8 TC, viz.,
"Proposed TC for WebSGML Adaptations for SGML"
indicates that the restriction is marked for deletion;
see the posting from Steve Pepper.
------------------------------------------------------------------
Title: Re: Attribute Values
Author: tadmc@flash.net (Tad McClellan)
Date: Wed, 24 Sep 1997 17:56:35 -0500
Mike Torrence (mst@fadavis.com) wrote:
: I am attempting to have a list of attributes which act as boolean
: options. For example:
: <ATTLIST foo
: bar (yes|no) "yes"
: print (yes|no) "no"
: correct (yes|no) "no" >
: dummy me... I thought that logically this was valid. As long as the
: attribute names were unique, their values could be the same.
That is logically valid, but not valid SGML ;-)
: Currently Rules Builder is complaining about the name token already
: having been used.
As it should.
: Can someone please confirm that this is indeed invalid DTD design...
It is indeed invalid DTD design.
Clause 11.3.3 says:
"A token cannot occur more than once in an _attribute definition list_,
even in different groups"
Why is this requirement there, you may ask?
Because with minimization, you are allowed to omit the attribute name
(and the VI):
<foo yes>
If the tokens are not unique, the application will not be able to tell
if that is supposed to be the value for 'bar', 'print', or 'correct' ;-(
: as well as possibly some ways around it (other than using yes|no, 1|2,
: 3|4, etc.)
I know of no better way around it other than that.
I usually go with something like:
<!ATTLIST foo
bar (bar|no-bar) "bar"
print (print|no-print) "no-print"
correct (correct|no-correct) "no-correct">
<foo bar no-print no-correct> would then not confuse the application.
Ugly, I agree. But it will parse ;-)
--
Tad McClellan SGML Consulting
tadmc@flash.net Perl programming
Fort Worth, Texas
------------------------------------------------------------------------
Title: Re: Attribute Values
Author: David Megginson <dmeggins@sprynet.com>
Date: 24 Sep 1997 20:55:39 -0400
Mike Torrence <mst@fadavis.com> wrote in article
<3429707e.24733358@news.idt.net>...
> I am attempting to have a list of attributes which act as boolean
> options. For example:
> <ATTLIST foo
> bar (yes|no) "yes"
> print (yes|no) "no"
> correct (yes|no) "no" >
>
> dummy me... I thought that logically this was valid. As long as the
> attribute names were unique, their values could be the same.
Nope, not with token groups -- the idea is that the token names have
to make sense by themselves. Try this:
<!ATTLIST foo
bar (bar|nobar) "bar"
print (print|noprint) "noprint"
correct (correct|wrong) "wrong">
or this
<!ATTLIST foo
bar NAME "yes"
print NAME "no"
correct NAME "no">
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
------------------------------------------------------------------
Title: Re: Attribute Values
Author: Paul Madsen <paul_madsen@newbridge.com>
Date: Thu, 25 Sep 1997 06:51:50 -0400
Mike, the attlist you show is illegal.
The standard allows for the possibility of enumerated attribute values
appearing without being prefaced by their name and the equal sign, eg.
<foo yes> rather than <foo bar="yes">. With multiple
attributes sharing the same list of possible values, the tag above is
ambiguous; the parser wouldn't know to which attribute the "yes" referred.
The only way around it is to declare unambiguous attribute value lists.
<!ATTLIST foo
bar (yesbar|nobar) "yesbar"
print (yesprint|noprint) "noprint"
correct (yescorrect|nocorrect) "nocorrect" >
which would allow
<foo yesbar noprint yescorrect>
paul
----------------------------------------------------------------------
Title: Re: Attribute Values
Author: pepper@infotek.no (Steve Pepper)
Date: Thu, 25 Sep 97 14:52:35 GMT
In article <m2202eyzxw.fsf@sprynet.com>, David Megginson <dmeggins@sprynet.com>
<01bcc931$85fa86e0$2502a8c0@gbourgui.lavasys.com> wrote:
>Nope, not with token groups -- the idea is that the token names have
>to make sense by themselves. Try this:
>
> <!ATTLIST foo
> bar (bar|nobar) "bar"
> print (print|noprint) "noprint"
> correct (correct|wrong) "wrong">
or this:
<!ATTLIST foo
bar (nobar) #IMPLIED
print (noprint) #IMPLIED
correct (wrong) #IMPLIED>
If the attribute isn't specified, the application can imply
a value that is the same as the name of the attribute.
Since we're talking about binary decisions, specify/imply
is enough:
<foo nobar>
is correct and gets printed (but not barred :-)
Steve
--
Steve Pepper, Senior Information Architect <pepper@infotek.no>
STEP Infotek AS, Gjerdrums vei 12, N-0486 Oslo, Norway
http://www.infotek.no/ phone://+47 22021680/ fax://+47 22021681/
direct://+47 22021687/ GSM/cellular://+47 90827246/
Whirlwind Guide to SGML Tools: http://www.infotek.no/sgmltool/
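
Editorial sketch (not part of the posting above): the application-side convention described here, in which an unspecified attribute is taken to have a value equal to its own name, might be written as follows in Python. The function `effective_value` and the use of `None` for an implied attribute are assumptions for illustration. How a real parser reports #IMPLIED attributes depends on the tool.

```python
def effective_value(attr_name, specified):
    """Return the value an application might assume for a binary attribute.

    `specified` is the value found in the markup, or None when the
    attribute was left unspecified (#IMPLIED).
    """
    return attr_name if specified is None else specified


# <foo nobar>: only `bar` is specified; `print` and `correct` are implied.
parsed = {"bar": "nobar", "print": None, "correct": None}
resolved = {name: effective_value(name, value) for name, value in parsed.items()}
print(resolved)  # {'bar': 'nobar', 'print': 'print', 'correct': 'correct'}
```

The convention lives entirely in the application: the parser sees only single-token groups and never encounters a duplicated token.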
----------------------------------------------------------------------
Posted to CTS:
Title: Re: Token once per list.
Author: davep@acm.org (David Peterson)
Date: Fri, 24 Oct 1997 10:33:59 -0400
In article <877665266.8500@dejanews.com>, Agnew.Robert.A@sd.littondsd.com wrote:
> In article <344FA686.588F@boeing.com>,
> Brad Held <bradley.c.held@boeing.com> wrote:
> >
> > I am looking for the __REASON__ why two different attributes of
> > the same element cannot have the same value.
> This has been discussed many times here over the years. As I remember, it
> was to allow the use of "SHORTTAG" to assign the value without the "
> ATTNAME = " part. Thus the value must be unique or the attribute semantic
> analyzer won't be able to decide which attribute to assign it to. ( Still
> pretty silly, don't ya think?).
Not silly per se--necessary for that shorttag feature; just silly to make
it apply when you're not using the feature. It will be removed--if I
recall correctly, in the next Corrigendum. Then, as quickly as parsers
can be modified to permit it, you'll get no error--but of course you cannot
then use that particular shorttag feature on attributes for which you
wish to specify a duplicated name. Given
<!ATTLIST x b (c | d | e) #IMPLIED
a (b | c) #IMPLIED >
you'll be able to use "<x d>" and "<x e>" and "<x b>" (necessarily meaning
"<x a='b'>"), but not "<x c>" because the parser wouldn't know whether you
were specifying a value for a or b.
Dave Peterson
SGMLWorks!
davep@acm.org
----------------------------------------------------------------------
My own comments:
Having been burned by this non-intuitive restriction in clause 11.3.3
(once through inexperience, once by an act of forgetfulness), I offer
the following appraisal.
The argumentation found on page 330 of the SGML Handbook does not
seem compelling, except in light of certain assumptions the ISO
committee appears to have made, unnecessarily. The text says:
"It is a requirement of SGML (even without [enabling the]
SHORTTAG [feature]) that if you have more than one such attribute
[from a name group] in a single definition list, name tokens must
not be duplicated within the list. Otherwise you would not know
which attribute the value applied to if the name were omitted."
In the final sentence, "you" must apply to a human who is inspecting
a marked-up text, or an encoder, since certainly a piece of software
would "know" by virtue of the knowledge built into it. However: "which
[one] attribute" already appears to embody a prior commitment, and one that
is unnecessary. Several possible interpretations would be available if we
allowed duplicated tokens. The human (encoder) could then indeed know,
deterministically, by virtue of the alternative interpretation authorized
by the standard. The interpretation assuming that name tokens be "self-
describing attribute values" and that "when properly designed, they should
be meaningless as values of a different attribute" (p. 424) is one possible
interpretive posture, but hardly one that deserves (in my judgment)
to have been taken as determinative in the design.
But the fact that a construct like <p yes> might be less than clear
to a human inspecting just an instance (in light of a declaration
<!ATTLIST p reviewed (Y | N) Y private (Y | N) N > )
-- is this argument implied in the text of pages 330 and 424? --
is hardly any argument, since in *many* cases, where attribute values are
defaulted, the human encoder cannot know -- apart from knowledge of the DTD,
knowledge of the #CURRENT or #CONREF state -- what attribute values are in
fact set for a given instance. Thus, it appears to me that several other
options were open in the design. For example, would any general design
principle undergirding SGML (have) render(ed) this interpretation infeasible:
"for an element having an attribute definition list with duplicated tokens,
it shall be an error in the case of the relevant attributes not to include
the attribute name with the attribute value, if an explicit value need be
given in markup"? (Or some such).
I quit here, since of the several ways in which SGML attributes appear
to me broken, this is one of the least problematic, and quite easily
circumvented. --- Robin Cover