By Robin Cover. Draft opinion 2005-05-23. See disclaimers and request for comments.
Someone has suggested that we should accommodate the fallible memory and poor spelling habits of users who try to fetch online resources using any (case) spelling whatsoever. "Mixed case in URIs and case-sensitive matching rules are a royal pain for users, so we should configure the web servers to honor requests for all variations of upper- and lower-case, regardless of canonical spelling of the resource's mixed-case URI. If there's a case-insensitive match, that's close enough: just forget about spelling and ship the resource, without further ado." Hmmmmm...
Conceptually, instructing a server to use case-insensitive string matching on URIs is like implementing myriad URI aliases for each resource. Some use cases for URI aliases are quite valid. For example, a resource that's versioned but needs to be available at a predictable location may be referenced both by a version-specific URI and by a latest-version URI, where the latter might redirect to the former. Some people think it's both useful and harmless to hand over a resource requested as example.com/faq when the canonical URI is example.com/FAQ. However, occasional URI aliasing is vastly different from programming server support for an arbitrarily large number of case-insensitive spellings, matching all permutations of upper-case and lower-case spellings: is that kind of URI aliasing a good thing? Is it benign?
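To make "myriad URI aliases" concrete, here is a minimal sketch that enumerates every upper/lower-case spelling a case-insensitive server would accept for a path. The function name `case_variants` is hypothetical, and the sketch assumes ASCII letters only.

```python
from itertools import product


def case_variants(path):
    """Return the set of all upper/lower-case spellings of an ASCII path."""
    # Each letter contributes two choices; every other character contributes one.
    choices = [
        {ch.lower(), ch.upper()} if ch.isalpha() else {ch}
        for ch in path
    ]
    return {"".join(combo) for combo in product(*choices)}


faq = case_variants("/FAQ")
print(len(faq))        # 8: three letters give 2**3 spellings
print(sorted(faq))

# The count doubles with every additional letter in the path.
print(len(case_variants("/QVML/2005/01/Proto")))  # 512 (nine letters)
```

The growth is exponential in the number of letters, which is why case folding is a different kind of thing from one deliberately chosen alias.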
A distinct set of concerns arises in connection with machine-readable formal specification components identified by URIs, where these (dereferenced) resources are fetched and processed from deep within XML applications: XML schema files, WSDL files, XML catalogs, etc. What about RDDL documents advertised by their owners as Namespace Documents, which live at the end of a dereferenced official namespace URI? In such cases, where URIs clearly function as authoritative, canonical names as well as locators, encouraging promiscuous spellings and supporting silent server resolution of mis-spelled URIs may result in name corruption — unintentional or intentional. Corrupt names supported by a server then proliferate as published name corruptions in a process very toxic to data integrity and application interoperability.
Computers are Particular: Why Pretend Otherwise?
Server support for agent requests that ignore case-sensitive spelling encourages an attitude that case does not matter, or that it shouldn't matter. Encouraging people to be sloppy about spelling when they are interacting with computers is arguably a bad idea, because a great many computing operations are (in fact) case-sensitive. Examples:
XML is case-sensitive; you get a bad experience if you pretend otherwise.
Because conforming XML and XHTML applications use markup constructs (names) that are case-sensitive, it's important to reinforce this reality. Contrariwise, it's destructive to undermine the XML rules by welcoming "any case will do!" in (de-referencing) URIs associated with XML components.
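A small sketch of what "XML is case-sensitive" means in practice, using Python's standard `xml.etree.ElementTree` parser. The element names `Order` and `ID` are illustrative assumptions, not taken from any particular schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Start and end tags must match exactly, including case.
print(is_well_formed("<Order><ID>1</ID></Order>"))  # True
print(is_well_formed("<Order><ID>1</ID></order>"))  # False: mismatched tag

# Lookups by name are exact as well.
root = ET.fromstring("<Order><ID>1</ID></Order>")
print(root.find("ID") is not None)  # True
print(root.find("id") is None)      # True: "id" is a different name
```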
The most prominent public specifications for markup component naming rules prescribe the use of UCC (upper camel case) and LCC (lower camel case), and composition rules for filename construction/generation rely upon these UCC/LCC case distinctions. Filenames predictably find their way into use as canonical elements in URIs. Consistent spelling is critical in the design architecture; a server's processing of references to UBL/CCTS artifacts based on a premise that "case does not matter" in URIs may contaminate these XML applications. See details on rule-based mixed case component naming in the note.
It's a bad idea to (passively) condition users to believe that character case in URIs does not matter because in some respects, it matters critically. Once accustomed to the idea that case does not matter in the path portion (hierarchical part) of the URI, will they remember that bad things can happen if you treat the fragment as case-insensitive? For example, what happens if you fold case in XPointers? Will users take it for granted that Foo and foo are equivalent fragment identifiers in file.html#Foo and file.html#foo? You are now reading in document section #foo, the title for which asserts that case-insensitive matching is not a great idea; try out section #Foo. Big difference. Case matters critically.
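The Foo/foo point can be sketched with Python's standard `html.parser`: two sections whose identifiers differ only in case are distinct fragment targets, and folding case collapses them into one. The page content and the class name `IdCollector` are hypothetical.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect the value of every id attribute, preserving its case."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids.append(value)


# Hypothetical page with two sections named foo and Foo.
PAGE = '<h2 id="foo">First section</h2><h2 id="Foo">Second section</h2>'

collector = IdCollector()
collector.feed(PAGE)
found = collector.ids

print(found)                          # ['foo', 'Foo']: two distinct targets
print(len({i.lower() for i in found}))  # 1: case folding cannot tell them apart
```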
Web Server Common Practice
Although we see a growing tendency to use all lower-case characters in URLs and to instruct URL rewriting engines to perform case-folded (case-insensitive) matching on paths, many web servers are configured to treat URIs case-sensitively. Such server configurations respect, rather than disrespect, the URI owner's decision to use mixed-case in the spelling for resource identifiers. Here are some examples, for which you can try to guess the correct spelling, if you want, but these links fail [as of 2005-05-23] if you request a resource using a URI that disregards the case-sensitive spelling assigned the resource by the URI owner:
Web Architecture Good Practices
The Architecture of the World Wide Web, Volume One explains why "Avoiding URI aliases" and "Consistent URI usage" are both good practices. Web servers that capitulate to users' demands for an arbitrarily large number of (case-insensitive) aliases and inconsistent usage ("because case should not matter") arguably are not encouraging good practices. The rationale is presented in the Architecture document section 2.3.1. URI aliases. While the guidelines are articulated in terms of behavior by document designers/authors (URI owners) and agents as URI consumers, they are applicable to server behavior as well: "URI aliases are harmful when they divide the Web of related resources... The problem with aliases is that if half of the neighborhood points to one URI for a given resource, and the other half points to a second, different URI for that same resource, the neighborhood is divided. Not only is the aliased resource undervalued because of this split, the entire neighborhood of resources loses value because of the missing second-order relationships that should have existed among the referring resources by virtue of their references to the aliased resource..." [credits to Norm Walsh for citing the relevance of this passage]
Case Folding in Evaluation of IRIs
What happens when the URIs are IRIs? Hmmmm... I don't know (not completely understanding the significance of the Section 5.3.2.2 Note), but I would not count on the average web site administrator getting this right unless there are already publicly available resources (e.g., POSIX/Perl regex routines for IRI/URI rewriting engines). It's apparently tricky, as Section 5.1 declares: "Because IRIs exist to identify resources, presumably they should be considered equivalent when they identify the same resource. However, this definition of equivalence is not of much practical use, as there is no way for an implementation to compare two resources unless it has full knowledge or control of them... Even though it is possible to determine that two IRIs are equivalent, IRI comparison is not sufficient to determine whether two IRIs identify different resources..." Elliotte Harold wrote: "Going beyond ASCII and English, case insensitivity is very tricky. For instance the lower case of I is not the same in Turkey as it is in the United States. Ditto that the upper case of i is not the same in Turkey as it is in the United States. The upper case of é is different in Quebec and France. IRIs all get encoded as ASCII URIs; but would such URIs be recognized and would percent encoded letters be upper cased or lower cased? Both percent encoded ASCII and percent encoded non-ASCII?"
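Harold's two observations can be illustrated with a short sketch using only Python's standard library. It assumes UTF-8 percent-encoding (the default for `urllib.parse.quote`) and relies on the fact that Python's `str.lower()` applies locale-independent Unicode mappings; it is an illustration of the difficulty, not a recipe for IRI comparison.

```python
from urllib.parse import quote, unquote

# Default Unicode case mapping knows nothing about Turkish orthography,
# which pairs i with İ (U+0130) and dotless ı (U+0131) with I.
print("I".lower())              # 'i', where Turkish expects dotless 'ı'
print("\u0130".lower() == "i")  # False: İ lowers to 'i' plus a combining dot

# Percent-encoding hides the letters from a naive case fold.
upper_e = quote("É")            # '%C3%89'
lower_e = quote("é")            # '%C3%A9'
folded = upper_e.lower()        # '%c3%89': only the hex digits changed case

print(unquote(folded))            # still 'É'
print(folded == lower_e.lower())  # False: the two spellings never meet
```

A rewriting rule that simply lower-cases the request string would therefore treat ASCII and percent-encoded non-ASCII letters inconsistently.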
Contaminating Effects
The following scenario illustrates how server support for case-insensitive matching on URI references can lead to loss of interoperability, not to mention user confusion. Suppose your technical committee creates an XML specification which includes an XML schema, living canonically at http://www.example.com/QVML/2005/01/Proto/qv.xsd, with a declared namespace URI http://www.example.com/QVML/2005/01/Proto. But the host standards body for your TC has "helpfully" implemented server case-folding heuristics. Now, an influential book or web site incorrectly publicizes that the XML Schema lives at http://www.example.com/QVML/2005/01/proto/qv.xsd, and notes that a RDDL Namespace Document lives at http://www.example.com/QVML/2005/01/proto. Bogus versions of the XML schema emerge containing an incorrect namespace declaration: developers conclude that the namespace URI is http://www.example.com/QVML/2005/01/proto, because that's what these RDDL documents do. The error propagates silently but swiftly: since the web servers transparently resolve HTTP requests based upon this incorrect information, disinformation persists and spreads; you don't notice initially. Now: what breaks? XML catalogs, maybe? Which XML applications fail to interoperate? Document instances with corrupted namespace declarations proliferate. Which sets of applications interoperate with respect to processing malformed data instances, but are non-compliant with the TC's specification?
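One thing that breaks can be shown directly: XML processors compare namespace names as exact strings, so the canonical and the corrupted namespace URI name different vocabularies. The sketch below uses Python's standard `xml.etree.ElementTree`; the element name `Model` and the prefix `qv` are hypothetical.

```python
import xml.etree.ElementTree as ET

CANONICAL_NS = "http://www.example.com/QVML/2005/01/Proto"
CORRUPTED_NS = "http://www.example.com/QVML/2005/01/proto"


def root_tag(namespace):
    """Parse a one-element instance and return its namespace-qualified tag."""
    doc = '<qv:Model xmlns:qv="%s"/>' % namespace
    return ET.fromstring(doc).tag


good = root_tag(CANONICAL_NS)
bad = root_tag(CORRUPTED_NS)

print(good)         # {http://www.example.com/QVML/2005/01/Proto}Model
print(bad)          # {http://www.example.com/QVML/2005/01/proto}Model
print(good == bad)  # False: software written against one will not match the other
```

The server may happily deliver both spellings, but any namespace-aware application downstream sees two unrelated element types.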
A case similar to that given above involves the spread of an error in a filename spelling, rather than in the upper parts of the path hierarchy. If a server silently resolves the URI given as http://example.com/Schemas/PLML.xsd, the user who fetched the schema under this URI will be invited by a web browser to "SaveAs" PLML.xsd. The user then creates XML instances which use this local XML schema filename, and they nominally validate. A different person using the draft example files discovers that the sample instances fail in an application, and thinks an error has crept into the namespace in the schema file, which does not match the schema filename (dang it!) or instance spellings — and so "corrects" it case-wise to 'PLML'. Only problem: the real namespace is lower-case 'plml'. This kind of incorrect correction is attested through scribal history in manuscript transmission: a scribe "corrects" an apparent error to an incorrect (corrupted) but plausibly correct exemplar. In this sample instance involving PLML/plml, the web server configured to transparently deliver the schema requested under an incorrect URI (case spelling) seeded the chain of corruptions.
Surrendering Control Over Your Name
Most people will take offense at reckless misspelling of their personal name, and will not tolerate confusion that would come from allowing an arbitrary number of variant spellings of their name in public documents. Why would you want to surrender control over a URI you own? Server support for case folding allows users (worldwide) to create and publish arbitrary variant spellings for (canonically) case-sensitive URIs, with impunity. Even deliberately, with malice. The URI owner, who cannot prohibit the publication of unauthorized and possibly undesirable variant spellings, then loses control over his/her ability to effect stability in the naming orthography. URI stability may be critical for a variety of reasons, some unanticipated.
Identity of an Identifier
From a philosophical perspective, the power of naming derives from the ability to discriminate in a manner sufficient to allow unique identification (identity), whether of a class or an instance in a class. According to this model, identifiers express identity not only for the (abstract/concrete) object signified, but recursively, within themselves, through unique naming: identity of the identifier. To forfeit the right to identity in the expression of an identifier (colloquial: "case does not matter") is to forfeit a core principle. One does not stand up and shout "What...??!" in a baseball stadium when a random idiot screams out "Hey there, buttface!" URI aliasing needs to consider the consequences of surrendering the identity of the identifier by saying "OK, yeah, I'm not buttface, but I think I know what you're asking for, so here, happy to oblige... go ahead and tell the world I answer to 'buttface' as one of many vulgar names, and hell, I don't even know if I have a real name or not, probably not..."
Conclusion: Millions of currently maintained resources use mixed case in the path and query portions of URIs, and in fragment components. We could argue that use of mixed case in URIs is bad practice, but many projects have made this choice and defended it on the basis of concern for usability and semantic clarity. In the end, whether we think mixed case is good or bad is a moot point; it's there in URIs. What should servers do?
This memo argues that servers should not be configured to use case-insensitive string matching on a URI request and then (if successful in finding an approximate match) transparently deliver the resource to the agent. If there is no exact case-sensitive match, one reasonable server response other than returning HTTP status code 404 might be to implement HTTP 1.1 code "300 Multiple Choices" in such a way as to prompt the user/agent with suggestions about "near matches", requiring (?) however that the agent not automatically GET (one of) the possible candidate URI(s). As mentioned in a note, W3C servers sometimes behave in this fashion when a case-munged URI is sent in a GET request. Requiring that a human intervene to initiate a successive fetch of the resource represents one minimal protection against the silent proliferation of erroneous URIs.
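The policy argued for here can be sketched as a small lookup function: deliver only on an exact match, answer a case-only mismatch with 300 and a list of candidates, and otherwise return 404. The function name `resolve` and the set of published paths are hypothetical, and the near-match test assumes ASCII paths; this is not a description of how any actual server is implemented.

```python
from http import HTTPStatus


def resolve(requested_path, published_paths):
    """Return (status, candidates) without ever silently serving a near match."""
    # Exact, case-sensitive match: serve the resource.
    if requested_path in published_paths:
        return HTTPStatus.OK, [requested_path]

    # Case-only mismatch: suggest candidates, but do not deliver any of them.
    folded = requested_path.lower()
    near = sorted(p for p in published_paths if p.lower() == folded)
    if near:
        return HTTPStatus.MULTIPLE_CHOICES, near

    return HTTPStatus.NOT_FOUND, []


published = {"/QVML/2005/01/Proto/qv.xsd", "/FAQ"}

print(resolve("/FAQ", published))    # 200 with the canonical path
print(resolve("/faq", published))    # 300 with ['/FAQ'] as a suggestion
print(resolve("/about", published))  # 404 with no candidates
```

The caller still learns the canonical spelling, but a second, deliberate request is needed to fetch it, so the wrong spelling never appears to "work".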
Bob DuCharme adds a note about case-folding in relation to the heightened interest in IRIs: "It's ironic that people would take automatic case mapping in URIs seriously now that people are taking IRIs more seriously. The fact that the upper-case of école is ÉCOLE in Paris and ECOLE in Montreal should tell people something..." In January 2005, Internationalized Resource Identifiers (IRIs) became a new RFC in the IETF Standards Track. It is a major achievement on the road to a truly "World Wide" Web that supports the languages and writing systems of countries that do not (natively) use Roman alphabets and scripts. The IRI specification "defines a new protocol element called Internationalized Resource Identifier (IRI) by extending the syntax of URIs to a much wider repertoire of characters. It also defines 'internationalized' versions corresponding to other constructs from RFC 3986, such as URI references."
Elliotte Harold writes to make the important observation that direct typing of URLs in browser address windows is becoming less and less frequent a practice: "Users almost never type in full URLs for anything but the home page, if that. And the home page is case insensitive because the scheme and the host name are case insensitive. For all other URLs, users will (1) ask Google; (2) choose a bookmark; (3) type in part of the URL and let the browser auto-complete it; (4) follow a link; (5) copy and paste. The number of times users actually type in full URLs like http://xml.coverpages.org/caseIgnorance.html is small and decreasing. Google and auto-complete were probably the last nails in that particular coffin. Case-sensitive URLs are the last millennium's problem; they really aren't an issue in 2005..."
This section is file.html#Foo; not the same as section file.html#foo.
Warning: No need to waste time reading this section; the important arguments were made in the first section. Proceed at your own risk...
<sarcasm setTo="yes" /> "You know, spelling rules are a big pain in the butt, especially when it comes to remembering what to type in a browser address box for a URL. We need to change all web server behaviors so that, at a minimum, it never matters whether you type a capital or a lower-case letter. Wouldn't that be a lot simpler?
The requirements for perfect spelling are so fascist: why should it matter? Probably we should get rid of capital letters, or better yet, revert to computing practices of the paper-tape era, when EVERYTHING, INCLUDING SMALL FUNCTION WORDS LIKE "AND" AND "OR" WERE REPRESENTED ONLY BY UPPER CASE LETTERS, ALONG WITH PROPER NOUNS. JUST THINK HOW MUCH EASIER IT WOULD BE IF WE DIDN'T HAVE TO PAY ATTENTION TO DIFFERENCES BETWEEN UPPER CASE AND LOWER CASE LETTERS. THIS WOULD PROBABLY IRRITATE THE GERMANS, WHO TEND TO CAPITALIZE NOUNS (BECAUSE GERMAN USES ORTHOGRAPHY IN WRITTEN LANGUAGE FOR WORD DIFFERENTIATION), BUT MAYBE THAT'S A GOOD THING, JUST TO GET EVEN. ;-)
Come to think of it, getting even with the Wiki (that's WIKI) developers would be a good idea. Who can abide all this ugly CamelCaseWriting anyhow? We need to train people to believe that correct spelling in URIs does not matter, so that we can punish the WIKI-people who think it does matter. Here's how: We note that the canonical URI for the Atom syntax web site WIKI is: http://www.intertwingly.net/wiki/pie/FrontPage. We train users to believe that case-sensitivity in URIs is stupid, and to expect that enlightened web sites indeed will allow any case whatsoever. These users will then be infuriated at resources like the Atom WIKI! When they type in http://www.intertwingly.net/wiki/pie/frontpage, they will have one kind of bad experience: "Forbidden to you, you don't have permission to access /wiki/pie/frontpage on this server, according to Apache/2.0.46 (Red Hat) Server at intertwingly.net on Port 80; go away nasty person, Atom WIKI hates you!" Go away fool: "go and boil your bottom, son of a silly person. Your mother was a hamster and your father smelt of elderberries!" Not much better luck when they try http://www.intertwingly.net/wiki/pie/frontPage; just a different kind of bad experience. So: by this means, we will stamp out all stupid web sites that insist on fascist, throw-back exact spelling rules: enlightened users will just not put up with them.
These people who want to insist on correct spelling in URIs are the same bunch of anal types who think it's way, so wrong to make gratuitous use of the apostrophe to form plurals of English words. Why should it matter if we say two day's ago or two days ago, OR Three organization's are participating or Three organizations are participating? Everybody knows what you mean, so who cares? DTD's or DTDs; schema's or schemas; Nut's for sale! or Nuts for sale! — who cares about spelling perfection? These spelling freak's who rite about correct plural's and propur formashun's of currekt akronim's jest dont git it.
Application context. In the context of this document, "we" refers specifically to OASIS, but my concern is broader. OASIS (draft) guidelines for naming artifacts allow the use of mixed case, and indeed, many of the OASIS technical committees use camel case (UCC, LCC) for naming XML schema components and for constructing filenames. Filenames, of course, are incorporated into URIs. Technical committee "artifacts" include requirements documents, prose technical specifications, XML schema definitions, DTDs, attribute identifiers, profile identifiers, (probably also XML catalogs, catalog entries) and other resources which are often assigned URLs as well as URNs. Maybe soon IRIs.
URI components in scope. This discussion is not related to the scheme portion of a URI, which is treated case-insensitively, nor to the authority portion of a URI (userinfo subcomponent, host, port), though case-mucking with userinfo would probably be a bad idea. The discussion relates chiefly to disrespect for case-sensitivity in the path, query, and fragment components of a URI, each of which may have special considerations. See for convenience the hyperlinked copy of RFC 3986: (a) URI Syntax Components, 3.3 Path, 3.4 Query, and 3.5 Fragment.
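The split between case-insensitive and case-sensitive components is visible in Python's standard `urllib.parse.urlsplit`, which lower-cases the scheme and the `hostname` attribute but leaves path, query, and fragment exactly as given. The query string `Rev=A` in the sample URI is a hypothetical addition for illustration.

```python
from urllib.parse import urlsplit

parts = urlsplit("HTTP://WWW.Example.COM/QVML/2005/01/Proto/qv.xsd?Rev=A#Foo")

# Components treated case-insensitively are normalized by the parser.
print(parts.scheme)    # http
print(parts.hostname)  # www.example.com

# Components in scope for this discussion keep their spelling.
print(parts.path)      # /QVML/2005/01/Proto/qv.xsd
print(parts.query)     # Rev=A
print(parts.fragment)  # Foo
```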
URIs that are names as well as locators. As noted in Uniform Resource Identifier (URI): Generic Syntax, "Instances of URIs from any given scheme may have the characteristics of names or locators or both, often depending on the persistence and care in the assignment of identifiers by the naming authority..."
Promiscuous spellings corrupt names and reduce interoperability. The mechanism whereby official, canonical identifiers lose integrity, leading to loss of interoperability, is illustrated in an example above. Thus, I only partly accept what Jakob Nielsen said: he wrote in his Alertbox "URL as UI" (March 21, 1999): "do not use MiXeD case text in URLs since people can't remember the difference between upper-case and lower-case characters: all-lowercase URLs are usually preferred. Domain names are less of a problem since they are case-insensitive; usability would increase if webservers would ignore case in resolving URLs." Forget whether using mixed case is a bad idea or not: the reality is that mission-critical XML applications do use case-sensitive names (obviously) and derived URIs for identifying formally-defined machine-readable components. In this use case, usability will not increase if web servers ignore case in resolving URLs. Just the opposite.
NDR specifications. On case-sensitivity in the naming guidelines documents, see the summary XML Naming and Design Rules Specifications Published by OASIS, UN/CEFACT, and Navy CIO; these specifications follow ebXML, as explained in the UBL NDR: "The ebXML architecture document specifies a standard use of upper and lower camel case for expressing XML elements and attributes respectively. UBL will adhere to the ebXML standard. Specifically, UBL element and type names will be in UpperCamelCase (UCC). [GNR8] The UpperCamelCase (UCC) convention MUST be used for naming elements and types. UBL attribute names will be in lowerCamelCase (LCC). [GNR9] The lowerCamelCase (LCC) convention MUST be used for naming attributes..." UBL XSD schemas are implementations of the document assembly models defined by UBL, generated by transformation rules; for UBL customization, schema generation should be compliant with the UBL Naming and Design Rules to promote compatibility with other UBL component libraries.
In the UBL 1.0 Standard, spelling consistency is reinforced through the use of identical case-sensitive spelling in diverse identifier contexts. The UBL "Order Response Simple" document type, for example, uses the closed compound case-sensitive string OrderResponseSimple in at least ten (10) key contexts, including the XML Schema filename (UBL-OrderResponseSimple-1.0.xsd), the XML schema namespace declaration (in part xsd:OrderResponseSimple-1.0), the XML root element name (OrderResponseSimple), the XML type (OrderResponseSimpleType), a directory for formatting specification examples (/fs/OrderResponseSimple/), UML diagram filenames (UBL-OrderResponseSimpleImplementationDiagram-1.0.gif), formatting specification filenames (FS-UBL-OrderResponseSimple.html), spreadsheet model filenames (UBL-OrderResponseSimple-1.0.xls), XML instance examples (FS-UBL-OrderResponseSimple.xml, OfficeOrderResponseSimple.xml), assembly model diagram filenames (UBL-1.0-OrderResponseSimpleDocumentAssembly.jpg), etc. See the UBL 1.0 specification file list.
Canonical URIs for some of these instance occurrences (canonical URIs produced from filenames) are:
The UBL architecture and methodology support automatic filename generation for XML schemas, documentation files, and other resources. Appendix B of the UBL 1.0 Standard describes composition rules used to produce UBL Schemas. From B.5 Schema Generation: "The UBL 1.0 XSD schemas are the output of a transformation that applies schema construction rules to the Data Model represented by the UBL spreadsheets described in B.3 [Document Assembly Models: To define different types of documents, the components described in 'Component Model' are assembled into hierarchical structures based on the requirements of the context and the metadata requirements of the Core Components Technical Specification (CCTS). Document assembly starts with the definition of each of the business documents comprising UBL 1.0 as an Aggregate BIE (object class) for the document type. All the other Aggregate BIEs (object classes) for the document type are derived by traversing the associations from this Aggregate BIE to form the required hierarchy.] The transformation process consists of the following steps: (1) Reading in the data model spreadsheets; (2) Building from each spreadsheet an internal UML-based model; (3) Identifying external standards for code lists and including standard code list values as appropriate; (4) Applying UBL Naming and Design Rules, where formulas are applied to naming rules; (5) Outputting conformant XSD schemas." See also: (1) UBL naming based upon ebXML rules and (2) UBL schemas and other components.
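The consequence of rule-based generation is that one case-sensitive stem feeds many derived names. The sketch below is not the UBL transformation; it is a hypothetical illustration whose templates are inferred only from the OrderResponseSimple filenames quoted in this note.

```python
def derived_names(stem, version="1.0"):
    """Compose artifact names from one stem (templates are illustrative only)."""
    return {
        "schema_file": "UBL-%s-%s.xsd" % (stem, version),
        "root_element": stem,
        "root_type": "%sType" % stem,
        "spreadsheet": "UBL-%s-%s.xls" % (stem, version),
        "formatting_spec": "FS-UBL-%s.html" % stem,
    }


canonical = derived_names("OrderResponseSimple")
folded = derived_names("orderresponsesimple")

print(canonical["schema_file"])  # UBL-OrderResponseSimple-1.0.xsd
print(canonical["root_type"])    # OrderResponseSimpleType

# A single case error in the stem changes every derived name at once.
print([key for key in canonical if canonical[key] != folded[key]])
```

Because the same string recurs in filenames, element names, and type names, a server that blesses a misspelled filename invites the misspelling into all the other contexts.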
Case matching in IRIs. On the possible complications of case-insensitive matching with IRIs, see Section "5.3.2.2. Character Normalization" in the RFC: "Note: Because it is unknown how a particular sequence of characters is being treated with respect to character normalization, it would be inappropriate to allow third parties to normalize an IRI arbitrarily. This does not contradict the recommendation that when a resource is created, its IRI should be as character normalized as possible (i.e., NFC or even NFKC). This is similar to the uppercase/lowercase problems. Some parts of a URI are case insensitive (domain name). For others, it is unclear whether they are case sensitive, case insensitive, or something in between (e.g., case sensitive, but with a multiple choice selection if the wrong case is used, instead of a direct negative result). The best recipe is that the creator use a reasonable capitalization and, when transferring the URI, capitalization never be changed..."
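The RFC's comparison of character normalization with the uppercase/lowercase problem can be made concrete with Python's standard `unicodedata` and `urllib.parse` modules. The sample word is borrowed from the école example quoted elsewhere in this document; UTF-8 percent-encoding is assumed.

```python
import unicodedata
from urllib.parse import quote

composed = "\u00e9cole"     # é as one code point (NFC form)
decomposed = "e\u0301cole"  # e followed by a combining acute accent

# The two look identical when displayed but are different strings...
print(composed == decomposed)  # False

# ...and they map to different ASCII URIs.
print(quote(composed))         # %C3%A9cole
print(quote(decomposed))       # e%CC%81cole

# Normalizing at creation time, as the RFC recommends, removes the ambiguity.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

As with case, the safe course is for the creator to choose one form and for everyone downstream to transfer the identifier unchanged.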
UBL schemas and other components. The OASIS UBL specification includes an elaborate architecture which supports component maintenance and schema evolution based upon strict naming rules and generation of component names according to composition rules. As mentioned in the NDR note, UBL follows ebXML in the use of both UCC (upper camel case) and LCC (lower camel case), concerned for semantic perspicuity in the compound names and for human readability of composed compounded names. Here are some sample filenames, which are appended to mixed-case PATH subcomponents as part of the hierarchical portion of the resulting URIs. Most readers will find the 'Case-folded' examples hard to read, and the actual 'Case-sensitive' exemplars tolerable. Of course, the design could have opted for the use of hyphen to mark juncture rather than the use of camel case and closed compounds; that's not what UBL did for codes and related constructs.
Case-folded, not what UBL does:
ubl-codelist-allowancechargereasoncode-1.0.xsd
ubl-codelist-acknowledgementresponsecode-1.0.xsd
ubl-codelist-countryidentificationcode-1.0.xsd
ubl-codelist-currencycode-1.0.xsd
ubl-codelist-documentstatuscode-1.0.xsd
ubl-codelist-latitudedirectioncode-1.0.xsd
ubl-codelist-linestatuscode-1.0.xsd
ubl-codelist-longitudedirectioncode-1.0.xsd
ubl-codelist-operatorcode-1.0.xsd
ubl-codelist-paymentmeanscode-1.0.xsd
ubl-codelist-substitutionstatuscode-1.0.xsd
ubl-1.0-orderresponsesimpledocumentassembly.jpg
Case-sensitive, what UBL actually does:
UBL-CodeList-AllowanceChargeReasonCode-1.0.xsd
UBL-CodeList-AcknowledgementResponseCode-1.0.xsd
UBL-CodeList-CountryIdentificationCode-1.0.xsd
UBL-CodeList-CurrencyCode-1.0.xsd
UBL-CodeList-DocumentStatusCode-1.0.xsd
UBL-CodeList-LatitudeDirectionCode-1.0.xsd
UBL-CodeList-LineStatusCode-1.0.xsd
UBL-CodeList-LongitudeDirectionCode-1.0.xsd
UBL-CodeList-OperatorCode-1.0.xsd
UBL-CodeList-PaymentMeansCode-1.0.xsd
UBL-CodeList-SubstitutionStatusCode-1.0.xsd
UBL-1.0-OrderResponseSimpleDocumentAssembly.jpg
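The readability difference between the two lists is also an information difference: camel case marks word junctures, and case folding destroys them. A minimal sketch, assuming simple UpperCamelCase compounds (the pattern below deliberately ignores all-capital runs such as "UBL"):

```python
import re


def words(name):
    """Split a simple UpperCamelCase compound into its component words."""
    return re.findall(r"[A-Z][a-z]+", name)


print(words("AllowanceChargeReasonCode"))  # ['Allowance', 'Charge', 'Reason', 'Code']
print(words("allowancechargereasoncode"))  # []: the junctures are gone
```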
UBL naming based upon ebXML rules. Mixed case for XML components as used in UBL and clarified in the Universal Business Language (UBL) Naming and Design Rules Standard is based upon principles of readability, visualization mechanisms, and semantic clarity. Excerpt from the UBL NDR: "XML is case sensitive. Consistency in the use of case for a specific XML component (element, attribute, type) is essential to ensure every occurrence of a component is treated as the same. This is especially true in a business-based data-centric environment such as what is being addressed by UBL. Additionally, the use of visualization mechanisms such as capitalization techniques assist in ease of readability and ensure consistency in application and semantic clarity. The ebXML architecture document specifies a standard use of upper and lower camel case for expressing XML elements and attributes respectively. UBL will adhere to the ebXML standard. Specifically, UBL element and type names will be in UpperCamelCase (UCC)." For ebXML, see "4.3 Design conventions for ebXML specifications" in Technical Architecture Specification Version 1.0.4, UN/CEFACT and OASIS 2001. "ebXML DTD, XML Schema and XML instance documents SHALL have the effect of producing ebXML XML instance documents..." Details in UBL Naming and Design Rules Checklist, published as part of the UBL Version 1.0 Standard.
OASIS servers behave in different ways. For documents listed as OASIS Standards, server response to an erroneous mixed-case spelling is inconsistent, but predictable depending upon whether the document is served "directly" from the UNIX file system or via a PHP script ('download.php'). In the former case ('docs.oasis-open.org' subdomain, uddi.org, other oasis-open.org[docs] path), incorrect spelling for the URI path components will yield a status code 404 or equivalent. In the latter case, the PHP implementation is programmed to serve the document if the erroneous URI reference involves a UC/LC case error. This paper argues that the PHP script behavior is ultimately detrimental to data QA and interoperability. Examples, tested 2005-05-28:
Standards not served using PHP: case error yields 404 or equivalent
DSML 2.0: http://www.oasis-open.org/committees/dsml/docs/dsmlV2.xsd [try this]
DocBook 4.1: http://www.oasis-open.org/docbook/sgml/4.1/changelog [try this]
ebXML RIM 2.0: http://www.oasis-open.org/committees/regrep/documents/2.0/specs/ebRIM.pdf [try this]
ebXML RS 2.0: http://www.oasis-open.org/committees/regrep/documents/2.0/specs/ebRS.pdf [try this]
SAML 1.0: http://www.oasis-open.org/committees/security/docs/Draft-SSTC-SAML-Reqs-01.pdf [try this]
SAML 2.0: http://docs.oasis-open.org/security/saml/v2.0/SAML-core-2.0-os.pdf [try this]
SAML Token Profile 1.0: http://docs.oasis-open.org/wss/OASIS-wss-saml-token-profile-1.0.pdf [try this]
UBL 1.0: http://docs.oasis-open.org/ubl/cd-ubl-1.0/ [try this]
UDDI: http://uddi.org/pubs/ProgrammersAPI-v2.04-Published-20020719.pdf [try this]
WSDM-MOWS 1.0: http://docs.oasis-open.org/wsdm/2004/12/WSDM-1.0.zip [try this]
WS-Security 2004: http://docs.oasis-open.org/wss/2004/01/Oasis-200401-wss-soap-message-security-1.0.pdf [try this]
XACML 2.0: http://docs.oasis-open.org/xacml/2.0/XACML-2.0-OS-Normative.zip [try this]
Standards served using PHP: case error does not matter for these incorrect URIs [Update 2005-07]
AVDL 1.0: http://www.oasis-open.org/committees/download.php/7145/AVDL%20Specification%20v1.pdf
CAP 1.0: http://www.oasis-open.org/committees/download.php/6334/OASIS-200402-CAP-Core-1.0.pdf
OpenDocument 1.0: http://www.oasis-open.org/committees/download.php/12569/OpenDocument-strict-schema-V1.0-os.rng
UBL NDR 1.0: http://www.oasis-open.org/committees/download.php/10323/cd-UBL-NDR-1.0Rev1C.pdf
WSRP 1.0: http://www.oasis-open.org/committees/download.php/3343/OASIS-200304-WSRP-Specification-1.0.pdf
200507 Update on OASIS server
On July 28, 2005 it was revealed (to those who did not already know) that the behavior of the OASIS server in ignoring case was just a deceptive subset of a more questionable behavior: honoring URI requests that contain utter gibberish. TC members noted that a (canonical) URL for the WS Reliable Messaging Working Issues List advertised as http://www.oasis-open.org/apps/org/workgroup/ws-rx/download.php/13758/ReliableMessagingIssues.xml nominates a resource that's available from the OASIS server under an apparently infinite number of different (and nonsense) URIs, including:
The last example URI illustrates why it's a bad idea to allow the use of the UNDERSCORE character in filenames: web browsers are typically configured to underline hyperlinks, so when a human simply looks at this URI as displayed, s/he cannot tell whether the "blank" characters here displayed between ##m and ed are UNDERSCORE or SPACE. Of course, one might conclude that they must be UNDERSCORE "because [one reasons] SPACE is not recommended in filenames". That's faulty reasoning, however, because OASIS TC Members actually [are estimated to] use SPACE in TC document filenames more often than they use UNDERSCORE. Various methods can be used in a web browser to disambiguate UNDERSCORE vs. SPACE in a displayed HTML hyperlink, but when documents are printed on paper or (often) in PDF format, the user is unable to determine whether any underlined blank character is UNDERSCORE or SPACE. Sometimes, OASIS documents use both UNDERSCORE and SPACE in the same filename. One characteristic of a high-quality URI is that it's unambiguous in all contexts: scrawled on a paper napkin in a pub; printed by inkjet on paper; displayed on a computer screen; etched in marble. Because there are no universal cultural conventions for writing SPACE/UNDERSCORE appearing as a blank, the ambiguity is inherent. As a matter of data integrity (this document argues), it's important that identifiers be stable and that servers process URI requests for resources in a manner that reinforces integrity of canonical spellings for URIs. That the OASIS (Kavi) collaboration tool in conjunction with the server violates this principle (promiscuous in the extreme) is no valid argument to the contrary.
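The SPACE/UNDERSCORE ambiguity described above can be checked for mechanically before a URI is published. The following is only a minimal sketch: the function name ambiguous_blanks is hypothetical, and it assumes that the only characters of interest are SPACE (usually percent-encoded as %20) and UNDERSCORE in the path portion of the URI.

```python
from urllib.parse import unquote, urlsplit


def ambiguous_blanks(uri):
    """Return the blank-looking characters found in the URI path.

    Hypothetical helper: reports SPACE and/or UNDERSCORE, the two
    characters that look alike in an underlined hyperlink.
    """
    # Decode percent-escapes so that %20 is seen as a literal SPACE.
    path = unquote(urlsplit(uri).path)
    return sorted({ch for ch in path if ch in "_ "})


# A URI from the list above that contains encoded SPACE characters.
avdl = ("http://www.oasis-open.org/committees/download.php/7145/"
        "AVDL%20Specification%20v1.pdf")
print(ambiguous_blanks(avdl))
```

A filename that uses both characters would yield both in the result, which is the worst case for a reader looking at a printed, underlined link.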
W3C servers behave in different ways. In some cases, sending a request with inexact (case-sensitive) spelling returns a 404. In other cases, especially in the case of prominent, often used URIs, the W3C servers match on the URI reference case-insensitively and return a document if a LC/UC variant qualifies. For example, if we lower-case all UC characters in an attempt to GET http://www.w3.org/2001/07/REC-SMIL20-20010731-errata, we get back a polite failure message that suggests possible matches, implemented with HTTP/1.1 status code 300: "Multiple Choices: The document name you requested (/2001/07/rec-smil20-20010731-errata) could not be found on this server. However, we found documents with names similar to the one you requested. Available documents: [nominations are provided]..." If we lower-case all UC characters in http://www.w3.org/WAI/, we get the document via redirection and an HTTP/1.1 status code 301 (Moved Permanently). I have yet to determine what policy is followed at W3C, and what principles were used to design the policy.
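The three responses observed above (404, 300 with suggestions, 301 redirect) can be modeled in a few lines. This is a sketch under stated assumptions, not a description of how the W3C servers are actually configured: the function respond and the policy names "strict", "suggest", and "redirect" are invented here for illustration.

```python
from http import HTTPStatus


def respond(requested, canonical_paths, policy="strict"):
    """Return (status, paths) for a requested path.

    Hypothetical policies:
      "strict"   -> only an exact, case-sensitive match succeeds
      "suggest"  -> case-insensitive matches are listed with a 300
      "redirect" -> a unique case-insensitive match yields a 301
    """
    if requested in canonical_paths:
        return HTTPStatus.OK, [requested]

    # Collect canonical paths that differ from the request only by case.
    matches = sorted(p for p in canonical_paths
                     if p.lower() == requested.lower())

    if not matches or policy == "strict":
        return HTTPStatus.NOT_FOUND, []
    if policy == "redirect" and len(matches) == 1:
        return HTTPStatus.MOVED_PERMANENTLY, matches
    # Otherwise decline to guess and let the client pick a spelling.
    return HTTPStatus.MULTIPLE_CHOICES, matches


paths = {"/WAI/", "/2001/07/REC-SMIL20-20010731-errata"}
print(respond("/wai/", paths, policy="redirect"))
print(respond("/2001/07/rec-smil20-20010731-errata", paths, policy="suggest"))
print(respond("/wai/", paths))
```

Note that both the "suggest" and "redirect" policies still expose the canonical spelling to the client, unlike a server that silently ships the resource under whatever spelling was requested.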
RDDL as a Namespace Document. "It is the consensus of the [W3C] TAG that RDDL is a suitable format for use as a 'Namespace Document', that is to say as a representation yielded by dereferencing a URI in use as an XML Namespace Name." See "Resource Directory Description Language (RDDL) 2.0."
This document is not an official part of the Cover Pages web site, and may not represent the interests of anyone other than the author. It is an experimental opinion piece, for which feedback is requested. Please send email with your critique, corrections, suggestions for improvement, and use cases for/against the practice of instructing servers to ignore case. Being completely neutral about the matter, I am especially interested in use cases illustrating the deleterious effects of case-insensitive matching on URIs.
The canonical URI for this document is http://xml.coverpages.org/caseIgnorance.html, featuring one obligatory upper-case I. Content is brought to you by a Netscape-Enterprise/4.1 server configured to respect case in URIs. No URI aliases are provided, though you could create an arbitrary number of them using redirect hacks like those provided by tinyurl.com.