Kavi Document Repository: Examples of Filenames using the SPACE Character and Other Cruft

Date: 2005-08-02. Random extracts from the OASIS TCs' document repositories (Kavi).

Context for this display document: deleterious effects of using the UNDERSCORE character in filenames/URIs. Just because computing systems allow one to create filenames containing control characters and other non-displayable characters does not mean it's a good idea.

Promiscuous use of SPACE in filenames exacerbates the problem of ambiguity when SPACE (0020) and SPACING UNDERSCORE (005F) appear in the resulting URIs: (X)HTML hyperlinks are typically underlined in screen displays and in paper print renditions, so that both SPACE and UNDERSCORE appear as blank characters.

When such URIs are printed on paper and (often) in PDF format, the reader cannot determine whether any given blank character is SPACE or UNDERSCORE, at least not without appeal to other information sources (which may not be available) or experimentation. An incorrect assumption or guess may lead to the publication and propagation of an erroneous URI. More critically, a human reader who cannot tell SPACE from UNDERSCORE in a paper-print or PDF representation may be unable to reach an emergency- or safety-related online resource in the short time available when that information is urgently needed.

Concern for perspicuity of data, human safety, and information integrity (preservation of the exact spelling of a URI) dictates that users prevent this ambiguity and potential corruption by NOT including SPACE or UNDERSCORE in filenames that will become visible, and thus ambiguous, as URIs.


In the following examples, the reader will observe (mouse-over the links) a mixture of SPACEs and UNDERSCOREs, along with other inadvisable crufty filename (URI) characters like comma, at-sign (@), ampersand (&), left and right parentheses, tilde (~), pound-sign (#), dollar-sign ($), left and right square-brackets, plus-sign (+), colon, semicolon, etc. Sometimes, SPACE and UNDERSCORE are mixed in the same URI. These blank characters typically will be ambiguous when printed on paper or when printed in a PDF document, modulo device configurations.
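
A filename can be checked for these characters before it is uploaded. The following is only a minimal Python sketch, not a tool used by OASIS or Kavi: the names FLAGGED, flag_filename and mixes_space_and_underscore, and the sample filename, are hypothetical. The character table covers only the characters named in the paragraph above, and the sketch assumes that "non-displayable" can be approximated by Python's str.isprintable().

```python
# Characters named above as inadvisable in filenames that become URIs.
# The table is illustrative, not an exhaustive or official list.
FLAGGED = {
    " ": "SPACE",
    "_": "UNDERSCORE",
    ",": "comma",
    "@": "at-sign",
    "&": "ampersand",
    "(": "left parenthesis",
    ")": "right parenthesis",
    "~": "tilde",
    "#": "pound-sign",
    "$": "dollar-sign",
    "[": "left square-bracket",
    "]": "right square-bracket",
    "+": "plus-sign",
    ":": "colon",
    ";": "semicolon",
}


def flag_filename(name):
    """Return (position, description) pairs for each flagged character in name."""
    problems = []
    for position, ch in enumerate(name):
        if ch in FLAGGED:
            problems.append((position, FLAGGED[ch]))
        elif not ch.isprintable():
            # Assumption: control and other non-displayable characters
            # are approximated by str.isprintable() returning False.
            problems.append((position, "non-displayable U+%04X" % ord(ch)))
    return problems


def mixes_space_and_underscore(name):
    """True when SPACE and UNDERSCORE both occur in name (the ambiguous case)."""
    return " " in name and "_" in name


if __name__ == "__main__":
    sample = "Meeting Notes_v2 (draft).doc"  # hypothetical filename
    for position, description in flag_filename(sample):
        print(position, description)
    print("mixed:", mixes_space_and_underscore(sample))
```

The sketch only reports what it finds. What to substitute for a flagged character is a separate decision that this document does not prescribe.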


Comments to Robin Cover — here not speaking officially for OASIS.