Toward a Standardized Format for ASCII Text Documents
A Working Paper of
The ICADD Subcommittee on Standardization of ASCII Text
Documents
Prepared at the Trace Research and Development Center
Gregg C. Vanderheiden, Ph.D.
Neal Ewers
Keywords: Document access, ASCII, text documents, standard,
print disabilities, alternate formats, braille
DRAFT
Table of Contents
1. The Need for A Standard Electronic Format for Electronic ASCII Text Files
1.a. The Need
1.b. Current ASCII Format
1.c. Requirements of the New ASCII Format
2. Overall Goal
3. Formal versus Informal Documents
3.a. Type 1 -- Informal Documents (ICADD-0 Format)
3.b. Type 2 -- Informal Documents (ICADD-8 Format)
3.c. Type 3 -- Formal Documents (ICADD-22)
4. Specific Goals
5. Constraints
6. Proposed Format for Type 1 Documents
7. Proposed Format for Type 2 Documents (ICADD-8) Format
7.a. Tag Rationale
8. Request for Input
1. The Need for A Standard Electronic Format for Electronic
ASCII Text Files
1.a. The Need
Individuals who are blind or who have other print
disabilities have difficulty in accessing and effectively
using documents in print form. One approach to addressing
this is to provide the documents in electronic form.
Individuals using microcomputers and other electronic
reading aids can then access and have the information
presented to them in speech, braille, large text, or other
suitable form. Because of the large number of different
formats in which electronic text can be stored, specifying
that a document must be in "electronic form" will not
necessarily result in an electronic document which can in
fact be accessed or read. Some standard format which can be
read by all software is therefore necessary.
Unless a standard definition of an "ASCII text document" is
created, it will not be possible to create tools which can
easily work with these documents. Further, it is difficult
to specify that people must provide their information as an
ASCII text file if no definition as to exactly what that
means is provided.
1.b. Current ASCII Format
Currently, the most common format available is what might be
called an ASCII text file. This is a file which contains
only standard ASCII text characters (Table 1). To
accommodate foreign languages, this standard has been
extended by the International Organization for
Standardization (ISO), as shown in Table 2.
In either case, ASCII or ISO, the text file does not include
any formatting information. Thus, any information that was
encoded in an original document by using boldface,
underlining, italics, footnote designations, etc., is lost
in a document that is changed into ASCII text form. Since
the boldface, underlining, etc., may convey
important information, converting a document into a straight
ASCII file may in fact cause some important information to
be lost and therefore unavailable to the individual using
the ASCII text file.
1.c. Requirements of the New ASCII Format
One requirement of a standard ASCII text file format,
therefore, is that it provide some mechanism for preserving
essential formatting information that might otherwise be
lost.
A second requirement is that the standard must clearly
define how the ASCII text file would be formatted. For
example, is there a carriage return at the end of each line,
or only at the end of paragraphs? (Documents with carriage
returns only at the end of paragraphs cause a problem for
some screen reading programs.) If there is a carriage
return at the end of each line, how does one identify the
end of a paragraph, so that screen readers can read smoothly
across lines, but stop at the end of a paragraph?
Table 1: ASCII Characters
The ASCII value is listed to the left, and its corresponding
character to the right.
33 !
34 "
35 #
36 $
37 %
38 &
39 '
40 (
41 )
42 *
43 +
44 ,
45 -
46 .
47 /
48 0
49 1
50 2
51 3
52 4
53 5
54 6
55 7
56 8
57 9
58 :
59 ;
60 <
61 =
62 >
63 ?
64 @
65 A
66 B
67 C
68 D
69 E
70 F
71 G
72 H
73 I
74 J
75 K
76 L
77 M
78 N
79 O
80 P
81 Q
82 R
83 S
84 T
85 U
86 V
87 W
88 X
89 Y
90 Z
91 [
92 \
93 ]
94 ^
95 _
96 `
97 a
98 b
99 c
100 d
101 e
102 f
103 g
104 h
105 i
106 j
107 k
108 l
109 m
110 n
111 o
112 p
113 q
114 r
115 s
116 t
117 u
118 v
119 w
120 x
121 y
122 z
123 {
124 |
125 }
126 ~
127 (DEL, a control character; no printed form)
Table 2: ISO Characters
Table 2 will go here.
2. Overall Goal
The purpose of the ICADD ASCII Text Format Standard is to
provide a standard format for ASCII text documents. This
effort to define a standard ASCII text format is a subset of
the overall goals of the International Committee for
Accessible Document Design (ICADD). This group, which was
formed in 1992, has an overall scope of work which includes
both the development of a format for simple ASCII text
documents and the development of a standard for more formal
publications. The standard for more formal publications is
not covered in this subcommittee report.
3. Formal versus Informal Documents
Currently, the ICADD efforts cover three types of documents:
two informal and one formal.
3.a. Type 1 -- Informal Documents (ICADD-0 Format)
With the proliferation of computers, there has been a
corresponding increase in the number of letters, memos, and
other informal written communication which are prepared
using word processors rather than typewriters. This makes
it possible for a large amount of this material to be sent
to people as an ASCII text file when this is their
preference. Type 1 documents include all of those informal
documents where there is no formatting (boldface, italics,
footnotes, etc.) which is necessary to understand the
documents (or where the loss of boldface, italics, etc.,
would not alter the reader's ability to understand the
document). For this type of information, a very simple
ASCII Text Standard has been defined, and is described
below. It includes no formatting information, and does not
support the use of boldface, underlining, etc., in a
document.
3.b. Type 2 -- Informal Documents (ICADD-8 Format)
In addition to informal correspondence and documents, there
are also a number of other informal or semi-formal documents
and reports which are prepared using standard word
processors. In these documents, however, formatting (such
as boldface, italic, underline, etc.) is often used to
convey important information in the document. In addition,
these documents often contain footnotes, side-bars, or boxed
text which is interspersed with the running text of the
document. Converting these documents into simple ASCII text
files (without preserving the formatting information) can
cause both confusion and loss of information. Where text
formatting conveyed information, the information would be
lost. When footnotes, boxed text, or side-bars suddenly
appear intermixed with running text (without any type of
marker), the resulting text file can be very confusing and
even misleading. For these types of documents, a set of
eight tags is defined which allows users to mark common
attributes. Specifically, these tags allow the user to mark
boldface, italicized, or other emphasized text, as well as
to mark list items, footnotes, picture captions, side-bars
or boxed text, and page numbers. This Type 2 document format
is referred to as ICADD-8 and is described below.
3.c. Type 3 -- Formal Documents (ICADD-22)
The third type of document defined by the ICADD effort is
formal documents, including books, journals, and other
formal publications. Such documents can often contain
multiple sections or chapters as well as specially formatted
text. In addition, these documents may also include
equations, tables, columns, and other specially formatted
information. A set of 22 tags has been defined by ICADD to
allow these documents to be more effectively accessed and
read. In addition, further specialized tag sets are being
explored to handle scientific, mathematical, and
other types of specially formatted text. The purpose of
these tags is to allow special commercial document readers
to translate documents which are in the standard ICADD
format into documents that are structured for use in the
document reader. The result is a document which can be
accessed and used by a person with a print disability in a
manner which is both complete (contains all of the
information in the original) and efficient (allows rapid
movement about and within the text). Specifications for
Type 3 documents are provided in a separate document.
4. Specific Goals
This document outlines the current draft of the ICADD
specifications for Type 1 and Type 2 informal documents. It
is a first draft, and is being released so that persons with
print disabilities and others interested in this problem can
review it and offer input concerning the proposed
specifications. This document was prepared based upon
questionnaires answered by and conversations with members of
the ICADD subcommittee charged with arriving at the design
of ASCII formats. Members of this committee include:
Jim Allan
Texas School for the Blind
1100 W. 45th Street
Austin, TX 78756
512/454-8631
Internet: jallan@tenet.edu
Charles Crawford, Commissioner
Executive Office of Human Services
Commission for the Blind
Boston, MA 02111-2227
617/727-5550
Judith Dixon
Consumer Relations Office
National Library Service for the Blind and Physically
Handicapped
Library of Congress
Washington, DC 20542
202/707-5100
Internet: 74036.2101@Compuserve.com
Neal Ewers
Trace Research and Development Center
Room S-153, Waisman Center
1500 Highland Avenue
Madison, WI 53705
608/263-5485
fax 608/262-8848
John Hernandez
New York Institute for Special Education
9999 Pelham Parkway
Bronx, NY 10469
718/519-7000, extension 348
fax 718/231-9314
David Holladay
Raised Dot Computing
408 S. Baldwin Street
Madison, WI 53703
608/257-9595
Gregg Vanderheiden
Trace Research and Development Center
Room S-151 Waisman Center
1500 Highland Avenue
Madison, WI 53705
608/262-6966
Internet: vanderhe@macc.wisc.edu
fax 608/262-8848
5. Constraints
This section contains a listing of the constraints which a
standard format in this area must meet.
a) Any proposed guidelines must work easily on a wide
variety of computer platforms.
b) The guidelines must be easy to implement, even on the
most rudimentary word processor.
c) The guidelines should use terminology and strategies
which can be understood by any person responsible for
preparing documents in this format (secretaries, students,
etc.).
d) Each level of format should be internally consistent
with the higher level formats (e.g., Type 2 must be
consistent with Type 3).
6. Proposed Format for Type 1 Documents
This section presents the proposed format for Type 1
documents (ICADD-0 Format). A summary of the format
rules is presented, followed by a rationale for each of the
rules.
1) Text should be broken up into lines with hard
carriage returns at the end of each line.
2) Each line should be no longer than 78 characters.
(65 characters is preferable for documents which are short
and where short lines do not cause layout problems.)
3) There should be two carriage returns at the end of
each paragraph.
4) All titles within the document text should be
preceded by an extra carriage return (for a total of three
carriage returns) if they are not at the top of a page or of
the document.
5) All carriage returns should be followed by a line
feed character.
6) Text in an ICADD-ASCII formatted document is limited
to printable ASCII characters with codes between 33 and 127,
plus Space (32), Tab (9), Carriage Return / Line Feed (13,
10) and Form Feed or New Page (12). The basic characters
for 33 to 127 include (in order):
! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = >
? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \
] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z
{ | } ~
1) Text should be broken up into lines with hard
carriage returns at the end of each line.
Rationale: Some text readers are not able to scroll
past the end of a screen line. Thus, hard carriage returns
at the end of each line are necessary in order to keep these
programs from crashing.
Comment to Reviewers: The number of programs which
cannot handle text without carriage returns at the end of
each line is decreasing. Some people felt that we might
lean into the future on this, and not specify carriage
returns at the end of every line, since omitting them would
simplify some other aspects of document interpretation. Most of the people we talked
to, however, felt that many individuals trying to access
these ASCII text files are not yet using the more
sophisticated tools, and that at least for the foreseeable
future it was better to stick with the hard carriage return
on each line format. This is therefore included in the
current version of the format. Additional comments, pro and
con, are invited.
2) Each line should be no longer than 78 characters.
Rationale: Using an 80-character line can cause some
computer displays to automatically word-wrap after the 80th
character. If this is then followed by a carriage return,
it would result in all of the lines being double-spaced. A
78-character limit eliminates this problem. All modern
computers support an 80-character display. Thus, adhering
to this format would result in documents which display
without distortion on any standard screen. For printouts, a
78-character line set in 10-point Courier (12 characters per
inch) is 78 / 12 = 6.5" wide, so it prints on standard
8 1/2" x 11" paper with 1" margins. For documents which are short, and where short
lines will not create layout problems, a 65-character line
is more convenient for some users.
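As an illustration only (this is not part of the proposed format), the Python sketch below shows how a conversion tool might apply rules 1, 2, 3, and 5 together: each paragraph is wrapped at 78 characters with the standard library's `textwrap` module, every line ends with a hard carriage return and line feed, and paragraphs are separated by an empty line. The function name `to_icadd0` is hypothetical, and the input is assumed to be a list of paragraph strings.

```python
import textwrap

CRLF = "\r\n"  # rule 5: each carriage return is followed by a line feed


def to_icadd0(paragraphs, width=78):
    """Lay out paragraphs as ICADD-0 text.

    Each paragraph is wrapped to at most `width` characters per line
    (rule 2), every line ends with a hard CR/LF (rules 1 and 5), and
    paragraphs are separated by an empty line, which puts two CR/LF
    pairs with nothing between them at each paragraph end (rule 3).
    """
    blocks = []
    for paragraph in paragraphs:
        # textwrap breaks lines at whitespace; a single word longer than
        # `width` is split, because break_long_words defaults to True
        lines = textwrap.wrap(paragraph, width=width)
        blocks.append(CRLF.join(lines) + CRLF)
    # joining with one more CR/LF yields the blank line between paragraphs
    return CRLF.join(blocks)
```

For a short document where the 65-character line is preferred, the same sketch would be called as `to_icadd0(paragraphs, width=65)`.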
3) There should be two carriage returns, with no spaces
(or other characters) between them, at the end of each
paragraph.
Rationale: More than one carriage return is needed in
order to differentiate the carriage return at the end of a
paragraph from the carriage return at the end of each line.
It is important that there be no characters between the two
carriage returns in order to facilitate machine
identification of the dual carriage return.
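Because the paragraph break is an exact pair of CR/LF sequences, software can recover paragraphs with a simple pattern match. The Python sketch below is illustrative (the name `paragraphs` is hypothetical): it splits on two or more consecutive CR/LF pairs and rejoins the lines inside each paragraph, so that a speech or braille reader can move smoothly across line ends but stop at paragraph ends.

```python
import re

# Two or more CR/LF pairs in a row mark a paragraph end (rule 3); a third
# pair in front of a title (rule 4) is absorbed by the same pattern.
PARAGRAPH_BREAK = re.compile(r"(?:\r\n){2,}")


def paragraphs(icadd_text):
    """Return the paragraphs of ICADD-0 text, each as one string.

    A single CR/LF (rule 1) is only a line end, so the lines inside a
    paragraph are rejoined with spaces for smooth reading.
    """
    result = []
    for block in PARAGRAPH_BREAK.split(icadd_text):
        lines = [line.strip() for line in block.split("\r\n")]
        joined = " ".join(line for line in lines if line)
        if joined:
            result.append(joined)
    return result
```

If a space sits between the two carriage returns, the pattern no longer matches and the two paragraphs are merged, which is why rule 3 forbids any characters there.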
4) All titles within the document text should be
preceded by an extra carriage return (for a total of three
carriage returns).
Rationale: Providing the third carriage return after
paragraphs which precede titles makes it easy to identify
titles automatically in a document.
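A minimal Python sketch of such automatic identification follows; the function name `find_titles` is hypothetical. It looks for three or more consecutive CR/LF pairs followed by a line of text. Titles at the top of a page or of the document, which rule 4 exempts from the extra return, are not found by this pattern and would need another cue.

```python
import re

# Three or more CR/LF pairs in a row, then the title line itself (rule 4)
TITLE = re.compile(r"(?:\r\n){3,}([^\r\n]+)")


def find_titles(icadd_text):
    """Return the titles that rule 4 marks with an extra carriage return.

    Titles at the top of a page or of the document are exempt from the
    extra return, so this pattern alone cannot find them.
    """
    return [match.group(1).strip() for match in TITLE.finditer(icadd_text)]
```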
5) All carriage returns should be followed by a line
feed character.
Rationale: MS-DOS and other environments provide a
line feed following each carriage return in the document.
Documents in the Apple Macintosh environment, however, do
not provide any line feed following the carriage return. A
document with line feeds in either environment is quite
readable, although in the Apple Macintosh environment each
line is preceded by a square bracket on the screen. If the
line feeds are left out in MS-DOS documents, however, some
software will have difficulty with the document. The
recommendation is therefore to provide a line feed with
every carriage return. For any environments in which the
line feed is superfluous, it can be very easily removed
using a search-and-replace command. It is expected that
translation programs will also be developed that will remove
all ICADD format tags from a document and change them
directly into format commands for popular word processors
(WordPerfect, Microsoft Word, MacWrite, etc.). When this is
done, the line feeds can also be removed if appropriate.
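The search-and-replace described above can be sketched in a few lines of Python. Both function names are hypothetical. `to_crlf` also accepts text that uses line feeds alone, a case this section does not discuss, and it is written so that a file which already follows rule 5 comes back unchanged rather than with doubled line feeds.

```python
def to_crlf(text):
    """Follow every carriage return with a line feed (rule 5).

    Accepts Macintosh text (CR only), MS-DOS text (CR/LF), and text that
    uses LF only; existing CR/LF pairs are not doubled.
    """
    # first reduce every line ending to a lone LF ...
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    # ... then write each one back out as CR/LF
    return unified.replace("\n", "\r\n")


def strip_line_feeds(text):
    """Reverse step for environments where the line feed is superfluous."""
    return text.replace("\r\n", "\r")
```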
6) Text in an ICADD-ASCII formatted document is limited
to the ASCII characters with codes between 33 and 127, plus
SPACE, TAB, CARRIAGE RETURN (and LINE FEED), and FORM FEED
(new page).
Rationale: Characters above ASCII 127 are not
standardized. They are also not supported by many programs
and readers.
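A checker for rule 6 might look like the Python sketch below (the name `invalid_characters` and its `last_code` parameter are hypothetical). One point for reviewers is built into it as an assumption: code 127 is DEL, a control character with no printed form, so the sketch stops at 126 by default and lets the caller pass 127 to follow the range exactly as written.

```python
# Non-printing codes that rule 6 also allows:
# TAB (9), LINE FEED (10), FORM FEED / new page (12), CARRIAGE RETURN (13), SPACE (32)
EXTRA_ALLOWED = {9, 10, 12, 13, 32}


def invalid_characters(text, last_code=126):
    """Return (offset, code) pairs for characters outside the ICADD-0 set.

    Codes 33 through `last_code` are accepted. The default of 126 excludes
    DEL (127), which has no printed form; pass last_code=127 to follow
    the range exactly as the draft states it.
    """
    problems = []
    for offset, char in enumerate(text):
        code = ord(char)
        if code in EXTRA_ALLOWED or 33 <= code <= last_code:
            continue
        problems.append((offset, code))
    return problems
```

Reporting offsets, rather than silently deleting characters, lets the person preparing the document decide how to replace an accented letter or symbol.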
7. Proposed Format for Type 2 Documents (ICADD-8) Format
The ICADD-8 format includes the six guidelines listed above,
plus eight additional tags that cover bold, italic, and
other emphasized text, as well as lists, footnotes, figure
descriptions, side-bars, and page numbers. These tags are:
1. BOLD: text to be bolded
2. ITALICS: text to be in italics
3. OTHER: Other emphasized text
"Other" includes all emphasized text that is not bold,
italic, or bold & italic; for example, underlined text.
4. LIST ITEM:
item in list
item in list
item in list
The principal reason for tagging items in a list is to
differentiate a list of single-spaced items (with a carriage
return at the end of each line) from a paragraph of running
text (which would also have a carriage return at the end of
each line). Without some way of easily distinguishing a
list, screen reading and other automatic processing software
may strip out the carriage returns and change a list into a
stream of running text. This would be devastating to most
lists, and particularly to lists such as a Table of Contents.
Two options for handling lists are supported. The first
option is to place standard SGML list item tags before and
after each item in a list.
Option 2:
Item 1
Item 2
Item 3
Item 4
The second option is a special ICADD-8 tag to be placed at
the beginning and end of a list. With this option, instead
of putting a tag before and after each item in the list, a
tag is placed before and after the entire list. This option
is provided to make it easier to read lists if a person is
not using a program that removes the tags. It also makes it
easier for hand-tagged text to be created. This second
option is particularly handy when dealing with Tables of
Contents and other similar lists, where each item in the
list occupies its own line, and the list items can occupy an
entire line. (Adding tags before and after each item would
cause all of the lines to wrap and break up or be longer
than 78 characters.)
Note to Reviewers: The list tag used in Option 2 is not a
standard SGML tag. It also conflicts with constraint (d)
above, which says that each level of format should be
consistent with the higher-level formats. It does appear,
however, to be a very useful option. Comments pro and con
are invited.
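To show why list markers matter, here is a Python sketch of the kind of reflowing a reading program might do. Because the exact spelling of the Option 2 tag is not given here, the start and end markers are passed in as parameters, and each marker is assumed to sit alone on its own line. The function name `reflow` is hypothetical, and the input is assumed to be the document already split into lines (for example with `text.split("\r\n")`).

```python
def reflow(lines, list_start, list_end):
    """Join running text into paragraphs but keep list lines separate.

    list_start and list_end are the (hypothetical) marker strings placed
    before and after an entire list, as in Option 2.
    """
    output, paragraph = [], []
    in_list = False

    def flush():
        # a finished paragraph becomes one line of running text
        if paragraph:
            output.append(" ".join(paragraph))
            paragraph.clear()

    for raw in lines:
        line = raw.strip()
        if line == list_start:
            flush()
            in_list = True
        elif line == list_end:
            in_list = False
        elif not line:
            flush()                 # blank line (double CR/LF) ends a paragraph
        elif in_list:
            output.append(line)     # each list line stays a separate item
        else:
            paragraph.append(line)  # running text: collect lines for joining
    flush()
    return output
```

With the placeholder markers `[LIST]` and `[/LIST]` (not ICADD tags), the lines `1. The Need` and `2. Overall Goal` placed between them come back as two separate items; without markers the same two lines are joined into `1. The Need 2. Overall Goal`, which is the damage to lists described above.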
5. FOOTNOTE: footnoted text
Footnote text should be placed within the running text, not
at the bottom of the page, so that it is close to the item
it refers to.
6. FIGURE DESCRIPTION: Text in a figure description
Figure caption
This tag is used both for figure captions and for
descriptions of figures. Descriptions should be provided
for all figures, pictures, or other illustrations which are
not completely redundant with the text of the document.
7. BOXES AND BLOCKED TEXT: Text in a box, side-
bars, etc.
Tag all boxed text (e.g., Sidebars, Historical Notes and
other miscellaneous inserted text), and place it within
the running text of the document at a location similar to
their location in the printed document.
8. PRINT PAGE REFERENCE: print page reference
When a document is converted to ASCII text, it almost always
ends up with different page breaks than the original, or it
appears as a continuous text file with no page delimiters.
In both cases, it is not possible to make any sense out of
page references in the original text (e.g., "See page 5") or
of the page numbers in its index or Table of Contents. It is also
difficult to discuss the document with people using a print
copy. Preserving the page boundaries of the original
printed document is therefore often important.
7.a. Tag Rationale
These tags (with the exception of the second list option)
are all taken directly from the standard SGML tags that are
used in the formal Type 3 documents. The purpose of this
minimal set of eight tags is to allow tagging of very common
formatting information in the informal documents, in order
either to preserve formatting information important to
understanding the text or to make it easier for automatic
text readers to deal with these documents.
8. Request for Input
This is a working document, and input of all types is
solicited. Because of the pressure to put out a first
release of this standard, however, please send comments
sooner rather than later. Also, in order to get the widest
possible review and input to the document, please copy and
redisseminate to any people or forums you think would be
interested.
You can send comments directly to the subcommittee chair via
e-mail, regular mail, or fax:
ICADD ASCII Subcommittee
c/o Gregg Vanderheiden (chair)
S-151 Waisman Center
1500 Highland Avenue
Madison, WI 53705
608/262-6966 voice
608/263-5408 TT/TDD
608/262-8848 fax
vanderhe@macc.wisc.edu