[Mirrored from: http://www.hookup.net/~joeclark/sgml.html]
I've identified what I consider a grave need for Standard Generalized Markup Language document-type definitions (SGML DTDs) to handle four types of accessible media: captioning, audio description, subtitling, and dubbing.
I assume here that you know what SGML is but are not up to speed on what those four media are. (One useful link for SGML information is sil.org. Also try SoftQuad and the newsgroup comp.text.sgml.)
First, definitions:
Note: Captioning and subtitling have as little in common as bicycles and motorcycles. Three big differences are:
In all cases, we are talking about fonts reminiscent of dot-matrix printers circa 1982. Most fonts in Line 21 systems do not offer descenders in the lowercase letters g, y, p, q, and j, making the lowercase so poorly readable that, since Day 1 of closed captioning, captioners have used uppercase for nearly all text even though uppercase is also hard to read. In Line 21, activating or deactivating italics, underlining, or the like inserts a space. Italics simply are not available in PAL World System Teletext captioning. Alignment in Line 21 systems is poor but, by industry agreement, by the year 2002 captioners will have available to them new codes that will permit niceties like true centering and right justification. (For further information on this topic, which I should really write a full treatise about, check my article "Typography and TV Captioning," Print, January/February 1989. Also look at the bibliography of captioning articles I've written.)
Also, if captions were stored as part of an SGML structure, they could be automatically reformatted in real time for different display devices, like an LED screen (with a character set different from TV and/or inverted for viewing in a mirrored display), TV pop-up captions, TV scroll-up captions, a continuous text-only stream without paragraph and caption breaks destined for computers, or an offscreen large-print display for visually impaired viewers. Or captions created with one software package could be read and understood by another, or by another country's system. Right now it is quite tedious to reformat Line 21 CC for PAL CC, and there are various typographic issues that come up here.
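To make the reformatting idea concrete, here is a minimal sketch in Python of one stored caption structure feeding several displays. Everything in it is an assumption for illustration: the Caption record and its fields, the function names, and the default row width of 32 characters are hypothetical and do not come from any DTD or captioning package. A real system would parse the captions out of an SGML document first; this sketch starts from data already in memory.

```python
from dataclasses import dataclass
from typing import List
import textwrap


@dataclass
class Caption:
    """One caption as it might be stored. Both fields are hypothetical."""
    start: float  # start time in seconds
    text: str     # caption text in normal mixed case


def render_popup(caption: Caption, width: int = 32) -> List[str]:
    """Rows for a TV pop-up caption: uppercased, wrapped to `width`.

    The width of 32 is an assumed row length, not a value from the text.
    """
    return textwrap.wrap(caption.text.upper(), width)


def render_stream(captions: List[Caption]) -> str:
    """A continuous text-only stream with no caption breaks."""
    return " ".join(c.text for c in captions)


def render_mirrored(caption: Caption, width: int = 32) -> List[str]:
    """Rows for a display read in a mirror.

    This only reverses the character order of each row; a real mirrored
    display would also need mirrored glyphs, which is a font matter.
    """
    return [row[::-1] for row in render_popup(caption, width)]


captions = [
    Caption(0.0, "Where are you going?"),
    Caption(2.5, "To the store."),
]

for row in render_popup(captions[0], width=10):
    print(row)
print(render_stream(captions))
```

The point of the design is that the stored text stays in normal mixed case and carries no display decisions. Uppercasing, line breaks, and mirroring are applied only at output time, so adding another device means adding another small function and leaving the stored captions alone.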