Electronic Texts The Day After Tomorrow

Yuri Rubinsky
SoftQuad Inc.

Having been invited to this conference some months ago, I've had time to think a little about scholarly presses and libraries, and have decided that the business and social model that most closely relates to what you do (particularly in the library side of your collective business) is that of Disneyland. There are many good reasons for this comparison, but I'm only interested in six today:

  1. Both libraries and Disneyland have an urgent need to keep abreast of technology. The first microfiche I ever saw was in a library. The first online database I ever logged onto was in a library. For that matter, the first card catalog I used -- the first little tiny file drawer -- was in a library. The first talking statue was at Disneyland. The first tiny train.

  2. Both need sound underpinnings: they deal with volumes (quantities, of course) that require an absolute confidence in the workings of the parts.

  3. Both absolutely require invisible services. I'll get back to this again at the end of this talk. My area of expertise (and that of our company, SoftQuad) is the invisible underpinnings. This should also be true of scholarly publishing in general.

  4. Both libraries and Disneyland can tell, with relative certainty, whether something is working or not, and can tell this statistically. Accordingly, both must decide on thresholds: When do you throw out this old book, when do you shut down this old attraction? When do you stop subscribing to this journal? How, in the absence of usage, do you decide to subscribe to one? Particularly to some new-fangled electronic one.

  5. Both have to grapple with the issue of payment: What's just part of the overall cost of admission? What's extra? Does one have to pay to ride the little train, or is that the default mode of transport? Does one have to pay to search 200 or 300 card catalogs, but not the local one? How about catalogs within 30 miles? Or within the cost of a local phone call? In scholarly publishing, similar questions: What's just part of the overall cost of admission? Electronic access to content beyond what might have appeared in print? Articles that have or have not yet been peer reviewed? Raw data? The software to analyze it?

  6. Both the library and Disneyland have the same competition: Electricity, satellites and phone lines, and the wonders they deliver. And of course both must harness this potential competition to keep fresh, to remain attractive, to serve their public in the ways being demanded -- whether the public recognizes or not that it is demanding this freshness. Disneyland embraces television and the movies; libraries -- along with the sources of their holdings, the publishers -- embrace the satellite and the computer.

The title of this talk, "Electronic Texts the Day After Tomorrow", is quite interesting, because it suggests that there is a technology gap. The technologies that will be popular and useful five years from now exist today, in the research labs, in the garages, in the twinkle of some software developer's eye. Well, that's "tomorrow".

Clearly this talk assumes that if the technology already exists, then it's not part of this talk. So: "the day after tomorrow". What do we think that means? Ten years from now? Twenty? How about four centuries, say to the time of the Star Trek television series? In that case, the real question becomes: Is the federation starship Enterprise the model of the research library of the future? Why are there so few books, magazines, or journals on the Enterprise? Why are the only books one ever hears of by Shakespeare, Sherlock Holmes or Mark Twain?

This surely leads us to imagine that sometime shortly after Sherlock Holmes, books either ceased being written or ceased being worth referring to. (When I say 'books' here, let them stand for all paper publications.) I'm skeptical enough of this position that we can safely assume that the interesting aspect arising from the lack of books on the Enterprise is only that the writers of this particular piece of science fiction assume that books must vanish in order that a show about the future appear futuristic.

Ah ha! So books are old-fashioned. Our first instinct is to agree with this position. I went off recently and bought some books about Virtual Reality, and Cyberspace, and related trendy topics. I was delighted that someone was willing to use the old-fashioned technology of print to produce a digest of information about what are ostensibly dangerously modern ideas. But, look inside one of these books, a 317-page reference text called Mondo 2000: A User's Guide to the New Edge, alphabetically organized by very modern topics (such as "Hyperreality" and "Evolutionary Mutations"). I was astonished and somewhat disappointed to discover that the book had a perfectly normal table of contents (albeit hard to read red on darkish grey). I don't know what I had expected in place of the contents page, but it is certainly true that I was surprised to see one. Throughout the book, someone has reinvented the marginal note: In the text, certain phrases or titles are boxed with a one-point coloured line, each box a different colour. Off to the right, in a separate column, text related to that highlighted phrase appears as a title for the sidenote. This is a big step forward from footnotes: there are no numbers.

To be truthful, there are footnotes too, sort of, endnotes really. A small green triangle -- which happens to include the eye on the pyramid from the US dollar bill -- appears from time to time, signifying that the current referent may be purchased, and that at the back of the book there is a list of recommended purchases, "an annotated directory of source materials". Interestingly, these references are organized by chapter, and remind one of any scholarly text with its footnotes all bunched up at the back of the book, chapter by chapter.

One could certainly argue that Mondo 2000 is at another extreme of publishing from the staid, one-colour academic journals we're all used to, notwithstanding that this strange modern collage thinks of itself as a reference text. So why am I bringing it up here? For a straightforward reason: I believe that the discussions going on at this symposium and that have been going on for three or four years at events like this are more fundamentally related to the electronic texts of the future than this trendy new book in every way but one: For some strange reason or reasons, Mondo 2000: A User's Guide to the New Edge is selling a lot of copies and, most important from our point of view here today, it feels futuristic.

So I'm willing to believe that there are lessons here to be learned. Now, I bought this book because it is a work of reference and because I thought it would be (by virtue of its reputation) somehow representative of the future of publishing. If it is representative, it's for four reasons:

  1. Because, first and foremost, all the information is in short bites. It's hard to get bored with any topic; the only option is to get bored with the whole book.

  2. Because it's very colourful, beyond where the colour has meaning or usefulness. It's more subtly and more exquisitely designed than the USA Today newspaper, but certainly genetically related to it, by way of the more chaotic, so-called "Women's Magazines", a real showcase of modern design.

  3. Because it feels topical, even throwaway. It feels like a magazine even though it looks, weighs and costs like a book. The closest previous example to this strange "book of the moment" sensation is probably The Whole Earth Catalog.

  4. Because it acts as if paper hypertext is a big deal. The book, by virtue of the variety of kinds of cross-references and the colourful, mayhemic design, feels like a multi-media extravaganza. This is good design. In reality, Mondo 2000 is just glossy pictures at a variety of sizes and sharp crisp colourful text on very glossy paper.

This, of course, brings up the issue of commerce and markets. Definitely, looking crisp and futuristic is critical for Mondo 2000 no matter how old-fangled it is secretly. I raise this issue here because I don't know anything about the marketing aspects of your world and feel, intuitively, that marketing is somewhat taboo. Whenever I've used on-line services in a library, even the very first time, I never had the sense that this was new and exciting. The experience was marketed to me, as it were, as if it was a standard, almost dusty, library service. To this day I have no sense whether I was the first person on campus using the service or the 10,000th. On the other hand, I know I was one of the first three people in Canada to have a Macintosh -- and much fuss was made about that. Let me be a little clearer about this: Except for some friends in the academic community, I'm probably the only person I know who thinks libraries and scholarly publishers are an exciting hotbed of technological advancement, and, to randomly mix metaphors, if not a cornerstone of next generation information-providing, then at least a bell-wether in the flock.

My goal now is to talk about six dimensions of seven technologies and then to get to the tunnels under Disneyland, and then answer the big questions of the information decade: What Persists? What Should Persist? How Do You Make Things Persist? And there is one final question, of course, which all of you who have ever heard me speak before are now asking yourselves: Why Hasn't He Even Mentioned SGML? Maybe later.

The first of the seven technologies that will impact electronic text the day after tomorrow is Simulcast Publishing. Many of you will have heard of those broadcasts that happen simultaneously on TV and FM radio. Each stands alone, but together there's something synergistic afoot. From all the hype about audio, video, and so on, one might wonder why this talk even has "text" in the title. Surely, maybe, we should be preparing for a multi-media world and talking about the impact on scholarly publishing of accessible photographs and video clips on CD.

Certainly I've run into people who treat all the differing media as equivalent -- in the sense of "interchangeable". They argue that in preparing a work for publication, one must decide, early on, whether the goal is a book or a video or a software program. All other decisions -- or at least many of them -- they suggest, arise from the choice of medium.

But this no longer makes sense. The medium is no longer the message. In a controlled experiment in which subjects were bombarded, over a period of time, with USA Today, People Magazine, certain kinds of chatty radio announcers and episodes of the Entertainment Tonight television show, I would be willing to wager that three days later, the subjects would be hard-pressed to tell you whether the source of certain information was newspaper, magazine, radio, TV, or hearsay at the coffee break. The specific properties of the medium, much of the time, seem to border on irrelevant. Everything comes in sound bites. I've recently engaged in a project in which we took the electronic manuscript of a book and generated typeset pages, Braille, a large print typescript and the input file for a voice-synthesized edition -- all from one source file.

Now this was a very interesting prototype, in part because the book's publisher often wins awards for its designs. Three of the four editions of the book were "undesigned"; that is, in each of those cases, the design was prescribed by forces larger than typographic. Interestingly, those designs were dictated entirely by the structure -- some might call it logical structure -- of the book.

What is important, given the characteristics above, is that we provide information in such a way that others can use the message (or contents, call it what you will) in the presentation form that makes the most sense. That is, it used to be that the designers, production people or traditions dictated the medium -- acting as if on behalf of the content. A one-way flow -- content to medium/presentation. Now, the content meets the user first (in a sense) and together they choose the medium.
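The one-source, many-editions idea can be sketched in a few lines. What follows is only an illustration, not the tooling used in the project described above: the element names (book, chapter, heading, para, emph), the sample text and the three style tables are all hypothetical. The point it shows is that the logical structure is walked once, and a separate table per medium decides how each structure is presented.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; the element names are illustrative only
# and do not come from any real DTD.
SOURCE = """<book>
  <title>Field Notes</title>
  <chapter>
    <heading>Arrival</heading>
    <para>We landed at <emph>dawn</emph>.</para>
  </chapter>
</book>"""


def inline_text(elem, emph_open, emph_close):
    """Flatten a paragraph, wrapping <emph> content in medium-specific markers."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "emph":
            parts.append(emph_open + (child.text or "") + emph_close)
        else:
            parts.append(child.text or "")
        parts.append(child.tail or "")
    return "".join(parts)


def render(root, style):
    """Walk the logical structure once; the style table decides presentation."""
    lines = [style["title"](root.findtext("title"))]
    for chapter in root.iter("chapter"):
        lines.append(style["heading"](chapter.findtext("heading")))
        for para in chapter.iter("para"):
            lines.append(inline_text(para, *style["emph"]))
    return "\n".join(lines)


# One style table per medium (all three are assumptions for illustration).
PRINT = {"title": str.upper,
         "heading": lambda h: "\n" + h,
         "emph": ("_", "_")}
LARGE_PRINT = {"title": lambda t: t.upper() + "\n",
               "heading": lambda h: "\n" + h.upper(),
               "emph": ("*", "*")}
SPEECH = {"title": lambda t: "Title: " + t + ".",
          "heading": lambda h: "Chapter: " + h + ".",
          "emph": ("[stress] ", " [end stress]")}

root = ET.fromstring(SOURCE)
for name, style in [("print", PRINT), ("large print", LARGE_PRINT), ("speech", SPEECH)]:
    print("---", name, "---")
    print(render(root, style))
```

Adding a further edition in this sketch means adding one more style table; the source file and the walk over its structure stay untouched.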

I've come up with six tests or axes around which I'll discuss the seven technologies I'm interested in. The six tests are:

Describe in twenty words or less
Is it built on something I already understand?
Does it have a revolutionary component?
Does it do good, either for me or the world?
Does it provide value, save or make money?
How is it public; how is it private?

Technology 1: Simulcast Publishing

Twenty words or less: Optimizing the value of investment in content, acquisition, preparation, production and promotion through multiple simultaneous use.

Is it built on something I already understand? Yes. We all recognize that secret codes in a computer file can make a phrase turn italic or bold either on paper or the screen. We recognize also that the hierarchical structures in a document -- the chapters, sections, subsections and so on -- help us understand and navigate the content. We recognize that there are logical structures in a document -- tables, charts, footnotes, lists, etc. -- that mean something to us, perhaps irrespective of the medium in which they're presented.

Does it have a revolutionary component? Insofar as it's a big deal to recognize that content can be simulcast, the way that previously it was only marketing that was exploiting simulcasting, yes.

Does it do good, either for me or the world? The Braille and large print example says yes. I'm not so sure whether the same can be said for the more traditional simulcast of Batman movie, book, T-shirt and toy tie-ins.

Does it provide value, save or make money? Yes to all those questions. Assume that the price at which you buy a chunk of information is divided up into these general Cost areas:

Delivery & Sales

The tricky part in simulcast publishing is in distinguishing which costs must be duplicated for every medium and which not. The interrelationship between the above areas shows you why there are now such tight fiscal ties between movie studios and television networks and book and magazine publishers. They are able to exploit the cost of the sale -- specifically the promotion and marketing -- which for them is a very large chunk of the total cost. I don't know anything about it from the inside, but look at the National Geographic Society and the extraordinary ways in which it exploits its costs of context and content. An example for us all.

How is it public? How is it private? How is it a broadcasting tool? How is it personal? All of the above. That is, in fact, one of the strengths of Simulcast Publishing. The same structures (administration, editing, production, marketing, sales) can produce something which is broadcast or personal. (See also Custom Publishing.)

Technology 2: Content Beyond Text

Twenty words or less: Redefining content to incorporate the most useful, raw or eloquent data in user-processable forms.

Is it built on something I already understand? Several years ago -- maybe even 10 -- business magazines started providing 1-2-3 spreadsheets. Journals started talking about making available to subscribers laboratory results and raw data and even software. Such steps are built on the notion that you provide information in a form most appropriate to its meaning, use or display. The American Astronomical Society now lets you dial into its telescope facilities to propose projects.

Does it have a revolutionary component? The leap, I think, was into fodder, or grist. Until recently publishing assumed, much of the time, passive activity on the part of a subscriber or buyer, that is, reading. And then, long long ago, computer hobbyist magazines moved from reprinting page after page of code you could key in yourself, to code on a cassette tape, and then on a floppy. That was a tiny step, because the real leap was in providing data for one's Lotus 1-2-3 spreadsheets -- well actually two leaps: first, providing data, and second, providing data to which one could do anything, manipulating in ways not foreseen by the compilers or publishers.

Does it do good, either for me or the world? Yes, as above. Insofar as more accurate expression of intent is good, certainly. Insofar as greater ability of others to recreate experiments, test hypotheses, re-organize contents meaningfully is good, yes.

Does it provide value, save or make money? How is it public? How is it private? How is it a broadcasting tool? Yes, it is optimized nicely for private investigation, corroboration, and/or exploration even though it is a broadcast technique.

Technology 3: Reading on Fast Forward

Twenty words or less: Dealing with overload by using structural, logical, visual, and editorial cues to move one's attention quickly to important information.

Is it built on something I already understand? Yes. Sighted people glance at things -- including important content -- and get what they need from a very quick assessment. Blind and dyslexic people need to find explicit clues in what they read or hear to achieve the same level of skim. Like the ramps on public buildings designed originally for wheelchairs but now used by anyone with a baby stroller or luggage cart, skimming tools for the visually disabled based on logical text structure will benefit all of us in ways we can't yet imagine. When one uses fast forward on a video tape, one stops at a change in colour, tempo, scene. On an audio tape: at a blank or change in timbre etc. This is exactly the same technique that is needed in electronic books to get to "section/subsection" breaks, or indeed, to any change in the format.
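As a rough sketch of what "fast forward" over logical structure might look like, assume a document already broken into blocks, each tagged with its structural role. The role names, the sample text and both function names here are hypothetical. Fast-forwarding then means skipping ahead to the next block whose role is in a chosen set of stopping points.

```python
from typing import List, Optional, Sequence, Tuple

Block = Tuple[str, str]  # (structural role, text); role names are illustrative

DOCUMENT: List[Block] = [
    ("section", "1. Methods"),
    ("para", "We sampled forty sites."),
    ("para", "Each site was visited twice."),
    ("subsection", "1.1 Instruments"),
    ("para", "A standard probe was used."),
    ("section", "2. Results"),
    ("para", "Counts rose at every site."),
]


def fast_forward(doc: List[Block], position: int,
                 stops: Sequence[str] = ("section", "subsection")) -> Optional[int]:
    """Return the index of the next block after `position` whose role is a stop."""
    for i in range(position + 1, len(doc)):
        if doc[i][0] in stops:
            return i
    return None  # ran off the end: no further break in the structure


def skim(doc: List[Block],
         stops: Sequence[str] = ("section", "subsection")) -> List[str]:
    """Collect only the text of the stopping points, i.e. an outline."""
    return [text for role, text in doc if role in stops]


print(fast_forward(DOCUMENT, 0))                      # next break of any kind
print(fast_forward(DOCUMENT, 0, stops=("section",)))  # coarser skim: sections only
print(skim(DOCUMENT))
```

The same function serves a screen reader, a speech synthesizer or a sighted reader; only the choice of stopping points changes.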

Does it have a revolutionary component? Several years ago at the University of Toronto, the late professor Paul Kolers conducted tests which showed that the comprehension and retention of test articles were raised if one increased the point size of the most important words in a selection, and decreased the size of the least significant words. This is a technique that deserves further study and probably deserves mass exploitation.
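To make the sizing idea concrete, here is a minimal sketch. It is not a reconstruction of the tests mentioned above: how a word's importance is decided (a hand-picked set of key terms and a tiny stop-word list), the base size and the step are all assumptions made for illustration.

```python
# Illustrative stop-word list; a real system would need a better measure
# of which words are "least significant".
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "to", "is"}


def size_words(sentence, key_terms, base=10, step=2):
    """Pair each word with a point size: larger for key terms, smaller for stop words."""
    sized = []
    for word in sentence.split():
        bare = word.strip(".,;:").lower()
        if bare in key_terms:
            size = base + step
        elif bare in STOP_WORDS:
            size = base - step
        else:
            size = base
        sized.append((word, size))
    return sized


for word, size in size_words("The probe measures the salinity of water.",
                             {"probe", "salinity"}):
    print(f"{size:>2}pt  {word}")
```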

Does it do good, either for me or the world? Without optimizing the skimmability of information, we'll drown in it.

Does it provide value? Yes, absolutely.

Does it save or make money? How valuable is your reading time?

How is it public? How is it private? How is it a broadcasting tool? How is it personal? Structured skimming appeals to and supports the individual's ability to read/learn/absorb/react at one's own pace.

Technology 4: The Internet and Its Ilk

Twenty words or less: Capitalizing on electronic networks to overcome time and geography; building communities and webs of information in thin air.

Is it built on something I already understand? These are just dull ASCII mail and news services. Sure, everyone knows you can send data on phone lines.

Does it have a revolutionary component? It's everywhere. The services themselves are dispersed around the planet. It's a leveler. You can send e-mail to anyone. It scales well.

Does it do good, either for me or the world? The dissemination of information irrespective of boundaries must be a good thing. China, the collapse of the Soviet Union -- there are many famous examples of news breaking on the Internet.

Does it provide value, save or make money? I've never asked. Or figured it out.

How is it public? How is it private? How is it a broadcasting tool? How is it personal? All of the above. No two people subscribe to the same groups and use e-mail the same way to the same community. The challenge: to use it wisely and well and optimally for the content that's appropriate. Do we need to send 3D graphics and video clips over the Internet?

Technology 5: Custom Publishing

Twenty words or less: Designing a database geared to retrieval, conciliation and delivery of anthologized component units, in small quantities.

Is it built on something I already understand? Must be. Everybody seems to "get it" right away.

Does it have a revolutionary component? One gets to architect a unique building out of what are more or less standard parts. The architecture of the information is the foundation for the complexity of the individual pieces, that allows one to build tables of contents and indexes on the fly, that lets one build a new "whole". Also: the notion of "chapter-length" ideas is very interesting. One no longer has to write an entire textbook to sell into the classroom.
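Building a table of contents on the fly can be sketched as follows, assuming a catalogue of chapter-length units that each carry a little metadata. The Unit fields, the catalogue entries and the function name are hypothetical. A customer's selection is then enough to generate the contents page, with page numbers computed from the chosen order.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Unit:
    """One chapter-length component in the publisher's database (fields are illustrative)."""
    unit_id: str
    title: str
    author: str
    pages: int


CATALOG: Dict[str, Unit] = {
    "u1": Unit("u1", "Tides", "A. Reed", 12),
    "u2": Unit("u2", "Estuaries", "B. Lowe", 20),
    "u3": Unit("u3", "Salt Marshes", "C. Hale", 8),
}


def build_contents(selection: List[str], catalog: Dict[str, Unit],
                   first_page: int = 1) -> List[str]:
    """Generate contents lines for a custom anthology, in the order selected."""
    lines = []
    page = first_page
    for unit_id in selection:
        unit = catalog[unit_id]
        lines.append(f"{unit.title} ({unit.author}) ... {page}")
        page += unit.pages  # the next unit starts where this one ends
    return lines


for line in build_contents(["u3", "u1"], CATALOG):
    print(line)
```

An index could be assembled the same way, provided each unit also stored its own index terms with page offsets.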

Does it do good, either for me or the world? Fewer missing trees. Emptier warehouses. More timely delivery.

Does it provide value, save or make money? Yes. Custom publishing adds money to the royalty pool which might not have been there otherwise.

How is it public? Custom publishing workstations/laser printers/binders will be available at campus libraries and bookstores. They will spread to any location where they make sense. How is it personal? Customized but uses cost savings of mass production.

Technology 6: Demographic Delivery or Hands-free Publishing

Twenty words or less: The often frightening spectre of demographic modelling must be tameable, using our experience, disguised as a "statistical usage report", of a previous publication.

Is it built on something I already understand? Demographic delivery is just like custom publishing except automatic.

Does it have a revolutionary component? The system extrapolates. It knows what you've been reading. It knows if you're awake. It knows if you've been bad or good.

Technology 7: Virtual Library

Enough of you know the concepts behind the Virtual Library -- libraries unlimited by the physical limits of a local library -- that I won't spend time on this notion except to say that I have one concern: I believe the real goal of library publishers is to be invisible. Not to disappear but to be a medium in the old-fashioned-sit-around-the-crystal-ball sense: To be a channel, a conduit, or to be a lot of channels perhaps. To be like the tunnels beneath Disneyland. To be the wind behind the back of anyone seeking information. Invisibility does not mean not adding value: the values of selection, pruning, assuring quality, making accessible in a timely fashion, supporting future accessibility. (Notice how one can't tell from this list whether I'm talking about publishers or libraries.) On the contrary, in periods of abundant information, selection services are at a premium.

Perhaps the Starship Enterprise is a silly model, but 20 or 30 years of building a social paradigm for a floating world inevitably achieves useful insights. I ask the computer a question. It answers me. Sometimes it alerts me to conditions or information without my asking. Is there a better model anywhere for how I, as a reader or researcher, want to deal with information? And therefore with libraries and publishers?

Admittedly sometimes others ask questions for me, and I put myself in their hands.

A conference or a symposium is actually a magazine whether its proceedings are published or not. The agenda is the table of contents. The conference organizers, obviously, are the editors and reviewers with a strange twist or two: They publish articles into the air, as it were, and in most cases, without reading them. There's an odd element of trust in their publishing technique. Ann Okerson published my talk today -- that is, invited me to speak not knowing what I'd say. More than most editors, she helped guide the creation of this article by describing what she was looking for, generally, in an opening talk, providing a statistical breakdown of attendance, and giving me a strong sense of where this community stands in and contributes to the evolution of electronic text.

A month ago I chaired the annual SGML conference in Danvers, MA. Some 270 people took part, among them Michael Sperberg-McQueen of the University of Illinois at Chicago. He later posted to the Internet a long "trip report" on the Conference, and at my urging, the text of his remarkable closing keynote. It's archived now among the electronic archives of the newsgroup comp.text.sgml and has been seen -- if not read -- by a large number of the 16,000 people who, according to one survey, subscribe to that newsgroup.

What was your role in all this? What should it be? If I wander into a University of Toronto library and want to learn about SGML, having electronic access to 300 card catalogs isn't going to help me. Michael's article may never appear on paper, has been reviewed by no one (with the slight "selection" aspect that I invited him to talk and, as Ann did with me, spent time discussing the contents) and isn't indexed anywhere. Yet, this work is both very good, and like millions and millions of well-organized words around the world, should not be lost. With whom does a publisher or a library partner to get me access to material I should at least have the opportunity to ignore?

What about invisibility anyway?

I believe every home should be a library. Every desktop in fact.

You should not be able to tell the difference between 'asking a question' and 'doing research'.

You should never know what software you're using. The computer's job is to answer your questions and work with you to make your answers accessible to you and others.

You shouldn't be able to tell the difference between a computer and a library. (Nonetheless, I happen to believe books and journals and magazines must persist. I'm less sure about newspapers and I'm certain that telephone directories, classified ads and airline schedules should disappear. I want to see dusty old maps and have ancient volumes displayed to me on stands with instructions that I must only touch the pages to turn them.) But I want a computer to point me to the right book.

I saw a revealing photograph of Disneyland in a United Airlines magazine, a shot of Mickey Mouse -- who is enormous in real life -- talking to a street cleaning person in a very tall, very wide tunnel underneath Disney World. A complex network of tunnels is what lets the Peaceful Kingdom function as well as it does and why you never see Mickey or Minnie or Goofy or Donald ducking into a washroom or eating lunch. The analogy is pretty rich. The architecture of the tunnels is the same no matter what public facility they support. The services they provide are constant, and silent. They keep complications -- like transport vehicles and emergency personnel -- out of the visitors' way, while providing an underpinning to the whole operation.

On one level, publishing is like those tunnels, making available the attractions above ground with subterranean structures. But for me the most interesting aspects of the Disneyland tunnels are their dimensions and their materials and their layout. Why? Because they are completely consistent wherever they go. They're the same beneath a pirate ship and beneath a hotdog stand, providing the consistent system services below which support and enable the mad variety of extravaganza above.

That, incidentally, is what SGML is all about. Several of the speakers over the next few days will describe projects built on a foundation of SGML. The reason I haven't mentioned SGML in this whole talk is because it's there throughout. That's the common storage format -- and language to build new formats -- that allows the Braille simulcast, that will support the next generation of custom publishers, that can subsume and extend MARC. That provides the necessary hub for multimedia and content beyond text. That marks the highlights and hierarchies for skimming and fast forward displaying. And that, by co-incidence, lets you optimize the long-term value of every keystroke you own today, whether you're a publisher, a librarian, a conference organizer, a teacher, a learner or a reader.

Without information storage structures, retrieval and inter-operable access at the levels we're talking about here are meaningless. All the technologies I've described today really become exhilarating only when you smoosh them all together. If SGML didn't exist, the forward momentum of so many exciting possibilities would force us to invent it.

And now, the Big Question: What Persists?

  1. Data persists, and also opinion, both informed and not. Together they create information. Information persists if it is accessible -- and it is only accessible if it looks like Swiss cheese. That is, it must have entry points, and tunnels, and one must recognize its shape without being overwhelmed by it.

  2. The imagination persists. But in this case I mean both imaginations: the imagination of the creator/supplier/presenter of information -- the imagination to select wisely, to present eloquently, to charge appropriately. And the imagination of the reader/user/seeker -- to ask the right questions, to appreciate the best possible answers, and to imagine requirements that push the supplier to meet one halfway. Flash and splash without a foundation of real information are meaningless.

Yes, new skills are required. This symposium will teach some, will hint at others, will introduce you to people who obviously have them. Yes, you will form new partnerships and collaborations. Remember that 92% of what you need to know today for your work will stand you in good stead no matter how weird the technology gets.

Now, I have a confession to make. I have withheld information pertinent to this case. There is an episode of Star Trek in which a famous research scientist comes aboard the Enterprise and the chief engineer greets her by saying what an honour it will be to have the opportunity to work with her.

"I've read your work on poly-nucleating fluctuators," he says (or whatever it was).

As far as I can see there are only two explanations for his comment: Either some scriptwriter thought a lot about how experimental results would be propagated in the future and recognized that at some point, one must admit that scholarly publishing would, of necessity, continue to operate as it does now, although perhaps with greater variety in its delivery; or some scriptwriter didn't think about the issue, but recognized that in the context of the 24th century such a comment didn't seem out of place at all.

Either way, we win.

