Access to Primary Sources: During and After the Digital Revolution

Keynote Address for the Berkeley Finding Aids Conference
April 4, 1995
Berkeley, California

by Patricia McClung

The problem with being smack-dab in the middle of one of the biggest revolutions in human history is that it is really hard to make decisions. Every day there is a new vision of the future that wipes out all the assumptions the last one made. In the meantime we information professionals have cultural institutions to run. They are big, tradition-bound beasts, with enormous amounts of material to be organized, processed, stored, preserved, and made available to the publics we serve. And if that isn’t bad enough, our public is reading the same prognostications we are, and expects us to deliver on that promised land where all information is going to be online, seamlessly integrated and readily accessible to users all over the world. A land where there are little “knowbots” much smarter than any librarian or archivist we know (at least since Alan Tucker passed away)--which will go out and cruise the Information Infrastructure and bring back everything of interest to the desktop.

Given the existing technology and the expectations of our users, the question I’m going to focus on is: how can we take advantage of this digital revolution to make primary source materials widely available?

There are thousands of special collections, museums, archives, and historical repositories around the country, whose purpose is to preserve and make available a representative selection of materials (in all formats) that document our history, culture, and knowledge. Cultural institutions serve a critically important function in our society. Over the years we have developed systems and standards for acquiring, describing, storing, preserving, and providing access to these materials. We used to do these things manually, but increasingly the functions have migrated to an online environment. Libraries have been pioneers in using technology to manage collections.

Up until now, we have been using new technology to handle familiar routines in more efficient ways. The technology supports, more or less, what we did before, doing it faster and better (though not cheaper, as we had originally hoped).

Paul Saffo of the Institute for the Future in Menlo Park describes the typical 30 year cycle it takes “any truly new idea to fully seep into our culture.” He calls this first phase in a major transition the “cowpath” decade. That’s the period in which new technology is put to old uses, perhaps in a slightly different way.[1] Although it’s hard to be sure from the eye of the storm, it seems that we are just completing that stage.

We are now entering the second decade in the cycle of change. We are starting to reconceive the way we do things and experiment with new solutions. That makes decision-making much harder. There’s less certainty and more risk. This is the period in which (and I quote) “everyone with a crazy idea [runs around] pursuing their dreams on a shoe string budget and without a lot of adult supervision”--a stage Saffo calls the “creative skunkworks.”[2] He uses the analogy from the film industry of the breakthrough made by “The Great Train Robbery.” As you’ll recall, it was the first film to be shot outside, to use camera movement (or panning), as well as to cut from one scene to another. It absolutely transformed the medium.

In libraries and archives around the country there are lots of projects that fit this description. I think the Berkeley Finding Aids Project is one (my apologies to Daniel, but it’s a far cry better than the cowpath). And there are many others: Red Sage, TULIP, Helios, American Memory, Making of America, the Museum Educational Site Licensing Project, and the National Science Foundation Projects, to name just a few. This is a critical time when users try a lot of new things in order to figure out how to make best use of the technology.

Just so you’ll know what to expect when we get to the third stage: Saffo describes a sorting out of the creative chaos from the skunkworks period. That paves the way for the “moguls, financiers, and lawyers, whose job it is to fully integrate the maturing technology into society.”[3] Now there’s a scary thought.

Let’s leave the third stage to the futurists, and return to where we are now. I’m going to take a broader perspective for a moment and consider the impact information technology is having on our society. (Misery loves company.) I think all major information and communications industries are entering this middle phase.

During the last 10 years, U.S. companies spent more than a trillion dollars on information technology. During the 1980s, overall productivity increased by only 1%, and white-collar productivity actually decreased by about 3%.[4] (Perhaps they had the same experiences I did trying to get the new technology to work.)

What’s the problem? We still have a long way to go in this transition--and managing from the middle of it is very difficult, no matter what field you are in. So far, only about 10% of information is in electronic format--the rest is still in paper. Despite all the predictions to the contrary, and the massive efforts to reengineer business processing, the paperless office just hasn’t happened. “In fact, the widespread adoption of information technology resulted in about three times as much paper generation as before.” [5] (While that was reported as bad news for business, just think of what it means for archivists.)

Another telling sign that our entire society is in the creative skunkworks period is the current state of the Internet. It is one of the most exciting developments thus far from the information revolution. And characteristic of the mid-phase in a major transition, it is a total surprise. Without really intending anything of this magnitude, the Defense Department and their university research partners built it, and now everyone has come. The Internet has become an absolutely essential piece of our communication system--right up there with telephone and fax.

While gophers, anonymous FTP, World Wide Web, Mosaic, and Netscape contribute to the excitement about the Net’s potential, so far it is electronic mail that has transformed, not only our own professional context, but that of the users we serve. The fact is that, thanks to the Internet, real time collaboration is possible in an international context.[6] What we are moving towards is interaction that is much “closer to the speed of thought.”[7]

The Internet is particularly interesting in our context because we share it with the people that we always think of as “our” users--those who look for information in libraries and repositories. We use it for the same reasons they do, but we also use it as an integral part of our daily work. It is, in many ways, the best window we have on what users really want from the emerging Information Infrastructure, and it is also a place where we can make a difference to them.

Take for example Paul Ginsparg from Los Alamos National Laboratory, who was recently featured in a Scientific American article.[8] He has a computer in his office that responds to 20K electronic mail requests every day for abstracts of new papers on high energy physics. Twenty thousand requests from 60 countries serviced every day. Now that’s a revolution.

Although this is a mind boggling development, we all know that the Internet has a long way to go. Sure, there is lots of stuff out there in cyberspace, but there are few mechanisms for sifting and evaluating it, which is what libraries and archives do.

Lots of people are inclined to confuse the Internet’s potential with its current utility. I was struck by a recent Newsweek article in which the chancellor of the 22-campus California State University system (not to be confused with the 9-campus University of California system) says of the new Fort Ord campus under construction:

“Why waste all that money on bricks and mortar and expensive tomes when it could be better spent on technology for getting information via computer. You simply don’t have to build a traditional library these days.”[9] I suppose it depends on what he means by traditional.

All I can think of is that the students are going to have plenty of beach time, because there just isn’t a critical mass of material available online. Yet we are in danger of having the ‘moguls, financiers, lawyers,’ and propellerheads create a scenario that won’t deliver what people need. We must speak up.

On the other hand, the chancellor’s pronouncement--whether realistic or not--gives a good indication of what users expect right now. Without further study of any kind--based on watching users every day and searching for information ourselves (as students, educators, citizens, and parents), we know some other important things about what users want in this new information environment.

They want access to be simple and cheap (preferably “free,” as in someone else pays). Further, they want to have lots of options and choices, and they want their access to be comprehensive, flexible, instantaneous, download-able, manipulable (forgive my linguistic liberties), and perhaps most important of all, unmediated. And if we don’t get it right, like Paul Ginsparg at Los Alamos, they’ll just do it themselves.

Until now, the focus of our efforts to describe collections of primary source materials has been on the exchange of information with each other (in codes that only we understand).[10] Our function has been largely custodial and curatorial. That was a big step forward, and hard enough in itself.

There is lots to feel good about in the great strides we have made to bring primary source materials under bibliographic control. In the process, we have created de facto union catalogs in RLIN and OCLC. We are beginning to realize the potential these have for our users--in addition to the collection management functions they serve. Just as we can start to see a faint light at the end of the collection description tunnel, we realize the user needs a lot more information than those records contain. We now have the tools to make that information available online and to integrate primary sources with other information.

While I said we know a great deal about what users want, there’s a lot more to learn. For one thing, we need to focus on how well our traditional finding aids actually work, and also look at the potential that our national databases have to lead users to the materials they need. As Lisa Weber put it in a speech at the Montreal Society of American Archivists meeting:

“We continue to struggle with the information retrieval issues of access by subject, provenance, and function, levels of indexing, and indexing consistency. On another level we are not sure how our traditional finding aids and control mechanisms such as accession records, donor records, records schedules, and agency history data fit within the context of MARC AMC.”[11]

Now on top of these issues, we are layering a whole new set of problems. We have to keep the user at the forefront as we make choices about where we are going to expend our finite resources and energy. The information revolution--from our professional perspective--should be about transferring control of access to the users. And for many people, relinquishing control over the materials will be harder than the technical piece.

What next?

In preparing for this talk, I called a number of colleagues around the country to see what people envision in the way of electronic access to primary source materials, as well as to find out if the enormous effort and expense is justified.

It turns out that there isn’t a lot of controversy about this. Most everyone I talked to described a near-term future in which there are agreed upon standards for putting existing finding aids online--building on the Berkeley Finding Aids Project. These online guides would be the middle piece in a three- (or four-)tiered hierarchy. Collection level records (in both institutional OPACs, as well as RLIN and OCLC) would be above, and digital facsimiles of the materials themselves, beneath. We can start to experiment with truly integrated online information systems.

It is clear that this project has captured people’s imaginations as it has demonstrated both the feasibility and the desirability of mounting a large-scale effort to make “finding aids” as accessible as online collection level records. This project is an experiment in the next frontier: what Ronald Weissman would call information processing, as opposed to data processing (see his excellent article in the 1994 special edition of American Archivist).[12] This distinction is about capturing and conveying the meaning in a document instead of just flat ASCII text. It is also about new methods for navigation and connectivity. We are in the process of launching a new generation of searching and finding aids that will lead ultimately to what our colleague Avra Michelson refers to as “the enhanced autonomy of the researcher.”[13]

I submit that this is a worthy goal in all our undertakings, and it is what the user wants.

This Finding Aids project is testing the use of Standard Generalized Markup Language (SGML) as a navigation tool for finding aids online, and it is trying to come up with a standard document type definition that would enable widespread use of this approach. SGML is being touted as the potential standard that may emerge from the several possibilities available for encoding texts. Even The Economist recently reviewed the various humanities encoding projects underway, and speculated in positive terms about the future of SGML.[14] But if Weissman’s predictions are anywhere near the mark, SGML will serve only as a stopgap in the evolution of smart tools that will transform research.[15] The fact is that no one knows for sure; and until it all sorts out, we need to learn as much as we can from the best tools available.

Another important feature of the Berkeley project is that it is being conducted with an openness and sensitivity to an expressed need for broad community consensus. The approach has been catalytic. Which leads me to my most important point: collaboration.

Telling this group about the importance of working together for common solutions would really be preaching to the choir. The hard work that the archives, manuscripts, and special collections professionals collectively put into the development of the AMC standard improved dramatically and forever the information environment for all kinds of users. It brought these ‘fugitive’ materials into the information mainstream. We are here today to continue that work. We are also here to learn more so that we can be better collaborators, rather than just admiring bystanders.

There are a number of questions that deserve our collective attention:

  1. What equipment and software are our users going to need for the tools we develop? Are they standard and commonplace (because if they’re not, we had better think again)?

  2. What new opportunities does the existing technology environment present--for us as information professionals? Can we transform ourselves--never mind the finding aids--to fulfill expanded service and access functions, using the new electronic tools to decrease the tension between preservation and access?

  3. Can we move on from our competitive traditions in which institutions strive to carve out territories of excellence, defined by ownership and control of information, to become facilitators of access?

  4. Can we live with the inevitable abuses that will occur (and offend our sensibilities) in order to support and promote expanded use of our collections for productive and creative purposes?

  5. Can we find or create a level playing field where we can experiment together and contribute standards, tools, and a corpus of accessible materials to the networked information environment? Can we lend our expertise about structure and navigation and hierarchy and description to help bring order to the chaotic environment that exists on the Net?

As we take up these next challenges, bear in mind the motto of the Aldine Press: festina lente, or make haste slowly.[16] Regardless of whether we are still on the cowpaths or into the skunkworks stage, we have a long road ahead. In many ways that is reassuring. To paraphrase St. Augustine, “give me chastity, but not yet.”[17]

In closing I would emphasize that we have a critical role to play in figuring out how technology can enable powerful, well-integrated information access for research, study, and general interest. The materials we have been collecting and taking care of are still a well kept secret. We have the tools now to make them more accessible. Our users--and potential users who don’t have a clue this stuff exists--deserve this.

We have some very hard decisions to make, not only about how to provide better access, but to what. Clearly, not all of the things we’ve collected are going to make it into the new information environment. We will make better decisions, and ultimately more tangible progress, if we build on our collaborative traditions of consensus, compromise, and experimentation--together.

I’ll leave you with some lines from “Casablanca,” that are relevant in our situation:

Ilse: “Can I tell you a story Rick?”
Rick: “Has it got a wow finish?”
Ilse: “I don’t know the finish yet.”
Rick: “Well, go on, tell it. Maybe one’ll come to you as you go along.”[18]


Footnotes

[1]
Paul Saffo, “The Electronic Pinata: Information Technologies and the Future of the Library,” Institute for the Future, Menlo Park, CA, 1993, p. 13.

[2]
Ibid., p. 15.

[3]
Ibid.

[4]
Gregory S. Curhan, Documents Go Digital: Electronic Information Management, Volpe, Welty & Co. Equity Research (San Francisco: October 20, 1994), p. 2.

[5]
Ibid., p. 3.

[6]
Roger Clarke, “Electronic Support for the Practice of Research,” The Information Society, vol. 10, p. 28.

[7]
Gary Stix (quoting Steven Harnad), “The Speed of Write,” Scientific American (December 1994), p. 107.

[8]
Ibid., p. 106.

[9]
“Wiring the Ivory Tower,” Newsweek (January 30, 1995), p. 62.

[10]
Lisa Weber, “Putting Archival Cooperation Into Focus,” an unpublished speech to the Society of American Archivists Annual Meeting, Montreal (September 1992).

[11]
Ibid.

[12]
Ronald Weissman, “Archives and the New Information Architecture of the Late 1990s,” American Archivist, vol. 57, no. 1 (Winter 1994), p. 21.

[13]
Avra Michelson & Jeff Rothenberg, “Scholarly Communication and the Information Technology: Exploring the Impact of Changes in the Research Process on Archives,” American Archivist, vol. 55, no. 2 (Spring 1992), p. 244.

[14]
“The Lays of Ancient ROM,” The Economist, August 27, 1994, p. 72.

[15]
Weissman, p. 21.

[16]
Saffo, p. 17.

[17]
Stuart Lynn used this quote in a similar context at an RLG Symposium on Digital Imaging Technology for Preservation, held March 17-18, 1994 at Cornell University, Ithaca, NY.

[18]
John L. Casti, “Preface,” Complexification: Explaining the Paradoxical World Through the Science of Surprise, Harper Collins, 1994, p. X.