Settling the Digital Frontier

The Future of Scholarly Communication in the Humanities

Paper presented at the Berkeley Finding Aid Conference, April 1995


Daniel V. Pitti
Librarian for Advanced Technologies Projects
The Library
University of California, Berkeley

Moving information off paper and into a worldwide network of computers promises to change dramatically how archivists, librarians, scholars, and publishers interact with documents and with one another. From the limited perspective of the archivist and librarian, I will discuss both the challenges and the opportunities presented by this transformation of information, dwelling more on the latter than on the former. Before turning to the challenges and opportunities, however, I will first consider what the transformation has not altered, and thereby isolate and delineate the issues, making them appear, at least, more manageable and a little less overwhelming.

In 1968, Patrick Wilson wrote a philosophical evaluation of the nature of information titled Two Kinds of Power: An Essay on Bibliographical Control. In the very first paragraph of the introduction, he made an observation about the human record that remains worthy of examination today:

The world is full of writings. In libraries, archives, offices, and attic trunks is an enormous and rapidly increasing mass of written material of all sorts, the products of learning and imagination and speculation, of observation and painstaking record keeping, of public and private business. Some of the writings are of lasting interest, representing the cores of civilizations, bodies of literature and law, religion and philosophy, theories about the world, and recipes for successful action. Most are only of passing interest to anyone, despite their being records or traces of human activity; not all of our history is worth remembering.

Indeed, the world is full of writings. In fact, it is much, much fuller than it was in 1968. To the list "libraries, archives, offices, and attic trunks" we can add computer servers large and small, with memory and storage devices of every size, all teeming with information-laden digital objects. And these same devices make us more efficient at creating information, adding to the plenitude. I think I am safe in saying that we live in an ever more corpulent information world. As overwhelming as this image is for archivists and librarians, we can take refuge in the final sentence of the Patrick Wilson paragraph quoted above: "Most [information objects] are only of passing interest to anyone, despite their being records or traces of human activity; not all of our history is worth remembering." We are not responsible for all of the information generated by humans and human-made machines; we are responsible only for that portion of the total judged worthy of being remembered. As comforting and helpful as this is, the comfort is only momentary, for the complex system of interaction among archivists, librarians, scholars, collectors, creators, and publishers that results in the worthy being found, discovered, called to our attention, and selected for enduring availability and preservation does not yet exist in the digital information world.

Selection for preservation and access is only one function of the interaction of the critical players in the world of recorded knowledge. There are many others. Copyright and licensing; cost recovery; secure mechanisms for commercial exchanges; clear, rational access and navigation; refereed peer review; authentication; and preservation all currently lack accepted, effective systems in the digital world. It is the lack of accepted, effective systems for these and other culturally critical functions in this transformed information environment that presents us with challenges.

In the paper-based information environment, the book, journal issue, or manuscript is the "place" where archivists, librarians, scholars, and publishers bring their respective contributions and expertise, and where they interact with one another. An author creates a manuscript or a succession of manuscripts. In the course of research, long before committing his or her own thoughts to paper, the author may examine, as evidence, the by-products of other people's lives deposited in archives and libraries. Also in the course of research, the scholar-author frequently consults published materials in the library. After committing his or her own thoughts to paper, the author passes the manuscript to a publisher. The publisher and author may pass the manuscript back and forth, revising and editing it. The publisher may circulate the manuscript among the scholar-author's peers for evaluation and comment on its worthiness. The manuscript itself may be given to an archive when the author and publisher no longer have use for it. The publisher, after editing and designing the book, has it printed and bound. This act effectively transfers the work into the public sphere, and control shifts to the published item in all of its copies. The printed and bound book, having received the blessing of the scholar's peers, physically embodies the work in a form that cannot easily be duplicated exactly, and thereby reasonably ensures its authenticity. The publisher then publicizes the book, calling both the public's and the librarian's attention to it. The library purchases the book, and it becomes part of the library's inventory. A copy-cataloger checks whether the book has already been cataloged and the record entered in one of the bibliographic utilities. If not, the book is passed along to a cataloger, who sets the book down in front of himself or herself and describes it according to a host of rules.
The most fundamental current cataloging rule is that the primary object of the cataloger's attention is the physical object itself, not the intellectual work borne by it. People who want to read the book borrow it from the library. Only one person at a time can borrow it. A second person wanting to read the book must wait for the first to return it. If a user needs the book right away, he or she must purchase it. Because the book allows only one person at a time to read it, publishers and authors accept the library's making a copy of the work available for free. Users, for their part, trust that the work is by the author listed on the title page and, perhaps, that it is worth reading because its publisher has a reputation for publishing only the very best on the topic. As the years go by, a given copy of the book on a library shelf begins to fall apart. This fact is called to the attention of the librarian responsible for preservation. Sometimes the preservationist decides to save the work embodied in the book rather than the book itself, and so the work is transferred from one physical medium to another, most typically from paper to film. In other cases, because the medium is itself judged to be an important record of human activity, the library employs special measures to save the artifact, the book itself. The library transfers it to a secure storage area and lets people use it only on site, and only if their hands are clean and they have no pens in their possession. In the paper-based information environment, the physical object is the focus of our activities. As such, it is the object of the differing kinds of control that each interested party brings to the paper-based information culture.

In the networked digital world, information is no longer embodied in a physical object that must accompany it in order for it to be displayed and read. Though the information is stored on a physical medium at one or more locations on the network, in principle it can be displayed and read anywhere, by anyone, night or day, and by many people at one and the same time. Networked digital information is thus not bound by space and time; it is virtually omnipresent. It is this convenience, this total portability and virtual omnipresence, that renders obsolete the mechanisms of control that supported Gutenberg's print culture. From this follow the uncertainty and anxiety that we witness among publishers, librarians, and scholars when they meet to discuss the future. How does one protect the rights of owners of information when information is so easy to copy and distribute? How does the librarian distinguish valuable information from information of only passing interest when the mechanisms upon which we depend for selection have no analogues in the digital network environment? How does the librarian describe, control, and provide access to the information selected? How does the curator preserve digital works: both the digital files themselves, and the machines and software necessary for rendering streams of ones and zeroes into humanly intelligible forms? How do we provide free and open access to information and at the same time ensure that creators and publishers of information are adequately remunerated? If we want to charge someone for access to information, how do we control access and collect the money? How do we control the peer review process, and authenticate and ensure the accuracy and value of what passes peer review? Each of us could easily continue this list, based on our own professional commitments and responsibilities.
But the list is sufficiently ample to illustrate that archivists, librarians, publishers, and scholars do not have the control over those components of the digital information environment that they need to fulfill their responsibilities, to do what they do, to make their contributions and reap their rewards. The contemporary paper culture, with its complicated system of checks and balances representing the culmination of over 500 years of adaptation to changing social, political, economic, and technological forces, has no analogue in the still largely uncivilized digital information wilderness. This wilderness favors frontierpersons and rugged individualists.

The ease with which digital information can be copied and transported has disrupted the order of things and thereby disoriented the participants. This disruption presents a challenge that must be met if we are to satisfy traditional real-world expectations and obligations (for example, some semblance of copyright, fair remuneration for creators and for those who add value to information, and the academic reward system). The technology has enabled a shift in power and control to the creators of information; anyone can instantly publish their own work to the world without recourse to the authorities and controlling mediation outlined above. While this may offer many new and wonderful possibilities, it threatens to undermine the communities that have recognized and rewarded creators, and that have ensured broad-based and enduring access to their creations through high-quality cataloging and preservation. To meet the challenge, we need to create an orderly, structured digital environment that enables the various participants to interact with one another and to maintain control over those aspects of the environment and the interaction that enable them to contribute and be rewarded. We need an orderly digital community space that replaces the physical object as the common focus and control mechanism. This new community space will necessarily employ interactive and shared, or at least overlapping, mechanisms of control. The interaction will also need to involve mutually accepted rules and regulations that ensure that the participants can fulfill their professional obligations and responsibilities and that their rights, including the right to fair remuneration, are protected. This space will involve, explicitly or implicitly, social, political, and economic institutions erected on a strong technological foundation that gives each of the players the control he or she needs. [1]

In what follows, I would like to propose that the catalog, and special structured documents linked to it, represent the axis of the new digital world order, or, using an architectural metaphor, represent the foundation of a digital academic, educational, and research community.

The central mechanism of control in the modern library is the bibliographic catalog. The catalog is a large database of extremely elaborate records describing documents of one sort or another. The central function of the catalog and the bibliographic records contained in it is to describe, control, and provide access to identifiable units of information systematically and predictably. Mostly these units are books and journals. With respect to journals, access is generally provided to the journal as a whole, and not to individual articles contained in individual issues; this more detailed level of analysis and access is generally left to abstracting and indexing services. Other forms of control are also linked to the catalog. One control module tracks the acquisition of bibliographic materials and the transactions associated with this task. Another tracks the physical circulation of books and serials, keeping track of who has borrowed what, when it is due, and the like. Both of these forms of control, though, serve the catalog, whose central reason for being is intellectual, as opposed to fiscal or physical, control. There is still another form of control related to the catalog which, while it too serves the catalog, functions at a higher level, selectively condensing and normalizing critical access information found in bibliographic records. I am speaking of authority control. Archivists and librarians use authority control files to identify real-world entities such as people, institutions, corporations, and societies, and the name or names by which they are known. While people and corporate bodies frequently function as authors, they are also frequently the subjects of books and articles. Archivists and librarians also use authority control files to identify abstract subjects, which, as we all know, is an especially difficult undertaking.
Authority records, like bibliographic records, serve to organize information intellectually, isolating and naming the creators and subjects of works. Authority control thus operates over and above the catalog, bridging bibliographic records by gathering works by and about an author under that author's name, and works about a subject under the name of that subject, in each case with references from other forms of the name where such forms exist and have been discovered. As described, the modern online catalog represents a civilized digital world in which the various interested parties from within and without the library meet and conduct their business.
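The gathering function of authority control can be made concrete with a small sketch. It assumes a drastically simplified, hypothetical model in which an authority record is only an authorized heading plus its variant forms, and a bibliographic record is only a title plus the name forms found on the item; real authority and bibliographic formats are far richer. All class and function names are illustrative, and the Twain/Clemens headings serve only as an example of a reference from a variant form.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    # Hypothetical, simplified authority record: one authorized heading
    # plus the variant forms from which references are made.
    heading: str
    variants: list[str] = field(default_factory=list)


@dataclass
class BibRecord:
    # Hypothetical bibliographic record: a title and the name forms
    # (as author or as subject) found on the item.
    title: str
    names: list[str] = field(default_factory=list)


def build_index(authorities: list[AuthorityRecord]) -> dict[str, str]:
    """Map every known form of a name to its authorized heading."""
    index: dict[str, str] = {}
    for auth in authorities:
        index[auth.heading] = auth.heading
        for variant in auth.variants:
            index[variant] = auth.heading
    return index


def gather(records: list[BibRecord], authorities: list[AuthorityRecord]) -> dict[str, list[str]]:
    """Collect titles under authorized headings, whatever name form each record uses."""
    index = build_index(authorities)
    gathered: dict[str, list[str]] = {}
    for rec in records:
        for name in rec.names:
            heading = index.get(name, name)  # forms with no authority record stay as found
            gathered.setdefault(heading, []).append(rec.title)
    return gathered


if __name__ == "__main__":
    authorities = [AuthorityRecord("Twain, Mark, 1835-1910", ["Clemens, Samuel Langhorne"])]
    records = [
        BibRecord("Adventures of Huckleberry Finn", ["Twain, Mark, 1835-1910"]),
        BibRecord("Collected letters (hypothetical)", ["Clemens, Samuel Langhorne"]),
    ]
    print(gather(records, authorities))
```

Name forms without an authority record simply remain under the form in which they were found, which reflects the practical fact that authority work is never complete.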

A little over ten years ago, the archival world began to use bibliographic catalogs to provide access not to discrete information objects, but to collections of archive and manuscript materials. The development of the MARC Archive and Manuscript Control format enabled this new use of the catalog. In this approach, instead of providing item-level cataloging, a collection of related materials is subsumed under a category or organizing principle shared by all, and treated as an integral object. Typically what makes a collection of objects cohere is that they all share the same provenance. The MARC AMC record provides a synoptic intellectual description of the aggregate. Under ideal circumstances, a fuller, more profound description of the collection will be found in a finding aid, and this will serve as the principal source of information for the catalog record.

Archivists generate finding aids as an intentional by-product of analyzing and processing collections. While each item in a collection has discrete value, this value is enhanced, or perhaps even revealed, by its relation to the whole and, further, by its relations to other objects in the collection. Its full value thus derives from its being found in context. Finding aids represent a more detailed level of analysis than will be found in the catalog record. The depth of analysis varies, but generally falls short of describing individual items in detail. The principal functions of the finding aid are to describe, control, and provide access to a collection. In the hierarchical structure of archival information access and retrieval, the collection-level catalog record leads to the finding aid, and the finding aid leads in and around, and sometimes directly to, the items comprising the collection.

Using the online catalog as a foundation, we can extend and enhance it by linking structured machine-readable finding aids directly to the collection-level records. Users, having identified a collection of interest, will be able to invoke a machine-readable finding aid by clicking on "hot text" or an icon located in the catalog record. They will then be able to navigate through the more detailed description of the collection found in the finding aid. Further, we can use the text in the finding aid to refer to and, if necessary, control other text, and to refer to and control digital representations or surrogates of primary source materials existing in a variety of native formats: photographs, sound motion pictures, drawings, paintings, audio recordings, maps, manuscripts, typescripts, printed pages, and more. In fact, anything that can be digitally captured and subsequently re-presented on demand in an intelligible form can be controlled by the finding aid, which will provide access to what it controls when one clicks on hot text, a thumbnail, or an icon. Using structured finding aids, we can thus extend the catalog to provide direct access to, and navigation of, digital representations of the primary source materials themselves.
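Structurally, the chain from catalog record to finding aid to digital surrogate is a tree hanging from a single collection-level record. The sketch below assumes a hypothetical, minimal model, not any particular encoding standard, in which each finding-aid component may carry links to digital surrogates and may contain subordinate components. Walking the tree returns each surrogate together with the hierarchical path that gives it context, which is precisely what the finding aid exists to preserve. The titles and file locations are invented for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Component:
    # Hypothetical finding-aid node: a series, subseries, folder, or item.
    title: str
    surrogates: list[str] = field(default_factory=list)  # locations of digital surrogates
    children: list[Component] = field(default_factory=list)


@dataclass
class CatalogRecord:
    # Collection-level catalog record whose "hot text" opens the finding aid.
    title: str
    finding_aid: Component


def surrogates_in_context(node: Component, path: tuple[str, ...] = ()) -> Iterator[tuple[str, str]]:
    """Depth-first walk yielding (hierarchical path, surrogate location) pairs."""
    here = path + (node.title,)
    for location in node.surrogates:
        yield " > ".join(here), location
    for child in node.children:
        yield from surrogates_in_context(child, here)


if __name__ == "__main__":
    letter = Component("Letter to a publisher, 1906", surrogates=["images/letter-1906-p1.jpg"])
    series = Component("Series 1: Correspondence", children=[letter])
    record = CatalogRecord("Example Family Papers", Component("Example Family Papers", children=[series]))
    for context, location in surrogates_in_context(record.finding_aid):
        print(context, "->", location)
```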

As attractive as structured access to digital representations of primary source materials might be to scholars and researchers, the digital environment described so far would necessarily be under the control of archivists and librarians. Scholars and researchers would be able to visit and examine the information contained in the system, but they would not be able to contribute actively. I would like to turn now to two complementary scenarios in which scholars would add to and enhance the extended research information system composed of catalog, finding aids, and digital surrogates of primary source materials.

The first scenario involves informal activity by individual scholars or scholars working together informally. The second scenario involves the formal publishing of electronic scholarly works, and it therefore involves publishers and professional societies working with both researchers and librarians.

Over and above the structured database of catalog records, finding aids, and digital representations of primary source materials, it will be possible to create both private and public information spaces that reference the materials. In a private space, the individual researcher will be able to attach notes and annotations to selected items in collections, and to establish and document relations between items and collections not made in the catalog and finding aids. Literary manuscripts might be evaluated. The notes and annotations might offer tentative interpretations and logical arrangements of the materials, perhaps even incorporated into a historical narrative. When scholars want to discuss tentative evaluations or interpretations with their peers, they will be able to make the private space public, or at least available to selected individuals, who will be able to annotate the sources, evaluations, or interpretations with comments, alternative interpretations, suggestions, and the like. A scholar might permit view-only access to everyone, but limit who can deposit comments and further annotations. Such a space might also be established by a closed community that shares control and governance, with all members having the same rights and responsibilities. Many configurations are possible. Regardless of the configuration, two features are critical. First, the structured underlying database built by the library and archival community provides access to a rich, relatively stable, and therefore reasonably predictable storehouse of primary source materials upon which scholars can confidently build their intellectual structures. Second, the archival and library community and the research community each maintain control over those portions of the information space essential to their respective missions.
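The permission configurations mentioned above can be sketched as a simple access model. This is a hypothetical illustration rather than a proposed design: it assumes one owner, an optional view-only setting for everyone, a set of named readers, and a set of named commenters. Annotations live in the scholar's space and refer to library items by identifier only, so the underlying catalog and finding aids remain under the library's control.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class AnnotationSpace:
    # Hypothetical research space layered over the library's database.
    owner: str
    public_view: bool = False                          # view-only access for everyone
    readers: set[str] = field(default_factory=set)     # selected individuals who may read
    commenters: set[str] = field(default_factory=set)  # individuals who may also annotate
    notes: list[tuple[str, str, str]] = field(default_factory=list)  # (author, item id, text)

    def can_view(self, user: str) -> bool:
        return (
            user == self.owner
            or self.public_view
            or user in self.readers
            or user in self.commenters
        )

    def can_annotate(self, user: str) -> bool:
        return user == self.owner or user in self.commenters

    def annotate(self, user: str, item_id: str, text: str) -> None:
        # Notes refer to library items by identifier; the library's own
        # records are never modified from inside the scholar's space.
        if not self.can_annotate(user):
            raise PermissionError(f"{user} may not annotate this space")
        self.notes.append((user, item_id, text))


if __name__ == "__main__":
    space = AnnotationSpace(owner="scholar", public_view=True, commenters={"colleague"})
    space.annotate("scholar", "item-0042", "Tentative date: spring 1906")
    space.annotate("colleague", "item-0042", "Alternative reading of the postmark")
    print(space.notes)
```

A closed community with shared governance would correspond, in this sketch, to placing every member in `commenters`.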

Although scholars work and communicate informally while engaged in research, current intellectual and cultural fashion requires that they interact formally with the academic and publishing communities, negotiating recognition of the value of their creative works and ultimately passing custody and control of those works to others. The communities that play different roles in the comprehensive and complex process of creating, evaluating, canonizing, disseminating, utilizing, organizing, and preserving the human record require that new research, if it is to become part of the record, be objectively established, and captured and fixed in a stable, identifiable, verifiable form. [2]

Typically, faculty submit the results of their research and work to their peers for evaluation. The refereeing process is commonly managed by publishers functioning as disinterested third parties ensuring fairness and objectivity. Scholarly associations and societies can and frequently do fulfill this function, as well as other roles commonly associated with publishers. Once the work has been judged worthy, the publisher must embody it in an unchanging form whose enduring authenticity it can guarantee, and make it formally and publicly available. The academic review committee that considers a scholar for tenure or advancement through the institutional hierarchy generally looks at his or her publications. Its members must be confident that the published works submitted are the same as those judged intellectually meritorious by the scholar's peers. A work must also remain constant if it becomes an object of study and evaluation by other scholars. Publishers also have needs. They need to generate funds to finance their contribution to the knowledge culture. To do this, they need to be able to publicize the availability of new works, making people aware of their existence and thereby creating a market. They also need to be able to control and monitor access and use in order to negotiate payment. The market for an electronic work will be of limited duration. We are safe in assuming that publishers will not want to bear the cost of storing and maintaining publications once those costs exceed the revenue generated by access and use. Mechanisms must therefore be in place to trigger the transfer of control and custody of the information from the publisher or scholarly society to the archive and library community, to ensure that the intellectual value is preserved after the economic value is exhausted.
Clearly, if these complex intellectual, social, and political interactions are to take place successfully in the digital network environment, the various participants will need a structured information space within which they can communicate with one another while each maintains the form of control essential to his or her relation to the shared information. What in the world of digital information will replace the printed book as the domain over which the various interested parties exercise their shared dominion? I believe that the electronic version of the archival access and control model we have been exploring in the Berkeley Finding Aid Project (the extension of the catalog through structured finding aids linked to digital surrogates of primary source materials) forms the foundation of such an interactive place for the community of scholarship. Further, it suggests the direction to be taken to constitute a complete system that will fully accommodate everyone interested in producing and controlling research in the new world of networked digital information. In my remaining remarks, I would like to complete the outline of this possible information space as it has been suggested to me by my work with structured finding aids.

A collection-level structured document like the finding aid clearly would serve this function well. The kind of collection that I have in mind here, though, is not one determined by the archival principle of provenance, but rather a collection that represents the shared interests of intellectual communities. Fortunately the model for such a collection already exists in the print world. It is the comprehensive, critical subject bibliography. In order for the bibliography to serve as the new axis of electronic academic publishing, we need to design it in such a way that it can be collaboratively and cooperatively built and extended by publishers, scholarly societies, and libraries. It must enable all of the critical functions to take place under the control and jurisdiction of the appropriate experts and professionals. An underlying assumption is that the subject bibliography will be organized hierarchically. Subjects and disciplines will be subdivided, and the subdivisions further subdivided, with as many levels of analysis as are needed. When bibliographies become too complex, or the discipline redefines itself, the bibliography can be divided or reconfigured to reflect the new definition. Alternatively, an older organization of a subject can be linked forward to the newer, creating an ongoing structure reflecting the development and changes in subjects. Once an electronic work is judged worthy of publication by a scholar's peers, the publisher will publish it by having the bibliography "point" to it. When a new work is published, a cataloger assigned responsibility for the discipline or subdiscipline will automatically be alerted. He or she will catalog the "item" using the entry in the bibliography as the chief source of information. Once cataloged, the work can then take its place in the controlled bibliographic universe of the library world. 
Users interested in the work might locate it either through the catalog or, more directly, through the bibliography itself. In either case, the user must negotiate access to and use of the work with the publisher, on whose server it resides. Access to an abstract might be free, with access to the entire work involving payment. At a certain point, when the number of paying readers drops below a publisher-defined threshold, those responsible for preserving the human record are automatically notified, and transfer of the work to a library preservation server is negotiated. Like the finding aid, the bibliography would provide organized, structured, reasonably stable access to and description of information, only now the underlying works are formal publications. In this world of related and controlled electronic texts, the bibliography provides direct access to the texts themselves, and published texts point to and reference one another. The controlled subject vocabulary of the catalog that provides access to the literature in a given subject area will be enhanced by the text of the bibliography and of the works themselves. Links from the works to the digital images of the primary source materials upon which they are based will complete the scholarly information universe. Here we have the outlines of a well-ordered "universe" of textual reference.
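A minimal sketch of the bibliography as described: a hierarchy of subject nodes whose entries point to works on a publisher's server, where adding an entry is the act of publication and triggers an alert to the responsible cataloger, and where works whose paying readership falls below a publisher-defined threshold are flagged for the preservation community. All names, the alert callback, the addresses, and the readership counts are hypothetical, and the negotiation of the actual transfer to a library server is deliberately left out.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Entry:
    # A bibliography entry "points" to a work held on the publisher's server.
    title: str
    location: str  # hypothetical address of the work on the publisher's server


@dataclass
class SubjectNode:
    # One level of a hierarchical subject bibliography.
    subject: str
    entries: list[Entry] = field(default_factory=list)
    subdivisions: list[SubjectNode] = field(default_factory=list)


def publish(node: SubjectNode, entry: Entry, alert: Callable[[str, Entry], None]) -> None:
    """Publishing is the act of pointing: add the entry, then alert the cataloger."""
    node.entries.append(entry)
    alert(node.subject, entry)


def preservation_candidates(node: SubjectNode, paying_readers: dict[str, int], threshold: int) -> list[str]:
    """Titles whose paying readership has fallen below the publisher-defined threshold.

    Only flags works for the preservation community; negotiating the
    transfer itself is outside this sketch.
    """
    flagged = [e.title for e in node.entries if paying_readers.get(e.title, 0) < threshold]
    for sub in node.subdivisions:
        flagged.extend(preservation_candidates(sub, paying_readers, threshold))
    return flagged


def notify_cataloger(subject: str, entry: Entry) -> None:
    # Stand-in for whatever alerting mechanism a real system would use.
    print(f"Cataloger for {subject}: new work '{entry.title}' at {entry.location}")


if __name__ == "__main__":
    history = SubjectNode("History", subdivisions=[SubjectNode("History, Medieval")])
    medieval = history.subdivisions[0]
    publish(medieval, Entry("Study A", "http://publisher.example/a"), notify_cataloger)
    publish(medieval, Entry("Study B", "http://publisher.example/b"), notify_cataloger)
    print(preservation_candidates(history, {"Study A": 120, "Study B": 3}, threshold=10))
```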

Clearly, in this scenario primary source materials will continue to figure as prominently in electronic publications in the humanities and social sciences as they have in print publications, but with an important difference: they will be present for anyone who wishes to review the interpretive work in question. Along an electronic chain, they will anchor critical research to the empirical evidence it evaluates and interprets. But the mechanisms and apparatus by which these materials are incorporated into publications will change, as will the relationship between archival finding aids and scholarly works that point to the same materials. It will not be necessary to copy the digital file "into" the publication. Instead, the publication will point to the digital file residing on the archive's server, the same file pointed to by a finding aid, and perhaps by one or more current research projects that have not yet reached publication. The archive or library will want to maintain control over all of this pointing for at least two reasons. First, while most public archives and libraries will want to make access to primary source materials free when the intended use is research and education, they will want to charge for use in publications intended to make a profit; monitoring the pointing will make it possible to automate both the negotiations and the transactions. Second, it will be possible for users to navigate, through the object itself, in and among the various texts pointing at it. Over time, many important objects will have multiple paths leading to them. A reader who comes across such an object will discover a nexus of interpretations competing to illuminate it.
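The archive's monitoring of pointers can likewise be sketched as a registry keyed by the identifier of each digital file. For illustration only, this assumes three purpose categories and a single flat fee for commercial use; real terms would be negotiated case by case. The same registry supports the reverse navigation described above, from an object to every finding aid, publication, or project that points at it. The class and method names are hypothetical.

```python
from __future__ import annotations


class PointerRegistry:
    """Hypothetical registry the archive keeps of links to its digital files."""

    FREE_PURPOSES = frozenset({"research", "education"})  # illustrative categories

    def __init__(self, commercial_fee: int) -> None:
        self.commercial_fee = commercial_fee  # a single flat fee, for illustration only
        self.links: dict[str, list[tuple[str, str]]] = {}  # object id -> [(source, purpose)]

    def register(self, object_id: str, source: str, purpose: str) -> int:
        """Record that `source` points at `object_id`; return the fee owed."""
        self.links.setdefault(object_id, []).append((source, purpose))
        return 0 if purpose in self.FREE_PURPOSES else self.commercial_fee

    def pointing_at(self, object_id: str) -> list[str]:
        """Reverse navigation: every text that points at the object."""
        return [source for source, _ in self.links.get(object_id, [])]


if __name__ == "__main__":
    registry = PointerRegistry(commercial_fee=50)
    registry.register("obj-7", "Finding aid: Example Family Papers", "research")
    registry.register("obj-7", "Illustrated trade history", "commercial")
    print(registry.pointing_at("obj-7"))
```

Because the registry, rather than the publication, holds the links, the archive keeps custody of the file while any number of texts can cite it.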

Technology currently exists or is emerging to design and build the kind of comprehensive archive, library, research, and publishing environment that can provide an orderly, civilized space for scholarly communication. It is technologically feasible to build a shared, structured environment that will enable archivists, librarians, publishers, and scholars to communicate and negotiate with one another, to control those components of the environment that they need to fulfill their missions, to make their contributions, and to reap their rewards. The obstacles to building such a community are primarily intellectual and political, not technological. Mircea Eliade, the late Romanian historian of religions, once observed that we must first create the world in order to live in it. Intellectually, we must overcome the spatial and temporal limits that dominate our print culture and our imaginations. At this moment a wide variety of interest groups and forces are competing with one another to define and create the networked digital world. Each wants control and, to the extent that it understands what this means, is busy acquiring the necessary techniques. Most of you do not need to be reminded that many of the interest groups trying to define and exploit networked communication have values that conflict with ours, and that they have considerably more resources to bring to the conflict. We need to understand what is at stake and, if we are to have a chance in the struggle, we too must master technology in order to preserve the cultural heritage that has been entrusted to us.

Before we can employ the technology in the service of our professional commitments and values, we must master it. To achieve this, we must also build closer, cooperative relationships among researchers, scholars, publishers, librarians, and archivists, so that together we can define and create an information environment that addresses the full array of interrelated needs and functions. The medium that we are beginning to employ is an inherently communal one. The omnipresence of networked digital information offers us an extraordinary opportunity for the advancement of human knowledge, but only if we succeed in controlling it. What I have proposed is admittedly quite conservative: reinventing in the digital environment the complex system of interaction of archivists, librarians, scholars, collectors, creators, publishers, and readers that results in the worthy being found, discovered, called to our attention, and selected for enduring availability and preservation. But it is not purely conservative, if one takes that to mean merely restrictive. The networked digital information environment I have outlined is also expansive, because it promises to transform the intellectual community by admitting whole groups of people who, before its advent, had never set foot in an archive or gazed upon the treasures stored in an archival vault.


Notes

[1] The argument presented here should not be interpreted as technological determinism, wherein the technology (and the technicians) impose a new digital world order on authors, publishers, archivists, and librarians. Instead, I am advocating that the users of the technology master the techniques of control in order to serve their professional and philosophical commitments.

[2] "Identity and change" is, of course, a basic and enduring philosophical issue, and thus certainly not one limited to information in a digital environment. The basic problem can be stated as follows: how can something that changes still have the same identity? Or, alternatively, when something changes, at what point does it assume a new identity? The ease with which we can change digital information compounds the practical problems associated with the issue only quantitatively; it does not alter the issue itself qualitatively. Issues of editions and versions have plagued the library community for years. The library community decided to make arbitrary decisions concerning "identity and change", based on a minimum of observable evidence, in order to move on. The digital world does add a new wrinkle to the practical side of the problem, in that it alters the nature of the evidence and the observation of it.