by
Steve Hensen
Director of Planning and Project Development
Special Collections Library
Duke University
I have found that much of my own work in developing descriptive standards and in integrating the description of archival and manuscript materials into the so-called "bibliographic" mainstream has been leading, however unwittingly, towards the work that Daniel has been doing. His project is in fact a logical culmination of much that has gone on before; an extension -- nay, a quantum leap -- of standard archival practice that leapfrogs the archival world into the unaccustomed role of "leading the pack" or sitting out on the "cutting edge" of current developments in networked information access.
In what I can only attribute to either cosmic serendipity or an astonishing prescience on Daniel's part, his project is maturing at exactly the time that archivists and manuscript curators both need and will be able to make use of it. When he first started the finding aids project, there was little sense (at least on my part) of how such a system might ultimately be implemented. It was, I now understand, a response on his part to a lot of foolish archival "gophering" then underway. Nobody had even heard of the World Wide Web or hypertext markup language; and certainly there were very few who, even in their wildest imaginations, could have predicted the current explosion of information resources over the Internet. Today, forward-thinking archivists are developing plans and systems for making information about their holdings -- and even the holdings themselves -- available over the Internet. These systems are almost universally conceived of in traditional hierarchical models of archival description: catalog records/subject indexes linked to finding aids linked to material representations. Without the Berkeley Finding Aid Project, such systems would be unthinkable. It is difficult to imagine a more fortuitous convergence of events; though, as I suggest, it may be more prescience on Mr. Pitti's part than any other combination of luck and timing.
In the title of this paper, I make a connection between the present project and something I refer to as "NISTF II," a conceit I confess I owe to Larry Dowler. At last year's annual meeting of the Society of American Archivists in Indianapolis, Larry chaired a session in which I was one of the speakers. The session was somewhat enigmatically titled "Archival Stonehenge," but in fact was focused on the 10th anniversary of the MARC-AMC format. In commenting on the papers dealing with the past, present, and future of MARC-AMC, Larry remarked that archival descriptive standards and systems had come so far so quickly, and were in fact heading even more quickly into unanticipated new directions, that maybe it was time to convene a new National Information Systems Task Force, or "Son of NISTF," as I believe he called it. More recently (early last month, in fact), at the RLG Primary Sources Forum meeting, Larry was called upon to summarize the presentations and recommendations of the assembled representatives of this group, and he came to the same conclusion. Subsequent conversations with Larry revealed that it was the Berkeley Finding Aid Project, coupled with the explosive growth in new modes of networked information access and retrieval, that was pushing him towards this conclusion. Moreover, in this he was recognizing subtle parallels and connections between the dynamics of current developments and his own earlier experiences in archival standards development.
It is thus my contention that the Berkeley Finding Aid Project, in many of its goals and ultimate aims, is, if not the "Son of NISTF" that Larry was calling for, such a strong lineal descendant that it deserves immediate adoption, if not actual christening, as "NISTF II." As a quondam companion of Larry's in much of that earlier work, I would like to draw on that experience and briefly examine the principal themes and history of archival standards development over the last ten years, by way of more fully appreciating and understanding the importance of the present project and how deftly and systematically it fits into the overall process (whether or not it really meant to).
It is indeed a privilege to participate in some small way in phase II of a process that started with the Society of American Archivists' National Information Systems Task Force. I emphasize small way, however, because most of us who participated in that experience would not willingly re-live almost any part of it. One of the reasons for this is that it was an extremely lengthy and often contentious process. To be sure, the issues with which we were dealing were complex and controversial and required as much attention as we could give them. In addition, however, the world was a more leisurely place 10 years ago; there was not the pace of progress with which we must contend today, where fundamental social and cultural changes seem to occur at the same pace as (and indeed are often prompted by) changes in computer software and hardware -- which is to say an 18-month obsolescence cycle. The two most recent projects with which I have been involved -- the RLG Digital Image Access Project and this one -- have been trying to move forward even as the ground was almost literally moving under their feet. As a consequence, they have both ended up in altogether different environments than those in which they started. But I digress.
As I was saying, the issues with which NISTF had to contend were difficult. Among these was the seemingly ingrained hostility of many in the archival community towards anything that smacked of librarianship, and the firm belief that since archives were unique, they required unique approaches and that standards could thus never be applied. Add to this mix the sentiment that the methodologies and principles of archivists were somehow fundamentally different from those employed by their more library-oriented "manuscript curator" colleagues -- a vestige, if you will, of the "archives-historical manuscripts" dichotomy which dates back to Sir Hilary Jenkinson in the early part of this century.
Thus, among the first questions that NISTF had to address was whether there was any substance to the long-standing dispute between "archivists" and "manuscript curators" over various matters of theory and practice. Towards this end, Elaine Engst of Cornell University conducted a thorough study of descriptive practices in a wide variety of repositories. Her unpublished report, "Standard Elements for the Description of Archives and Manuscript Collections," clearly demonstrated that there was virtually no practical difference between the descriptive approaches of these two groups and that, in the words of Tom Hickerson, "there are common methods of archival description which could be integrated into a broadly applicable set of standards."[1] More important, though, it helped lay an essential foundation for the subsequent development of a format to carry the various elements of description. It also helped archivists to begin to understand that there was an area of common ground between archival description and library cataloging.[2]
This work led to the development of a unified data elements dictionary, which was the first step on the road to adapting the MARC format for the purpose of describing (or "cataloging," if you will) archives and manuscripts. Although, at the time this work was going on, it was not altogether clear to the members of the task force that it was possible or desirable to describe these materials in the very same systems used for describing other library materials, it was already obvious that the superstructure the library world used for doing this work (the MARC formats) could easily be adapted to archival purposes. The result was the USMARC Format for Archival and Manuscripts Control.
Perhaps more germane to our discussion here was NISTF's realization that this new superstructure they were devising must somehow accommodate the details of handling multi-level archival hierarchy. For there is nothing quite so sacred or central to an understanding of that world-view that is peculiarly "archival" than the principle that the fonds, whether consisting of personal papers or government records, are essentially organic in nature, i.e., generated as the natural documentary byproduct of the activities or functions of corporate bodies or persons. From this flows the archival principle of provenance (often known as respect des fonds), which holds that the arrangement and description of these materials follow their original function, purpose, and order. Thus, for the archivist, the concept of multi-level description is deeply rooted in the basic principles of the profession. In 1964 Oliver Wendell Holmes (the archivist) defined five basic levels of archival arrangement and description -- Depository, Record Group and Subgroup, Series, Filing Unit, and Document.[3] And until 1986, when Max Evans came along and effectively destroyed the concept of Record Groups,[4] this system of hierarchically-based levels had, according to Terry Abraham's 1991 article, achieved the status of "dogma" in the American archival profession.[5] This hierarchy was a logical outgrowth of the archival focus on the principles of provenance and respect des fonds, in which the original order of the archival materials is considered a direct reflection and result of the bureaucratic structure and activities that created them. The basis of Holmes' earlier argument was that there were distinct descriptive and arrangement requirements inherent in these levels.
NISTF, recognizing that any structure or standard that did not accommodate the description of archival hierarchies or levels was both inadequate and doomed to failure, took a harder look at the MARC formats. There, in some then relatively undeveloped fields, they discovered that the structures established to accommodate library analytics were not only perfectly suitable for controlling archival hierarchy, but were also, in their "part-to-whole" configuration, philosophically consistent with archival levels of description. This idea, while perfectly obvious now, was an epiphany at the time, and it really paved the way for the subsequent full development of the MARC-AMC format and the full integration of archival description into heretofore strictly "bibliographic" systems. In the RLIN implementation of the USMARC AMC format, the Research Libraries Group fully developed these "linking" fields. Since then, they have become the very essence of the description of government archival material in particular within RLIN, providing a means to describe materials at any appropriate level while logically associating that description with other descriptions of hierarchically related materials.
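To make this concrete, here is a deliberately simplified sketch of such a linking arrangement. The field tags are actual USMARC linking entry fields (773, Host Item Entry; 774, Constituent Unit Entry), but the titles and control numbers below are invented for illustration and do not represent any real record:

    245 00 $a Central office correspondence, $f 1920-1945.
    773 0  $t Records of the parent agency [hypothetical] $w (XX)0000001
    774 0  $t Subject files, 1930-1940 [hypothetical]     $w (XX)0000003

A series-level record of this kind points "up" (773) to the record describing the whole of which it is a part and "down" (774) to records describing its own constituent units -- precisely the part-to-whole logic of archival levels of description.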
However, the MARC-AMC format, no matter how well-suited to archival descriptive needs or finely attuned to archival principles, was, on its own, simply an empty vessel -- a "data structure standard," as we now understand these things. What was required to make it usable inside the framework within which most MARC records were created was a companion "data content standard." Once again, the forces of fortuitous serendipity were at work. In 1978 the second edition of the Anglo-American Cataloguing Rules (otherwise known as AACR2) was published.
Although the publication of AACR2 per se cannot be said to have had much impact one way or the other on the archival world, the archival response to it certainly has. When the second edition of this library cataloging standard was issued, most of the archival world took little note. This was not the case in the Manuscript Division of the Library of Congress, where I was then employed as Senior Manuscript Cataloger. As the Library was one of the principal partners in the development of these rules, I was more or less obliged to use them for the cataloging of manuscripts. However, a brief review of AACR2 quickly made it evident that the rules had been written with no obvious input from anyone in the manuscripts or (much less!) the archives community. Moreover, they represented a significant departure from the rules and principles that were then in use. Without getting into a detailed discussion of the specific problems (about which I have written extensively elsewhere), suffice it to say that the rules as presented were essentially unusable for cataloging manuscript and archival materials. The response of the Manuscript Division was to attempt to develop an alternate set of rules -- rules that were consistent with what were then understood to be sound archival principles while working as much as possible within the general spirit and structure of AACR2. These alternate rules were subjected to a thorough review from within the Library, by an editorial committee drawn from the American archival community, and by a number of commentators from around the country. The result was the first edition of Archives, Personal Papers, and Manuscripts.
This manual, which is now in its second, revised edition, has been widely accepted by the American archival community as the standard for the cataloging of archives and manuscripts -- especially in an automated environment (it is also, I have just learned, about to be translated into Italian, which, if nothing else, will now give archival cataloging an emotion and passion that it has never enjoyed before). It is important to understand that this is not a manual of general archival description, nor is it a guide for the development and construction of archival finding aids (though its rules and principles are based upon the existence of such finding aids and upon a general presumption of standardized data elements).
Its success is based, first of all, on the basic premise that archival cataloging is simply one facet of a larger apparatus of description. As I noted earlier, the preparation of a variety of internal descriptive finding aids is normally central to the very mission of most archival repositories. No archive or manuscript repository could long survive without such tools. Contrary to what some have written, this manual does not in any way supplant or replace this process; APPM clearly states that "in such a system, a catalog record created according to these rules is usually a summary or abstract of information contained in other finding aids."[6] This approach is based upon the assumption that, however valuable and effective our internal finding aids might be for describing and controlling our holdings, they are a cumbersome and awkward way to share information in any sort of collaborative environment. Thus, if archival repositories are ever going to share information -- not only with each other, but with scholars, researchers, and other users as well -- then the preparation of summary descriptions or cataloging records according to established standards is the most effective way to do so.
Thus, the acceptance of APPM as an established standard is based upon the ways in which it synthesizes basic archival principles into the broader framework of bibliographic description, fine-tuning that framework to transform it into a vehicle for specifically archival cataloging. This synthesis is based on four major principles:
First, it recognizes the primacy of provenance in archival description. This principle holds that the significance of archival materials is heavily dependent on the context of their creation and that the arrangement and description of these materials should be directly related to their original purpose and function. One of the specific implications of this is a heavier emphasis on the use of notes in archival cataloging since it is difficult to capture the complexities of substance and provenance in the sort of brief formulaic encryption that characterizes most bibliographic description. Moreover, the expanded use of notes is more consistent with archival traditions of subjective analysis.
The second principle embodied in APPM is that it acknowledges that most archival material exists in collectivities or groupings and that the appropriate focus of the bibliographic control of such materials is at this collective level. While the practical effect of this approach relieves the archivist of the overwhelming burden of providing item-level catalog records for record series or manuscript groups more frequently measured in scores of linear feet, it also supports the principles of archival unity, in which the significance of individual items or file units is measured principally by their relation to the collective whole of which they may be a part. A corollary of this is that the most appropriate place for component-level description and analysis is within the archival finding aid.
The third principle in these rules is that they recognize that archival materials are generally preserved for reasons different from those for which they were created. They are the unselfconscious byproduct of various human activities and consequently lack "the formally presented identifying data that characterize most published items, such as author and title statements, imprints, production and distribution information, collations, etc. Personal or corporate responsibility for the creation of archival materials (another way of saying provenance) is generally inferred from, rather than explicitly stated in the materials."[7] Such identifying data is normally created by the archivist in the course of arranging and describing the material. Thus the principal implication in this for the cataloging of archival materials is to legitimize traditional archival descriptive systems such as finding aids, guides, registers, etc. as sources of cataloging data and to move the cataloging process away from the literal transcription of information that characterizes most other bibliographic description.
Fourth, APPM recognizes that there are "a number of appropriate levels of description for any given body of archival material. These levels normally correspond to natural divisions based on provenance or physical form."[8] Thus, the rules provide an essential framework for multi-level description, making it possible for archival catalogers to prepare consistent records regardless of the level of description. This approach mirrors that of NISTF in that it is analogous to the bibliographic concept of analysis, which provides for the preparation of cataloging for a part or parts of an item for which a comprehensive entry has been made. Given the overwhelming importance of hierarchy and provenance, this has been an essential feature of these rules.
Perhaps most important though is the fact that the approach embodied in these rules accepts as a given the legitimacy of archival material as part of the larger universe of cultural artifacts. The introduction to the first edition of APPM states that "a fundamental and compelling rationale for this attempt to reconcile manuscript and archival cataloging and description with the conventions of AACR2 lies in the burgeoning national systems for automated bibliographic description. If these systems, which are largely based on the descriptive formats for books and other library materials outlined in AACR2, are to ever accommodate manuscripts and archives a compatible format must be established. This manual is based on the assumption that, with appropriate modifications, library based descriptive techniques can be applied in developing this format."[9] Underpinning this is the conviction that it is both appropriate and desirable to catalog and describe archival materials as a part of those systems which describe more traditional library materials such as books, films, serial publications, maps, sound recordings, graphics, etc. It is thus now axiomatic from the point of view of access to research information that there are logical, vital, and inextricable relationships between all of these materials and that it is important to be able to show those relationships in a bibliographic context.
The superstructure provided by MARC-AMC and APPM for the description and control of archival and manuscript materials would have remained an untested abstraction (something, I'm sure, that would have pleased any number of archivists at the time) without some concrete evidence that it actually worked. For, as I noted earlier, many archivists in the United States were still deeply suspicious of the library origins and essentially "bibliographic" structure of MARC-AMC. Fortunately, even before NISTF had completely finished its work, several university libraries that were members of the Research Libraries Group were urging RLG, along with the National Endowment for the Humanities, to support a project that would truly test the viability and utility of this new approach.
This early project involving the manuscript and archival collections in the libraries of Yale, Cornell, and Stanford quickly proved not only to the archival community but also to RLG and the larger library world (for it must be noted that initially there was considerable skepticism and even resistance within RLG to permitting cataloging records for archival materials to be entered into the RLIN bibliographic database) that the MARC-AMC format and the descriptive standards embodied in APPM could be used successfully for the control and description of archival materials as part of a larger integration of those materials into heretofore strictly bibliographic databases.
While some of you may be starting to wonder what this bit of history regarding APPM and MARC-AMC has to do with SGML and archival finding aids, I firmly believe that without the foundation provided by these events we would not be here today. Without our experience of developing and implementing standards within a larger bibliographic context, and without the relative success of that experience, there would have been no pressure or impetus for archivists elsewhere to start exploring the larger world of related standards. Most archivists had happily survived for years in splendid, idiosyncratic isolation and, but for their formal homage to a few sacred archival principles, saw no need to standardize the way they went about their business, and certainly saw no need to ally themselves with librarians. What the experience of MARC-AMC and APPM has shown is that, quite beyond considerations relating to the internal management and description of archival materials, and quite beyond the traditional administrative and bureaucratic value and purposes of archives, there was real benefit in being able to clearly communicate archival information -- not just among archivists, but with the larger world of historical scholarship and research. And further, that archivists were much more closely allied with other information professionals than they realized.
Looking back, I think it is safe to say that few of us who were involved in these projects back in what now seems to be the archival pre-history of the late 1970s and early 1980s had any clue of the ultimate magnitude of the impact of our work. The task that NISTF had designed for itself was initially very simple; to wit, heading off a potentially unpleasant jurisdictional dispute between the National Union Catalog of Manuscript Collections and the repository guide project of the National Historical Publications and Records Commission. The fact that the MARC-AMC format emerged from NISTF's deliberations was a result more of fundamental practicality on the part of the task force than of any new vision of the future of archival description: it was simply easier to adapt the MARC format to our needs than it was to develop an entirely new system to underpin whatever archival "national information system" might emerge (whatever that meant).
Similarly, my own work in recasting AACR2 to accommodate the requirements of modern manuscript and archival cataloging was undertaken with rather more modest goals than those that ultimately resulted. Like NISTF, I was simply looking for a practical solution to what seemed like a relatively small problem. There was little sense that this solution would have wider application or appeal. In addition, though I am somewhat chagrined to confess this now, there was also little sense of the vital connection between the work of these two projects. It was only by the sheerest coincidence that they were roughly contemporary with each other. Moreover, I have always suspected that my own place around the NISTF table had more to do with the convenience of having someone representing the Library of Congress who was a member of SAA than with any particular expertise on my part (I was, after all, at that time simply an anonymous toiler in the back rooms of the Preparation Section of the Manuscript Division; my own work with cataloging standards was in its infancy).
Thus it was that the combined work of these efforts was presented before the world with a distinct sense of uncertainty and unease. Remember, this was long before "Field of Dreams," and there was little assurance that anybody would come, no matter what we built. The archival profession in those days prided itself on its idiosyncratic and individualistic ways, and it was not at all clear that standards of any sort would be either welcomed or accepted -- least of all standards intimately associated with the theory and practice of librarianship.
As I'm sure most of you know, these concerns were groundless. The development of the MARC-AMC format and AACR2-compatible rules for cataloging has utterly and completely transformed the world of manuscripts and archives -- certainly in this country, but also to an increasing extent in Canada, Western Europe, and now even Russia. There are well over 500,000 records in RLIN alone, from hundreds of separate repositories in the United States and Europe, for previously elusive and fugitive primary resources and special collections. More important, the integration of these materials into heretofore primarily bibliographic systems is now understood to have been a logical and necessary evolutionary step. This success has been the source of considerable surprise to all involved.
The principal motivating factor behind almost all early library automation lay in the economies of distributed or shared cataloging. Both RLIN and OCLC were originally developed as bibliographic utilities to accommodate this need -- and here the word utility has virtually the same meaning in a library context as it does in describing the vital supply of services such as water, electricity, telephones, and sewerage to towns and cities. It may be regarded as something of a happy accident that these same systems unwittingly developed -- through the sheer accumulation of bibliographic information from a variety of libraries -- into immensely valuable research tools (though this realization has come rather later to some than to others).
Thus, today, in an environment where the exigencies of copy cataloging are either irrelevant (as is the case with archival and other unique primary resources) or can be easily handled in any number of ways, the purpose and focus of these systems is (or ought to be) evolving towards becoming more thoroughgoing and integrated research tools. These would be systems in which the description and control of the entire range of cultural artifacts is both accommodated and encouraged, and where information is accessible without regard to the particular physical form that information might take. The world of research and scholarship (which most of us serve) has become increasingly interdisciplinary and less concerned with whether the information it seeks is to be found in traditional printed and published forms or in archives, photographs, motion pictures, videotapes, computer files, or museum registers. As I noted earlier with respect to the theoretical rationale for APPM, it is now recognized that information of all sorts is part of a large seamless web, and it is becoming increasingly clear that service to research and scholarship is optimized when there are no artificial restrictions on the particular form that information takes. This realization has crystallized as part of the curious alchemy that MARC-AMC and APPM began -- particularly within the experiences of RLG and RLIN.
RLG's implementation of the AMC format in RLIN made the full integration of primary source materials with other library resources a reality and helped both archivists and librarians recognize the importance of this integration to the world of scholarly research. Thus, the evolution of RLIN from a "bibliographic utility" supporting shared cataloging into a more broadly based system (putting the "R" back into RLIN, as it were) was made possible by simply accommodating the cataloging of material for which derived copy cataloging was not an issue. Once archival materials were accommodated, it was but a small leap to realize that the description of other cultural artifacts was just as important. Thus was born the idea of RLIN as a "cultural resources database."
As significant as these advances have been for the world of archives and manuscripts -- and I wish in no way to minimize them; they have been spectacular -- they have still been constrained by the limitations of the systems in which they have operated. The MARC formats are a 30-year-old database structure that provides a functional standard through which libraries can communicate bibliographic information. Given the relatively short half-life of more modern database systems, one can only wonder at either the foresight of the early developers of MARC or its stubborn durability in a world not given to easy or sudden changes of direction. Furthermore, this format is superimposed on an approach to bibliographic control that goes back over 160 years to Sir Anthony Panizzi and, slightly more recently, to the Paris Principles of 1961, upon which the superstructure of AACR2 is built. This arrangement is essentially a one-dimensional approach to bibliographic control, based on the card catalog (or its more "modern" electronic equivalent), conveying information on whether a particular physical object, usually a book (defined through such distinguishing characteristics as author, title, publication date, publisher, measurements, etc.), is in a particular library. Conveying detailed information about the content or intellectual characteristics of this item is normally regarded as outside the scope of descriptive cataloging; such matters are usually brought out in a couple of well-chosen (it is hoped) subject headings or (if permitted) some descriptive notes.
Is there anything wrong with this picture? Some would argue that this approach has been around so long simply because it's effective and serves us well. Richard Pearce-Moses, of the Heard Museum in Phoenix, recently responded on the LCSH-AMC listserv to what he regarded as some ominously hyperbolic statements I had been making regarding the future of bibliographic description:
"I certainly don't expect to see the baby thrown out with the bath water. But I wonder how much the fundamental paradigms of description and access will really change. The format of description may (finally) evolve away from the card catalog style; yet, that style may have remained fairly constant because it's effective in the way it telegraphs information....Even the notion of hyperlinks to full text would not necessarily dictate change to the bibliographic description. At some point all those e-documents are going to be impossible to find, as would a library of several million volumes be useless without some guide. The bib database is an abstraction of the documents, and we will continue to need abstraction to avoid having to search the entire haystack."[10]
Richard is correct; we will still need pointers, or "meta-data," to get to the information that resides within the collections of our cultural repositories, and cataloging of some sort may still be the way to do this. However, I would like to suggest that our current cataloging systems are ill-equipped to do this, on two counts. First, as I noted a moment ago, these systems are based on an approach to description that focuses almost exclusively on the physical characteristics and manifestation of the thing being described. At the very least, in a world in which bibliographic "items" or works may exist in many different forms and locations simultaneously, this seems curiously out-of-step. At worst, with our users increasingly demanding and expecting more precise content- and subject-oriented retrieval, an approach that ignores these demands seems suicidal. Second, these systems are, as I also noted earlier, unidimensional, in that they are based upon the assumption that there is a book (or, more generically, the bibliographic "work") in a library and there is a descriptive surrogate for that book, the cataloging record. That is the "system" in its entirety. The catalog record is used to locate a particular book, and the user, armed with call numbers and library locations, goes off in search of that book, hoping (often against hope) that, if the book was retrieved through a subject search, there actually may be something useful there on his or her research topic.
With the virtual explosion of activity over the last few years in the ability to provide Internet-based information -- first via gophers, then Wide Area Information Servers (WAIS), and now the World Wide Web -- this disparity between what we have been doing and what we should be doing has become all the more acute and increasingly difficult to explain. This is particularly true as libraries become less concerned with managing their actual holdings and focus more on connecting their users with relevant information -- wherever that information might be and in whatever form it might exist. For these libraries, the catalog as a physical inventory has little relevance. The reverse is also true: for libraries providing digital access to their own "holdings," the catalog as a reflection of the physical artifact that may or may not be on their shelves serves very little purpose in connecting potential users with the information contained within their collections. My own director of libraries, Jerry Campbell (not long ago regarded by many as inhabiting the lunatic fringe of academic librarianship, but currently serving as president of ARL), is utterly convinced that for research libraries to continue to conduct themselves in a "business as usual" mode is to virtually guarantee their extinction as a feature of modern cultural life.
I believe that part of the answer (a very big part) lies in reexamining not only the role of cataloging, but also the relationship between cataloging and other forms of meta-data. A little over a year ago, I stood in this very room with clear instructions from my hosts at the Bancroft Library to do my best to scare the hell out of their staff. Although at the time I was focusing more on the cataloging and management problems posed by enormous backlogs and arrearages -- particularly of rare printed materials -- the principal thrust of my remarks was to suggest that following a more archival model of cataloging and description offered answers not only to backlog problems, but to problems of information access and retrieval in the new electronic environment as well.
The reasons I believe this is so are rooted, not surprisingly, in the essential principles of archival cataloging that I touched upon earlier in this paper. First, archival cataloging is almost always part of a larger apparatus of description, which includes a variety of finding aids, guides, registers, calendars, etc.; further, archival cataloging is both derived from and points to those finding aids. These finding aids are, as I noted earlier, not only a fundamental and long-standing part of archival practice; they also provide the basis for the understanding that it is neither practical nor desirable for a catalog record to carry the entire burden of description. For many kinds of more traditional library cataloging (most especially rare book cataloging), the baggage that the catalog records carry is expensive and time-consuming to assemble and, in many cases, isn't even the right baggage. The archival model, with its hierarchically assembled layers of progressively more detailed information, though postulated in electronic pre-history, is, I submit, highly suggestive with respect to the architecture of modern information systems. If the catalog record is redefined as a window or gateway to other dynamically linked information resources, then the structure of that record and the access points that lead to it may become something entirely different.
Second, with an archival approach focused more on the provenance and context of creation of the described materials, there is necessarily more emphasis on the use of descriptive notes, which serve to focus on the complexities of substance and content -- particularly as they relate to that context. While it might be argued that it is irrelevant (or at least difficult) to apply this approach to the description of published materials (particularly with respect to "context of creation"), it nonetheless shifts the burden of description towards content rather than physical characteristics, which, as we noted above, are increasingly irrelevant in an electronic environment. In addition, by using a system of hierarchically structured meta-data that, while not a formal part of the catalog entry, can nevertheless be linked to it (as with the archival finding aid), it is easier to accommodate (or at least contemplate) a richer system of subjective analysis.
Third, with an archival approach more focused on collection-level cataloging control, the burden for item-level information shifts to forms of meta-data beyond the catalog record, whether they consist of finding aids, databases, or even subunit-level cataloging. Such an approach can even be used, as we have done at Duke, for cataloging large groups or collections of printed materials.
For example, we recently completed a Title II-C funded project to catalog the 65,000-item Guido Mazzoni collection of 18th- and 19th-century Italian pamphlets and monographs, which had lain in the Duke library essentially untouched since it was acquired in 1948. While there had been previous sporadic attempts to catalog these materials, the combination of the collection's size, language problems, and the fact that it was mostly pamphlets had defeated all attempts to bring it under control. This was particularly awkward, and occasionally embarrassing, since it was a well-known collection and contained one of the larger collections of per nozze known to exist in the world.
Our approach in cataloging this collection was to treat it archivally. Since Mazzoni had originally collected and organized this material into large, generally subject-based groupings, we would create a series of collection-level cataloging records based on those categories. We even followed official Library of Congress guidelines on collection-level cataloging of printed materials (which, curiously, seem to contain much of the same language found in APPM). Item-level control was then provided in a separate database, which we eventually intend to link to the collection-level MARC cataloging records. This approach obviates the need to go through the entire "AACR2-MARC minuet" with each item, since all item-level control is exercised within the database (where we made up our own rules!). It also takes a distinctly archival approach in maintaining that, however bibliographically significant individual items within the collection might be, what is most important here is the collection itself. Mazzoni assembled this group of material with specific purposes and focuses in mind, and we have, to the best of our ability, maintained the original structure in our processing and cataloging of this collection -- a nice bibliographic example of respect des fonds. And, in so doing, we provided what we feel is perfectly adequate access to this collection without the necessity of preparing a full cataloging record for each piece. Some will argue that there will be some kinds of research needs that will not be met with this approach; that some scholars will be disappointed. I have no doubt that this is true. I would answer, however, that we have at least provided access to the entire collection. A traditional advantage of a more archival approach to cataloging is the very practical matter of preferring limited access to all of a repository's holdings over detailed control of only some. Or, at least, this was the case until new network-oriented approaches to information access started to emerge. It is becoming increasingly clear that maybe we can have our cake and eat it too.
Several years ago, taking advantage of some of those new network technologies, a number of archivists and special collections librarians started taking some of the finding aids they had prepared for their collection materials and putting them on network servers, where they could be accessed via the Gopher technology that had come out of the University of Minnesota. Although I have no specific data to back this up, it is my guess that many of the institutions that did this already had cataloging records for many of these same materials in the AMC files of RLIN and OCLC. And as useful as those cataloging records were, these archivists and librarians knew that the focus of their descriptive efforts was still -- as it had always been -- in the finding aids and guides that they prepared and upon which (it is hoped) the cataloging records were based. There were two essential problems with these finding aid gophers, however. The first was that there was no way to logically or dynamically link those finding aids to their corresponding catalog records. A potential user, looking at a repository's online catalog (often via a telnet connection), would have to exit that catalog and then go into the gopher site to see if the finding aid for the material they were interested in happened to be there. Which brings up the second problem with the gophers: they consisted principally of large, undifferentiated text files that were very awkward (and occasionally impossible) to search in any meaningful or structured manner. If the file happened to be accessible via WAIS software, there might be a marginally more robust search engine (if one were using a Mac), but it was still awkward and frankly not much better than simply writing to the repository, asking for a photocopy of its finding aid, and then sitting down and reading it.
It was at this point, presumably, that Mr. Pitti developed an interest in archival description -- an interest which has never been adequately explained to me, but for which I am grateful nonetheless. He recognized the essential inadequacy of this gopher/WAIS-oriented approach to archival meta-data and promptly embarked upon the project which we are now called together to discuss. While there will be others who are much better qualified than I to comment on specific aspects and technical details of this work, I do want to finish up by simply pointing out (if it is not already manifestly obvious) how smoothly and logically this project fits into and builds upon the traditions and evolution of archival standards development and information systems over the last ten years.
In the early days of MARC-AMC in RLIN there were some interesting attempts to use it to enter entire finding aids into the system. It was never clear to me whether those attempts were based on a fundamental dislike of cataloging or whether those trying to do this simply "didn't get it." In any event, it did not take long to realize not only that MARC was ill-suited to contain the level of detail traditionally found in those finding aids, but also that these huge "pseudo-cataloging" records were totally out of proportion to the other records in the system and constituted a somewhat intimidating, if not irritating, presence. On the other side of the coin there were (and, unfortunately, still are) those institutions that also enter their entire finding aids into the system, only they do so piece by piece, with separate records for each item. Just as with the larger records, such an approach fails to comprehend the essentials of archival description and cataloging, in addition to filling the system with records which, though certainly not intimidating, are essentially useless and perhaps even more irritating. Both of these approaches, however, reflect the fundamental (and altogether understandable) impulse of archivists and manuscript curators to make more detailed information on their holdings more widely accessible. That the MARC format is not particularly effective or competent in accommodating this need may well have spawned the impulse on which this project is based.
It does not take a great deal of imagination to see that there are rough parallels between the work of the Berkeley Finding Aid Project in defining an SGML DTD for archival finding aids and the work that Elaine Engst began in her survey of archival descriptive practice that led to the NISTF data elements dictionary. In NISTF's case, that data dictionary became the foundation for constructing the elements of the MARC-AMC format. In the present case, however, the project is attempting to define the larger universe of data elements for those finding aids that are at the very heart of the processes of archival description (thereby, it is hoped, also contributing towards developing much-needed standards in this area). But if this were the only point of this work, there would have been no need to invoke the mystery and complexity of SGML. Where NISTF separately developed the data dictionary and the vehicle for that information (MARC-AMC), and then sat back to see whether these instruments could or would be used, this project is combining all these processes into one. Document definition, structure (according to an already established standard), and navigational tool are all inherently part of the SGML encoding protocols.
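For those unfamiliar with SGML, the essential mechanism can be suggested with a small, purely hypothetical fragment of a document type definition -- the element names below are invented for this paper and are not those of the Berkeley DTD:

    <!-- Hypothetical declarations; element names invented for illustration -->
    <!ELEMENT findaid   - - (titlepage, admininfo?, inventory)>
    <!ELEMENT inventory - - (component+)>
    <!ELEMENT component - - (unittitle, unitdate?, scopenote?, component*)>

The recursion in the last declaration -- a component may itself contain further components -- is what allows a single definition to accommodate record groups, series, subseries, and files alike, with the encoded document carrying its own structure and, through that structure, its own navigation.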
In terms of archival hierarchy, this project has had the benefit of learning from NISTF's experience (again, perhaps unwittingly), but it has the additional advantage of actually defining itself using the very essence of that hierarchy: the organic hierarchy of the materials themselves as reflected in the finding aids that describe them. Beyond this, however, it has the potential to provide for an unprecedented level of structural hierarchy within the overall descriptive apparatus. By this I mean that, because of the expected outcome of this project, it will soon be possible to fully realize the entirety of that apparatus within our evolving electronic information systems, so that the entire hierarchy of information will be accessible from a single point: from the most general access point in a system, to MARC catalog records, to finding aids, to details within those finding aids, and ultimately -- if desired -- to linked files of digital images of actual collection materials. Currently, the catalog records are already available and, as MARC field 856 evolves, so is the capacity to link those catalog records with related information resources on the Internet, as sketched below. What is most critically and obviously missing in this structure is precisely that which this project intends to provide: a way to encode those layers of meta-data that have traditionally existed between description at its most summary and general level and the actual material itself. In addition to providing a mechanism for this linkage, this encoding will make possible a level of information navigation that was heretofore unimaginable.
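Schematically, such an 856 link might look something like the following. The URL is invented, and, since conventions for this new field are still settling as of this writing, the usage shown here (first indicator 7, with the access method named in subfield $2 and the locator in subfield $u) should be taken as one plausible pattern rather than settled practice:

    856 7  $u http://library.example.edu/findaids/sample-papers.sgml $2 http

A catalog record carrying such a field would let a user move directly from the summary description to the full, encoded finding aid.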
As I remarked at the outset of this talk, those of us who were involved in the early days of archival standards development had little conception of the eventual impact of that work. A process that started with NISTF defining a set of descriptive elements and then mapping those elements into a bibliographic information communications format has now, among all of the other things we have been discussing, culminated in a project that seeks to further refine and define those elements, taking the entire apparatus onto, dare I say, a higher plane of existence. The principal difference between then and now is in our expectations and level of confidence. Because of recent advances in technology and the evident direction of those advances, we now have a much clearer sense of the possible; more than that, however, we now have the confidence and courage to project beyond the possible and to realize that our dreams of a truly "seamless web" of information can be achieved. And, as I pointed out earlier, in what is a particularly gratifying turn of events, I believe that this is a model and approach to information management and access that has broader applicability to the larger world of cultural repositories and libraries.
As an example of this, let me close by describing an information system which my colleague Paul Mangiafico and I, along with the client-server computing staff in Duke's Management Information Services, are in the process of conceiving and designing. It goes without saying that the very idea of this system could never have emerged without the critical evolutionary steps in archival information systems represented by this project.
We are currently trying to replace two moribund legacy automated systems that were designed for us over the last few years by several enterprising staff members and graduate interns. While we were, at the time, very appreciative of their efforts, let me just say that our own sophistication in computers and automation was only slightly less than theirs. However, as our own understanding of these things has improved, so too have our expectations, to the point that we have concluded that we can no longer rely on amateur systems for mission-critical functions. These systems were used, respectively, for the accessioning and control of our manuscript collections (Clipper-compiled dBase for this one) and for registering patrons in our reading room and recording which materials they used (this one in DataEase). As I say, both systems have become increasingly creaky and unreliable and, without the designers around (grad students do tend to move on), we were stuck. In addition, both systems were locked into DOS environments, and we were becoming an increasingly multi-platform operation.
Although we had considered a number of solutions over the past couple of years, none of them seemed to completely answer our concerns or meet our requirements. Because of a generous gift to the Special Collections Library last year (as well as a most fortuitous hiring in our Library Systems Coordinator), we were recently able to start thinking about solutions to this problem in ways we previously could not. Since we were already involved in the Berkeley project at that time, and it looked like our finding aids would eventually be SGML-encoded, the idea of possibly using SGML for the database itself came to me. Although I must confess that I was (and remain) something of a naïf with respect to the intricacies of SGML, I was particularly struck by the way it seemed to mimic relational database design in its methods for developing encoding for the demarcation of information elements. I subsequently posted a question to a couple of listservs inquiring about the existence of SGML-driven databases and received a number of responses that essentially said that it was an interesting idea, but that nobody knew of any specific applications.
In the meantime, the World Wide Web was bursting forth in an unprecedented paroxysm of activity and innovation. Without getting into the details of chronicling this growth (I could be up here another two hours!), suffice it to say that there were several developments that struck us as most interesting. First was the almost monthly advance in the sophistication and flexibility of Hypertext Markup Language (HTML), which seemed to us, anyway, to be laying the groundwork for a fuller acceptance -- indeed, even integration -- of its parent, SGML, in the Web environment in the near future. Second was the development of enhanced forms interfaces in Web client software (particularly Netscape), which could be used for data entry, data query, and data display. And last was the facility that many programmers were developing in using scripting languages (usually Perl) to communicate between the Web client interfaces and other information resources and data.
We have concluded from this that we should build the databases we need using a high-end relational software package, with Netscape and its enhanced forms capability serving as the client front-end. In this environment the databases and Netscape talk to each other via Perl scripts, which interpret data entry, queries, and commands on the one hand and give formatting and display instructions on the other; a sketch of such a script follows below. One of the databases of collection data we hope to use will be the MARC records for our materials that currently reside in our DRA online catalog. Using a Z39.50 client from Netscape, we will be able to search the online catalog; once the appropriate manuscript or rare book record is retrieved, it will be displayed and linked to a patron registration record -- also displayed in Netscape.
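To suggest what such a gateway script might look like, here is a minimal sketch. The module choices (the standard CGI and DBI interfaces), the data source name, and the table and field names are all illustrative assumptions, not our working code:

    #!/usr/bin/perl
    # Hypothetical "glue" between a Netscape form and a relational database.
    use CGI;   # parses fields submitted from the Web form
    use DBI;   # generic Perl database interface

    my $q = CGI->new;
    print $q->header('text/html');

    my $patron = $q->param('patron_name');    # field from the registration form
    my $dbh = DBI->connect('dbi:Example:spcoll', 'user', 'password');
    my $sth = $dbh->prepare(
        'SELECT collection, date_used FROM registrations WHERE patron = ?');
    $sth->execute($patron);

    # Return the query results to Netscape as an HTML list.
    print "<html><body><h2>Materials used by $patron</h2><ul>\n";
    while (my ($collection, $date) = $sth->fetchrow_array) {
        print "<li>$collection ($date)</li>\n";
    }
    print "</ul></body></html>\n";
    $dbh->disconnect;

The same pattern -- form in, query against the database, formatted HTML out -- would serve equally well for accessioning records or for retrieving catalog data.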
The key element in full integration, however, is our expectation that our finding aids will be thoroughly encoded using the DTD developed in this project. In this form they can be fully realized in this information system with a full array of external and internal linkages. Although SGML search engines for the Web are in the early stages of development, there are already tools in use at the University of Virginia Electronic Text Center that can convert SGML to HTML "on the fly," as it were. While this is obviously only a temporary solution (after all, why go to the trouble of developing rich SGML encoding only to dilute it?), it seems clear, as I just noted, that the Web is evolving in the right direction.
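To suggest the flavor of such on-the-fly conversion (a real converter would use a true SGML parser; this crude substitution, over the same invented tag names used earlier, is only to show the idea):

    # Naive sketch: map hypothetical SGML tags onto rough HTML equivalents.
    my %map = ( unittitle => 'h2', scopenote => 'p', inventory => 'ul' );
    while (my $line = <STDIN>) {
        for my $sgml (keys %map) {
            $line =~ s{<(/?)\Q$sgml\E>}{<$1$map{$sgml}>}g;  # rewrite start and end tags
        }
        print $line;
    }

The point is simply that richly encoded SGML can be mechanically down-translated for display in today's Web clients, while the full encoding is preserved for the better tools to come.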
Those of us in the archives/manuscripts field have only recently and belatedly come to a fuller understanding of our role in the larger world of cultural and information resources -- especially in the new networked electronic environment. While some of this understanding has come from within our profession, we have also relied on the perspective, good will, and assistance of those in the library, museum, and computer systems fields. From RLG's willingness to accommodate the unique needs of archival description in order to develop a more complete cultural information system, to the Library of Congress' assistance in the development of AACR2-compatible rules for description and the aforementioned MARC-AMC format, there is a recent history of important support from outside organizations and individuals that has contributed significantly to our professional evolution and advancement. The Berkeley SGML Finding Aid Project is the most recent example of such assistance and may well be regarded by future generations as one of the most important.
It is no news to anyone here that this project has generated a great deal of interest and excitement throughout the world of manuscripts and archives. There are two reasons for this. First, Daniel and his staff have worked extraordinarily hard at understanding what it is we archivists do and how we work. His presentations at Society of American Archivists meetings and his one-on-one conversations have all demonstrated an intuitive grasp of archival principles and fundamentals that is all the more remarkable coming from someone whose most recent library title was that of "authorities librarian." Second, the advance of his work on the project reflects that understanding, virtually guaranteeing that the results will be widely useful and applicable.
After this meeting is over, and following the conclusion of the Bentley fellowship later this summer, I fully expect the standards and approach developed by the Berkeley Finding Aid Project to receive the immediate and enthusiastic acceptance of the archival community. When the SAA convenes for its annual meeting this fall in Washington, D.C., maybe we should hire a skywriter to fly over the Mall (protected airspace notwithstanding) to spell out across the sky, "WHAT A GREAT IDEA" -- understanding, of course, that virtually everyone in Washington is likely to misconstrue the message.
Durham, N.C.
March 1995