[Mirrored from: http://www.acls.org/n44mccar.htm]


Volume 4, Number 4 (February 1997)

Internet-Accessible Scholarly Resources
for the Humanities and Social Sciences

This issue focuses on the presentations of a program session on Internet-accessible scholarly resources held at the 1996 ACLS Annual Meeting.

Because It's Time:
A Commentary on the Program Session

by Willard McCarty
Editor, "Humanist"
Centre for Computing in the Humanities, King's College London

Two broad themes arise from the presentations in this session: first, that the time has come for critical assessment of the Internet as a medium for scholarly communications; second, that (in the language of programming) we are beginning to glimpse the difference between the bugs and features in our uses of this medium. It is time because we now have a sufficient mass of scholarly resources and enough direct involvement with the Internet to attempt a sorting of characteristics from the confused mass of temporary infelicities and misuses. To put the matter another way, we should now be able to throw off the imitative thinking typical of the early stages of a new technology and try to understand where the real genius of the medium might lie.

The panelists--Susan Hockey, Jennifer Trant, Richard Rockwell, and Charles Henry--spoke with notable unanimity about the limitations of current tools and materials. Nevertheless, as Trant pointed out for art historical images and Rockwell for social science data, the bare fact of their accessibility is of enormous importance, however poorly we have supplied and used them to date. (A comparison with the early years of printing would be instructive here.) If mere accessibility were not so important, there would be much less reason for us to turn our attention prominently to them and detail their faults by our most rigorous standards. Thus, for example, while we are cautioned to beware of search results from badly transcribed texts and to regard the World Wide Web as a mere beginning, both are useful within their limits. More importantly, they show that new tools lead to new questions, and begin to suggest what those new questions might be. Dismissive criticism is as senseless as uncritical acceptance.

Another perhaps obvious but vital point concerns our fitness for the task. All too often humanists and social scientists of the less numerically intensive sort imply in their remarks about technology that we are incompetent or at least inexperienced. Thus either we are portrayed as poor suppliants at the industrial/scientific banquet, or the introduction of computing into humanistic scholarship is viewed as the incursion of an essentially foreign, possibly inimical force. The truth is, however, that poets and philosophers have been thinking and dreaming about automata (the general class to which computers belong) at least since Homer (Note 1). It is only within the last few years that computers have become sophisticated enough even to begin to handle our data and so be worthy of our attention. As Northrop Frye pointed out in one of his last public lectures (Note 2), humanists have in the past distinguished themselves by their powerful applications of "advanced" technology, such as the codex or movable type. So, we need not worry about our qualifications.

In her remarks, Susan Hockey wisely defined the central function of the Internet as remote access to material, not as whatever the World Wide Web and its HTML may currently allow. If we look critically at the world defined by the Web, or indeed as defined by current offline software, we can see its radical insufficiency to meet the needs of basic research in the humanities and social sciences. Thus Hockey notes the profound complexity of humanities data and how poorly it is served by the primitive retrieval methods available on- or offline. In other words, we cannot rest content, nor be seen as such by our more skeptical colleagues, but must rise to the challenge that these data offer to systematic treatment. Here lies a great opportunity for scholars to help solve problems which extend well beyond their specialist areas and conventional methodologies.

In a time when superstitious reverence for learning no longer shields us from public skepticism, it is vital not to overlook any opportunity for demonstrating the ultimate usefulness of pure research. This usefulness may at least be argued from the general principle that to solve a problem in a more complex form is to solve it in many simpler manifestations. If, that is, scholars in the humanities and social sciences bring knowledge of their complex data to the task of developing the software tools, demanding that these data be the yardstick, then we are far more likely to see some advancement of the medium than if processing the language of weather reports or medical diagnoses is the goal.

In a related point, Hockey has emphasized the crucial question of how these data are used in the usual course of scholarship. One rather new aspect that software brings to humanities and social science data is that its manner of use becomes inseparable from providing access. This has a number of important consequences. Socially, as Jaroslav Pelikan points out in The Idea of the University, the change means that it is becoming unworkable, and even more mutually damaging than formerly, to segregate those who support research into a lower, servile class; in the common interest of our survival, they must be made colleagues in the research enterprise. Intellectually, among many other things, it means that the central task of software development for the humanities and social sciences, whether on- or offline, is the modeling of scholarly methods. This modeling has been and continues to be done by an often productive alliance between scholar and programmer. I would argue, however, that the ideal lies precisely where the evolution of computing appears to be taking us: to the point at which the scholar constructs his or her own tool, directly, by assembling methodological "primitives" into a process that models thought. One might call this the Lego-approach, or to use a term coined by American physicists from my own generation, the "tinkertoy modeling" of scholarship. There are two points to this sort of modeling: first, to produce results; second, and primarily, to learn more about our data from the failures of the model (Note 3).

Hockey notes that at least for now textual scholarship seems to be better served by the techniques of markup than by clever algorithms. Her perspective is typical of the computing humanist, less so of the computer scientist or computational linguist, whose algorithms we continue to hear about but seldom if ever get to use. Markup, though laborious to encode, has the advantage of putting into the hands of the average working scholar the means of rendering human intelligence into a form that even simplistic software can process. No large grants are then required to continue the work, and the intelligence of the scholar is engaged with the data in a direct, immediate way. Literary scholars will recognize this as a new form of "close reading," and in it philologists find their ancient passion.
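To make the point concrete, here is a minimal sketch of how a scholar's judgment, once recorded as markup, becomes available to very simple software. It is an illustration only: the passage, the element name persName, and the key attribute are invented for the example, and XML stands in for SGML because Python's standard library parses the former and not the latter.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical tagged passage. The encoder, not the software, has decided
# that "the nymph" and "she" both refer to Daphne.
TAGGED = """<p>When <persName key="Daphne">the nymph</persName> fled,
<persName key="Apollo">the god</persName> pursued; but
<persName key="Daphne">she</persName> called on
<persName key="Peneus">her father</persName> for help.</p>"""


def count_persons(xml_text: str) -> Counter:
    """Count references to each encoded person, however they are worded."""
    root = ET.fromstring(xml_text)
    return Counter(el.get("key") for el in root.iter("persName"))


counts = count_persons(TAGGED)
for name, n in counts.most_common():
    print(name, n)
```

A plain word search for "Daphne" would find nothing in this passage; the count is possible only because the reading has been written into the text.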

Textual markup is often misunderstood, and its potential grossly underestimated, because of the tendency to think of it as belonging to a preparatory, mechanical stage one must get through before serious work can be done. In some cases, of course, tagging is largely mechanical, but the more complex the phenomena one wishes to identify consistently, and so be able to process reliably, the stronger its role as a means for manifesting one's understanding of the data. (Give the job to a graduate student and you have just made him or her a colleague.) Tagging, I like to argue, is akin to translation and so forces deep knowledge as much or more through its failure to convey meaning as by its successes. Tagging is the form of tinkertoy modeling most readily available to those with the most intimate knowledge of humanities and social science data. It is an instrument of perception, a medium for thinking about text.

Markup also provides for the encoding of metadata, the editorial information about information that allows the scholar to document the history of the text, its status, and what he or she has done to it. Without this meta-information the electronic book is, as it were, without a title page and preface.

The potential of markup is crudely and poorly represented by HTML, which is a highly simplified form of the Standard Generalized Markup Language (SGML, now adapted for scholarly purposes by the Text Encoding Initiative, thus TEI/SGML). Currently the preferred way of delivering primary and important secondary texts across the Internet is to render SGML-encoded text on demand into HTML, using so-called "gateway" software. This has the advantage of allowing us to exploit current Web technology without requiring that we make a large investment in tagging our data with a still primitive and severely limited meta-language. As HTML improves, or when it is discarded for something better, only the gateway software need change, not the tagging of the data.
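The division of labor between encoded text and gateway can be sketched in a few lines. What follows is an assumption-laden toy, not a description of any actual gateway: the element names, the RULES table, and the function names are all hypothetical, and the source is XML rather than SGML so that Python's standard library can parse it.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from descriptive source elements to HTML presentation.
# This table is the only part that would change if the delivery format did.
RULES = {
    "text": ("div", None),
    "head": ("h2", None),
    "p": ("p", None),
    "persName": ("span", "person"),
    "title": ("i", None),
}


def render(el: ET.Element) -> str:
    """Recursively turn one source element into an HTML string."""
    tag, css = RULES.get(el.tag, ("span", None))
    attr = f' class="{css}"' if css else ""
    parts = [f"<{tag}{attr}>", escape(el.text or "")]
    for child in el:
        parts.append(render(child))
        parts.append(escape(child.tail or ""))  # text following the child
    parts.append(f"</{tag}>")
    return "".join(parts)


def gateway(source: str) -> str:
    """Render an encoded document to HTML on demand."""
    return render(ET.fromstring(source))


SOURCE = (
    '<text><head>Book I</head><p><persName key="Daphne">The nymph</persName>'
    " appears in <title>Metamorphoses</title>.</p></text>"
)
html_out = gateway(SOURCE)
print(html_out)
```

The encoded source says what each span is; the table says how it should currently look. Replacing HTML with something better would mean rewriting RULES and render, while SOURCE stays untouched.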

Scholarly publications solely in electronic form are becoming increasingly visible. Again, the imitative mode of thinking still seems to prevail in many of these, but gradually we can observe signs that the unique or at least characteristic potential of the medium is being realized and exploited. Hypertextual, or more generally, hypermedia links to other resources are the first sign of something genuinely different, especially when these link within a document to remote software. When the reader of an online article can, for example, access the database on which the research is based, directly from the article, then we glimpse something new--or not so new? Is this not, interestingly, a mechanical model of an educated reader's memory, one further step in the direction begun by the external, written record?

Until very recently, computing in the humanities and social sciences has on the whole paid much less attention to graphical data than to verbal or numerical, for the simple reason that images require better hardware even for the simplest operations (Note 4). The World Wide Web has, of course, thrust image-delivery into the forefront, but as Jennifer Trant pointed out, such delivery is still in a primitive state. There are, first of all, a host of technical problems, e.g., with the control of color: the provider has in fact no precise control over the actual colors that the viewer will see. Image files tend to be quite large, even with current compression techniques, and so present a serious problem for timely delivery. More difficult is finding images in the first place. The best we can currently do is to attach keywords to each image-file in a database, and provide small versions of the images (called "thumbnails") on a Web page for browsing. We are still, she notes, going to particular and often idiosyncratic collections, not to the images directly through some kind of central index or finding tool. We are still very far from usable image-recognition software, which would allow scholars to search by shape.
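The keyword-and-thumbnail arrangement described above amounts to a small inverted index. The sketch below assumes an invented three-record catalogue; the file names, thumbnail paths, and keywords are hypothetical, as are the function names.

```python
from collections import defaultdict

# Hypothetical catalogue: each record pairs a full image file with a
# thumbnail and the keywords a cataloguer chose to attach to it.
CATALOGUE = [
    {"file": "img001.tif", "thumb": "thumbs/img001.gif",
     "keywords": ["fresco", "angel", "Arezzo"]},
    {"file": "img002.tif", "thumb": "thumbs/img002.gif",
     "keywords": ["fresco", "battle", "Arezzo"]},
    {"file": "img003.tif", "thumb": "thumbs/img003.gif",
     "keywords": ["panel", "angel"]},
]


def build_index(records):
    """Map each lower-cased keyword to the set of files that carry it."""
    index = defaultdict(set)
    for rec in records:
        for kw in rec["keywords"]:
            index[kw.lower()].add(rec["file"])
    return index


def search(index, *terms):
    """Return the files tagged with every one of the given terms."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []


index = build_index(CATALOGUE)
print(search(index, "fresco", "angel"))
print(search(index, "Arezzo"))
print(search(index, "halo"))
```

The last query returns nothing, whatever the images may actually show: such an index can retrieve only what someone thought to describe in words, which is why searching by shape remains out of reach.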

The print medium has had other difficulties with images that the online medium avoids, however. Printing them is expensive, integrating them with text crude and often difficult, and the final arrangement is necessarily fixed. If we can assume that the user already has adequate equipment for viewing color images--taking due care to compensate for "developed-nation myopia"--then cost is no longer a problem, so that they become practical where they were not before. Furthermore, images can follow the text, or the text follow the images, as the reader prefers, and detailed commentary (verbal or pictorial) can be attached directly to the visual details. Art historians are, of course, in the middle of the action, but scholarship in many other areas is bound to be affected if for no other reason than the ease of including images in text. Furthermore, even the art historians have, as a rule, not concerned themselves much with the techniques of imaging, whereas now they must at least be acquainted with these techniques if not be capable of applying them skillfully. The disciplines mingle, as for example in the Piero Project at Princeton University.

Scholars in the humanities and social sciences have a great deal to learn about imaging--not only the technical aspects, but even more importantly for those outside art history, the relevance of images to scholarship. Some literary scholars, for example, tend to regard them as unnecessary decoration or entertainment, a distraction rather than an aid to understanding. My own opinion is that we do not know very clearly when they are distraction, when essential. The online medium urges us to find out.

In some respects, social scientists are ahead of humanists in their applications of computing. Perhaps simply by virtue of longer and more widely accepted involvement, social scientists have, for example, been quicker to recognize the importance of the online "data center," where one may get easy access to large amounts of information, e.g., of demographic or economic data. In some universities, for example, undergraduate students can easily share use of primary data with their professors and perform sophisticated analyses as part of routine classwork (Note 5). Ideally, and perhaps also in practice, such access results, or will result, in a much more informed populace, one much less easy to fool by misleading statistics drawn from restricted datasets. This at least is the chief social aim, and it clearly illuminates the importance of access.

Rockwell spoke to these points, but he also returned to a point made by Hockey, that the "information specialist" plays a key role in making this access meaningful. Librarians of the dawning present and immediate future are increasingly going to need scholarly knowledge of how the data may be used and abused. Their training needs to be methodologically informed. Their task would be more than daunting, perhaps even impossible, were it not for the fact that from a methodological perspective, much of what scholars in the humanities and social sciences do belongs to a common ground. Inevitably, it seems to me, as the library merges with the computing center, it also becomes something much more like an academic department. The nature of the closer collaboration between computerized libraries and academic departments must still be worked out, but I think we can hazard a guess that the research-methods course is the natural point of contact, as it has sometimes been in the past. Here is an excellent opportunity for "collegial service."

Charles Henry focused on two primary issues that nicely complete the picture I am drawing here. His topic was the American Arts and Letters Network (AALN), which he defined as an experimental, refereed Web-presence aimed at assessment of online resources in the humanities. Allow me to conclude my review of the panelists' remarks by deconstructing his definition as a series of comments on the subject of the meeting. I focus on three terms: experimental, refereed, and assessment.

Experimental. This implies, of course, something tentative, unfinished, to be valued more for what it discovers in the attempt than for its material accomplishments. An experiment puts an idea or hypothesis to the test, and thus looks to the future. Computer technology, a friend has observed, is always in the future tense; what we have is never as good or important as what we are about to get. The conclusion I draw from my attempts to teach humanities computing is that particularly in this incunabular phase, the ideas are what really matter because they are what survive, perhaps even become clearer, from one version of a computing system to the next. So, we learn from an experiment, then move on. As with questions, a good experiment leads to a better one, not to a final answer that, as Blake saw, puts the light of knowledge out.

Refereed. As any careful survey of online scholarly activity will show, refereeing is now commonplace, although it should not be assumed that non-refereed items are necessarily less serious. The experimental nature of the medium implies that many are serious if for no other reason than they show us something new. Furthermore, in some cases online publications have other means for guaranteeing the quality of what they publish, e.g. in highly specialized, sparsely populated fields where assessment is implicit. Nevertheless, it should be abundantly clear that online publication is now sufficiently popular among scholars that filtering mechanisms are needed. Cost is in general no longer a factor, but individuals must have an intelligent way of deciding quickly whether a publication is worth reading. Editors of online publications should be highly motivated to set up good refereeing mechanisms, since they have to convince both conservative readers and often even more conservative authors that they are worth the attention.

Assessment. Referees assess, but here I refer to the prior meaning of assessment in the incunabular phase of online publication: discovering what the standards should be. It is not to be assumed that these are simply identical to those we have used in the past. They may be quite different.

First we need to discover the particular genius of the online medium. What is it good for, what is it not good for? Journal editors and publishers most often see the urgent matter as economic, the electronic medium offering survival in the face of certain extinction. A deeper question, however, is the uncertain balance between what the medium can be made to do and what it is particularly suited for. It is not always possible to say where progress will intervene to solve a problem that formerly looked like an inevitable characteristic of the medium. If, for example, bibliographic instability and other forms of transitoriness are basic to the medium, then it is foolishness to pour effort into attempts at permanence. Can we rely on progress to solve the problems of instability and transitoriness? To me the answer is not obvious.

I hasten to stress that even in such a highly technologized area, few of the issues are purely technical. Even if some do not require our particular skills, they impinge on scholarship in significant ways.

Once the characteristics of the medium are well separated from the temporary problems, it will be much easier to see how it might best be used. This is not simple, even if the characteristics are known, because scholarly publication is an intimately interrelated part of our disciplines. The problems we face are systemic, not isolated. If, for example, relative transitoriness turns out to be intrinsic to the medium, then online publication is best suited to the quick exchange of ideas now more characteristic of the social sciences than the humanities. Were the humanities to take on the new style, then the nature of the work might well change, and with it the ways in which professional assessment is done, the criteria by which people are recruited, and so the kinds of individuals in our departments and eventually the nature of the discipline they practice. As I pointed out earlier, online publications can offer direct access to the data on which the research was done, e.g., a tagged literary text. To the degree the opportunity is exploited in literary critical writing, the nature of this writing will change, and with it the discipline, and so on.

An analysis such as the above runs the danger of reducing a highly complex, contingent, and so ultimately unpredictable situation into a set of known forces whose interactions we think we can control. It is salutary to consider, as Robert Friedel has in his recent book Zipper: An Exploration in Novelty, just how contingent new technologies can be. With some, such as the telephone, individuals have from the beginning been able to foresee their application and significance, but with others this has emerged much more slowly, or as with the zipper, their very survival has remained uncertain for a long time. We must therefore be cautious in adopting visionary pronouncements--the new medium is especially prone to these--but at the same time quick to get involved. As has been pointed out, the new medium is rapidly being shaped and is rapidly shaping the world around us. Our intellectual condition if not survival appears to depend on our engagement with this exciting adventure in scholarly communication.

Clearly much is happening. How are we to fit it all together and make it work effectively for scholarly purposes?

Here I would like to sketch in broad outline how I think our engagement in the construction of a culturally rich Internet might work. My suggestions, which I intend only as provocation to debate, take the form of brief commentary on the diagram "A Typology of Activities for Scholarly Internet Resources" (below). I invite those with a more intimate knowledge of the political and social realities, or simply with better ideas, immediately to begin modifying my sketch, or to replace it altogether.

The fundamental problem, I think, lies in focusing and coordinating activities into a coherent effort. We have no lack of talent. In some countries, like the U.K. or Norway, centralization of the universities and of research funding makes this a considerably simpler task than in the U.S., where (at least to this outsider) a bewildering array of organizations and institutions appear partly to compete, partly to cooperate. Countries other than the U.S. are particularly important to keep in mind, not only because they are different, but chiefly because the Internet is by nature international. The fact that English is currently the lingua franca, and that the U.S. is sufficiently powerful to maintain the illusion of self-sufficiency, is a genuine impediment to a full realization of what the Internet can do for scholarship.

[Diagram: A Typology of Activities for Scholarly Internet Resources]

Now to the diagram. I begin with the core-function of academic governance or oversight. Especially in the U.S., with its number and diversity of players, leadership and coordination are crucially necessary to bring together all the players and persuasively to suggest how they might cooperate, what the best agenda might be. In essence the academy as a whole, however it is constituted, needs to take charge. In the U.S., the ACLS is very well positioned for the role; in the U.K., for example, the British Academy, perhaps also the Arts and Humanities Data Service, seem the logical counterpart.

To one side of this central role, political action and support are continually necessary to foster and maintain public approval for our initiative to transform the Internet into a cultural, or better, intercultural resource. This approval is obviously vital for adequate funding, through national agencies such as the NEH in the U.S. and SSHRC in Canada, as well as through private and local sources. Fundamentally, I would argue, what we need can only come about if we cultivate an informed populace. At least in Canada and the U.S., it no longer seems possible, even if it were desirable, to maintain an academy in monastic isolation from the world. Increasingly, protection from the heat and dust of ordinary life is a privilege constantly to be justified.

Direct communication between academics and the tax-paying public is difficult, however. Ursula Franklin, a renowned scientist at Toronto, makes the point in a recent remark that, "As students we were warned: Only the great dare touch the commonplace." Most academics lack the rhetorical skills needed for successful communication with the public--chiefly the ability to speak in effective oppositional sound-bites--and are not likely to gain them. Thus political action and support require specialists, for example in the U.S. the National Humanities Alliance. I hasten to point out, however, that our students already provide an essential channel of communication to the immediate future; they constitute especially fertile soil for ideas about the application of computer-mediated communication to humane ends. (Thus although good teaching is not on my diagram, it is everywhere implicit.)

So much for support. The next matter is the acquisition of material for a scholarly Internet. Partly this involves creation of new electronic data, partly the collection of existing data and coordination or organization of the results. Creation is laborious but best suited, I think, to the academic "cottage-industry." This industry consists of funded projects, large and small, but also of independent efforts by academics who may produce resources as a by-product or co-product of their research. Realizing the potential to any significant degree requires, however, that researchers understand that this potential exists. They may not, because it is largely a new phenomenon.

By nature computer-assisted research puts into readily communicable form supporting materials that formerly were lost or at best preserved as "foul papers." If our colleagues can be persuaded where appropriate to approach the construction of these materials systematically, with a view to their use by others--to think of them as adjunct publications--then much will be made available for the Internet without additional cost. Key to the process are the e-text and humanities computing centers, because they provide the advice, training, and means of preservation. The Summer Seminar, held annually at Princeton University by Hockey's Center for Electronic Texts in the Humanities (1991-96), performed an essential service in this regard by educating the educators. Also necessary are suitable venues for these adjunct publications (Note 6).

Once resources have been produced, especially in the incunabular stage we currently enjoy, they must be viewed as even less finished than a conventional manuscript submitted for publication. The standards by which to judge such a manuscript are well understood; those applicable to electronic resources are not, as I have suggested. Thus we need intensive, critical, intelligent discussion. This is already taking place, for example on "Humanist", in more specialized online seminars, and at conferences, such as the annual joint meeting of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing. Here all we need do is to recognize the activity and more officially to engage in it by bringing in the scholarly organizations, and through them their members. Insofar as possible, the entire community must participate.

Out of discussion comes the means for evaluation, informally through a consensus that arises from discussion, formally by the academy at several levels. Within institutions it can only begin once the value of working with electronic materials is properly recognized. This may be quite difficult without persuasive exemplars, general agreement on standards of work and means for evaluating it. Again, the academy itself, through such organizations as the ACLS, is the logical place to look for help in putting together what we know and assembling examples of good work. In recent years, at several academic gatherings, the need for a directory to exemplary computer-assisted research projects has been strongly noted. We have the means of communication successfully to construct such a "bibliography" and make it available; the (admittedly large) task only needs to be done.

Once evaluated, work proceeds to publication. As one publisher recently noted, "We all agree that electronic publication eliminates the middleman, but we disagree as to which one of us is the middleman." Among academics, university presses are not infrequently fingered, but it seems naïve to assume that we can simply take on the impressive set of skills a good publisher has to offer: among others, editing, design, legal negotiation, publicity, and validation through the imprimatur of the press, signifying quality of contents. Furthermore, the middleman argument implicitly assumes that electronic will replace print publications, i.e. that the former are the same kind of thing as the latter, only cheaper. As I have argued throughout, this is to miss the essence of the matter entirely.

One cannot help noticing, however, that the private "electronic press" is now very easy to establish and operate, and that a prodigious flowering of new electronic publications is daily stretching the limits of conventional forms. These feed back into the ongoing discussion, where we must ask steadily, "What works? What does not work?"

Finally in my diagram is the review by which published products are judged, and so publishers and their authors kept honest. There is little to say about this, except that the need for informed reviewers also drives the evolution of the system. It is still true that in this time of flux and few jobs, we must reach for talent wherever it can be found, and in that reaching lies opportunity for the highly trained but under- or simply unemployed.

Where from here? Such meetings as this one give considerable hope that we are on the right track. Again, I think it essential that we understand our fitness for the task, how much our skills are needed, and the crucial role our intelligent participation will play in our own long-term survival.

As serious books give meaning to the older technology of printing and binding, so high-quality, culturally rich Internet resources will provide leadership by showing what the newer medium can do. Thus the social, outward-looking aspect of the job before us. The intellectual, inward-looking aspect is, I suppose, obvious to everyone by now. Particularly with the advent of the World Wide Web, crude though it is, the utility of the Internet is clear for all kinds of scholarly work.

Egoque ipse multa quae nesciebam scribendo me didicisse confitear,
Augustine wrote, "and I would confess that there are many things I did not know that I have learned in the course of writing" (De trinitate 3.1) (Note 7). Writing, he saw, is a means of perception and instrument of thought. I made this claim earlier about textual markup, but now I wish to extend it to computer-mediated communications as a whole, and so to assert in closing that the fundamental subject here is the transmigration of thought, or less philosophically, the refurbishing of our cultural heritage. It's in our job description.

1. Homer, Il. 18.376f; Od. 7.91-4. See also Apollod. 1.9.26; Apoll. Rhod. 4.1638-93 (cf. Frazer's edn. 118-19n.). I discuss automata in "Language, Learning, and the Computer: desultory postprandial investigations", in Peter Liddell, ed., CALL: Theory and Application (Victoria, B.C.: Univ. of Victoria, 1993): 37-55.

2. "Literary and Mechanical Models", in Ian Lancashire, ed. Research in Humanities Computing 1, Papers from the 1989 ACH-ALLC Conference. Oxford: Oxford University Press, 1991.

3. See my discussion of modeling in "Encoding Persons and Places in the Metamorphoses of Ovid: 1. Engineering the Text", Text, métatexte, métalangage, ed. Brian T. Fitch, Texte: Revue de Critique et de Théorie Littéraire 15/16 (1994): 278ff. Information about Texte is available online.

4. See the excellent Introduction to Imaging: Issues in Constructing an Image Database, by Howard Besser and Jennifer Trant, (Malibu CA: J. Paul Getty Trust, 1995), also available online.

5. See, for example, the Data Centre, Computing in the Humanities and Social Sciences, University of Toronto.

6. With Russon Wooldridge I edit one such venue, particularly for the computer-assisted methodological aspects of research, CH Working Papers, published online at http://www.chass.utoronto.ca/epc/chwp/ and at http://www.kcl.ac.uk/kis/schools/hums/ruhc/chwp/.

7. I am grateful to Professor James J. O'Donnell (Classics, Pennsylvania) for putting these words before me.



Building the Scene: Words, Images, Data, and Beyond by David Green
Electronic Texts: The Promise and the Reality by Susan Hockey
Images on the Internet: Issues and Opportunities by Jennifer Trant
The World Wide Web as a Resource for Scholars and Students by Richard C. Rockwell
The American Arts and Letters Network (AALN) by Charles Henry
The National Initiative for a Networked Cultural Heritage (NINCH) by David Green
Online Scholarly Resources Mentioned in this Issue
