The Perseus Project

Elli Mylonas
Harvard University



Introduction to the Perseus Project

Although the Perseus Project publishes its material on CD-ROM rather than over a network, it is a substantial electronic publication, and one whose development highlights the issues that affect electronic publications. Some of these issues are the selection of materials, their presentation and the different modes of access to them, and finally, how best to publish in order to reach the target audience.

Perseus is a multimedia hypertextual database of material pertaining to classical Greece.[1] The main body of the material consists of a large collection of primary texts with accompanying translations and notes, descriptions of archaeological objects with accompanying drawings and photographs, an atlas, an encyclopedia and a Greek-English lexicon. In addition, there are specialized tools for searching and navigating within the material. Very little of the data in Perseus was created explicitly for publication in electronic form. Most of it has been converted from traditional, paper originals. The tools in Perseus, however, are designed not only to provide traditional forms of access but also to utilize the potential of digital information.[2]

The first version of Perseus, Perseus 1.0, was published by Yale University Press in the spring of 1992. Since that time, we have continued work on the database, adding more data and enhancing the navigational tools. Although much of this work, especially the new material, was planned from the outset, it also includes features whose need could not be felt before there was a working system with real data on which to experiment. The inspiration for many new features came up during development of Perseus 1.0, but we could not incorporate them into the system, since we had to focus on polishing and debugging the system, and not on new development, however seductive that might have been. We also received a great deal of useful feedback from our evaluation team, beta testers, and the much wider pool of users who bought Perseus after it was released.[3] User feedback has been indispensable for pointing out errors and making suggestions for the system and documentation.

What we have learned most about, however, are the reactions, circumstances, and expectations of our users. They pointed out mistakes and requested that particular information be included in future versions of Perseus. Equipment and software problems at the setup phase were more common and disheartening than we expected. Finally, users often complained that they could not do something that was actually possible with Perseus, because they had not figured out (and we had not explained sufficiently) how to perform old tasks in new, electronic ways. We also hope to see the converse: users showing us how to use Perseus in ways that we had not expected. The users' reactions are among the most difficult things to predict in advance. This feedback has been very important for the design and documentation of Perseus 2.0 and will continue to have an effect on the direction of the project in future phases.

Choice and Handling of Material in Perseus

What sets Perseus apart from other educational projects with similar basic features are the sorts of materials we have chosen to include, and based on them, the many methods of access we provide. Perseus is also one of the larger projects of its kind, and exceptional in that it may be used by a broad range of disciplines. Within our subject area--the classical period--we have focused on collecting primary sources, and tried to select information that would be of use to the widest possible audience. We decided the best way to do this was to avoid collecting large quantities of secondary materials such as essays and commentaries, because they tend to address particular audiences and to exclude others. This is also more economical, since good secondary material tends to be recent, and therefore under copyright, or has to be specially commissioned. Both of these require significant outlay for the data. An effective system based on secondary material only seemed to be possible if it could include many different viewpoints, and that would have made the data collection effort far greater than we could afford in time or money.

Most educational software is carefully targeted at a particular audience, and contains a great deal of interpretive material as well as built-in pathways in order to make the information it presents more clear to its audience. In some cases, primary material is excerpted, or presented merely as an illustration for the arguments that are being presented, and is meant to support interpretation, not to lead to it. Perseus, instead, can be the source for many different classes and research topics, since it behaves more like a library than like a syllabus. It can be used productively to research and teach ancient or modern literature, history, anthropology, art history and archaeology. This has practical benefits, since it broadens the potential audience for the system, but on the other hand, there have been complaints that it is too unfocused, and difficult to incorporate into a course of instruction.

The content of Perseus was chosen to be of interest to as wide a group of students and scholars as possible. In order to make it available in the inexpensive environment that such an audience can afford, however, it was necessary to impose extremely restrictive formats onto the data on the CD-ROM. Most of the data in Perseus 1.0 is either in HyperCard stacks, in a HyperCard-compatible database, or stored as PICT images on the CD. These formats are adequate for the current HyperCard version of Perseus to function, but are not at all versatile, and do not let the user do much beyond what is already built into the system. For example, text in HyperCard fields does not support any information beyond simple visual formatting, and requires that no single chunk of text be greater than 32K.

Within Perseus, there are several different indices and mechanisms that allow retrieval and navigation by canonical reference (e.g., Homer, Iliad 12.341). Without them, the user would have to rely on simple and slow linear searches through individual HyperCard stacks. Images are stored as 8-bit PICT files at a fixed size. A user who might want to zoom, enhance or superimpose Perseus images cannot do this at the moment. A conscious decision was made to distribute Perseus in a form that would run on low-end machines, using freely distributed software,[4] but at the same time we knew we had to ensure that the data in Perseus would outlast any current software and hardware, and would have the potential to be ported to platforms other than the Macintosh.

This compromise between low-end, prompt delivery and long-term survival of our work prompted us to choose robust archival forms for our data. Although the data in the distributed Perseus is in a relatively impoverished format, the data from which this version is derived is extremely rich, and has the potential to be put to other uses, and distributed in different forms. We have tried to use standard formats wherever possible, storing all continuous text with SGML markup, descriptions of archaeological objects in a powerful relational database, images at a very high resolution, and plans and drawings as PostScript files. These formats are among the most generic available, so that as current hardware and software become obsolete, and as new uses are envisioned for the Perseus materials, it will be possible to adapt to them easily.

Navigation and Retrieval

In order for Perseus to be more than just a large collection of data that happens to be stored on one physical unit, different tools and access methods are built into it. These let users who already have experience with the data navigate to the points they want to reach using well-known strategies, and also provide novices with the ability to reach the same places, even if they don't know exactly where they want to go. For example, a Greek vase expert who knows the name of the piece she wants to review can just look up the vase of that name. The student who knows only that she wants vases with iconography from the myths of Herakles can look vases up by iconographic keyword. Finally, the researcher who is interested in vases of a particular shape, or from a particular geographical location, can retrieve a list of vases based on these criteria. The hypertextual features of Perseus allow users to move across related forms of information. There are also specialized tools, more developed for the texts than for the archaeological material in Perseus 1.0, which allow more sophisticated access to the data.[5]
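The three routes to a vase described above (by name, by iconographic keyword, and by shape or location) can be sketched roughly as follows. This is only an illustration in Python: the record fields, the sample records and the function names are assumptions made for the example, and do not reflect the actual Perseus database or its HyperCard implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vase:
    """A hypothetical catalogue record; field names are assumptions."""
    name: str
    shape: str
    provenance: str
    keywords: frozenset = field(default_factory=frozenset)


# Invented sample records, not taken from the Perseus catalogue.
VASES = [
    Vase("Example Vase A", "amphora", "Athens", frozenset({"Herakles", "lion"})),
    Vase("Example Vase B", "kylix", "Corinth", frozenset({"Dionysos"})),
    Vase("Example Vase C", "amphora", "Corinth", frozenset({"Herakles"})),
]


def by_name(name):
    """The expert's route: she already knows which piece she wants."""
    return [v for v in VASES if v.name == name]


def by_keyword(keyword):
    """The student's route: every vase tagged with an iconographic keyword."""
    return [v for v in VASES if keyword in v.keywords]


def by_criteria(shape=None, provenance=None):
    """The researcher's route: filter on shape and/or place; None means 'any'."""
    return [
        v for v in VASES
        if (shape is None or v.shape == shape)
        and (provenance is None or v.provenance == provenance)
    ]


if __name__ == "__main__":
    print([v.name for v in by_keyword("Herakles")])
    print([v.name for v in by_criteria(shape="amphora", provenance="Corinth")])
```

All three functions read the same records; only the point of entry differs, which is what lets the expert and the novice reach the same place.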

As we worked to complete Perseus 1.0, we discovered shortcomings in the retrieval and navigational tools that we had built, and hit upon ways to make them better. Feedback from users confirmed this. By this time, we had also refined the archival formats in which we store the data that goes into Perseus, and developed better tools for taking advantage of them. The next version of Perseus, Perseus 2.0, will not only contain more than twice as much data, but will make this information accessible in more sophisticated ways. For example, in Perseus 1.0, a user had to type a word and then select the type of information that she wanted about that word--she might type "Athens" and then select "Atlas" to see Athens plotted on a map. In Perseus 2.0, all she has to do is type the word, then press Return, and she will see a list of all valid categories of information that are relevant to that word. For "Athens", this might include not only the Atlas, Site Plans and Encyclopedia, where Athens is a main entry, but also Coins and Vases, because there are coins and vases from Athens. In this way, a user can pinpoint information that she knows is in Perseus, and at the same time discover relevant information whether she knew it was there or not. This is the type of tool that helps add to a user's knowledge, either by presenting new information, or by creating a new connection between known facts.
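One way to picture the Perseus 2.0 lookup is as a single index from a word to every category of information that mentions it. The Python sketch below is a minimal illustration under that assumption. The "Athens" categories follow the example in the text; the "Herakles" entries, the data layout and the function names are invented for the example and are not the project's actual implementation.

```python
from collections import defaultdict

# (category, term) pairs. The Athens rows mirror the example in the text;
# the Herakles rows are hypothetical additions for illustration.
ENTRIES = [
    ("Atlas", "Athens"),
    ("Site Plans", "Athens"),
    ("Encyclopedia", "Athens"),
    ("Coins", "Athens"),
    ("Vases", "Athens"),
    ("Encyclopedia", "Herakles"),
    ("Vases", "Herakles"),
]


def build_index(entries):
    """Map each lower-cased term to the categories that contain it."""
    index = defaultdict(list)
    for category, term in entries:
        key = term.lower()
        if category not in index[key]:
            index[key].append(category)
    return index


def lookup(index, word):
    """Return every category relevant to the word, or an empty list."""
    return list(index.get(word.lower(), []))


INDEX = build_index(ENTRIES)

if __name__ == "__main__":
    # The user types one word and sees all the places it leads.
    print(lookup(INDEX, "Athens"))
```

In the Perseus 1.0 model the user supplies both the word and the category; here the category list is the result of the query, which is how a user can come across coins or vases she did not know were there.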

Although we have discovered that most users of Perseus rarely use the most sophisticated features of the system, we think that as they become used to information in electronic form and the specialized ways of navigating through it, they will use more of them, and demand more of them. Currently, many users start out trying to transfer the methods they use to work with traditional forms of information to Perseus. They then see that it is more productive to use the methods that are suited to the medium, and they start to adapt the electronic tools built into the system to perform their work. Ultimately they will be able to do work in systems like Perseus that was not possible in traditional forms, and expect new tools to further that new work. An analogy may be found in electronic library card catalogs: users began by using them as a substitute for the physical cards, and are now becoming more sophisticated, and performing searches that are more complex and more demanding of the systems.[6]

Rights and Publication

Copyright issues have at times been more complex and intractable than the technical difficulties that had to be solved in order to build Perseus. A great deal of the primary data in Perseus belongs to someone else, either publishers or museums, and we had to get permission in order to use it in a published work. Publishers and museums are used to giving rights to publish small amounts of their data in traditional paper publications, based on percentages and number of editions. But since Perseus is an electronic publication, it does not have easily identifiable editions, and contains far more data than a conventional book or compilation, of vastly different types. This complicates the process of calculating a percentage. Publishers also prefer to be paid royalties, which, in the case of a work the size of Perseus, would require significant bookkeeping.

In order to arrive at a percentage when the new work is a multimedia database, a raw byte-count, on the analogy of a page count, is of little use. In Perseus, image files take up four times as much room as the rest of the system. In addition, there are the programs, indices and other tools that make up Perseus and that are sometimes difficult to separate from the data. Calculated as an absolute percentage based on byte count, all texts in Perseus 1.0 make up about 3% of the total data on the disk. However, the texts actually represent a significant component of the Perseus database to the user who is working in it. Furthermore, unlike works in a paper compilation, they do not function without the rest of the system.[7] They are only small because it is possible to store them more efficiently. Image compression provides a more extreme example of the same phenomenon. The same image, displayed at substantially the same resolution, can take up different amounts of space depending on whether it is compressed, and by how much.
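The arithmetic behind this point can be made concrete. The Python sketch below starts from the two figures given above (images occupy four times the space of everything else; texts are about 3% of the disk) and then applies a purely hypothetical 2:1 image compression to show how the texts' share moves although no text has changed. The compression ratio and the unit sizes are assumptions chosen for illustration, not measurements of the Perseus CD.

```python
def share(part, whole):
    """Percentage of the whole taken up by one part."""
    return 100.0 * part / whole


# Treat the disk as 100 arbitrary units. Images are four times the size
# of everything else, so they take 80 units and the rest takes 20.
TOTAL = 100.0
IMAGES = TOTAL * 4 / 5
TEXTS = 3.0                      # "about 3%" of the disk, per the text
OTHER = TOTAL - IMAGES - TEXTS   # programs, indices, other data

image_share = share(IMAGES, TOTAL)
text_share = share(TEXTS, TOTAL)

# Hypothetical: compress the images 2:1 and leave everything else alone.
COMPRESSION_RATIO = 2.0          # an assumption, not a Perseus figure
compressed_total = IMAGES / COMPRESSION_RATIO + TEXTS + OTHER
text_share_after = share(TEXTS, compressed_total)

if __name__ == "__main__":
    print(f"images: {image_share:.1f}% of the disk")
    print(f"texts before compression: {text_share:.1f}%")
    print(f"texts after compression:  {text_share_after:.1f}%")
```

Under these assumptions the texts go from 3% to 5% of the disk without a word being added, which is why a percentage based on byte count says little about how much of the work a given text represents.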

The well-defined concept of an edition is also disturbed by electronic publication. A paper publication is a unit, which is produced and distributed in a fixed, unchangeable form. Electronic materials may be updated gradually and continually, in an accretive process. They may be distributed with network licenses that allow multiple users at once. They may also have to be re-issued in order to remain compatible with software or hardware developments, without actually changing any of their content, but with changes to their interface. It is this last point, of longevity, that is the most problematic. A paper book remains usable and readable for many years before it physically disintegrates. We can still read books that were printed hundreds of years ago. Electronic publications may become unusable in less than 10 years. If Perseus is to be a scholarly publication on which students and scholars base their research, it has to outlast any particular hardware and software. This also means that the permissions to use and reuse the material in it have to accommodate such software and media shifts.

The problems of how to calculate a fair royalty based on permission and number of editions have not been solved in a satisfactory manner. We tend to negotiate rights on an individual basis, and they are often either based on traditional models, or depend on special arrangements because Perseus is a pioneering project and is unlikely to make large profits. Some of the circumstances of how Perseus is being published and distributed make these negotiations easier. The fact that it is being distributed on CD-ROM makes the analogy to printed books easier. A CD is a discrete, unmodifiable unit which is sold to a single user. Since Perseus can only be used through HyperCard, and it is quite difficult to remove parts of the content to use independently of the rest of the system, the integrity of the system is ensured. It is helpful that Perseus is a research project based at an educational institution, funded by a not-for-profit funding agency, and published by a university press.

Just as a book may be photocopied, it is also possible, given sufficient time and expense, to appropriate almost any form of electronic information without paying for it. The way to keep this from happening is to make the cost of a publication reflect the value of the information in the system and to make the data in the system as a whole more valuable than its dismembered parts. A Perseus 1.0 CD costs $125,[8] which is within reach of most individual users. At the moment, it is not feasible to copy Perseus wholesale because copying the CD would cost more than buying Perseus and a CD player. Furthermore, individual pieces of information from Perseus can only be of limited use, since the formats in which they are distributed on the CD-ROM are tailored to the current Perseus application. We feel that it is preferable to discourage copying by making it more profitable to use the system as it is distributed, rather than to impose restrictions by means of software and data encryption.

The current state of software and media works together with the legal and economic paradigms in place to make it much easier to produce and distribute Perseus right now. At the same time, many of the decisions we have made run counter to the basic enabling philosophy of Perseus. The concept of Perseus as an integral, frozen and bounded work is supported by its delivery on CD-ROM with a HyperCard interface. It is analogous to the bound paper book and doesn't require conceptual shifts on the part of the distributors, lawyers or consumers. It is also helpful for us, when we seek permission to include data in Perseus, to be able to explain that we can guarantee the integrity of the materials we are using. As users become more sophisticated, however, they will not be content with data that is distributed in restricted forms as it is in Perseus. They will want to apply other tools and navigational mechanisms on top of the data, or simply remove the data and use it with their own analytical and display tools. They will want the Perseus materials in their generic archival formats, which are richer and offer more possibilities.

The trend towards centralized data storage is also likely to work against the present comfortable arrangement of copyright and distribution. Many universities want to be able to store and administer data centrally, and make it available over a network. This eliminates many problems of updating and support. However, it radically changes the control a distributor has over who has access to information. This kind of access is in keeping with the concept of Perseus as an enhanced electronic library, but runs counter to the expectations of copyright holders.

Conclusion

Since the beginning, Perseus has been modeled on the library. Our goal was to provide large amounts of information and tools, integrated in such a way as to allow many different types of user to do their work in different subject areas. Although Perseus has been undergoing evaluation throughout its development, it was only after the first version was finished and ready for distribution that we were really able to gauge how it was being used, and where our assumptions about its use conflicted with what was actually happening. The feedback that we are now getting from our evaluators and users is extremely important, because Perseus is a new and unfamiliar type of information system, even to its creators.

In building Perseus, we are not trying to emulate current paper technology in a more advanced form, but to enable means of working that are qualitatively different. We want to bring our users into electronic space, and help them move about within it, and shape it to their needs. Hypertext is supposed to promote freedom of movement, and we have incorporated hypertext features into Perseus for just that reason. At the same time, we are bound by numerous technological and logistical problems, not the least of which are the issues of ownership, copyright and distribution, which tend to the restrictive rather than the enabling.



Bibliography

Crane, Gregory and the other creators of Perseus, "What Is Perseus? What Is It Not? Comments on the BMCR Review of Perseus 1.0." Unpublished.

Crane, Gregory, "Composing Culture: The Authority of an Electronic Text." Current Anthropology 32, June 1991, pp. 293-311.

Mylonas, Elli and Sebastian Heath, "Hypertext from the Data Point of View." In A. Rizk, N. Streitz and J. André (eds.), Hypertexts: Concepts, Systems, and Applications. Cambridge University Press, 1990.

Mylonas, Elli, "An Interface to Classical Greek Civilization." JASIS 43:2, March 1992.

Samuelson, Pamela, "Some New Kinds of Authorship Made Possible by Computers and Some Intellectual Property Questions They Raise." University of Pittsburgh Law Review 53:3, Spring 1992, pp. 685-704.

Wiltshire, Sian, Lee T. Pearcy, Richard Hamilton, Harrison Eiteljorg, II, James O'Donnell, "Review of Perseus 1.0." Bryn Mawr Classical Review 3.5, 1992, pp. 347-357.



Figure 1: The catalogue card for the Aegina pediment, with long description and reconstruction drawing.

Figure 2: Shows an investigation of the word geras that was encountered in the text of Apollodorus. After looking it up in the dictionary, the user has decided to see what other related words there are, and which of them Apollodorus uses. Note that the English Greek Word Search is not really a full reverse lexicon, but rather an index into the definitions from the Greek Lexicon. In order to use it effectively, a user has to be aware of how it works.



[1]The Perseus Project has been underway since 1987, and is funded primarily by The Annenberg/CPB Project and also by Apple Computer. It is a collaborative effort based at Harvard University, with contributors from Pomona College, St. Olaf College, Bowdoin College and the University of Maryland.

[2]Mylonas, 1992.

[3]Perseus evaluation is led by Gary Marchionini at the School for Library and Information Science at the University of Maryland. We have also received useful feedback from the Perseus LISTSERV list, PERSEUS@BROWNVM.BITNET.

[4] When we began to build Perseus, and up until early in 1992, HyperCard was distributed free with all new Macintosh computers. The full version was later turned into a commercial product by Claris, and then in early 1993, reclaimed by Apple. At the moment, it is unclear what its fate will be.

[5]The different forms of navigation, and the decisions leading up to them, are discussed in greater detail in the JASIS article (Mylonas, 1992).

[6] The same analogy may be used to illustrate the counter-productive assumptions that users unfamiliar with these systems make, before they properly grasp the scope of the electronic information and the power of the new tools they are using.

[7]Pamela Samuelson has studied and written extensively about copyright issues and hypertext. Her general approach is to examine current law, and discover where the analogies between it and new forms of work and publication cease to be applicable. A recent article on this subject is Samuelson, 1992.

[8] $150 with documentation, and another $200 for a videodisk which contains the same images that are on Perseus, without the rest of the system. Network licenses are available for up to 25 users.


