From: ETEXTCTR Discussion List
To: Electronic Text Centers List
Date: Tue, 17 Oct 1995 13:49:27 EDT
Reply-To: etextctr@lists.Princeton.EDU
Sender: Mary Mallery
Subject: Report on the ACRL E-Text Center Discussion Group, June '95

REPORT ON THE ACRL E-TEXT CENTER DISCUSSION GROUP MEETING
at the American Library Association Meeting in Chicago, IL, June 24, 2-4 PM
by Mary Mallery

"Putting E-Texts on the Net: Three Perspectives"

Marianne Gaunt, Associate University Librarian at Rutgers University Libraries, chaired this meeting of the ACRL E-Text Center Discussion Group at the ALA Conference in Chicago on June 24. Thirty conference attendees met in the Venetian Room of the Drake Hotel to discuss "Putting E-Texts on the Net: Three Perspectives."
The speakers were:

* David Seaman, Coordinator of the University of Virginia's Electronic Text Center, who spoke about his experience using Open Text's PAT software for accessing and manipulating electronic texts for scholarly use at the University of Virginia;
* Mark Day, Co-Director of LETRS (Library Electronic Text Resources Services) at Indiana University, and Perry Willett, Coordinator of Collection Development for LETRS and Librarian for English and American Literature, Indiana University Libraries, who described issues that must be addressed in order to establish a collection of quality texts for scholars' use with TEI-conformant SGML markup that can also be mounted on the Web in html; and
* Gregory Murphy, Text Systems Manager at CETH (Center for Electronic Texts in the Humanities), who gave a live demonstration of SoftQuad's Panorama, software which can take SGML tags and map them to a format (such as html for Web access). Panorama works with Spyglass Enhanced Mosaic, and it currently comes bundled with the most recent version of NCSA Mosaic (available at the NCSA ftp site).

David Seaman began with a brief history of the UVA Electronic Text Center, which was started in 1992. Its original aim was to give users at home network access to electronic texts through a single piece of software. From the start, the staff decided not to treat the products as isolated CD-ROMs and not to maintain a lot of different machines and interfaces. Seaman noted that treating the texts as data made the acquisition of the texts more palatable to the collection development bibliographers at the University of Virginia. However, building an online archive with a common interface also brings with it the responsibility of building a user community.
The E-Text Center staff found that they had to spend a good deal of time marketing the center, which meant hunting down users in their faculty offices and in the classrooms and showing them the advantages of using electronic texts to teach and to analyze the works.

Seaman outlined the disadvantages of putting texts online:

* a lot of work is needed to process the texts into SGML and html for better usability, because these texts are not provided in a package, like CD-ROMs;
* training sessions are necessary, which involves staff time for planning AND implementation;
* in-house documentation must be written and maintained at all stages;
* there is an initial investment in the central machine as well as the hiring of a full-time (100%) coordinator.

However, the advantages of putting texts online are that users are trained in a system, not an individual product, so that they can use any text once they have learned the single interface. There is also a cost saving to the library: it only has to buy one machine for multiple access through the Web, as opposed to the banks of CD-ROM drive terminals that many libraries have been forced to purchase. Over several years it is a cheaper service, and a better one, because the texts are accessible from patrons' homes or anywhere else, 24 hours a day.

Seaman used Netscape to demonstrate the PAT interface at the University of Virginia (http://www.lib.virginia.edu/etext/ETC.html). He noted that the more texts you have, the more important the SGML markup (and its consistency) becomes. With more texts, users will want to search across works for, e.g., first chapters of novels by different authors. The texts in the University of Virginia's Modern English Collection also have a CGI query capability to search for word collocations.
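
As an illustration only (this is not the Virginia CGI code, and the function name, sample sentence, and window size are all assumptions made for the sketch), a collocation query of the kind mentioned above can be approximated in Python by looking for one word within a fixed number of words of another:

```python
import re

def find_collocations(text, word_a, word_b, window=5):
    """Return (token_index, context) pairs where word_b occurs within
    `window` words of word_a. Matching is case-insensitive.
    Hypothetical helper, written for illustration."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lowered = [t.lower() for t in tokens]
    a, b = word_a.lower(), word_b.lower()
    hits = []
    for i, tok in enumerate(lowered):
        if tok != a:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        # Words on either side of the key word, excluding the key word itself.
        neighbours = lowered[start:i] + lowered[i + 1:end]
        if b in neighbours:
            hits.append((i, " ".join(tokens[start:end])))
    return hits

sample = ("It is a truth universally acknowledged, that a single man in "
          "possession of a good fortune, must be in want of a wife.")
hits = find_collocations(sample, "want", "wife", window=3)
print(hits)
```

A production service such as PAT works from a prebuilt index rather than scanning the text on every query; the linear scan here is only meant to show what a collocation search returns.
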
In addition, when preparing literary texts, it is important to keep them in an appropriate form (e.g., verse form for poetry) and to use TEI-conformant SGML markup. At Virginia, they wrote a search and replace filter that performs conversion from SGML to html on the fly with little clean-up afterwards. This filter will be available soon for public use at the Virginia ftp site.

The central issue that came up after putting these texts up for use on the Web was: What sorts of conditions of use should be placed on this material? For example, America Online, a commercial Internet provider, put the University of Virginia on its Web page. Seaman recommends a paragraph outlining conditions of use, which could state:

* No commercial use without prior permission;
* Don't set up texts on another site unless there is strict cooperation; and
* If other projects want to use them, use TEI-conformant markup.

One other issue that Seaman noted is that images are uncontrolled information on the network. The University of Virginia team buried a TEI header as text inside the single image file so that, with viewing software, the user can learn the source of the image.

In the future, Seaman said, he envisions more full-text databases and increasing use of Web documents in classes as a teaching tool. Finally, Seaman stressed that with the Web the library now has the opportunity to establish itself within our own community as the focal point for the creation and use of online information. "It is a chance in a new environment to re-articulate in a visible and exciting medium quite what it is that a library does, and why it is the functional heart of a college."

Perry Willett, the Coordinator of Collection Development for LETRS, Indiana University, outlined his center's collection development policy for electronic texts. They have no central fund for acquiring electronic texts, but use a general fund.
There were three levels of priority for LETRS support:

1) acquiring and maintaining SGML-encoded texts to make them available over the network (this entailed acquainting the subject bibliographers at the library with SGML and the TEI);
2) acquiring non-SGML texts and dealing with the inherent problems mentioned by David Seaman earlier; and
3) acquiring humanities computing software, e.g., TACT and Pro-Cite, to search and manipulate the texts.

Willett recommended that any library establishing an e-text center should have a collection policy in place. Many other parts of the library will see your center as the answer to their electronic needs (e.g., reserve, reference, etc.); however, the goal should be establishing access to electronic texts in the humanities.

The quality of electronic texts is hard to assess as the formats for publishing them multiply; you can get e-texts over the Web, through ftp, on CD-ROM, etc. One must evaluate each collection of electronic texts as it becomes available, because quality is no longer a question of a publisher's imprimatur. Many publishers are not familiar with TEI-conformant SGML, so we cannot depend on them to supply well-marked-up texts.

The reason to put your collection of e-texts up on the Web is increased access. People want access to texts at their desks, where they are doing their work. The staff at LETRS found that the logistics of putting the texts up involved:

* having the staff to get the e-texts running on the network, and
* determining what kind of search client was needed (user surveys had to be performed): do people/faculty have equipment that can run a graphical interface, or is a vt100 search and display client more suitable?
LETRS uses Open Text PAT software for network display of texts, but Willett noted that they have run into problems with, for example, inconsistencies in the markup of Oxford University Press's Jane Austen texts, where paragraphs of text were skipped because two different entity references were used to denote the same thing. Willett also noted that librarians have taken the lead in learning SGML and in being the resident experts in their community.

Mark Day, the Co-Director of LETRS at Indiana University, gave a presentation of the LETRS Web page (http://www.indiana.edu:80/~letrs/letrs.html). LETRS began in the Reference Department of the IU Libraries as a small project modeled after the first Electronic Text Center at Columbia University, set up by Anita Lowry (now at the University of Iowa). It was designed, however, to be more fully integrated into library operations, with selection decisions and funding for acquisitions coming from the existing core of subject and area specialists. LETRS then expanded into a unique collaborative program, with an expanded public facility located in the main library but jointly supported and administered by the IU Libraries and the University Computing Services, with Day as Co-Director from the Libraries and Dick Ellis as Co-Director from UCS.

Choosing to put the texts up on the Web made the job both harder and easier, and it affected the choice of texts. Making texts available to everyone at home meant dumbing down applications to the VT100 emulation level. The cataloging and classification of electronic texts is easy for the librarians, but decisions on displaying the texts are editorial and bring up the question: are we publishers, librarians, or computing people? However, without SGML markup you can't do a lot of text manipulation beyond display and string searching. Day noted that with the current TEI-Lite markup scheme, students can now learn TEI-conformant markup and help with processing the texts for display.
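
To make concrete the point that markup allows more than string searching, here is a minimal Python sketch. Everything in it is an assumption for illustration: the sample text is invented, the `div1` elements with `type` and `n` attributes merely follow TEI-style naming, and the regular expressions stand in for a real SGML parser (they only cope with non-nested divisions that have explicit end tags).

```python
import re

# Invented two-chapter sample with TEI-style division tags.
SAMPLE = """<text><body>
<div1 type="chapter" n="1"><p>The morning was cold and the house was quiet.</p></div1>
<div1 type="chapter" n="2"><p>By evening the house was full of guests.</p></div1>
</body></text>"""

DIV_RE = re.compile(r'<div1\s+type="chapter"\s+n="(\d+)">(.*?)</div1>', re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")

def search_in_chapter(sgml, chapter, word):
    """Return True if `word` occurs in the text of the numbered chapter.
    Hypothetical helper: restricts the search using the markup."""
    for number, body in DIV_RE.findall(sgml):
        if int(number) == chapter:
            plain = TAG_RE.sub(" ", body)  # drop tags, keep the words
            pattern = r"\b" + re.escape(word) + r"\b"
            return re.search(pattern, plain, re.IGNORECASE) is not None
    return False

# A plain string search finds "guests" somewhere in the file...
print("guests" in SAMPLE)
# ...while the structure-aware search can say it is not in chapter 1.
print(search_in_chapter(SAMPLE, 1, "guests"))
print(search_in_chapter(SAMPLE, 2, "guests"))
```

The same idea underlies searches such as "first chapters of novels by different authors": the query is limited to a marked-up region, which is only possible when the markup is present and consistent.
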
Day closed with a look at future integration issues to consider when establishing a collection of electronic texts. He noted that when planning an electronic resource center, one should look ten years ahead whenever possible. In future libraries, it is probable that special projects, such as LETRS, will be integrated with related electronic text collections and services to form a single virtual university library.

Gregory Murphy, Text Systems Manager for CETH (Center for Electronic Texts in the Humanities), gave a demonstration of SoftQuad's Panorama, software for displaying TEI-conformant SGML tagged text in html on the Web. Murphy began by noting that html gives one few choices: the fonts are limited and documents with Greek text can't be accommodated. There are too many different kinds of humanities texts for easy homogenization into html.

Murphy used his own SGML-marked version of Chaucer's translation of the "Romance of the Rose" to show Panorama's split screen display and navigator functions, moving from text in Fragment A to Fragment B and then comparing it to Fragment C. Then he used the Navigator to establish hot links to the text. Murphy demonstrated an added feature of Panorama's scroll bar: if one searches for a word (e.g., the Middle English "wyf"), the scroll bar gives a graphical representation of the density of search hits in the document, drawing a line for each hit and thicker, darker lines for multiple hits. The user can also jump to the place in the text where a hit occurs by clicking on the scroll bar. In searching the document, Panorama is not fast, because it does not index the text as PAT does but instead uses the grep utility to walk through the text.

Murphy closed by showing Panorama on a text that is part of the Freud Project at CETH: multiple texts of Freud's "Hierarchies, Boundaries and Representation in a Freudian Model of Mental Organization."
When one searches for a term such as "transference," it links to Freud's early work on aphasia (using the TEI extended pointer syntax inspired by HyTime standards). The HyTime location ladder allows the user to link to the exact place in the second text where transference is discussed. Murphy also showed how images can be linked using Panorama, and he noted that the user can search on tags as well as words in Panorama.

There was a question and answer period after the presentations.

The next meeting of the ACRL E-Text Center Discussion Group will be held at the Midwinter ALA in San Antonio, TX on Saturday, January 20 from 2PM-4PM. The topic will be announced.

********************************************************************************
Mary Mallery, Ph.D.
Reference and Electronic Services Librarian
Guggenheim Memorial Library
Monmouth University
West Long Branch, NJ 07764
Telephone: (908) 571-4404
Fax: (908) 571-3456
E-Mail: mmallery@hawkmail.monmouth.edu
http://www.monmouth.edu/irs/library/library.html
********************************************************************************