SGML: ETEXTCTR Review #2

This online copy of ETEXTCTR Review #2 contains references to works of interest to librarians, some of which reference SGML. Most of the relevant articles are cited in my full SGML bibliography, but additional abstracts are available here, usually linked from my main entries. -rcc


Date: 	 Fri, 19 May 1995 16:19:58 EDT
Reply-To: etextctr@lists.Princeton.EDU
Sender: owner-etextctr@lists.Princeton.EDU
From: ETEXTCTR Discussion List <etextctr@phoenix.Princeton.EDU>
To: Electronic Text Centers List <etextctr@lists.Princeton.EDU>
Subject: ETEXTCTR Review #2

Sender: Mary Mallery, Moderator, ETEXTCTR Discussion List
        <mallery@gandalf.rutgers.edu>
Subject: ETEXTCTR Review #2

ETEXTCTR Review #2, May, 1995

ETEXTCTR Review provides abstracts of current articles from journals of
interest to those working with electronic texts in a research setting. 
Volunteer contributors for this issue are:  Jerry Caswell (JVC), Iowa State
University Libraries; Aurora Ioanid (AI), The Center for Electronic Texts
in the Humanities; and Mary Mallery (MM), CETH.

*************************************************************************

Bibliographic Entries



* Burrows, Toby. (1994). "Integrating electronic services into the academic library: the Scholars' Centre at the University of Western Australia." _Australian Academic and Research Libraries_ 25: 213-220.
The article examines the complexities of setting up a center for scholarly electronic resources and integrating it with the rest of the library information services at the University of Western Australia Library. The author emphasizes the three "major imperatives" that govern the establishment of this center, known as the Scholars' Centre: "proliferation of resources in electronic forms," the "need to focus on library services," and the necessity to contribute to the collective effort to "maximise the quality of its [the university's] teaching and research." The author addresses all the important aspects of the process: the physical facilities, the need for specialized staff, collection development, the type of access to various databases and electronic texts, and a thorough analysis of the users' needs to calculate the level of expertise needed for operating the Centre. --AI



* Day, Mark Tyler. (1994). "Humanizing Information Technology: Cultural Evolution and the Institutionalization of Electronic Text Processing." In Sutton, Brett, ed. _Literary Texts in an Electronic Age: Scholarly Implications and Library Services_ Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, pp. 67-92.
Day discusses issues of cultural evolution in today's information society and the efforts made by the modern university library to adapt to them. Indiana University's Library Electronic Text Resource Service (LETRS) is an example of the adaptive process. It is a cooperative effort of the library and computing center to provide faculty and students with access to scholarly electronic texts in the humanities and related computing software tools. Despite organizational and economic constraints, this "humanist's laboratory" represents a new collaborative system of the cultural preservation of materials that embody traditional humanistic values. --JVC



* Guenther, Rebecca. (1994). "The Challenges of Electronic Texts in the Library: Bibliographic Control and Access." In Sutton, Brett, ed. _Literary Texts in an Electronic Age: Scholarly Implications and Library Services_ Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, pp. 149-172.
This article addresses special problems relating to bibliographic control and description of electronic texts. The main issues discussed are identification of electronic texts, description, location, and access. The author describes at length the earlier OCLC Internet Resources Project Cataloging Experiment and its study of how online information resources could be accommodated in USMARC formats. Librarians are interested in placing data about electronic resources in the same type of database they use for other library materials, that is, USMARC. Consequently, the author mentions the different proposals that emerged from this experiment, as well as from other projects initiated by organizations like the USMARC Advisory Group of the American Library Association. Among these proposals a very important one was the addition of the 856 field, which, in the case of Internet resources, would supply the connection between the bibliographic record and the text itself. Because electronic texts are complex objects, AACR2 rules for computer files are scrutinized in an attempt to identify better ways of describing the various forms in which an electronic text can appear. In the end, the author discusses the misconception that SGML would replace USMARC and defines their different functionalities. --AI
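
To make the role of the 856 field concrete, here is a minimal sketch in Python. The list-of-tuples record layout and the function name `electronic_locations` are assumptions made for illustration only; they are not a real MARC serialization and not anything taken from Guenther's article. The sketch relies only on the fact that subfield $u of field 856 carries the resource's address. The sample values are the title and ftp address given for Olson's manual elsewhere in this issue.

```python
# Illustrative record layout only (an assumption): each field is a
# (tag, subfields) pair. Real MARC records also carry indicators, a
# leader, and a directory, all omitted here.
record = [
    ("245", {"a": "Cataloging Internet resources :",
             "b": "a manual and practical guide"}),
    ("856", {"u": "ftp://ftp.rsch.oclc.org/pub/internet_cataloging_project/Manual.txt"}),
]


def electronic_locations(fields):
    """Collect the $u (address) value from every 856 field in the record."""
    return [
        subfields["u"]
        for tag, subfields in fields
        if tag == "856" and "u" in subfields
    ]


for location in electronic_locations(record):
    print(location)
```

A catalogue interface could follow such an address to move from the bibliographic record to the text itself, which is the connection the abstract describes.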



* Harrison, A.D., Roos, F.A. & Thomas, R.E. (February, 1995). "(Semi)automatic capturing of bibliographic information from journal contents pages for inclusion in online library catalogues: the RIDDLE Project." _Electronic Library_, vol. 13, no. 1: 15-19.
A summary of the RIDDLE (Rapid Information Display and Dissemination in a Library Environment) Project (available on the Web at <http://www.cwi.nl:80/cwi/projects/riddle.html> and at <http://web.inf.rl.ac.uk/proj/riddle.html>), an international endeavor funded by the Commission of the European Communities' (CEC) Telematics Research and Technological Development Programme. The project involved "a feasibility study of the use of scanning technology to capture the contents pages of scientific journals, extract the bibliographical information of the article and load this data into an online library catalogue (OLC)." The article includes tables of the SGML tags chosen for this work, a sample of the results of marking up a particular journal, and a consideration of how easily SGML translates into the different catalogue interface packages used in European countries. Also included are formulas for computing the cost effectiveness of such a project. --MM



* Hockey, Susan. (1994). "Electronic Texts in the Humanities: A Coming of Age." In Sutton, Brett, ed. _Literary Texts in an Electronic Age: Scholarly Implications and Library Services_ Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, pp. 21-34.
This article provides a brief historical account of the progress of electronic texts in humanities research as well as a concise overview of applications in literary computing, "including concordances, text retrieval, stylistic studies, scholarly editing, and metrical analyses." The author reviews developments in electronic texts today and the steps forward in text preparation that the Text Encoding Initiative's _Guidelines_ make possible. Finally, the author looks to the future of texts marked up in TEI-conformant SGML and the development of better analysis tools that take advantage of the expertise of natural language understanding systems, as well as digital imaging technology.



* Johnson, Eric. (1994-1995). "Oxford Electronic Text Library Edition of the Complete Works of Jane Austen," _Computers and the Humanities_, vol. 28, pp. 317-321.
This review of the OETL electronic edition of the Complete Works of Jane Austen provides an introduction to a new kind of resource for libraries: the full-text CD-ROM of an author's oeuvre. To demythologize this new beast, Johnson shows its face, including examples of a page of SGML-tagged text and the same page formatted to hide the tags. In addition, the author shows how to use such a text, though Johnson notes that the analyses of the texts produced for his review were generated by programs that he wrote himself; however, "since the texts are encoded with SGML, they should be able to be used with software designed to process SGML -- such as _Intellitag_ (from WordPerfect) or _DynaText_ (from Electronic Book Technologies)." The article also includes sample output from a simple query looking at the various characters' speech patterns. --MM



* Kiernan, Kevin. (February, 1995). "The Electronic Beowulf." _Computers in Libraries_ vol. 15, no. 2: 14-15.
The manuscript of the Old English epic _Beowulf_ has long been the center of dispute among textual scholars who would like to fill the lacunae left by the flames of the damaging fire the manuscript survived in 1731. Now, through digitization and the coordination of a team of experts from libraries, computer science, math and English departments in Europe as well as the United States, some answers are being found. Kiernan gives a quick history and overview of the Beowulf Project at the University of Kentucky and the British Library (now centered in the Richard Rawlinson Center for Anglo-Saxon Studies and Research). One can view a Mosaic presentation of the project at URL: <http://www.uky.edu/~kiernan/welcome.html>. --MM



* Lane, Anthony. (February 20 & 27, 1995). "Byte Verse: How to wing your way through thirteen hundred years of English Poetry in an afternoon of interfacing." _The New Yorker_, pp. 102-117.
Despite its title, this article provides more than an afternoon lark through Chadwyck-Healey's _English Poetry Database_ at the New York Public Library. Lane is a keen observer. He speculates "on the sort of person who would really _need_ 'English Poetry,'" and he's thinking past the joy of follow-that-theme to its implications: "Once you have a printout of your sleep-meets-death findings, the onus is then on you, as never before, to wonder what on earth they might mean; the computer hasn't a clue." Lane's analysis of the database includes a short history from idea to actual transcription (in the Philippines). Also, he notes what poetry is available on the disk, as well as what's not (this is his list; others might add to it): no Shakespeare plays, no hymns published after 1800, no American poetry, and no verse from this century. Finally, you might read this article to experience the view from inside the head of a novice user of tools for electronic text access. --MM



* Lowry, Anita. (1994). "Electronic Texts and Multimedia in Academic Libraries: A View from the Front Line." In Sutton, Brett, ed. _Literary Texts in an Electronic Age: Scholarly Implications and Library Services_ Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, pp. 57-66.
Acting on the premise that both graduate and undergraduate students could benefit from exposure to electronic texts and hypermedia databases that link primary source materials, the University of Iowa Library created the Information Arcade, which consists of an electronic classroom for instructor-directed learning and a laboratory. The laboratory includes an information stations cluster for viewing preexisting information resources and a multimedia cluster for the creation and manipulation of electronic material. Experience with various classes suggests that all participants, including undergraduates, are strongly motivated to do research and to create materials. Problems include the multiple and proprietary platforms of some databases and the demands on staffing that the creation of source materials requires. --JVC



* Mathiesen, Thomas J. (1994). "Transmitting Text and Graphics in Online Databases: The _Thesaurus Musicarum Latinarum_ Model." _Computing in Musicology_, 9: 33-48.
This article provides a full description of the _Thesaurus Musicarum Latinarum_ (TML), "an evolving database that will eventually contain the entire corpus of Latin music theory written during the Middle Ages and the early Renaissance." It is a unique model for electronic text transmission because the database includes musical notation as well as ASCII text, so that the choice of graphics formats and a transmission protocol with the least amount of corruption was paramount. Mathiesen documents the decisions for data capture and verification with OCR software as well as the delivery and structure of the database through a gopher server <gopher://iubvm.ucs.indiana.edu/11/tml>, a listserv, TML-L (subscribe through listserv@iubvm.ucs.indiana.edu), and an ftp site, TML-FTP (available at ftp 129.79.1.10, password is "themulat"). The Appendices contain the "Principles of Orthography" for the database as well as the "Table of Codes for Noteshapes and Rests." --MM



* McMahon, Kenneth. (March, 1995). "BUBL BITS: Investigating the Computers in Teaching Initiative (CTI) WWW Services." _Computers in Libraries_ vol. 15, no. 3: 53-54.
There are twenty subject-oriented CTI centers in the UK, each of which supports the use of computers in teaching at the higher education level. The electronic resources of each center are available on WWW servers and accessible through a common interface (the BUBL WWW Subject Tree at URL: BUBL), which is located at the University of Bath. Resources include reports, bibliographies, full text journals, and courseware. --JVC



* Olson, Nancy B. Cataloging Internet Resources : a Manual and Practical Guide. OCLC Online Computer Library Center, Inc., c1995. Available via anonymous ftp at URL: ftp://ftp.rsch.oclc.org/pub/internet_cataloging_project/Manual.txt
Nancy Olson's manual for cataloging Internet resources is the result of a collective effort to identify how well AACR2 and USMARC can describe the specific characteristics of Internet resources. Originally, it was initiated within a "... nationwide, coordinated effort among libraries and institutions of higher education to create, implement, test, and evaluate a searchable database of USMARC format bibliographic records, complete with electronic location and access information, for Internet-accessible materials." These guidelines have been developed in support of the OCLC project participants undertaking the difficult task of cataloging a new type of bibliographic resource located on the Internet in the form of electronic texts. --AI



* Price-Wilkin, John. (1994). "The Feasibility of Wide-Area Textual Analysis Systems in Libraries: A Practical Analysis." In Sutton, Brett, ed. _Literary Texts in an Electronic Age: Scholarly Implications and Library Services_ Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, pp. 113-136.
After recounting early efforts at Chicago, Dartmouth, Michigan and Virginia, Price-Wilkin identifies and discusses the fundamental characteristics of wide-area textual analysis systems: very precise searching at high speeds, the ability to show keywords in context and to qualify searches by structural characteristics, the capability to generate statistics, and grammatical analysis. To be useful in a wide-area environment, texts must be reusable, observe standards for encoding (i.e., SGML, TEI), and be accessible from multiple user platforms. While an increasing number of texts are available from both academic and commercial sources, some are compromised by the choice of edition used, limited markup, poor transcription, or the lack of flexibility in licensing. Open Text Systems' PAT is evaluated as a server platform that meets many of the needs of a wide-area textual analysis system. Examples of its use are given at the University of Michigan and the University of Virginia. An appendix discusses document structure and the need for protocols and a standard query language that are aware of structure. --JVC
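
For readers unfamiliar with the "keywords in context" display mentioned above, the following is a minimal sketch in Python. It is not how PAT works internally; the function name `kwic`, the `width` parameter, and the sample sentence are all invented for illustration. It simply finds each whole-word occurrence of a keyword and shows a fixed number of characters on either side.

```python
import re


def kwic(text, keyword, width=30):
    """Return one display line per whole-word occurrence of `keyword`,
    with up to `width` characters of context on each side."""
    flat = " ".join(text.split())  # collapse line breaks and runs of spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Hypothetical sample text, not drawn from any database discussed here.
sample = "The sleep of reason. After sleep comes waking; then sleep again."

for line in kwic(sample, "sleep", width=20):
    print(line)
```

A production system would search a prebuilt index rather than scan the text on every query, which is what makes high-speed searching over large collections practical.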



* Price-Wilkin, John. (1994). "Using the World Wide Web to Deliver Complex Electronic Documents: Implications for Libraries." _The Public-Access Computer Systems Review_ 5, no. 3: 5-21. (Or e-mail the command GET PRICEWIL PRV5N3 F=MAIL to LISTSERV@UHUPVM1.UH.EDU.)
At the University of Virginia the products of several scholarly projects in literature and history were converted into HTML so that they would be readily available over the World Wide Web. Unfortunately HTML's inability to reflect the structure of complex documents compromised the efforts. Price-Wilkin found a better solution by developing a common gateway interface (CGI) from the Web to an SGML-based server. This provided a simpler user interface for complex information retrieval, took advantage of a sophisticated retrieval engine (Open Text's PAT), and enabled the elements and relationships of complex SGML-encoded documents to be represented without fragmenting them and without abandoning the standards that were used in their creation. --JVC



* Schwartz, Lillian. "The Art Historian's Computer: Riddles Posed by Ancient Works Fall to Historical Analyses and Electronic Explorations." _Scientific American_ vol. 272, no. 4: 106-111.
By using computer graphics to scale and juxtapose images, Schwartz has been able to shed light on the sources of famous portraits such as the Mona Lisa and those of Shakespeare. She has also used computer graphics to show how certain paintings relate to the environment for which they were created. --JVC



* Seaman, David. (1995). "Campus Publishing in Standardized Electronic Formats -- HTML and TEI." In Okerson, Anne, ed. _Filling the Pipeline and Paying the Piper: Proceedings of the Fourth Symposium_. ARL Publications. Also available at URL <http://www.lib.virginia.edu/etext/articles/arl/dms-arl94.html>.
David Seaman, the Director of the University of Virginia Library's Electronic Text Center, describes his Center's project in converting documents marked in TEI-conformant SGML into documents with hypertext (HTML) markup for distribution over the World Wide Web via PAT. The author notes the difference between documents that are suitable for HTML markup (e.g., short guides and brochures) and documents that require more granular demarcation of their structures (e.g., finding aids, full texts, sets of journal titles, and encyclopedias). In addition, Seaman documents how the Virginia project team embedded the TEI header into their image files to maintain the record of the image's origins. The HTML version of this article contains many links to useful sites for creating HTML files, perl scripts for SGML-to-HTML conversion, and HTML documents and image files that pertain to the article. --MM
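
The kind of SGML-to-HTML conversion described above can be sketched in a few lines. The example below is in Python and is not one of the perl scripts linked from Seaman's article; the tag mapping `TAG_MAP`, the function name `tei_to_html`, and the sample fragment are assumptions chosen for illustration. It renames a handful of TEI-style elements to HTML ones and drops any tag that has no counterpart in the mapping.

```python
import re

# Hypothetical mapping from TEI-style element names to HTML element names.
TAG_MAP = {"head": "h2", "p": "p", "hi": "em"}

# Matches a start or end tag: optional slash, element name, any attributes.
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.]*)[^>]*>")


def tei_to_html(sgml):
    """Rename mapped tags (attributes are discarded) and remove unmapped
    tags while keeping the text they enclose."""
    def rewrite(match):
        slash, name = match.group(1), match.group(2).lower()
        html_name = TAG_MAP.get(name)
        if html_name is None:
            return ""  # no HTML counterpart: the structural tag is lost
        return f"<{slash}{html_name}>"
    return TAG_PATTERN.sub(rewrite, sgml)


# Hypothetical fragment using the TEI element names div1, head, p, and hi.
sample = ('<div1 type="chapter"><head>Chapter 1</head>'
          '<p>A <hi rend="italic">short</hi> paragraph.</p></div1>')

print(tei_to_html(sample))
# prints: <h2>Chapter 1</h2><p>A <em>short</em> paragraph.</p>
```

Note that the chapter division and the `rend` attribute vanish in the output. That loss of structure is the reason the abstract distinguishes short guides, which survive such flattening, from documents that need more granular markup.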



* Sperberg-McQueen, C. M. (1994). "The Text Encoding Initiative: Electronic Text Markup for Research." In Sutton, Brett, ed. _Literary Texts in an Electronic Age: Scholarly Implications and Library Services_ Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, pp. 35-56.
The work of the Text Encoding Initiative (TEI) grew out of the need to address the fundamental problems of representing and sharing electronic texts: how to represent document structure, how to link interpretive and auxiliary information, and how to overcome the lack of a standard and extensible system of markup. With support from professional associations and research centers, TEI developed a system of SGML markup that culminated in the third edition (P3) of 1994. P3 embodies a hierarchical document grammar, which focuses on document structure rather than layout, defines a concrete set of tags which may be mixed or extended as needed, and, in requiring conformance to international standards, is platform independent and non-proprietary in nature. A sample text demonstrates its application. --JVC

**************************************************************************
If you would like to contribute to ETEXTCTR Review or recommend an article
for review, write to Mary Mallery, Moderator of ETEXTCTR, at e-mail:
<mallery@gandalf.rutgers.edu>.