[This local archive copy mirrored from the canonical location: http://www.dlib.org/dlib/september97/theses/09fox.html; refer to this authoritative version.]
Edward A. Fox, John L. Eaton, Gail McMillan
Neill A. Kipp, Paul Mather, Tim McGonigle, William Schweiker, and Brian DeVane
Blacksburg, Virginia 24061-0106
Project Director: Edward A. Fox
D-Lib Magazine, September 1997
On the first anniversary of funding by the U.S. Department of Education (FIPSE) for a National Digital Library of Theses and Dissertations, we review its origins (see [FOX96a] for an overview of the project), describe progress-to-date that warrants its now being called the Networked Digital Library of Theses and Dissertations (NDLTD), explain some of the controversy that has led to widespread publicity and dissemination, and explore future growth possibilities.
The first workshop about electronic theses and dissertations (ETDs) took place in 1987 with a technical focus on standards, namely applying SGML to the description of research. Ten years later, we realize that the proper aim should be improving graduate education by having students enter ETDs into a digital library which facilitates much broader access. Achieving that goal calls for a sustainable, worldwide, collaborative, educational initiative of universities committed to encouraging students to prepare electronic documents and to use digital libraries - NDLTD. Since students often learn best by doing, this competency-oriented program should ensure that the next generation of scholars is prepared more completely for the Information Age, in which they can apply and pass on their skills in academia or other research situations.
With funding in 1996 from the Southeastern Universities Research Association (SURA), our Virginia Tech team was able to build upon local efforts, including a solid foundation of library-developed processes, to facilitate a beta program in the Southeast. Additional support from FIPSE, and in-kind contributions from a number of sources, especially Adobe, IBM, and Microsoft, have enabled expansion to the national and international levels. Public forums afforded by the Coalition for Networked Information (CNI), the Council of Graduate Schools (CGS), and many other groups, have made the idea of an ETD initiative a familiar topic to hundreds of leaders at diverse universities. Much larger numbers have heard about the topic through newspaper, radio, and TV coverage [NDLTDa]. Yet, because news coverage often focuses on controversy, the discussion below attempts to concentrate on progress made and to dispel some misconceptions that may have arisen.
As NDLTD has expanded, we have seen progress in many places. For example, prompted in part by the NDLTD, UMI, which has the world's largest microform archive of theses and dissertations, has launched its ProQuest Direct service, which scans (at 300 dpi) the works it receives after 1996 and applies optical character recognition software to convert the scans into PDF, so that text is recognized as accurately as current tools allow. Many groups, including CGS, have established committees to explore the concept of ETDs. Although these deserve concentrated attention, we will focus mainly on collaborative efforts and work that is specifically oriented toward building the NDLTD.
Local progress toward NDLTD has been made possible as a result of efforts by the Library, Graduate School, funded project team, and other parts of Virginia Tech. Several student project teams in courses in computer science (CS4624, CS5604) have made important contributions, assisting in the preparation of multimedia training materials and prototype digital library implementations. Students studying digital libraries (in University Honors 3004 and CS6604) in Fall 1997 have already started to select term projects to assist our efforts. Professor Jong-Min Bae has come from Korea to spend a sabbatical year starting August 1997, providing further aid at Virginia Tech. In addition, there are students, faculty, and staff at other universities and organizations providing assistance by testing, adapting, and extending Virginia Tech's programs and processes.
At the University of Waterloo, a team has been studying ETDs and has prepared a survey of worldwide activities [WATE97]. North Carolina State University recently established an ETD Web site, and the University of Virginia makes available on WWW a student-run ETD resource directory, plus pointers to publications showing student interest in the initiative [KIRS97]. From these sites one can learn about investigations and pilot efforts in Australia and at Aalborg University, The University of Texas at Austin, and University of South Florida. We invite others involved in related efforts to provide pointers so that we may cite their work.
Though it may take 12-18 months for a university to investigate the idea of ETDs, develop suitable policies, reach consensus, launch a pilot effort, begin to train students, and enhance local infrastructure to facilitate network submission by students, some institutions have moved more rapidly. Among those institutions joining NDLTD most quickly are those in Darmstadt, Germany and Monterey, California. At the Darmstadt University of Technology, it is likely that interest was stimulated because of local expertise in multimedia information and systems. ETDs allow students to apply those technologies directly and go beyond the limits of paper theses or dissertations by including audio, image and video illustrations and by adding hypertext links. In the case of the Naval Postgraduate School (NPS), building upon prior digital library activities [NORR97], a team of Navy reserve officers studied the matter and reviewed documentation provided by Virginia Tech. Very shortly after a telephone conference that obtained additional information, university officials decided to join NDLTD. NPS is obligated to provide access to its theses and dissertations to the Navy worldwide; this is much less expensive if electronic distribution methods can be employed. This shows a clear economic benefit.
During its first year, NDLTD has grown to 20 members, with scores of other institutions interested and, in a number of cases, visited or briefed on the initiative. An online status file is maintained to document the current situation [NDLTDb]. At present, Florida is the state with the largest number of members in NDLTD. A team at University of South Florida is helping prepare an edited sourcebook on electronic theses and dissertations, having produced a call for contributions, with publication planned in 1998.
The University of Virginia has taken the initiative on adapting the Dienst system (developed at Cornell, and used in the Networked Computer Science Technical Report Library, <http://www.ncstrl.org>) to use for ETDs [MOOR97]. Interoperability tests with Virginia Tech are planned for Fall 1997. Access using Dienst will mean that end-users will have a single view of the distributed set of ETDs. They can use the WWW to browse among the dispersed collections at NDLTD sites, by author, topical area (i.e., department), or year. Alternatively, they can search the full-text of metadata (including abstract) for the full collection or parts thereof, i.e., issue one query to search all sites in parallel. Furthermore, the NDLTD as a whole could have archival and search engines flexibly structured and located to suit economic, political, and social preferences. Universities could keep their own archive or have it managed by an archival service, and search engines could be at each university or run by state, regional, national, or other services. For performance reasons, backup and regional replication systems can be included in the overall architecture. Further work with Dienst should afford other user services, especially if Dienst is used to handle a large portion of computer science preprints, and is extended to manage user profiles and selective dissemination of information.
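The idea of issuing one query to search all sites in parallel, and of browsing the dispersed collections by a metadata field, can be illustrated with a minimal sketch. It is not the Dienst protocol: the site names, the record fields, and the functions `search_site`, `federated_search`, and `browse_by` are all hypothetical, and each remote site is replaced by an in-memory list.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for remote NDLTD sites. A real site would be
# reached over the network; this is an illustration, not the Dienst protocol.
SITES = {
    "site-a": [
        {"author": "Lee", "department": "Computer Science", "year": 1997,
         "abstract": "Federated search across digital library collections"},
        {"author": "Ortiz", "department": "History", "year": 1996,
         "abstract": "Archival practice in university libraries"},
    ],
    "site-b": [
        {"author": "Chen", "department": "Computer Science", "year": 1997,
         "abstract": "Multimedia in electronic documents"},
        {"author": "Adams", "department": "Physics", "year": 1997,
         "abstract": "Search for rare particle decays"},
    ],
}


def search_site(site_name, query):
    """Return records at one site whose abstract contains the query term."""
    term = query.lower()
    return [
        dict(record, site=site_name)
        for record in SITES[site_name]
        if term in record["abstract"].lower()
    ]


def federated_search(query, site_names):
    """Send one query to every site in parallel and merge the answers."""
    with ThreadPoolExecutor(max_workers=len(site_names)) as pool:
        per_site = pool.map(lambda name: search_site(name, query), site_names)
        merged = [record for hits in per_site for record in hits]
    # Sort so the merged list does not depend on which site answered first.
    return sorted(merged, key=lambda r: (r["author"], r["site"]))


def browse_by(field, site_names):
    """Group authors from all sites by a metadata field such as 'year'."""
    groups = {}
    for name in site_names:
        for record in SITES[name]:
            groups.setdefault(record[field], []).append(record["author"])
    return {key: sorted(authors) for key, authors in groups.items()}
```

Calling `federated_search("search", ["site-a", "site-b"])` returns matching records from both sites as one list, which is the single view described above. Where the search engines actually run (at each university or at a regional service) would not change this calling pattern.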
Steering Committee. Guidance for NDLTD is provided by an international steering committee. The committee meets in the middle of March and September each year in Washington, D.C., and has email discussion during the intervening months. Members represent Canada, UK, World Bank (African Virtual University), universities and libraries in the Southeast (SURA, SOLINET), Western Area Graduate Schools, Adobe, CNI, CGS, IBM, CIC (Big 10), Job Accommodations Network, NSF, OCLC, U.S. Department of Education, and other constituencies.
After hearing reports from UMI and OCLC about archival and access services, members at the March 1997 meeting decided to encourage maximizing access, allowing as many "players" as become interested to provide various services for those interested in ETDs. This will be feasible if all member institutions freely share among themselves and their agents MARC (library catalog) records describing their ETDs, and if each record contains one or more URNs pointing to authentic full copies, such as might reside in a university archive. Thus, the NDLTD support team at Virginia Tech is working to arrange interoperability tests, building upon existing library and digital library infrastructure.
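The sharing arrangement described above, in which catalog records carry URNs that point to authentic full copies, can be sketched as a small data structure. This is a simplified illustration and not the MARC format: the class `CatalogRecord`, its fields, and the helper functions are hypothetical, and any URN values used with it (for instance in the reserved `urn:example:` namespace) are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogRecord:
    """Simplified, hypothetical stand-in for a MARC record describing one ETD."""
    title: str
    author: str
    institution: str
    urns: list = field(default_factory=list)  # pointers to authentic full copies


def build_union_catalog(shared_batches):
    """Merge the record batches that member institutions share with each other."""
    catalog = []
    for batch in shared_batches:
        # Only records carrying at least one URN can lead a reader to the work.
        catalog.extend(record for record in batch if record.urns)
    return catalog


def find_copies(catalog, author):
    """Return every URN recorded for works by the given author."""
    return [urn
            for record in catalog if record.author == author
            for urn in record.urns]
```

Because each record may list more than one URN, any service that holds the shared records can direct a reader to a university archive or to another authorized copy, which is what lets many "players" offer services over the same collection.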
To support digital library activities at Virginia Tech, IBM has donated a variety of hardware. One server, acquired to run IBM Digital Library software, and to serve multimedia files associated with ETDs, has four terabytes of hierarchical storage, roughly 4,000 gigabytes, which is enough for about 4 million average-sized ETDs. Virginia Tech will be hosting a user group meeting (October 20-22, 1997) for groups employing IBM Digital Library systems; the focus will be on human-computer interaction issues. It is hoped that the IBM system will be extended to support gateway and federated search capabilities that will allow interoperability tests among NDLTD institutions.
One of the IBM systems runs OCLC's SiteSearch, thanks to a license donation by OCLC. OCLC is providing over a million MARC records that refer to theses and dissertations from its WorldCat database. This will provide information previously not readily available, since few master's theses are included in the UMI database. SiteSearch supports Z39.50, which enables access through a variety of clients [LYNC97]. It also can be adapted to provide similar functionality to Dienst, so that "federated search" is afforded, with client or gateway merging of results from remote sites [PAYE97].
In addition to hardware and software to support NDLTD, Virginia Tech also has a rich network infrastructure, including a vBNS (high speed Internet research and education backbone) connection through "Network Virginia," the statewide ATM network that it runs and which includes educational institutions all over the Commonwealth. Local access, so that students can submit their works electronically, is provided by the campus network as well as through the town (Blacksburg Electronic Village, <http://www.bev.net/>). While such an infrastructure is not necessary to participate in the NDLTD, it does improve online processing time and it enhances user access.
Students at Virginia Tech use a variety of tools developed to help them prepare ETDs, thanks in part to SURA funding aimed at supporting growth of NDLTD in the Southeast. We have adopted a scenario-based design approach, and in addition to assembling commonly available inexpensive software packages, have been constructing other files and tools to support low-cost document manipulation as well as efficient workflow processes [KIPP97]. These include:
Collaboration with staff at the University of Virginia who are involved in the Text Encoding Initiative (TEI) includes demonstrating interoperability between documents marked up with ETD-ML and those marked up according to the TEI Guidelines. Collaboration with a student at Rhodes University in South Africa deals with testing many of the tools discussed above, and complements efforts underway with various institutions in the Southeastern United States.
Institutions involved in NDLTD are all working toward having students prepare ETDs so they can learn from that experience and at the same time help build a large and smoothly functioning digital library. The contents of the Virginia Tech WWW site are distributed on CD-ROM to institutions that join the NDLTD. This is intended to help other institutions provide information and access to their community. Therefore, Virginia Tech's material has been reorganized into three parts, to discriminate clearly among the following:
Virginia Tech students have submitted over 500 works that are included in the Library's online catalog. A variety of additional services are provided on an interim basis until the collection gets larger and until distributed digital library software is tested in conjunction with other NDLTD members. Thus, browsing is supported, with a separate list for recent works, as many local students are eager to have their work immediately accessible, and many outsiders look for the latest findings. The OpenText system indexes each text or PDF part of each ETD, and handles full-text searching by all interested parties. Other software will index image files to support searching on image content.
A number of other NDLTD members already have online documents, including: Naval Postgraduate School, NC State University, and University of Virginia. In addition, searches over the Internet and discussions with personnel from a variety of universities have turned up small collections of works, made available by individual departments or centers. We have contacted each new interested party to see if they would join NDLTD.
Since the inception of FIPSE support for NDLTD, we have addressed several controversies, working with students, faculty, and publishers. While we anticipated concerns of publishers, and released in 1996 a Statement about Publications specifically targeted to assuage concerns of that audience [FOX96b], it appears that few have read that document or been assuaged by it. A variety of efforts are now underway to prepare paper booklets and eventually books to explore matters more fully, document the various perspectives, and explain many of the legal and technical complexities. We hope that such efforts will broaden the discussion and understanding, facilitate cooperative agreements between all parties involved, and further our aim of having students and universities understand more about preparing electronic documents and using digital libraries.
Meanwhile, there has been extensive news coverage related to NDLTD, e.g., an NPR Morning Edition story, an article in the NY Times that was later picked up by a number of regional newspapers, and an interview on a Singapore TV morning show [NDLTDa]. Much of that coverage concerns Virginia Tech's making ETDs freely available in connection with the NDLTD, and statements by publishers that they would not accept submissions that appear on WWW.
We believe that worries of publishers in this regard can be resolved by some variant of the Approval Form [NDLTDc], which is explained in an open letter to students [FOX97a]. In particular, this form requires students and their faculty committee members to sign an agreement in which they:
As has been discussed at events such as the 1996 Allerton conference (<http://edfu.lis.uiuc.edu/allerton/96/>) about social and user aspects of digital libraries, the success of a digital library project depends strongly upon how it relates to the activities of individuals, groups, organizations, and institutions, as well as the broader social context. Additional research on these matters should be given high priority [BORG96].
While this philosophy has been adopted since the early 1990s in connection with developing a program for ETDs, the ramifications and practical impact of various concerns of students, faculty, and publishers have not yet been summarized for the digital library community. Fundamentally, those concerns fall into three main categories, covered in the next three subsections, namely those relating to: time, effort, impact, reward, and quality; loyalties; and economics.
Theses and dissertations are written as part of the requirements for graduate studies. While there have always been particular quality constraints enforced on those works by faculty and officials handling graduate affairs, and while few have been willing to complain about those rules, changing those rules in a significant manner has caused a number to complain vociferously.
In 1996, even with an economic incentive (waiving $20 archiving fee), only a fraction of those turning in theses or dissertations elected to do so electronically. Since project objectives are for students to learn, Virginia Tech officials agreed during the spring of 1996 to make submission a requirement, starting in 1997, in effect forcing students to learn what was deemed beneficial for them. Though diverse publicity and training efforts took place on campus to alert students to the policy and to help them prepare their ETDs, when deadlines for spring graduation came near in April 1997, we received many vocal complaints.
There appear to be several explanations for such concerns, e.g.:
Underlying these concerns are key issues regarding preparing theses and dissertations. First, writing a thesis or dissertation takes time and effort, usually more than was expected. Hence, anything that might increase the time required is very likely to be resisted. Second, many students are unsure about the impact of their works, or about what rewards they can expect from their effort, due to the complex system of credit given to people engaged in publishing. Finally, students are uncertain about the many tradeoffs and interconnections between aspects of electronic publishing, as shown in the following figure.
Quality results from time and effort that usually is prompted by hoped-for reward, such as impact on one's scholarly community. That impact depends on a work being accessible, which is much more likely with ETDs than previously. Similarly, impact may increase if a student can more directly and simply express, using multimedia technology for example, the key ideas and message of their research. Creative expression thus may be facilitated through an electronic document. However, that may make it more difficult to archive the document, and extensive use of diverse multimedia representations may also reduce accessibility. Balancing these six aspects calls for more thought than most students, faculty, and librarians may have given to electronic publishing, but is a key to building digital libraries and is an important goal of the NDLTD.
Having students prepare electronic documents, even though based on sound pedagogical and career growth principles, also brings up a key issue which is at the heart of distinguishing digital libraries from the WWW. In the culture of the Internet, many vehemently argue for free information, regardless of the quality that results from such a policy. For example, in the arena of computer science technical reports, on the basis of experience with the WATERS and NCSTRL initiatives, few authors or departments are concerned about the correctness of bibliographic data that facilitates access, or the reliability of servers supporting searching. Some argue that fully automated systems, that gather data for searching without requiring work by authors or departments [WITT96], are adequate. Given such attitudes in the WWW culture, it is not surprising that students are unclear regarding how much time and effort they should invest.
Another underlying issue relating to ETDs is the diversity of opinion among students and faculty regarding loyalties. Why should a student support NDLTD, which aims to promote knowledge sharing and scholarship, and is endorsed by one's university, when there are competing influences from one's advisors, research group, discipline, and associations? Why should a student give copyright to a publisher and not retain rights to their intellectual property, e.g., that allow inclusion in their own thesis or dissertation as well as distribution of those important documents to interested scholars?
In some disciplines, students are further from center stage in research groups than in others, and efforts to give their work more attention as opposed to that of their advisors may meet with some resistance. That attitude may be reflected in the amount of time spent by advisors in reading, editing, and helping refine a thesis. It also may be reflected in differences between disciplines regarding whether a thesis should be made largely of chapters very similar to published works, or whether dissertations should be more book-like, telling an in-depth story of the research undertaken. In the humanities and social sciences, dissertations often are more like a book. In science and engineering there are closer ties to conference proceedings and journal articles.
The complex mix of loyalties relating to publishing of student works seems to be at the heart of concerns raised by faculty regarding NDLTD. While a reasonable solution to these concerns appears to be allowing students and their committee to discuss and agree upon access to each ETD, in the long term it is likely that the answer will depend upon:
NDLTD relates to many issues with an economic basis. For example, during the first six months of the initiative, considerable attention was given to relationships with commercial efforts such as that of UMI (recall Section 2.1 above). Only at the March 1997 NDLTD Steering Committee meeting was it decided that such matters were beyond the purview of the initiative, and that the focus should be on education and on maximizing access.
Another basically economic issue is the relationship of ETDs to other forms of publication. If one assumes a zero sum game (which in the context of access to information through the Internet is probably not appropriate), giving more prominence to theses and dissertations might be viewed as threatening to other publication enterprises. On the other hand, theses and dissertations have been produced for over a hundred years, and have supplemented other types of publications without conflict, through a variety of changes in technology. The number who will read hundreds of pages about a topic as opposed to a short summary article is likely to be quite small. It seems unlikely that NDLTD will have a negative financial impact on publishers.
The approval form allows students and faculty to establish restrictions on access imposed by publishers, and those restrictions can be implemented using digital library technology [GLAD97]. It would be beneficial to those in the scholarly community interested in ETDs to reduce such restrictions, however.
Compromises have been agreed upon, so that financial risk to publishers is minimized. In cases where an ETD or part thereof relates closely to an article, delaying worldwide access to the ETD for three months or even a year after the journal article is published is adequate protection. Similarly, in the case of a published book that is closely related to a dissertation, blocking outside access to the ETD from the time the book appears until two years later is more than adequate protection for publishers, but denies traditional access through interlibrary lending. In short, concerns of publishers, and related concerns of students and faculty regarding economic issues associated with access to ETDs on the Internet, can be resolved. While current solutions maintain an uneasy peace, they must be further negotiated so that economic concerns are addressed in coordination with access concerns for students, educators, and researchers. Ultimately, digital libraries may need to evolve past their binary basis, so that access need not be either completely free to the world or severely restricted, charges need not be either zero or a very large sum, and access to student research need not be either solely through a publisher or solely through a university.
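A delayed-access compromise of this kind can be expressed as a simple rule, sketched below under stated assumptions. The functions `add_months`, `release_date`, and `who_may_read`, and the three access labels, are hypothetical; the sketch assumes that campus readers keep access during the delay and that only outside access is blocked, and it is not the access control mechanism of any particular digital library system.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` later, clamping to the end of short months."""
    index = start.year * 12 + (start.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def release_date(published_on, delay_months):
    """Date on which worldwide access opens after a related publication."""
    return add_months(published_on, delay_months)


def who_may_read(today, opens_on, on_campus):
    """Return 'worldwide', 'campus-only', or 'denied' for one request.

    `opens_on` is None when the student and committee set no restriction.
    """
    if opens_on is None or today >= opens_on:
        return "worldwide"
    # Assumption: during the delay only outside access is blocked.
    return "campus-only" if on_campus else "denied"
```

For example, `release_date(date(1997, 9, 15), 12)` gives the opening date for a one-year delay after an article appears, and `who_may_read` then answers each request against that date. Adding further return values, such as access for a fee, is one way such a rule could move beyond a purely binary choice.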
The future of the NDLTD is continued growth, as concerns are addressed, and benefits increase. We consider three key aspects.
Fundamentally, NDLTD is an effort to improve education while building a digital library and expanding current library services and resources. While many education efforts have focused on how students can learn through accessing a digital library, NDLTD does not solely concentrate on that important issue. It also deals with how students learn by preparing an electronic document and submitting it to a digital library. Further, and key to solving various concerns raised, NDLTD strives to ensure that students are prepared to work with the world of publication.
As the collection of works related to NDLTD increases, log analysis and surveys will be used to determine how ETDs are used in graduate education. During the first year of widespread access to the Virginia Tech collection, the number of downloads per work appeared to be almost two orders of magnitude more than the number of circulations of the library copy. Additional questions to be analyzed include how institution, topic, length, and use of multimedia relate to learning, and which measures prove informative: numbers of accesses, professions of those downloading copies, or types of use of ETDs.
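The kind of log analysis mentioned above can be sketched in a few lines. The log format assumed here (one download per line, written as date, requesting host, and work identifier separated by spaces) is hypothetical, as are the function names; real server logs would need their own parsing.

```python
import math
from collections import Counter


def downloads_per_work(log_lines):
    """Count downloads for each work in a hypothetical 'date host work_id' log."""
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed format
        _day, _host, work_id = parts
        counts[work_id] += 1
    return counts


def mean_downloads(counts):
    """Average number of downloads per work, or 0.0 for an empty collection."""
    return sum(counts.values()) / len(counts) if counts else 0.0


def orders_of_magnitude(downloads, circulations):
    """Base-10 logarithm of the ratio of downloads to circulations."""
    if downloads <= 0 or circulations <= 0:
        raise ValueError("both counts must be positive")
    return math.log10(downloads / circulations)
```

A result near 2 from `orders_of_magnitude` would correspond to downloads exceeding circulations by about two orders of magnitude. Grouping the same counts by institution or topic would require those fields to be joined in from the catalog records.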
True success of the NDLTD depends upon growth of a collection to the scale of hundreds of thousands of works, its ease of access, and the amount of use it gets. Widespread involvement of universities and their students will be determining factors.
In an era where there are increasing political and social pressures on universities to increase efficiency, be more open about their research findings, and share more with similar institutions, such collaboration seems appropriate. As universities see more need to archive their electronic works, and realize the economies of scale that result from cooperative ventures in the electronic publishing and archiving arenas, the type of initiative exemplified by NDLTD may become more commonplace. As has occurred in the context of state and regional library consortia (e.g., VIVA, OhioLink, CICNET), having a large market bloc to deal with publishers [NORR97] seems likely to motivate agreements, such as those over access policies. In the context of NDLTD, efforts in this direction are likely to expand, based on good experience in the Southeast, especially as interoperability tests proceed.
Access to university information has evolved through various stages, leading to sophisticated programs for interlibrary loan and universally accessible library catalogs. As more use of the WWW takes place in colleges and universities, and as technology advances to better support URNs and electronic archives, it will be easier to move into the realm of fully functional digital libraries. Challenges still remain regarding federated search and multilingual access [BORG97]. Efforts like NDLTD are likely to evolve along with the technology, as universities aim to improve education and learn the benefits of collaborative initiatives.
[BORG96] Borgman, C.L.; Bates, M.J.; Cloonan, M.V.; Efthimiadis, E.N.; Gilliland-Swetland, A.; Kafai, Y.; Leazer, G.L.; Maddox, A. (1996). "Social Aspects Of Digital Libraries." Final Report to the National Science Foundation; Computer, Information Science, and Engineering Directorate; Division of Information, Robotics, and Intelligent Systems; Information Technology and Organizations Program. Award number 95-28808. <http://www.gslis.ucla.edu/DL/>
[BORG97] Christine L. Borgman (1997). "Multi-Media, Multi-Cultural, and Multi-Lingual Digital Libraries: Or How Do We Exchange Data In 400 Languages?" D-Lib Magazine, June 1997. <http://www.dlib.org/dlib/june97/06borgman.html>
[FOX96a] Edward A. Fox, John L. Eaton, Gail McMillan, Neill A. Kipp, Laura Weiss, Emilio Arce, and Scott Guyer (1996). "National Digital Library of Theses and Dissertations: A Scalable and Sustainable Approach to Unlock University Resources." D-Lib Magazine, September 1996. <http://www.dlib.org/dlib/september96/theses/09fox.html>
[FOX96b] Edward A. Fox (1996). "Statement About Publications." <http://www.ndltd.org/info/pubs.htm>
[FOX97a] Edward A. Fox (1997). "Letter to Virginia Tech Students Preparing an ETD." <http://etd.vt.edu/submit/letter.htm>
[GLAD97] Henry M. Gladney (1997). "Safeguarding Digital Library Contents and Users: Document Access Control." D-Lib Magazine, June 1997. <http://www.dlib.org/dlib/june97/ibm/06gladney.html>
[KIPP97] Neill A. Kipp (1997). "A scenario from the Networked Digital Library of Theses and Dissertations: The life of an ETD from creation to dissemination." <http://www.ndltd.org/howto/etdlife.htm>
[KIRS97] Matthew G. Kirschenbaum (1997). "Electronic theses and dissertations in the humanities: A directory of on-line references and resources." <http://etext.lib.virginia.edu/ETD/ETD.html>
[LYNC97] Clifford A. Lynch (1997). "The Z39.50 Information Retrieval Standard: Part I: A Strategic View of Its Past, Present and Future." D-Lib Magazine, April 1997. <http://www.dlib.org/dlib/april97/04lynch.html>
[MOOR97] Mariahna Moore (1997). "UVA SEAS Electronic Undergraduate Thesis Pilot." <http://univac.cs.virginia.edu:3066/SEAS_ETD.html>
[NDLTDa] NDLTD Team (1997). "NDLTD in the News." <http://www.ndltd.org/news/>
[NDLTDb] NDLTD Team (1997). "NDLTD Status of Universities." <http://www.ndltd.org/join/status.htm>
[NDLTDc] NDLTD Team (1997). "Virginia Tech Graduate School Electronic Submission Approval Form." <http://etd.vt.edu/submit/approval.htm>
[NDLTDd] NDLTD Team (1997). "NDLTD Related Projects." <http://www.ndltd.org/projects/index.htm>
[NORR97] Bob Norris and Denise Duncan (1997). "Sink or Swim? The U.S. Navy Virtual Library (NVL)." D-Lib Magazine, March 1997. <http://www.dlib.org/dlib/march97/navy/03norris.html>
[PAYE97] Sandra D. Payette and Oya Y. Rieger (1997). "Z39.50: The User's Perspective." D-Lib Magazine, April 1997. <http://www.dlib.org/dlib/april97/cornell/04payette.html>
[WATE97] University of Waterloo Electronic Thesis Project Team (1997). "Terms of Reference and Team Members." <http://www.lib.uwaterloo.ca/~uw-etpt/>
[WITT96] Ian H. Witten, Sally Jo Cunningham, and Mark D. Apperley (1996). "The New Zealand Digital Library Project." D-Lib Magazine, November 1996. <http://www.dlib.org/dlib/november96/newzealand/11witten.html>
The U.S. Department of Education's Fund for the Improvement of Post Secondary Education supports NDLTD. Additional in-kind support has been provided by: Adobe, Arbortext, Council of Graduate Schools, Coalition for Networked Information, IBM, OCLC, SOLINET, and SURA.