[Mirrored from: http://www.bampfa.berkeley.edu/geninfo/behindscenes.html]
The search engine we use for this search is the same one we use for many other projects (listed below). This search is just a basic search of the web pages on our site, much like you find on many websites today. The difference is that this search engine does a lot more than just one job, as you'll see. The search engine is called Isite. It's free! It can be complex to customize, but for what it does at that price, it's a bargain. Isite can handle many types of information, from simple text files, to marked-up HTML files, to complex SGML files or MARC library records. So it can act as a full-text search engine or, in the case of MARC records and the like, as a database.
It's very useful to be able to use one tool for the delivery of many types of information; it eases support considerably. Isite is exclusively a delivery tool; it is not a database in the sense that one does data entry into it directly. You still have to use your collections management system, library cataloging system, or SGML authoring system to create documents, but once made they can be put on your server along with Isite for delivery via the Internet. This is also handy for an institution with a limited budget, in that the tools you use for the original authoring need not be too complex or costly themselves: you do data entry in your existing tools, then export the information to the server and let Isite do the public delivery. It does a good job of this with its combination of full-text and fielded searching, complex search methods (Boolean), and relevance ranking. That is, it can find records *close* to your search as well as those that match your search *exactly*. The former method is used by most web full-text search engines, the latter by databases; both have advantages. Isite supports standards by supporting standard record formats such as SGML and MARC, and it supports the Z39.50 standard for network queries. Standards are important in that they make your data very flexible in its use, both for sharing your information with the larger information universe (look at the web, for instance, which is mostly based on standards!) and for greater longevity of your data.
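To make the difference between the two search styles concrete, here is a minimal Python sketch. It is not how Isite works internally; the records, field names, and the simple term-counting score are all assumptions made for illustration.

```python
# Hypothetical records; the field names and values are illustrative only.
RECORDS = [
    {"title": "Untitled landscape", "artist": "Artist A",
     "text": "oil painting of a coastal landscape"},
    {"title": "Portrait study", "artist": "Artist A",
     "text": "charcoal portrait study on paper"},
    {"title": "Coastal view", "artist": "Artist B",
     "text": "watercolor landscape with coastal cliffs"},
]


def fielded_exact(records, field, value):
    """Database-style search: keep only records whose field equals the value."""
    return [r for r in records if r[field].lower() == value.lower()]


def ranked_fulltext(records, query):
    """Full-text-style search: rank records by how many query terms they contain."""
    terms = query.lower().split()
    scored = []
    for r in records:
        words = set(r["text"].lower().split())
        score = sum(1 for t in terms if t in words)
        if score > 0:
            scored.append((score, r["title"]))
    # Highest score first; ties are broken by title so the order is stable.
    return [title for score, title in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

The exact search returns only records that match the field completely, while the ranked search also returns partial matches, ordered from closest to least close.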
These guides present our collections information in a context that gives a greater understanding of the collections than an object-level database alone would provide. We wanted two things when looking for a way to present primary collections information: use of standards, so that this information would have lasting value and we would get the most out of the work put into it; and the greatest possible flexibility and ability to share this information, since it is the core upon which we will build. The method we found that fit these needs was the EAD. SGML (Standard Generalized Markup Language) is an international standard for encoding full-text and richly structured information. The EAD (Encoded Archival Description) is an implementation of SGML intended to encode information that describes collections. So the EAD is built on a standard, and it contains a set of tags for describing collections not only at the object level but also at the level of entire groups of objects, and for including full text such as biographies and essays.
To author the document, we use Author/Editor, an SGML authoring program from SoftQuad. We use A/E mostly to mark up the full-text essays and to validate that the markup is correct. But encoding all the object records would be too time-consuming if we did them one by one, so for those we go to our collections management database and create a calculation field. In this field we tell the database to gather all the fields from a record and automatically insert the SGML/EAD tags where appropriate, so it might look like:
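A rough sketch of that kind of calculation, written here in Python rather than in the database's own formula language. The field names (`title`, `date`, `medium`) and the choice and nesting of EAD elements are assumptions made for illustration; the actual field mapping is not given here.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from database field names to EAD-style element names.
FIELD_TAGS = [
    ("title", "unittitle"),
    ("date", "unitdate"),
    ("medium", "physdesc"),
]


def record_to_ead(record):
    """Wrap each non-empty field of one object record in its tag."""
    parts = []
    for field, tag in FIELD_TAGS:
        value = record.get(field)
        if value:
            # escape() protects &, < and > so the output stays valid markup.
            parts.append(f"<{tag}>{escape(value)}</{tag}>")
    # The <c02><did> wrapper is an assumed nesting for a single object record.
    return "<c02><did>" + "".join(parts) + "</did></c02>"
```

Running this over every exported record yields tagged text that can be pasted into the collection guide, with empty fields simply left out.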
When you view our EAD-encoded collection guides on the website, you are actually viewing a single SGML document. The software we use to display it on the web is Isite (above). We've customized this software a little, so that it translates the appropriate SGML tags into HTML tags for viewing in normal web browsers. We've also set it up so that when you choose to view the "biography" or "collection records", Isite goes and gets that "chunk" of the collection guide and brings it to you instead of the whole thing. This makes it much easier and faster to view on the web. In addition, when you view the collection object records, we've set it up to show only the first 6, then provide a link to the following 6, and so on, thus overcoming the problem of trying to display 600 records when the user clicks on "browse the entire collection"! We also did not have to arbitrarily encode the records into segments to do this; the records are marked up normally, and the software just shows a few at a time.
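The "six at a time" behavior can be sketched in a few lines of Python. This is only an illustration of the idea, not Isite's own code; the function name and the use of a plain list of records are assumptions.

```python
def page_of_records(records, start=0, page_size=6):
    """Return one page of records plus the start index of the next page.

    The next index is None when there are no more records, so the caller
    knows not to show a "next" link.
    """
    page = records[start:start + page_size]
    next_start = start + page_size
    if next_start >= len(records):
        next_start = None
    return page, next_start
```

Because the paging is done at display time, the records themselves stay in one continuous, normally marked-up sequence.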
These film notes were originally stored as hundreds of Microsoft Word files. You can imagine how hard this was to manage or, perish the thought, to find an individual film note in! We thought the best way to provide searchable access to them would be to use... SGML again. This time there was no existing standard set of tags to use (as there was with the EAD above), so we created our own SGML DTD (Document Type Definition). SGML can be very complex and rich with features, but it can also be implemented quite simply, as in this case, to great benefit. This was just a very simple set of tags that we would use to describe different parts of the filmnote, such as director's name, date of film, country of film, film title, etc. The notes were somewhat homogeneous in their format, so we were able to write macros in Word that went through each file and inserted the markup automatically. We then checked them manually for correctness and saved them as text/SGML files.
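The idea behind those macros can be sketched in Python (the originals were Word macros, which are not shown here). The layout assumed below, with the title on the first line and a "director, country, year" credit line second, and the tag names themselves are hypothetical stand-ins for the real note format and DTD.

```python
import re
from xml.sax.saxutils import escape

# Assumed credit line layout: "Director Name, Country, 1960".
CREDIT_LINE = re.compile(
    r"^(?P<director>[^,]+),\s*(?P<country>[^,]+),\s*(?P<date>\d{4})$"
)


def tag_filmnote(raw):
    """Insert markup into one plain-text note, or return None if it does not fit."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if len(lines) < 3:
        return None
    match = CREDIT_LINE.match(lines[1])
    if match is None:
        return None  # irregular note: leave it for manual markup
    body = " ".join(lines[2:])
    return (
        "<filmnote>"
        f"<title>{escape(lines[0])}</title>"
        f"<director>{escape(match.group('director').strip())}</director>"
        f"<country>{escape(match.group('country').strip())}</country>"
        f"<date>{match.group('date')}</date>"
        f"<note>{escape(body)}</note>"
        "</filmnote>"
    )
```

Notes that do not match the expected pattern come back as `None`, which mirrors the manual check that follows the automatic pass.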
Then we put them on our web server and, again, used the search engine Isite to deliver them on the web. As with the collection guides above, when you view an individual filmnote you are actually viewing a small part of a larger SGML file, which usually includes about 80 filmnotes (two months' worth of screenings). The software goes and gets just that chunk of text, identified by its SGML markup, and brings it back. To see the whole file, view the entire calendar instead (i.e., apr-may.1997.filmnotes).
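Pulling one chunk out of a larger file by its markup can be illustrated with a short Python sketch. Isite does this with its own indexing; the regular-expression approach and the `<filmnote>` and `<title>` tag names below are assumptions used only to show the idea.

```python
import re


def find_filmnote(sgml_text, title):
    """Return the single <filmnote> element whose <title> matches, else None."""
    pattern = r"<filmnote>.*?</filmnote>"
    for match in re.finditer(pattern, sgml_text, flags=re.DOTALL):
        chunk = match.group(0)
        title_match = re.search(r"<title>(.*?)</title>", chunk, flags=re.DOTALL)
        if title_match and title_match.group(1).strip().lower() == title.lower():
            return chunk
    return None
```

The file is stored once as a whole calendar, and only the matching element is sent to the browser.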
CineFiles is an image database of documents. Each document is taken from its folder and scanned as an image file (rather than converted to text with OCR). Then a record is created in a Sybase relational database for that document, and the image is linked to that record. The record consists of cataloging information on the film or person discussed in the article, with links to authority files such as the Library of Congress Subject Headings for improved access. When you search for a subject or document, you are searching the records in the Sybase database, which then brings you the image of the document as well. Text was not used in this project because most of the documents were not published by the BAM/PFA, and so a copyright agreement had to be reached with the publishers.
This bookstore database does not use SGML or Isite. What a surprise :) This was the first database we connected to our website. It is a FileMaker Pro 3.0 database for Macintosh that we've connected using a free CGI program called ROFM. This data changes much more rapidly than collections data, and includes deletions as well as additions, so we needed a "live" connection from the working database to our website so that the store information is always up to date. This database runs off a Mac web server rather than our main Unix server (which runs Isite, etc.).
For the audio, video, and VR (really just 3-D) sculpture we used one tool: QuickTime. We chose QuickTime because it is cross-platform and can play on at least Mac and Windows machines. The plug-in is free, and the format is a common one on the web and in multimedia, which increases the chances that people will already be able to play QuickTime files. It's also multi-faceted; many media types can be saved and delivered as QuickTime: audio, video, and VR (known as QTVR). This makes things easier for us: we use Macs a lot in authoring for our website, and the tools for saving into QT are often the same. In addition, our viewers on the web need only one plug-in, QT, to view all the multimedia forms on our site, rather than having to download one plug-in for audio, another for VR, etc. Using the QuickStart feature of QuickTime, we are able to process these files further so that they are optimized for the web and will actually start playing before they are entirely downloaded, which gives us a pseudo-streaming multimedia environment even though we do not have any special (read: costly) server software.
QT, although common and useful, is proprietary, like many of the multimedia tools for the web today, so it would not be the format we'd want to store our "master archive" files in. But for delivering them on the web, it's a good solution for us for now.
This is our internal guide for scanning images of art in our collection, mainly for screen-based publishing (web, kiosk, CD-ROM), though the images could also be used for mid-range print publications. These guidelines were developed in-house for our needs, and should not be considered a raw template that will match the needs of all institutions. However, in trying to figure out how best to go about scanning our images, we consulted a variety of resources, including a number of other museums that had bravely mounted their own guidelines on the web (special thanks to the National Museum of American Art and the Library of Congress for helping us all out!), so we thought it only fair to return the favor.