Behind the Scenes

[Mirrored from: http://www.bampfa.berkeley.edu/geninfo/behindscenes.html]

Behind the Scenes at the BAM/PFA WebSite and New Media Projects

This page is an overview of some of the nuts and bolts behind our website: the tools and procedures we use to digitize and deliver our content. It is intended as a resource for other web developers, in museums or elsewhere. We do not suggest that this is the only way to go about things, but we always find it helpful to come across this kind of information online, so we thought we should return the favor. Our priorities in digitizing content are close to those of most museums: the solution has to be inexpensive; it has to be scalable, able to grow and absorb changes in technology; it has to be as simple as possible, because we lack the staff for extensive support; and it has to deliver content in flexible ways that are easy for the end user, to support maximum access. Projects are listed separately below. Contact the BAM/PFA Information Systems Manager, Richard Rinehart, with questions regarding these projects.

WebSite Search
This searches the full text of all of the pages on our website. To find resources of a particular type, use one of the resources below.

The search engine we use for this search is the same one we use for many other projects (listed below). This is just a basic search of the webpages on our site, much like those found on many websites today. The difference is that this search engine does a lot more than just one job, as you'll see. The search engine is called Isite. It's free (!). It can be complex to customize, but for what it does at that price, it's a bargain! Isite can handle many types of information, from simple text files, to marked-up HTML files, to complex SGML files or MARC library records. So it can act as a full-text search engine or, in the case of MARC records and the like, as something closer to a database.

It's very useful to be able to use one tool to deliver many types of information; it eases support considerably. Isite is exclusively a delivery tool. It is not a database in the sense that one does data entry into it directly: you still have to use your collections management system, library cataloging system, or SGML authoring system to create documents, but once made they can be put on your server along with Isite for delivery via the Internet. This is also handy for an institution with a limited budget, in that the tools you use for the original authoring need not be too complex or costly themselves; you do data entry in your existing tools, then export the information to the server and let Isite do the public delivery. It does a good job of this with its combination of full-text and fielded searching, complex search methods (boolean), and relevance ranking (that is, it can find records *close* to your search as well as those that match your search *exactly*; the former method is used by most web full-text search engines, the latter by databases, and both have advantages). Isite supports standards by handling standard record formats such as SGML and MARC, and it supports the Z39.50 standard for network query. Standards are important because they make your data very flexible in its use, let you share your information with the larger information universe (look at the web, for instance, which is mostly based on standards!), and give your data greater longevity.
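The difference between an exact fielded match and a relevance-ranked full-text search can be shown with a toy sketch. This is not Isite's code or its actual ranking method; the records, the field names, and the simple term-count score are all assumptions made up for illustration.

```python
# Toy illustration only: not Isite's API or its actual ranking method.
# The records and field names below are made up.
RECORDS = [
    {"title": "Untitled Landscape", "text": "oil painting of a coastal landscape"},
    {"title": "Portrait Study", "text": "charcoal portrait with landscape background"},
    {"title": "Landscape", "text": "small watercolor landscape, landscape format"},
]


def fielded_search(records, field, value):
    """Database-style search: keep only records whose field equals the value exactly."""
    return [r for r in records if r[field].lower() == value.lower()]


def ranked_search(records, query):
    """Full-text-style search: score each record by how often the query terms occur."""
    terms = query.lower().split()
    scored = []
    for r in records:
        words = (r["title"] + " " + r["text"]).lower().replace(",", " ").split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            scored.append((score, r["title"]))
    # Highest score first; ties keep their original order because sort is stable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]
```

Searching the title field for "landscape" returns only the one record whose title is exactly that, while the ranked search returns all three records, with the closest ones first.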

Online Multimedia Collection Guides
These in-depth guides comprise object records and essays by curators and scholars, including overviews of the collections and their organization, as well as artists' biographies and the historical context in which the works were created and collected.

These guides present our collections information in context, for a greater understanding of the collections than an object-level database alone would provide. We wanted two things when looking for a way to present primary collections information: use of standards, so that this information would have lasting value and make the most of the work put into it; and maximum flexibility and ability to share this information, since it is the core upon which we will build. The method we found that fit these needs was the EAD. SGML (Standard Generalized Markup Language) is an international standard for encoding full-text and richly structured information. The EAD (Encoded Archival Description) is an implementation of SGML intended to encode information that describes collections. So the EAD is built on a standard, and it contains a set of tags for describing collections at the object level, for describing entire groups of objects, and for including full text such as biographies and essays.

To author the document, we use Author/Editor, an SGML authoring program from SoftQuad. We use A/E mostly to mark up the full-text essays and to validate that the markup is correct. But encoding all the object records would be too time-consuming if we did them individually, so for those we go to our collections management database and create a calculation field. In this field we tell the database to gather all the fields from a record and automatically insert the SGML/EAD tags where appropriate, so that within the markup the title of the artwork in this record gets inserted in one place, the date of the artwork in another, and so on. Then we export this field (which now has all the information from all the fields, and all the EAD tags) as a large text file that includes the same markup for all records in the database. Then we stick this underneath the manual markup we did above, merging both into one large text/SGML file, and we're done! (for that one :)
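The calculation-field step can be sketched outside the database as well. The following is a minimal illustration, not our actual field definition: the field names, the sample records, and the choice of the EAD elements c, did, unittitle, unitdate, and bioghist are assumptions standing in for whatever markup a given guide uses.

```python
# Hypothetical sketch: the field names, sample records, and tag choices are
# assumptions, not the museum's actual database fields or its exact EAD markup.
from xml.sax.saxutils import escape

RECORDS = [
    {"title": "Example Artwork", "date": "1950"},
    {"title": "Study in Red & Blue", "date": "1962"},
]


def tag_record(record):
    """Wrap each field of one record in its tag, as a calculation field would."""
    title = escape(record["title"])  # escape &, <, > so they cannot be read as markup
    date = escape(record["date"])
    return f"<c><did><unittitle>{title}</unittitle><unitdate>{date}</unitdate></did></c>"


def export_records(records):
    """Build one text block containing the same markup for every record."""
    return "\n".join(tag_record(r) for r in records)


def merge(manual_markup, records):
    """Place the exported records underneath the hand-marked-up text."""
    return manual_markup.rstrip("\n") + "\n" + export_records(records) + "\n"
```

Calling merge() with the hand-encoded essays and the full record list yields one text/SGML file, which would still need to be validated against the DTD in the authoring tool.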

When you view our EAD-encoded collection guides on the website, you are actually viewing a single SGML document. The software we use to display it on the web is Isite (above). We've customized this software a little, so that it translates the appropriate SGML tags into HTML tags for viewing in normal web browsers. We've also set it up so that when you choose to view the "biography" or "collection records", Isite goes and gets that "chunk" of the collection guide and brings it to you instead of the whole thing. This makes it much easier and faster to view on the web. In addition, when you view the collection object records, we've set it up to show only the first 6, then provide a link to the following 6, and so on, thus overcoming the problem of trying to display 600 records when the user clicks on "browse the entire collection"! We also did not have to arbitrarily split the records into segments in the markup to do this; the records are marked up normally, and the software just shows a few at a time.
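The "6 at a time" behavior amounts to simple paging over an unsegmented list of records. A minimal sketch follows; the page size of 6 comes from the text above, while the function name and return shape are hypothetical and say nothing about how Isite implements this internally.

```python
# Sketch of "show 6 records, then link to the next 6". PAGE_SIZE matches the
# text; the function name and return shape are hypothetical.
PAGE_SIZE = 6


def get_page(records, start=0, page_size=PAGE_SIZE):
    """Return one page of records plus the start index of the next page (or None)."""
    page = records[start:start + page_size]
    next_start = start + page_size
    if next_start >= len(records):
        next_start = None  # no "next" link on the last page
    return page, next_start
```

The records themselves stay in one flat list; only the viewing code decides where each page begins and ends.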

PFA Filmnotes Online
These are film notes from the PFA exhibition calendar, documenting a wide range of films, including foreign, independent, classic, and avant-garde cinema. This searchable text resource contains over 12,000 filmnotes written between 1979 and the present.

These film notes were originally stored as hundreds of Microsoft Word files. You can imagine how hard it was to manage these or, perish the thought, to find an individual film note! We thought the best way to provide searchable access to them would be to use... SGML again. This time there was no existing standard set of tags to use (as there was with the EAD above), so we created our own SGML DTD (Document Type Definition). SGML can be very complex and rich with features, but it can also be implemented quite simply, as in this case, to great benefit. Ours was just a very simple set of tags to describe the different parts of a filmnote, such as the director's name, date of film, country of film, film title, etc. The notes were somewhat homogeneous in their format, so we were able to write macros in Word that went through each file and inserted the markup automatically. We then checked them manually for correctness and saved them as text/SGML files.
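The same kind of pattern-based markup can be sketched in Python instead of Word macros. Everything specific here is an assumption made for illustration: the header layout "Film Title (Director Name, Country, Year)", the tag names, and the sample note are hypothetical and are not our DTD or our real note format.

```python
import re

# Assumed header layout (hypothetical): "Film Title (Director Name, Country, Year)"
HEADER = re.compile(
    r"^(?P<title>.+?)\s*\((?P<director>[^,]+),\s*(?P<country>[^,]+),\s*(?P<year>\d{4})\)$"
)


def mark_up(note):
    """Tag the header line of one film note; return None if it does not fit the pattern."""
    header, _, body = note.strip().partition("\n")
    match = HEADER.match(header.strip())
    if match is None:
        return None  # leave unusual notes for manual markup
    return (
        "<filmnote>"
        f"<title>{match['title']}</title>"
        f"<director>{match['director'].strip()}</director>"
        f"<country>{match['country'].strip()}</country>"
        f"<date>{match['year']}</date>"
        f"<note>{body.strip()}</note>"
        "</filmnote>"
    )
```

Notes that do not fit the expected layout come back as None rather than being tagged wrongly, which is one reason a manual check afterwards remains necessary.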

Then we put them on our web server and, again, used the search engine Isite to deliver them on the web. As with the collection guides above, when you view an individual filmnote you are actually viewing a small part of a larger SGML file, which usually includes about 80 filmnotes (two months' worth of screenings). The software goes and gets just that chunk of text, identified by its SGML markup, and brings it back. To see the whole file, view the entire calendar instead (e.g. apr-may.1997.filmnotes).
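Fetching one chunk out of a larger marked-up file can be illustrated roughly as follows. This is a simplified stand-in, not how Isite does it: it uses a regular expression rather than a real SGML parser, the tag name "filmnote" is hypothetical, and it does not handle elements nested inside elements of the same name.

```python
import re


def get_chunk(sgml_text, tag, index):
    """Return the index-th <tag>...</tag> element from a larger document, or None."""
    name = re.escape(tag)
    # Non-greedy match from an opening tag to the first matching closing tag.
    pattern = re.compile(rf"<{name}\b[^>]*>.*?</{name}>", re.DOTALL | re.IGNORECASE)
    chunks = pattern.findall(sgml_text)
    if 0 <= index < len(chunks):
        return chunks[index]
    return None
```

The whole calendar file stays on the server as one document; the viewer only ever receives the element that was asked for.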

CineFiles
A database of reviews, press kits, festival and showcase program notes, newspaper articles, and other documents from the Pacific Film Archive Library's clippings files. The files contain documents from a broad range of sources covering world cinema, past and present.

CineFiles is an image database of documents. Each document is taken from its folder and scanned as an image file (rather than converted to text with OCR). Then a record is created in a Sybase relational database for that document, and the image is linked to that record. The record consists of cataloging information on the film or person discussed in the document, with links to authority files such as the Library of Congress Subject Headings for improved access. When you search for a subject or document, you are searching the records in the Sybase database, which then brings you the image of the document as well. We did not use text in this project because most of the documents were not published by the BAM/PFA, so a copyright agreement had to be reached with the publishers.
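The record-plus-linked-image arrangement can be sketched with a small relational example. Python's built-in sqlite3 module stands in for Sybase here, and the table names, column names, and sample row are hypothetical, not the actual CineFiles schema.

```python
import sqlite3

# sqlite3 stands in for Sybase here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subject (id INTEGER PRIMARY KEY, heading TEXT UNIQUE);
    CREATE TABLE document (
        id INTEGER PRIMARY KEY,
        title TEXT,
        subject_id INTEGER REFERENCES subject(id),
        image_path TEXT
    );
""")
# Made-up sample data: one authority-file heading and one scanned document.
conn.execute("INSERT INTO subject (id, heading) VALUES (1, 'Motion pictures')")
conn.execute(
    "INSERT INTO document (title, subject_id, image_path) VALUES (?, ?, ?)",
    ("Example review", 1, "scans/example-review.gif"),
)


def find_images(connection, heading):
    """Search the catalog records by subject heading; return the linked image paths."""
    rows = connection.execute(
        "SELECT d.image_path FROM document d "
        "JOIN subject s ON s.id = d.subject_id WHERE s.heading = ?",
        (heading,),
    )
    return [row[0] for row in rows]
```

The search only ever touches the catalog records; the scanned image is found through the link stored in the matching record.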

Browse our BookStore
Looking for that hard-to-find book on art, cinema, or cultural criticism and theory? Search the BAM/PFA Store's database of over 2000 books to find what you are looking for, then order it from our WWW site!

This bookstore database does not use SGML or Isite. What a surprise :) This was the first database we connected to our website. It is a FileMaker Pro 3.0 database for Macintosh that we've connected using a free CGI program called ROFM. This data changes much more rapidly than collections data, and includes deletions as well as additions, so we needed a "live" connection between the working database and our website so that the store information is always up to date. This database runs off a Mac web server rather than our main Unix server (which runs Isite, etc.).

Get the Picture!
An Online Interactive Multimedia Guide for Kids to the UC Berkeley Art Museum. Rather than go into every aspect of how we created this guide, I'll just mention the two main tools we used for multimedia: JavaScript and QuickTime.

Many pages use a JavaScript to enable quick image-switching. We found a JavaScript on the web and used it with permission, customizing it for each page. The nice thing about this JavaScript is that nothing absolutely depends on it; it senses whether the viewer has a JavaScript-capable browser, and if not, it just skips the script and does nothing. We've provided more traditional options, such as click-through, for those cases. This means we're not "blocking out" folks with older browsers or text-only readers.

For the audio, video and the VR (really just 3-D) sculpture we used one tool: QuickTime. We chose QuickTime because it is cross-platform and can play on at least Mac and Windows machines. The plug-in is free, and the format is a common one on the web and in multimedia, which increases the chances that people will already be able to play QuickTime files. It's also multi-faceted; many media types can be saved and delivered as QuickTime: audio, video and VR (known as QTVR). This makes things easier for us: we use Macs a lot in authoring for our website, and the tools for saving into QT are often the same. In addition, our viewers on the web need only one plug-in, QT, to view all the multimedia forms on our site, rather than having to download one plug-in for audio, another for VR, etc. Using the QuickStart feature of QuickTime, we are able to process these files further so that they are optimized for the web and will actually start playing before they are entirely downloaded. This gives us a pseudo-streaming multimedia environment, even though we do not have any special (read: costly) server software.

QT, although common and useful, is proprietary, like many of the multimedia tools for the web today, so it is not the format we'd want to store our "master archive" files in; but for delivery on the web, it's a good solution for us for now.

Procedures for digitizing images from the collection

This is our internal guide for scanning images of art in our collection, mainly for screen-based publishing (web, kiosk, CD-ROM), although the scans could also be used for mid-range print publications. These guidelines were developed in-house for our needs and should not be considered a template that will match the needs of all institutions. However, in trying to figure out how best to go about scanning our images, we consulted a variety of resources, including a number of other museums that had bravely mounted their own guidelines on the web (special thanks to the National Museum of American Art and the Library of Congress for helping us all out!), so we thought it only fair to return the favor.