[This local archive copy mirrored from the canonical site: http://javalab.uoregon.edu/ser/software/docproc_2/ snapshot 980318; links may not have complete integrity, so use the canonical document at this URL if possible.]
Abstract | docproc is a software package that provides processing and layout of XML documents based on XSL scripts. docproc is written in pure Java, and can be used as a server-side preparser for serving XML documents on the web. |
New |
This file was last modified on 07-Feb-98.
08 Feb 98
An email from Chris Lilley has prompted me to clean up the HTML docproc
generates, and I'm trying to get the output to at least pass through a
validator unmolested. One of the things I've had to do is hack an addition
to the SCROLL layout object. If you create a scroll object, you can
give it the following tags, which will be inserted into the header of
the HTML document:
|
Why use docproc? | docproc is one way of solving the problem commonly known as static HTML pages. When a web site is designed around HTML pages, the person writing the content is usually also the person laying out the web pages, designing not only what the page says, but how it looks. Often, it is preferable to separate these two jobs, allowing the content expert to author the content, and a layout professional to design the layout. Separating these jobs is difficult with static HTML.
Another problem appears when the web site is upgraded, or a change in the layout is desired. At some point, someone must be assigned the tedious task of going through every web page at the site and changing the HTML so that all pages look the same, or the site must suffer from an inconsistent look-and-feel. Inevitably, there is a period when the site is in a state of flux, and often some pages are missed.
There are two general solutions to static HTML pages. The first is for a site to purchase a site management package. Such a package normally contains a database connection, a template format, and a web server package that combines the two. When the web server receives a request for a document, the server software finds the data in the database and inserts it into the template, serving a dynamic HTML page to the client. The content experts can then add content into the database without worrying about layout, and the layout experts can design the web site layout while ignoring the content.
The second solution involves a translation medium. With this solution, the content authors use a markup language to specify the type of information without specifying the layout. This is, in fact, a form of database entry, except that the document itself serves as the database. docproc follows this model, using XML as the markup language.
Both of these solutions are very similar.
Both are dynamic and allow special processing before serving the HTML page, such as inserting dynamic data (current stock quotes, time of day, etc.) and client-specific processing. Both allow the layout to be changed without altering the content source.
The database approach has the advantage that the content can normally be authored with any software (WordPerfect, etc.), while a special converter adds the data into the database. This makes data entry easier. The database approach also costs money and requires significant infrastructure (if you know of a software solution that is freely distributable, please let me know). XML has two advantages over the database: the data stands on its own, and there is infrastructure for converting XML into formats other than HTML.
The two solutions may be combined. docproc will attempt to do this in a later version, leveraging jDB, a database authored in pure Java by yours truly. |
Requirements |
docproc is written as a servlet, and will consequently run with any
HTTP server software that supports servlets. The most obvious server,
therefore, is JavaSoft's Java
Web Server, which is free for noncommercial use.
|
Usage |
docproc can be used in two different ways. The first, and ideal,
method is to use docproc as a servlet; the other way to use docproc is
to call it by hand on documents that you want to reformat.
Using docproc in an HTTP server
How you install docproc as a servlet really depends on your HTTP server installation. For a more general overview of your servlet options, see my Hunt for Web October (I know, stupid name) document. As of 9.1.98, it is still being modified.
The Java Web Server, by default, keeps all of its servlets in the directory <server home>/servlets/. The easiest thing to do is copy XMLServlet.class into this directory. Place the lark.jar, ser.jar, and pnuts.jar archives somewhere in the classpath of the web server; this may require modifying the httpd startup scripts, if you use them. Also make sure that the directory containing HTML.class, ASCII.class, and the other backends is in the classpath of the web server.
Once this is done, go into the administration tool, manage the Web Service, go to Servlet Aliases and add a new entry with "Alias" = *.xml and "Servlet Invoked" = XMLServlet. You may also want to add an entry in "MIME Types" for XML ("Extension" = xml, "Type" = text/xml), but this isn't strictly necessary. Then go into the Servlets section and add a new Servlet with whatever you want for the description, and the class name "XMLServlet". You'll probably want it loaded at startup, so check that option. Close the window and restart the server. If this document (not the HTML conversion, but the original XML document) is accessible through your HTTP server, you should now be able to reference this document and get a formatted HTML file served to your client!
Nexus is a little web server authored by Anders Kristensen of HP. The original Nexus web server is fairly usable, although it lacks some of the bells and whistles. It is very cleanly coded, very small, and very fast. You may find that its easy extensibility is worth the effort. Outside of the JWS, Nexus was the only web server I found that came close to supporting URI filtering, and I've installed most of the non-commercial ones. The original Nexus didn't support filtering, but I've added that support, and with Anders' permission, will make the distribution available for download. Email me if you are interested.
Installation of docproc on Nexus is even easier than on JWS. Simply make sure that the docproc jar files and the XMLServlet class are in the class path when you run Nexus, and add an entry to the servlets.conf file:
    XMLServlet { code "XMLServlet", path "*.xml" },
and you're off!
Other web servers are more difficult or impossible to configure. The problem is that the servlet API is not consistently implemented across all servers, and also that most servers don't support document filtering in this manner. Your web server must not only support servlets, but must allow you to filter a document request through a servlet, not just allow you to invoke a servlet by name. If you find out how to make Apache do this, please let me know.
Using docproc from a CLI
Here are the instructions for using docproc by hand. A UNIX shell script is provided for ease of use; call it with the -h option for a list of options. These instructions describe the details of invoking docproc.
Make sure that your Java VM can find the textuality, Pnuts, and ser packages. The easiest way to do this is to include the three jar files in your classpath. Since the backends are not included in any package, also make certain that the VM can find the backends. The two backends supplied with docproc at this point are a nearly fully functional HTML.class and a partially functional ASCII.class. A LOUT.class is in the works.
An example CLI usage is:
    java ser.nexus.docproc -t HTML index.xml > index.html
To get a help list describing docproc usage, use the -h option:
    java ser.nexus.docproc -h
|
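When several documents need to be converted by hand, the command line shown above can also be assembled from a small Java program. The following is only a sketch under stated assumptions, not part of docproc: the class name BatchConvert, its helper methods, and the "backends" classpath entry are hypothetical. Only the ser.nexus.docproc entry point, the -t option, and the jar names come from the text above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it is not shipped with docproc.
class BatchConvert {

    // Builds: java -cp <classpath> ser.nexus.docproc -t <backend> <xmlFile>
    static List<String> buildCommand(String classpath, String backend, String xmlFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add("ser.nexus.docproc");
        cmd.add("-t");
        cmd.add(backend);
        cmd.add(xmlFile);
        return cmd;
    }

    // Swaps the file extension, e.g. index.xml becomes index.html.
    static String outputName(String xmlFile, String extension) {
        int slash = Math.max(xmlFile.lastIndexOf('/'), xmlFile.lastIndexOf('\\'));
        int dot = xmlFile.lastIndexOf('.');
        String base = (dot > slash) ? xmlFile.substring(0, dot) : xmlFile;
        return base + "." + extension;
    }

    public static void main(String[] args) throws Exception {
        // Assumed layout: the three jars and a "backends" directory
        // (holding HTML.class, ASCII.class) sit in the working directory.
        String classpath = String.join(File.pathSeparator,
                "lark.jar", "ser.jar", "pnuts.jar", "backends");
        for (String xml : args) {
            String html = outputName(xml, "html");
            ProcessBuilder pb = new ProcessBuilder(buildCommand(classpath, "HTML", xml));
            pb.redirectOutput(new File(html));               // same effect as "> index.html"
            pb.redirectError(ProcessBuilder.Redirect.INHERIT);
            int exit = pb.start().waitFor();
            System.out.println(xml + " -> " + html + " (exit code " + exit + ")");
        }
    }
}
```

Run it as, for example, java BatchConvert index.xml notes.xml. Each document is processed in its own VM, which is slower than a single run but mirrors the documented command line exactly.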
Mailing list | There is now a mailing list for docproc. At this point, this mailing list covers both docproc and docproc_2. To subscribe to the docproc mailing list, send an empty message to ser-docproc-subscribe@javalab.uoregon.edu. This is a qmail mailing list, with limited frills. I've been mucking about with various extensions to the qmail package, and the list server is a little more robust than it was before. You may send an email to the docproc help server for a list of the list server's capabilities. If you subscribed to the mailing list before, please subscribe again. I didn't have the list server configured correctly before, and none of you are on the list at this point. Sorry for the inconvenience. |
Completed |
This section itemizes ways in which docproc deviates from the XSL specification.
If you find anything that is on the following
list that doesn't work correctly, please send me a bug report.
The following list is based on the
27 August 1997 XSL
proposal. Each item lists a section in the XSL proposal which
has been implemented, excepting gross descriptions of how XSL works.
Specifics start at section 2.2.
|
To Do |
This is an incomplete list of things left to do and of items I'd like
comments from the users on. If you find anything which is incomplete
and is not on this list, please let me know.
|
Known bugs |
|
Getting docproc |
Since the Lark and Pnuts archives are so large, I'm making two distributions:
a full distribution with everything docproc needs to function, and a minimal
distribution intended for upgrading, which does not contain the Lark or Pnuts
archives. Also, I've discontinued the tar/gzip distributions, since anyone
running docproc has access to the jar tool.
|
License | This package is freely distributable, provided that the package is redistributed in its original form. Value-added resellers and people intending to make money leveraging off this software should contact me first. I don't have anything against making money. Educational institutions are exempt from this, as is private (personal) usage. Contact me if you feel you fall into a grey area. All source code is supplied. Feel free to spread the word and point people in my direction. |
Regarding scripting | Scripting has turned out to be enough of a problem to require a section of its own. It was also a very large section, so I've moved it here. |
Extending docproc | docproc can be extended by adding backends, or by improving the backends I've provided. Look at the source code for the HTML and ASCII classes for an idea of what needs to be done. |
Notes |
If you use docproc as a servlet, please note that XMLServlet takes advantage of caching, and caching only notices changes to the XML source, not the XSL source. Therefore, if you edit your XSL style-sheet but don't change your XML document, you probably won't notice any change when you reload the document in your browser. The solution is to "touch" the XML source (update the timestamp on the XML source file) before reloading.
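The caching behaviour described above can be sketched in a few lines of Java. This is an illustration only: XMLServlet's real cache is not documented here, so the class TimestampCache and its methods are hypothetical. The sketch assumes the cache records the XML file's modification time at render time and never consults the XSL file, which is why editing only the style sheet goes unnoticed. The touch method uses the standard File.setLastModified call and has the same effect as the UNIX touch command.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a cache keyed on the XML source's timestamp only.
class TimestampCache {
    private final Map<String, Long> renderedAt = new HashMap<>();

    // A document is stale if it was never rendered, or if the XML source
    // is newer than the timestamp recorded at the last render.
    boolean isStale(String xmlPath, long xmlLastModified) {
        Long seen = renderedAt.get(xmlPath);
        return seen == null || xmlLastModified > seen;
    }

    // Remember the XML timestamp that the cached output was built from.
    void markRendered(String xmlPath, long xmlLastModified) {
        renderedAt.put(xmlPath, xmlLastModified);
    }

    // Java equivalent of "touch": set the file's modification time.
    static boolean touch(File file, long time) {
        return file.setLastModified(time);
    }

    public static void main(String[] args) throws Exception {
        File xml = File.createTempFile("index", ".xml");
        xml.deleteOnExit();

        TimestampCache cache = new TimestampCache();
        cache.markRendered(xml.getPath(), xml.lastModified());

        // Editing only the XSL file changes nothing the cache looks at.
        System.out.println("stale after XSL edit: "
                + cache.isStale(xml.getPath(), xml.lastModified()));

        // Touching the XML source moves its timestamp forward. A generous
        // offset is used because some file systems store coarse timestamps.
        touch(xml, xml.lastModified() + 10_000L);
        System.out.println("stale after touch:    "
                + cache.isStale(xml.getPath(), xml.lastModified()));
    }
}
```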
If a System property named xsl-path is set, this path will be used to search
for XSL scripts. It should be a comma-separated series of paths.
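As a rough illustration of how a comma-separated xsl-path value could be interpreted, here is a minimal Java sketch. It is not docproc's actual lookup code: the class XslPath, its methods, the default file name index.xsl, and the choice to trim whitespace and skip empty entries are all assumptions. Only the property name and the comma separator come from the text above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical lookup helper; docproc's own search code may differ.
class XslPath {

    // Splits "dirA,dirB" into directory names (assumed: whitespace around
    // entries is ignored and empty entries are dropped).
    static List<String> parse(String value) {
        List<String> dirs = new ArrayList<>();
        if (value == null) {
            return dirs;
        }
        for (String part : value.split(",")) {
            String dir = part.trim();
            if (!dir.isEmpty()) {
                dirs.add(dir);
            }
        }
        return dirs;
    }

    // Returns the first existing file with the given name, or null if none.
    static File find(List<String> dirs, String styleSheet) {
        for (String dir : dirs) {
            File candidate = new File(dir, styleSheet);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> dirs = parse(System.getProperty("xsl-path"));
        String name = (args.length > 0) ? args[0] : "index.xsl";
        File found = find(dirs, name);
        System.out.println(found == null
                ? name + " not found in " + dirs
                : "using " + found.getPath());
    }
}
```

The property can be set on the command line with the VM's standard -D option, for example java -Dxsl-path=/usr/local/xsl,styles XslPath index.xsl.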
XSL style sheets are searched for in the following order:
|
Examples | This document was generated from this XML source, using this XSL style sheet. This style sheet is a good example of several docproc abilities, including non-trivial pattern matching, extended scripting, and complex layout. This style sheet does not take full advantage of XSL supported by docproc. |
Characteristics Value Types |
The following table describes the mapping of DSSSL types to Java
types that are stored in the Characteristics class.
Types |
Credits |
This package would not function were it not for two resources, and derives
its most useful ability from a third:
|