Applying Information Technology to Court Case Files: Report of the Utah Electronic Filing Project

by Alan Asay(note 1)
September 1994

Nothing is too good to be true.
-Michael Faraday


Technology is nothing more daunting than the means by which we do things. Pen, ink, and paper are a means of capturing information into writing, and so is a word processor. Books are a means of storing information, and so are CD-ROMs. Nowadays there are usually several technological approaches to doing the same thing. The choice among the alternative technologies usually hinges on the resources each alternative requires and its suitability to the task at hand.

Available technological alternatives have increased rapidly in the past few decades, yet judicial information systems still use an only slightly updated variation of pen-ink-and-paper technology. Court databases have automated tracking and accounting processes collateral to the core judicial process of adjudication, but courts have scarcely explored any non-paper technological possibilities for the case file, the informational centerpiece of litigation and adjudication. The case file remains essentially as it was in the days of plea rolls and yearbooks,(note 2) even though the processes that the case file supports are the central and most expensive functions of the court. Absence of change is not, in itself, a reason to change, but the lack of inquiry into case file technology prompts the question whether, in the intervening centuries of ink and paper, a new and better means of keeping and using case file information may have emerged.

The Utah courts have answered that question in the affirmative and developed a computer-assisted system for receiving filed litigation documents via electronic mail, automatically updating the court database and accounting systems, and storing the electronic documents in a paperless case file preserving the utility of paper but not its disadvantages.(note 3) The pilot project has confirmed that electronic filing of court documents and paperless case files are feasible and advisable. This paper examines case files and case file processes, and evaluates various technological methods for processing case files.

A Close Look at the Case File

To fit a technology to case files, one must thoroughly understand what a case file is and how it is used. Since the case file documents the case, the information it contains is the very stuff of the litigation, the accumulated exchange of information to judicially resolve a dispute. Understanding a case file requires genuine understanding of litigation and, from the court's perspective, of adjudication. Designing a tool for use with case files without such an understanding would be like designing a tool for dismantling automotive transmissions with only a slight, superficial, or biased idea of how transmissions work.

The point of litigating is to persuade the court toward one's desired outcome. To persuade, one can appeal to the past, that is, to settled rules or precedent cases, or to a prior enactment by a sovereign. However, the law is more than past acts of the powerful; "it is revolting to have no better reason for a rule of law than that so it was laid down in the time of Henry IV."(note 4) Social policy, natural law principles, individual experience, subjective emotions, and other factors all play a role along with the past in reaching judicial decisions.(note 5)

Whatever the efforts to persuade the court, each issue of the case must resolve itself into a decision and usually a justification, a "reasoned elaboration"(note 6) of the decision. Much of the most resource-intensive work of the courts consists of deciding litigated issues and expressing the reasoned elaborations of those decisions. The arguments of the litigants as well as the many books of the law aid greatly in formulating and expressing the decision and its reasoned elaboration.

This sort of informational work differs markedly from the work assumed by nondocumentary information models such as a spreadsheet or database. Reasoning, weighing values, justifying, and persuading have little in common with the calculation of interdependent numbers, for which a spreadsheet is useful. They also have little in common with the categorizing, counting, and tracking functionality of a database.(note 7) Word processors are useful for expressing decisions and reasoned elaborations. However, word processors are designed to create documents, not to find and utilize existing documents, so they are not optimized to present the arguments of the litigants or the statutes, precedents, and commentary needed to decide the case and justify the decision.

These processes of litigating and adjudicating, and the culture and discipline that have grown up around them, have little in common with the culture and discipline of information technology, and the conventions that informatics culture is most accustomed to advocating or implementing. For at least two decades, relational databases have been where the action is in information technology. The contents of the world's computers now consist of more databased information than anything else. Most of the world's information is in documentary form,(note 8) but the use of computers for documents is limited mainly to creating them and printing them out; once printed, they are stored in and utilized from millions of square miles of file cabinets, libraries, records rooms, and desks.

The database notions predominating in information technology do not fit legal information well, mainly because open texture, policy- and value-laden ideas, the human experience captured in one's memory, and the richness of natural language are all inconsistent with a database approach. Databases process discrete, categorizable, concrete things, but such things are not the substance of much of the case file; rather, its information consists of facts that fit no preconceived pattern and constantly evolving abstract concepts and principles bounded in nothing more definite than open texture,(note 9) the great bugaboo of efforts to apply artificial intelligence to legal thinking.(note 10) Definitively stating a legal rule is problematic, and even when stated, the statement may well contain the law's indispensable weasel-words, terms like "reasonable", "material", or "timely", words which convey the interpreter to spaces of open texture without which the law would lose its roots in equity, natural law, and public policy. Since litigation information lacks discrete, categorizable items, quantification occurs, if at all, only at the last minute, when abstract justifications are defuzzed into a finite award in an instant case. Or quantification comes after the fact, when statisticians tally up the numbers of this or that sort of case, counting but not accounting for litigation's aftermath.

Database technology has difficulty dealing with fuzzy, analog concepts and the natural language in which they are typically expressed.(note 11) The atoms of the digital universe are discrete items, a zero or a one, with nothing allowed in between those harshly bipolar extremes. Building from the bits, a conventional database fits its universe into categories defined for the database in its "data dictionary". A shape is either round or square; in looking at a cone one must ask whether it is more like a round thing or a square thing. Never mind that the further one gets off the surface of the law, the more arbitrary any categories appear, and the more impossible seems the process of categorization or "normalization", the bootstrapping from analog to digital by clerical workers fitting the real world into preconceived categories.(note 12) In the case file, the law is a major, defining part of the content, and the technological model cannot assume that any legal idea fits within any preconceived digital strictures. The arbitrary clarity of an artificial technical conceptualization cannot be superimposed on the law without severe distortion.(note 13)

Besides losing the ability to accurately handle open texture, a contrived system of categorized or "normalized" ideas would limit the richness, flexibility, and persuasiveness of natural language in expressing ideas and argumentation.(note 14) Natural language is the principal means of communication in the case file, and not without reason. It is the principal means of abstract discourse generally; in fact, it is the predominant medium for an estimated 90% of all information.(note 15) It is not to be disparaged for its potential vagueness; vagueness is the stuff of which the law's open texture is made.

Natural language is also the only available set of terms capable of describing all the facts that find their way into court cases. Not everything in the case file is legal argumentation; much of it is factual. The factual contents of the case file cover the whole gamut of human experience: anything people can dispute they can litigate. There is no database data set as large as a natural language vocabulary, and a vocabulary that large is needed to cover everything from DNA testing in a paternity case to the materiality of a contractual breach in a failed financial swap transaction.

Applying conventional information technology, pervaded as it is by a database orientation, to litigation, adjudication, and the case file requires at least creativity, cultural sensitivity, and a sense of where the limits of the database model are. Moreover, since cultures, including the information technology culture, tend toward catechisms and orthodoxies, rather than real, hard thought, doing something radically new like automating case files requires the ability to appreciate the received technological culture for what it is and no more, and a willingness to investigate unconventional information paradigms.

I have found that willingness, thoughtfulness, creativity, sensitivity, and consciousness of limits to be rather rare among technologists. More the norm is a disregard of jurisprudence and the wholesale(note 16) imposition of a database model on litigation information, a feat of thought-imperialism so stunning that one would not expect to ever experience it. Yet it is commonplace.

Case File Processing

A case file is an accumulation of paper documents physically delivered to the court or, in fewer cases, generated by the court. Clerical workers manually append incoming paper documents to paper folders containing more paper. They also identify and deliver documents needing judicial attention to the appropriate judge. Case files not in use are stored securely on shelves or in cabinets. However, a stored case file is a useless one; to be utilized, a case file must physically go where it is needed, usually leaving only traces and fragments of its information elsewhere in the court. If a case file document needs to be copied, a clerk takes apart the paper file, extracts the document, and photocopies it. Since copying is rather difficult and physical storage space is limited and expensive, backup copies of case files are rare, although some courts microfilm incoming documents.(note 17)

The prospects for improving this case file process through computer-assisted information technology include:

Computers can move documents into court faster and less expensively than paper carrying systems such as the Postal Service or couriers, and with superior security.

Computers are faster and less expensive than humans in doing the step-and-fetch work of document retrieval, including branching by references from one document to others and from them to still others, functions often termed "hypertext". Hypertext functions enable a reader to point a computer mouse at a citation, click the mouse button, and automatically look up the cited work, saving the time of pulling paper files and volumes off shelves, flipping pages, then replacing the files and volumes.

Computers can also search out relevant passages better in situations where no citation points the way. Searching by computer for words or phrases has become a widely accepted technique for locating a critical needle in a haystack of text.

Because computers copy information easily and rapidly, they greatly reduce the bother of tracking paper file custody and coping with lost files.

Because computers communicate with each other well, they enable document retrieval and copying from remote locations. Remote access to the court's official case file benefits lawyers; an inexpensive, mass-marketed communications link can bring a court's case files onto a lawyer's desktop.

Computerized documents require less physical space to store than paper.

Computer-readable documents can interact with other computer-based systems. For example, a document giving notice of an upcoming hearing could interact with a computerized calendaring system. Documents initiating a criminal or divorce case are often packed with data gathered for demographic or criminal history purposes; in electronic form, such a document could transfer data into a database without human data entry and consequent errors, high cost, and time lags.

Realizing all of these prospects was the objective for the Utah electronic filing system, an objective intended to support litigation and adjudication without altering their essence. Like replacing a pencil and pad of paper with a word processor, the electronic case file envisioned for Utah is simply a better means to the same end.

Having considered the nature of the case file and how the courts could process it better, Utah developers sought a technological method for automating case files.

Technological Models for Case Files

Information technology is mostly mass-produced nowadays. Rarely will an enterprise invest the resources required to develop and support custom software when so much good quality, mass-produced software is available.

Mass-producers, including software vendors, group their products into categories fitting certain typified needs, much as automobiles divide into categories such as sedans, pickup trucks, and all-terrain vehicles. Each software category is based on a different conceptual model of the information it is designed to handle. Accounting information fits the category handled by spreadsheet software. Documentary information takes form in a word processor. Relational database software assumes information composed of many interrelated pieces, like the inventory of a manufacturer or retailer, or the checks clearing a bank. Choosing off-the-shelf software for a real-world information need thus requires thought about how to fit that real-world need into a product category's model of the information.

The assumed information model of mass-produced software determines the outer bounds of what the software can do. A spreadsheet, for example, is great at calculating but much less suitable for quick, good-looking text formatting. Word processors are good at documents, but using a word processor to keep track of a radio station's compact disk collection would be like using a wrench to drive a screw: it can be done, but it's a lot of work because a wrench is the wrong tool for the job. Functionality is a consequence of design according to a preconceived model; word processors don't offer much for a compact disk collection because counting, categorizing, and tracking many items are not very important for the documentary information the word processor was designed to handle.

The technological approaches most likely, or most heavily touted,(note 18) to fit case file processing are imaging, Electronic Data Interchange (EDI), and Standard Generalized Markup Language (SGML). In Utah, we examined other alternatives as well, but considered these three the most likely prospects.


Imaging

The most straightforward approach to automating case files is to take a computer picture of documents as they come into court and store the pictures.(note 19) The device for taking a computer picture is called a "scanner" and works much like a photocopy machine, except that its output is not on paper but rather in a computer. Paper to be copied is placed on the scanner, and the scanner records the image from the paper as a matrix of many infinitesimal dots. The picture of the page can be displayed on the monitor screen.

In Utah, we considered imaging, but rejected it as the principal(note 20) approach for the following reasons:

The computer image has limited utility, unless it is converted into text.(note 21) Automatically converting it into text is difficult, especially if fonts are allowed to vary and stray marks or specks appear on the paper.(note 22) It is burdensome to litigants to forbid all errant marks and mandate the use of printers supporting required fonts with a degree of precision sufficient to enable error-free optical character recognition.

Imaging does not automate the path to the courthouse, at least not in an efficient way. Part of the streamlining we wanted to accomplish in Utah was to eliminate the delay of paper mail and the expense of couriers. Imaging assumes that the document is on paper, then converts it to computerized form. Since imaged documents form large computer files, they are cumbersome to transmit over wide-area links; replacing the paper path with image transmission is therefore problematic. Transmitting the image of a 10-page document over a 9600 bits-per-second link, a predictably common scenario under current technology, would take an uncomfortably long time.

Imaging requires a large amount of resources. Storing a document's image requires much more disk space than storing its text. Presenting the image also requires much more memory and processing power than presenting text.

Imaging does not go very far toward eliminating the archival drawbacks of paper. Imaging presumes that a document still must be printed onto paper at the law office, and a paper document is still received into court and must therefore be archived. In fact, the paper would probably be considered the original and the image a copy, so the evidentiary preference for the original would necessitate keeping it accessible.(note 23) Imaging is thus not really a replacement for paper, but rather a duplicate of it. Keeping duplicate case files does not reduce the effort and expense of maintaining the information.
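The transmission and storage concerns in the reasons above can be made concrete with a rough estimate. The following Python sketch compares sending a 10-page document as scanned images and as plain text over a 9600 bits-per-second link. Only the page count and the link speed come from the discussion above; the page size (8.5 by 11 inches), the 300 dots-per-inch one-bit scan, the 10:1 compression ratio, and the 3,000 characters per page are illustrative assumptions, and protocol overhead is ignored.

```python
def image_bits(pages, width_in=8.5, height_in=11.0, dpi=300, bits_per_pixel=1):
    """Uncompressed size, in bits, of a scanned document (assumed scan settings)."""
    pixels_per_page = round(width_in * dpi) * round(height_in * dpi)
    return pages * pixels_per_page * bits_per_pixel


def text_bits(pages, chars_per_page=3000, bits_per_char=8):
    """Size, in bits, of the same document as plain text (assumed page density)."""
    return pages * chars_per_page * bits_per_char


def transfer_seconds(bits, link_bps=9600, compression_ratio=1.0):
    """Idealized transfer time: no protocol overhead, no retransmissions."""
    return bits / compression_ratio / link_bps


PAGES = 10
text_time = transfer_seconds(text_bits(PAGES))
raw_image_time = transfer_seconds(image_bits(PAGES))
compressed_image_time = transfer_seconds(image_bits(PAGES), compression_ratio=10.0)

print(f"text:             {text_time:8.1f} s")
print(f"image (10:1):     {compressed_image_time:8.1f} s")
print(f"image (raw):      {raw_image_time:8.1f} s")
```

Under these assumptions the text arrives in 25 seconds, the compressed images take roughly 15 minutes, and the uncompressed images take well over two hours. The same ratio applies to disk space, since the sizes being transmitted are the sizes that would be stored.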

However, despite these drawbacks, imaging is a good way to handle two situations, unavoidable paper and evidentiary paper.

Sometimes paper input into the electronic case file will be unavoidable; it is unrealistic to expect a wholly paperless stream of incoming case file documents. Mandating that all documents arrive in court in electronic form would disadvantage litigants who lack computer capabilities.(note 24) Thus, even when electronic filing via electronic mail is widely accepted and preferred, some continued flow of information on paper is inevitable. Although electronic filing will foreseeably be the most common way of filing in court, documents could still be accepted on paper then imaged by the court, so as to be included in the same computer-based case file as documents electronically filed in the case. The document image would not be word-searchable or quick to transmit, but at least it would be readable using a common medium, the computer screen.

Imaging is also necessary when the appearance of the document, rather than its textual content, is critical. Sometimes an issue in a case involves deriving the meaning of a document from its appearance. For example, cases interpreting written contracts may require the finder of fact to determine whether a word or phrase is underlined or crossed out, or whether the blank filled in on a promissory note names "John & Mary Doe" or "John / Mary Doe." In such cases, the look of the paper, the image, is everything. Although such evidentiary documents are ordinarily introduced at trial, and, at least in Utah practice, do not become part of the publicly accessible case file, often copies of evidentiary documents are attached as exhibits to case file documents such as affidavits and pleadings. The evidentiary exhibits should be in image form rather than text, since what they look like is the issue before the court.

Some limited imaging is therefore needed,(note 25) either for litigants who insist on filing on paper or for documents having evidentiary significance. However, imaging has significant drawbacks that make it inferior to a system predominately employing text rather than pictures of text.

Electronic Data Interchange (EDI)

EDI was our initial favorite in Utah for automating the flow of information into case files. EDI has become commonplace in business transactions;(note 26) much of today's commercial paper is exchanged as EDI. We planned to apply EDI in the case file setting by requiring litigants to include with each electronically filed document an electronic table of data in a prescribed form as a cover "sheet" to the filed document. Upon receiving an electronic filing, we would route the data in the cover sheet to the database and append the accompanying document to the electronic case file.

However, we ultimately decided against EDI, essentially because it ignores the documentary nature of case file information. Documents in unnormalized natural language are foreign to the relational database technology that gave birth to EDI, and database software includes at best only primitive document handling capabilities. While EDI would tolerate a document as baggage of the database cover sheet, it is out of its element in processing the document in any meaningful way. EDI therefore does not solve the problem of how to maintain or transform a document format through transitions from one computer system to another, and offers nothing to resolve the serious and significant question of how to enable computers to process a document as such more usefully, beyond the mere reproduction or transformation of formatting conventions. The more an information technology can "understand" a document, its structure and content, the more fruitful the technology will be.

Because of EDI's inability to come to grips with the structure and content of the document, it is no help in integrating the data source, the document, with the database. In making an EDI cover sheet, data must be manually extracted from the controlling document, which creates a risk that the data coming into the database may vary from the data found in the document. In fact, a user could send entirely the wrong document with a cover sheet, or a document that was gibberish, and the database would be none the wiser and would act as if a valid document had been filed. From the point of view of real-world users (judges, attorneys, and clerks), the document is controlling, not the cover sheet, which is often discarded.

EDI also fails to integrate the document with computer-assisted legal research tools. As noted above, the computerized case file would be much more useful to lawyers and judges if it includes hypertext links to cited legal materials, but because EDI ignores documents, it offers no means of linking documents in the case file with other documents, such as court opinions and statutes, in a legal research repository. The root of this problem is that EDI does not include any tools for processing the document itself.

The fundamental difficulty with EDI is that it has the tail wagging the dog. To lawyers and judges, the documents are the center of attention in litigating a case, and the court's database is merely a more or less helpful adjunct.(note 27) Rarely is anything decided in court by what is contained in the database. EDI serves the database well but ignores the document, and thereby misses the main point. Although we focused on EDI for a time in looking for a better technology for case files, we eventually looked beyond it, because case files are almost entirely documents, for which EDI offers only the highly distorting database view of the document as such. Implementing case files as structured, relational databases would be a bad technological fit for the reasons explained earlier in examining case files.

In a few instances, however, the documentary side of the filing is small and the filing contains mostly information of the database sort. A notice calendaring a hearing is one example; the main information a notice communicates is time, place, event, etc., the sort of stuff databases process. EDI would be well suited to a scheduling document. However, since we found SGML to handle all documents, including those that are mostly database-type information, we opted to use SGML for all documents, so that litigants would not have to master one method for scheduling documents and another method for everything else.

Standard Generalized Markup Language (SGML)

Standard Generalized Markup Language (SGML), International Organization for Standardization (ISO) standard 8879, is a technique for marking a document for use in other computer processes. If, for example, a database needs to find out the name of the court in which the document is to be filed, the court's name can be tagged like this:

     <CtName>Second District Court of Weber County</CtName>

SGML parsing software can then scan through the document, recognize the text tagged "CtName" as a court's name, and pass the court's name on to other programs. The contents of these "UtCase" tags,

     <UtCase>State v. Arroyo, 796 P.2d 684 (Utah 1990)</UtCase>

could be recognized as citing a case, and could become a hypertext link to the cited case.
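As a rough illustration of how tagged text can be handed to other programs, the following Python sketch pulls the contents of simple tags out of a filing. It is not a real SGML parser: it uses a regular expression that handles only simple, non-nested elements and does not validate against a document type definition. The function name `extract_tagged` and the sentence surrounding the citation are hypothetical; the tag names and tagged contents are the ones shown above.

```python
import re

# Matches a simple, non-nested element such as <CtName>...</CtName>.
# A real SGML parser would also handle nesting, attributes, and DTD validation.
TAG_PATTERN = re.compile(
    r"<(?P<tag>[A-Za-z][A-Za-z0-9]*)>(?P<text>.*?)</(?P=tag)>",
    re.DOTALL,
)


def extract_tagged(document):
    """Return a dict mapping each tag name to the list of texts it encloses."""
    found = {}
    for match in TAG_PATTERN.finditer(document):
        found.setdefault(match.group("tag"), []).append(match.group("text").strip())
    return found


# Hypothetical filing text built around the two tagged examples above.
filing = (
    "<CtName>Second District Court of Weber County</CtName>\n"
    "The defendant relies on "
    "<UtCase>State v. Arroyo, 796 P.2d 684 (Utah 1990)</UtCase>."
)

fields = extract_tagged(filing)
court_name = fields["CtName"][0]   # could be passed to the court database
cited_cases = fields["UtCase"]     # could become hypertext links

print(court_name)
print(cited_cases)
```

The point of the sketch is only that once the court name and the citation are marked, a program can find them without understanding the rest of the document.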

SGML can also handle forms of emphasis or reference indicated through typefaces:

     However, Ms. Brown insisted that she <Undln>never</Undln> left
     work early without prior approval from her supervisor, and that she never
     read an issue of <Italic>The National Enquirer</Italic> either at work or off

Once the SGML processor has done its work and fed the results to the software used to display the filed document, the underlined and italicized words would appear underlined or italicized on the computer screen.

The tags used to mark up the document may be defined by the courts in a "document type definition". They are independent of the codes used by word processors to format text on paper, but in many instances can be derived from word processor codes, especially if document format or styles are consistent from one court document to the next. For example, a word processor could automatically substitute a <Bold> tag for its proprietary bold-on code.
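The substitution of tags for word processor codes can be sketched in a few lines of Python. The bracketed codes such as `[BOLD ON]` are invented stand-ins for a word processor's proprietary codes, and the function name `codes_to_tags` is hypothetical; the tag names `Bold`, `Undln`, and `Italic` are the ones used in this paper. A real converter would read the word processor's actual file format.

```python
# Hypothetical proprietary format codes mapped to the SGML tags that replace them.
CODE_TO_TAG = {
    "[BOLD ON]": "<Bold>",
    "[BOLD OFF]": "</Bold>",
    "[UND ON]": "<Undln>",
    "[UND OFF]": "</Undln>",
    "[ITAL ON]": "<Italic>",
    "[ITAL OFF]": "</Italic>",
}


def codes_to_tags(text):
    """Replace each known format code in the text with its SGML tag."""
    for code, tag in CODE_TO_TAG.items():
        text = text.replace(code, tag)
    return text


converted = codes_to_tags("she [UND ON]never[UND OFF] left work early")
print(converted)
```

Because the mapping is a simple table, it works best when the same codes or styles are used consistently from one court document to the next.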

Word processors utilizing SGML technology, such as WordPerfect Intellitag, Microsoft SGML Author, or SoftQuad's Author/Editor, facilitate tagging a document and displaying the tagged document for reading and editing. SGML-supporting word processors allow SGML tags to be selected from menus, and can usually add many tags automatically based on a word processor's proprietary format codes, its styles, or consistently used sequences of characters. Non-SGML word processors can also produce SGML documents, since the tags are simply alphanumeric characters delimited by angle brackets.

The advantages of SGML include:

SGML is standard. It is a set of published specifications approved by the International Organization for Standardization (ISO) and the American National Standards Institute (ANSI). As such, it is vendor-neutral; that is, it does not favor one software publisher over another,(note 28) and any number of software publishers can offer software for it, and many do.

However, though standard, SGML is flexible and easily adapted.(note 29) The Utah courts can specify document content in light of their information system, including the data dictionary of their database. A document specification can easily evolve over time as the courts and litigants gain experience with it or as new needs emerge.

It is easy to use yet powerful. In designing the electronic filing system, the courts placed great importance on making sure that whatever we came up with would impose minimal burdens on filing litigants, because an important objective of electronic filing is to make filing easier, not harder. In fact, since electronic filing is currently an optional alternative to paper filing, no one would take the electronic filing option if it were more burdensome than paper filing.

It is consistent with standard, common-denominator electronic mail systems, such as the Internet's Simple Mail Transfer Protocol.(note 30)

It provides an integrating technology for the court information system. Since the case file is central to the judicial process, it is a good place to bring together various forms of information, such as video arraignments, the case management database, and library resources.

The tags available for Utah court documents are specified in SGML document type definitions and explained in the Practitioner's Guide to Electronic Filing in the Utah Courts, which is published by the Utah courts and available by request.

The disadvantage of SGML is that tagging the text is work.(note 31) A good SGML word processor can place most of the tags automatically by working from formatting codes, word processor styles, or from sequences of characters appearing regularly through the text. Tagging documents created from forms will be less work if one tags the form. However, even taking all available shortcuts, some tagging work remains for the litigant. Our experience has been that litigants are willing to accept this work in return for the benefits of electronic filing to them, which are described above.

One additional fact to consider with SGML is the extent to which it should handle the appearance of the document on paper. SGML can do formatting as well as a word processor's format codes; however, an important principle of document processing theory favors marking up text according to its substantive content, and letting the machine handle the formatting based on the substance.(note 32) In other words, don't mark the beginning of a title with "space-down--inch, center-this-line, turn-bolding-on" codes; instead just mark it "<Title>" and let a programmer have the machine leave white space, center, and bold all titles. Marking up substance and leaving the format for programming and execution makes mark-up easier, and it lets the format vary with the medium. When viewing a document on a computer screen with limited horizontal space, you may want to leave less space before a title.
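The principle can be sketched in Python as follows. The document marks only the substance, a title, and a small table of layout rules decides how titles look on each medium. The layout values, the function name `render_titles`, and the sample document text are illustrative assumptions; upper case stands in for bolding because the output here is plain text.

```python
import re

TITLE = re.compile(r"<Title>(.*?)</Title>", re.DOTALL)

# Hypothetical layout rules: more white space and a wider line on paper,
# less of both on a screen with limited room.
LAYOUTS = {
    "paper": {"blank_lines_before": 3, "width": 72},
    "screen": {"blank_lines_before": 1, "width": 40},
}


def render_titles(document, medium):
    """Replace each <Title> element with text laid out for the given medium."""
    layout = LAYOUTS[medium]

    def format_title(match):
        title = match.group(1).strip().upper()  # upper case stands in for bold
        centered = title.center(layout["width"]).rstrip()
        return "\n" * layout["blank_lines_before"] + centered

    return TITLE.sub(format_title, document)


# Hypothetical document text; only the <Title> tag comes from the discussion above.
doc = "<Title>Motion to Dismiss</Title>\nComes now the defendant."
on_paper = render_titles(doc, "paper")
on_screen = render_titles(doc, "screen")

print(on_paper)
print(on_screen)
```

The marked-up document is identical in both cases; changing how titles look on either medium means editing the layout table, not the filed document.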

In Utah, we opted to mark up mostly according to substantive document structure. We will not accept tags which attempt to set a certain font, for example, or other characteristics that affect only the appearance of the document, so users do not have the option to set their own font or other minor appearance factors. The Utah courts use one font for all documents. (Note that we do, however, recognize changes in appearance that affect meaning; for example, we accept tags for boldface, underlining, and italics.) Tagging substance rather than form may offend some who have strong feelings about looks, but we at the Utah courts don't see that as much of a disadvantage in the way we have opted to implement SGML.


Case files consist of documents in natural language. As such, they have a textual structure and they contain bits of information that a database can make use of, but the information model that most closely fits the case file is that of a document in natural language. A database model, with its artificial, "normalized" model, fits a case file only in the mind of a computer technologist willing to cut corners with reality. An image model, seeing a case file as a picture of documents, is close to the mark and easy, and sometimes the optimum under the circumstances, but it is not suited to most of the contents of the case file.

The best model for the case file is as a document, with the substantive structure of the document captured in Standard Generalized Markup Language.


  1. B.A., J.D.; Developer, Utah Administrative Office of the Courts, 230 South 500 East Suite 360 Salt Lake City, UT 84102 (801) 578-3939.

  2. F. Pollock & F.W. Maitland, The History of English Law Before the Time of Edward I 156 (1895); Maitland, The Yearbooks and their Origin, in R.L. Schuyler, ed., Frederic William Maitland, Historian 234-35 (1960).

  3. "Case file" means the court's collection of pleadings, motions, memoranda, transcripts, and similar documents filed by litigants and the court in adjudicating a case.

  4. Holmes, The Path of the Law, 10 Harv. L. Rev. 455, 469 (1897) (The essay continues in the next sentence: "It is still more revolting if the grounds upon which it was laid down have vanished long since, and the rule simply persists from blind imitation of the past."); see also Hynes v. New York Central R. Co., 231 N.Y. 229, 131 N.E. 898 (1921) (Cardozo, J., criticizing "the extension of a [past-derived] maxim or a definition [in present cases] with relentless disregard of consequences to a 'dryly logical extreme'", quoting Pound, Mechanical Jurisprudence, 8 Colum. L. Rev. 605, 608 (1908)).

  5. See, e.g., Pound, The Theory of Judicial Decision, 36 Harv. L. Rev. 641, 643-45 (1923); K. Llewellyn, The Bramble Bush 80-81 (1930, 1981 ed.); Friedman, Legal Philosophy and Judicial Lawmaking, 61 Colum. L. Rev. 821, 842-45 (1961); Jones, Man and Machine in the Search for Justice, 16 Stanford L. Rev. 515, 538-558 (1964); Jones, An Invitation to Jurisprudence, 74 Colum. L. Rev. 1023, 1028-30 (1974).

    On mentioning "experience", Holmes' well-known aphorism comes immediately to mind:

    The life of the law has not been logic: it has been experience. The felt necessities of the time, the prevalent moral and political theories, intuitions of public policy, avowed or unconscious, even the prejudices which judges share with their fellowmen, have had a good deal more to do than the syllogism in determining the rules by which men should be governed.

    O.W. Holmes, The Common Law 5 (Howe ed. 1963); see also Holmes, The Path of the Law, supra n.4, at 465-66 ("Behind the logical form lies a judgment as to the relative worth and importance of competing legislative grounds, often an inarticulate and unconscious judgment, it is true, and yet the very root and nerve of the whole proceeding. You can give any conclusion a logical form."). For a later, more tempered view, see Holmes, Law in Science and Science in Law, 12 Harvard L. Rev. 58, 455 (1899); Elliott, Holmes and Evolution: Legal Process as Artificial Intelligence, 13 J. Legal Studies 113, 135-36 (1984).

  6. White, The Evolution of Reasoned Elaboration: Jurisprudential Criticism and Social Change, 59 Va. L. Rev. 279, 286 (1973).

  7. However, it is one thing to litigate and adjudicate and another to administer. Administrators bundle and count; to them a case is not the conceptual conundrum of the dispute but rather one of a large number of items to be processed. These earth-movers never stoop to examine an individual stone. For them it is important that a case has a plaintiff, the plaintiff has a name, and the case has a unique number. When you shovel cases rather than decide them, categorization and quantification are important. There are aspects of the case file which can be categorized for administrative purposes, but it is important not to lose sight of the fact that the greater part of the work with the case file happens close in and not with tally marks.

  8. Cronk, Unlocking Data's Content, Byte 111 (September 1993), quoting Bill Arms of Carnegie Mellon University.

  9. H.L.A. Hart, The Concept of Law 121-150 (1961); see also K. Llewellyn, The Bramble Bush 80 (1930, 1961 ed.).

  10. See, e.g., Leith, Involvement, Detachment and Programming: The Belief in PROLOG, in B.P. Bloomfield, The Question of Artificial Intelligence 220, 242-54 (1987); A.L. Gardner, An Artificial Intelligence Approach to Legal Reasoning 33-44 (1987); Bench-Capon & Sergot, Toward a Rule-Based Representation of Open Texture in Law, in Walter, ed., Computer Power and Legal Language: The Use of Computational Linguistics, Artificial Intelligence, and Expert Systems in the Law 39, 52-59 (1988); McCarty, Reflections on TAXMAN: An Experiment in Artificial Intelligence and Legal Reasoning, 90 Harv. L. Rev. 837 (1977) (At 849: "Superimposed on a manageable foundation of manageable complexity is another system of concepts as unruly as any that can be found in the law, with all the classical dilemmas of legal reasoning: contrasts between 'form' and 'substance,' between statutory 'rules' and judicially created 'principles,' between 'legal formality' and 'substantive rationality.'" Concluding at 892: "The most important deficiencies appeared in Part I, where it was emphasized that the 'continuity of interest,' 'business purpose,' and 'step transaction' doctrines were developed in opposition to the mechanical manipulation of abstract statutory rules, and that they seemed to have a different structure from the concepts that have so far been successfully implemented in TAXMAN.").

  11. A computer scientist in the field of computer approaches to natural language explains the lack of success in the field in Sondheimer, The Rate of Natural Language Processing, in Y. Wilks, ed. Theoretical Issues in Natural Language Processing (1989).

  12. See N. Woodhead, Hypertext and Hypermedia: Theory and Applications 20-21 (1993). Woodhead points out that there is a different targeting of the software products used for databases and text, especially hypertext. Databases are optimized for "set-level manipulation of homogenous data", selection of data items by relational operators and/or a query language, calculation of results, generation of printed reports, and superimposing a logical form and normalization over the contents. Hypertext systems are optimized for non-linear reading of non-normalized, non-atomized information with content taking precedence over form, natural language querying, exploration and browsing, and integration of heterogeneous media.

    Franklin, Hypertext Defined and Applied, 13(3) ONLINE 37-49 (1989) cites legal information as the archetypal example of text, information which cannot be approached through a database's piecing up, normalizing, and tabulating.

  13. Actually, in the relationship between natural language and law lie profoundly informative lines of inquiry. For starters, see F. Haft, Einführung in die Rechtsinformatik 59-70 (1977).

  14. For a thought-provoking critique, see T. Vamos, Computer Epistemology: A Treatise on the Feasibility of the Unfeasible or Old Ideas Brewed New 101-127 (1991).

  15. Cronk, Unlocking Data's Content, Byte 111 (September 1993), quoting Bill Arms of Carnegie Mellon University.

  16. As suggested above in note 7, database technology has a serviceable place in court information systems; the concern here is to recognize its limits. Steeped by training and work experience in relational databasism, information technologists often lose sight of those limits.

  17. Reconstructing a single case file from the microfilmed inflow is laborious. A clerk must sift through the entire chronologically ordered inflow into the court and select the documents filed in the case since it began.

  18. It's tempting to leave the analysis for automating case file processing to information technology experts or even salespeople. Isn't that what they're for? But there is a danger in this: technologists tend to revisit the last successful project, just as militarists tend to fight the current war like the last one. Applying a past technological success in exploiting a current opportunity assumes that the solution to every new problem is the one that worked so gloriously for the last. A repeatedly successful technology tends to take on the mental dimensions of a paradigm, and paradigms in turn tend to dictate the mental model and even the vocabulary for understanding every other fact situation. Whatever does not fit the paradigm is glossed over or made to fit. People indoctrinated in a paradigm see the world in its terms; their paradigm becomes their horizon. When the paradigm is technological, the result can be a method force-fitted onto the real world.

  19. The pioneering and leading example of a judicial imaging application is the Orange County system, reviewed in L.P. Webster & J.E. McMillan, Document Imaging in the Orange County, California Superior Court Probate Department: An Evaluation (1993).

  20. We do plan to scan documents which arrive at court on paper, once the electronic case file becomes the official case file of the court. We see electronic filing as an optional and preferable alternative to paper filing, but plan to continue accepting documents filed on paper. Those documents will be scanned and incorporated into the electronic case file. However, since imaged documents are not as usable as electronic text, the documents taken from paper are functionally somewhat disadvantaged. They will retain all the utility of paper, but the court will not endow them with the greater utility of electronic text.

  21. The problem underlying the lesser utility of the image is that the picture of an "a" is not the alphabetic character "a" that a word processor or word-searching program recognizes as the alphabetic character. Most computer systems today store and process textual characters according to the American Standard Code for Information Interchange (ASCII). ASCII specifies that the first 128 values representable by a single byte are equivalent to specified alphabetic letters, numerals, most punctuation marks, and some control and formatting codes (such as new-line and tab). (A byte is a basic unit of data, and is comprised of eight binary digits or bits.) Most word processors and text-searching software represent characters in ASCII.

    ASCII is standard number 646 adopted by the International Organization for Standardization (ISO), and, since it is used by the great majority of computer systems nowadays, it provides the basic means for sharing information between different systems. However, images, being just pictures of characters but not the encoded ASCII characters themselves, remain outside the processing capabilities available for ASCII.

  22. Converting images into ASCII by computer is the process usually termed "optical character recognition" or "OCR". OCR is a resource-intensive process yielding an imperfect product; computers do not recognize pictures of characters as ASCII characters nearly as well as humans do. See W.B. Green, Introduction to Electronic Document Management Systems 219-235 (1993).

  23. Utah R. Evid. 1002 (modelled after Fed. R. Evid. 1002). The best evidence rule often has low persuasive force if the copy is a good one and concerns about genuineness or authenticity are minimal. See, e.g., State v. Wilson, 608 P.2d 1237 (Utah 1980) (where the original was unavailable, a photocopy was admissible). However, images scanned at 300 dots per inch, a common rate, probably will not be quite as fine-grained as a good photocopy, for example, so, depending on the issues, the need to produce the paper original could be real, but only in the rather rare cases where case file documents take on evidentiary significance.

  24. Computers are becoming increasingly available. Public libraries often offer suitable computers for public use for a small fee. However, computer skills and know-how may not be present in a particular case, and incarcerated pro se parties may not have access to facilities for electronic filing.

  25. Though acknowledging the need for limited imaging, we in Utah have not yet implemented imaging because it is not the central core of our electronic filing technology.

  26. Most EDI development has been aimed at replacing sales paperwork: purchase orders, invoices, and the like. It could be expanded to include all commercial paper, if the financial transaction industry recognizes a means of authenticating electronic signatures. However, commercial paper differs in important respects from litigation paper. Commercial paper consists largely of database-type pieces of interrelated information such as a sum certain, a drawer or maker, a payee, perhaps a drawee. Commercial paper does not contain large passages of natural language text that vary greatly in content.

    The lack of a natural language capability in EDI is a legal drawback wherever it has been applied. Legal difficulties arise, for example, because an EDI message does not carry with it any contractual language but rather relies on a preconceived omnibus contract, and in many situations, on pressure between parties, for example to make a sale without thinking much about the legalities left undone. Xueref & Brousse, Des Editerms pour traiter les problèmes juridiques de l'Echange de données informatisées: propositions pour l'avenir, 1992/3 Droit de L'Informatique et des Télécoms 7 (1992).

  27. This general assumption that the document is more important than the database may not be true in every situation. A few atypical court documents are almost entirely data, with little or no narrative content. Bankruptcy proofs of claim, for example, are all data, with little or no free-form or narrative text. EDI is a natural in processing such documents, and has already been adapted to do so. The American National Standards Institute has adopted transaction set number 176 under standard X.12, adapting EDI technology for filing bankruptcy proofs of claim, and transaction set number 175 adapts EDI for certain court notices.

  28. The file formats used by word processors such as WordPerfect and Microsoft Word are not standard; rather they are controlled by the software publisher and vary greatly over time and from one publisher to another. Besides the problem of maintaining compatibility in the face of constant, uncontrollable change, there are so many word processors on the market that providing for them all would be prohibitively expensive, but providing for a selected few would play favorites among market competitors and unfairly disadvantage litigants who do not have one of the selected word processors.

    Moreover, word processor formats are designed to serve a different purpose than electronic filing: they are to make a document look good on paper; however, in a virtually paperless court, some paper conventions become unimportant, and other possibilities for making documents more usable gain importance. In other words, word processor formats are designed and optimized for the wrong objective, and they don't fit some important objectives of electronic filing, such as automatic data recognition and hypertext linking for citations.

    Although the Utah Judicial Council has not ruled out the possibility of accommodating proprietary word processor formats, currently only SGML documents are acceptable for electronic filing.

  29. SGML's adaptability also means that our electronic filing system can grow with us. We don't have to do everything at the beginning; we can figure out new ways of making the electronic case file better and implement them in the years to come. The forms of EDI messages, by contrast, must be approved by a consensus of the ANSI X.12 user community, a difficult process making the message form difficult to change.

    One of the more fertile fields of research today is in the application of artificial intelligence to text retrieval, the process of locating relevant portions of a large volume of text. Artificial intelligence offers the prospect of making word-searching a more effective means of performing the important task of quickly locating the needle in a haystack. See Rama & Srinivasan, An Investigation of Content Representation Using Text Grammars, 11 ACM Transactions on Information Systems 51 (1993); Krovetz & Croft, Lexical Ambiguity and Information Retrieval, 10 ACM Transactions on Information Systems 2 (April 1992); Fuhr, Probabilistic Models in Information Retrieval, 35 Computer Journal 243-255 (June 1992); Croft & Turtle, Text Retrieval and Inference, in P.S. Jacobs, ed., Text-based Intelligent Systems: Current Research and Practice in Information Extraction and Retrieval 127-34 (1992) (hereinafter "Jacobs"); Jones, Assumptions and Issues in Text-Based Retrieval, in Jacobs, 157-73; Lewis, Text Representation for Intelligent Text Retrieval: A Classification-Oriented View, in Jacobs, 179-184; Marshall, Manipulating Full-Text Scientific Databases: A Logic-Based Semantico-Pragmatic Approach, 34 Computer Journal 245 (1991); Fung, Appelbaum & Tong, An Architecture for Probabilistic Concept-Based Information Retrieval, in Vidick, ed., Proceedings of the 13th International Conference on Research and Development in Information Retrieval 455 (1990). Current approaches toward legal text retrieval are summarized by a real veteran of the effort in Bing, The Law of the Books and the Law of the Files, in G.V.P. Vandenberghe, ed., Advanced Topics of Law and Information Technology 151 (1989), and Bing, Conceptual Text Retrieval for Legal Information Retrieval Systems, in Giannantonio, ed., Law and Computers: Selected Papers from the 4th International Congress of the Italian Corte Suprema di Cassazione 41 (1988).

    Haft points out perceptively that all we have accomplished in applying information technology to courts so far is merely the computerizing of the same thing we used to do on paper, which Haft compares to building a locomotive with legs instead of wheels. Real progress will come when we enable ourselves to look beyond convention and into essence. F. Haft, Einführung in die Rechtsinformatik 83 (1977). The far-seeing Vandenberghe envisioned a technologically enabled "judicial workbench" that we have only begun to realize, even though there are no insurmountable obstacles in the path forward. Vandenberghe, Software Oracles?, in H.W.K. Kaspersen & A. Oskamp, eds., Amongst Friends in Computers and Law (1990).

  30. RFC 821 and 822 of the Internet Activities Board.

  31. Of the two front-running alternatives to SGML, EDI and imaging, imaging requires the least work of litigants; in an imaging system, they keep filing on paper as usual, including the work of transporting documents to the courthouse. EDI requires that the data be extracted from the document into a cover sheet, and that's about as much work as tagging the data in place in the document using SGML.

    Note that while SGML tagging adds a bit of work, it also relieves the litigant of some work. Much of the work of ordinary word processing consists of making the document look good on paper; however, in an electronic SGML document, the appearance is generated automatically from the tags, so there is no need for a secretary to fuss with it. Secretaries can dispense with the hassle of formatting captions, paginating, etc.

  32. J. Andre, R. Furuta, & V. Quint, Structured Documents (1989); C.F. Goldfarb, SGML Handbook 7-8 (1989).

    For greater elaboration, see Coombs, Renear & DeRose, Markup Systems and the Future of Scholarly Text Processing, 30 Communications of the ACM 933 (1987). Coombs, Renear, and DeRose point out that the question is not whether to mark text up but rather what the markup is doing. They distinguish four broad categories: procedural markup, e.g. new-line tab-right--inch; content- and/or structure-oriented descriptive markup, e.g. paragraph, title, citation, date, unique identifier for this case; referential markup such as "insert graphic here"; and metamarkup, document-based rather than software-based macros, e.g. substitute "Administrative Office of the Courts" wherever "&AOC" occurs. Prevailing document processing technology emphasizes procedural markup, including traditional punctuation; industry-standard word processors are aimed at getting the document to look good and print out well on paper, and do so in about the same way typewriters always did, only better. The time has come to transcend the typewriter model, focus on content, and let format take care of itself. SGML can do procedural markup, but also includes robust methods for descriptive markup, referential markup, and metamarkup.
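
    The metamarkup category in note 32 can be illustrated with a short Python sketch that expands the "&AOC" shorthand. It is only an approximation under stated assumptions: the entity table and function name are invented for the example, and a real SGML system would take entity declarations from the document type definition rather than from a dictionary in the program.

```python
import re

# Assumed entity table; a real SGML system would read declarations from the DTD.
ENTITIES = {"AOC": "Administrative Office of the Courts"}

# An entity reference: "&", a name, and an optional closing semicolon.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);?")


def expand_entities(text, entities):
    """Substitute the declared replacement text for each known entity reference."""

    def substitute(match):
        # Unknown references are left exactly as written.
        return entities.get(match.group(1), match.group(0))

    return ENTITY_REF.sub(substitute, text)


expanded = expand_entities("Filed with the &AOC; today.", ENTITIES)
```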