THE INTRODUCTION OF A NEW DOCUMENT PROCESSING PARADIGM INTO HEALTH CARE COMPUTING -- A CAIT WHITE PAPER

Thomas L. Lincoln, MD (1,2); Daniel J. Essin, MD (2); Robert Anderson, PhD (1); and Willis H. Ware, PhD (1).

1) RAND, 1700 Main Street, Santa Monica, CA 90407-2138
2) Los Angeles County + University of Southern California Medical Center, Department of Medical Administration, 1200 North State Street, Los Angeles, CA 90033

GOAL

The main goal of this CAIT White Paper is to address a very general problem in automated record keeping as it applies to Health Care -- thereby resolving the long-standing unmet need for an Electronic Clinical Chart (ECC) for on-line clinical use [Institute of Medicine 1991, Ball 1992]. In clinical medicine (and in other similar venues) documentation must be responsive to real world circumstances. As a consequence, the information components are typically highly variable in both form and content, complicating their management and use. Today's technologies offer an opportunity to develop and introduce an effective new systems architecture based on the concept of "document processing" that can markedly improve processing effectiveness by anticipating such variability and making its management a part of the underlying logic. Here the notion of the document as the object to be stored and processed stands in contradistinction to the common computing view in which data, records, and fields are the fundamental items. Electronic documents, properly enhanced with additional labels, can form the archive from which data can be extracted from various viewpoints for classic processing, providing greater flexibility to end-user applications and enhanced results.

The proposed approach is based on the observation that the medical chart is typically a collection of individual reports, forms, and notes about a patient that come from many sources [Essin 1990, Lincoln 1993a]. As the record grows, navigation among its components and the extraction of information both become increasingly awkward, because the organizational relationships among these components have no natural order, but rather are defined by the problem that any particular user wishes to address. (For example, in the paper chart generated by a modestly complicated case, information is seldom organized in the manner needed -- summaries build on summaries, so that an accurate review from first principles is exceedingly hard to accomplish and every attempt is time consuming.)

The new approach considers each component of the medical chart as a loosely structured document in which the components can be uniquely delimited in some uniform manner by tags or labels [Essin 1993]. To do this, the ISO Standard Generalized Markup Language (SGML), which was designed for this purpose with respect to data display and formatting, is extended to organize medical content. Appropriate new content-related tagging conventions are introduced that delimit each specific item and section for subsequent retrieval and processing. Other coordinating mechanisms are then built up on this basis. Such an approach is necessary in order to capture the full range of clinical information in a flexible format suitable for automated handling. However, introducing such an approach requires a very significant reorganization of system design and of the logical view of how medical information processing is carried out.
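To make the concept concrete, the following minimal sketch (our illustration, not drawn from any fielded system) shows a single chart component expressed as a content-tagged document, from which one item is then extracted for classic processing. The tag names, and the use of XML-style syntax as a simplified stand-in for full SGML, are assumptions made for brevity.

    # A chart component as a loosely structured, content-tagged document.
    # XML-style syntax stands in for SGML; all tag names are illustrative.
    import xml.etree.ElementTree as ET

    note = """
    <progress-note patient="MRN-0000" date="1994-03-01">
      <author service="medicine">J. Smith, MD</author>
      <subjective>Less dyspnea on exertion today.</subjective>
      <finding system="heart">Irregularly irregular rhythm, rate 88.</finding>
      <plan>Continue digoxin; repeat electrolytes in the morning.</plan>
    </progress-note>
    """

    doc = ET.fromstring(note)
    # "Data" is extracted from the archived document on demand, rather than
    # being forced into pre-defined fields at entry time.
    for finding in doc.iter("finding"):
        print(doc.get("patient"), finding.get("system"), finding.text)

The document itself, with its full text and context, remains the stored unit; the extraction is just one viewpoint applied to it.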
Due to the sunk costs of prior capital investment, marketing momentum, intellectual commitment, and other traditional barriers, this important change will be difficult to bring about without a concerted and collaborative effort.

BACKGROUND

The central problem of clinical computing is to collect and manage patient information taken from a real world medical environment which is at once complex, highly variable, and unpredictable in specific detail. Moreover, as medicine continues to evolve, many of the medical categories, procedures, and vocabularies change with time. In the face of all this, the potential uses of the collected information must be stable over long periods of time. For legal reasons some patient records must be kept for up to 21 years, and there are formal proposals to keep them much longer. For both medical-legal and scientific purposes their integrity must be assured. Security and confidentiality must be appropriately maintained, and balanced against needs for access. In addition, these automated clinical records must satisfy numerous stakeholders with diverse backgrounds, interests, and levels of sophistication -- doctors, nurses, administrators, analysts, etc. -- who require different interactive modes and displays to meet their needs. An ECC with this kind of even-handed flexibility has yet to be developed.

Present Health Care Information Systems (HCIS) suffer severely from design constraints that have been carried forward from earlier generations of computer technology -- notably the assumption that the computer's task is to process data expressed as files, records, and fields. When HCIS were first introduced in the 1970s, severe computing constraints dictated narrowly how things could be done. Although modern processing, transmission, and storage capabilities have removed virtually all of the design constraints that were previously limiting, these systems are still sold and installed today, and designs from the past are still buried in the logic underlying them. As a consequence, the professional user of such systems is often forced to conform to work sequences that are awkward and labor intensive. "Data entry" has generally been stripped down to fit pre-defined data structures, so that important information and context are lost. Thus, what can be captured today is generally anemic, fragmentary, and of modest use, except for billing or for some other specific audience.

These legacy platforms propagate widespread rigidity and narrowness of concept. For example, pre-defined entry sequences do not take non-standard medical situations into account, yet that is where much of the clinical interest lies. It is in the nature of very sick patients to be sick in unique ways, and it is here that professional judgment and patient-specific intervention are most needed. Typically, 20% of the patients require 80% of the physician's attention [Lincoln 1993b].

One commonly held view, first articulated by Prof. Fred Brooks of the University of North Carolina in 1987, is that "there are no silver bullets in software" -- i.e., no dramatic ways of changing software performance. Reviewing years of experience in the ponderous mainframe environment, he drew the conclusion that all software improvements are achieved in small increments, and at high cost.
However, even before that time, giant steps in software had begun to appear, as new logical designs were introduced following the removal of particular hardware constraints: 1) powerful PCs, dedicated to the single user, produced spreadsheets, graphical interfaces, and ultimately elaborate office application packages with processing intelligence that were exceedingly difficult to duplicate on the mainframe; 2) client-server configurations distributed tasks and promoted the evolution of specialized server types, also prompting powerful software formalisms such as the Structured Query Language (SQL) and various transmission intermediaries such as PostScript; 3) parallel distributed processing led to piece-wise problem solving using blackboards and programs like Linda [Carriero 1989]; and 4) internationally networked computer communications opened up the development of interactive navigation tools across heterogeneous data bases, first with search tools such as Gopher and Archie, and then more powerfully using the hypertext conventions incorporated in the World Wide Web and Mosaic.

These new tools and configurations have simply outrun present Health Care Information System designs, so that every system in widespread operation today is profoundly obsolete, even where the vendors have hopped on the bandwagon and introduced newer technologies as superficial modifications or add-ons. Much has been learned over the past 30 years about what formalisms are useful for medical information processing. This knowledge is embedded in the procedures of present systems, but it cannot be effectively put to use on present platforms.

SUGGESTED APPROACH

The document processing approach starts with the view that paper clinical charts are composed of a variable collection of various kinds of loosely structured documents, each of which follows a general outline: notes about admission, discharge, and surgical operations; the patient history and physical exam (H&P); lab and radiology reports; progress and nursing notes; various encounter forms; and many others. These differ vastly in content depending upon the circumstances of each patient. The vast majority of this information consists of text, which is well handled in word processing or memo format, but cannot be fully captured when compressed into data records consisting of limited and pre-defined fields. Needed is a means of storing and processing these clinical documents directly with respect to their content.

The issue is not an isolated one, and there are parallels elsewhere. Quite similar conventions underlie the formatting capabilities of all graphic word processing displays and their printer output. In this latter case, tags are introduced to describe (or "mark up") the text in order to dictate when it should be centered, placed in italics, made bold, etc. The approach has been extremely successful in managing the variety and complexity of formatting in such information-rich domains as the printing of dictionaries. These tags act as commands to the processing program that is working in the background to generate the visible display or the printout. Hypertext applications use similar conventions to identify the pointers for which highlighted text serves as the entry point. To manage medical chart documents, this markup technology can be extended to delineate content sub-components in a manner similar to field headers in a record, but in a much more general fashion.
For example, in the H&P, a conventional outline of labels delineates its components: CHIEF COMPLAINT, ..., HEART, LUNGS, ABDOMEN, etc. For any given patient, the observations entered in any section of the examination will differ from those of other patients in both focus and detail. One H&P may emphasize abnormal findings related to the heart, another to the liver, etc. By using content tags, document processing can manage such information by isolating the data within a marked-up region to which rules can be applied. Prototypes have been developed by several groups that successfully utilize some aspect of this approach.

PARTICULAR ADVANTAGES

The feasibility and fruitfulness of document processing have been evident in numerous localized domains and in prototype for some time. Thus SGML is already an established ISO standard. However, because fully marked-up documents can consume a great deal of additional computing power to manage a modest amount of data, SGML applications with greater scope have become practical only recently, now that the costs of processing, memory, and storage have come down to modern levels. SGML has also evolved further to manage images, sound, and video sequences through useful extensions such as HyTime [Newcomb 1991]. Most recently, more focused applications that fall under the SGML set of formalisms using similar tagging principles, such as HTML, have simply exploded on the network scene through their usefulness in the navigation of heterogeneous data bases.

Every approach to information management requires a definition of relationships among entities and processes. The classic approach seeks a "data model" to define the organization of data elements, while the object-oriented approach requires the specification of an "object hierarchy". To date no satisfactory data model has ever been constructed for health care, due to problems of complexity. In a like manner, proposed object hierarchies have failed, due to unanticipated exceptional circumstances. Moreover, due to continued change in medical science and health care practice, new exceptional circumstances will continue to arise that will be not only unanticipated, but unanticipatable.

The conventions of SGML have proved more forgiving in their accommodation to change than alternative approaches. The relationships among tags and entities are set out in a table-driven format as a Document Type Definition (DTD), which is modifiable or may exist in multiple editions concurrently to fit specific circumstances. DTDs may involve additional pointers and further levels of indirection. This format separates the application from the data as presented in the documents and allows a single application to work with different documents according to more than one set of rules. Recursive functions cascade through the options to a defined end point in each case. In this manner a single document data base may be shared among applications and users that require vastly different views of the data. This approach avoids the need for (or search for) a static set of standard relationships by providing an explicit semantic order to the natural and omnipresent local diversity that is a part of every operational system. Broader interoperability, both over time and across different local circumstances, may then be achieved by higher level translators that follow a consistent framework. Document processing has the further merit that it is backward compatible and forgiving, thereby accommodating prior data structures.
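The separation of documents from the rules that interpret them can be sketched in a few lines. In this hypothetical illustration (the tag names and view tables are our own assumptions, not a published DTD), two small table-driven "views" stand in for the role that DTDs and higher level translators play at full scale: one application renders the clinician's reading of an H&P, another pulls out only what a coder needs, and both work from the same stored document.

    import xml.etree.ElementTree as ET

    hp = ET.fromstring("""
    <history-physical patient="MRN-0000">
      <chief-complaint>Chest pain for two hours</chief-complaint>
      <heart>S4 gallop; no murmur</heart>
      <lungs>Clear to auscultation</lungs>
    </history-physical>
    """)

    # Table-driven "views": which tags to select, and under what label.
    # At full scale this role is played by DTDs and their translators.
    CLINICIAN_VIEW = [("chief-complaint", "CHIEF COMPLAINT"),
                      ("heart", "HEART"),
                      ("lungs", "LUNGS")]
    CODING_VIEW = [("chief-complaint", "Presenting problem")]

    def render(doc, view):
        for tag, label in view:
            element = doc.find(tag)
            if element is not None:    # sections vary from patient to patient
                print(label + ": " + element.text)

    render(hp, CLINICIAN_VIEW)   # the full examination outline
    render(hp, CODING_VIEW)      # the same document under different rules

Because the view tables live outside the document, a new audience requires only a new table, not a new data structure.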
A record can be reconfigured as a document. Thus it is possible to make changes piece-wise to older data structures by colonization, i.e., by evolution rather than by revolution. For example, the HL-7 coding scheme(s) for defining communications between programs or sites at the content level can be folded into the newer processing format using one or more DTDs. Documents as the units of information are susceptible to overlays, so that corrections can be made without erasing or destroying the original, allowing a clear audit record of how a document came to its present status.

The expected configuration for document processing is one of client-server relationships, so that full advantage can be taken of the flexibility of modern networking, not only for an extension of scope, but also for the modular isolation of functional components -- such as secure data stores -- in specialized servers. In document processing, differentiated tagging provides a means of addressing accessibility, confidentiality, and security in distributed networks in a fundamental way. The security vulnerabilities introduced by distributed processing have not only made present security practices obsolete in these new environments, but have forced a deeper analysis of how processing must be done to reestablish the proper balance with the needs of operational access [Ware 1993]. The kinds of confidentiality and security requirements found in health care and other non-military environments call for types of safeguards, and overrides by professional initiative, that challenge long-standing rules and conventions [Ware 1973]. The ability to effectively audit the access to individual records becomes more important than denying access. For example, users can be warned that looking at certain data will be reported to the originator of that data, or that prior permission must be requested except under dire circumstances -- where a justification of such circumstances must be explicitly given. However, the greatest risk is not to individual records (where it is easier to obtain personal information by confidence games played on those with valid access), but rather in the generation of mailing lists about people with particular vulnerabilities or conditions -- for example, those attending a Cardiac Wellness Center with upscale zip codes. Further markup can include time limits on access to certain items, set at entry time, simplifying the long term use of data for analytic purposes.

BENEFITS

In the Health Care domain, much of the cost of care lies in the coordination of its many component activities through information. With present computerized systems, cost savings have been elusive; indeed, the number of expensive and unsatisfactory system implementations continues to be ominously large. It is not difficult to point to failed systems where the losses have been in the tens of millions of dollars. The benefits of a document oriented approach will ultimately rest on improvements of work flow and management within health care itself. By anticipating the variability of information capture and use, this new architecture provides enhanced abilities to access and share information, to operate across heterogeneous configurations, and to manage and track the various semantic versions and configurations that arise over time.
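Two of the mechanisms described above -- corrections by overlay and audited access under differentiated tagging -- can be sketched together in a few lines. The element and attribute names below are our own illustrative assumptions, not an established convention.

    import xml.etree.ElementTree as ET

    report = ET.fromstring("""
    <lab-report patient="MRN-0000">
      <result test="K+" access="open">3.9 mmol/L</result>
      <result test="HIV" access="restricted">negative</result>
    </lab-report>
    """)

    # A correction is an overlay: the original is never erased, so the
    # document carries its own audit record of how it reached this state.
    overlay = ET.SubElement(report, "overlay",
                            {"corrects": "K+",
                             "author": "laboratory supervisor",
                             "date": "1994-03-02"})
    overlay.text = "3.4 mmol/L (instrument recalibrated)"

    audit_log = []

    def read_result(doc, test, user):
        for result in doc.iter("result"):
            if result.get("test") == test:
                if result.get("access") == "restricted":
                    # Auditing access matters more than denying it outright.
                    audit_log.append((user, test))
                return result.text
        return None

    print(read_result(report, "HIV", "dr_jones"))   # prints: negative
    print(audit_log)                                # [('dr_jones', 'HIV')]

The restricted item is still readable by the professional who needs it, but the read leaves a trace, in keeping with the balance of access and accountability argued above.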
Architected to bring together more relevant information more efficiently, in a manner open to more kinds of users, an ECC utilizing the new architecture can directly reduce the costs of the care process by first accommodating the numerous health care work flows that are the backbone of medical care, and then allowing these to change to take advantage of new computerized capabilities. A balanced, neutral system, capable of equitably serving multiple clinical stakeholders, will eliminate the "politics" of system acquisition, which need no longer be dominated by attempts of one particular group to obtain processing services at the expense of another.

The proposed new architecture can reduce equipment costs and the risks of implementation. Due to changes in technology, new computer hardware and software can reduce the cost of working systems by at least a factor of 10 over their predecessors. Modular design allows an incremental implementation, so that current systems may be left in place for a time while facilitating a stepwise upgrading of their data structures.

To the present, radically different systems have served different practice venues: one for the intensive care units, one for the nursing floors, and one for outpatient practice. The new architecture is directly scalable from the different workplaces of the health center down to individual practice venues. By isolating the data in documents and separating it from the interpretive applications, this design facilitates the linkage to specialized systems and eliminates the need for radically different data conventions for different tasks.

The new architecture can also reduce costs by feeding back a critique of present practice. Captured in appropriate form, health care information can be used not only to control costs through regulation, but also to support a professional self-criticism that is presently difficult to achieve because the mechanism is missing: physicians can examine the effectiveness of their own actions in their choice of diagnostic and therapeutic tools, and can be encouraged to do so.

Many of the benefits here attributed to Health Care can be extrapolated easily to other domains. Health Care is merely one microcosm in which the complexity of the real world severely challenges rigidly structured systems for information management. Many other areas are quite similar. For example, an encounter tablet can be used to collect information at home visits for medical and other social services, markedly reducing paperwork and reporting time. The same tool could be configured to make field observations and to document inspections in many situations. Given the appropriate software work bench, a specific configuration can be set up in short order.

BARRIERS

There are obviously barriers to this level of significant change. Ad hoc approaches to particular perceived problems have been satisfying in the short run. For example, standards such as HL-7 (Health Level 7, referring to level 7 of the ISO reference model) do a satisfactory job of content mapping for communication between today's systems, and thus commit the industry to present conventions. Moreover, the accumulation of special cases handled for each customer at their own expense does not contribute to systematic evolution. The economics of the market militate against vendor investment in significant change, despite the severe limitations in their products. The hospital and health center market consists of about 7,000 sites in the US, and thus is relatively small.
Sales are generally made slowly, after considerable on-site investment. Until recently the costs of the present generation of systems could be added to the patient's third party bill by "capital pass through," covering the risks of their failures. However, under revised Medicare rules, which are followed by other payors, this is no longer the case, so that many fewer systems are being acquired, and a number of previously successful vendors have failed or have been sold under pressure of failure.

Present systems have momentum. Leading systems (including the system of reference in the Institute of Medicine study on automating medical records) still use the mainframe technologies and logic of the 1960s and 1970s, which are expensive to acquire, maintain, and operate, but are familiar to the fiscal officers of hospitals. Cosmetic changes to these products (such as color screens) are much cheaper and more immediately marketable than revising the underlying platforms. There is a reluctance to move beyond today's configurations because their designs and logical methodologies represent the "devils one knows," while new directions offer unknown pitfalls.

There are significant examples of such failures following attempted innovations. For example, a once leading laboratory information vendor introduced a new file structure using third normal form in order to offer flexibility in analyzing data across patients. However, almost all clinical queries of the laboratory were for individual patient records, which could be easily fulfilled by records stored and retrieved en bloc, while the new structure required a disassembly and reassembly of patient data at every inquiry. The new design proved so slow in everyday practice (and too difficult to retreat from) that the leadership was lost and the company eventually sold. In their eagerness to provide new capability satisfactory to the financial office, they had failed to analyze or understand the fundamental work flow.

All of these barriers have been perceived, and attempts have been made, particularly in academic institutions, to overcome them. One proposed convention for managing information environments of great variety and richness is the use of object-oriented programming and data bases. This should encapsulate the variety, provided that the object hierarchy can be properly constructed. However, the approach has proved more recalcitrant than anticipated, not only in this domain, but also generally. With even a modest level of complexity in real world phenomena, object-oriented structures fail to manage important limiting cases -- and, in a rather classic way, the hierarchy is found to be "incorrect." Finding the right hierarchy appears to be another example of following the same will-o'-the-wisp as finding the right data model, only more subtle. This is one reason that the indirection introduced in the IEEE MEDIX standard for ISO level 7 has never fully closed. Although document processing does not resolve this issue by some static consistency, it does redirect the problem of cross communication toward an orderly set of translators.

OUTCOMES

The pivotal advantage of document processing is its ability to bring computer technologies into the service of human needs, under human direction, without bending the work flow and information management to machine requirements. An effective ECC will not be created until this is accomplished. The effort expended on this technology will not remain unique to health care.
Work in this domain is representative of a large range of situations where the real world and its context must be captured. Thus an appropriate architecture for document processing has numerous other applications, extending broadly from military intelligence to the management requirements of service industries.

MEASURES OF PROGRESS

Reduction in management costs through reduced processing overhead can in principle be measured by the number of full-time work equivalents needed to carry out a set of calibrated tasks, together with the investment needed to place the system on line. Improved management feedback in both administrative and professional practice can be observed by comparisons with the already existing baseline of costs assigned to particular Diagnostic Related Groups (DRGs). User satisfaction or dissatisfaction is conventionally measured by a variety of interview techniques. However, a simpler and more focused approach concentrates on the sources of extra work that magnify the workload in health care institutions, asking how well the system copes with those non-standard situations that are central to managing sick patients. A test methodology based on Clinical Information Processing Scenarios (CLIPS) has been devised that seeks out such stressful situations and challenges the system to manage them expeditiously. This and other measures can be used to devise a "Coping Index" to inexpensively compare performance over time between new systems and their legacy predecessors.

STATUS

To date, prototyping activities exist at a number of sites and in a number of computing venues that point to the effectiveness of the document processing approach. These establish a starting point for collaborative work. To test the readiness of those in the information processing community concerned with medical issues to consider such a change of viewpoint, and their potential willingness to collaborate, a draft of this white paper was circulated to various parties and bulletin boards on the Internet. The rapid and interested responses received underscore the importance of the information highway and the fruitfulness of "Connections" in the evolution of technologies -- after the manner presented by James Burke in his extensive documentary television series filmed by the BBC. Relevant commentary, critique, and the sharing of related work were received from members of universities, non-profits, and for-profit entities, demonstrating the open generosity that can be expected in a new domain from those who also perceive the same long standing and frustrating set of problems, and who have explored or considered similar solutions.

Specifically, Sandy Mamrak and Jack Smith provided supportive specificity on the strengths of SGML and suggested a means of dealing with a major weakness -- the proliferation of individual document definitions -- using a powerful approach to the semantics of translators. The references they suggested are hard to find, but they are superb [Barnes 1991; Mamrak 1991]. Combinations of SGML and translators for the development of a medical "forms library" for multi-center clinical trials have been explored by Steve Singer of Dedicated Response in Baltimore, together with Lael Gatewood at the University of Minnesota and colleagues at Johns Hopkins. Ed Pattison-Gordon has approached the medical forms question in a somewhat similar manner.
Meanwhile, Don Connelly at the University of Minnesota Health Center has spent a number of years prototyping a professional workstation to link intensive care units with the clinical laboratory and similar sources of information. He reports that he has recently moved from an object-oriented platform to one based on HTML and Mosaic (a tagging approach logically similar to that above), with a twentyfold increase in programming productivity. Paula Hawthorn suggests the Illustra extensible data base -- we are already on their level 2 list for academic grants. Information has also been exchanged with Jim Williams and Len LaPadula at MITRE, and with Terry Mayfield and Bob Jones at the University of Illinois at Urbana. (Additional assistance within RAND is part of the culture and is acknowledged here anonymously.) Since the original writing we have heard from Dennis Meyers of the Scott and White Hospital in Temple, Texas. They, together with their partners in Austin (the R&D group from Nexis and Lexis when these were sold by Mead to Elsevier), have implemented a start at a similar document processing SGML approach, beginning with 1 million psychiatric records. Thomas Naegele in Phoenix has fielded a less elaborate system compatible with the same principles in 50 office practices.

Taken together, these responses underscore the likely success in this domain of the CAIT approach of fostering multi-institutional collaborative work. A five year concerted effort should place this architecture on a firm footing, capable of independent evolution.

THIS PROPOSAL ADDRESSES A NUMBER OF CAIT OBJECTIVES

For simplicity, these objectives are reiterated here. Document processing can be considered today as a visionary electronic solution to major problems in the health care domain, worthy of pursuit by the CAIT. This architecture can form a foundation for digital communications not only of text but also of images, sounds, and voice. This problem area involves change that cannot be solved by an individual company, but might well be solved through CAIT collaboration. Inputs are needed from industry -- both the broader computer industry and HCIS vendors; from universities -- where significant explorations have been and are being carried out; and from non-profits -- which can often bridge departmental viewpoints in ways that are difficult for industry and academia. Given such a combination, the CAIT would be in a position to encourage commercialization and provide for technology transfer. Document creation, storage, and processing, utilizing SGML, HyTime, and other coordinative modalities, can provide a robust underpinning for the multi-media information environment that represents health care.

The architecture allows the consistent development of cited areas of CAIT interest:

Content Management Systems to provide understanding, routing, and indexing based on the information contained in images, sound bytes, highly formatted texts, tactile encodings, and so forth;
Advanced Databases to store multi-media information, track information content, understand interface protocols, and enforce semantic constraints efficiently;

Advanced Dictionaries to enable a user to access the syntax and semantics of stored multi-media information;

Version and Configuration Management to constantly ensure the maintenance and delivery of appropriate versions and aggregations of data, such as engineering drawings and collections of related drawings;

Security Mechanisms to verify the senders and receivers when needed, sufficiently encrypt critical messages, and validate NII users as they access the network; and

Collaboration Utilities to allow distant parties to review and manage shared data, communicate synchronously or asynchronously, and conduct conferences.

REFERENCES

Ball MJ and Collen MF, editors: Aspects of the Computer-based Patient Record, New York, Springer-Verlag, 1992.

Barnes JA and Mamrak SA: "A Model and a Toolset for the Uniform Tagging of Encoded Documents," Electronic Publishing, Vol 4 (2), pp. 63-85, June 1991.

Carriero N and Gelernter D: "Linda in Context," Commun. ACM, Vol 32 (4), pp. 444-459, April 1989.

Essin DJ and Essin CD: "Computerizing Medical Records: Software Criteria for Systems to Document Patient Encounters," Critical Care Medicine, Vol 18, pp. 100-102, 1990.

Essin DJ: "Intelligent Processing of Loosely Structured Documents as a Strategy for Organizing Electronic Health Care Records," Methods of Information in Medicine, Vol 32, pp. 335 ff., 1993.

Institute of Medicine, Committee on Improving the Patient Record: The Computer-based Patient Record: An Essential Technology for Health Care, Dick RS and Steen EB, editors; Washington, D.C., National Academy Press, 1991.

Lincoln TL, Essin DJ, and Ware WH: "The Electronic Medical Record: A Challenge to Computer Science to Develop Clinically and Socially Relevant Computer Systems to Coordinate Information for Patient Care and Analysis," The Information Society, Vol 9, pp. 157-188, 1993a.

Lincoln TL: "Information and Reality: Where Exceptions Test the Rule," Proceedings of the 26th International Seminar on the Teaching of Computer Science, University of Newcastle upon Tyne, Department of Computing Science, 6-10 September, 1993b.

Mamrak SA and Barnes JA: "Considerations for the Preparation of SGML Document Type Definitions," Electronic Publishing, Vol 4 (1), pp. 27-42, March 1991.

Newcomb S, Kipp N, and Newcomb V: "The 'HyTime' Hypermedia/Time-based Document Structuring Language," Commun. ACM, Vol 34 (11), pp. 67-83, Nov. 1991.

Ware WH: "Records, Computers and the Rights of Citizens" (a report of the DHEW Secretary's Advisory Committee on Automated Personal Data Systems, chaired by W. H. Ware), RAND, P-5070, 1973.

Ware WH: "The New Faces of Privacy," Santa Monica, CA, RAND, P-7831, 1993.

Respectfully submitted:

Tom_Lincoln@rand.org
Daniel Essin
Robert_Anderson@rand.org
Willis_Ware@rand.org