[Mirrored from: http://www.sgmlbelux.be/96/mikula.htm, text only]
Norbert H. Mikula
Building BE-24 - P.O. Box 218
NL-5600 MD Eindhoven
Today's semiconductor industry faces a highly competitive market. It is no longer enough to develop high-quality products faster than others: how information about a product reaches the customer is one of the key factors for success in today's business environment. Some of the most crucial factors influencing the effective dissemination of information are:
The Pinnacles Component Information Standard (PCIS), developed by the semiconductor industry, is being used as a means of distributing rich and computer-understandable information. The approach presented in this paper targets these areas by combining the advantages of publishing via the Internet with the powerful new Java programming language and support for ISO standard 10179, DSSSL (Document Style Semantics and Specification Language).
The system discussed in this paper can be divided into three parts:
The paper describes these three layers from an architectural point of view. Test results with sample "real-life" data are presented. Restrictions imposed by current Intranet/Internet browser technology are pointed out, and desirable future enhancements are discussed. Furthermore, we discuss the lessons learned and point out which obstacles still need to be removed.
Keywords: SGML (ISO-8879), DSSSL (ISO-10179), DSSSL-Online, PCIS, Java, Internet, Intranet
As the Java language plays an important role in the Electronic Databook, I will briefly introduce Java, its history and its underlying concepts.
Sun started to work on the project that eventually hit the market under the name "Java" in the early 1990s. The primary objective at that time was to make development a more platform-neutral process (Conn-95), i.e. to develop a platform-independent programming language that could be used to program and control television set-top boxes for video-on-demand and other "information highway" applications. When the Internet and the World Wide Web started to emerge as a viable computing platform, Sun read the signs of the times and gave its project a completely new direction.
The idea of a virtual machine on which code can be executed independently of the underlying operating system and hardware is not new. Nobody, however, could have anticipated the impact that the Java concept would have on the computing world.
Java is an object-oriented programming language that incorporates many of the powerful features of C/C++ (Note 1; see Naugh-96, p. 547) while leaving out dangerous features such as operator overloading and multiple inheritance of classes. Furthermore, the concept of pointers per se has been abandoned completely.
Another improvement of Java over its ancestors is the long-demanded automatic garbage collection of memory. For a complete discussion of the Java programming language and its underlying design decisions, see Gos-96, Gos-96B and Naugh-96.
A Java compiler translates Java source code into so-called byte-code, which consists of instructions for a Java virtual machine that interprets them. Hence a Java program can be executed on every platform for which such a virtual machine is available (see Naugh-96, p. 11). The licensing agreements signed between Sun and companies such as Novell, Microsoft, Adobe, IBM and HP, to name but a few, show the importance that companies all over the world attach to Java, and will surely add even more momentum to the already present "Java-mania".
Java byte-code is small and compact, which makes it very easy to transport via the Internet. Together with the Java development environment you also get a so-called "applet-viewer" that enables you to download Java code via the Internet and execute it on your local virtual machine. Furthermore, companies active in the Internet/WWW browser market, such as Netscape Corp. and Microsoft, now support Java in their products.
Java faces two serious problems that are being heavily discussed and addressed by the Java community.
Java byte-code is interpreted by the Java virtual machine when it is executed and is therefore, of course, slower than native code. This is not really a problem for programmers who use Java to develop small "applets" (Note 2), but it can cause problems in the sometimes computing-intensive environments of business applications. At the moment there are two approaches that could solve, or at least address, the problem:
From the very beginning of Java's marriage with the Internet, questions of system security have been heavily discussed. To compete with native code in business environments, however, most applications need several features that are potential threats to system security. Among such features are:
For a more in-depth discussion of Java security, see Yell-96. It is the author's opinion that new configurable "security managers", i.e. run-time modules that impose a certain security policy, will emerge in the near future. They should enable the user to decide, for example, which hosts may be contacted and which portions of the file system should be accessible to Java programs.
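The kind of configurable policy anticipated here can be sketched as a small decision object. The class `ConfigurablePolicy` and its rules are hypothetical and not part of any JDK. In the JDK of the time such checks would be wired into a subclass of `java.lang.SecurityManager` (for example by overriding `checkConnect` and `checkRead`), which this sketch leaves out so that it stays self-contained. Symbolic links are not resolved.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical user-configurable policy: which hosts may be contacted and
 *  which parts of the file system may be read. Not a JDK class. */
class ConfigurablePolicy {
    private final Set<String> allowedHosts = new HashSet<>();
    private final Set<Path> readableRoots = new HashSet<>();

    ConfigurablePolicy(Set<String> hosts, Set<String> roots) {
        for (String h : hosts) allowedHosts.add(h.toLowerCase(Locale.ROOT));
        // Normalize once so that ".." segments cannot be used to escape a root.
        for (String r : roots) readableRoots.add(Paths.get(r).toAbsolutePath().normalize());
    }

    /** May a program open a network connection to this host? */
    boolean mayConnect(String host) {
        return allowedHosts.contains(host.toLowerCase(Locale.ROOT));
    }

    /** May a program read this file? Only files below a configured root qualify. */
    boolean mayRead(String file) {
        Path p = Paths.get(file).toAbsolutePath().normalize();
        for (Path root : readableRoots) {
            if (p.startsWith(root)) return true;   // component-wise prefix check
        }
        return false;
    }
}
```

Because `Path.startsWith` compares whole path components, a root of `/tmp/edb` does not accidentally grant access to `/tmp/edbx`.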
Manufacturers provide information on their products in so-called "databooks". These documents contain the information that designers need to carry out their job. Traditional databooks have a number of problems, among them (see PCISW):
ECIX, the Pinnacles Electronic Component Information Exchange project, was started to address these problems. ECIX is a project within the CAD Framework Initiative (CFI) that currently works on two standards:
The goal of the Component Information Dictionary Standard (CIDS) is to provide a computer-understandable dictionary of characteristic properties of components.
The Pinnacles Component Information Standard (PCIS) is being developed to serve as a basis for the interchange of technical information about electronic components and to enable electronic component manufacturers to create and distribute a new form of compiled information, Electronic Data Books or EDBs (see PCIS-95).
The PCIS standard is an application of SGML and has been developed under the auspices of the Pinnacles Group, a consortium comprising many of the larger companies in the semiconductor industry:
As the PCIS standard is an application of SGML, it comprises the classical elements of every ISO 8879 application, that is, a set of elements (tags) and Document Type Definitions for certain classes of documents. The goal of the PCIS effort is not to prescribe what information has to be included in the data sheets or databooks of particular companies, but rather how that information should be tagged (marked up). A detailed summary of what are and what are not the goals of the Pinnacles Group can be found in the "Pinnacles Component Information Standard 1.2 Final Draft" (see PCIS-95, pp. A-4 to A-6) and, for more technical background information, in a white paper of the ECIX project (see Jeff-96).
Cappuccino is a system designed to support the following set of SGML features, which were chosen to address specific requirements of the PCIS standard (see PCIS-95, p. A-22 ff.):
FEATURES
  MINIMIZE DATATAG NO  OMITTAG YES  RANK NO  SHORTTAG NO
  LINK     SIMPLE NO   IMPLICIT NO  EXPLICIT NO
  OTHER    CONCUR NO   SUBDOC NO    FORMAL YES
In its basic components, the architecture of the Cappuccino system tries to follow the suggestions of Annex F of The SGML Handbook (see Gold-90, p. 543).
The main components of the architecture are:
Figure 1 : Cappuccino Architecture
The Stream-Manager forms a layer that provides the Scanner with input. It adds functionality to an ordinary input stream:
The Scanner uses the Stream-Manager to read characters and tries to recognize tokens/delimiters depending on the scanning mode currently in effect.
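A scanner whose token recognition depends on a mode can be sketched as follows. The two modes, the token kinds (named after the SGML delimiter roles STAGO, ETAGO and TAGC) and the class `ModalScanner` are assumptions for illustration: only start tags without attributes, end tags and character data are recognized, and the input is a plain string rather than a Stream-Manager.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scanner whose token recognition depends on the current mode. */
class ModalScanner {
    enum Mode { CONTENT, TAG }
    enum Kind { STAGO, ETAGO, TAGC, NAME, DATA }   // "<", "</", ">", name, character data

    record Token(Kind kind, String text) {}

    private final String input;
    private int pos = 0;
    private Mode mode = Mode.CONTENT;

    ModalScanner(String input) { this.input = input; }

    /** Returns the next token, or null at the end of the input. */
    Token next() {
        if (mode == Mode.TAG) {                      // white space separates tokens inside tags
            while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
        }
        if (pos >= input.length()) return null;
        if (mode == Mode.CONTENT) {
            if (input.startsWith("</", pos)) { pos += 2; mode = Mode.TAG; return new Token(Kind.ETAGO, "</"); }
            if (input.charAt(pos) == '<') { pos += 1; mode = Mode.TAG; return new Token(Kind.STAGO, "<"); }
            int end = input.indexOf('<', pos);       // data runs up to the next delimiter
            if (end < 0) end = input.length();
            Token data = new Token(Kind.DATA, input.substring(pos, end));
            pos = end;
            return data;
        }
        // TAG mode: either the tag-close delimiter or a name.
        if (input.charAt(pos) == '>') { pos++; mode = Mode.CONTENT; return new Token(Kind.TAGC, ">"); }
        int start = pos;
        while (pos < input.length() && isNameChar(input.charAt(pos))) pos++;
        if (start == pos) throw new IllegalStateException("unexpected character at offset " + pos);
        return new Token(Kind.NAME, input.substring(start, pos));
    }

    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-' || c == '.';
    }

    static List<Token> tokenize(String text) {
        ModalScanner s = new ModalScanner(text);
        List<Token> out = new ArrayList<>();
        for (Token t = s.next(); t != null; t = s.next()) out.add(t);
        return out;
    }
}
```

The same character means different things in different modes: white space is data in CONTENT mode but only a separator in TAG mode.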
The parser consumes the tokens provided by the Scanner and checks whether the document being read (represented as a sequence of tokens) conforms to the document type definition that has been read prior to the document instance.
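One common way to check that the children of an element conform to a content model is to compile the model into a regular expression over the child element names. The sketch below assumes content models have already been translated into such patterns (`,` becomes concatenation; `|`, `?`, `*` and `+` carry over). The class name and the `(title, para+)` example are illustrative. Real SGML parsers typically build finite automata from the model directly and must also handle the `&` connector, inclusions and exclusions, all of which are ignored here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Checks child sequences against content models compiled to regular expressions. */
class ContentModelChecker {
    private final Map<String, Pattern> models = new HashMap<>();

    /** regexModels: element name -> model as a regex over child names written "<name>",
     *  e.g. (title, para+) becomes "(<title>)(<para>)+". */
    ContentModelChecker(Map<String, String> regexModels) {
        regexModels.forEach((element, regex) -> models.put(element, Pattern.compile(regex)));
    }

    /** True if the sequence of child element names matches the element's content model. */
    boolean conforms(String element, List<String> children) {
        Pattern model = models.get(element);
        if (model == null) throw new IllegalArgumentException("undeclared element: " + element);
        StringBuilder seq = new StringBuilder();
        // Bracketing each name keeps "para" from matching inside e.g. "paragraph".
        for (String child : children) seq.append('<').append(child).append('>');
        return model.matcher(seq).matches();
    }
}
```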
The results of the parsing process are passed, synchronously with the parsing process, to the top layer of this architecture, the ESIS-Interface.
ISO 8879 conforming documents can contain external and/or internal entities. There must be a way to keep track of entities, their location, content and semantics; the Entity-Manager takes care of this. Its main functions are:
Tell the stream-manager to adjust the input stream because the parser has encountered a reference to the external entity st
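The bookkeeping an Entity-Manager needs can be sketched as a declaration table plus a stack of currently open entities. The class and method names are hypothetical, and actually opening the external entity's input stream (the Stream-Manager's job) is left out. Following ISO 8879, a repeated declaration of the same entity is ignored, so the first one stays in force.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical entity bookkeeping: declarations plus the stack of open entities. */
class EntityManager {
    /** An entity is either internal (replacement text) or external (system identifier). */
    record Entity(String name, String text, String systemId) {
        boolean isExternal() { return systemId != null; }
    }

    private final Map<String, Entity> declared = new HashMap<>();
    private final Deque<String> open = new ArrayDeque<>();

    // The first declaration of a name is binding; later ones are ignored.
    void declareInternal(String name, String text) {
        declared.putIfAbsent(name, new Entity(name, text, null));
    }

    void declareExternal(String name, String systemId) {
        declared.putIfAbsent(name, new Entity(name, null, systemId));
    }

    /** Called on a reference &name; returns the entity to read next and marks it open. */
    Entity enter(String name) {
        Entity e = declared.get(name);
        if (e == null) throw new IllegalArgumentException("undeclared entity: " + name);
        if (open.contains(name)) throw new IllegalStateException("recursive reference to " + name);
        open.push(name);
        return e;
    }

    /** Called when the input of the innermost open entity is exhausted. */
    void leave() { open.pop(); }

    int depth() { return open.size(); }
}
```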
The Catalog component handles catalog files in the format of an SGML Open catalog, as defined by SGML Open Technical Resolution 9401:1995 (Amendment 1 to TR 9401). Catalogs define the mapping of a logical storage identifier (e.g. a formal public identifier) to a physical location.
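A minimal reading of such a catalog can be sketched as a tokenizer (keywords, quoted literals, `--` comments) followed by a table of `PUBLIC` entries. This is an assumption-laden sketch, not Cappuccino's implementation: only `PUBLIC` entries are kept, the first entry for an identifier wins, and the public identifiers in the usage below are made-up examples. White space inside public identifiers is normalized before lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for the PUBLIC entries of an SGML Open catalog. */
class Catalog {
    private record Tok(String value, boolean quoted) {}
    private final Map<String, String> publicIds = new HashMap<>();

    Catalog(String text) {
        List<Tok> toks = tokenize(text);
        for (int i = 0; i + 2 < toks.size(); i++) {
            Tok t = toks.get(i);
            if (!t.quoted() && t.value().equalsIgnoreCase("PUBLIC")) {
                publicIds.putIfAbsent(normalize(toks.get(i + 1).value()), toks.get(i + 2).value());
                i += 2;   // skip the two arguments of this entry
            }
        }
    }

    /** Physical location for a formal public identifier, or null if unknown. */
    String resolvePublic(String publicId) { return publicIds.get(normalize(publicId)); }

    private static String normalize(String id) { return id.trim().replaceAll("\\s+", " "); }

    private static List<Tok> tokenize(String s) {
        List<Tok> out = new ArrayList<>();
        int i = 0, n = s.length();
        while (i < n) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (s.startsWith("--", i)) {                 // comment: -- ... --
                int end = s.indexOf("--", i + 2);
                i = (end < 0) ? n : end + 2;
                continue;
            }
            if (c == '"' || c == '\'') {                  // quoted literal
                int end = s.indexOf(c, i + 1);
                if (end < 0) end = n;
                out.add(new Tok(s.substring(i + 1, end), true));
                i = Math.min(n, end + 1);
                continue;
            }
            int start = i;                                // unquoted keyword or name
            while (i < n && !Character.isWhitespace(s.charAt(i))) i++;
            out.add(new Tok(s.substring(start, i), false));
        }
        return out;
    }
}
```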
ISO 8879 defines the interface, i.e. what data has to be passed, between an SGML parser such as Cappuccino and an application built on top of it. The definition of this interface can be found in Gold-90 (pp. 359-362) as Attachment 1: The ISO 8879 Element Structure Information Set (ESIS).
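In Java, such an interface can be pictured as one callback per kind of ESIS information (cf. Note 6). The interface below is an assumption, not Cappuccino's actual one, and covers only start tags with attributes, character data and end tags. The printer writes events in a line format modelled on the ESIS output of parsers such as sgmls/nsgmls, simplified by treating every attribute as CDATA.

```java
import java.util.Map;

/** Hypothetical Java rendering of the ESIS events a parser passes on. */
interface EsisHandler {
    void startElement(String gi, Map<String, String> attributes);
    void data(String text);
    void endElement(String gi);
}

/** Prints events line by line: "A" attribute, "(" start, "-" data, ")" end. */
class EsisPrinter implements EsisHandler {
    private final StringBuilder out = new StringBuilder();

    public void startElement(String gi, Map<String, String> attributes) {
        // Attributes are reported before the start of the element they belong to.
        attributes.forEach((name, value) ->
                out.append('A').append(name).append(" CDATA ").append(value).append('\n'));
        out.append('(').append(gi).append('\n');
    }

    public void data(String text) {
        // Escape backslashes and record ends so that each event stays on one line.
        out.append('-').append(text.replace("\\", "\\\\").replace("\n", "\\n")).append('\n');
    }

    public void endElement(String gi) { out.append(')').append(gi).append('\n'); }

    String result() { return out.toString(); }
}
```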
YADE (Yet Another DSSSL Engine) is, as the name suggests, a project whose final outcome should be a Java-based tool able to process documents according to ISO/IEC 10179. To reduce development complexity, DSSSL-Online was chosen as the first milestone: a subset of DSSSL specifically designed to let early implementers provide a commonly accepted minimal conformance to ISO/IEC 10179.
To show how Yade fits into the concepts of DSSSL, we first discuss the major areas of DSSSL and their relation to DSSSL-Online.
The conceptual model underlying the DSSSL specification consists of two distinct processes:
The two processes can be used together or separately.
Figure 2 : The DSSSL Conceptual Model
The transformation process transforms an SGML document into another SGML document under the control of a transformation specification. The resulting SGML document can then be used as input to the formatting process (see DS-96, p. 11).
Figure 3 : The DSSSL Transformation Model
Transformations can be performed completely independently of the formatting process. Transformation in the DSSSL sense really means a physical rearrangement of items, as opposed to the formatting process, in which items can also be rearranged, but only from the user's point of view.
Since the transformation part is of little relevance to the approach described in this paper, it will not be discussed here. For further background, please refer to the DSSSL standard (DS-96, p. 11 ff.).
Before discussing formatting, it is important to understand what the term "formatting" means in the DSSSL sense. Formatting is any combination of the following (see DS-96):
Figure 4 : The DSSSL Formatting Model
The formatting process can be divided into a number of separate steps:
It is this step that, for example, transforms an SGML element of type H1 into a flow object of type paragraph.
The corresponding construction rule in a DSSSL specification could look like this:
(element h1 (make paragraph font-size: 12pt))
Read this as "for an element of type H1, create a paragraph whose content uses a font size of 12 points".
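For readers more at home in Java, the rule above can be mirrored as a table from element type to a function that makes a flow object. The sketch only shows the lookup-and-apply idea. The record `FlowObject`, the string-valued characteristics and the `"sequence"` fallback are simplifications of our own, not how DSSSL or Yade represent them.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical Java counterpart of DSSSL element construction rules. */
class ConstructionRules {
    /** Minimal stand-in for a flow object: its class, characteristics and content. */
    record FlowObject(String flowObjectClass, Map<String, String> characteristics, String content) {}

    private final Map<String, Function<String, FlowObject>> rules = new HashMap<>();

    /** Like (element gi (make ...)): register the rule for one element type.
     *  Element names are case-insensitive, as with SGML's default NAMECASE GENERAL YES. */
    void element(String gi, Function<String, FlowObject> rule) {
        rules.put(gi.toUpperCase(Locale.ROOT), rule);
    }

    /** Applies the rule for the element type; without a rule, the content is simply
     *  passed on, loosely mirroring DSSSL's default of processing the children. */
    FlowObject process(String gi, String content) {
        Function<String, FlowObject> rule = rules.get(gi.toUpperCase(Locale.ROOT));
        return rule != null ? rule.apply(content) : new FlowObject("sequence", Map.of(), content);
    }
}
```

Registering the H1 rule then reads `rules.element("h1", text -> new ConstructionRules.FlowObject("paragraph", Map.of("font-size", "12pt"), text))`.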
A grove is similar to an element tree, but may include other subtrees, for example, a subtree of attribute values. Relationships in a grove are expressed in terms of properties (see DS-96, p. 12).
A grove consists of grove objects (nodes). A so-called grove plan determines a set of classes, and each node in the grove is an instance of one of these classes. A node contains a set of property assignments, each of which is a property/value pair. Properties that are needed together form so-called property groups.
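The structure just described (nodes that are instances of grove-plan classes and carry property/value assignments, some of whose values are themselves lists of nodes) can be sketched as below. The class and property names (`element`, `gi`, `attributes`, `content`) loosely echo DSSSL's SGML property set but are simplified assumptions; property groups are not modeled, and the sample data is made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, much simplified grove representation. */
class Grove {
    /** A node is an instance of a class from the grove plan and holds property assignments. */
    static final class Node {
        final String nodeClass;
        final Map<String, Object> properties = new LinkedHashMap<>();   // property -> value

        Node(String nodeClass) { this.nodeClass = nodeClass; }

        Node set(String property, Object value) { properties.put(property, value); return this; }

        /** Value of a property whose value is a list of nodes (a subtree). */
        @SuppressWarnings("unchecked")
        List<Node> nodes(String property) {
            return (List<Node>) properties.getOrDefault(property, List.of());
        }
    }

    /** <PART TYPE="BUS">X123</PART> as a tiny grove: besides its "content" subtree,
     *  the element node has a separate "attributes" subtree. */
    static Node example() {
        Node attr = new Node("attribute-assignment").set("name", "TYPE").set("value", "BUS");
        Node text = new Node("data").set("string", "X123");
        return new Node("element")
                .set("gi", "PART")
                .set("attributes", List.of(attr))
                .set("content", List.of(text));
    }
}
```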
There are 25 different types of flow objects that can be used to specify the layout of a document, for example paragraph, rule (to draw lines) and character (to represent characters). For a complete list of the flow objects provided by ISO 10179, please refer to the standard (DS-96, pp. 17-18).
To understand the semantics of a flow object it is important to explain the concept of areas.
An area is used to give a meaning to flow objects. The result of formatting a flow object other than the root flow object is a sequence of areas (see DS-96, p. 165).
An area can be imagined as a rectangular box with a certain width and height with respect to the output medium. There are two types of areas:
The two types differ in their semantics. Display areas are boxes whose placement depends less on their neighbors than on the layout algorithm of the flow object that builds the area containing them. Inline areas are arranged so that they follow an imaginary line of progression. A character, for example, forms an inline area: characters are not supposed to "float around" but should stay in a particular arrangement, namely "lines".
Figure 5 : Two types of Areas
Flow objects can be inlined, displayed, or both; in the latter case the context determines which of the two applies.
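To make the line-progression idea concrete, inline areas can be placed one after another and grouped into lines whenever the available width is used up. The greedy strategy, the integer widths and the class `LineBuilder` are assumptions for illustration only. A real formatter also handles spacing, hyphenation and justification, and greedy filling is just one of many possible line-breaking strategies.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical greedy grouping of inline areas into lines. */
class LineBuilder {
    /** An inline area: a box with a width, e.g. a word or a character. */
    record InlineArea(String label, int width) {}

    /** Places areas along the line-progression direction and starts a new line when
     *  the next area would exceed maxWidth. No inter-area spacing is modeled. */
    static List<List<InlineArea>> buildLines(List<InlineArea> areas, int maxWidth) {
        List<List<InlineArea>> lines = new ArrayList<>();
        List<InlineArea> current = new ArrayList<>();
        int used = 0;
        for (InlineArea a : areas) {
            if (!current.isEmpty() && used + a.width() > maxWidth) {
                lines.add(current);        // the finished line becomes one display-like unit
                current = new ArrayList<>();
                used = 0;
            }
            current.add(a);                // an over-wide area still gets a line of its own
            used += a.width();
        }
        if (!current.isEmpty()) lines.add(current);
        return lines;
    }
}
```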
DSSSL uses a so-called expression language, which is the "host language" for both the formatting and the transformation specifications. Constructs of the two DSSSL languages (style and transformation) can be mixed with the expression language in a well-defined way.
ISO 10179 uses an expression language that closely resembles the Lisp-like programming language Scheme; Scheme as defined in R4RS was chosen as the starting point (Cling-91). The original language was modified in several minor ways, e.g. only the side-effect-free subset of Scheme is used, and certain functions were added and/or modified to make it more suitable for DSSSL. For a complete description, please refer to the standard (DS-96, pp. 29-30).
DSSSL provides a set of constructs that can be used to query a document. These are not full SQL-style queries, but they do show that a new paradigm is beginning to emerge: viewing a document as a collection of data items that can be queried like a database.
The DSSSL standard provides a very powerful basis for future document processing systems. However, it is as complex and voluminous as it is powerful. Anticipating the long time it will take industry to provide fully DSSSL-compliant tools, certain subsets of DSSSL functionality were defined as obligatory for early developers, while others were defined as optional. Two parts were identified as required for a first implementation:
each of those forming a subset of the full languages defined in the standard.
The so-called DSSSL Online Application Profile defines a set of components of the style language intended as guidelines for the first implementers of systems conforming to ISO 10179. This profile incorporates the Core Expression Language, the Core Query Language and a reduced set of flow objects, sometimes with a slightly different use of their principal characteristics (see Bosak-95).
The grove-building processor consists mainly of the Cappuccino parser described above. Cappuccino sends the results of the parsing process to an object that implements an interface (Note 6) providing methods for interpreting ESIS-compliant information.
Such an object, conforming to the ESIS specification, is called by the parser for each ESIS element and builds the grove. The output of this layer forms the source grove.
Figure 6 : Preparation Step
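The ESIS-driven grove building described above can be sketched with a stack of open elements: each start event adds a child to the innermost open element and pushes it, and each end event pops it. All names are hypothetical, the grove is reduced to element and data nodes, and a real ESIS stream also carries processing instructions, entity information and more.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

class GroveBuilderSketch {
    /** Hypothetical callback interface for ESIS events (an interface in the Java sense). */
    interface EsisHandler {
        void startElement(String gi, Map<String, String> attributes);
        void data(String text);
        void endElement(String gi);
    }

    /** Simplified grove node: element nodes have a gi, attributes and children;
     *  data nodes only carry text. */
    static final class Node {
        final String gi;                        // null for data nodes
        final String text;                      // null for element nodes
        final Map<String, String> attributes;
        final List<Node> children = new ArrayList<>();

        Node(String gi, String text, Map<String, String> attributes) {
            this.gi = gi; this.text = text; this.attributes = attributes;
        }
    }

    /** Called by the parser once per ESIS event; builds the source grove. */
    static final class GroveBuilder implements EsisHandler {
        private final Node root = new Node("#ROOT", null, Map.of());
        private final Deque<Node> open = new ArrayDeque<>();

        GroveBuilder() { open.push(root); }

        public void startElement(String gi, Map<String, String> attributes) {
            Node n = new Node(gi, null, attributes);
            open.peek().children.add(n);        // attach to the innermost open element
            open.push(n);
        }

        public void data(String text) {
            open.peek().children.add(new Node(null, text, Map.of()));
        }

        public void endElement(String gi) {
            Node n = open.pop();
            if (!n.gi.equals(gi)) throw new IllegalStateException("mismatched end of " + gi);
        }

        Node grove() { return root; }
    }
}
```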
This layer forms the heart of Yade. There are two steps involved:
Figure 7 : Grove Building
DSSSL itself does not specify how an application should locate the DSSSL specification to be used for a document. In the current state of the project, Yade expects the stylesheet to be located in the same directory as the SGML document instance (or, for a document given by a URL, at the same location) and to have the same name as the document instance; the only difference is that the file extension should be ".sch" instead of whatever is used for the documents, e.g. ".sgml".
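The naming convention amounts to a simple string rule that works for file paths and URLs alike. The helper below is a hypothetical sketch, not Yade's code; it assumes `/` separators and a location without query string or fragment.

```java
/** Derives a stylesheet location from a document location (hypothetical helper). */
class StylesheetLocator {
    /** Replaces the extension of the last path segment with ".sch", or appends ".sch"
     *  if the last segment has no extension. */
    static String stylesheetFor(String documentLocation) {
        int slash = documentLocation.lastIndexOf('/');
        int dot = documentLocation.lastIndexOf('.');
        // Only a dot inside the last segment starts an extension ("dir.v2/readme" has none).
        String base = (dot > slash) ? documentLocation.substring(0, dot) : documentLocation;
        return base + ".sch";
    }
}
```

For example, `http://www.example.com/data/part.sgml` maps to `http://www.example.com/data/part.sch`.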
Once the style specification has been located, it is parsed using Cappuccino. Just as the grove-building processor described above uses an object that conforms to the ESIS interface specification, so does this layer; the semantics of the method calls are different, however. Style specifications are provided in SGML format, using a DTD conforming to an architectural form as described in DS-96, p. 20 ff. The expressions forming the style specifications are wrapped in SGML elements. The methods in this layer "unwrap" these specifications and store them in a hashtable that allows a style specification to be accessed by the description defined in the architectural form. Figure 7 shows this first step.
The final action in this preparation step is to feed a style specification, namely the first "full" specification in the style-specification document, into the DSSSL engine so that its construction rules become defined in the system.
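The table of unwrapped specifications and the choice of the first complete one can be sketched as follows. The record fields, and modelling "full" as "not marked partial", are assumptions; the sketch uses a `LinkedHashMap` rather than a plain hashtable because "first" only has a defined meaning if document order is preserved.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical store for style specifications unwrapped from the SGML style document. */
class StyleSpecTable {
    /** An unwrapped style specification: identifier, whether it is only partial,
     *  and the expression text taken out of its SGML wrapper. */
    record StyleSpec(String id, boolean partial, String body) {}

    // LinkedHashMap keeps document order, which is needed to find the "first" spec.
    private final Map<String, StyleSpec> specs = new LinkedHashMap<>();

    void add(StyleSpec spec) { specs.put(spec.id(), spec); }

    StyleSpec byId(String id) { return specs.get(id); }

    /** The first complete (non-partial) specification, fed to the engine initially. */
    StyleSpec firstComplete() {
        for (StyleSpec s : specs.values()) {
            if (!s.partial()) return s;
        }
        return null;
    }
}
```

The same lookup by identifier is what allows a user to switch to another style specification later on.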
To understand the process of flow object tree construction, I first want to explain an important concept used by Yade, which I call the "Flow-Object-Mapper", or FOM for short.
A problem that all DSSSL engines face is that the output of the formatting process, and especially the formatting of the flow objects, always depends on the medium on which this last processing step takes place. For example, output to PDF or PostScript requires different rendering processes than output targeting a windowing environment (e.g. Motif). In ISO/IEC 10179, the layer that takes care of rendering is the so-called "formatter".
In the Java world, or in object-oriented programming in general, one will want to add the methods responsible for rendering to the classes that represent a particular flow object. However, certain methods and data (i.e. the characteristics applicable to a flow object) are common to all objects of a given flow object class, independent of the desired output medium.
To reconcile these two seemingly orthogonal requirements, I have introduced the concept of the FOM. A library of classes forms the basis for all flow object classes. There is, for example, a class "Paragraph", which is the base class for all other flow object classes of type paragraph, such as a paragraph class whose methods render the paragraph's content onto the canvas of a Java AWT window.
Figure 8 : Inheritance and the FOM
During flow object tree (FOT) construction, the grove is traversed and every node is considered. For each node the system checks whether a rule, as defined to the system in Step 1 or explicitly by DSSSL, is applicable. When a flow object is created, the FOM becomes active: it maps the name of a flow object class, e.g. "Paragraph", to the name of a class that has been implemented to provide the platform-dependent methods for rendering this object.
Figure 9 : FOT construction via the FOM
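The combination of a medium-independent base class, a medium-specific subclass and a name-based mapper can be sketched as below. All class names are hypothetical. To keep the sketch runnable without a display, the subclass renders to a string instead of drawing with AWT. Loading the implementation via `Class.forName` is our assumption of how "mapping to a class name" might be realized.

```java
import java.util.HashMap;
import java.util.Map;

class FlowObjectMapperSketch {
    /** Medium-independent base class: characteristics common to every paragraph. */
    static class Paragraph {
        String fontSize = "10pt";
        String render() { throw new UnsupportedOperationException("no output medium"); }
    }

    /** Medium-specific subclass. A real AWT version would draw on a java.awt.Graphics;
     *  this one renders to a string so that the sketch runs anywhere. */
    static class TextParagraph extends Paragraph {
        @Override String render() { return "[" + fontSize + "] paragraph"; }
    }

    /** Maps a flow object class name to the class implementing it for one medium. */
    static final class FlowObjectMapper {
        private final Map<String, String> implementations = new HashMap<>();

        void register(String flowObjectClass, String implementationClassName) {
            implementations.put(flowObjectClass, implementationClassName);
        }

        <T> T create(String flowObjectClass, Class<T> base) {
            String impl = implementations.get(flowObjectClass);
            if (impl == null) throw new IllegalArgumentException("no implementation for " + flowObjectClass);
            try {
                // Load and instantiate by name, so a new medium needs no change to the engine.
                return base.cast(Class.forName(impl).getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot instantiate " + impl, e);
            }
        }
    }
}
```

Switching the output medium then means registering different implementation class names, while the characteristics stay in the shared base class.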
In other words, to enable Yade to produce a FOT containing objects that can render their content for a particular output format (e.g. the Java AWT), one has to:
A DSSSL engine by itself does not yet constitute an application. There needs to be an interface through which users can tell the engine what they want. In some cases a pure command-line approach will be sufficient, but in many others one will want a graphical user interface that allows even non-expert users to exploit the full potential of a Java-based SGML/DSSSL approach to document exchange.
The Philips Semiconductors Electronic Databook is such an interface. Although it is ultimately meant to be tailored to specific requirements of the semiconductor industry, it is primarily a DSSSL-based SGML viewer implemented in Java, using the Java AWT (Abstract Window Toolkit) as its windowing interface.
In the current state of the project, the following flow object classes, supporting the major desired viewing characteristics, are available for use in style specifications:
Java is used as the strategic language for this project, among other reasons, for its platform independence. Using Java does, however, bring one problem (Note 7): sometimes applications can offer only a subset of their functionality, depending on the computing environment that provides the Java virtual machine executing the byte-code.
The problem lies with the so-called "Security Manager" of this computing environment. For example, the security manager used by the Java runtime environment of Netscape Navigator is very restrictive: it does not allow applets to access the local file system. This imposes real restrictions; an "Export" function that lets the user export parts of the SGML instance, or the instance as a whole, to the local file system would trigger a security exception that prevents the operation.
Stand-alone Java applications, on the other hand, i.e. Java programs running in a Java runtime environment that is not embedded in an Internet browser, usually have full access to the local file system. Although these restrictions make sense, they raise the question of how to build applications that can deal with both types of environment.
PSC-EDB "knows" whether it has been started as an applet or an application. Depending on this information, the user is offered only the features that can be carried out without interfering with the security manager. This is not the best solution, as there are applet viewers that allow you to configure the security manager in more flexible ways. An alternative solution, which might be included in a future version, is to pass the necessary information through so-called "Properties", variables similar to environment variables in the Unix world that allow information to be passed to an application. Yet another approach would be to consult methods of the security manager to find out whether certain operations are allowed. The long-awaited 1.1 release of the Java Development Kit should provide, at least according to announcements, a more flexible way to combine the need for security with the absolute necessity of maintaining flexibility and making full use of computing resources in whatever form.
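The "Properties" alternative could look roughly like this. The property name `edb.export` and the class `FeatureSwitch` are hypothetical. Since reading a system property can itself be refused by a restrictive security manager, the sketch treats a `SecurityException` as "feature not available".

```java
/** Hypothetical switch deciding whether the export function is offered. */
class FeatureSwitch {
    /** Hypothetical property name; an application could be started with -Dedb.export=true,
     *  and a configurable applet viewer could set it in its properties. */
    static final String EXPORT_PROPERTY = "edb.export";

    /** True only if the property is explicitly set to "true" and may be read at all. */
    static boolean exportEnabled() {
        try {
            return Boolean.parseBoolean(System.getProperty(EXPORT_PROPERTY, "false"));
        } catch (SecurityException e) {
            return false;   // not even allowed to ask: assume a restrictive environment
        }
    }
}
```

The user interface would then enable or hide the "Export" action depending on `FeatureSwitch.exportEnabled()` instead of on the applet/application distinction.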
To illustrate the features PSC-EDB currently provides, I will simply list, in handbook-like fashion, the major actions a user can trigger. These are the features the system offers when the security manager imposes no restrictions.
As Java provides class libraries for network access, and since the Cappuccino parser can work on URLs as well as local files, the application layer can make use of this. The user can:
Figure 10 : Open a URL
Using ISO 8879 as the document interchange format means that one can exchange data that is very rich in information which would be lost with other formats such as plain ASCII. PSC-EDB exploits the power of SGML by offering the user two different types of search features.
This first feature allows the user to ask traditional questions like "Find the part of my document that contains the (sub)string XYZ". It furthermore allows the user to make use of SGML specifics to search for a certain element type. E.g. "Find the part of my document that is constructed out of element type ABC".
By combining those two "basic" queries the user can ask e.g. "Find the part of my document that contains the (sub)string XYZ in the element type ABC".
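The combined query can be sketched over a flattened view of the document in which each element is reduced to its type and text. The flattening and all names are hypothetical simplifications of the grove; element types are compared case-insensitively, as SGML general names are by default.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical substring / element-type search over a flattened document. */
class DocumentSearch {
    /** Simplified view of one element: its type and its text content. */
    record ElementNode(String type, String text) {}

    /** Elements whose text contains substring and whose type equals elementType;
     *  null for either criterion means "don't care". */
    static List<ElementNode> find(List<ElementNode> doc, String substring, String elementType) {
        List<ElementNode> hits = new ArrayList<>();
        for (ElementNode e : doc) {
            boolean typeOk = elementType == null || e.type().equalsIgnoreCase(elementType);
            boolean textOk = substring == null || e.text().contains(substring);
            if (typeOk && textOk) hits.add(e);
        }
        return hits;
    }
}
```

`find(doc, "XYZ", "ABC")` corresponds to the question "Find the part of my document that contains the (sub)string XYZ in the element type ABC".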
The search provided by this feature goes down to the level of attributes and their values as they occur in an element. Through an input window, the user can specify the element type he is looking for, the attribute he wants to find and the value this attribute should have.
Figure 11 : Search Window
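The three inputs of the search window map naturally onto a filter over elements and their attribute maps. The sketch below uses hypothetical names and assumes attribute names have already been normalized to upper case, as an SGML parser would typically report them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical attribute-level search: element type, attribute name, attribute value. */
class AttributeSearch {
    record ElementNode(String type, Map<String, String> attributes) {}

    /** Elements of the given type whose attribute has exactly the given value. */
    static List<ElementNode> find(List<ElementNode> doc, String type, String attribute, String value) {
        List<ElementNode> hits = new ArrayList<>();
        for (ElementNode e : doc) {
            // Map.get returns null for a missing attribute, so value.equals(null) is false.
            if (e.type().equalsIgnoreCase(type) && value.equals(e.attributes().get(attribute))) {
                hits.add(e);
            }
        }
        return hits;
    }
}
```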
A user can also work with the document by selecting regions and triggering operations on them. A region can be selected either by a search operation, as mentioned above, or simply by marking the desired area with the mouse pointer.
The only operation that can be performed on regions, as of the current status of the project, is to export the selected area to SGML. Our plan is to support export to other formats as well in the future.
A DSSSL style document can contain more than one style specification. The first complete style specification encountered by the tool is used for the initial on-screen rendering. If more than one style specification is available, however, the user can choose between them.
Being able to choose between several views of a document offers a very flexible way of dealing with SGML data. If one considers that different classes of users need different chunks of information at different times, the full potential of this approach becomes readily apparent.
The designer of style specifications can provide a number of different views of a document, either leaving it up to users to pick the one that suits them best and/or, in more advanced future systems, actively using information about the user (Note 8) to present the document in the best possible way.
The overall system produces reasonably good results, considering that it is still a prototype that needs further development. Future plans include work in the following areas:
Make it fully compliant with ISO 8879 and improve its performance. Release it as a beta version to get feedback and receive bug reports.
The parser has proven able to parse a variety of SGML files. Nevertheless, it is still far from being truly "ISO 8879 compliant". It was never the objective to create a parser for real SGML development, and its list of supported features reflects its purpose: to work in an environment that parses validated SGML data and can live without the really tricky aspects of SGML.
Implement the complete DSSSL-Online subset and improve performance. A special area to be addressed is memory usage. Release it as an early alpha version to interested developers.
Yade relies heavily on Kawa, a Scheme implementation written in Java. Currently Kawa 0.2 is used. Since this early release there have been many improvements, and Kawa 1.0 is now available. To benefit from this new functionality, Yade will have to be adapted to work with Kawa 1.0 (or later).
The application interface will be developed further in two directions. The first direction of research is to investigate which interface strategies are especially useful for DSSSL/SGML-based systems. This includes, for example, semantics for multi-mode flow objects and strategies for search interfaces that exploit the full richness of SGML-tagged data.
To address the needs of the semiconductor industry, one of the next steps will be to find out how DSSSL stylesheets must be designed to present the type of data needed by this community in the most convenient and powerful way possible. PCIS-compliant datasheets form the basis for effective dissemination of information, and DSSSL's powerful new paradigm for formatting will help to get the most out of PCIS-enriched semiconductor material.
This paper is essentially a status report on my diploma thesis, titled "Electronic Databooks - Proof of Concept", at the Institute of Informatics, University of Klagenfurt, Klagenfurt (Austria). The project is a cooperation with Marketing & Sales Communications, Philips Semiconductors (The Netherlands). Since two parties are involved in this project, two people supervise the work, and I am very grateful to both of them: O. Univ. Prof. H.C. Mayr of the Institute of Informatics and Ing. A. Elkerbout, representing Philips Semiconductors.
Others have also contributed substantially to the project with questions, bug reports and suggestions for improvements. I especially want to mention Gavin Nicol, who helped a lot in improving the performance of the SGML parser and was crazy enough to be one of the first to work with a tool that is still in its infancy. Understanding ISO/IEC 10179 would have been almost impossible without the continuous and patient support of James Clark.
1.- This, I would say, causes some problems for people who, like the author, have a strong background in C.
2.- Applets: small applications embedded in Web pages, used e.g. for scrolling text banners.
3.- Of course, in practice it will be either a network input stream or a file input stream.
4.- These two types of flow objects have been constructed in the previous step.
5.- The standard ends here, but an application will have to ensure that something actually happens, e.g. on the screen, in a further step.
6.- Interface in the sense of Java interfaces.
7.- Among many others...
8.- E.g. by making users log in and maintaining a database about their typical information needs.