Using content identification to achieve business benefits

[This local archive copy is from the official and canonical URL, http://info.admin.kth.se/SGML/Konferenser/SGML97/SGML97.html; please refer to the canonical source document if possible.]

SGML Sverige 97
Conference abstracts

Contents

An introduction to SGML
Helena Antbäck, Texcel Svenska AB, and Hasse Haitto, Synex Information AB

SGML for everyone
Thomas Faleij, Försvarets materielverk

Using content identification to achieve business benefits
Pamela Gennusa, Managing Director, Database Publishing Systems

The SGML implementation at Norsk Hydro
Björn Peltonen, Citec Information

TT implements SGML
Johan Lindgren, Tidningarnas Telegrambyrå

DTD development for the legal domain
Cecilia Magnusson Sjöberg, Stockholms universitet

Converting Legacy Data to SGML
Elizabeth Gower, Adobe Systems Europe

Making SGML Easier with Microdocument Databases
Benoit De La Selle, OmniMark Technologies

Intranet – Getting the Most from SGML
Simon Nicholson, Chrystal Software

Cost-effective introduction of SGML on a broad front
Poul Kongstad, Enator Information Management

XML – Extensible Markup Language
Hasse Haitto, Synex Information AB

Building an SGML system
Steve Pepper, Falch Infotek

What is happening with the SGML family?
Per-Åke Ling, Ericsson Utveckling AB

SGML Open Activities
Hasse Haitto, Synex Information AB

Publishing on the Internet using SGML and databases
Fredrik Oskarsson, Enator Information Management

Multiagency Electronic Regulatory Submission – use of SGML in New Drug Applications
Bo Lovén and Christian Wallgren, PharmaSoft AB

A workstation for the translator
Svante Kleist, Texcel Svenska AB

Regulations in SGML format
Richard Nilsson, Saab

Automatic configuration of plant documentation
Stefan Aldborg, ABB Industrial Systems, and Anders Kjellberg, Convertum

An introduction to SGML
Helena Antbäck, Texcel Svenska AB, and Hasse Haitto, Synex Information AB

SGML for everyone
Thomas Faleij, Försvarets materielverk

Using content identification to achieve business benefits
Pamela Gennusa, Managing Director, Database Publishing Systems

608 Delta Business Park

Great Western Way

Swindon

Wiltshire SN1 1LZ

United Kingdom

email: [email protected]

Abstract

When developing a new SGML system, many companies justify the cost of the system based on anticipated benefits. These benefits may include improved work flow, fewer errors in data consistency, greater throughput, ability to use the same data to drive multiple output formats and media, etc. These are all very tangible, measurable benefits revolving around the physical documentation or publication system.

Less obvious is how the use of content identification, and subsequently content management, can return benefit. Before they turn to the use of SGML, most companies do not identify the presence of particular pieces of content in their documents. Therefore, the impact of doing so is rarely factored into a business case for SGML.

What is content identification?

With the use of SGML, we can choose to identify objects in our documents. The objects we can identify can be very generically named, such as 'chapter', 'section', 'table', etc. or they can be very specifically named, such as 'maintenance', 'procedure', 'toollist', etc. Quite often, the choice of content-generic names versus content-specific names is seen as an all-or-nothing-at-all choice. When companies do look at the benefit of using content-specific element type names, they also see the associated costs:

those applying the markup to existing documents must be skilled in the subject matter;
the DTDs contain more element types and thus:
there are more element types to create formatting/processing instructions/macros/scripts for;
there are more element types to maintain;
more time is required to train those putting in markup; etc.

However, the choice need not be between two poles. Instead, content identification can be thought of as a choice along a continuum.

More gain for the pain

Whilst for some applications, there is no substitute for complete content-specific markup (such as documents relating to safety issues, etc.), there are many more cases where predominantly content-generic markup can be augmented with content-specific markup to great advantage.

The following are a few examples of how judicious application of content-specific markup returned a business benefit:

Use of bilingual keywords across multi-lingual documents makes it easier for users to find the appropriate content using their first language. The quick access of mission-critical information supports more timely operation of and decision-making with regard to costly-to-run equipment.
Use of content-encoded commentary and analysis in legal documents. Knowledge acquired by one employee becomes available to all other current and future employees working with contracts. The retention of this corporate knowledge leads to better management of the contracts over time.
Use of content encoding to identify those objects that support new access paths into the documentation. Whilst tables of contents and indices were useful in paper delivery, other access paths are possible in electronic delivery. Such access paths can be designed based on how the users of the documentation think about their jobs and the tasks they perform. Access by part number, task code, fault isolation code, etc. means faster access to information with the accompanying result of greater productivity.
Use of content encoding to categorize or subcategorize objects. Using categories, information can be resorted into new forms and, in some cases, into timely, low-cost, spin-off products.
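
The access-path example above can be illustrated with a minimal sketch. It is not taken from the talk: the element names `partno` and `taskcode`, the sample data, and the function `build_access_paths` are all hypothetical, and well-formed XML syntax is used as a stand-in for SGML so that the Python standard library can parse it. The sample is predominantly content-generic (`section`, `title`, `para`) with only two content-specific element types added.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: mostly content-generic markup, augmented with two
# content-specific element types (partno, taskcode).
SAMPLE = """
<manual>
  <section>
    <title>Replacing the fuel pump</title>
    <para>Remove pump <partno>FP-1020</partno> using task <taskcode>T-28-10</taskcode>.</para>
  </section>
  <section>
    <title>Inspecting the fuel filter</title>
    <para>Check filter <partno>FF-0031</partno> and pump <partno>FP-1020</partno>.</para>
  </section>
</manual>
"""

def build_access_paths(xml_text, element_names):
    """Map each content-specific value to the titles of the sections that mention it."""
    root = ET.fromstring(xml_text)
    index = {name: defaultdict(list) for name in element_names}
    for section in root.iter("section"):
        title = section.findtext("title")
        for name in element_names:
            for el in section.iter(name):
                key = (el.text or "").strip()
                if title not in index[name][key]:
                    index[name][key].append(title)
    return index

paths = build_access_paths(SAMPLE, ["partno", "taskcode"])
print(paths["partno"]["FP-1020"])
# ['Replacing the fuel pump', 'Inspecting the fuel filter']
```

With purely generic markup, the same lookup would need full-text search and could not tell a part number from any other string; the two extra element types are what make the access path possible.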

Conclusion

In any business situation, there may be a greater or lesser need for content-specific identification. Through business requirement analysis, companies can decide what degree of content identification will have a positive impact. Use of such identification can sometimes return greater benefits than those realized through improved document production. At the very least, it can augment the benefits to provide a better business case for moving toward the use of SGML.

Biography: Pamela Gennusa

Pamela is Managing Director of Database Publishing Systems Ltd where she has led the consultancy, development, and conversion service activities since 1990. During that time, she has served as consultant for a number of SGML-related applications in the oil, pharmaceutical, telecommunications, and defence industries. Prior to joining DPSL, Pamela worked for Datalogics, Inc. in the U.S. Her last role there was Director of Marketing.

She has participated on both the ANSI and ISO committees responsible for the creation of ISO 8879. Until 1992, she served as Co-Chair of the CALS committee responsible for MIL-M-28001. Pamela is a recipient of the GCA Tekkie Award. Each Spring since 1991, Pamela has chaired the GCA's SGML Europe Conference. She has served as President of the International SGML Users' Group since 1992. In 1992, she became a member of the GCA Board of Directors. She also served as Chief Marketing Officer and President on the first SGML Open Board of Directors (1993-1995).

The SGML implementation at Norsk Hydro
Björn Peltonen, Citec Information

Abstract

A significant economic objective at Norsk Hydro is to reduce the time and cost of maintaining equipment used in oil production.

According to NORSOK, 50% of the development cost of an off-shore installation is related to information.

In this case study we will explain the implementation of an interactive system to improve the accessibility of technical supplier documentation by utilising the SGML standard.

Why SGML

Close co-operation and networking between operators, contractors, vendors and manufacturers are most important in large-scale projects such as an off-shore installation. One of the most essential success factors in these kinds of cases is to use approved standards, such as NORSOK, ISO/SGML/STEP, POSC/CAESAR etc.

This will, in the long run, cut costs substantially for all parties involved.

The SGML standard provides Norsk Hydro with an application- and system-independent storage format for technical documentation. At any level of granularity, SGML encourages the identification of important details inside the documents, which may subsequently be utilised by a number of different applications and users. It also gives the information lifetime protection, because the format is independent of tools and presentation format.

The HVEDIK Concept

The goals of the project are to reduce the maintenance cost by 30% and at the same time to increase quality and the security level.

After analysing the situation, it became obvious that the greatest potential for improvement was related to the documentation.

The backbone of the new system consists of a data model and two DTDs. The data model, which is compliant with the POSC/CAESAR data model, describes all the systems installed on an oil-producing platform and the relations between them; the DTDs describe the information structure. The system architecture is based on a combination of SGML documents and traditional database information. All documents (e.g. maintenance procedures) are stored in SGML, while information used for planning, logistics, and parts management is stored in normalised database tables.

There are also specific requirements for drawings. Earlier, AutoCAD and Intergraph were the only accepted formats. The goal is to transform these drawings directly to CGM4, including intelligent "hot-spot" information. The intention is that these "intelligent" position numbers are defined during the drawing phase, not afterwards.

The documentation will be produced with specially customised SGML tools that guide the authors in writing structured technical information based on SGML. The author can also define interactive links while writing, which is a substantial saving in time and money. The interactive SGML documents and interactive drawings will give the user automatic access to database information related purely to the context or task at hand, and thereby increase the security and efficiency of the work.

The infrastructure for this concept is built by utilising WEB-technology. All information is stored in one repository, on-shore.

The importing, validation, quality checking etc. are done by Norsk Hydro's documentation group.

The operation and maintenance personnel in the North Sea will access the required data (e.g. procedures and database information) through graphics and drawings with hotspots, or by using advanced search engines, over an intranet utilising WEB technology and a multipurpose SGML viewer on the client side.

Summary

Streamlining and standardisation of the whole business process concerning the production, storage and usage of technical information is one of the most important issues in this whole project.

This transformation process will not be completed overnight.

One of the problems is how to convince some of the manufacturers to change their focus from product orientation to information orientation.

These companies are all using standards in their manufacturing. Why?

Their components and products (pumps, motors, valves etc.) will often be installed into larger systems. Without standards, installation, operation and maintenance would be more or less impossible, or at least very difficult, and it would be very hard to sell such products.

The same applies to the information and documentation. In many cases it too will be "installed", for example in an operation and maintenance system. Therefore it is most important that the documentation is produced according to the accepted standards.

Customers will no longer find it satisfactory to receive high-quality components and systems with poor-quality manuals. Unfortunately this is still the case in many companies.

However, there are also many companies who already have the ability to deliver structured, standardised information. These companies have a very clear business advantage in selling their products in the future.

One example from Oseberg Feltsenter / SDT ball-valves, shows clearly some of the benefits of standardised and structured information:

Previously the information consisted of 2000 documents, corresponding to 6000 A4 pages.
This was reduced to 250 documents without losing important information.

This could be called "reward for the effort".

To make this transformation process as easy as possible for suppliers and manufacturers, NH has developed customised SGML authoring tools and database tools. This "toolbox", including training, will be installed at all of NH's main suppliers on request.

This set of tools will also help the supplier to meet the new information requirements from other customers.

"Working for change is never easy, and creating something new is always more difficult than finding a reason to justify the old."

By Ragnar Hurum, Sector Manager Offshore, Federation of Norwegian Engineering Industries

References

Norsk Hydro HVEDIK System architecture.
HVEDIK - et eksempel på nytenking, Johan Bredrup Dahl, Norsk Hydro

Biography

Mr. Björn Peltonen (B.Sc.) is the VP Sales & Marketing for CITEC Information Technology and is also responsible for business development and international projects. Prior to joining CITEC, he had over 16 years of experience in the computer and networking business, especially concerning CAD and Document Management applications.

TT implements SGML
Johan Lindgren, Tidningarnas Telegrambyrå

The Swedish news agency Tidningarnas Telegrambyrå (TT) is basing its new text format on SGML. The reasons for this are primarily:

We can mark up the material we deliver in a clear way.
Good support for our journalists and editors when they produce news items.
A format that is useful for our customers regardless of what they will use it for.
IPTC (International Press Telecommunications Council) chose SGML.

TT has more than 100 journalists, most of whom used to work in a terminal-based system called ATEX. They now use modern PCs with Windows 95, a mouse and SGML! Material is distributed from TT to its several hundred customers by satellite.

Early in 1995 we put together a group to propose a new text format and a new editorial system. The group was made up of Henrik Stadler, Håkan Swedenborg, Börje Samuelsson, Ingela Ahlberg, and myself, Johan Lindgren. The new text format was to replace the old one, IPTC 7901.

It was soon clear that SGML would be the base for the new format, partly because we had realised its benefits and partly because of IPTC's decision to use SGML. The IPTC DTD, called NITF, is very large and provides news agencies and newspapers with a container for exchanging news items. We were reluctant to use it, since we felt it would offer us and our customers fewer benefits than a more specialised format.

We also had something called IIM (Information Interchange Model) to take into account. The IIM is an IPTC "envelope" standard for transmitting all sorts of news items: not only text but also sound, pictures etc.

In late 1996 we decided to use a combination of the IIM (SGML-encoded) as a header and our own model for the text content. This was put together and named TTNITF. The header has special fields that the receiving software can use for various actions. There are elements with information that allows the recipients to sort the material, and identifiers for additional actions such as replace, add etc.

In parallel with developing the new text format, we looked for a new editorial system. But when you are looking for an SGML-based editorial system for 100+ journalists working 24 hours a day all year round, with a very high demand for easy handling and a quick workflow, there are not many possible suppliers, if any.

We evaluated existing SGML editors, but none of them was felt to be useful. Instead we decided to develop our own SGML editor that would support only one DTD: our own.

I wrote it, and we started using it as early as mid-1996. This was possible because I also wrote a filter to produce ATEX files from the internal SGML format.

To replace the old ATEX system we decided on a workflow program called Sysdeco Editorial System (SES). But SES is not SGML-based. This is how we run it:

SES is the central program where texts are stored in various "queues". The header information about each text is stored in an SQL database. When a journalist or editor wants to write a new story or edit an existing one, the text is opened in our own SGML editor. Journalists can also use the editor as a stand-alone program when they are working away from TT, and then send the SGML files to the SES system via modem or GSM phone.

When a text is ready for transmission, SES moves it to a special location where Unix routines take over. They read the header information from the database and get the text content from the SGML file. This is assembled into an SGML file according to our DTD and transmitted to our customers.
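
The assembly step described above might look roughly like the following sketch. It is an illustration only, not TT's actual routines: the table layout, the element names (`ttnitf`, `head`, `body`, `itemid`, `category`, `action`) and both function names are hypothetical, an in-memory SQLite database stands in for the SQL database used by SES, and a string stands in for the SGML file holding the text content.

```python
import sqlite3
from xml.sax.saxutils import escape

def load_header(conn, item_id):
    """Read the header fields for one text from the (hypothetical) headers table."""
    row = conn.execute(
        "SELECT item_id, category, action FROM headers WHERE item_id = ?",
        (item_id,),
    ).fetchone()
    if row is None:
        raise KeyError(item_id)
    return dict(zip(("itemid", "category", "action"), row))

def assemble_item(header, body_markup):
    """Wrap header fields and already marked-up text content in one document."""
    head = "".join(f"<{k}>{escape(str(v))}</{k}>" for k, v in header.items())
    return f"<ttnitf><head>{head}</head><body>{body_markup.strip()}</body></ttnitf>"

# Stand-in for the database that holds the header information.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE headers (item_id TEXT PRIMARY KEY, category TEXT, action TEXT)"
)
conn.execute("INSERT INTO headers VALUES ('tt-0042', 'sports', 'replace')")

# Stand-in for the text content read from the journalist's SGML file.
body = "<headline>Match report</headline><p>The game ended 1-1.</p>"

item = assemble_item(load_header(conn, "tt-0042"), body)
print(item)
```

Keeping the header in the database and the text in a file means the header can be changed (for example, setting the action to replace) without reopening the text; the two are only joined at transmission time.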

Biography: Johan Lindgren

I was born in 1956 in Stockholm. Moved to Sundsvall at the age of 7. After mandatory education I went to "gymnasium" and then spent a year in the military. Following that I was accepted to the journalist school in Gothenburg and moved there for a year. As part of the journalist education I worked at a local radio station and was then employed at the Sundsvall-office of the Swedish news agency, Tidningarnas Telegrambyrå (TT).

Since early 1994 I have increasingly worked with computer development projects. It started with a project to supply our customers with ready-formatted financial material, radio and TV tables and sports results. That involved setting up an MS-Access database and a FirstClass server, and writing programs using Frontier on a Mac. During that work I came into contact with SGML for the first time.

Early in 1995 TT put together a group to plan for a new text format for all news distribution from TT. I was part of that group and we developed a DTD for our new text format and a new editorial system for TT. For this system I developed an SGML editor using the programming tool Delphi.

I am married and have three kids (11-8-6). Other interests include frisbees, skiing and music.

DTD development for the legal domain
Cecilia Magnusson Sjöberg, Stockholms universitet

The so-called Corpus Legis Project at Stockholm University was started in response to certain well-established needs for a tool with which one could improve legal document management. The project’s main DTD – Legis.dtd – focuses on documents reflecting the system of lawmaking, e.g. government bills and laws. It enables markup at several levels. Elements used for references both within a document and externally are included. With regard to the logical design in terms of connectors and occurrence indicators, Legis.dtd may be characterised as flexible. More precisely, it allows for alternative markup with the purpose of extracting present and future information in an optimal way.

In the process of developing DTDs for the legal domain it has proved meaningful to distinguish between the following three markup levels: (a) layout, (b) structure and (c) contents. Layout markup is not of primary interest in a legally oriented DTD, but cannot be completely disregarded in the development of a legal SGML system. For example, the use of italics in a government bill is the way of indicating changes in a law.

The legal implications of structural markup are mostly document-type-dependent. For example, in a given law the document structure has been defined, in principle, beforehand. This means that although the markup does not necessarily add any new information it represents important information. Structural markup that reflects the legal convention will help to improve information retrieval.

The choice of a method for implementing contents markup is highly application-dependent. Several different approaches have been tried in the Corpus Legis project. The two major alternatives are either to create a set of specific legal elements, each mirroring a particular legal aspect, or to create one general legal element covering all these legal components, which are then further classified by attribute values.
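
The two alternatives can be compared in a small sketch. It is an illustration under stated assumptions and does not reproduce Legis.dtd: the element names `definition`, `sanction` and `legal`, the attribute name `type`, and the sample sentences are all hypothetical, and XML syntax stands in for SGML so that the Python standard library can parse it.

```python
import xml.etree.ElementTree as ET

# Alternative 1: one specific element type per legal component (hypothetical names).
SPECIFIC = (
    "<section>"
    "<definition>A consumer is a natural person.</definition>"
    "<sanction>Breach is punishable by a fine.</sanction>"
    "</section>"
)

# Alternative 2: one general element, classified by an attribute value.
GENERAL = (
    "<section>"
    '<legal type="definition">A consumer is a natural person.</legal>'
    '<legal type="sanction">Breach is punishable by a fine.</legal>'
    "</section>"
)

LEGAL_ELEMENT_NAMES = {"definition", "sanction"}

def legal_components(xml_text):
    """Return (component kind, text) pairs from either style of markup."""
    found = []
    for el in ET.fromstring(xml_text).iter():
        if el.tag == "legal":
            found.append((el.get("type"), el.text))
        elif el.tag in LEGAL_ELEMENT_NAMES:
            found.append((el.tag, el.text))
    return found

print(legal_components(SPECIFIC) == legal_components(GENERAL))  # True
```

Both encodings carry the same information for retrieval. The practical difference lies in maintenance: with specific elements, every new legal component means a new element type in the DTD and in every processing script, whereas with the general element only the set of attribute values grows.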

It should be noted that trying to create a complete list of common components for the legal domain in a markup system is not a particularly rewarding task. The task should instead be application-oriented, since it depends on the constant development of legal systems. Some text elements have, however, proved to be of particular relevance. These may be categorised as: headings, paragraphs, articles, legal concepts (general or topic-specific), references and other (e.g. quotations and personal data identifiers, such as personal names). In Legis.dtd these are handled by using structural elements, legal elements, the nameloc function (HyTime) or attribute values.

Concluding remarks

With regard to the introduction of SGML into the legal domain, the following characteristics have emerged as central. First, SGML is an official international standard for document markup. Secondly, it has a high level of expressiveness. Thirdly, its inherent media independence is valuable. Last but not least, the SGML standard’s text orientation may in many ways be more suitable for legal material than, for example, the logic-based approaches used in the area of AI-based expert systems.

Is SGML the solution to the problem of managing the rapid growth of legal information with increasing transborder data flows? Without sophisticated means for automatic markup and updating of, for example, hypertext links the majority of SGML implementations in the legal domain will remain trivial, especially as regards markup levels. In a broader perspective uncomplicated but efficient large-scale implementations may be what is actually needed most.

SGML will in this context serve as an incentive for keeping documents in order, as well as being a tool for accomplishing more uniform document type structures. Finally, seemingly insignificant markup of, for example, headings has proved to form a basis for deeper understanding of legal documents.

References

Magnusson Sjöberg, Cecilia, Corpus Legis: A Legal Document Management Project. In: Nordisk årsbok i rättsinformatik (NÅR) 1996 pp 160-190. Telekommunikation – rättsliga aspekter. Red. Martin Brinnen. Stockholm: Norstedts, 1996.

Haider, Georg, Magnusson Sjöberg, Cecilia, Quirchmayr, Gerald, Sebald, Verena, The Comparative Part of the Corpus Legis Project – Using SGML for Intelligent Information Retrieval of Legal Documents. EXPERSYS-96, Artificial Intelligence Applications. J. Zarka, E. Mercier-Laurent, D.L. Crabtree, M. Narasipuram. In: Technology Transfer Series. Series pp.181-186. Editor: A. Niku-Lari.

A project description is found at URL: http://www.juridicum.su.se/iri/corpus.

Biography

Doctor of law (LL.D). Assistant professor at the Swedish Law & Informatics Research Institute, Stockholm University. Project manager of the Corpus Legis project (a computerised text corpus for legal and linguistic studies). Swedish partner in the Telematics Applications Project DAPRO (Data Protection in Europe), Directorate General XIII.

Converting Legacy Data to SGML
Elizabeth Gower, Adobe Systems Europe

Waterview House

1 Roundwood Avenue

Stockley Park

Uxbridge

UB11 1AY

United Kingdom

phone: 44-181-848-6213

fax: 44-181-848-6220

email: [email protected]

Important things to know about converting legacy documents to SGML

Approaches and Options

Your industry has an SGML interchange standard, and your company has decided to convert its old data to SGML. What is involved, what are the options, and what kinds of tools and techniques are available?

This presentation covers the different types of legacy formats, available tools, and technical approaches to data conversion. You will learn how to approach, or "size up", a conversion project, and identify some of the pitfalls and potential problems before you start.

Basic Steps in the Legacy Data Conversion Process

Analyze your data to determine complexity, consistency (or inconsistency) of the data and conversion rule requirements.

Build a draft conversion specification.
Determine the conversion tools required based on the conversion requirements.
Prepare an initial project estimate based on the conversion specifications and the available SGML, programming, and authoring resources.
Build a conversion prototype.
Perform an initial conversion.
Update the project plan and time estimates based on the results of the initial conversion.
Update the conversion specifications and tools based on the output of the initial conversion.
Repeat the steps above until a parseable instance results.
Use the parseable SGML instance in an output process; e.g. CD-ROM data preparation, HTML output, paper publication, or other formats.
Examine the quality of the resulting production output and make adjustments to the conversion specs and tools as required.
Repeat the cycle as required.

Analysing Legacy Data

The complexity and duration of the legacy data conversion process will be affected by a number of factors:

Complexity of the output DTD
Amount and consistency of proprietary markup in the input files (e.g. word processing format codes)
Structure and consistency of the input legacy data
Amount of attribute data to be derived or computed from the input
The number, complexity, variability and structure of input tables
The number of cross references and the variability of the cross reference formats
The number of graphics for which entities must be generated, resolved, and tracked
The number of data items which require generation, resolution, and tracking of IDs and IDREFs
The number of production outputs supported: e.g. CD-ROM, HTML, PDF, PostScript, Paper (each output may require some special markup in order to accomplish the composition and formatting of an output)

This is by no means a comprehensive list, but consideration of the above issues will help you in developing a more realistic understanding of the amount of work involved in a conversion project.
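
One way to get a first impression of consistency is to count the variants of a label that actually occur in the input. The sketch below is a hypothetical illustration, not a tool from the presentation: the label words, the pattern, the function name `label_variants` and the sample text are all assumptions.

```python
import re
from collections import Counter

# Assumption: candidate labels are the words "note" or "warning" at the start
# of a line, in any capitalisation, followed by a colon, full stop or hyphen.
LABEL = re.compile(r"^[ \t]*(?:note|warning)[ \t]*[:.\-]", re.IGNORECASE | re.MULTILINE)

def label_variants(text):
    """Count each distinct spelling of a label found in the legacy text."""
    return Counter(m.group(0).strip() for m in LABEL.finditer(text))

legacy = """NOTE: Check the oil level.
Note: Use only approved grease.
note - Keep the cover closed.
WARNING: Hot surface.
NOTE: Record the reading.
"""

variants = label_variants(legacy)
for spelling, count in variants.most_common():
    print(f"{count:3d}  {spelling}")
```

Three different spellings of the same label in a five-line sample would suggest that a single recognition rule will not be enough, and that the estimate should allow for exception handling.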

Determining the Difficulty of Data Transformation

You will notice that two words are used often in the above list:

Complexity
and
Consistency

The complexity of the legacy data will require you to develop a more detailed conversion specification, and implement more transformation rules to convert the data. The difficulty of the conversion itself is also significantly impacted by inconsistent or missing data. More detailed conversion rules and exception handling procedures will be required to handle structurally inconsistent text or documents that are missing content required by the DTD.

At worst, for very old documents with a lot of missing structure or content, re-authoring may be the only cost-effective option.

Guidelines for Developing Conversion Specifications

The starting point for your conversion specification is your DTD, to which your SGML output will conform. Industry interchange DTDs are a frequent starting point for an internal, or authoring DTD to which technical authors will publish.

In the past, I have created conversion specifications by organizing the elements and attributes in my DTD in a table, documenting the required output (marked up) and the pattern recognition rules required to locate the input data for markup. Here is a very simple example:

Simple Conversion Specification Example:

SGML Element                        Recognition Rule in Input Data   Example Output Markup
<!ELEMENT NOTE - - (#PCDATA)>       Look for ‘NOTE:’                 <NOTE>note</NOTE>
<!ELEMENT WARNING - - (#PCDATA)>    Look for ‘WARNING:’              <WARNING>warning</WARNING>

The objective of a conversion specification is to tell the developer of the conversion tools how to recognize the appropriate text in the input document, and then how to apply the markup according to the rules of SGML and the DTD.

I usually include samples from the input data as well as the example SGML markup, because the developer can then see how the data transformation will take place and will have an easier time interpreting the pattern recognition rule for the input data.

Looking at the ‘NOTE’ example above, I am directing the developer to write a script or program that looks for all occurrences of the string ‘NOTE:’, which is all in capitals and is immediately followed by a colon. The developer can see from the example that all the text following the ‘NOTE:’ string should be tagged with NOTE tags. Of course, this implies that the developer also knows where to put the end tag, so either the specification writer or the developer must define rules for correct insertion of the end tag.

This is a very elementary example, but it does give you a flavour of the type of conversion documentation required.
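
A script implementing the two rules from the example specification could be sketched as follows. This is a minimal illustration, not a tool from the presentation. The end-tag rule is an assumption made here, since the specification leaves it open: a labelled item is taken to run to the end of its paragraph, that is, to the next blank line. The function name `convert` and the sample text are likewise hypothetical, and characters that need escaping in SGML are not handled.

```python
import re

# Recognition rule in the input -> element name in the output,
# as given in the example conversion specification.
RULES = {"NOTE:": "NOTE", "WARNING:": "WARNING"}

def convert(text):
    """Tag paragraphs that start with a known label; pass the rest through."""
    out = []
    # Assumption: paragraphs are separated by one or more blank lines.
    for para in re.split(r"\n\s*\n", text.strip()):
        for label, tag in RULES.items():
            # Case-sensitive match: the label must be in capitals with a colon.
            if para.startswith(label):
                content = " ".join(para[len(label):].split())
                out.append(f"<{tag}>{content}</{tag}>")
                break
        else:
            out.append(" ".join(para.split()))
    return "\n".join(out)

legacy = """NOTE: Check the oil level
before starting.

Remove the cover.

WARNING: Hot surface."""

print(convert(legacy))
# <NOTE>Check the oil level before starting.</NOTE>
# Remove the cover.
# <WARNING>Hot surface.</WARNING>
```

Even in this small case the end-tag rule does most of the work, which is why the specification has to state it explicitly.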

Tools and Techniques for Data Conversion

The following section is a brief discussion of different tools and techniques that can be used to convert legacy data to SGML. The first two techniques, Manual Conversion and Editor-Assisted Conversion, are not recommended for the typical Engineering technical publications environment.

Manual Conversion
As mentioned above, manual application of tags to the input data is not to be considered in a high volume, high complexity engineering technical publications environment with a high degree of process automation.
Editor-assisted Conversion
Editor-Assisted Conversion uses an SGML-aware editor with a parser to assist in the identification of the input document structure and content for the application of markup. This is only one step up from Manual Conversion, and is not to be seriously considered in a complex tech pubs environment with large, complex documents.
Programmatic Conversion
Programmatic Conversion is the best approach to use in high volume, high complexity conversions. Conversion programs may be written using languages that fall into roughly three categories:
Conventional Programming Languages, such as ‘C’ or ‘C++’
UNIX Shell Script languages, such as PERL, AWK and SED
Proprietary Scripting/Programming Languages such as OmniMark, Balise, and Avalanche FastTag & SGML Hammer.

Each category of conversion language has its place in an SGML conversion project. I have typically developed tools using all three categories of languages for a single document.

The presentation will cover which tools and approaches are best for different kinds of data.

Biography

Liz Gower has been involved with engineering technical publications systems and tools since 1989. Her first involvement with SGML came in 1991, when she joined a tech pubs project at Boeing Customer Services. She was a member of the ATA/AIA Text Working Group, and contributed to the SGML interchange portions of ATA Specification 2100.

Ms. Gower was with Frame Technology when it was acquired by Adobe Systems in 1995. She is currently Business Development Manager for Frame products at Adobe Systems Europe.

Making SGML Easier with Microdocument Databases
Benoit De La Selle, OmniMark Technologies

Manager, European Operations

OmniMark Technologies

3 bis rue du Petit Robinson

78350 Jouy en Josas

FRANCE

Tel +33 1 3070 6200

Fax +33 1 3070 6566

Email [email protected]

Introduction

The ability to deliver vast amounts of corporate information online in real time with sophisticated hypertext navigation aids, and the accelerating system complexity of products and corporate processes, have converged to drive a new paradigm: component-based documentation development. The Microdocument architecture (MDOC) is a vendor-independent hybrid of SGML and RDBMS methodologies that enables the delivery of personalized virtual documents. Illustrations of successful virtual document implementations and an overview of implementation issues for business and project leaders will be provided.

Abstract

The internet is causing a shift from book-like information structures to component-based models, allowing personalized information delivery. SGML and RDBMS methodologies both have strengths to contribute in a component- & transaction-oriented publishing paradigm. But taken independently, both SGML and RDBMS methodologies have limitations which are reached with large, complex systems.

The Microdocument architecture - MDOC - is a vendor-independent hybrid of SGML and RDBMS methodologies. Narrative text is organized into independent information units called microdocuments; related data objects and dependencies between microdocuments are expressed in the RDBMS schema. The microdocument architecture is a conceptual model for a system that can deliver user-independent virtual documents.
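The division of labour described above can be sketched as follows. This is only an illustration under stated assumptions: the table names (`microdoc`, `depends_on`), the `mdoc` and `virtualdoc` elements, and the sample content are all hypothetical and are not taken from the MDOC architecture itself. Narrative text stays in SGML inside each row, while the dependencies between microdocuments live in the relational schema.

```python
import sqlite3


def build_store() -> sqlite3.Connection:
    """Create an in-memory store: SGML text per microdocument, links in tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE microdoc (id TEXT PRIMARY KEY, topic TEXT, sgml TEXT);
        CREATE TABLE depends_on (doc_id TEXT, requires_id TEXT);
    """)
    db.executemany("INSERT INTO microdoc VALUES (?, ?, ?)", [
        ("m1", "safety", "<mdoc id='m1'><para>Disconnect power first.</para></mdoc>"),
        ("m2", "install", "<mdoc id='m2'><para>Mount the unit.</para></mdoc>"),
        ("m3", "tuning", "<mdoc id='m3'><para>Adjust the gain.</para></mdoc>"),
    ])
    db.execute("INSERT INTO depends_on VALUES ('m2', 'm1')")
    return db


def virtual_document(db: sqlite3.Connection, topics: list) -> str:
    """Assemble the microdocuments for the requested topics.

    Direct prerequisites recorded in depends_on are placed before the
    microdocument that needs them.
    """
    marks = ",".join("?" * len(topics))
    wanted = [row[0] for row in db.execute(
        f"SELECT id FROM microdoc WHERE topic IN ({marks}) ORDER BY id", topics)]
    ordered = []
    for doc_id in wanted:
        prereqs = [row[0] for row in db.execute(
            "SELECT requires_id FROM depends_on WHERE doc_id = ? ORDER BY requires_id",
            (doc_id,))]
        for item in prereqs + [doc_id]:
            if item not in ordered:
                ordered.append(item)
    parts = [db.execute("SELECT sgml FROM microdoc WHERE id = ?", (i,)).fetchone()[0]
             for i in ordered]
    return "<virtualdoc>\n" + "\n".join(parts) + "\n</virtualdoc>"


if __name__ == "__main__":
    print(virtual_document(build_store(), ["install"]))
```

A reader who asks only for the installation topic still receives the safety microdocument it depends on, while unrelated material is left out.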

Virtual documents mean personalized information delivery. Online newspapers are at the crest of the wave, but technical and administrative publications will soon follow. We will examine the online virtual document model, then explore how virtual documents can benefit customers in the delivery of corporate information. This talk will focus on the high-level implementation plan to fit with a global business information strategy.

The Microdocument architecture concept was first introduced at SGML Europe '96 in Munich. Since that time, several successful commercial implementations have occurred, including the Wall Street Journal's impressive "Interactive Edition" (http://www.wsj.com).

Biography

Benoit de La Selle is Manager, European Operations at OmniMark Technologies Europe. Benoit is a graduate in economics from the University of Orleans, France and a graduate of Business Administration from the Ecole d'Administration des Affaires, Paris, France. Since 1987 he has been actively involved in sales with Data General and Site, and he started up Exoterica Europe in 1993. Benoit has 5 years' experience in electronic publishing, complemented by an additional 4 years in the computer industry.

Intranet – Getting the Most from SGML
Simon Nicholson, Chrystal Software

View presentation

HTML, Netscape, HTTP, URLs - during the last 3 years we have all learnt a new language, as the internet becomes a dominant player in the world of information delivery. With this comes a change in the way in which people think about, create and publish information. The Web has promised benefits of instant information access, a common platform and updatability, and is seen by many as putting the information producers and users in direct contact. Organisations are now turning their attention to how the principles and benefits of the internet can be exploited internally using an intranet. Key to this is the continued use of the tools and technologies, but what of the information itself? Can HTML meet the requirements of the editors, managers and users of complex, mission critical information?

Many leading organisations have adopted SGML for the creation and management of complex source information. Such implementations address the needs of the core documentation group for full revision history, management of relationships and links, and effective and efficient reuse. Often the information stored in such management systems requires regular edit and review input from other internal and external resources, such as engineering, marketing and translators, to achieve its business goals. Further, the core information is of significance to other parts of the organisation for reuse in other applications and publications. The challenge is to enable information access and delivery directly without requiring transformation into HTML.

Getting the most out of that SGML source means exploiting your investment by using the SGML source for intranet delivery, and effectively combining that source with other HTML, SGML, document management and intranet technologies.

This presentation:

describes the need and business case for intranets
identifies how to exploit and apply SGML to the intranet
differentiates between HTML and SGML based information delivery
lists key capabilities and users of such a system

The presentation, aimed at a managerial audience, examines the aspects and impacts of several real-world intranet applications. Relevant technologies are described, as well as how the current investment, in technology and people, can be utilised to drive this type of information delivery.

Biography

Simon Nicholson is the Business Development Manager for Northern Europe with Chrystal Software Inc. Simon has been at XSoft and the Astoria Document Management System since its inception. Prior to this he was with Rank Xerox in a number of capacities including four years in sales training and two years in business consultancy. In total, Simon has more than twelve years' expertise within the document production arena.

Currently Simon is responsible for the leading edge SGML database, Astoria, and is considered a centre of expertise within Chrystal Software in Europe for SGML related topics. Simon also represents his company on the SGML Open Group in Europe.

Kostnadseffektivt införande av SGML på bred front
Poul Kongstad, Enator Information Management

Cost-effective ways to wide-spread SGML deployment

With new kinds of inexpensive SGML tools, SGML can be carried considerably further out into organisations. SGML can then "connect" several levels within a company, for example engineering design and documentation. In many cases SGML can be "added on" in environments that have standardised on word processors such as Word and WordPerfect. This makes it possible to grow into SGML using tools that are already familiar.

"Mainstream SGML" represents an approach that acknowledges the need for simplicity while gaining the benefits of controlling content in SGML form, such as flexible publishing to different media.

Writers can use the new SGML editors after a short introduction, since they already know their word processor. Some possible uses are presented.

The new SGML editors continuously check the structure of the document and interactively guide the author as to which SGML elements are allowed at each point in the document.
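This kind of interactive guidance can be thought of as a lookup against the content model in the DTD. The following is a minimal sketch, assuming a simplified content model that is a plain sequence of elements with an occurrence indicator each ("1" for exactly once, plus the SGML indicators ?, + and *). The memo model and the function name are hypothetical; real editors handle full content models with groups and alternatives.

```python
def allowed_next(model, present):
    """Return the element names that may be inserted next.

    model   -- list of (element, occurrence) pairs describing a sequence
    present -- child elements already in the document, assumed valid so far
    """
    pos, count = 0, 0  # current model item and how often it has been matched
    for name in present:
        # Skip optional items the author left out.
        while model[pos][0] != name:
            pos, count = pos + 1, 0
        count += 1
        if model[pos][1] in ("1", "?"):
            # Items that occur at most once are finished after one match.
            pos, count = pos + 1, 0
    allowed = []
    while pos < len(model):
        name, occurrence = model[pos]
        allowed.append(name)
        required_and_missing = occurrence == "1" or (occurrence == "+" and count == 0)
        if required_and_missing:
            break  # nothing after a missing required element is allowed yet
        pos, count = pos + 1, 0
    return allowed


# Hypothetical model, roughly <!ELEMENT memo - - (to, from?, body+)>
MEMO = [("to", "1"), ("from", "?"), ("body", "+")]

if __name__ == "__main__":
    print(allowed_next(MEMO, ["to"]))
```

An editor would call such a function at the cursor position and offer only the returned elements in its insert menu.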

The development effort consists, among other things, of linking the DTD to formatting and, in some cases, adapting the user interface. Existing style templates can to a fairly large extent be used as a basis.

These SGML editors are a new feature, and industry has shown interest in them. New tools are preferably introduced gradually. Some steps on the way to a new SGML environment are touched upon.

Biography

Poul Kongstad (M.Sc; [email protected]) has worked since 1989 with SGML/CALS strategies and tools, information management issues, user interfaces and project management at Enator, and has previously worked with industrial control and systems development management.

XML – Extensible Markup Language
Hasse Haitto, Synex Information AB

XML is an effort to provide (essentially) a subset of SGML primarily for WWW delivery. It is developed by an SGML Editorial Review Board under the auspices of the World Wide Web Consortium (W3C).

The design of XML both addresses the limitations of HTML and simplifies the use of SGML. For instance, XML documents need not necessarily have a DTD. Because SGML tools can quickly be modified to support XML, one can expect a rapid deployment of software that supports XML during 1997.
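To illustrate the point that an XML document need not have a DTD, here is a minimal sketch using the XML parser in the Python standard library, which implements XML as it was later standardised rather than the 1997 draft discussed in this talk. The memo document and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A well-formed document with no DOCTYPE declaration and no DTD.
SAMPLE = "<memo><to>Editors</to><body>Draft due <date>Friday</date>.</body></memo>"


def outline(xml_text):
    """Return the element names in document order, without consulting a DTD."""
    root = ET.fromstring(xml_text)
    return [element.tag for element in root.iter()]


def is_well_formed(xml_text):
    """Check well-formedness only; no validation against a DTD takes place."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(outline(SAMPLE))
    # An omitted end tag, which an SGML DTD could permit, is an error here.
    print(is_well_formed("<memo><to>Editors</memo>"))
```

Because every end tag is explicit, the parser can recover the element structure from the document alone, which is what makes the DTD optional.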

The simplifications that make XML a subset of SGML have mainly affected esoteric and little-used features of the SGML standard. On the whole, XML should come very close to actual SGML practices. As XML is still being designed, this talk will be a report from the front.

Biography

Hasse Haitto, President of Synex Information AB (http://www.synex.se), holds an M.Sc. in Engineering Physics from the Royal Institute of Technology, Stockholm. Hasse co-founded Synex Information in 1993, a company that has achieved world-wide recognition for its browser technology, especially the SGML/HyTime engine Synex ViewPort. Synex Information is a sponsor member of SGML Open.

Building an SGML system
Steve Pepper, Falch Infotek

SGML Architect

Falch Infotek a.s

Postboks 130 Kalbakken

N-0902 Oslo

Norway

phone: +47-22902733

fax: +47-22902599

email: [email protected]

Using the Whirlwind Guide to SGML Tools (see conference handouts) as a point of reference, this presentation seeks to provide a brief overview of the tools and technologies currently available for building SGML systems.

Biography: Steve Pepper

Steve Pepper is the Chief SGML Architect with Falch Infotek as, an Oslo-based company specialising in SGML-based information re-engineering and electronic publishing.

Originally trained as a typographer, Steve has been working with SGML since 1988 and since 1990 has been responsible for all Falch's SGML applications, including major projects in industry and government. He represents Norway in WG8, the ISO committee responsible for developing SGML and related standards, and is a regular speaker at conferences and seminars devoted to SGML.

An inveterate "tools junky", he is the author and maintainer of the popular Whirlwind Guide to SGML Tools and Vendors, which is freely available via the World Wide Web at http://www.falch.no/people/pepper/sgmltool/.

Vad händer med SGML-familjen?
Per-Åke Ling, Ericsson Utveckling AB

View presentation

This presentation covers the current status of the standards that are directly or indirectly related to SGML. The objective is to give a quick overview of the general status of, e.g., DSSSL and HyTime, and some information on what to expect from the upcoming review of SGML.

Biography

Per-Åke Ling has been involved in the technical development of Ericsson's SGML initiative since 1990. Before that he worked, among other things, with customer documentation and is therefore well versed in large-scale document management. He has worked mainly with methodology and has been technically responsible for the tools developed for creating SGML documents.

SGML Open Activities
Hasse Haitto, Synex Information AB

SGML Open is a consortium with over 75 members; the organization is dedicated to spreading the use of SGML. As part of this effort, its members strive to harmonize SGML implementations so that applications can co-exist and documents become more portable if they follow guidelines developed by SGML Open. Some of these technical resolutions and activities will be covered in this talk, such as the entity catalog, the CALS table exchange model, and the SGML fragment specification.

Biography

Publicera på Internet med hjälp av SGML och databaser
Fredrik Oskarsson, Enator Information Management

View presentation

Ljungadalsgatan 2

350 81 Växjö

Sverige

email: [email protected]

Abstract

Effective use and distribution of information requires the right conditions. The Internet and intranets are an effective medium for spreading information, and doing this efficiently requires the right methods. One method/technique that can be used to advantage is to use SGML and databases for storing and updating the information. This talk covers the benefits, technology and products for such a solution.

Contents

The Internet is largely used for publishing information on the "World Wide Web". Information is perishable and changes constantly. Keeping information "alive" on the Internet requires tools and methods that make it possible to update it quickly and easily. Since a great deal of data today is stored in database systems of various kinds, it is important to be able to present it on different media, e.g. the Internet.

SGML, which is often used as a markup language for documents, has a built-in structure that integrates very well with a database. The idea is to combine the advantages of database management with the advantages of SGML.

A database makes it possible to store and manage large volumes of data that are quickly accessible via the Internet and can be assembled according to the user's search criteria. Most databases today have an interface to the Internet, so information can be fetched directly from the database and turned into HTML pages. A pure database application is, however, sensitive to changes in the structure of, and relationships between, the information elements. SGML is less sensitive to structural changes.

The advantage of SGML is that the information has a clear structure that is searchable. HTML does not offer this. Merely converting a document to HTML and publishing it does not provide this possibility of "intelligent" searching. SGML is more nuanced than a pure database and offers greater possibilities for using the information in different ways, with the WWW as one of many publishing formats. Another major advantage of storing information in SGML is that it is a standard. Information often lives longer than a particular database does. It is therefore advantageous to be able to store the information in an application-independent format.

Combining SGML and databases makes it possible to combine information traditionally stored in a database with information traditionally presented as documents. Since all the information is stored in databases, the user does not need to see where the information comes from.
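The idea of structure-aware ("intelligent") searching over SGML held in a database can be sketched as follows. This is an illustration only: the sample markup, the `element` table and the function names are hypothetical, and the fragment is written with explicit end tags so that the XML parser in the Python standard library can read it.

```python
import sqlite3
import xml.etree.ElementTree as ET

SOURCE = """<manual>
<section><title>Pump</title><warning>Hot surface</warning><para>Check the seal.</para></section>
<section><title>Valve</title><para>Hot water may leak.</para></section>
</manual>"""


def load(source):
    """Store each child element of every section as one database row."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE element (section TEXT, name TEXT, content TEXT)")
    for section in ET.fromstring(source).iter("section"):
        title = section.findtext("title")
        for child in section:
            db.execute("INSERT INTO element VALUES (?, ?, ?)",
                       (title, child.tag, child.text))
    return db


def search(db, element_name, word):
    """Find sections where the word occurs inside the given element type."""
    rows = db.execute(
        "SELECT section FROM element WHERE name = ? AND content LIKE ? ORDER BY section",
        (element_name, f"%{word}%"))
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = load(SOURCE)
    print(search(db, "warning", "Hot"))  # only warnings that mention "Hot"
    print(search(db, "para", "Hot"))     # the same word in ordinary paragraphs
```

A plain full-text search for "Hot" would return both sections. Because the element type is kept, the query can distinguish a warning from an ordinary paragraph.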

The presentation describes:

The benefits of using SGML and databases for Internet publishing
Different ways of storing SGML in databases for use in an Internet context
Techniques and tools to use
How to maintain the information
The future

Biography: Fredrik Oskarsson

Fredrik Oskarsson works as an SGML consultant at Enator Information Management. He has worked as a project manager with SGML applications and databases within the company for 3 years. He has worked on SGML projects, covering project management, analysis, design and implementation, for clients including Hägglunds Vehicle, SAS, Ericsson, FMV and Atlas Copco.

Before joining Enator, Fredrik studied systems science at Lund University. The programme focused mainly on informatics, networks and databases.

Fredrik chairs the working group "SGML och databaser" (SGML and databases) of the Swedish SGML users' association.

Multiagency Electronic Regulatory Submission – use of SGML in New Drug Applications
Bo Lovén and Christian Wallgren, PharmaSoft AB

The pharmaceutical industry and regulatory authorities have finally realized the productivity and cost savings potential of SGML when applied to New Drug Applications. The potential benefits of SGML for the industry justify the cost of entry.

In the global production and regulation of new pharmaceutical products millions of pages of paper based information are exchanged and transferred by common mail, public courier and hand delivery. As the drug review and drug development life cycles are re-engineered and converted to an electronic environment, the anachronistic methods of paper creation and transfer create significant additional costs and inefficiencies above those already associated with the production and transfer of paper. The present escalating costs of producing, transferring and managing paper based information databases for both regulators and industry are profound.

Well-chosen standards for electronic transmission of regulatory information will bring significant benefits to both regulators and the pharmaceutical industry. De jure standards are preferred. The solution must allow for independent system implementation at any given pharmaceutical company and any given drug regulatory authority.

The standard needs to support a direct flow of information (knowledge) during the creation and compilation processes at pharmaceutical companies. It should be possible to create the submission without converting the documents it includes. It must be possible for the regulatory authority to decompose the received submission according to its own business rules.

With such a solution the compiling of a submission can be part of the business processes at a pharmaceutical company where the pharmaceutical company is provided full integrity. The same is applicable for a drug regulatory authority. The possibility to decompose the submission allows a given authority to comply with the operational processes including workflow.

The MERS (Multiagency Electronic Regulatory Submission) Working Group aims to prototype and demonstrate the submission, review, and management of a cross platform, easily archived and content accessible regulatory submission using structured document (ISO and ANSI) standards. The MERS Working Group is composed of representatives of the regulatory agencies from the Drugs Directorate of Health Canada (HPB), the Food and Drug Administration (FDA) of the USA, the Therapeutic Goods Administration (TGA) of Australia, the Medical Products Agency (MPA) of Sweden and the Medicines Evaluation Board (MEB) of the Netherlands. HPB provides project management.

Cost savings and increased security directly associated with the replacement of the current paper standard with electronic standards are immediate and significant. Effective standards for the electronic transfer of regulatory information implemented globally through international harmonization processes such as the ICH M2 EWG guarantee optimization of these benefits.

The ultimate bottom line is safe and effective products delivered to the market place with improved timeliness and profitability. These efficiencies can be achieved through thoughtful process re-engineering and the implementation of the supporting standards for effective communications and the transfer of information.

Biography: Bo Lovén

Bo Lovén is Senior Consultant at PharmaSoft. Being in the IT-business since 1964 he has spent the last 18 years with Office Automation, Document Management and Retrieval. During the last three years he has been project manager and responsible for activities within the pharmaceutical industry and the agencies concerning regulatory information.

Biography: Christian Wallgren

Systems Developer. In 1995 Mr. Wallgren received an award ("Hultmanska stipendiet") from the Swedish Association of IT Managers in recognition of his work on the software product SGML Companion®. He now holds the position of SGML Specialist.

En arbetsstation för översättaren
Svante Kleist, Texcel Svenska AB

Texcel Svenska AB

Storsätragränd 12

127 39 SKÄRHOLMEN

email: [email protected]

Abstract

The handling of technical documentation is typically characterised by the following:

Documents are revised often
Documents are usually translated into several languages

When a new version of a document is to be translated, it is important to minimise lead times and costs by not retranslating the parts of the document that are unchanged since the previous version.

This is often handled by comparing the new version with the old one by visual inspection (the so-called "stare-and-compare" method). The new version is then marked up with change marks in the form of vertical bars in the margin.

This method, however, suffers from the following problems:

It takes a long time
It is tedious
It is error-prone

If the technical documentation is written in SGML, the conditions are very good both for automating this comparison of the new version with the old and for giving the translator a specially developed "workstation" that makes the work easier.

In our presentation we describe how we developed a prototype of such a workstation based on "ArborText Adept*Publisher" and "Texcel Information Manager" ("IM"). It uses the IM module "SGMLdiff" to compare the new version with the old. The document that forms the basis for the translator's work is then produced automatically and presented in the editor window of Adept*Publisher. This document consists partly of the unchanged parts of the document (in the target language) and partly of modified or added parts (in the original language, i.e. the text to be translated).
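The principle of building the translator's working document can be sketched as follows. This is not how SGMLdiff works internally, which the abstract does not describe. It is a minimal illustration that assumes every element carries a stable identifier and that each document version is available as a mapping from identifier to text; the function name and sample data are hypothetical.

```python
def translation_draft(old_source, new_source, old_target):
    """Build the working document for the translator.

    Each argument maps an element id to its text:
      old_source -- previous version, original language
      new_source -- new version, original language
      old_target -- previous version, target language
    Returns (id, text, status) tuples in the order of the new version.
    """
    draft = []
    for elem_id, text in new_source.items():
        unchanged = old_source.get(elem_id) == text
        if unchanged and elem_id in old_target:
            # Unchanged since last time: reuse the existing translation.
            draft.append((elem_id, old_target[elem_id], "reuse"))
        else:
            # New or modified: leave in the original language for translation.
            draft.append((elem_id, text, "translate"))
    return draft


if __name__ == "__main__":
    old_sv = {"p1": "Stäng av strömmen.", "p2": "Öppna luckan."}
    new_sv = {"p1": "Stäng av strömmen.", "p2": "Öppna den övre luckan.", "p3": "Byt filter."}
    old_en = {"p1": "Switch off the power.", "p2": "Open the hatch."}
    for row in translation_draft(old_sv, new_sv, old_en):
        print(row)
```

Only the modified and added elements reach the translator in the original language, which is what removes the need for "stare-and-compare".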

We also report on ongoing development work to integrate this workstation with products for "machine translation" and "translation memory".

Finally, we touch on interesting problems that inevitably arise, such as those caused by differences in semantic structure between European and Asian languages.

Biography: Svante Kleist

Systems Science programme, Stockholm University, 1988-91
Ericsson Telecom 1991-94
(database administrator, data modelling, investigative work)
L M Ericsson Data 1994-96
(consultant, analyst, systems development: integration of Adept*Publisher with the PDM system "Metaphase 2")
Texcel Svenska AB 1996-
(technical consultant)

Regelverk i SGML-format
Richard Nilsson, Saab

View presentation

At Saab in Linköping, rules and regulations documentation is being converted to SGML, and an information system is being developed for electronic distribution of these documents to all employees.

The talk reports on the project from start-up to its current status. It is not a technical account of our information system but a presentation aimed at the "ordinary consumer", i.e. users with similar needs.

Biography

Employed at Saab Linköping for about 20 years. Has mainly worked with quality systems and quality system issues. Has been project manager for the rules and regulations project (Regelverksprojektet) since the project started in December 1992.

Automatkonfigurering av anläggningsdokumentation
Stefan Aldborg, ABB Industrial Systems
Anders Kjellberg, Convertum

View presentation

E-mail: [email protected]
[email protected]

Background

ABB's business area "Automation and Drives" (IAD) operates in more than 30 countries. It has more than 15,000 employees, which makes IAD ABB's largest business area. Its operations consist of products, systems and plants, as well as service. Many of the business processes within the business area cross organisational and national boundaries.

"Reengineering" of critical processes is under way throughout IAD. As part of this change work, a number of key changes have been defined. Examples of document-related areas in which these key changes lie are information sharing, reuse and automation.

Automatic generation of documentation

One key factor that concerns both reuse and automation is automatic generation of documentation.

One example of a document type whose production one wants to automate as far as possible is the quotation. Various kinds of solutions are already in operation today. These use, for example, technology from Adobe Acrobat and formats such as PDF.

Solutions of this kind take us part of the way, but do not work entirely as the organisation wants. One reason is that the solution is locked to a particular layout. Another problem is that document management systems handle the document as the smallest unit. One could say that today's solutions handle documents at product or system level, and the link to the parts of the product or system has been lost. This makes reuse and uniform handling of document parts difficult or even impossible.

SGML as an enabler

Work in progress within IAD since 1994 is trying to take the step towards better support for automatic generation and automatic configuration of plant documentation. In this work SGML comes in as an important building block.

Some important properties of SGML that the document generation system benefits from are:

SGML decouples a piece of information (text) from how it will be used in a published document. The same text can, for example, be reused in entirely different documents regardless of layout constraints such as the correct heading level.
SGML makes it possible to publish from the same source material to both paper documents and web (HTML) pages.
Assembling documents from smaller parts makes it possible to tie these parts more closely to individual components in a product structure.
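The properties listed above can be illustrated with a minimal sketch in which each component of a product structure owns one fragment of text and the document is generated by walking the structure. The `Component` class, the `sect1`/`sect2` element names and the sample plant are hypothetical and are not taken from the ABB system.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One node in a product structure, with the text that documents it."""
    name: str
    fragment: str  # SGML text for this component only, with no heading level
    parts: list = field(default_factory=list)


def assemble(component, depth=1):
    """Generate document lines by walking the product structure.

    The section level comes from the component's position in the
    structure, not from the fragment, so a fragment can be reused at
    any level.
    """
    lines = [f"<sect{depth}><title>{component.name}</title>", component.fragment]
    for part in component.parts:
        lines.extend(assemble(part, depth + 1))
    lines.append(f"</sect{depth}>")
    return lines


if __name__ == "__main__":
    drive = Component("Drive", "<para>Rated 400 V.</para>")
    plant = Component("Plant", "<para>Overview.</para>", [drive])
    print("\n".join(assemble(plant)))   # drive documented as a subsection
    print("\n".join(assemble(drive)))   # same fragment as a top-level section
```

Configuring a different plant then only means building a different component tree; the fragments themselves stay untouched.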

SGML is not widely used within the organisation today, nor is it a stated strategy of the business area. In this talk we will describe some of the work being carried out in this area and how SGML can come in as an important enabler even where the organisation as a whole does not currently have a strategy of using the technology.

Biography: Stefan Aldborg

Stefan Aldborg is head of development within ABB's Automation and Drives business area for engineering tools intended to support plant design. He has long been involved in, and a driving force behind, the development of new support systems for plant design within ABB.

Biography: Anders Kjellberg

Anders Kjellberg has worked since 1991 with IT support for document and product data management within ABB. Since 1996 Anders has, together with a colleague, run the consultancy Convertum, which specialises in information solutions for document and configuration management.