[Cache from http://helmer.hit.uib.no/claus/mecs/mecs.htm; please use this canonical URL/source if possible.]


MECS - A Multi-Element Code System
by
Claus Huitfeldt
forthcoming in
   Working Papers from the Wittgenstein Archives at the University of Bergen, No 3

   ISBN 82-91071-02-0
   ISSN 0803-3137
   Copyright: Claus Huitfeldt

First version: 1992.
This version: October 1998


CONTENTS

0 Preface

1 Introduction: MECS and SGML
1.1 Why not SGML?
1.2 MECS Syntax
1.3 MECS Program Package
1.4 The Future

2 PART I MECS - A Multi-Element Code System Version 2.00, August 1993
2.1 Summary
2.2 Basic Code Syntax, Code Systems and Documents
2.3 Codes, Tags and Elements
2.4 Code Types
2.4.1 No-element Codes
2.4.2 One-element Codes
2.4.3 Poly-element Codes
2.4.4 N-element Codes
2.4.5 Character Representation Codes
2.4.6 Character Disambiguation Codes
2.4.7 MECS Comments
2.5 Generic identifiers and attribute strings
2.6 Markup Reduction
2.7 Document Structure
2.8 Classification of MECS systems
2.9 Character sets
2.9.1 Delimiters
2.9.1.1 Code delimiters
2.9.1.2 String delimiter
2.9.2 Nil character
2.9.3 Free characters
2.9.4 Tag characters
2.9.5 Default character sets
2.9.6 Examples of alternative character sets
2.10 Code Declaration Table (CDT)
2.11 Deducing a Minimal CDT from an Encoded Document
2.12 SGML Compatibility
2.12.1 Some general observations
2.12.2 From SGML to MECS
2.12.3 From MECS to SGML
2.13 Revision History
2.14 Plans for MECS Version 3

3 PART II MECS Program Package Version 2, August 1994 2 User Guide
3.1 Installation and System Requirements
3.2 A Note to SGML Users
3.3 Creating and Validating Documents and CDTs
3.4 Formatting Documents
3.5 Reformatting Documents
3.6 Analyzing Documents
3.6.1 Code Status Report
3.6.2 Document Structure and Overlapping Elements
3.6.3 Breakpoints and Recursion
3.6.4 Betatexts (Substitutions)
3.6.5 Spell Checking
3.6.6 Frequency Word Lists and Simple Statistical Analyses
3.6.7 Extracting Elements
3.7 Processing SGML Documents in MECS
3.7.1 Validating SGML documents for MECS Conformance
3.7.2 Converting SGML files to MECS
3.7.3 Converting MECS documents to SGML
3.8 Project management

4 Reference Guide
4.1 General Features and Command Line Parameters
4.2 MECSVAL
4.2.1 Interactive Mode
4.2.2 Command Line Parameters
4.2.3 Examples
4.2.4 MECSVAL Editor Commands
4.3 MECSFORM
4.4 MECSLYSE
4.5 MECSGRAB
4.6 MECSPRES
4.6.1 Profile Definition Table (PDT)
4.6.1.1 Overall Structure
4.6.1.2 Code Declarations
4.6.1.3 Position
4.6.1.4 Mode
4.6.1.5 MarkIn, MarkDel and MarkOut
4.6.1.6 NoteNumber and NoteType
4.6.2 Declaration of Codes of Different Types
4.6.2.1 No-element Codes
4.6.2.2 One-element Codes
4.6.2.3 Multi-element Codes
4.6.2.4 Character Codes
4.6.3 Layout and Format
4.6.3.1 Layout
4.6.3.2 Format
4.6.4 Command Line Parameters
4.6.5 Examples
4.7 MECSBETA
4.8 BETATXT
4.9 MECSSPEL
4.10 ALPHATXT
4.10.1 Command Line Parameters
4.10.2 Defining an Alphabetic Sort Order
4.10.3 Frequency Word Lists and Simple Statistical Analyses
4.10.4 Spell Checking
4.10.5 Working with Marked-up Documents
4.11 MECSSGML
4.12 SGMLVAL
4.13 SGMLMECS

Appendix A: About the MECS Program Package

Appendix B: MECSPRES PDT Declaration Parameters

Appendix C: MECSPRES Predefined Layouts, Formats and Styles

Appendix D: MECSPRES User-Defined Layouts, Formats and Styles

Appendix E: Known Bugs

References


PREFACE

The subject matter of this document is text encoding. It presents what I have called the Multi-Element Code System, MECS.
   Today, text encoding is more or less synonymous with SGML (Standard Generalized Markup Language). Chapter 1 is an introduction summarising the rest of the document by way of comparing MECS to SGML. 1
   Chapter 2 provides a full description of MECS. It may be read independently of the rest of the document.
   Chapter 3 is a user guide and Chapter 4 a technical documentation of the MECS Program Package, a program package for the validation, manipulation and analysis of MECS documents.

This is a working paper in the full sense of the term, i.e. a report on work in progress. I have wanted to publish it for a long time, but a new and better version of MECS or the MECS Program Package has always seemed to be around the next corner. 2 Planned changes to MECS are described in 2.14.

Readers who intend to use this document primarily as a practical guide to the MECS Program Package are advised to start with the Summary in 2.1, and then proceed directly to the User Guide in Chapter 3.
   The rest of Chapter 2 provides reference material and information of relevance to readers interested in technical aspects of the MECS syntax, e.g. with a view to redefining the delimiter set or to finding out whether a given markup syntax is MECS-conforming.
   Readers who intend to use the MECS Program Package for processing of SGML documents are strongly recommended to read the following sections carefully: 2.12, 3.2, 3.7 and 4.11-13.
   No detailed knowledge of any particular text encoding system is required. But it is presupposed that readers have some acquaintance with text encoding, or are familiar at least with the rationale behind text encoding systems in general. 3

What is published here is for the most part a result of work in the years from 1985 to 1987 for the Norwegian Wittgenstein Project and, later (since 1990), for the Wittgenstein Archives at the University of Bergen. I thank these projects for having given me the opportunity to pursue my work on text encoding, and I thank the Norwegian Research Council for Science and the Humanities for having permitted me to spend part of my time as Research Fellow in philosophy on this work.
   MECS started as an attempt to revise the code system of the Norwegian Wittgenstein Project, "CosyTrawma", which was originally developed by Associate Professor Asbjørn Brændeland (Huitfeldt and Rossvær 1989, pp. 177-200). Brændeland, in turn, had drawn on work done by the Tübingen Project in Germany.
   The result of the revision turned out to be an entirely different system, the earliest drafts of which were presented in Huitfeldt and Rossvær 1989, pp. 51-54 and 201-236, and in a number of unpublished working papers since 1989. I am particularly indebted to Senior Executive Officer Øystein Reigem at the Norwegian Computing Centre for the Humanities for comments and criticism of these drafts, as well as to Professor Stig Johansson at the University of Oslo, who gave many helpful comments and suggestions.

Much work is currently under way in text encoding. The most important contribution in the Humanities is arguably that of the Text Encoding Initiative (TEI). This major international cooperation project has set a standard for text encoding in the Humanities for a long time to come.
   My own participation in TEI has provided me with many opportunities to learn from discussions with colleagues. In particular, discussions with Lou Burnard, Michael Sperberg-McQueen, Allen Renear and Peter Robinson have been a recurrent source of inspiration.
   I am also indebted to my colleagues at the Wittgenstein Archives for criticism, help and encouragement. In recent years, Peter Cripps has been a particularly rich source of constructive criticism and inspiring enthusiasm.

I hereby thank the above-mentioned persons and institutions for their help and assistance, criticism and comments. Remaining errors and deficiencies are entirely my responsibility.

Bergen, September 1998
 

   Claus Huitfeldt


1 INTRODUCTION: MECS AND SGML

The Norwegian Wittgenstein Project (NWP), which started in 1980, aimed at producing a machine-readable version of Ludwig Wittgenstein's Nachlass. Like many similar projects at the time, the NWP developed its own markup system. And like most projects that did so, the NWP enjoyed not only the many advantages of explicitly marked up texts, but also the severe disadvantage of having to develop its own specialized software, even for trivial, non-specialized tasks.

The NWP was discontinued in 1987, and during the preparation for its continuation, which was later (1990) to become the Wittgenstein Archives at the University of Bergen, I set out to improve the markup system. It had turned out that the system suffered from certain deficiencies. Any revision of the markup system necessitated adjustment of the software, which had in the course of several years of ad hoc revisions grown quite complicated. The pause in the project activities was therefore well spent looking for a more viable and flexible solution.

1.1 Why not SGML?

Standard Generalized Markup Language (SGML) was adopted as an international standard for text encoding by the International Organization for Standardization (ISO) in 1986, so at that time (i.e. 1988-1989), SGML was the most natural candidate for consideration. However, despite its many strengths and potential advantages, I found SGML unsuited to our needs. Among the reasons were:

My conclusion therefore was that I had to develop a different system altogether for the Wittgenstein Archives, a system which had to be considerably less demanding concerning software development, to answer the specific needs of our project, and yet be general and flexible enough to allow for extensive revision of the registration system during the course of future work without necessitating revision of application software.

At roughly the same time the Text Encoding Initiative (TEI) had just started (1987). The TEI based itself on SGML. Many of the issues TEI was expected to address were relevant to the problems listed above. Although we could not wait for TEI to be completed, it was therefore also an obvious consideration for my development work to keep as close to SGML as possible.

1.2 MECS Syntax

Consequently, MECS is in many respects similar to SGML. Like SGML, MECS is not itself a markup scheme, but a set of rules for the design of markup schemes. MECS may be accommodated to conform to SGML's reference concrete syntax. SGML documents are MECS-conforming, provided that they do not make use of markup reduction or minimization.

MECS markup schemes may be declared in separate "document definitions", similar to the SGML DTDs. Because they lack most of the expressive power of SGML's DTDs, I have chosen a different term: Where SGML speaks of Document Type Definitions (DTDs), MECS speaks of Code Declaration Tables (CDTs). Basically, a CDT is a declaration listing delimiters, other character sets, and codes (tags for elements and entities) to be used in a document. MECS documents may be validated for conformance with a particular CDT. But unlike SGML, no CDT is required in MECS (cf. below).

MECS includes equivalents to SGML's elements and internal entities. In addition, MECS includes syntactical means for the representation of structures which in SGML are treated in a different way. There are seven syntactically distinct types of codes (examples are given in MECS's default character set):

 

No-element codes:                 <tag> 

One-element codes:                <tag/   ... /tag> 

Poly-element codes:               [tag/2| ... /tag| ... /tag] 

N-element codes:                  [tag/2\ ... /tag| ... /tag] 

Character representation codes:   {tag} 

Character disambiguation codes:   {...\tag} 

Comments:                         <| ... |> 

All delimiters may be redefined, and tags may be reduced or minimized (though not omitted) according to specific rules.

No-element codes correspond to SGML's empty elements, and mark points within the text. One-element codes correspond to ordinary SGML elements, and mark spans of text.

Multi-element codes, i.e. poly-element and N-element codes, have no obvious parallel in SGML. Poly-element codes mark two or more consecutive spans of text (typically indicating that they stand in a specific relationship to each other, e.g. that of substitution or counterposition). N-element codes are similar to poly-element codes. But whereas the number of spans (elements) marked by a poly-element code may vary from token to token, the number of elements in an N-element code is fixed.

Character representation codes correspond roughly to SGML's internal entities. Character disambiguation codes, which have no direct equivalent in SGML, are used in conjunction with character representation codes, typically to disambiguate homographic graphemes (e.g. characters which in one context may be punctuation marks, in another context logical operators).

In MECS (just as in SGML) parts of a document which should be ignored by the parser are marked as comments.

MECS has no direct parallel to SGML attributes, external entities and declarations. However, multi-element codes may be used for some of the same purposes as attributes, and the MECS Program Package supports a file inclusion mechanism which performs some of the work that SGML external entities do. MECS has corollaries to SGML comments and to marked sections with keyword CDATA, but not to the other SGML declaration types.

MECS documents contain text interspersed with codes. MECS does not presuppose any hierarchical document structure - elements may appear in any order and nest arbitrarily deeply. Multi-element codes may not overlap each other, but one-element codes may overlap all other codes without restriction.

The basic syntactic features of all tags occurring in a MECS-conforming document are directly deducible from their delimiters, even if markup is reduced to its minimum. This has at least three important consequences:

First, it increases human readability of documents. Even if heavily marked up documents are notoriously difficult to read for the human eye, MECS at least has the advantage that you may e.g. tell a no-element from a one-element tag immediately. (In SGML you do not know whether a start tag is associated with an end tag, i.e. whether or not it marks an empty element, until you have either inspected the DTD or scanned to the end of the current element, which in the worst case means the rest of the entire document instance.)

Second, the same point applies to software development. There is no need for look-ahead to identify the basic syntactic features of a MECS tag. Therefore, as long as a MECS document includes a one-line header declaring its delimiters, the entire document can be parsed and validated for basic syntax conformance without recourse to any CDT.

Third, this means that MECS documents are in a certain fundamental sense self-documenting: If a MECS document includes a header, which is a one-line declaration of its delimiters, then a CDT to which that document conforms may be deduced directly from the document alone. The CDT thus deducible from the document is called the document's minimal CDT.
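The point about delimiters can be illustrated with a minimal sketch in Python. It assumes the default MECS delimiters shown in the list of code types above and classifies a single, already isolated tag by its first and last character alone. The function name classify_tag and the category labels are chosen for this illustration only and are not part of MECS or of the MECS Program Package.

```python
def classify_tag(tag: str) -> str:
    """Classify one tag written with the default MECS delimiters.

    Only the first and last characters are inspected, which mirrors the
    claim that no look-ahead beyond the tag itself is needed.
    """
    if not tag:
        return "unknown"
    if tag.startswith("<|"):
        return "comment open"
    if tag.endswith("|>"):
        return "comment close"
    kinds = {
        ("<", ">"): "no-element tag",
        ("<", "/"): "one-element start tag",
        ("/", ">"): "one-element end tag",
        ("[", "|"): "poly-element start tag",
        ("[", "\\"): "N-element start tag",
        ("/", "|"): "multi-element separator tag",
        ("/", "]"): "multi-element end tag",
        ("{", "}"): "character code",
    }
    return kinds.get((tag[0], tag[-1]), "unknown")


for example in ["<s>", "<a/", "/a>", "[a/2|", "[s/2\\", "/a|", "/a]", "{a}"]:
    print(example, "->", classify_tag(example))
```

The sketch does not handle reduced markup, redefined delimiters or attribute strings; it is only meant to show why the delimiters alone suffice for the basic classification.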

Although it is unequivocally decidable whether any particular document conforms to any particular CDT, an indefinite number of documents conform to any particular CDT, and any particular document conforms to an indefinite number of CDTs. In this respect the relationship between MECS documents and CDTs is the same as the relationship between SGML document instances and DTDs - it is a many-to-many relationship. What is special about the relationship between a document and its minimal CDT is that it is a many-to-one relationship: any particular MECS-conforming document conforms to one and only one minimal CDT.
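As a rough illustration of what deducing a minimal CDT involves, the following Python sketch collects the generic identifiers used in a document, grouped by code type. It assumes the default delimiters, generic identifiers made up of word characters only, and a document without comments or attribute strings. It is not the procedure used by MECSVAL, and the names PATTERNS and deduce_codes are hypothetical.

```python
import re

# One pattern per element code type, written for the default delimiters.
# The element number is optional so that reduced start tags are also found.
PATTERNS = {
    "no-element": re.compile(r"<(\w+)>"),
    "one-element": re.compile(r"<(\w+)/"),
    "poly-element": re.compile(r"\[(\w+)(?:/\d+)?\|"),
    "N-element": re.compile(r"\[(\w+)(?:/\d+)?\\"),
}


def deduce_codes(document: str) -> dict[str, list[str]]:
    """Return the generic identifiers found in the document, per code type."""
    return {
        code_type: sorted(set(pattern.findall(document)))
        for code_type, pattern in PATTERNS.items()
    }


doc = "<p/ Text <lb> with [sub/2| old /sub| new /sub] reading /p>"
print(deduce_codes(doc))
```

A real minimal CDT would also list the delimiters, the free characters and the tag characters; the sketch covers only the declaration of generic identifiers.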

1.3 MECS Program Package

The MECS Program Package contains programs for the creation, validation, formatting, reformatting, analysis, element extraction and spell checking of MECS-conforming documents, as well as programs for translation between MECS and SGML. All programs in the package run under MS-DOS.

MECSVAL is an interactive, validating parser-editor. MECSVAL checks CDTs and documents for MECS conformance, and may either deduce minimal CDTs from MECS-conforming documents or check that documents conform to particular CDTs.

MECSFORM formats or regularizes MECS-conforming documents by either reducing markup to its minimum or extending it to its standard form, wrapping lines to a user-specified maximum length, removing trailing blanks and trailing blank lines, optionally indenting specified elements and/or inserting reference codes in specified locations etc.

MECSPRES outputs text in various formats (HTML, WordPerfect, Folio Flat File, so-called "plain ASCII", and others). The program offers a number of options for the layout and formatting of elements (margins and marginalia, indentation, tables, columns, notes, section headers etc.; features like bold, italics, single and double underline, capitalization, letter-spacing; markers and special characters; links and anchors etc.). MECSPRES may also reformat text to other MECS-encoded formats, and to formats required by the programs ALPHATXT and BETATXT (cf. below). With MECSPRES the user may not only define stylesheets, but also format, layout and style specifications.

MECSLYSE analyzes relationships between the encoded elements of a document and allows the user to define breakpoints at which to display the code stack, list all recursive or overlapping elements, and create a tabulated list displaying the sequence and nesting level of all elements occurring in a document.

MECSGRAB extracts specified elements from a document and prints them and/or their line and column reference numbers in a separate file. This file may, under certain conditions, itself be a MECS-conforming document subject to further processing by MECSGRAB or other MECS programs.

ALPHATXT may be used for interactive spell checking in general, and spell checking of MECS-encoded documents in particular. The program may also perform a number of other tasks, such as the production of word lists sorted according to user-defined character sort criteria, frequency word lists, and simple statistical analyses.

BETATXT computes and displays all possible combinations of single elements of multi-element codes within segments of a document. For example, if sentences are marked and alternative readings are encoded with multi-element codes, then BETATXT may compute and display all the alternative readings of sentences containing substitutions.
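The kind of computation described here can be sketched in a few lines of Python. The sketch assumes that a sentence has already been split into a list of segments, where a plain string is fixed text and a list of strings holds the alternative elements of one multi-element code. This input representation and the function name readings are assumptions made for the illustration; the sketch does not reproduce how BETATXT itself reads or processes documents.

```python
from itertools import product


def readings(segments):
    """Return every reading obtained by picking one alternative per code."""
    # Fixed text is treated as a code with a single alternative.
    choices = [seg if isinstance(seg, list) else [seg] for seg in segments]
    return ["".join(combo) for combo in product(*choices)]


sentence = ["I ", ["think", "believe"], " that ", ["he", "she"], " came"]
for reading in readings(sentence):
    print(reading)
```

With two substitutions of two alternatives each, the sketch yields four readings; the number grows multiplicatively with each further substitution in the same segment.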

MECSSGML converts MECS-conforming documents to SGML-conforming documents. The conversion may or may not lead to a certain loss or distortion of information, depending on the degree to which the document in question includes features specific to MECS, whether or not overlapping elements are retained, etc. (It is possible, though, to restrict MECS so as to disallow features which cannot be translated to SGML without loss of information.)

SGMLMECS converts SGML-documents to MECS-conforming documents. Although a number of SGML features will be converted to a form in which they are ignored by other MECS software, in a certain sense the conversion does not lead to loss or distortion of information: Documents converted to MECS with SGMLMECS may always be converted back again to their exact original SGML form with MECSSGML.

Except for MECSVAL and ALPHATXT, none of the programs in the package are interactive. However, Peter Cripps has written a menu-driven user interface, MECSPAC, for interactive use of the program package.

The lack of a rigorously defined document structure (a DTD) and the lack of restrictions against overlapping elements have been taken by some to suggest that writing programs for MECS would be more complicated than writing programs for SGML.

One difference is that where SGML programs may keep track of the document structure by means of a "last in first out" stack, MECS programs have to maintain a doubly linked list. Admittedly, this is a bit more complicated. On the other hand, the fact that the basic syntactical role of each and every tag can be inferred directly from its delimiters without look-ahead serves to simplify other matters considerably.
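The difference can be made concrete with a small Python sketch. It assumes that a document has already been reduced to a sequence of ("open", identifier) and ("close", identifier) events for one-element codes. Instead of popping only from the top of a stack, it removes the closing element from wherever it sits in the list of open elements and reports every element opened later that is still open, i.e. every overlap. A plain Python list stands in for the doubly linked list mentioned above, and the name find_overlaps is hypothetical.

```python
def find_overlaps(events):
    """Return (closed, still_open) pairs for elements that overlap."""
    open_elements = []  # generic identifiers in order of opening
    overlaps = []
    for kind, gi in events:
        if kind == "open":
            open_elements.append(gi)
            continue
        # Find the most recently opened element with this identifier.
        # An unmatched close raises ValueError, which is acceptable here.
        index = len(open_elements) - 1 - open_elements[::-1].index(gi)
        # Anything opened after it and not yet closed overlaps with it.
        overlaps.extend((gi, later) for later in open_elements[index + 1:])
        del open_elements[index]
    return overlaps


# <a/ ... <b/ ... /a> ... /b>  : a and b overlap
print(find_overlaps([("open", "a"), ("open", "b"), ("close", "a"), ("close", "b")]))
# <a/ ... <b/ ... /b> ... /a>  : b nests inside a, no overlap
print(find_overlaps([("open", "a"), ("open", "b"), ("close", "b"), ("close", "a")]))
```

With a pure "last in first out" stack the first sequence would simply be rejected, since the element being closed is not on top.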

Another difference is that whereas SGML programs may build internal tree representations of documents to facilitate manipulation of them, no such internal representations are built by MECS programs - because of the occurrence of overlapping elements this has so far seemed too complicated. 4 Therefore, all MECS programs read the entire document from its beginning in order to perform operations on it.

The MECS Program Package does not live up to standards of professional software. But the fact that it was possible for a sheer amateur to write the bulk of these programs as a side-activity during a couple of years indicates that programming for MECS is easy. Altogether the package comprises approximately 13,000 lines of Pascal code (excluding the editor). It is assumed that similar programs for SGML would demand code far in excess of this.

1.4 The Future

I once said 5 that when it comes to document structure, one of the main differences between SGML and MECS is that in SGML everything is forbidden unless it is explicitly permitted or mandatory, while in MECS everything is permitted unless it is explicitly forbidden.

In retrospect I realize that this is grossly unfair: SGML does after all admit quite permissive DTDs, and MECS does not have any means of forbidding or demanding particular document structures. 6 Still, the formulation points to a difference of emphasis: SGML provides strong mechanisms for exerting control over document structure, whereas MECS sacrifices such control in favor of free overlap and simplified or in-line declaration of elements.

Nine years have passed since the development of MECS started, and it has been used in the encoding of several thousand manuscript pages. The TEI guidelines have been available for quite some time, and have been discussed and used extensively by a large number of projects. The amount and range of SGML-based software has increased considerably.

Is there still a need for MECS? Despite the fact that SGML is a far more sophisticated markup language, I believe that the considerations which led me to dismiss SGML nine years ago still apply. 7 MECS is therefore in my eyes still the preferred choice for a project like the Wittgenstein Archives. However, MECS also has obvious shortcomings. If not all, then at least a number of these shortcomings are eliminated in SGML. Unfortunately, conversion from MECS to SGML without loss of information is notoriously difficult, so we cannot have the best of both worlds.

One recent development (1997) within the SGML area is particularly interesting. Extensible Markup Language (XML), which has received much attention lately, shares a couple of features with MECS: In XML, empty elements are visibly different from elements with content, and tag omission is not allowed. Consequently, a DTD is not required in XML, and a distinction is made between well-formedness ("valid" without DTD) and validity (valid according to some specific DTD) of documents. In all these respects, XML is therefore closer to MECS than SGML is. 8 (It is also interesting to note that one of the arguments often made in favor of XML is that it is easier to write programs for than SGML is.) However, in one important respect XML poses even greater difficulties than SGML: XML does not include SGML's CONCUR feature. And without CONCUR the conversion of MECS documents seems even more difficult.

Some work has been done in order to create a bridge from MECS to SGML. Sunniva Solstrand has developed a method (and a program) for automatically "deducing" DTDs from document instances converted from MECS to SGML (Solstrand 1994). Sascha Djuric has proposed a convention for automatically converting elements with overlap to hierarchical structures in a controlled manner (Djuric 199?). What remains in particular is a method for converting MECS documents with overlap to concurrent hierarchies by using the SGML CONCUR feature, and SGML software which implements this feature. Methods for MECS to SGML conversion are among the concerns of an ongoing cooperation between C. Michael Sperberg-McQueen and myself.


2 MECS - A MULTI-ELEMENT CODE SYSTEM

2.1 Summary

MECS is a syntax for the design of text encoding systems. Documents which conform to this syntax consist of text interspersed with codes, of which there may be seven syntactically distinct types:

 

 No-element codes:                 <s> 

 One-element codes:                <a/   ... /a> 

 Poly-element codes:               [a/2| ... /a| ... /a] 

 N-element codes:                  [s/2\ ... /s| ... /s] 

 Character representation codes:   {a} 

                              or   {"---"\a} 

 Character disambiguation codes:   {a\a} 

                              or   {"---"\a} 

 Comments:                         <| xxx |> 

In these examples '...' indicates coded elements, i.e. character strings which may or may not contain further codes. 's' and 'a' exemplify generic identifiers, i.e. names of individual codes.

The first four types of codes are sometimes referred to jointly as element codes; poly-element and N-element codes are sometimes referred to jointly as multi-element codes; while character representation and disambiguation codes are sometimes referred to jointly as character codes.

The examples above are given in MECS' default character set. However, all character sets in MECS may be redefined, and there are no restrictions on which characters may be used as code delimiters or which as free characters or tag characters.

Strictly speaking, MECS is therefore not in itself a code system, but a general-purpose set of rules for the design of such systems. MECS specifies how to assign specific syntactic roles to characters and character sets, how to declare generic identifiers for codes, how to use these codes in documents, etc., - in short, how to define and use a code system conforming to the basic code syntax specified by MECS.

This definition is given in the form of a Code Declaration Table (CDT). The CDT starts with a MECS header. The header assigns values to the code delimiters, which decide the most basic general features of any MECS code system. The rest of the CDT declares free characters, tag characters and generic identifiers.

Thus, a MECS-conforming document is a document conforming to a MECS CDT. The document itself may also start with a MECS header. If it does, its minimal CDT can be reconstructed on the basis of the encoded document alone.

The default MECS header is:

 

  £ < > < / / > [ / | \ / | / ] { " \ } 
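To make the role of the header concrete, here is a minimal Python sketch that splits the default header into its 19 delimiter values and gives each a name. The abbreviations NCO, NCC, OSO, OSC, OEO, OEC, MSO, MNC, PSC, MDO, MDC, MEO and MEC are the ones used in 2.4; their assignment to header positions is inferred from the code examples above rather than taken from the header specification, and the names HEADER_MARK, NSC, CHAR_OPEN, CHAR_QUOTE, CHAR_SEP and CHAR_CLOSE are hypothetical labels introduced for this sketch.

```python
# Assumed order of the delimiter values in a MECS header (see lead-in).
NAMES = [
    "HEADER_MARK",                                  # hypothetical label
    "NCO", "NCC",                                   # no-element tag
    "OSO", "OSC", "OEO", "OEC",                     # one-element tags
    "MSO", "MNC", "PSC", "NSC",                     # NSC is a hypothetical label
    "MDO", "MDC", "MEO", "MEC",                     # separator and end tags
    "CHAR_OPEN", "CHAR_QUOTE", "CHAR_SEP", "CHAR_CLOSE",  # hypothetical labels
]


def parse_header(header: str) -> dict[str, str]:
    """Map each whitespace-separated header value to its assumed name."""
    values = header.split()
    if len(values) != len(NAMES):
        raise ValueError(f"expected {len(NAMES)} values, got {len(values)}")
    return dict(zip(NAMES, values))


delimiters = parse_header('£ < > < / / > [ / | \\ / | / ] { " \\ }')
print(delimiters["OSO"], delimiters["OSC"], delimiters["OEO"], delimiters["OEC"])
```

Read this way, the one-element delimiters reproduce the shape of the example <a/ ... /a>, and the multi-element delimiters reproduce [a/2| ... /a| ... /a].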

Any order and nesting level of codes in documents is allowed. Codes may be contained within each other wholly (hierarchically) or only partly (overlapping each other). However, there is one restriction against overlapping: multi-element codes may nest hierarchically, but they may not overlap other multi-element codes.

Codes belonging to different code types may have identical generic identifiers, with one exception: neither no-element and one-element codes nor poly-element and N-element codes may share the same generic identifier.

Character disambiguation codes may be used in conjunction with character representation codes only. The generic identifier of the associated character representation code may be replaced by a string of free characters enclosed by character quote delimiters.

Comments may occur anywhere in a document, and they may contain any sequence of legal characters. The contents of a comment are not regarded as part of the code structure of a document.

According to the general rules for markup reduction one-element codes, poly-element codes and N-element codes may be reduced:

 

 Full markup                       Reduced markup 

 

 <a/ ... /a>                       <a/ ... > 

 [a/2| ... /a| ... /a]             [a| ... | ... ] 

 [a/3| ... /a| ... /a| ... /a]     [a/3| ... | ... | ... ] 

 [s/2\ ... /s| ... /s]             [s\ ... | ... ] 

 [t/3\ ... /t| ... /t| ... /t]     [t/3\ ... | ... | ... ] 
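The correspondence between full and reduced markup in the table can be reproduced with a short Python sketch for one-element and poly-element codes. It assumes the default delimiters and infers the reduction rules from the table alone: generic identifiers are dropped from end and separator tags, and the element number is dropped from the start tag only when it is 2. The complete rules are those of 2.6; the function names one_element and poly_element are hypothetical.

```python
def one_element(gi, content, reduced=False):
    """Build a one-element code in full or reduced markup."""
    end_tag = ">" if reduced else f"/{gi}>"
    return f"<{gi}/{content}{end_tag}"


def poly_element(gi, elements, reduced=False):
    """Build a poly-element code in full or reduced markup."""
    n = len(elements)
    if n < 2:
        raise ValueError("a poly-element code needs at least two elements")
    if reduced:
        # Inferred from the table: the element number is omitted when it is 2.
        start_tag = f"[{gi}|" if n == 2 else f"[{gi}/{n}|"
        return start_tag + "|".join(elements) + "]"
    return f"[{gi}/{n}|" + f"/{gi}|".join(elements) + f"/{gi}]"


print(one_element("a", " ... "), one_element("a", " ... ", reduced=True))
print(poly_element("a", [" ... ", " ... "]), poly_element("a", [" ... ", " ... "], reduced=True))
print(poly_element("a", [" ... "] * 3, reduced=True))
```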

SGML documents are MECS-conforming, provided that they do not make use of tag minimization or end tag omission 9. Some MECS documents will be well-formed SGML documents, others may easily be converted to SGML, yet others may only be converted to SGML with a certain distortion or loss of information.

2.2 Basic Code Syntax, Code Systems and Documents

MECS is a basic code syntax for the design and specification of code systems for markup of electronic documents. Strictly speaking, MECS is therefore not in itself a code system, but a general-purpose set of rules for the design of such systems.

MECS specifies how to assign specific syntactic roles to characters and character sets, how to declare generic identifiers for codes, how to use these codes in documents, etc., - in short, how to define and use a code system conforming to the basic code syntax specified by MECS.

These assignments and declarations are listed in a Code Declaration Table - a CDT. Strictly speaking, again, it is only when adding a CDT to the basic code syntax of MECS that we have a code system. Adding a CDT to the basic code syntax of MECS is like adding an alphabet and a vocabulary to a formal grammar.

The values assigned to the code delimiters decide the most general basic features of any MECS code system. These values are declared in the MECS header. The MECS header is the very first part of the CDT and may also be included as the first part of MECS documents. Any MECS document which contains such a header is self-documenting in the sense that a minimal CDT may be reconstructed on the basis of the document alone.

An electronic document adhering to the specifications of a specific MECS code system, e.g. MECS-XXX, may be called a MECS-XXX-conforming or a MECS system-conforming document. A document adhering to the specifications of some MECS code system or other will be called a MECS-conforming document. All MECS system-conforming documents are MECS-conforming documents, but not vice versa.

2.3 Codes, Tags and Elements

In our context, a computerized text is regarded as a stream of characters.

A MECS document is a string of free characters and codes.

A code is an ordered sequence of tags and (optionally) elements. A code may consist of one single tag, or it may consist of several tags and one or more elements included between the tags.

An element is a string of free characters and tags.

An element occurring between the tags of one and the same code is called the code's coded element.

A tag consists of code delimiter(s) and/or tag characters. More specifically, a tag may consist of a tag open delimiter, a string of tag characters constituting a generic identifier, possibly followed by an attribute string, and a tag close delimiter. Or a tag may consist of a tag close delimiter only.

2.4 Code Types

There are seven types of codes. Using the MECS default delimiters (cf. 2.9.5), examples of these code types will appear as follows:

 

 No-element code:                 <s> 

 One-element code:                <a/   ... /a> 

 Poly-element code:               [a/2| ... /a| ... /a] 

 N-element code:                  [s/2\ ... /s| ... /s] 

 Character representation code:   {a} 

                             or   {"---"\a} 

 Character disambiguation code:   {a\a} 

                             or   {"---"\a} 

 Comment:                         <| xxx |> 

In these examples 's' and 'a' are generic identifiers, '...' indicates elements, '---' indicates strings of free characters, and 'xxx' is any sequence of legal characters. The element(s) occurring between the tags of a code is called its coded element(s). Thus, one-element codes have one coded element, poly-element and N-element codes have several coded elements, while the other code types have no coded elements.

The first four types of codes are sometimes referred to jointly as element codes. Poly-element and N-element codes are sometimes referred to jointly as multi-element codes. The last two types of codes are sometimes referred to jointly as character codes.

2.4.1 No-element Codes

A no-element code consists of one single tag, which is called the no-element tag.
   The no-element tag consists of a no-element code open delimiter (NCO), a generic identifier (optionally followed by an attribute string) and a no-element code close delimiter (NCC).

2.4.2 One-element Codes

A one-element code consists of a one-element start tag, a coded element and a one-element end tag.
   The one-element start tag consists of a one-element start tag open delimiter (OSO), a generic identifier (optionally followed by an attribute string) and a one-element start tag close delimiter (OSC).
   The one-element end tag consists of a one-element end tag open delimiter (OEO), the same generic identifier as the start tag and a one-element end tag close delimiter (OEC).

2.4.3 Poly-element Codes

A poly-element code consists of a poly-element start tag, two or more coded elements separated by multi-element separator tags and a multi-element end tag.
   The poly-element start tag consists of a multi-element start tag open delimiter (MSO), a generic identifier (optionally followed by an attribute string), a multi-element number delimiter (MNC), an element number and a poly-element start tag close delimiter (PSC).
   The multi-element separator tag consists of a multi-element separator tag open delimiter (MDO), the same generic identifier as the poly-element start tag and a multi-element separator tag close delimiter (MDC).
   The multi-element end tag consists of a multi-element end tag open delimiter (MEO), the same generic identifier as the poly-element start tag and a multi-element end tag close delimiter (MEC).
   The number of coded elements contained by a poly-element code is indicated by the element number. The number of multi-element separator tags contained by a particular poly-element code token equals the number of coded elements minus one.
   Poly-element codes may contain two or more elements, and the number of elements contained by different tokens of the same poly-element code in a document may vary from token to token.
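The relation between the element number and the number of separator tags can be checked mechanically. The sketch below assumes a single poly-element code in full (unreduced) default markup with no other codes nested inside it; `poly_element_counts` is a hypothetical helper name.

```python
import re

def poly_element_counts(code):
    """Return (declared, actual) element counts for one unnested,
    unreduced poly-element code written with the default delimiters."""
    start = re.match(r"\[([A-Za-z0-9._-]+)/(\d+)\|", code)
    if start is None or not code.endswith("/" + start.group(1) + "]"):
        raise ValueError("not a full poly-element code")
    gi, declared = start.group(1), int(start.group(2))
    # Everything between the start tag and the end tag "/gi]".
    body = code[start.end():len(code) - len(gi) - 2]
    separators = body.count("/" + gi + "|")
    # n coded elements are separated by n - 1 separator tags.
    return declared, separators + 1
```

A code token is consistent when both numbers agree, as in `[a/2| x /a| y /a]`; a token such as `[a/3| x /a| y /a]` declares three elements but contains only two.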

2.4.4 N-element Codes

An N-element code consists of an N-element start tag, one or more coded elements separated by multi-element separator tags and a multi-element end tag.
   N-element codes are syntactically identical to poly-element codes, except that: (1) the start tag close delimiter is an N-element start tag close delimiter (NSC); and (2) the number of elements contained by different tokens of the same N-element code in a document may not vary from token to token.

2.4.5 Character Representation Codes

A character representation code consists of one single tag, which is called the character representation tag.
   The character representation tag consists of a character representation code open delimiter (CRO), a generic identifier and either a character code close delimiter (CCC) or a character disambiguation code open delimiter (CDO).

If the generic identifier is followed by a character disambiguation code open delimiter (CDO), the character representation code is used in conjunction with a character disambiguation code immediately succeeding it, like this:

  {a\a}
where 'a' is the generic identifier of a character representation code and also of a character disambiguation code. If accompanied by a character disambiguation code, the character representation code may, instead of a generic identifier, contain a string of free characters enclosed by character quote delimiters (CQDs); cf. 2.4.6 for further explanation of this feature.

2.4.6 Character Disambiguation Codes

A character disambiguation code consists of one single tag, which is called the character disambiguation tag.
   The character disambiguation tag consists of a character disambiguation code open delimiter (CDO), a generic identifier and a character code close delimiter (CCC).

A character disambiguation code can only be used in conjunction with a character representation code immediately preceding it. The close delimiter of the character representation code is then replaced by the open delimiter of the character disambiguation code.

The preceding character representation code may, instead of a generic identifier, contain a string of free characters, enclosed by character quote delimiters (CQD), like this:

  {"---"\a}
where 'a' is the generic identifier of a character disambiguation code and '---' is a string of free characters.
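The three forms of character code described in 2.4.5 and 2.4.6 can be recognised with a single pattern. This is a sketch only, assuming the default delimiters and tag characters; `CHAR_CODE` and `parse_char_code` are illustrative names.

```python
import re

CHAR_CODE = re.compile(
    r'\{(?:(?P<rep>[A-Za-z0-9._-]+)|"(?P<quoted>[^"]*)")'
    r'(?:\\(?P<dis>[A-Za-z0-9._-]+))?\}'
)

def parse_char_code(text):
    """Split a character code into its parts, or return None if malformed."""
    match = CHAR_CODE.fullmatch(text)
    if match is None:
        return None
    # A quoted string of free characters is only allowed when a
    # character disambiguation code follows it.
    if match.group("quoted") is not None and match.group("dis") is None:
        return None
    return {
        "representation": match.group("rep"),
        "quoted": match.group("quoted"),
        "disambiguation": match.group("dis"),
    }
```

With this sketch, `{a}` has a representation identifier only, `{a\a}` has both identifiers, and `{"---"\a}` has a quoted string and a disambiguation identifier.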

2.4.7 MECS Comments

A MECS comment may contain free characters, tag characters and code delimiters, i.e. any legal characters, in any order. The contents of a comment are not regarded as part of the code structure of a document.

A comment starts with a one-element start tag open delimiter (OSO) immediately followed by a poly-element start tag close delimiter (PSC), and ends with a poly-element start tag close delimiter (PSC) immediately followed by a one-element end tag close delimiter (OEC). Thus, in MECS' default character set a comment looks like this:

  <| xxx |>
where 'xxx' stands for any sequence of legal characters.
Exception:

If the pairs NCO - NCC and OSO - OSC are identical (cf. 2.9.1.1, exception to rule no 4), MECS comments end with an OEC only. A comment may then not contain code delimiters except matching pairs of NCO and NCC, OSO and OSC, and OEO and OEC. However, the comment may also include matching pairs of MSO and MEC. Inside such matching pairs of MSO and MEC the comment may contain any legal characters (free characters, tag characters and code delimiters) in any order.
Note:

This exception has been made to enhance compatibility with the SGML reference concrete syntax (cf. 2.12). The exception allows SGML comments like
  <!-- xxxxxxxx -->
or other SGML declarations like e.g.
<![PCDATA[ >> < < </xxx &]]>
to be valid in SGML-like MECS documents.
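Since the contents of a comment are not part of the code structure, a processor may discard comments before analysing a document. A minimal sketch for the default character set, where the exception for SGML-like systems does not apply, might look as follows; `strip_comments` is a hypothetical name, and the sketch assumes that the two-character sequence which ends a comment does not occur inside it.

```python
import re

def strip_comments(text):
    """Remove default-syntax MECS comments, delimiters included."""
    return re.sub(r"<\|.*?\|>", "", text, flags=re.DOTALL)
```

Tags occurring inside a comment disappear together with it and are therefore never interpreted as codes.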

2.5 Generic identifiers and attribute strings

A MECS code system may include any number of generic identifiers 10 for each of the different code types except for comments, which do not have generic identifiers.

A no-element code and a one-element code may not share the same generic identifier, nor may a poly-element code and an N-element code. Apart from this, codes belonging to different code types may have identical generic identifiers.

Thus, the following examples would be legal and might all be included in one and the same document conforming to a MECS code system:

 

 (1) <s> 

 (2) <a/ ... /a> 

 (3) [a/2| ... /a| ... /a] 

 (4) [s/2\ ... /s| ... /s] 

 (5) {a} 

 (6) {a\a} 

MECS also allows for the use of numerals as identifiers of one-element codes, so that codes may be used with natural numbers in the place of generic identifiers, e.g.
 

 (7) <1/ ... /1> 

 (8) <2/ ... /2> 

etc.

Start tags of element codes may contain attribute strings: if the tag's generic identifier is followed by a string delimiter or a nil character (i.e. in the normal position of the tag close delimiter), then the rest of the tag may contain any sequence of free characters (except for any that might be identical to code delimiters), ending with the tag close delimiter.
   I.e., given the examples (1) and (2) above, the following examples would also be legal:

 

 (9) <s This is an attribute string> 

(10) <a attribute=value n=1/ ... /a> 
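Separating a generic identifier from its attribute string can be sketched as follows, assuming the default delimiters with a blank as string delimiter. The helper name `split_tag` is an assumption of this sketch, and only no-element tags and one-element start tags are covered.

```python
import re

START = re.compile(r"<([A-Za-z0-9._-]+)(?: ([^<>/\[\]|{}]*))?([>/])")

def split_tag(tag):
    """Return (tag kind, generic identifier, attribute string) or None."""
    match = START.fullmatch(tag)
    if match is None:
        return None
    gi, attributes, close = match.groups()
    kind = "no-element" if close == ">" else "one-element start"
    return kind, gi, attributes or ""
```

Applied to example (9) this gives the identifier 's' and the attribute string 'This is an attribute string'; applied to the start tag of example (10) it gives 'a' and 'attribute=value n=1'.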

2.6 Markup Reduction
The general rules for markup reduction are:
1. End tags and separator tags may be reduced to their respective tag close delimiters, provided that their generic identifiers are identical to the generic identifier of the last preceding start or separator tag of an unterminated code belonging to the same code type.
2. In a multi-element code with two elements the start tag may be reduced to its open delimiter, generic identifier and tag close delimiter.

The implications of these rules for each specific code type are as follows:

No-element codes cannot be reduced.

A one-element end tag with a generic identifier identical to the generic identifier of the last preceding unterminated one-element start tag may be reduced to the one-element end tag close delimiter.

A poly-element start tag may, if the code has 2 elements, be reduced to multi-element start tag open, generic identifier and poly-element start tag close.

An N-element start tag may, if the code has 2 elements, be reduced to multi-element start tag open, generic identifier and N-element start tag close.

   A multi-element separator tag with a generic identifier identical to the last preceding unterminated multi-element start tag or separator tag may be reduced to the multi-element separator tag close.
   A multi-element end tag with a generic identifier identical to the last preceding unterminated multi-element separator tag may be reduced to the multi-element end tag close delimiter.

Character representation codes, character disambiguation codes and comments cannot be reduced.

Thus, the examples (2)-(4) above may be reduced to

 

 (11) <a/ ... > 

 (12) [a| ... | ... ] 

 (13) [s\ ... | ... ] 

2.7 Document Structure

A MECS document may include the header of a MECS system to which it conforms (cf. 2.10). If so, the first character of the header must also be the very first character of the document.

Apart from this optional header, a MECS document consists of codes and elements appearing in any order.

That a code A contains a code B means that one or more of B's tags are contained in A's coded element(s).
   B is hierarchically nested within A if A contains B, and B does not contain A.
   A and B overlap if A contains B, and B contains A.

Any order and nesting level of codes is allowed, with one exception: no multi-element code may overlap any other multi-element code 11.

This means that the following examples are all legal:

 

 (14) <a/   /a>   <b/  /b> 

 (15) <a/   <a/   /a>  /a> 

 (16) <a/   <b/   /b>  /a> 

 (17) <a/   <b/   /a>  /b> 

 

 (18) <a/   [a/2| /a|  /a>  /a] 

 

 (19) [a/2| [s/2\ /s|  /s]  /a| /a] 

 (20) [a/2| [a/3| /a| /a| /a] /a| [t/3\ /t| /t| /t] /a] 

 (21) [a/2| <a/ <s> {a\a} [b/2| [s/2\ {a\a} /s| {b} /s] 

      <b/ {b\a} /b| /a> /b]  <s> /a| /b> /a] 

However, the following example is illegal:
 

   [s/2\  [b/2| /s|  /b|  /s]  /b] 
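The definitions of containment, nesting and overlap given above can be expressed directly in terms of tag positions. In this sketch each code is represented by the sorted positions of its tags in the document (a representation chosen here for illustration), and `contains`, `relation` and `is_legal` are hypothetical names.

```python
def contains(a, b):
    """True if at least one tag of code b lies inside code a's coded
    element(s), i.e. strictly between a's first and last tag."""
    return any(a[0] < position < a[-1] for position in b)

def relation(a, b):
    """Classify two codes as 'overlap', 'b nested in a', 'a nested in b'
    or 'separate'."""
    a_contains_b, b_contains_a = contains(a, b), contains(b, a)
    if a_contains_b and b_contains_a:
        return "overlap"
    if a_contains_b:
        return "b nested in a"
    if b_contains_a:
        return "a nested in b"
    return "separate"

def is_legal(a, b, a_is_multi, b_is_multi):
    """Only overlap between two multi-element codes is forbidden."""
    return not (a_is_multi and b_is_multi and relation(a, b) == "overlap")
```

Numbering the tags of each example from 0, example (16) gives `relation((0, 3), (1, 2))`, which is nesting, and example (17) gives `relation((0, 2), (1, 3))`, which is overlap. The illegal example corresponds to the multi-element codes `(0, 2, 4)` and `(1, 3, 5)`, which overlap.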

It should be noted that overlapping reduces the possibilities for markup reduction. For example, (21) above reduces to:
 

 (22) [a|   <a/ <s> {a\a} [b|   [s\   {a\a}   | {b}   ] 

      <b/ {b\a} /b| /a>   ]  <s> /a| /b>   ] 
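The reduction rules of 2.6 can be tried out on a tokenized document. The sketch below rests on one reading of rule 1, chosen because it reproduces (22) from (21): an end or separator tag is reduced only when the most recent start or separator tag of any still unterminated element code belongs to the very code being continued or closed. It assumes unreduced input in the default delimiters, tokens separated by blanks, and no attribute strings; `reduce_markup` is an illustrative name.

```python
import re

GI = r"([A-Za-z0-9._-]+)"

def reduce_markup(tokens):
    """Apply the reduction rules to a list of unreduced tokens."""
    open_codes = []   # (kind, gi); last entry = most recent start/separator tag
    out = []
    for tok in tokens:
        start_one = re.fullmatch("<" + GI + "/", tok)
        start_multi = re.fullmatch(r"\[" + GI + r"/(\d+)([|\\])", tok)
        closing = re.fullmatch("/" + GI + r"([>|\]])", tok)
        if start_one:
            open_codes.append(("one", start_one.group(1)))
            out.append(tok)
        elif start_multi:
            gi, number, close = start_multi.groups()
            open_codes.append(("multi", gi))
            # Rule 2: the start tag of a two-element code may drop its number.
            out.append(("[" + gi + close) if number == "2" else tok)
        elif closing:
            gi, close = closing.groups()
            code = ("one" if close == ">" else "multi", gi)
            # Rule 1, as read in this sketch.
            reducible = bool(open_codes) and open_codes[-1] == code
            out.append(close if reducible else tok)
            # Find the innermost open code this tag belongs to
            # (raises ValueError if there is none).
            index = len(open_codes) - 1 - open_codes[::-1].index(code)
            del open_codes[index]
            if close == "|":          # a separator tag keeps its code open
                open_codes.append(code)
        else:
            out.append(tok)           # no-element tags, character codes, text
    return out
```

Joining the result for the tokens of (21) with single blanks yields the tag sequence of (22), and examples (2)-(4) reduce to (11)-(13).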

2.8 Classification of MECS systems

Any MECS system is either complete or partial.
   A complete MECS system contains all the seven code types described above (cf. 2.4).
   A partial MECS system lacks one or more of the code types of a complete system. Partial systems are called N-type systems, where N is the number of code types contained by the system.

Any MECS system is either reduced, reducible, or irreducible.
   A reduced MECS system demands full reduction of all start tags and separator tags.
   A reducible MECS system permits but does not require reduction of start and separator tags.
   An irreducible MECS system requires that no tags are reduced.

Any MECS system is either restricted or unrestricted.
   In a restricted MECS system, no codes may overlap.
   A system which is not restricted, is unrestricted.

A reduced system is necessarily a restricted system, but not vice versa.

2.9 Character sets

A MECS document consists of legal characters only. The legal characters are the delimiters, free characters and tag characters.

There are several subsets of legal characters, some of which may overlap while others may not. MECS assigns a number of different roles to these character sets and to individual characters, and includes rules concerning the relationships between these sets and between particular members of particular sets.

2.9.1 Delimiters

2.9.1.1 Code delimiters

There are 18 code delimiters. They correspond to the first six types of codes (cf. 2.4) as indicated below.

No-element code delimiters:
   NCO   No-element code open delimiter
   NCC   No-element code close delimiter
One-element code delimiters:
   OSO   One-element start tag open delimiter
   OSC   One-element start tag close delimiter
   OEO   One-element end tag open delimiter
   OEC   One-element end tag close delimiter
Multi-element code delimiters:
   MSO   Multi-element start tag open delimiter
   MNC   Multi-element number delimiter
   PSC   Poly-element start tag close delimiter
   NSC   N-element start tag close delimiter
   MDO   Multi-element separator tag open delimiter
   MDC   Multi-element separator tag close delimiter
   MEO   Multi-element end tag open delimiter
   MEC   Multi-element end tag close delimiter
Character code delimiters:
   CRO   Character representation code open delimiter
   CCC   Character code close delimiter
   CQD   Character quote delimiter
   CDO   Character disambiguation code open delimiter

A distinction is made between tag open delimiters and tag close delimiters.
   Tag open delimiters:  NCO, OSO, OEO, MSO, MDO, MEO and CRO
   Tag close delimiters: NCC, OSC, OEC, MNC, PSC, NSC, MDC, MEC and CCC

Assigning values to the code delimiters is one of the most basic operations in the definition of a MECS code system. The values assigned to the code delimiters decide many of the basic syntactical features of the code system (cf. 2.8, 2.9.5 and 2.9.6).

If a code delimiter is assigned the value nil, the delimiter itself is said to be nil, or undeclared.
   That a character which is the value of a code delimiter is a reserved delimiter value means that it cannot belong to the free characters of the code system defined. A delimiter is said to be reserved if its value is a reserved delimiter value.

Values are assigned to code delimiters according to the following rules:

1. The value assigned to a delimiter may be either nil or a character.
Exception:

OEO may be assigned either nil, a single character, or a string of two characters, as its value.
Note:

The exception has been made to enhance compatibility with the SGML reference concrete syntax (cf. 2.12).
2. Though they may all have identical values, all tag open delimiters must be either reserved or nil.
Note:

If OEO is nil, reduction of one-element end tags will be required, and the code system defined necessarily becomes restricted.
   If MDO and MEO are nil, reduction of multi-element separator and end tags will be required.
3. It is not required that any of the other delimiters (i.e. other than tag open delimiters) are reserved.
Exception:

If OEO, MDO, or MEO is nil, then respectively OEC, MDC and MEC must be reserved.
Note:

Reduction of one-element end tags, multi-element separator tags and multi-element end tags is impossible unless respectively OEC, MDC and MEC are reserved, different from each other, and different from all tag open delimiter values.
4. None of the following ten pairs of delimiters may be identical: NCO - NCC, OSO - OSC, OEO - OEC, MSO - MNC, MSO - PSC, MSO - NSC, MDO - MDC, MEO - MEC, CRO - CDO and CRO - CCC.
Exception:

The pair of delimiters NCO - NCC may be identical to the pair OSO - OSC.
Note:

This exception has been made to enhance compatibility with the SGML reference concrete syntax (cf. 2.12).

2.9.1.2 String delimiter

The string delimiter (SD) may occur anywhere in elements. In tags, SD separates generic identifiers from attribute strings.

SD may not be nil. Its value may not be identical to the value of any code delimiter. SD is always a free character.

If SD is assigned a blank character, line endings and start and end of file will be regarded as equivalents to SD.

2.9.2 Nil character

The nil character may occur anywhere in a document. In tags, the nil character separates generic identifiers from attribute strings.

Its value may not be identical to any of the code delimiters. The nil character is always a free character.

If the nil character is identical to SD, the character value in question will be interpreted as SD, and the system defined is said to contain no nil character.

2.9.3 Free characters

Free characters may occur anywhere in elements.

All legal characters except those which are reserved delimiter values may be included in the set of free characters.

The string delimiter and the nil character always belong to the free characters.

2.9.4 Tag characters

Generic identifiers consist of tag characters.

All legal characters except those which are values of code delimiters may be included in the set of tag characters.

2.9.5 Default character sets

Default code delimiters

The default MECS code delimiters define a complete, unrestricted, reducible system with 10 different delimiter values, 8 of which are reserved.

 

 Header:                   £ < > < / / > [ / | \ / | / ] { " \ } 

 Code delimiters:          < / > [ | ] { \ } " 

 Reserved code delimiters: < / > [ | ] { } 

Default string delimiter: (blank)
Default nil character: (blank)

Default free characters:

 

  a b c d e f g h i j k l m n o p q r s t u v w x y z 

  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 

  1 2 3 4 5 6 7 8 9 0 

  , ; . : - ( ) ! ? " ' 

  * % & = + 

  (blank) 

Default tag characters:

 

  a b c d e f g h i j k l m n o p q r s t u v w x y z 

  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 

  1 2 3 4 5 6 7 8 9 0 

  . _ - 

2.9.6 Examples of alternative character sets

Example 1

A complete, unrestricted, irreducible system is defined by the following header:

 

 £ * * * | * @ | | * \ | @ @ * @ " / @ 

The system has 6 code delimiters, whereof 3 are reserved.
Code delimiters: * | @ \ " /
Reserved code delimiters: * | @

The code syntax of the system is:

 

 No-element code:                 *s* 

 One-element code:                *a|   ... *a@ 

 Poly-element code:               |a|2* ... |a@ ... @a* 

 N-element code:                  |s|2\ ... |s@ ... @s* 

 Character representation code:   @a@    or  @"---"/a@ 

 Character disambiguation code:   @a/a@  or  @"---"/a@ 

 Comment:                         ** xxx *@ 

Example 2

A complete, unrestricted, reducible system is defined by the following header:

 

 £ * * * < * > * % [ \ * | * ] * " _ / 

The system has 11 code delimiters, whereof 4 are reserved.
Code delimiters: * < > % [ \ | ] " _ /
Reserved code delimiters: * > | ]

The code syntax of the system is:

 

 No-element code:                 *s* 

 One-element code:                *a<   ... *a> 

 Poly-element code:               *a%2[ ... *a| ... *a] 

 N-element code:                  *s%2\ ... *s| ... *s] 

 Character representation code:   *a/    or  *"---"_a/ 

 Character disambiguation code:   *a_a/  or  *"---"_a/ 

 Comment:                         *[ xxx [> 

NOTE:

The values '>', '|' and ']' may be defined as free characters instead of reserved code delimiters. We will then get a complete, unrestricted, irreducible system with only one reserved code delimiter.

Example 3

A partial (4-type), restricted, reduced system is defined by the following header:

 

 £ < > < / £ > £ £ / £ £ £ £ £ / £ £ / 

The system has 3 code delimiters, whereof all are reserved.
Code delimiters: < > /
Reserved code delimiters: < > /

The code syntax of the system is:

 

 No-element code:                 <s> 

 One-element code:                <a/ ... > 

 Character representation code:   /a/ 

 Comment:                         </ --- /> 

Example 4

A partial (4-type), unrestricted, reducible system is defined by the following header:

 

 £ < > < > </ > [ £ ! £ £ £ £ ] & £ £ ; 

The system has 8 code delimiters, whereof 6 are reserved.
Code delimiters: < > </ [ ! ] & ;
Reserved code delimiters: < > [ ! ] &

The code syntax of the system is:

 

 No-element code:                 <s> 

 One-element code:                <a> ... </a> 

 Character representation code:   &a; 

 Comment:                         <! xxx [ xxx ] xxx > 

NOTE:

This example corresponds quite closely to SGML's reference concrete syntax (cf. 2.12).

2.10 Code Declaration Table (CDT)

It is only when adding a Code Declaration Table (CDT) to MECS' basic code syntax that we have a MECS code system. The CDT assigns values to the delimiters and other character sets, and declares the actual codes of the system. The CDT itself is a file of characters.

The very first character of the CDT declares the system's string delimiter.

The second character of the CDT declares the system's nil indicator, which in the rest of the CDT indicates an assignment of the value nil.

The third character of the CDT declares the system's nil character.

In the rest of the CDT, all character strings delimited by string delimiters are strings.

The first 18 strings of the table declare the system's code delimiters, in the following order:

 

  NCO NCC 

  OSO OSC OEO OEC 

  MSO MNC PSC NSC MDO MDC MEO MEC 

  CRO CQD CDO CCC 

Together, the first three characters and the first 18 strings of the CDT form a string defining the system's header. The default MECS header is:
 

  £ < > < / / > [ / | \ / | / ] { " \ } 

(Note that the very first character of the header is a blank.)
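Reading a header back into named delimiters follows directly from this layout. The sketch below is an illustration only: `DELIMITER_NAMES`, `parse_header` and the two example functions are hypothetical names, nil values are represented by `None`, and the header is assumed to be given as one line.

```python
DELIMITER_NAMES = ("NCO NCC OSO OSC OEO OEC MSO MNC PSC NSC "
                   "MDO MDC MEO MEC CRO CQD CDO CCC").split()

def parse_header(header):
    """Return (string delimiter, nil character, delimiter dictionary)."""
    sd, nil_indicator, nil_char = header[0], header[1], header[2]
    strings = [s for s in header[3:].split(sd) if s]
    if len(strings) != len(DELIMITER_NAMES):
        raise ValueError("a header declares exactly 18 code delimiters")
    delimiters = {name: (None if value == nil_indicator else value)
                  for name, value in zip(DELIMITER_NAMES, strings)}
    return sd, nil_char, delimiters

def one_element_example(d, gi="a"):
    """Spell out an unreduced one-element code in the declared syntax."""
    return f"{d['OSO']}{gi}{d['OSC']} ... {d['OEO']}{gi}{d['OEC']}"

def poly_element_example(d, gi="a"):
    """Spell out an unreduced two-element poly-element code."""
    return (f"{d['MSO']}{gi}{d['MNC']}2{d['PSC']} ... "
            f"{d['MDO']}{gi}{d['MDC']} ... {d['MEO']}{gi}{d['MEC']}")

DEFAULT_HEADER = ' £ < > < / / > [ / | \\ / | / ] { " \\ }'
```

With the default header the two example functions reproduce the code syntax shown in 2.4; with the header of Example 1 in 2.9.6 (preceded by a blank) they reproduce the syntax listed there.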

The next five strings (i.e. strings nos 19, 20, 21, 22 and 23) declare the system's free characters.

Strings number 24 and 25 declare the system's tag characters.

The next six strings (i.e. strings nos 26, 27, 28, 29, 30 and 31) declare the CDT's code type indicators, which in the rest of the CDT indicate which code type a generic identifier belongs to.

String no 26 declares the no-element code indicator. String no 27 declares the one-element code indicator. String no 28 declares the numeric indicator. String no 29 declares the poly-element code indicator. String no 30 declares the character representation code indicator. String no 31 declares the character disambiguation code indicator.

The first part of the CDT, i.e. the part beginning with SD and including the first 31 strings, is the code syntax part of the CDT.

The rest of the CDT is the code inventory part. This part consists of pairs of strings, the first of which is a code type indicator and the second a generic identifier. Each pair declares a code of the indicated type and assigns a generic identifier to it.

If the numeric indicator is not nil, numbers indicate N-element codes in the rest of the table, and one-element codes with numeric identifiers (cf. 2.5) are declared by replacing the generic identifier with the numeric indicator.

The following is an example of a CDT defining a complete, unrestricted, reducible system with the MECS default character sets defined above (cf. 2.9.5).

(23)

 

+---------------------------------------+-----------+---------+
| £ < > < / / > [ / | \ / | / ] { " \ } | Header    |         |
+---------------------------------------+-----------+         |
|                                       |           |         |
|abcdefghijklmnopqrstuvwxyz             | Free      |         |
|ABCDEFGHIJKLMNOPQRSTUVWXYZ             | char-     |         |
|1234567890                             | acters    |         |
|,;.:-()!?"'                            |           |Code     |
|*%&=+                                  |           |Syntax   |
+---------------------------------------+-----------+Part     |
|abcdefghijklmnopqrstuvwxyz             | Tag char- |         |
|ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890._-| acters    |         |
+---------------------------------------+-----------+         |
|no   one     num  poly   rep   dis     | Code type +---------+
|                                       | indicators|         |
+---------------------------------------+-----------+         |
|no   s                                 |           |         |
|one  a                                 |           |         |
|one  b                                 |           |         |
|one  num                               |           |         |
|poly a                                 |           |Code     |
|poly b                                 |           |Inventory|
|2    s                                 |           |Part     |
|3    t                                 |           |         |
|rep  a                                 |           |         |
|rep  b                                 |           |         |
|dis  a                                 |           |         |
+---------------------------------------+-----------+---------+

The following document, which contains the examples (1)..(22) above, conforms to the above CDT.

(24)

 

  (1) <s> 

  (2) <a/ ... /a> 

  (3) [a/2| ... /a| ... /a] 

  (4) [s/2\ ... /s| ... /s] 

  (5) {a} 

  (6) {a\a} 

  (7) <1/ ... /1> 

  (8) <2/ ... /2> 

  (9) <s This is an attribute string> 

 (10) <a attribute=value n=1/ ... /a> 

 (11) <a/ ... > 

 (12) [a| ... | ... ] 

 (13) [s\ ... | ... ] 

 (14) <a/   /a>   <b/  /b> 

 (15) <a/   <a/   /a>  /a> 

 (16) <a/   <b/   /b>  /a> 

 (17) <a/   <b/   /a>  /b> 

 (18) <a/   [a/2| /a|  /a>  /a] 

 (19) [a/2| [s/2\ /s|  /s]  /a| /a] 

 (20) [a/2| [a/3| /a| /a| /a] /a| [t/3\ /t| /t| /t] /a] 

 (21) [a/2| <a/ <s> {a\a} [b/2| [s/2\ {a\a} /s| {b} /s] <b/ 

      {b\a} /b| /a> /b]  <s> /a| /b> /a] 

 (22) [a|   <a/ <s> {a\a} [b|   [s\   {a\a}   | {b}   ] <b/ 

      {b\a} /b| /a>   ]  <s> /a| /b>   ] 

2.11 Deducing a Minimal CDT from an Encoded Document

Although it is unequivocally decidable whether any particular document conforms to the MECS code system defined by any particular CDT, an indefinite number of documents conform to any particular CDT, and any particular document conforms to an indefinite number of CDTs.

However, if a document contains the header of any CDT to which it conforms (cf. 2.9.1.1), one particular CDT to which the document conforms may be deduced directly from the document alone.

By virtue of the rules for assigning values to code delimiters (cf. 2.9.1.1), the basic syntactic features of all tags occurring in a MECS-conforming document are directly deducible from their code delimiters. The deduction can be done without look-ahead, unless the delimiter pairs NCO - NCC and OSO - OSC are identical (cf. 2.9.1.1, exception to rule no 4).

This holds true for all MECS-conforming documents, whether the system to which they conform is partial or complete, whether it is restricted or unrestricted, and whether it is reduced, reducible, or irreducible.

The CDT thus deducible from the document is called the document's minimal CDT. Any particular MECS-conforming document has one and only one minimal CDT 12.

It is therefore recommended that all MECS documents contain a header. Some examples follow below.

Document (24) has the following minimal CDT:

(25)

 

  £ < > < / / > [ / | \ / | / ] { " \ } 

 T 

 abeghilnrstuv 

 0123456789 

 . 

 ()= 

 abst 

 12 

 n o # p r m 

 n s 

 o # 

 o a 

 o b 

 p a 

 p b 

 2 s 

 3 t 

 r a 

 r b 

 m a 

Note:

(24) conforms to (23) as well as to (25), and, in principle, to an indefinite number of other CDTs. However, (25) is (24)'s minimal CDT.
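The CDT layout described in 2.10 can be read back with very little code. The following sketch lists the codes declared in the code inventory part; it assumes a blank string delimiter, so that any run of blanks or line endings separates strings, and it uses the hypothetical names `TYPE_NAMES` and `parse_inventory`. CDT (25) serves as sample input.

```python
TYPE_NAMES = ["no-element", "one-element", "numeric", "poly-element",
              "character representation", "character disambiguation"]

def parse_inventory(cdt):
    """Return the (code type, generic identifier) pairs declared by a CDT."""
    nil = cdt[1]                      # second character: nil indicator
    strings = cdt[3:].split()         # assumes a blank string delimiter
    indicators = strings[25:31]       # strings nos 26-31
    names = {ind: name for ind, name in zip(indicators, TYPE_NAMES)
             if ind != nil}
    numeric = indicators[2] if indicators[2] != nil else None
    items = strings[31:]              # code inventory part, in pairs
    declared = []
    for indicator, gi in zip(items[0::2], items[1::2]):
        if numeric is not None and indicator.isdigit():
            declared.append((f"N-element ({indicator} elements)", gi))
        elif numeric is not None and gi == numeric \
                and names.get(indicator) == "one-element":
            declared.append(("one-element", "<numeric identifiers>"))
        else:
            declared.append((names[indicator], gi))
    return declared

CDT_25 = r""" £ < > < / / > [ / | \ / | / ] { " \ }
 T
 abeghilnrstuv
 0123456789
 .
 ()=
 abst
 12
 n o # p r m
 n s
 o #
 o a
 o b
 p a
 p b
 2 s
 3 t
 r a
 r b
 m a
"""
```

`parse_inventory(CDT_25)` lists eleven declarations, beginning with the no-element code 's' and ending with the character disambiguation code 'a'.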

The following document

 

  _ [ ] [ | _ ] _ _ | _ _ _ _ _ _ _ _ _ 

 

 This document contains no-element [NO] and 

 [ONE| one-element ] codes, and [|comments|]. 

 No more. 

 

 Its minimal CDT will define a system which is 

 partial, restricted (since there is no 

 one-element end tag open delimiter), and 

 reduced (for the same reason). 

 

 This document contains no-element [NO] and 

 [ONE| one-element ] codes, and [|comments|]. 

 No fun. 

has the following minimal CDT
 

  _ [ ] [ | _ ] _ _ | _ _ _ _ _ _ _ _ _ 

 CDINT 

 acdefghilmnoprstuwyz 

 _ 

 ,. 

 ()- 

 _ 

 ENO 

 n o _ _ _ _ 

 n NO 

 o ONE 
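The code inventory part of a minimal CDT can be approximated by scanning a document for start tags and character codes. The sketch below is a simplification under stated assumptions: it presupposes the default delimiters instead of reading them from the document's header, it does not deduce the free and tag character strings, and it does not skip comments. `deduce_inventory` is a hypothetical name.

```python
import re

GI = r"([A-Za-z0-9._-]+)"
ATTR = r"(?: [^<>/\[\]|{}]*)?"     # optional attribute string

def deduce_inventory(document):
    """Collect (code type, generic identifier) pairs from the tags of a
    document written with the default delimiters."""
    found = set()
    for gi in re.findall("<" + GI + ATTR + ">", document):
        found.add(("no-element", gi))
    for gi in re.findall("<" + GI + ATTR + "/", document):
        found.add(("one-element", "numeric" if gi.isdigit() else gi))
    for gi, _ in re.findall(r"\[" + GI + ATTR + r"(?:/(\d+))?\|", document):
        found.add(("poly-element", gi))
    for gi, number in re.findall(r"\[" + GI + ATTR + r"(?:/(\d+))?\\", document):
        # A reduced N-element start tag implies two elements (cf. 2.6).
        found.add(("N-element:" + (number or "2"), gi))
    for gi in re.findall(r"\{" + GI + r"[}\\]", document):
        found.add(("character representation", gi))
    for gi in re.findall(r"\\" + GI + r"\}", document):
        found.add(("character disambiguation", gi))
    return found
```

Each pair corresponds to one line of the code inventory part, with numeric one-element identifiers collapsed into a single entry in the same way as the line 'o #' in (25).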

2.12 SGML Compatibility

2.12.1 Some general observations

As a first and very rough approximation, it may be said that:
1. SGML documents are MECS-conforming, provided they do not make use of tag minimization or end tag omission 13.
2. Some MECS documents are well-formed SGML documents, others may easily be converted to SGML, yet others may only be converted to SGML with a certain distortion or loss of information.

MECS no-element codes correspond to SGML empty elements (so-called milestones). MECS one-element codes correspond to SGML elements. MECS character representation codes correspond roughly to SGML internal entities. There is nothing in SGML which corresponds directly to the MECS poly-element codes, N-element codes and character disambiguation codes. MECS-aware software will accept, but ignore, SGML attributes and declarations, and interpret SGML entity references differently from SGML applications.

MECS markup reduction rules differ from SGML markup reduction or minimization rules.

Functionally, a MECS document, if stripped of its optional MECS header, corresponds to the SGML document instance. But while the MECS CDT corresponds roughly to the SGML Document Type Definition (DTD), there are fundamental differences between MECS CDTs and SGML DTDs.

The following features of MECS are exceptions and deviations from the main outline of the basic code syntax which have been made in order to enhance SGML compatibility.

2.12.2 From SGML to MECS

An SGML document is a MECS-conforming document if no tag minimization or end tag omission has been used 14.

However, MECS software will interpret attributes, declarations and entity references differently from the way they are interpreted by an SGML application. In MECS, SGML attributes will be regarded as attribute strings and ignored. All SGML declarations, including the DTD as well as marked sections and comments, will be regarded simply as MECS comments and thus also ignored. SGML entities will be interpreted as character representation codes.

It is possible to define MECS code delimiters so that they agree closely with the corresponding parts of the SGML reference concrete syntax (cf. 2.9.6, example 4).

For example, the following SGML document 15

 

 <!DOCTYPE TEI.1 SYSTEM "c:\tei\public\tei1.dtd" [ 

      <!ENTITY tla "Three Letter ACROnym"> 

      <!ELEMENT my.tag - - (#PCDATA)> 

           <!-- following line added by C.H. --> 

      <!ELEMENT my.stone - o EMPTY> 

      <!-- any other special-purpose declarations or 

           re-definitions go in here --> 

 ]> 

 <tei.1> 

      This is an instance of a modified TEI.1 type document, 

      which may contain <my.tag>my special tags</my.tag>, 

           <!-- following line added by C.H. --> 

      including milestones such as <my.stone>, and 

      references to my usual entities such as &tla;. 

 </tei.1> 

is a MECS-conforming document, with the following minimal CDT
 

  £ < > < > </ > [ £ ! £ £ £ £ ] & £ £ ; 

 EIT 

 acdefghilmnoprstuwy 

 1 

 ,. 

 £ 

 aegilmnosty 

 .1 

 n o £ £ r £ 

 n my.stone 

 o my.tag 

 o tei.1 

 r tla 

2.12.3 From MECS to SGML

A MECS document is a well-formed SGML document provided that:

1. the document does not contain the optional MECS header
2. no markup reduction has been used
3. either the document does not contain character disambiguation codes, or the character quote delimiter and the character disambiguation code open delimiter are legal characters within SGML entity references
4. the pair of delimiters NCO - NCC is identical to OSO - OSC
5. attribute strings do not occur, or all attribute strings conform to the syntax of SGML attributes
6. the document does not contain multi-element codes
7. the entire document is a one-element code
8. no codes in the document overlap

The conversion of MECS documents not satisfying conditions 1 to 7 to SGML-conforming document instances is a straightforward process, provided they satisfy condition 8.

The conversion of MECS documents not satisfying condition 8 to SGML-conforming documents is likely to be a rather complicated process, and may lead to distortion or loss of information. There are two ways in which such documents can be converted: 1) either all occurrences of overlap can be eliminated (cf. Part II, ##); 2) or one has to identify sets of codes in the document which do not overlap, and define concurrent DTDs for each of these sets.

SGML applications are likely to interpret attribute strings and character disambiguation codes differently from the way they are interpreted by MECS software.
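For the simplest case, the conversion of tags can be sketched with a few substitutions. The sketch assumes a document in full default markup without header, attribute strings, multi-element codes, character disambiguation codes or overlap, and it targets delimiters like those of Example 4 in 2.9.6; `mecs_to_sgml` is an illustrative name, not a tool from the MECS Program Package.

```python
import re

GI = r"[A-Za-z0-9._-]+"

def mecs_to_sgml(text):
    """Rewrite default-syntax MECS tags in SGML-like notation."""
    text = re.sub(r"/(" + GI + r")>", r"</\1>", text)    # one-element end tags
    text = re.sub(r"<(" + GI + r")/", r"<\1>", text)     # one-element start tags
    text = re.sub(r"\{(" + GI + r")\}", r"&\1;", text)   # character representation codes
    return text
```

No-element tags are left unchanged, since they already have the shape of SGML empty element tags. A corresponding DTD still has to be supplied separately.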

2.13 Revision History

A preliminary version of MECS was drafted in February 1990 16. Version 1.00 was finished in February 1991 17. Version 1.01, of June 1992 18, consisted in a slight revision of the CDT format. The revision did not necessitate changes to version 1.00 documents.

Version 2.00, which was finished in August 1993 19, includes minor changes in the CDT format, the MECS header, and the basic syntax of one of the code types, i.e. the N-element codes. Therefore, transition from earlier versions to version 2.00 necessitates changes to CDTs and may also require changes to MECS documents encoded according to these versions.

2.14 Plans for MECS Version 3

MECS version 3 will represent a simplification of the structures already present in earlier versions. At the same time, version 3 will offer new capabilities and new and more powerful mechanisms.

MECS version 2 no-element codes, one-element codes and N-element codes have one thing in common: their number of elements is fixed. In version 3, therefore, they will all be subsumed under one category, which will be called N-element codes. Poly-element codes will be retained, with the modification that they may contain any number of elements including 0 or 1 (i.e., the number of elements no longer has to be higher than 1).

Character representation codes and character disambiguation codes will be retained.

The four remaining code types of version 3 may be exemplified as follows, in default notation:

 

  Full markup                Reduced markup 

N-element codes:
 

  <tag> 

  <tag/ ... /tag>            <tag/ ... > 

  <tag/ ... /tag| ... /tag>  <tag/ ... | ... > 

Poly-element codes:
 

  <tag_0> 

  <tag_1/ ... /tag>             <tag< ... > 

  <tag_2/ ... /tag| ... /tag>   <tag| ... | ... > 

Character codes:
 

  {tag} 

  {tag_tag} 

The character code close delimiter may be left out if immediately followed by a string delimiter or a reserved code delimiter, as follows:
 

  {tag}           {tag 

  {tag}<          {tag< 

  {tag}/          {tag/ 

  {tag}|          {tag| 

  {tag}>          {tag> 

  {tag}{          {tag{ 
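The reduction rule for character codes can be illustrated with a minimal Python sketch. It is not part of MECS or of the MECS Program Package; the function name reduce_char_codes is hypothetical, and the sketch assumes that the string delimiter is a blank and that '}' occurs in the text only as a character code close delimiter.

```python
# Characters after which the close delimiter '}' may be left out.
# Assumption: the string delimiter is a blank; the other entries are the
# reserved code delimiters shown in the table above.
OMIT_CLOSE_BEFORE = {" ", "<", "/", "|", ">", "{"}


def reduce_char_codes(text: str) -> str:
    """Drop every '}' that is immediately followed by one of the delimiters.

    Assumption: '}' is used only to close character codes.
    """
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "}" and nxt in OMIT_CLOSE_BEFORE:
            continue  # the following delimiter already ends the code
        out.append(ch)
    return "".join(out)
```

A close delimiter followed by any other character (for instance a comma), or standing at the very end of the text, is kept.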

Inclusion of mechanisms similar to those of SGML external entities will be considered.

Comments and marked sections

Comments will be similar to version 2 comments, but the syntax will be changed so as to facilitate processing of SGML documents:

 <|-- ... --|>
Inclusion of mechanisms similar to the SGML marked sections (with keywords IGNORE, CDATA, RCDATA and INCLUDE) will be considered:
 

 <|[IGNORE[  ... ]]|> 

 <|[CDATA[   ... ]]|> 

 <|[RCDATA[  ... ]]|> 

 <|[INCLUDE[ ... ]]|> 

Attributes

In earlier versions, a tag consists of a generic identifier which may be followed by an attribute string. The attribute strings play no role in the earlier versions, except to increase compatibility with SGML by leaving a space open for SGML attributes. Version 3 will follow up this strategy by incorporating most or all the syntactical features of SGML attributes.

In addition, a syntax for structured attributes proposed by Peter Cripps will be considered for inclusion in MECS (cf. Cripps 1996).

Discontinuation

An element opened by a start tag or delimiter tag may be discontinued by

 

  _tag| 

and then resumed again by
 

  |tag_ 

e.g. like this:
 

<tag/ ... _tag| --- |tag_ ... /tag> 

(In the example, the element indicated by '---' does not belong to the code's coded elements.)
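As an illustration only, the following Python sketch shows how a program might collect the coded element of a discontinued code while leaving out the interrupting material. The syntax is the one planned above; the function name element_content is hypothetical, and the sketch handles only a single, non-nested element with full end tag.

```python
import re


def element_content(text: str, gi: str) -> str:
    """Return the content of the first <gi/ ... /gi> element, leaving out
    every stretch between a discontinuation (_gi|) and a resumption (|gi_).

    Assumption: the element is not nested inside another element with the
    same generic identifier.
    """
    g = re.escape(gi)
    m = re.search(rf"<{g}/(.*?)/{g}>", text, flags=re.S)
    if m is None:
        raise ValueError(f"no {gi} element found")
    # Remove each discontinued stretch together with its two delimiters.
    return re.sub(rf"_{g}\|.*?\|{g}_", "", m.group(1), flags=re.S)
```

For the text `<q/one _q| aside |q_two/q>` the function returns `one two`: the word 'aside' lies between the discontinuation and the resumption and is therefore not part of the coded element.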

Overlap

In version 3, codes of all types may overlap codes of any type (whereas in version 2 multi-element codes cannot overlap each other).

One problem with earlier versions is that tokens of the same code type cannot overlap:

 

  <s/  <s/  /s>  /s> 

will necessarily be interpreted as two hierarchically nested codes. In MECS version 3, a tag may include a special code token identifier which serves to overcome this limitation, e.g. as follows:
 

 <s #1/   <s #2/  /s #1>  /s #2> 
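The effect of code token identifiers can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of any MECS software: the names pair_tags and overlaps are hypothetical, only one-element codes with full end tags are handled, and a token identifier is assumed to be written as ' #' followed by digits, as in the example above.

```python
import re

# A start tag such as <s/ or <s #1/, or an end tag such as /s> or /s #1>.
TAG = re.compile(
    r"<(?P<s_gi>\w+)(?: #(?P<s_id>\d+))?/"
    r"|/(?P<e_gi>\w+)(?: #(?P<e_id>\d+))?>"
)


def pair_tags(text):
    """Pair start and end tags; return (gi, token_id, start, end) spans.

    An end tag closes the innermost open start tag with the same generic
    identifier and the same token identifier (None when absent).
    """
    open_tags = []  # (gi, token_id, offset of the start tag)
    spans = []
    for m in TAG.finditer(text):
        if m.group("s_gi"):
            open_tags.append((m.group("s_gi"), m.group("s_id"), m.start()))
            continue
        gi, token_id = m.group("e_gi"), m.group("e_id")
        for i in range(len(open_tags) - 1, -1, -1):
            if open_tags[i][0] == gi and open_tags[i][1] == token_id:
                spans.append((gi, token_id, open_tags[i][2], m.end()))
                del open_tags[i]
                break
        else:
            raise ValueError(f"end tag without start tag at offset {m.start()}")
    return spans


def overlaps(a, b):
    """True if the two spans cross each other without one containing the other."""
    return a[2] < b[2] < a[3] < b[3] or b[2] < a[2] < b[3] < a[3]
```

Without token identifiers, the two s-codes in `<s/ <s/ /s> /s>` are paired as nested elements; with the identifiers #1 and #2 the same sequence of tags yields two overlapping elements.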

Document Structure

In version 3, there will be no restrictions on combinations of codes whatsoever: all codes may nest arbitrarily deep and codes of all types may overlap with each other.

Master Documents

As a result of the changes described above the MECS header format will be simplified.

As in earlier versions, the syntactical role of every tag can be deduced directly from its delimiters. If a document includes a MECS header it will therefore still be possible to deduce a document's entire code syntax, including its code inventory, from the encoded document itself.

Unlike earlier versions, however, the formal specification of the code system will not be contained in a Code Declaration Table (CDT), but in a Master Document, which is itself a well-formed MECS document. Correspondingly, the master document deducible from any well-formed MECS document is called its Minimal Master Document. It also follows that any Minimal Master Document is its own Minimal Master Document.

In addition, Format Master Documents will allow for the inclusion of element format declarations. A Format Master Document specifies for each of the codes in a code system whether its coded element should correspond to some specific format such as free text, numeric characters only, a date in some standard format, a closed list of string values, and so on.

As with version 2, all SGML documents will be formally MECS-conforming documents. However, the functional compatibility of version 3 with SGML will be improved.


3 MECS PROGRAM PACKAGE: USER GUIDE

This User Guide is intended to help you get started quickly with the program package. It does not cover all aspects or details of the programs. For more detailed technical information, cf. 3 below. Some knowledge of the basic MECS syntax is presupposed; cf. Part I section 1 for a brief introduction.

Peter Cripps of the Wittgenstein Archives has written a menu-driven user interface integrating all aspects of the MECS Program Package. This user interface will be documented separately and made available later.

3.1 Installation and System Requirements

All programs in the package run on IBM PCs with DOS version 3.x or later, and compatibles.

Users will normally receive a copy of the package on a floppy disk or as a zip archive containing a directory called MECS. To install the package, copy all files to a separate directory on your hard disk called e.g. 'C:\MECS', and add the full path name to your path string, e.g. by adding 'C:\MECS' to the path command in your AUTOEXEC.BAT file as follows:
   PATH=C:\;C:\DOS;...;C:\MECS
In all examples given below it will be assumed that this installation procedure has been followed.

The package occupies less than 700 Kb of disk space. A hard disk is recommended, although the package will also run on a floppy disk system. Memory requirements depend on the size of your documents. It is possible to run the programs with less than 200 Kb available memory, but in most cases you will need more. The programs do not make use of extended or expanded memory.

3.2 A Note to SGML Users

Users with no intention to use the MECS Program Package for processing of SGML documents or conversion of MECS documents to SGML may skip this section.

Users who do have such intentions will find the rest of this User Guide to be of help as an introduction to the MECS Program Package, even though the examples discussed here are not SGML examples.

Roughly, all SGML documents are MECS-conforming, and all documents created or modified in MECS can be converted to SGML.

However, this requires some qualification: SGML documents are MECS-conforming only provided that they do not make use of tag minimization or end tag omission 20. The MECS Program Package provides tools which ensure that any document you create in MECS either is or can be converted to an SGML-conforming document instance (). The Program Package also allows you to take steps to ensure that such conversion leads to no loss or distortion of information ().

You can test your SGML documents for MECS conformance with the program MECSVAL. SGML documents are MECS-conforming in virtue of certain exceptions and deviations from the main outline of the basic syntax of MECS which have been made precisely in order to enhance SGML compatibility (cf. Part I, ##) 21.

MECSVAL is the only program in the MECS Program Package which takes all of these exceptions and deviations fully into consideration 22.

Therefore, if you intend to do any serious work at all with SGML documents by means of the MECS Program Package, it is highly recommended that you first use the program SGMLMECS to convert them to a format accepted also by the other MECS programs. You may convert your documents back to SGML again with the program MECSSGML.

SGML has features and capabilities which MECS does not have, and vice versa. But while MECS 'knows' at least something about SGML, SGML does not 'know about' MECS at all. Features and capabilities of MECS which are not shared by SGML may create SGML syntax errors. MECS, on the other hand, is designed simply to accept and ignore those features of SGML which it does not share with MECS.

If you want to use MECS primarily as a tool to process SGML documents, you should be aware that there are certain features of SGML which though accepted are not supported by MECS ().

If you use MECS to create SGML documents, or want to be able to convert your MECS documents to SGML, you should avoid using MECS features which have no counterpart in SGML, or be aware of the consequences of doing so ().

3.3 Creating and Validating Documents and CDTs

To create your first MECS document, type the command
   MECSVAL
at the DOS prompt and press return. The following menu will be displayed in the upper part of the screen:

 

+------------------------------------------------------------+ 

|C:\MYDIR               Mem:  433018     MECSVAL version 2.01| 

+------------------------------------------------------------+ 

|L LOG:               I Info               1 List directory  | 

|C CDT:               S Switches           2 Change directory| 

|T TXT:               M Create Minimal CDT 3 Copy file       | 

|E EDT:               D Check CDT          4 Print file      | 

|Q Quit               V Check CDT and TXT  5 Delete file     | 

+------------------------------------------------------------+ 

Press 'E', and MECSVAL will prompt you for a file name. Type a file name, e.g. 'DOC1', and press Enter to activate the MECSVAL editor.

The first thing you need to do is to include a MECS header at the very beginning of your document. We will assume that you intend to use the MECS default delimiters (.). To save yourself some typing, you may include the default MECS header by pressing Ctrl+K, then R. When prompted for a file name, type 'C:\MECS\HEADMECS' (assuming that you installed the MECS Program Package on a directory called 'C:\MECS'), and press Enter. The top of your screen will now look like this:

 

+------------------------------------------------------------+ 

|+----------------------------------------------------------+| 

||DOC1.        Line 1  Col 1  Byte 1      Insert Indent Save|| 

|+----------------------------------------------------------+| 

| £ < > < / / > [ / | \ / | / ] { " \ }                      | 

|                                                            | 

|                                                            | 

Go to the line below the header and include the file C:\MECS\EX1 (or, alternatively, type the text below in). Your document should now look like this:
 

  £ < > < / / > [ / | \ / | / ] { " \ } 

 <|From EX1: |> 

 [dmi\0|6] 

 <paragraph/<title/Sample MECS Document>>> 

 <intro/<paragraph/<indent/3>This is a sample 

 <b/MECS> document which is intended to demonstrate 

 the use of currently available <b/MECS> 

 software./paragraph>/intro> 

Press F2 to store the text and exit the editor. Note that on the main menu DOC1 is now indicated as the current editor (EDT) file. If you need to review DOC1 again before proceeding, press 'E' and then Enter. To exit the editor and save, press F2. To exit without saving, press Ctrl+K, then Q.

You need to check your text for coding errors, but you have not yet created any Code Declaration Table (CDT). You may do both these things in one operation: press 'M' and type 'DOC1' when prompted for a text (TXT) file name. Because the text contained an error, you will get an error message and an indication of the line and column number where the error was detected:

 

+------------------------------------------------------------+ 

|C:\MYDIR               Mem:  433018     MECSVAL version 2.01| 

+------------------------------------------------------------+ 

|L LOG:               I Info               1 List directory  | 

|C CDT:               S Switches           2 Change directory| 

|T TXT:               M Create Minimal CDT 3 Copy file       | 

|E EDT: DOC1          D Check CDT          4 Print file      | 

|Q Quit               V Check CDT and TXT  5 Delete file     | 

+------------------------------------------------------------+ 

|Text file:                     C:\MYDIR\DOC1                | 

|Writing code declaration table: DOC1.CDT                    | 

|  Report from MECSVAL 25.8.1994, 22:30                      | 

|                                                            | 

|                                                            | 

|                                                            | 

|Errors in DOC1:                                             | 

|                                                            | 

|   3 [dmi\0|6]                                              | 

|   4 <paragraph/<title/Sample MECS Document>>>              | 

|                                             ^              | 

|Error 68: No one-element code active                        | 

|                                                            | 

|1 errors encountered in DOC1                                | 

|WARNING: CDT file may contain ERRORS                        | 

|Press Q to quit, any key to edit                            | 

+------------------------------------------------------------+ 

The error message 'No one-element code active' indicates that you have included a superfluous one-element end tag close delimiter, i.e. one '>' too many. Press any key (except 'Q'), and the editor will be activated with the cursor positioned at the exact location of the error. Correct the error (by deleting the superfluous '>'), exit and save by pressing F2. Repeat this process until you get the message 'No errors' on pressing 'M' at the main menu. At this stage, your screen should look like this:
 

+------------------------------------------------------------+ 

|C:\MYDIR               Mem:  430456     MECSVAL version 2.01| 

+------------------------------------------------------------+ 

|L LOG:               I Info               1 List directory  | 

|C CDT: DOC1.CDT      S Switches           2 Change directory| 

|T TXT: DOC1          M Create Minimal CDT 3 Copy file       | 

|E EDT: DOC1          D Check CDT          4 Print file      | 

|Q Quit               V Check CDT and TXT  5 Delete file     | 

+------------------------------------------------------------+ 

|Text file:                     C:\MYDIR\DOC1                | 

|Writing code declaration table: DOC1.CDT                    | 

|  Report from MECSVAL 25.8.1994, 22:30                      | 

|                                                            | 

|                                                            | 

|                                                            | 

|Errors in DOC1:                                             | 

|                                                            | 

|No errors encountered in text file DOC1                     | 

|                                                            | 

|                                                            | 

|                                                            | 

|                                                            | 

|                                                            | 

|                                                            | 

|                                                            | 

+------------------------------------------------------------+ 

You have now created a (minimal) CDT called DOC1.CDT on the basis of DOC1. Review your minimal CDT by pressing 'E' and entering 'DOC1.CDT'. It will look like this:
 

  £ < > < / / > [ / | \ / | / ] { " \ } 

 CDEMST 

 abcdefhilmnoprstuvwy 

 036 

 . 

 £ 

 abdeghilmnoprt 

 £ 

 n o £ p r m 

 o b 

 o indent 

 o intro 

 o paragraph 

 o title 

 2 dmi 

As you can see, all the free characters and codes you used in DOC1 have been declared. Exit by pressing Ctrl+K, then Q. Assuming that you conclude from this inspection that you need to declare additional characters and codes used in the rest of this example, it is suggested that you extend the minimal CDT. An example CDT is supplied with the Program Package under the file name EX.CDT, so in this case you may save some typing by simply copying C:\MECS\EX.CDT. Normally, however, you would have to work some other way to create your extended CDT, e.g.: press '3' on the main menu and copy DOC1.CDT to a file called EX.CDT, and then edit EX.CDT to suit. The example EX.CDT looks like this:
 

  £ < > < / / > [ / | \ / | / ] { " \ } 

 abcdefghijklmnopqrstuvwxyz 

 ABCDEFGHIJKLMNOPQRSTUVWXYZ 

 1234567890 

 ,;.:-()!?"' 

 *%&=+ß 

 abcdefghijklmnopqrstuvwxyz 

 ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890._- 

 n o # p r d 

 o b 

 o indent 

 o intro 

 o paragraph 

 o title 

 2 dmi 

 o REF 

 n ind 

 n l 

 o example 

 o i 

 o note 

 o s 

 o u 

 p s 

 r reverse_E 

 r reverse_A 

 d exist 

Press F2 to store and exit EX.CDT. You may check EX.CDT for errors by pressing 'C' at the main menu, entering 'EX.CDT' when prompted for a file name. Then press 'D'. If errors are encountered, an error message will be displayed in the lower part of the screen:
 

+------------------------------------------------------------+ 

|C:\MYDIR               Mem:  430456     MECSVAL version 2.01| 

+------------------------------------------------------------+ 

|L LOG:               I Info               1 List directory  | 

|C CDT: EX.CDT        S Switches           2 Change directory| 

|T TXT: DOC1          M Create Minimal CDT 3 Copy file       | 

|E EDT: EX.CDT        D Check CDT          4 Print file      | 

|Q Quit               V Check CDT and TXT  5 Delete file     | 

+------------------------------------------------------------+ 

|Reading code declaration table: EX.CDT                      | 

|Text file:                     C:\MYDIR\DOC1                | 

|  Report from MECSVAL 25.8.1994, 22:33                      | 

|                                                            | 

|                                                            | 

|  20 o i                                                    | 

|  21 o note/                                                | 

|           ^                                                | 

|Error 39: Illegal character in generic identifier           | 

|                                                            | 

|1 errors encountered in code declaration table EX.CDT       | 

|Press Q to quit, any key to edit                            | 

|                                                            | 

|                                                            | 

|                                                            | 

|                                                            | 

+------------------------------------------------------------+ 

In this case, you had mistakenly included the tag close delimiter in the declaration of a generic identifier. Press any key (except 'Q'), and the editor will be activated with the cursor positioned at the location in the file where the error was detected. Correct the error, store and exit, and press 'D' at the main menu again. Repeat this process until you get no error messages. Edit DOC1 again and add the following text (by typing it in, or by including the file C:\MECS\EX2) 23:
 

 <|From EX2: |> 

 <paragraph/<s/We <s/will see <s/some examples> 

 of recursive codes, of <b/elements <u/which/b> 

 overlap/u>, of/s> special characters<ind> like 

 {reverse_A}, {reverse_E}, {reverse_E\exist}, 

 and {"E"\exist}, and of<i> substitutions in 

 <note/simplified/note>/s> MECS-WIT style:/paragraph> 

 <exmple/<paragraph/ 

 <ind><s/Ich besuche gern das alte <i/kleine> 

 [s|Schloß|<i/Haus>] meines [s|Onkels.|<i/Vaters.>]/s> 

 <ind><s/Ich besuche gern das 

 [s|alte Schloß meines Onkels|<i/kleine Haus> meines 

 <i/Vaters>]/s>/paragraph> 

 /example> 

 <paragraph/This is the end of our 

 <note/very artificial> example./paragraph> 

Press F2 to store and exit DOC1. Instead of creating yet another minimal CDT on the basis of the new version of DOC1, you may check it against EX.CDT by pressing 'V' on the main menu. An error message will be displayed in the lower part of the screen:
 

+------------------------------------------------------------+ 

|C:\MYDIR               Mem:  433018     MECSVAL version 2.01| 

+------------------------------------------------------------+ 

|L LOG:               I Info               1 List directory  | 

|C CDT: EX.CDT        S Switches           2 Change directory| 

|T TXT: DOC1          M Create Minimal CDT 3 Copy file       | 

|E EDT: DOC1          D Check CDT          4 Print file      | 

|Q Quit               V Check CDT and TXT  5 Delete file     | 

+------------------------------------------------------------+ 

|Reading code declaration table: EX.CDT                      | 

|Text file:                     C:\MYDIR\DOC1                | 

|  Report from MECSVAL 25.8.1994, 22:34                      | 

|                                                            | 

|                                                            | 

|No errors encountered in code declaration table EX.CDT      | 

|                                                            | 

|                                                            | 

|Errors in DOC1:                                             | 

|                                                            | 

|  13 {reverse_A}, {reverse_E}, {reverse_E\exist},           | 

|  14 and {"E"\exist}, and of<i> substitutions in            | 

|                             ^                              | 

|Error 55: Wrong type                                        | 

|                                                            | 

|1 errors encountered in DOC1                                | 

|Press Q to quit, any key to edit                            | 

+------------------------------------------------------------+ 

In this case, you have used 'i', which according to EX.CDT should be a one-element code, as a no-element code. We will assume that the no-element code 'l' was what you intended. Press any key (except 'Q'), and the editor will be activated with the cursor positioned at the location where the error was detected. Correct the error by changing <i> to <l>, store and exit.

You have now been through the process of creating a MECS document, reconstructing a minimal CDT, creating and validating your own CDT, and editing and validating a document in relation to a CDT. The entire process has been carried out interactively, and errors have been detected and corrected one by one.

At this stage, you may go on and repeat the process of validating and editing DOC1 until you get no more error messages. Alternatively, you may have MECSVAL go through the entire document and write all remaining error messages to a log file: press 'L' on the main menu and enter a log file name, e.g. DOC1.LOG. Press 'V' to check the document again. If errors are detected, you will be prompted as follows:

 

+------------------------------------------------------------+ 

|C:\MYDIR               Mem:  427416     MECSVAL version 2.01| 

+------------------------------------------------------------+ 

|L LOG: DOC1.LOG      I Info               1 List directory  | 

|C CDT: EX.CDT        S Switches           2 Change directory| 

|T TXT: DOC1          M Create Minimal CDT 3 Copy file       | 

|E EDT: DOC1          D Check CDT          4 Print file      | 

|Q Quit               V Check CDT and TXT  5 Delete file     | 

+------------------------------------------------------------+ 

|Reading code declaration table: EX.CDT                      | 

|Text file:                     C:\MYDIR\DOC1                | 

|Log file:                      C:\MYDIR\DOC1.LOG            | 

|                                                            | 

|No errors encountered in code declaration table EX.CDT      | 

|                                                            | 

|3 errors encountered in DOC1                                | 

|Press Q to quit, any key to edit                            | 

|                                                            | 

|                                                            | 

|                                                            | 

|                                                            | 

|                                                            | 

|                                                            | 

+------------------------------------------------------------+ 

A complete list of error messages has been written to the log file. Press any key, and MECSVAL will activate the editor with a split screen: the log file will be displayed in the lower window, the document file in the upper window:
 

+------------------------------------------------------------+ 

|+----------------------------------------------------------+| 

||DOC1.        Line 16 Col 8  Byte 590    Insert Indent     || 

|+----------------------------------------------------------+| 

|<b/MECS> document which is intended to demonstrate          | 

|the use of currently available <b/MECS>                     | 

|software./paragraph>/intro>                                 | 

|<|From EX2: |>                                              | 

|<paragraph/<s/We <s/will see <s/some examples>              | 

|of recursive codes, of <b/elements <u/which/b>              | 

|overlap/u>, of/s> special characters<ind> like              | 

|{reverse_A}, {reverse_E}, {reverse_E\exist},                | 

|and {"E"\exist}, and of<l> substitutions in                 | 

|<note/simplified/note>/s> MECS-WIT style:/paragraph>        | 

|<exmple/<paragraph/                                         | 

|+----------------------------------------------------------+| 

||DOC1.LOG     Line 19 Col 1  Byte 701    Insert Indent Save|| 

|+----------------------------------------------------------+| 

|  15 <note/simplified/note>/s> MECS-WIT style:/paragraph>   | 

|  16 <exmple/<paragraph/                                    | 

|           ^                                                | 

|Error 62: Illegal generic identifier                        | 

|                                                            | 

|  21 <i/Vaters>]/s>/paragraph>                              | 

|  22 /example>                                              | 

|            ^                                               | 

|Error 59: START tag missing                                 | 

|                                                            | 

|Error 99: exmple   started at line  16   8 - end tag missing| 

+------------------------------------------------------------+ 

The log file is the active window. (You may have to scroll the lower window with the arrow keys in order to see the exact part of the log file displayed above.) If you switch to the upper window (by pressing F6) you will see that the cursor is positioned at the location of the last error reported in the log file.

It is frequently the case that one error causes several error messages. In this particular case, all three error messages are caused by one and the same error: you have mistyped <exmple/ for the start tag <example/ . MECSVAL first reports that <exmple/ is not a legal generic identifier, then indicates that /example> closes a code that has not been opened (because the start tag was mistyped), and finally, that the erroneous code <exmple/ has not been closed.
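How a single mistyped start tag can produce three messages may be easier to see in a small Python sketch. It is a simplified illustration and not MECSVAL's actual algorithm: the function name check is hypothetical, only one-element codes with full end tags are handled, and the message texts merely imitate those shown above.

```python
import re

# A one-element start tag <gi/ or a full end tag /gi>.
TAG = re.compile(r"<(?P<start>\w+)/|/(?P<end>\w+)>")


def check(text, declared):
    """Return a list of messages for start and end tags in text.

    declared is the set of generic identifiers that the CDT allows.
    """
    messages = []
    open_codes = []  # (gi, line number where the code was opened)
    for line_no, line in enumerate(text.splitlines(), start=1):
        for m in TAG.finditer(line):
            if m.group("start"):
                gi = m.group("start")
                if gi not in declared:
                    messages.append(f"line {line_no}: illegal generic identifier {gi}")
                open_codes.append((gi, line_no))  # remembered even if illegal
            else:
                gi = m.group("end")
                for i in range(len(open_codes) - 1, -1, -1):
                    if open_codes[i][0] == gi:
                        del open_codes[i]
                        break
                else:
                    messages.append(f"line {line_no}: start tag missing for {gi}")
    for gi, line_no in open_codes:
        messages.append(f"{gi} started at line {line_no}: end tag missing")
    return messages
```

Applied to a three-line text that opens with `<exmple/<paragraph/` and ends with `/example>`, with 'example' and 'paragraph' declared, the sketch reports the illegal identifier, the end tag without a start tag, and the unclosed 'exmple', in that order. Correcting the one mistyped tag removes all three messages.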

In many cases where the text file contains more than just a few errors, it may be convenient to use the log file option. You may then correct all errors in one pass by switching between the two windows, scrolling the log file window to see error messages and editing the text window to correct the errors.

Once you have corrected all errors, check DOC1 again by pressing 'V' on the main menu. If you get no error messages you may exit from MECSVAL by pressing 'Q' at the main menu.

MECSVAL may also be run in batch mode. You can check that DOC1 conforms to the basic syntax of MECS by entering the DOS command line
   MECSVAL - DOC1 DOC1.LOG
If DOC1 is a MECS-conforming document, this command will cause MECSVAL to display a 'No errors' message and to write DOC1's minimal CDT to a new file called DOC1.CDT. If DOC1 is not MECS-conforming, it has no minimal CDT. MECSVAL will then display an error message, and a list of errors will be found in DOC1.LOG.

If you want to check that DOC1 conforms to a specific CDT, e.g. EX.CDT, you should enter the following command at the DOS prompt:
   MECSVAL EX.CDT DOC1 DOC1.LOG
MECSVAL includes several options and features which have not been mentioned here. Cf. 2.6 for more information on MECSVAL log files, and 3 for a comprehensive documentation of the program.

Note: Any use of other programs in the MECS Program Package presupposes that the input document is a well-formed MECS document, i.e. that it has a minimal CDT. Therefore, you should make sure that MECSVAL reports no error messages when reconstructing a minimal CDT from your documents, before processing them with any other program in the package. If used on texts that contain MECS syntax errors, the other programs in the package may cause unpredictable results.
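To make the notion of a minimal CDT more concrete, the following Python sketch deduces part of the code inventory from a document in default notation. It is a rough illustration, not a reimplementation of MECSVAL: the function name deduce_code_declarations is hypothetical, it performs no validation, and it covers only no-element and one-element codes (poly-element codes, character codes and the character set lines of the CDT are left out).

```python
import re


def deduce_code_declarations(text):
    """Return CDT-style declaration lines ('n gi' and 'o gi') for the
    no-element and one-element codes found in text.

    Assumptions: default delimiters, and a well-formed document.
    """
    # MECS comments <| ... |> are not codes, so remove them first.
    text = re.sub(r"<\|.*?\|>", "", text, flags=re.S)
    one_element = set(re.findall(r"<(\w+)/", text))  # start tags like <b/
    no_element = set(re.findall(r"<(\w+)>", text))   # codes like <ind>
    return ([f"n {gi}" for gi in sorted(no_element)]
            + [f"o {gi}" for gi in sorted(one_element)])
```

For the EX1 part of DOC1 the sketch yields the declarations 'o b', 'o indent', 'o intro', 'o paragraph' and 'o title', which agree with the one-element declarations in DOC1.CDT shown earlier.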

3.4 Formatting Documents

Now that you have corrected all errors DOC1 will look like this:

 

  £ < > < / / > [ / | \ / | / ] { " \ } 

 <|From EX1: |> 

 [dmi\0|6] 

 <paragraph/<title/Sample MECS Document>> 

 <intro/<paragraph/<indent/3>This is a sample 

 <b/MECS> document which is intended to demonstrate 

 the use of currently available <b/MECS> 

 software./paragraph>/intro> 

 

 <|From EX2: |> 

 <paragraph/<s/We <s/will see <s/some examples> 

 of recursive codes, of <b/elements <u/which/b> 

 overlap/u>, of/s> special characters<ind> like 

 {reverse_A}, {reverse_E}, {reverse_E\exist}, 

 and {"E"\exist}, and of substitutions in 

 <note/simplified/note>/s> MECS-WIT style:/paragraph> 

 <example/<paragraph/ 

 <ind><s/Ich besuche gern das alte <i/kleine> 

 [s|Schloß|<i/Haus>] meines [s|Onkels.|<i/Vaters.>]/s> 

 <ind><s/Ich besuche gern das 

 [s|alte Schloß meines Onkels|<i/kleine Haus> meines 

 <i/Vaters>]/s>/paragraph> 

 /example> 

 <paragraph/This is the end of our 

 <note/very artificial> example./paragraph> 

Let us assume that you would like to tidy up the layout of DOC1, and that you want to insert codes containing reference numbers (which may be useful for a variety of reasons) at intervals within the text, e.g. preceding all 'paragraph'-codes. Type the following command at the DOS prompt:
   MECSFORM DOC1 / 55 E paragraph s 2 REF 1
This command will cause the program MECSFORM to read and overwrite DOC1, format the text to a maximum line length of 55 characters, extend all reduced tags, indent all one-element paragraph- and s-codes 2 characters, and insert REF-codes containing reference numbers immediately preceding all paragraph-codes, starting with reference number 1 and incrementing the numbers throughout the file. The result is a (hopefully) more perspicuous version of DOC1 which looks like this:
 

  £ < > < / / > [ / | \ / | / ] { " \ } 

 <|From EX1: |> [dmi/2\0/dmi|6/dmi] 

 <REF/1/REF><paragraph/<title/Sample MECS 

   Document/title>/paragraph> 

   <intro/ 

 <REF/2/REF><paragraph/<indent/3/indent>This is a sample 

   <b/MECS/b> document which is intended to demonstrate 

   the use of currently available <b/MECS/b> 

   software./paragraph> 

   /intro> <|From EX2: |> 

 <REF/3/REF><paragraph/ 

   <s/We 

     <s/will see 

       <s/some examples/s> of recursive codes, of 

       <b/elements <u/which/b> overlap/u>, of/s> 

     special characters<ind> like {reverse_A}, 

     {reverse_E}, {reverse_E\exist}, and {"E"\exist}, 

     and of substitutions in 

     <note/simplified/note>/s> MECS-WIT 

   style:/paragraph> 

   <example/ 

 <REF/4/REF><paragraph/<ind> 

   <s/Ich besuche gern das alte <i/kleine/i> 

     [s/2|Schloß/s|<i/Haus/i>/s] meines 

     [s/2|Onkels./s|<i/Vaters./i>/s]/s> <ind> 

   <s/Ich besuche gern das [s/2|alte Schloß meines 

     Onkels/s|<i/kleine Haus/i> meines 

     <i/Vaters/i>/s]/s> /paragraph> 

   /example> 

 <REF/5/REF><paragraph/This is the end of our <note/very 

   artificial/note> example./paragraph> 
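The insertion of reference numbers can be illustrated with a short Python sketch. It is not part of the MECS Program Package and does not reproduce MECSFORM; the function name `insert_refs` is hypothetical, and the sketch assumes the default MECS delimiters and leaves line formatting and indentation aside.

```python
import re

def insert_refs(text, target="paragraph", start=1):
    """Insert <REF/n/REF> before every start tag of the one-element code `target`."""
    counter = start

    def add_ref(match):
        nonlocal counter
        ref = f"<REF/{counter}/REF>"
        counter += 1
        return ref + match.group(0)

    # With default delimiters a one-element start tag is '<', the generic
    # identifier, then '/'. End tags ('/paragraph>') do not match this pattern.
    return re.sub(rf"<{re.escape(target)}/", add_ref, text)

sample = "<paragraph/<title/Sample/title>/paragraph> <paragraph/End./paragraph>"
print(insert_refs(sample))
```

The counter is incremented once per matching start tag, which corresponds to the "starting with reference number 1 and incrementing" behaviour described above.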

If you would now like to format the text of DOC1 into a more compact, even if less perspicuous, form, you may type the following command at the DOS prompt:
   MECSFORM DOC1 DOC1.MIN 60 R
This command causes the program MECSFORM to create a new file called DOC1.MIN with a maximum line length of 60, reducing all reducible tags. DOC1.MIN will look like this:
 

  £ < > < / / > [ / | \ / | / ] { " \ } 

 <|From EX1: |> [dmi\0|6] <REF/1><paragraph/<title/Sample 

 MECS Document>> <intro/ <REF/2><paragraph/<indent/3>This is 

 a sample <b/MECS> document which is intended to demonstrate 

 the use of currently available <b/MECS> software.> > <|From 

 EX2: |> <REF/3><paragraph/ <s/We <s/will see <s/some 

 examples> of recursive codes, of <b/elements <u/which/b> 

 overlap>, of> special characters<ind> like {reverse_A}, 

 {reverse_E}, {reverse_E\exist}, and {"E"\exist}, and of 

 substitutions in <note/simplified>> MECS-WIT style:> 

 <example/ <REF/4><paragraph/<ind> <s/Ich besuche gern das 

 alte <i/kleine> [s|Schloß|<i/Haus>] meines 

 [s|Onkels.|<i/Vaters.>]> <ind> <s/Ich besuche gern das 

 [s|alte Schloß meines Onkels|<i/kleine Haus> meines 

 <i/Vaters>]> > > <REF/5><paragraph/This is the end of our 

 <note/very artificial> example.> 

It should be noted that DOC1 and DOC1.MIN are equivalent, and that they will produce identical output from all other MECS programs.
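To see why the reduced and the extended forms carry the same information, consider the following Python sketch, which extends reduced one-element end tags. It is an illustration only, not MECSFORM's actual algorithm. It assumes default delimiters, assumes (as the examples above suggest) that a reduced end tag closes the most recently opened element that is still open, ignores the document header line, and leaves poly-element and N-element tags untouched. The name `extend_reduced_tags` is hypothetical.

```python
import re

# Tokens of interest in default MECS notation (an assumption of this sketch).
TOKEN = re.compile(
    r"(?P<comment><\|.*?\|>)"          # MECS comment: <| ... |>
    r"|(?P<start><(?P<sgi>[\w.]+)/)"   # one-element start tag: <gi/
    r"|(?P<empty><[\w.]+>)"            # no-element code: <gi>
    r"|(?P<end>/(?P<egi>[\w.]+)>)"     # extended end tag: /gi>
    r"|(?P<reduced>>)",                # reduced end tag: >
    re.DOTALL,
)

def extend_reduced_tags(text):
    open_codes = []   # generic identifiers of elements currently open
    out = []
    pos = 0
    for m in TOKEN.finditer(text):
        out.append(text[pos:m.start()])
        pos = m.end()
        if m.group("start"):
            open_codes.append(m.group("sgi"))
            out.append(m.group(0))
        elif m.group("end"):
            gi = m.group("egi")
            # Close the most recent open element with this identifier.
            for i in range(len(open_codes) - 1, -1, -1):
                if open_codes[i] == gi:
                    del open_codes[i]
                    break
            out.append(m.group(0))
        elif m.group("reduced") and open_codes:
            out.append(f"/{open_codes.pop()}>")
        else:
            out.append(m.group(0))  # comments and no-element codes pass through
    out.append(text[pos:])
    return "".join(out)

print(extend_reduced_tags("<paragraph/<title/Sample MECS Document>>"))
```

Applied to the overlapping fragment `<b/elements <u/which/b> overlap>`, the sketch yields `<b/elements <u/which/b> overlap/u>`, as in the extended version of DOC1.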

3.5 Reformatting Documents

The program MECSPRES enables you to create reformatted versions of your document in a number of different word processor formats and with a variety of different typographic features. First, however, you will have to create a profile definition table - a PDT. A PDT contains specifications as to how the codes in a document should be realized in the output from the program MECSPRES.

An example PDT is supplied with the Program Package under the file name EX-A.PDT, so in this case you may save some typing by simply copying C:\MECS\EX-A.PDT. Normally, however, you would create your PDT in some other way, e.g. by pressing '3' on the main menu, copying EX.CDT to a file called EX-A.PDT, and then editing EX-A.PDT to suit. (You may use the MECSVAL editor or any other word processor which allows you to store files in so-called plain ASCII or DOS format.)

The example EX-A.PDT looks like this:

 

  £ < > < / / > [ / | \ / | / ] { " \ } 

 n o # p r m _ £ 

 o b         £ b  £ £ £        £  £ 

 o indent    j £  £ £ £        £  £ 

 o intro     £ i  £ £ £        £  £ 

 o paragraph e £  £ £ £        £  £ 

 o title     s le £ £ £        £  £ 

 2 dmi       7 £  £ £ £        £  £ 

 o REF       q b  £ £ £        £  £ 

 n ind       g £  £ £ £        £  £ 

 o example   £ m  £ £ £        £  £ 

 o note      £ £  ( £ )        £  £ 

 o u         £ u  £ £ £        £  £ 

 p s         5 £  £ £ £        £  £ 

 r reverse_E £ £  £ {#6#121} £ £  £ 

 r reverse_A £ £  £ {#6#122} £ £  £ 

 m exist     £ £  £ {#6#121} £ £  £ 

Having stored EX-A.PDT, type the following command at the DOS prompt:
   MECSPRES EX-A.PDT DOC1 DOC1.ANW N W I
This command causes MECSPRES to apply the profile definition table EX-A.PDT to the document DOC1 to create a reformatted document file DOC1.ANW, with the general layout 'N', in WordPerfect 5.1 format ('W'), ignoring undeclared codes ('I'). The reformatted document, DOC1.ANW, will look like this:
 

   [Not reproducible in HTML; see PostScript version of this text] 

By revising the PDT, you can change the layout of the output file. E.g., with the following profile definition table, EX-B.PDT (also included with the MECS Program Package):
 

  £ < > < / / > [ / | \ / | / ] { " \ } 

 n o # p r m _ £ 

 # a NB:_ 

 o b         £ bp £ £ £        £  £ 

 o indent    j £  £ £ £        £  £ 

 o paragraph e £  £ £ £        £  £ 

 o title     £ s  £ £ £        £  £ 

 2 dmi       7 £  £ £ £        £  £ 

 o REF       d £  £ £ £        £  £ 

 n ind       g £  £ £ £        £  £ 

 n l         h £  £ £ £        £  £ 

 o i         £ r  £ £ £        £  £ 

 o note      b £  ( £ )        a  c 

 o u         £ u  £ £ £        £  £ 

 p s         2 £  £ £ £        £  £ 

 r reverse_E £ £  £ {#6#121} £ £  £ 

 r reverse_A £ £  £ {#6#122} £ £  £ 

 m exist     £ £  £ {#6#121} £ £  £ 

the command
   MECSPRES EX-B.PDT DOC1 /DOC1.BDW D W I CR13
 

will create the following WordPerfect 5.1 document, DOC1.BDW:

 

   [Not reproducible in HTML; see PostScript version of this text] 

In addition to WordPerfect 5.1, available output formats are so-called plain ASCII, MECS-like presentational markup format, Folio Views markup format, HTML format, so-called screen display format as well as a number of other formats and layouts. E.g., the command
   MECSPRES EX-A.PDT DOC1 /DOC1.ADC D C I - 56
will cause MECSPRES to create a plain ASCII file, DOC1.ADC, which looks like this:
 

   1                   SAMPLE MECS DOCUMENT 

   2    This is a sample MECS document which is intended to 

     demonstrate the use of currently available MECS 

     software. 

   3 We will see some examples of recursive codes, of 

     elements which overlap, of special characters 

           like •, •, •, and E, and of substitutions in 

     (simplified) MECS-WIT style: 

   4       Ich besuche gern das alte kleine Haus meines 

     Vaters. 

           Ich besuche gern das kleine Haus meines Vaters 

   5 This is the end of our (very artificial) example. 

while the command
   MECSPRES EX-A.PDT DOC1 /DOC1.ANM N M I - 50
 

will create a document DOC1.ANM formatted with MECSPRES' own, MECS-like presentational markup:

 

 <C/<l/<e/SAMPLE MECS DOCUMENT/e>/l>/C> 

 <T><i/This is a sample <b/MECS/b> document which is/i> 

 <i/intended to demonstrate the use of currently/i> 

 <i/available <b/MECS/b> software./i> 

 We will see some examples of recursive codes, of 

 <b/elements <u/which/b> overlap/u>, of special characters 

 <T>like {#6#122}, {#6#121}, {#6#121}, and 

 {#6#121}, and of substitutions in <b/(/b>simplified<b/)/b> 

 MECS-WIT style: 

 <T>Ich besuche gern das alte kleine Haus meines 

 Vaters. 

 <T>Ich besuche gern das kleine Haus meines 

 Vaters 

 This is the end of our <b/(/b>very artificial<b/)/b> 

 example. 

If you would like to review output on screen before deciding to create an output file, replace the output file name with a dash. If you want to avoid word processor-specific formatting codes displayed on screen, replace the fifth parameter with an 's'. Examples:
   MECSPRES EX-A.PDT DOC1 - N S I
   MECSPRES EX-B.PDT DOC1 - D S I
As already mentioned, MECSPRES offers a variety of output formats. In addition, the program offers a number of different layouts and a multitude of layout features. Only a tiny selection of these has been suggested by the above examples. For a full list of available formats, layouts and layout features, cf. 3

3.6 Analyzing Documents

Several of the programs in the package may help you analyze the structure of your documents: MECSVAL and MECSPRES, which we have already looked at, as well as MECSLYSE, MECSBETA, BETATXT, MECSSPEL, ALPHATXT and MECSGRAB.

3.6.1 Code Status Report

As already mentioned (cf. 2.3), MECSVAL allows you to store a log of the validation of a document. This log also contains useful status information on the document in question. The log file may be created without entering MECSVAL's interactive mode. E.g., if you type the following command at the DOS prompt:
   MECSVAL - DOC1 DOC1.LOG
the log file DOC1.LOG will look like this:

 

   Report from MECSVAL 26.8.1994, 23:15 

 

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

 Errors in DOC1: 

 

 No errors encountered in text file DOC1 

 

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

 Status report: 

 

 No-element codes: 

 ind              3 

 l                1 

 

 One-element codes: 

 REF              5 

 b                3 

 example          1 

 i                5 

 indent           1 

 intro            1 

 note             2 

 paragraph        5 

 s                5 

 title            1 

 u                1 

 

 Poly-element codes: 

 s                3 

 

 N-element codes: 

 dmi              1 

 

 Character representation codes: 

 reverse_A        1 

 reverse_E        2 

 

 Character disambiguation codes: 

 exist            2 

 

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

 S U M M A R Y : 

 

                Number of codes:  types   |   tokens 

 No-element codes:                    2           4 

 One-element codes:                  11          30 

 Poly-element codes:                  1           3 

 Character representation codes:      2           4 

 Character disambiguation codes:      1           2 

 N-element codes:                     1           1 

 

 Sum total:                          18          44 

 

 Maximum nesting level:     5 

 Overlapping codes:         1 

The first part of the log file lists all errors in the text file, if any (in this case, no errors have been found). The second part lists all codes found in the document, and indicates the number of occurrences of each code. The third part indicates the number of types and tokens of codes of each code type found in the document. The last two lines indicate the maximum nesting level of codes and the number of overlapping codes in the document.
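The distinction between types and tokens in the summary can be made concrete with a small Python sketch. It is not MECSVAL: it assumes default delimiters, handles only no-element and one-element codes, and the function names `count_codes` and `summarize` are hypothetical.

```python
import re
from collections import Counter

def count_codes(text):
    """Count occurrences of no-element and one-element codes by generic identifier."""
    body = re.sub(r"<\|.*?\|>", "", text, flags=re.DOTALL)  # drop MECS comments
    return {
        "No-element codes": Counter(re.findall(r"<([\w.]+)>", body)),
        "One-element codes": Counter(re.findall(r"<([\w.]+)/", body)),
    }

def summarize(counts):
    """Return (types, tokens) per code type: distinct identifiers and total occurrences."""
    return {kind: (len(c), sum(c.values())) for kind, c in counts.items()}

sample = "<paragraph/<s/We <b/see/b><ind> this <b/again/b>/s>/paragraph>"
print(summarize(count_codes(sample)))
```

In the sample, the one-element code 'b' occurs twice, so it contributes one type but two tokens.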

3.6.2 Document Structure and Overlapping Elements

The program MECSLYSE gives additional information on the structure of documents. Type the following command at the DOS prompt:
   MECSLYSE DOC1 DOC1.TR1 - - O REF
The output file DOC1.TR1 will contain a list of all overlapping codes, as well as a complete listing of the document's element structure in the form of an indented table:

 

 MECSLYSE 

 File in:  DOC1 

 File out: DOC1.TR1 

 

 OVERLAP: <u/             /b>          15 21    15 29 

  <b/              started at     15  9 

   Position     15 29 Level   1 

 

 DOCUMENT STRUCTURE 

 

     2 22 |[dmi\ 

 

     3  5 |<REF/1> 

     3 22 |<paragraph/ 

     3 29 | . <title/ 

     5  9 |<intro/ 

 

     6  5 | . <REF/2> 

     6 22 | . <paragraph/ 

     6 30 | .  . <indent/ 

     7  5 | .  . <b/ 

     8 36 | .  . <b/ 

 

    11  5 |<REF/3> 

    11 22 |<paragraph/ 

    12  5 | . <s/ 

    13  7 | .  . <s/ 

    14  9 | .  .  . <s/ 

    15  9 | .  .  . <b/ 

    15 21 | .  .  .  . <u/ 

    19 10 | .  . <note/ 

    21 11 |<example/ 

 

    22  5 | . <REF/4> 

    22 22 | . <paragraph/ 

    23  5 | .  . <s/ 

    23 34 | .  .  . <i/ 

    24  9 | .  .  . [s| 

    24 21 | .  .  .  . <i/ 

    25  9 | .  .  . [s| 

    25 22 | .  .  .  . <i/ 

    26  5 | .  . <s/ 

    26 31 | .  .  . [s| 

    27 16 | .  .  .  . <i/ 

    28  7 | .  .  .  . <i/ 

 

    30  5 |<REF/5> 

    30 22 |<paragraph/ 

    30 51 | . <note/ 

 

 SUMMARY 

 

 Overlapping code types 

 

 <b/              <u/               1 

 

 Overlapping codes:                        1 

 Max. depth of overlapping codes:          1 

 Max. no of overlapping codes at           0 

 Number of pairs of overlapping codes      1 
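How an overlap such as the one between 'b' and 'u' can be detected is shown in the following Python sketch. It is a simplified illustration, not MECSLYSE's implementation: it assumes default delimiters and fully extended one-element tags, and it reports identifiers only, not line and column positions. The name `find_overlaps` is hypothetical.

```python
import re

# One-element start tag (<gi/) or extended end tag (/gi>) in default notation.
TAG = re.compile(r"<(?P<start>[\w.]+)/|/(?P<end>[\w.]+)>")

def find_overlaps(text):
    """Return (closed, still_open) pairs where an element is closed while a
    later-opened element is still open, i.e. the two elements overlap."""
    open_codes = []
    overlaps = []
    for m in TAG.finditer(text):
        if m.group("start"):
            open_codes.append(m.group("start"))
            continue
        gi = m.group("end")
        if gi not in open_codes:
            continue  # stray end tag; a validator would report an error here
        idx = len(open_codes) - 1 - open_codes[::-1].index(gi)
        for inner in open_codes[idx + 1:]:
            overlaps.append((gi, inner))
        del open_codes[idx]
    return overlaps

print(find_overlaps("<b/elements <u/which/b> overlap/u>"))
```

For properly nested input the list of open elements behaves like a stack and the function returns an empty list.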

3.6.3 Breakpoints and Recursion

The command
   MECSLYSE DOC1 DOC1.TR2 o paragraph R
will cause the output file DOC1.TR2 to contain a list of all codes active (if any) at the start and end points of all occurrences of the one-element code 'paragraph', as well as a list of all occurrences of recursive codes:

 

 MECSLYSE 

 File in:  DOC1 

 File out: DOC1.TR2 

 

 BREAKPOINT: <paragraph/    at   6  22, 

    <intro/      started at      5   9, still active 

 

 BREAKPOINT: /paragraph>    at   9  22, 

    <intro/      started at      5   9, still active 

 

 RECURSION: <s/             at     13  7 and at     12  5 

 

 RECURSION: <s/             at     14  9 and at     13  7 

                                         and at     12  5 

 

 BREAKPOINT: <paragraph/    at  22  22, 

    <example/    started at     21  11, still active 

 

 BREAKPOINT: /paragraph>    at  28  34, 

    <example/    started at     21  11, still active 

 

 SUMMARY: 

 Codes at breakpoints:      4 

 Recursive codes:           3 
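Recursion, i.e. an element opened inside another element with the same generic identifier, can be detected along similar lines. The Python sketch below is an illustration under the same assumptions (default delimiters, extended one-element tags); it is not MECSLYSE, and the name `find_recursion` is hypothetical.

```python
import re

TAG = re.compile(r"<(?P<start>[\w.]+)/|/(?P<end>[\w.]+)>")

def find_recursion(text):
    """Return (identifier, depth) for every start tag opened while `depth`
    elements with the same identifier are already open."""
    open_codes = []
    hits = []
    for m in TAG.finditer(text):
        gi = m.group("start")
        if gi:
            depth = open_codes.count(gi)
            if depth:
                hits.append((gi, depth))
            open_codes.append(gi)
        else:
            gi = m.group("end")
            if gi in open_codes:
                idx = len(open_codes) - 1 - open_codes[::-1].index(gi)
                del open_codes[idx]
    return hits

print(find_recursion("<s/We <s/will see <s/some examples/s> of/s> special/s>"))
```

The second 's' is reported as nested in one open 's', the third as nested in two, which mirrors the two RECURSION entries in DOC1.TR2.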

3.6.4 Betatexts (Substitutions)

The concept of a betatext was invented in the course of work related to the development of a registration standard for the Wittgenstein Archives at the University of Bergen, MECS-WIT. In MECS-WIT, multi-element codes are used to indicate substitutions (variants, parallel texts) in manuscripts. It belongs to the definition of a substitution that each of its elements is incompatible with every other element, but at the same time it is a requirement that every element can be embedded in the context of the rest of the text.

A betatext is a version of the text corresponding to one particular combination of such substitution elements. A text with many substitutions therefore has a quite considerable number of betatexts. The programs MECSBETA and BETATXT were written in order to help identify and check these possible combinations of text elements.

BETATXT serves to compute the number of, and to display, all possible combinations of elements of specified multi-element codes in a document. In order to achieve this, the document needs preprocessing by MECSPRES. The following profile definition table EX-BETA.PDT (supplied with the Program Package) specifies a profile suitable for the required preprocessing:

 

  £ < > < / / > [ / | \ / | / ] { " \ } 

 n o # p r m _ £ 

 o indent    d £  £ £ £        £ £ 

 o paragraph e £  £ £ £        £ £ 

 2 dmi       d £  £ £ £        £ £ 

 o REF       w £  £ £ £        £ £ 

 o s         2 £  £ £ £        £ £ 

 p s         b £  £ £ £        £ £ 

 r reverse_E £ £  £ {#6#121} £ £ £ 

 r reverse_A £ £  £ {#6#122} £ £ £ 

 m exist     £ £  £ {#6#121} £ £ £ 

Type the following command at the DOS prompt:
   MECSBETA EX-BETA.PDT DOC1 DOC1.BET
Or, alternatively, give the following two commands:
   MECSPRES EX-BETA.PDT DOC1 /TEMPFILE. B B I
   BETATXT TEMPFILE. DOC1.BET
DOC1.BET will contain a list of all possible combinations of elements of the poly-element code 's' within the scope of one-element 's'-codes of DOC1:
 

 4 

 Ich besuche gern das alte kleine 

 ->Schloß meines Onkels. 

 ->Schloß meines Vaters. 

 ->Haus meines Onkels. 

 ->Haus meines Vaters. 

 

 Ich besuche gern das 

 ->alte Schloß meines Onkels 

 ->kleine Haus meines Vaters 

 

 ----------------------------------- 

 Beta:                             8 

The last line indicates that the total number of "betatexts" generated by the document is 8.
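The figure 8 arises because the first sentence allows 2 x 2 = 4 combinations, the second allows 2, and the two sentences combine freely: 4 x 2 = 8. The following Python sketch shows the combinatorics. It is not BETATXT; the list-based input representation and the name `betatexts` are assumptions made for the illustration.

```python
from itertools import product

def betatexts(parts):
    """Expand a segment into all combinations of its substitution elements.

    `parts` is a list whose items are either a string (fixed text) or a list
    of strings (the alternative elements of one substitution)."""
    choices = [p if isinstance(p, list) else [p] for p in parts]
    return ["".join(combo) for combo in product(*choices)]

first = ["Ich besuche gern das alte kleine ", ["Schloß", "Haus"],
         " meines ", ["Onkels.", "Vaters."]]
second = ["Ich besuche gern das ",
          ["alte Schloß meines Onkels", "kleine Haus meines Vaters"]]

for line in betatexts(first):
    print(line)
# Betatexts of the whole document: every reading of the first segment
# combined with every reading of the second.
print(len(betatexts(first)) * len(betatexts(second)))  # 8
```

The number of betatexts grows multiplicatively with each substitution, which is why a text with many substitutions has a considerable number of them.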

3.6.5 Spell Checking

Spell checking may be a problem with heavily marked up files - ordinary spell checkers are not able to distinguish markup from text and are therefore also unable to identify strings to be checked for spelling. In general it is therefore necessary to perform spell checking on reformatted versions of the encoded documents - with the considerable disadvantage of having to trace the sources of errors in the marked-up files manually.

The Program Package contains two programs which may help to remedy this problem - MECSPRES and ALPHATXT. First, a word list is created by designing an appropriate profile definition table. EX-ALPHA.PDT (which comes with the program package) is an example of such a profile definition:

 

  £ < > < / / > [ / | \ / | / ] { " \ } 

 n o # p r m _ 

 ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz%&/-• 

 o example   d £  £ £ £        £  £ 

 o REF       d £  £ £ £        £  £ 

 o indent    d £  £ £ £        £  £ 

 o title     d £  £ £ £        £  £ 

 o intro     d £  £ £ £        £  £ 

 o paragraph e o  £ £ £        £  £ 

 o s         2 £  £ £ £        £  £ 

 r reverse_E £ £  £ {#6#121} £ £  £ 

 r reverse_A £ £  £ {#6#122} £ £  £ 

 m exist     £ £  £ {#6#121} £ £  £ 

This profile suppresses all one-element 'example', 'REF', 'indent', 'title' and 'intro' codes, and changes the first free character of all 'paragraph' codes to lower case. It also defines 'paragraph' as a section code and 's' as a segment code. In effect, the profile suppresses the title and intro of DOC1 and extracts the English text from the rest of the document. Alternatively, we could have defined a different filtering profile to extract only the German text of the document. The command
   MECSPRES EX-ALPHA.PDT DOC1 DOC1.ALF B A I
creates a reformatted version of DOC1 in alpha format, DOC1.ALF, i.e. a list of all words in DOC1 preceded by line and column reference numbers:
 

 #    12   6 we 

 -    13   8 will 

 .    13  13 see 

 -    14  10 some 

 .    14  15 examples 

 .    14  27 of 

 .    14  30 recursive 

 .    14  40 codes 

 .    14  47 of 

 .    15  10 elements 

 .    15  22 which 

 .    15  31 overlap 

 .    15  43 of 

 .    16   5 special 

 .    16  13 characters 

 .    16  29 like 

 .    16  44 • 

 .    17  15 • 

 .    17  34 • 

 .    17  37 and 

 .    17  51 • 

 .    18   5 and 

 .    18   9 of 

 .    18  15 substitutions 

 .    18  29 in 

 .    19  11 simplified 

 .    19  31 MECS-WIT 

 .    20   3 style 

 #    30  23 this 

 .    30  28 is 

 .    30  31 the 

 .    30  35 end 

 .    30  39 of 

 .    30  42 our 

 .    30  52 very 

 .    31   3 artificial 

 .    31  20 example 

This format is accepted by the program ALPHATXT. Assuming that the file EX-ENG.LIS contains a master list of English words, the command
   ALPHATXT R - EX-ENG.LIS - DOC1.ALF DOC1.OK DOC1.CHK
will make ALPHATXT read DOC1.ALF and ask the user either to accept or reject all words not occurring in EX-ENG.LIS. Those which are accepted are written to DOC1.OK, while those which are rejected are written to DOC1.CHK.

Assuming that all words except 'overlap', 'artificial', 'recursive', and 'MECS-WIT' are already included in EX-ENG.LIS, and that the first two are accepted whereas the last two are rejected, DOC1.CHK will look like this:

 

 .      14 30  recursive 

 .      19 31  MECS-WIT 

while DOC1.OK will look like this:
 

 overlap 

 artificial 

To perform both the above steps in one operation, you can give the command:

MECSSPEL EX-ENG.LIS EX-ALPHA.PDT DOC1 25

Since DOC1.CHK contains references by line and column number to relevant locations in DOC1, it is easy to retrieve these locations in DOC1 by displaying DOC1.CHK in a parallel window while correcting DOC1.

Since the words listed in DOC1.OK are accepted by the user it may be convenient to add them to EX-ENG.LIS for use in later spell checking. This can be done simply by appending the file DOC1.OK to EX-ENG.LIS. However, in order to make EX-ENG.LIS an ordered list, the new words in DOC1.OK may be inserted in their proper places with the following command:
   ALPHATXT OR - EX-ENG.LIS DOC1.OK - /EX-ENG.LIS
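The checking of an alpha-format word list against a master list, and the merging of accepted words into that list, can be sketched in Python as follows. This is an illustration only and does not reproduce ALPHATXT: it assumes that each alpha-format line consists of a marker, a line number, a column number and a word, compares words case-insensitively (an assumption), and leaves out the interactive accept/reject step. All function names are hypothetical.

```python
def parse_alpha(lines):
    """Parse alpha-format lines into (line, column, word) tuples."""
    entries = []
    for raw in lines:
        fields = raw.split()
        if len(fields) == 4:
            _marker, line, col, word = fields
            entries.append((int(line), int(col), word))
    return entries

def unknown_words(entries, master_words):
    """Return the entries whose word does not occur in the master list."""
    known = {w.lower() for w in master_words}
    return [e for e in entries if e[2].lower() not in known]

def merge_master(master_words, accepted):
    """Add accepted words to the master list and keep it alphabetically ordered."""
    return sorted(set(master_words) | set(accepted), key=str.lower)

alf = [".    14  30 recursive", ".    14  40 codes", ".    19  31 MECS-WIT"]
print(unknown_words(parse_alpha(alf), ["codes", "of"]))
print(merge_master(["codes", "of"], ["overlap", "artificial"]))
```

Because each entry keeps its line and column numbers, the unknown words can be traced back to their locations in the encoded document.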
 

3.6.6 Frequency Word Lists and Simple Statistical Analyses

MECSPRES and ALPHATXT can be used in a variety of ways and combinations to build up and maintain master word lists and check individual document files.

ALPHATXT allows for user-defined character sort procedures and can therefore produce a wide range of differently sorted alphabetic word lists from a document. ALPHATXT can also produce frequency word lists and simple statistical analyses. The following command:
   ALPHATXT FNRS - - - DOC1.ALF DOC1.WL - DOC1.STS
will write a frequency word list to DOC1.WL and a summary of statistical information to DOC1.STS. The frequency word list is sorted according to descending frequency. The first few lines of the file DOC1.WL will look like this:

 

 of                                       5 

 •                                        4 

 and                                      2 

 MECS-WIT                                 1 

 artificial                               1 

 characters                               1 

 codes                                    1 

The statistical summary DOC1.STS indicates total number of characters, strings (words), string types (word forms), sections and segments; their mean, maximum and minimum values and standard deviation:
 

 Chars:    170 

  Strings:  37    Chars/String:  4.59 Min: 1 Max:  8 StdDv: 2.39 

 Segments:   4 Strings/Segment:  9.25 Min: 1 Max:  1 StdDv: 4.33 

 Sections:   2 Strings/Section: 18.50 Min: 1 Max:  1 StdDv: 4.80 

    Types:  29     Tokens/Type:  1.28 Min: 1 Max: 26 StdDv: 0.97 
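A frequency word list and simple length statistics of this kind can be computed as in the following Python sketch. It is not ALPHATXT: the tie-breaking order (plain character order for words of equal frequency) and the use of the population standard deviation are assumptions, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

def frequency_list(words):
    """Return (word, count) pairs sorted by descending frequency, then by word."""
    counts = Counter(words)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def length_stats(words):
    """Summarize word lengths: number of strings, mean, minimum, maximum, std. dev."""
    lengths = [len(w) for w in words]
    return {
        "strings": len(words),
        "mean": round(mean(lengths), 2),
        "min": min(lengths),
        "max": max(lengths),
        "stddev": round(pstdev(lengths), 2),  # population std. dev. (assumption)
    }

words = ["of", "and", "of", "codes", "MECS-WIT", "artificial"]
for word, count in frequency_list(words):
    print(f"{word:<40}{count:>2}")
print(length_stats(words))
```

With plain character ordering, upper-case words such as 'MECS-WIT' sort before lower-case words of the same frequency, as in the excerpt from DOC1.WL above.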

3.6.7 Extracting Elements

The program MECSGRAB serves to extract specified elements from a document. E.g. the command
   MECSGRAB DOC1 DOC1.GRB o note RT
will extract from DOC1 all one-element 'note' codes, preceded by their line and column reference numbers, to the file DOC1.GRB:

 

 .   19 10 <note/simplified/note> 

 .   30 51 <note/very artificial/note> 

Similarly, the command
   MECSGRAB DOC1 DOC1.GRB m s T
extracts all multi-element 's' codes to the file DOC1.GRB, but this time without line and column references:
 

 [s/2|Schloß/s|<i/Haus/i>/s] 

 [s/2|Onkels./s|<i/Vaters./i>/s] 

 [s/2|alte Schloß meines Onkels/s|<i/kleine Haus/i> meines 

 <i/Vaters/i>/s] 

Output from MECSGRAB may in turn be used as input to other MECS programs.
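The extraction of elements together with their line and column references can be illustrated with the following Python sketch. It is not MECSGRAB: it assumes default delimiters, fully extended tags and a non-recursive one-element code, and the name `grab_elements` is hypothetical.

```python
import re

def grab_elements(text, gi):
    """Return (line, column, element) for each one-element code `gi`.

    Lines and columns are counted from 1. The non-greedy match stops at the
    first matching end tag, so recursive codes are not handled."""
    pattern = re.compile(rf"<{re.escape(gi)}/.*?/{re.escape(gi)}>", re.DOTALL)
    results = []
    for m in pattern.finditer(text):
        line = text.count("\n", 0, m.start()) + 1
        line_start = text.rfind("\n", 0, m.start()) + 1  # 0 on the first line
        results.append((line, m.start() - line_start + 1, m.group(0)))
    return results

doc = "a <note/simplified/note>\nb <note/very artificial/note> example."
for line, col, element in grab_elements(doc, "note"):
    print(f".{line:5}{col:3} {element}")
```

Omitting the position information from the output would correspond to the second MECSGRAB command above, which lists the extracted elements only.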

3.7 Processing SGML Documents in MECS

3.7.1 Validating SGML documents for MECS Conformance

It has been explained elsewhere (cf. Part I ##, ##, ##) that SGML documents are MECS-conforming, provided that they do not make use of tag minimization or end tag omission. E.g., the following SGML document, EXSGML:

 

<!DOCTYPE TEI.1 SYSTEM "c:\tei\public\tei1.dtd" [ 

     <!ENTITY tla "Three Letter Acronym"> 

     <!ELEMENT my.tag - - (#PCDATA)> 

          <!-- following line added by C.H. --> 

     <!ELEMENT my.stone - o EMPTY> 

     <!-- any other special-purpose declarations or 

          re-definitions go in here --> 

]> 

<tei.1> 

     This is an instance of a modified TEI.1 type document, 

     which may contain <my.tag>my special tags</my.tag>, 

          <!-- following line added by C.H. --> 

     including milestones such as <my.stone>, and 

     references to my usual entities such as &tla;. 

</tei.1> 

can be validated with the following command:

SGMLVAL EXSGML EXSGML.LOG

You have now validated EXSGML and generated its minimal CDT (which has automatically been called EXSGML.CDT) without in any way interfering with the original SGML file.

Alternatively, you may perform the validation interactively. Copy the document, which is included with the MECS Program Package under the file name 'EXSGML', to the current directory. Start MECSVAL by typing the command
   MECSVAL
at the DOS prompt. Press 'S', and MECSVAL will display the following menu:

 

+------------------------------------------------------------+ 

|C:\MYDIR               Mem: 432266      MECSVAL version 2.01| 

+------------------------------------------------------------+ 

|L LOG:               I Info               1 List directory  | 

|C CDT:               S Switches           2 Change directory| 

|T TXT:               M Create Minimal CDT 3 Copy file       | 

|E EDT:               D Check CDT          4 Print file      | 

|Q Quit               V Check CDT and TXT  5 Delete file     | 

+------------------------------------------------------------+ 

|                       +---------------------------------+  | 

|                       |s SGML-mode OFF                  |  | 

|                       |n Strict hierarchical nesting OFF|  | 

|                       |7 Low ASCII OFF                  |  | 

|                       |r End tag reduction OPTIONAL     |  | 

|                       |d Reset all values to default    |  | 

|                       |q Quit                           |  | 

|                       +---------------------------------+  | 

|                                                            | 

|                                                            | 

+------------------------------------------------------------+ 

Press 'S' to turn "SGML mode" on, and then 'Q' to exit from the Switches menu. Load the document into the MECSVAL editor (i.e., press 'E', and type 'EXSGML' when prompted for a file name). Type the following MECS header at the beginning of the document (or, alternatively, press Ctrl+K, then R, and type C:\MECS\HEADSGML):
 

  £ < > < > </ > [ £ ! £ £ £ £ ] & £ £ ; 

Press F2 to store and exit EXSGML. Then press 'M' at the main menu and type 'EXSGML' when prompted for a text file name. If EXSGML is an SGML-conforming document, you should get no error messages. With this interactive process, however (and unlike SGMLVAL) you have also changed the original SGML document by adding a MECS header to it.

As has been explained elsewhere, an SGML file is MECS-conforming only by virtue of the exceptions and deviations from the main outline of the basic code syntax of MECS which have been made precisely in order to enhance SGML compatibility. It should also be noted that even if run in so-called SGML mode, MECSVAL validates for MECS conformance, not for SGML conformance (cf. ##).

Experience has shown that if an SGML document is not MECS-conforming, i.e. if MECSVAL reports errors, the document may well not be properly SGML-conforming either. So if MECSVAL reports errors in your SGML documents, it may be a good idea to check whether the error is in fact also an SGML error.

3.7.2 Converting SGML files to MECS

The other programs in the MECS Program Package will be able to process SGML documents to some extent only. If you intend to do any serious work at all with SGML documents by means of the MECS Program Package, it is highly recommended that you first convert them to MECS default notation by means of the conversion program SGMLMECS. (You may then convert them back into SGML again with another program, MECSSGML - see below.)

In order to convert the example discussed above, EXSGML, to MECS, you may type the following command at the DOS prompt:
   SGMLMECS EXSGML EXMECS
The output file, EXMECS, which is a fully MECS-conforming document, will look like this:

 

  £ < > < / / > £ £ | £ £ £ £ £ { £ £ } 

 <|DOCTYPE TEI.1 SYSTEM "c:\tei\public\tei1.dtd" [ 

      <!ENTITY tla "Three Letter Acronym"> 

      <!ELEMENT my.tag - - (#PCDATA)> 

           <!-- following line added by C.H. --> 

      <!ELEMENT my.stone - o EMPTY> 

      <!-- any other special-purpose declarations or 

           re-definitions go in here --> 

 ]|> 

 <tei.1/ 

      This is an instance of a modified TEI.1 type document, 

      which may contain <my.tag/my special tags/my.tag>, 

           <|-- following line added by C.H. --|> 

      including milestones such as <my.stone>, and 

      references to my usual entities such as {tla}. 

 /tei.1> 

It has been mentioned several times that SGML documents which make use of end tag omission or tag minimization are not MECS-conforming. However, even SGML documents with occasional tag minimization may be converted to MECS-conforming documents. E.g. from the following SGML document instance which contains both start tag and end tag minimization:
 

+------------------------------------------------------------+ 

|<tei.1>                                                     | 

|     This is an instance of a modified TEI.1 type document, | 

|     which may contain <my.tag>my special tags</my.tag>,    | 

|          <!-- following 2 lines added by C.H. -->          | 

|     <my.tag>also</>                                        | 

|     including milestones <>such</> as <my.stone>, and      | 

|     references to my usual entities such as &tla;.         | 

|</tei.1>                                                    | 

+------------------------------------------------------------+ 

SGMLMECS will produce the following:
 

  £ < > < / / > £ £ | £ £ £ £ £ { £ £ } 

 <tei.1/ 

      This is an instance of a modified TEI.1 type document, 

      which may contain <my.tag/my special tags/my.tag>, 

           <|-- following 2 lines added by C.H. --|> 

      <my.tag/also> 

      including milestones <my.tag/such> as <my.stone>, and 

      references to my usual entities such as {tla}. 

 /tei.1> 

This is a fully MECS-conforming document.

Since MECSVAL will detect all missing end tags, SGML documents making use of more extensive end tag omission and minimization can also (after conversion by SGMLMECS) in most cases be brought to MECS conformance, even if sometimes only with some amount of manual post-editing.
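The notational correspondence between an SGML document instance and MECS default notation, as seen in the EXMECS example, can be sketched in Python as follows. This is not SGMLMECS: the sketch handles only tags without attributes, comments and entity references, does not treat the DOCTYPE declaration or minimized tags, and must be told which elements are declared EMPTY. The name `sgml_to_mecs` and its `empty` parameter are hypothetical.

```python
import re

def sgml_to_mecs(instance, empty=()):
    """Rewrite simple SGML instance markup in MECS default notation."""
    # <!-- comment -->  ->  <|-- comment --|>
    text = re.sub(r"<!(--.*?--)>", r"<|\1|>", instance, flags=re.DOTALL)
    # </gi>  ->  /gi>
    text = re.sub(r"</([\w.]+)>", r"/\1>", text)

    def start_tag(m):
        gi = m.group(1)
        # EMPTY elements stay as no-element codes; others open an element.
        return m.group(0) if gi in empty else f"<{gi}/"

    # <gi>  ->  <gi/
    text = re.sub(r"<([\w.]+)>", start_tag, text)
    # &name;  ->  {name}
    return re.sub(r"&([\w.]+);", r"{\1}", text)

sgml = ("<tei.1>may contain <my.tag>my special tags</my.tag>, "
        "<!-- note --> <my.stone>, and &tla;.</tei.1>")
print(sgml_to_mecs(sgml, empty={"my.stone"}))
```

The `empty` set stands in for information which a real converter would have to take from the DTD, here the declaration of 'my.stone' as EMPTY.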

3.7.3 Converting MECS documents to SGML

You may convert EXMECS back to SGML by the following command:
   MECSSGML EXMECS EXSGML2
The new file, EXSGML2, is identical to the original file, EXSGML. This is so because EXMECS contained none of the features peculiar to MECS (naturally, since we made no changes to the document at all except converting it from SGML to MECS and then back again to SGML). Documents such as our previous example, DOC1, which do make use of these special MECS features, however, can sometimes only be converted to SGML at the cost of some loss or distortion of information. The command
   MECSSGML DOC1 DOC1SGML - MECSDOC R
will give the following result:

 

+------------------------------------------------------------+
|<MECSDOC>                                                   |
| <!--From EX1: -->                                          |
|<p_dmi><p_el>0</p_el><p_el>6</p_el></p_dmi>                 |
|<REF>1</REF><paragraph><title>Sample MECS                   |
| Document</title></paragraph>                               |
| <intro>                                                    |
|<REF>2</REF><paragraph><indent>3</indent>This is a          |
| sample <b>MECS</b> document which is intended to           |
| demonstrate the use of currently available                 |
| <b>MECS</b> software.</paragraph>                          |
| </intro> <!--From EX2: -->                                 |
|<REF>3</REF><paragraph>                                     |
| <s>We                                                      |
|  <s>will see                                               |
|   <s>some examples</s> of recursive codes, of              |
|   <b>elements <u>which</u></b><u> overlap</u>, of</s>      |
|  special characters<ind> like &reverse_A;,                 |
|  &reverse_E;, &reverse_E.exist;, and &qEq.exist;,          |
|  and of<l> substitutions in                                |
|  <note>simplified</note></s> MECS-WIT                      |
| style:</paragraph>                                         |
| <example>                                                  |
|<REF>4</REF><paragraph><ind>                                |
| <s>Ich besuche gern das alte <i>kleine</i>                 |
|  <p_s><p_el>Schloß</p_el><p_el><i>Haus</i></p_el></p_s>    |
|  meines <p_s><p_el>Onkels.</p_el><p_el><i>Vaters.</i>      |
|  </p_el></p_s></s> <ind>                                   |
| <s>Ich besuche gern das <p_s><p_el>alte Schloß meines      |
|  Onkels</p_el><p_el><i>kleine Haus</i> meines              |
|  <i>Vaters</i></p_el></p_s></s> </paragraph>               |
| </example>                                                 |
|<REF>5</REF><paragraph>This is the end of our               |
| <note>very artificial</note> example.</paragraph>          |
|</MECSDOC>                                                  |
+------------------------------------------------------------+

This is a well-formed SGML document instance. However, the document originated in MECS and therefore does not contain any SGML DTD. Multi-element tags have been converted to SGML elements, and character disambiguation codes have been merged with character representation codes to form SGML entities. Moreover, the document originally contained overlapping elements, and MECSSGML has enforced a hierarchical structure on the output file (cf. 3.). (By omitting the last parameter 'R' you might have instructed MECSSGML not to do this, but you would then have had to design at least two DTDs and use the CONCUR feature of SGML in order to make the document completely SGML-conforming.)
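The effect of enforcing a hierarchical structure can be illustrated with a small Python sketch. It is not MECSSGML's actual algorithm, which is not documented here; it is merely one simple strategy (close the inner elements, close the requested element, then reopen the inner ones) that happens to reproduce the `<b>elements <u>which</u></b><u> overlap</u>` line of the output above. The function name and the event representation are assumptions made for this illustration.

```python
def enforce_hierarchy(events):
    """Serialize (kind, value) events as strictly nested SGML-style markup.

    kind is "start", "end" or "text". Hypothetical helper, not part of the
    MECS Program Package.
    """
    out = []
    stack = []  # currently open elements, outermost first
    for kind, value in events:
        if kind == "text":
            out.append(value)
        elif kind == "start":
            stack.append(value)
            out.append(f"<{value}>")
        elif kind == "end":
            if value not in stack:
                raise ValueError(f"end tag without start tag: {value}")
            reopened = []
            # Close every element opened after the one being ended.
            while stack[-1] != value:
                inner = stack.pop()
                out.append(f"</{inner}>")
                reopened.append(inner)
            stack.pop()
            out.append(f"</{value}>")
            # Reopen the interrupted elements in their original order.
            for inner in reversed(reopened):
                stack.append(inner)
                out.append(f"<{inner}>")
        else:
            raise ValueError(f"unknown event kind: {kind}")
    if stack:
        raise ValueError(f"unclosed elements: {stack}")
    return "".join(out)


overlapping = [
    ("start", "b"), ("text", "elements "),
    ("start", "u"), ("text", "which"),
    ("end", "b"), ("text", " overlap"),
    ("end", "u"),
]
print(enforce_hierarchy(overlapping))
```

Note that the single underlined element of the MECS source ends up as two `u` elements, which is one reason why the conversion cannot in general be reversed automatically.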

Consequently, so much of the information in DOC1 may have been distorted or even lost in the conversion to SGML that there may be no way you can automatically convert DOC1SGML back to MECS and obtain a result equivalent to the document you started with.

In sum, this calls for the following precautions: if you use the MECS Program Package to process documents with the intention of converting them to SGML without loss or distortion of information, you should:

All these measures may easily be implemented by using an appropriate set of MECS code delimiters and always running MECSVAL with the 'Strict hierarchical nesting' switch on (cf. 3). A MECS header suitable for creating and processing MECS documents taking these measures into account is provided with the Program Package under the file name HEADMS:
 

  £ < > < / / > £ £ ! £ £ £ £ £ { £ £ } 

3.8 Project management

If you work with the MECS Program Package for a while, you will probably build up a number of CDTs, each of which contains a substantial number of code declarations. You will tend to define in the course of your work a corresponding or probably larger number of PDTs, and your document files may become numerous, many of them larger than just a couple of hundred Kb.

At some stage of this process you may easily lose control unless you have taken steps to prevent things from getting out of hand.

Large CDTs and PDTs are most conveniently created, maintained and documented by means of a database program. Almost any database program will serve the purpose, as long as it enables you to output your CDTs and PDTs in flat ASCII files. Maintaining them in a database has the additional advantage of enabling you to add vital information such as free text descriptions of application criteria for codes, examples of usage etc.

The Registration Standard of the Wittgenstein Archives at the University of Bergen, MECS-WIT (Huitfeldt 1997), provides an example. It is stored in a database, and the CDT, all PDTs, as well as the so-called Code Book, which includes a full description of all codes, an alphabetical summary etc., are output from this database.

CDTs and PDTs are most conveniently held in a separate directory included in the path string of your AUTOEXEC.BAT file. Both MECSVAL and MECSPRES will retrieve all files in path directories.

The MECSVAL editor is not a swapping editor, i.e. it is unable to edit files larger than the size of conventional DOS memory available. Since MECSVAL also holds the entire CDT in memory, the amount of memory available to the editor will depend on the size of your CDT.

A simple solution, if you run out of memory, is to use another editor. Any editor which enables you to output your files in so-called flat ASCII files with a maximum line length of 255 bytes may be used.

The MECS Program Package also provides another solution: you may input a sequence of document files to any of the programs by replacing the command line input file name with a slash immediately followed by the name of a file which contains a list of the document files you intend for processing. The programs will create *.ERR-files if errors are encountered.

There is no limit to the number of files you may include in such an input file list, but each file must be a well-formed MECS document on its own. E.g., you cannot include the start tag of a code in one file and the end tag of the same code in a succeeding file. If the files contain MECS headers, their headers should be identical.
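The file list convention can be sketched in Python as follows. This is only an illustration of the rule described above, not code from the package; the function name is hypothetical, and skipping blank lines in the list file is an assumption.

```python
def expand_read_file(param):
    """Return the document files denoted by a read_file parameter.

    "-"      -> no input file
    "/NAME"  -> NAME is a file list with one document file name per line
    "NAME"   -> NAME is itself the document file
    """
    if param == "-":
        return []
    if param.startswith("/"):
        with open(param[1:], encoding="latin-1") as listing:
            # Assumption: blank lines in the list are ignored.
            return [line.strip() for line in listing if line.strip()]
    return [param]
```

With the file DOC used in the example below, `expand_read_file("/DOC")` would return the three document file names in the order listed.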

Some of the programs in the package require or accept a large number of command line parameters. Within one and the same project most of the parameters will often or always be identical, and several programs will be run over and over again in identical sequences. This naturally calls for batch processing.

Let us assume that the current path string contains the directories C:\MECS and C:\MECSUTIL, that the directory C:\MECS stores the programs of the MECS Program Package, that the directory C:\MECSUTIL stores the files PROJECT.CDT, PRO-N.PDT, PRO-D.PDT, PRO-BETA.PDT and MYJOB.BAT, and that the contents of MYJOB.BAT are:

 

 echo off
   if exist %1.err del %1.err
 MECSVAL PROJECT.CDT /%1 %1.LOG
   if exist %1.err goto end
 MECSFORM /%1 / 75 R paragraph s 3 REF 1
   if exist %1.err goto end
 MECSLYSE /%1 /%1.tr1 o paragraph R
   if exist %1.err goto end
 MECSLYSE /%1 /%1.tr2 - - O REF
   if exist %1.err goto end
 MECSPRES PRO-BETA.PDT /%1 TEMPFILE.TMP B A I
   if exist %1.err goto end
 BETATXT TEMPFILE.TMP %1.BET
   if exist %1.err goto end
 DEL TEMPFILE.TMP
 MECSPRES PRO-N.PDT /%1 /%1.ANW N W I
   if exist %1.err goto end
 MECSPRES PRO-D.PDT /%1 /%1.BDW D W I
   if exist %1.err goto end
 MECSSGML /%1 /%1.GML
 :end
   if exist %1.err type %1.err
   if not exist %1.err echo Normal termination

Let us further assume that the directory C:\MECSDOC stores the MECS document files DOC1, DOC2 and DOC3, that the current directory stores the file DOC, and that the contents of DOC are:
 

 C:\MECSDOC\DOC1
 C:\MECSDOC\DOC2
 C:\MECSDOC\DOC3

With the assumed configuration, the command
   MYJOB DOC
will validate, format, analyze, reformat and convert all files listed in DOC. If any stage of the process terminates with an error, an error message will be written to the file DOC.ERR, and the batch process will be aborted. Otherwise, the output of the process will be found in the current directory under the file names DOC.LOG, DOC.TR1, DOC.TR2, DOC.BET, DOC.ANW, DOC.BDW and DOC.GML.
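The control flow of MYJOB.BAT (run each program in turn, stop as soon as an err-file appears) may be easier to follow in the form of a short Python sketch. The command lines are copied from the batch file above; the functions `myjob_commands` and `run_job`, and the idea of passing in `run` and `err_exists` as callables, are assumptions made for illustration. Deleting an old err-file at the start and removing TEMPFILE.TMP are left out for brevity.

```python
def myjob_commands(job):
    """Command lines of MYJOB.BAT with %1 replaced by the job name."""
    src = "/" + job  # a leading slash marks a file list (or an overwritable output)
    return [
        ["MECSVAL", "PROJECT.CDT", src, job + ".LOG"],
        ["MECSFORM", src, "/", "75", "R", "paragraph", "s", "3", "REF", "1"],
        ["MECSLYSE", src, src + ".tr1", "o", "paragraph", "R"],
        ["MECSLYSE", src, src + ".tr2", "-", "-", "O", "REF"],
        ["MECSPRES", "PRO-BETA.PDT", src, "TEMPFILE.TMP", "B", "A", "I"],
        ["BETATXT", "TEMPFILE.TMP", job + ".BET"],
        ["MECSPRES", "PRO-N.PDT", src, src + ".ANW", "N", "W", "I"],
        ["MECSPRES", "PRO-D.PDT", src, src + ".BDW", "D", "W", "I"],
        ["MECSSGML", src, src + ".GML"],
    ]


def run_job(job, run, err_exists):
    """Run the steps in order; abort after the first step that leaves an err-file."""
    for argv in myjob_commands(job):
        run(argv)
        if err_exists(job + ".err"):
            return "aborted after " + argv[0]
    return "Normal termination"
```

Because `run` and `err_exists` are parameters, the sequence can be inspected or dry-run without the MECS programs being installed.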


4 MECS PROGRAM PACKAGE: REFERENCE GUIDE

4.1 General Features and Command Line Parameters

All files input to the programs in the MECS Program Package must be so-called flat ASCII files with a maximum line length of 255 characters. Some of the programs may create output files of different formats and line lengths, depending on specifications given by the user. Maximum length of generic identifiers is 50 characters for MECSVAL, 30 characters for the other programs.

All programs accept or require a number of command line parameters. Optional parameters may be omitted or replaced by a dash. The following conventions are used to indicate the format of such parameters:

filename
file name given according to ordinary DOS file specification conventions, i.e. drive:\path\filename
file_in
Input file: filename of an existing file. May be replaced by a dash in order to indicate that no input file is used
read_file
Input file, indicated in one of three ways:
1
filename of an existing file.
2
a slash immediately followed by a filename of an existing file list, i.e. a file containing a list of filenames, one per line. The files referenced in the list will be processed one by one in the indicated sequence as if they were one file.
3
a dash, indicating that no input file is used
file_out
Output file: filename of a non-existing file. May be replaced by a dash ('-') in order to indicate that no output file is used, or that output should be written to the screen.
write_file
Output file, indicated in one of four ways:
1
filename of a non-existing file
2
a slash ('/') immediately followed by a filename, the slash indicating that any existing file by the same name should be overwritten without notice
3
a dash, indicating that no output file is used, or that output should be written to the screen.
4
a slash, indicating that the input file(s) should be overwritten without notice (this option available only with MECSFORM)
gi
generic identifier
type
code type, indicated as follows:
o
one-element code
m
multi-element code
#
numeric value
(ABC)
any combination of 'A', 'B' and 'C'
(A|B|C)
select either 'A', 'B' or 'C'
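As an illustration of the write_file convention described above, the four cases can be told apart as in the following Python sketch. The function name and the returned labels are hypothetical; the programs' own handling of these parameters may differ in detail.

```python
def classify_write_file(param):
    """Interpret a write_file parameter according to the conventions above."""
    if param == "-":
        return ("none_or_screen", None)    # no output file, or output to screen
    if param == "/":
        return ("overwrite_input", None)   # MECSFORM only
    if param.startswith("/"):
        return ("overwrite", param[1:])    # overwrite existing file without notice
    return ("create", param)               # must name a non-existing file
```

For example, `classify_write_file("/NEWFILE")` yields `("overwrite", "NEWFILE")`, while a plain `NEWFILE` is expected not to exist yet.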

MECSVAL should be used to check documents for syntax errors before they are treated by any other program in the package. The other programs may produce unpredictable results if used on documents which are not MECS-conforming. (ALPHATXT and BETATXT are the only exceptions to this rule.)

MECSVAL is the only program in the package accepting documents in so-called SGML mode. All other programs can only be used with documents which begin with a MECS header (except MECSPRES), which are fully MECS-conforming, and which do not require parsing in SGML mode. (ALPHATXT and BETATXT are the only exceptions to this rule.)

All the programs in the package create so-called 'err-files' if errors are encountered. I.e., if a program reports an error, it also creates a new file with the same name as the current input file and the file name extension 'err'. If the err-file already exists, the error message will be appended to the existing err-file.
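The err-file convention amounts to the following, shown here as a Python sketch. The helper names are hypothetical, and replacing an existing extension of the input file name (rather than appending to it) is an assumption; the examples in this guide only use input files without extensions.

```python
import ntpath


def err_file_name(input_name):
    """Name of the err-file for an input file, using DOS path rules."""
    base, _extension = ntpath.splitext(input_name)
    return base + ".ERR"


def report_error(input_name, message):
    """Append a message to the err-file, creating the file if necessary."""
    path = err_file_name(input_name)
    with open(path, "a", encoding="latin-1") as err_file:
        err_file.write(message + "\n")
    return path
```

Because messages are appended, a batch job that wants a clean status should delete any old err-file before it starts, as MYJOB.BAT does.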

All programs may create temporary files called 'TEMPFILE.*' or '*.TMP' during execution. Normally, such temporary files are deleted before the program terminates, but if the program terminates abnormally they may still remain in the local directory.

4.2 MECSVAL

MECSVAL is a validating parser-editor for MECS version 2 documents. The program checks Code Declaration Tables (CDTs) and documents for MECS conformance, and deduces CDTs from MECS-conforming documents. The program may be run either in batch mode or in interactive mode.

Usage:
   MECSVAL file_in read_file file_out SGML STRICT (RED|NORED) 7|com
 

4.2.1 Interactive Mode

The simplest way to start MECSVAL in interactive mode is to enter the command MECSVAL and press return. MECSVAL's main menu looks like this:

 

+------------------------------------------------------------+
|C:\MYDIR               Mem:  433018     MECSVAL version 2.01|
+------------------------------------------------------------+
|L LOG:               I Info               1 List directory  |
|C CDT:               S Switches           2 Change directory|
|T TXT:               M Create Minimal CDT 3 Copy file       |
|E EDT:               D Check CDT          4 Print file      |
|Q Quit               V Check CDT and TXT  5 Delete file     |
+------------------------------------------------------------+

The following commands and options are available from the main menu:
L
log file name. If a log file has been specified, MECSVAL will not halt after the first error encountered in documents or CDTs. Instead, all error messages will be written to the log file. The number of errors will be displayed in the lower window of the screen.

   If errors have been found, the user will be prompted to enter the MECSVAL editor in split screen mode, with the log file in the lower window and the error-checked file in the upper window, and the cursor positioned at the location of the last error encountered.
   If no log file is specified, MECSVAL will halt after the first error encountered and display an error message in the lower window. The user will be prompted to enter the MECSVAL editor, with the cursor positioned at the location of the error found.
C
CDT file name. MECSVAL will search for the specified CDT file in all path directories. This CDT will be used in all subsequent CDT error checking operations (options D and V).
T
document file name. This document will be checked for errors in all subsequent document error checking operations (option V).
E
opens the MECSVAL editor. Cf. below for a list of editor commands.
Q
ends the session and exits to DOS
S
activates a menu with the following options:
s
switches SGML mode ON/OFF. (Default: OFF)

   If SGML mode is on, the no-element tag close delimiter and the one-element start tag close delimiter may be identical; the one-element end tag open delimiter may be two characters long; strict hierarchical nesting of codes is required.
n
switches Strict hierarchical nesting ON/OFF.

   (Default: OFF) With this switch off, coded elements may overlap and need not be hierarchically nested.
7
switches Low ASCII ON/OFF. (Default: OFF)

   If Low ASCII is on, 8-bit ASCII values (i.e. ASCII values above 127) are not accepted, neither in CDTs nor in documents.
r
toggles between options for markup reduction. (Default: OPTIONAL).

   The other two possible values are REQUIRED and NOT ALLOWED.
d
resets all the above values to default
q
returns to the main menu
M
deduces a minimal CDT from the current document file (see T-option). The current document file must be MECS-conforming and contain a MECS header at the very beginning.

   If these requirements are not met, an error message will be displayed and the deduced CDT may contain syntax errors.
   Note also that the deduced CDT will automatically be given the same file name as the current document file with the extension '.CDT', and that any existing file with the same name will be overwritten.
D
checks the current CDT (see C-option) for syntax errors.

   If errors are encountered, an error message will be displayed in the lower screen window, and the user will be prompted to enter the MECSVAL editor (in split screen mode if a log file has been specified, see L-option).
V
checks that the current document file conforms to the current CDT. The current CDT (see C-option) is checked for syntax errors. If errors are encountered, the operation is terminated. If no errors are encountered in the CDT, the current document file (see T-option) is checked for syntax errors.

   If errors are encountered, an error message will be displayed in the lower screen window, and the user will be prompted to enter the MECSVAL editor (in split screen mode if a log file has been specified, see L-option).

The menu options 1-5 in the rightmost menu column provide access from MECSVAL to some of the most frequently used DOS operating system commands, and should be self-explanatory.

4.2.2 Command Line Parameters

Many of the above-mentioned file specifications and other options may be initialized from the command line. MECSVAL accepts 1-7 command line parameters:
1  file_in
CDT file name. See Interactive Mode, C-option.

   MECSVAL searches for the specified CDT file in all path directories.
   If the parameter is replaced by a dash, MECSVAL will deduce a minimal CDT from the document file specified in parameter 2, with the file name extension 'CDT'. Any existing file by that name will be overwritten. See Interactive Mode, M-option.
2  read_file
Document file name. The parameter may be replaced by a dash. See Interactive Mode, T-option.

   In Batch mode (see parameter 3): if the file name is preceded by a slash, the parameter file may instead of an actual document contain a list of document files to be checked.
3  file_out
Log file name. Any existing file by the same name will be overwritten.

   If this parameter is left out or replaced by a dash, MECSVAL will enter interactive mode.
4, 5, 6, 7
These parameters may take any of the values SGML, STRICT, NORED, RED or 7.

   SGML: the no-element tag close delimiter and the one-element start tag close delimiter may be identical. The one-element end tag open delimiter may be two characters long. All codes must be hierarchically nested.
   STRICT: All codes must be hierarchically nested.
   NORED: No markup reduction allowed.
   RED: Full markup reduction required.
   7: 8-bit ASCII not accepted.
 

If the third command line parameter is omitted (or replaced by a dash), MECSVAL enters interactive mode, and the effect of any other command line parameter is to initialize the corresponding option listed on the main menu.
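The rules above can be summarized in a short Python sketch that interprets a MECSVAL parameter list. It is an illustration only: the function name and the returned dictionary are assumptions, and so is the treatment of RED and NORED as mutually exclusive (suggested, but not stated, by the '(RED|NORED)' notation in the usage line).

```python
def parse_mecsval_args(args):
    """Interpret up to seven MECSVAL command line parameters."""
    if len(args) > 7:
        raise ValueError("MECSVAL accepts at most 7 parameters")

    def value(index):
        # Omitted parameters and dashes both mean "not specified".
        return args[index] if len(args) > index and args[index] != "-" else None

    switches = {a.upper() for a in args[3:] if a != "-"}
    unknown = switches - {"SGML", "STRICT", "NORED", "RED", "7"}
    if unknown:
        raise ValueError("unknown switch(es): " + ", ".join(sorted(unknown)))
    if {"RED", "NORED"} <= switches:
        raise ValueError("RED and NORED cannot be combined")  # assumption

    if "RED" in switches:
        reduction = "REQUIRED"
    elif "NORED" in switches:
        reduction = "NOT ALLOWED"
    else:
        reduction = "OPTIONAL"

    return {
        "cdt": value(0),
        "document": value(1),
        "log": value(2),
        # A dash as first parameter asks for a minimal CDT to be deduced.
        "deduce_cdt": len(args) > 1 and args[0] == "-" and value(1) is not None,
        "mode": "batch" if value(2) else "interactive",
        "sgml": "SGML" in switches,
        "strict_nesting": bool(switches & {"SGML", "STRICT"}),
        "low_ascii": "7" in switches,
        "markup_reduction": reduction,
    }
```

For instance, the parameter list `MYDEF.CDT MYTXT.TXT - 7 STRICT` is interpreted as interactive mode with Low ASCII and strict hierarchical nesting switched on.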

4.2.3 Examples

The command
   MECSVAL
simply starts MECSVAL; while
   MECSVAL MYDEF.CDT MYTXT.TXT
starts MECSVAL with MYDEF.CDT as default CDT and MYTXT.TXT as default document file; and
   MECSVAL MYDEF.CDT MYTXT.TXT - 7 STRICT
starts MECSVAL with MYDEF.CDT as default CDT and MYTXT.TXT as default document file. 8-bit ASCII is not allowed, and all codes must be hierarchically nested.

   MECSVAL - MYTXT.TXT
deduces a minimal CDT called MYTXT.CDT (any existing file by the name MYTXT.CDT will be overwritten) from MYTXT.TXT, and starts MECSVAL with MYTXT.CDT as default CDT and MYTXT.TXT as default document file.

The program will enter batch mode if and only if a log file name (third command line parameter) has been specified. Any existing file with the same name will be overwritten.

The command
   MECSVAL MYDEF.CDT MYTXT.TXT MYTXT.LOG NORED 7
makes MECSVAL read the CDT MYDEF.CDT and check it for errors. If no errors are found in the CDT, the document file MYTXT.TXT is checked against the declarations given in MYDEF.CDT. Status information and possible error messages are written to MYTXT.LOG. Any markup reduction and ASCII values above 127 will produce error messages.

   MECSVAL - MYTXT.TXT MYTXT.LOG
makes MECSVAL read the document file MYTXT.TXT and write MYTXT.TXT's minimal CDT to a new file, MYTXT.CDT (note that any existing file with the same name will be overwritten). Status information and possible error messages are written to MYTXT.LOG.

The command
   MECSVAL MYDEF.CDT /MYTEXTS MYTEXTS.LOG
makes MECSVAL read and check the CDT MYDEF.CDT, and then read and check all files listed in the file MYTEXTS. Status information and possible error messages are written to MYTEXTS.LOG.
   The file MYTEXTS must consist of a list of document file names, each file name on a separate line, e.g. like this:
   C:\MYDIR\MYFIRST
   C:\MYDIR\MYSECOND
   C:\MYDIR\MYLAST
 

If drives and directories are not specified, MECSVAL will assume that the files are to be found in the current drive and directory.
   If a minimal CDT is deduced (i.e., the first parameter is a dash), the first of the files listed should contain a MECS header, and the other files should contain either an identical header or no header at all.

If and only if errors are found MECSVAL will also write a brief message to a file with a name identical to the document file and the extension '.ERR'. If the file already exists, the message will be appended to it.

4.2.4 MECSVAL Editor Commands
Character left  Left arrow or Ctrl+S 
Character right  Right arrow or Ctrl+D 
Word left  Ctrl+Left arrow or Ctrl+A 
Word right  Ctrl+Right arrow or Ctrl+F 
Line up  Up arrow or Ctrl+E 
Line down  Down arrow or Ctrl+X 
Scroll up  Ctrl+W 
Scroll down  Ctrl+Z 
Page up  PgUp or Ctrl+R 
Page Down  PgDn or Ctrl+C 
Beginning of file  Ctrl+PgUp or Ctrl+Q R 
End of file  Ctrl+PgDn or Ctrl+Q C 
Beginning of line  Home or Ctrl+Q S 
End of line  End or Ctrl+Q D 
Top of screen  Ctrl+Home or Ctrl+Q E 
Bottom of screen  Ctrl+End or Ctrl+Q X 
Go to line  Ctrl+J L 
Go to column  Ctrl+J C 
Top of block  Ctrl+Q B 
Bottom of block  Ctrl+Q K 
Jump to marker  Ctrl+Q 0..Ctrl+Q 9 
Set marker  Ctrl+K 0..Ctrl+K 9 
Previous cursor position  Ctrl+Q P 
New line  Enter or Ctrl+M 
Insert line  Ctrl+N 
Insert control character  Ctrl+P 
Tab  Tab or Ctrl+I 
Delete current character  Del or Ctrl+G 
Delete character left  Backspace or Ctrl+H 
Delete word  Ctrl+T 
Delete to end of line  Ctrl+Q Y 
Delete line  Ctrl+Y 
Find pattern  Ctrl+Q F 
Find and replace  Ctrl+Q A 
Find next  Ctrl+L 
Abandon file  Ctrl+K Q 
Save and continue edit  Ctrl+K S 
Save and exit  Ctrl+K X or F2 
Save to file  Ctrl+K N 
Add window  Ctrl+O A or Shift+F3 
Next window  Ctrl+O N or F6 
Previous window  Ctrl+O P or Shift+F6 
Resize current window  Ctrl+O S 
Begin block  Ctrl+K B or F7 
End block  Ctrl+K K or F8 
Copy block  Ctrl+K C 
Move block  Ctrl+K V 
Delete block  Ctrl+K Y 
Hide block  Ctrl+K H 
Mark current word as block  Ctrl+K T 
Read block from file  Ctrl+K R 
Write block to file  Ctrl+K W 
Toggle insert mode  Ctrl+V or Ins 
Toggle autoindent mode  Ctrl+Q I 
Toggle marker display  Ctrl+K M 
Change directory  Ctrl+J D 
Show version  Ctrl+J V 
Show available memory  Ctrl+J R 
Set undo limit  Ctrl+J U 
Set default extension  Ctrl+J E 
Abort command  Ctrl+U 
Undo last deletion  Ctrl+Q U 
Restore line  Ctrl+Q L 

4.3 MECSFORM

MECSFORM is a code formatter for MECS version 2 documents. The program formats MECS-conforming documents by either extending, retaining, or reducing codes, removing trailing blanks and trailing blank lines, optionally indenting specified codes and/or inserting specified reference codes.

Usage:
   MECSFORM read_file write_file # (E|R|X) gi gi # gi # gi gi
 

The program accepts 1-11 command line parameters:
1   read_file  (required)
Document to be formatted by MECSFORM.

   The document must begin with a MECS header, and it must be fully MECS-conforming (see MECSVAL).
2   write_file  (optional)
Output file.
3   #  (optional)
Line length of output file. Maximum value: 255.

   Specify value dash ('-') to retain line division of input file.
   Default: -
4   (E|R|X|-)  (optional)
This parameter takes the following values:

   E: extend all reduced codes
   R: reduce all reducible codes
   X: replace all alphabetical free characters with the character 'X'
   Default: retain reduction of input file
5   gi  (optional)
6   gi  (optional)
7   #  (required if 5 and/or 6 are set)
Parameters no 5 and 6 should be generic identifiers of one-element codes occurring in the document specified by parameter no 1.

   Parameter no 7 specifies an indent value, which should be an integer between 1 and line length (parameter no 3).
   All codes in the document with generic identifiers specified by parameters 5 and 6 will be indented by the number of characters indicated by parameter 7.
8   gi  (optional if parameter no 5 set)
9   #  (required if parameter no 8 set)
10  gi  (optional if parameter no 8 set)
11  gi  (optional if parameter no 7 set)
Parameter no 8 should specify a generic identifier for a reference code, preferably a generic identifier which is not already used for other purposes in the document specified by parameter no 1.

   Parameter no 9 should be an integer value.
   Parameter no 10 should be a generic identifier of a one-element code occurring in the document specified by parameter no 1.
   Reference codes (specified by parameter no 8) containing numbers will be inserted (incrementally, starting with the number specified by parameter no 9) immediately before all occurrences of the code specified in parameter no 5.
   All existing reference codes in these positions will be deleted. Reference numbers are incremented by one, starting with the value specified by parameter no 9. Any occurrence of the code specified by parameter no 10 will reset the value of the reference numbers to 1.
   Parameter no 11 should be a generic identifier of a one-element code occurring in the document specified by parameter no 1. All occurrences of the element will be written from margin, while all elements contained by these elements will be indented to the value specified by parameter no 7.
 

Examples

The command
   MECSFORM MYFILE NEWFILE 60 R
reads MYFILE and creates a new file called NEWFILE, with a maximum line length of 60 characters, removes trailing blanks and trailing blank lines, and reduces codes to their minimal form wherever possible.

The command
   MECSFORM /MYFILES /NEWFILE 78 E
reads all files specified in the file MYFILES and writes all output to one file called NEWFILE, which will be overwritten without notice if it already exists. The document is formatted to a maximum line length of 78, and all reduced codes are extended.

The command
   MECSFORM /MYFILES / 65 E sec s 2 R 1 doc
reads and overwrites all files specified in the file MYFILES, with a maximum line length of 65 characters, removes trailing blanks and trailing blank lines, extends all reduced codes to their full form, inserts linebreaks before all one-element s- and sec-codes, indents all s- and sec-elements 2 characters, inserts one-element 'R'-codes containing numbers (incrementally, starting with 1) immediately before every sec-code, and resets the reference numbering sequence to 1 after every occurrence of a 'doc'-code.
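The reference numbering described for parameters 8-10 can be sketched in Python as follows. This is a simplified illustration, not MECSFORM's implementation: it assumes the default-style delimiters in which a one-element start tag is written `<gi/`, it writes the reference code in reduced form (`<R/1>`), and it does not delete reference codes that are already present. The function and parameter names are hypothetical.

```python
import re


def insert_references(text, target_gi, ref_gi, start=1, reset_gi=None):
    """Insert numbered reference codes before every start tag of target_gi."""
    names = [target_gi] + ([reset_gi] if reset_gi else [])
    start_tag = re.compile("<(" + "|".join(re.escape(n) for n in names) + ")/")
    counter = start
    pieces = []
    position = 0
    for match in start_tag.finditer(text):
        pieces.append(text[position:match.start()])
        if match.group(1) == target_gi:
            pieces.append(f"<{ref_gi}/{counter}>")
            counter += 1
        else:
            counter = 1  # the reset code restarts the numbering at 1
        position = match.start()
    pieces.append(text[position:])
    return "".join(pieces)


sample = "<doc/<sec/a/sec><sec/b/sec>/doc><doc/<sec/c/sec>/doc>"
print(insert_references(sample, "sec", "R", start=1, reset_gi="doc"))
```

In the sample, the numbering restarts with the second 'doc'-code, so the third 'sec'-code is again preceded by reference number 1.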

4.4 MECSLYSE

MECSLYSE is a document analyzer for MECS version 2 documents. The program analyzes relationships between the codes of a MECS-conforming document and allows the user to define breakpoints, list all recursive or overlapping codes, and create a tabulated list of the structure of the encoded elements of a document.

Usage:
   MECSLYSE read_file write_file type gi (R|O) gi
 

The program accepts 1 - 8 command line parameters:
1  read_file  (required)
Document to be analyzed by MECSLYSE.

   The document must begin with a MECS header, and it must be fully MECS-conforming (see MECSVAL).
2  write_file  (optional)
Output file.
3  type  (optional)
4  gi  (required if parameter no 3 set)
Together, parameters no 3 and 4 define a breakpoint by indicating the code type and generic identifier of a code occurring in the document.

   The output file will contain a list of all start and end points of the code in question where other codes are active, and specify the start points of all such active codes.
5  (R|O)  (optional)
R
Recursion: if a code nests within itself (i.e., two codes of identical type and with identical generic identifiers active at the same point, e.g. '<s/ <s/ /s> /s>'), MECSLYSE will list all these codes with an indication of start points.
O
Overlap: if two codes overlap (i.e. they are both active at some point but do not nest hierarchically, e.g. '<a/ <b/ /a> /b>'), MECSLYSE will list all such occurrences with an indication of their start points.
6  gi  (optional)
This parameter should indicate a one-element code generic identifier which will serve to segment a full list of all codes occurring in the document.

   In the list, all occurrences of the code in question will be indicated by a blank line followed by the entire code, its coded element included. (If the indicated code does not occur in the document, the list will be printed without such segmentation.) All other codes will be listed with start positions, and indented to indicate their nesting level within the code structure of the document.
7  gi  (optional if parameter no 5 set to 'O')
This parameter should indicate a one-element code generic identifier.

   For all occurrences of the code specified which contain overlapping elements, MECSLYSE will indicate the number of overlaps in that code and its location reference (line and column number).
8  gi  (optional if parameter no 5 set to 'O')
This parameter should indicate a one-element code generic identifier.

   MECSLYSE will ignore any occurrences of overlap with the code specified.
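The two notions of recursion and overlap can be made concrete with a small Python sketch. It only handles unreduced one-element codes written as in the examples above (`<gi/` ... `/gi>`), it assumes that an end tag closes the innermost open code with the same generic identifier, and it reports generic identifiers rather than line and column positions. It is therefore a simplified illustration, not a description of how MECSLYSE works internally.

```python
import re

# Start tag "<gi/" or end tag "/gi>" of a one-element code (simplified).
TAG = re.compile(r"<(\w+)/|/(\w+)>")


def analyse(text):
    """Return (recursive codes, overlapping pairs) found in the text."""
    open_codes = []   # generic identifiers of currently active codes
    recursive = []
    overlaps = []
    for match in TAG.finditer(text):
        start_gi, end_gi = match.group(1), match.group(2)
        if start_gi:
            if start_gi in open_codes:
                recursive.append(start_gi)  # code nests within itself
            open_codes.append(start_gi)
        else:
            if end_gi not in open_codes:
                raise ValueError(f"end tag without start tag: {end_gi}")
            # Assumption: the end tag closes the innermost matching code.
            index = len(open_codes) - 1 - open_codes[::-1].index(end_gi)
            # Codes opened later and still active overlap with this one.
            for other in open_codes[index + 1:]:
                overlaps.append((end_gi, other))
            del open_codes[index]
    return recursive, overlaps


print(analyse("<a/ <b/ /a> /b>"))
print(analyse("<s/ <s/ /s> /s>"))
```

The first call reports the overlap between 'a' and 'b'; the second reports 's' as recursive and no overlap.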
 

Examples

   MECSLYSE MYFILE MYFILE.OUT - - O
writes a list of all overlapping codes in MYFILE to MYFILE.OUT

   MECSLYSE /MYFILES /RESULTS o sentence R ref
reads all files specified in the file MYFILES and writes all output to one file called RESULTS, which will be overwritten without notice if it already exists. RESULTS will contain a list of all codes active at start and end-points of the one-element code 'sentence', all occurrences of recursive codes, and a code list segmented at all occurrences of the one-element code 'ref'.

4.5 MECSGRAB

MECSGRAB is an element extraction program for MECS version 2 documents. The program 'grabs' specified elements from a document and prints them and/or their line and column reference numbers in a separate file.

Usage:
   MECSGRAB read_file write_file type gi (AKRT)
 

The program accepts 5 command line parameters:
1  read_file  (required)
Document from which elements are to be extracted by MECSGRAB.

   The document must begin with a MECS header, and it must be fully MECS-conforming (see MECSVAL).
2  write_file  (optional)
Output file.
3  type  (required)
4  gi  (required)
Together, parameters no 3 and 4 define an extraction element by indicating its code type and generic identifier.
5  (AKRT)  (required)
If A is specified, the output file will contain the actual contents of all occurrences of the elements indicated by parameters no 3 and 4, including their start and end tags, enclosed within the start and end tags of all codes containing the element in question.

   If T is specified, the output file will contain the actual contents of all occurrences of the elements indicated by parameters no 3 and 4, including their start and end tags.
   If R is specified, the output file will contain line and column reference numbers for all elements in question.
   If K is specified, the output file will preserve the line division of the original elements extracted.
   If both T and R (or A and R) are specified, the reference numbers will always precede the extracted elements.
   At least one of A, R or T must be specified.
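The constraint on the last parameter can be expressed as a short validation routine. The following Python sketch is illustrative only; the function name and the keys of the returned dictionary are hypothetical.

```python
def parse_grab_flags(flags):
    """Validate and decode the (AKRT) parameter of MECSGRAB."""
    chosen = set(flags.upper())
    unknown = chosen - set("AKRT")
    if unknown:
        raise ValueError("unknown flag(s): " + "".join(sorted(unknown)))
    if not chosen & set("ART"):
        raise ValueError("at least one of A, R or T must be specified")
    return {
        "contents_with_enclosing_tags": "A" in chosen,
        "contents": "T" in chosen,
        "references": "R" in chosen,
        "keep_line_division": "K" in chosen,
    }
```

A value such as `TR` thus selects both element contents and reference numbers, whereas `K` on its own is rejected.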
 

for examples of usage of MECSGRAB.

4.6 MECSPRES

MECSPRES is a reformatter for MECS version 2 documents. The program reads a profile definition table and a MECS-conforming document, and produces a new text formatted according to the profile definition table.

Usage:
   MECSPRES file_in read_file write_file (layout) (format) (C|D|I) (style) (#|M) (title)
 

4.6.1 Profile Definition Table (PDT)

4.6.1.1 Overall Structure

A profile definition table (PDT) consists of a MECS header, a list of code type indicators, a declaration of valid output characters, optionally a list of fixed notes, and finally a series of profile definitions for individual codes.

A PDT should always start with a MECS header. The header should be followed by a declaration of six code type indicators. The code type indicators are identical to the code type indicators found in Code Declaration Tables (), and in the rest of the PDT they serve to reference codes by indicating their type followed by their generic identifier.

The code type indicators are followed by a blank indicator. In the rest of the PDT the blank indicator serves to indicate blank characters for output to the reformatted text.

The blank indicator should be followed by a string declaring valid output characters for output in alpha-format (). If this string is replaced by a nil indicator, all input characters are valid output characters.

Optionally, the PDT part may also contain declarations of so-called fixed notes. Fixed notes are strings which may be listed in the beginning of the output text for later reference by letter indexes, or printed in notes. Fixed notes may be referenced with letters 'a'-'u'. (I.e., maximum number of fixed notes is 20.)

In the following example PDT

 

  £ < > < / / > [ / | \ / | / ] { " / }
 n o # p r d _
 £
 # a Linker_Rand
 # b Rechter_Rand
 o comment   b   bi    Comment:_ £       !   £   £
 o doc       e   b     £         £       £   £   £
 o vpline    n   £     £         {#6#39} £   £   £
 p blort     3   b|u|i ->        /       <-  b   a
 r GAMMA     £   £     £         {#8#6}  £   £   £

the first line contains a MECS header. The second line declares the six code type indicators 'n', 'o', '#', 'p', 'r' and 'd', followed by a declaration of the blank indicator, '_'. The third line contains a declaration of the valid output characters (in this case nil, which means that all characters are valid output characters). The next two lines begin with a numeric indicator, which indicates that they declare fixed notes (note 'a' and 'b', with the values 'Linker_Rand' and 'Rechter_Rand', respectively). The last five lines of the table define the profiles for the one-element codes 'comment', 'doc' and 'vpline', the poly-element code 'blort' and the character representation code 'GAMMA'.
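To make the overall structure explicit, the example PDT can be read into a simple data structure as in the Python sketch below. This is not how MECSPRES reads PDTs; it rests on several assumptions that are stated here and in the comments: fields are separated by blanks, the nil indicator is the first item of the MECS header, a fixed note line is a three-field line starting with '#', and the blank indicator is replaced by a space in note texts and parameter values.

```python
EXAMPLE_PDT = r"""
  £ < > < / / > [ / | \ / | / ] { " / }
 n o # p r d _
 £
 # a Linker_Rand
 # b Rechter_Rand
 o comment   b   bi    Comment:_ £       !   £   £
 o doc       e   b     £         £       £   £   £
 o vpline    n   £     £         {#6#39} £   £   £
 p blort     3   b|u|i ->        /       <-  b   a
 r GAMMA     £   £     £         {#8#6}  £   £   £
"""


def parse_pdt(text):
    """Split a PDT into header, indicators, fixed notes and profile definitions."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, indicators, valid = lines[0], lines[1], lines[2]
    nil = header[0]                       # assumption: first header item is nil
    type_indicators, blank = indicators[:6], indicators[6]

    def value(field):
        return None if field == nil else field.replace(blank, " ")

    notes, profiles = {}, []
    for fields in lines[3:]:
        if fields[0] == "#" and len(fields) == 3:   # assumption: fixed note line
            notes[fields[1]] = value(fields[2])
        elif len(fields) == 9:                      # type, gi, seven parameters
            profiles.append({
                "type": fields[0],
                "gi": fields[1],
                "params": [value(f) for f in fields[2:]],
            })
        else:
            raise ValueError("unrecognized PDT line: " + " ".join(fields))
    return {
        "header": header,
        "type_indicators": type_indicators,
        "blank": blank,
        "valid_output": None if valid == [nil] else "".join(valid),
        "fixed_notes": notes,
        "profiles": profiles,
    }


pdt = parse_pdt(EXAMPLE_PDT)
print(pdt["fixed_notes"], [p["gi"] for p in pdt["profiles"]])
```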

4.6.1.2 Code Declarations

Each profile definition references a code and declares its values for seven parameters. These parameters are called Position, Mode, MarkIn, MarkDel, MarkOut, NoteNumber and NoteType.

The following line from the above example PDT contains a code type and a generic identifier identifying the code in question, followed by values for the seven parameters:

o  comment   b   bi   Comment:_ £   !  £   £
This definition says that a one-element code ('o') with the generic identifier 'comment' should be printed in a note ('b'), in bold italics ('bi'), that it should be preceded by the string 'Comment: ' and succeeded by an exclamation mark. Since '£' is the nil indicator in this example PDT, the character '£' in the fourth, sixth and seventh parameter positions indicates that the code is not given any value for these parameters.

With this definition, the input text

xxx <comment/bla bla bla> xxx
will be printed like this:
xxx1 xxx

with the following note:
1 Comment: bla bla bla!
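As a rough illustration of how a single profile definition breaks down into its seven parameters, the following Python sketch parses one line. The function name `parse_profile` is hypothetical, and it is an assumption that the blank indicator is only expanded in the three Mark strings.

```python
PARAMS = ("Position", "Mode", "MarkIn", "MarkDel",
          "MarkOut", "NoteNumber", "NoteType")


def parse_profile(line, nil="£", blank="_"):
    """Return (code type, generic identifier, parameter dict) for one profile line."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError("expected code type, generic identifier and seven parameters")
    code_type, gi = fields[0], fields[1]
    values = {}
    for name, raw in zip(PARAMS, fields[2:]):
        if raw == nil:
            values[name] = None                      # no value declared
        elif name.startswith("Mark"):
            values[name] = raw.replace(blank, " ")   # blank indicator -> space
        else:
            values[name] = raw
    return code_type, gi, values


code_type, gi, values = parse_profile("o  comment   b   bi   Comment:_ £   !  £   £")
print(code_type, gi, values)
```

For the 'comment' line this gives Position 'b', Mode 'bi', MarkIn 'Comment: ' and MarkOut '!', with no value for MarkDel, NoteNumber and NoteType.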


4.6.1.3 Position

The first parameter, Position, accepts single character values. Since positions are mutually exclusive, only one position can be declared for any code. The type of the code in question, the layout and the format decide which values are available, and what they mean.

Text may be output to five different buffers: The main buffer, the note buffer, the main line buffer, the left margin buffer and the right margin buffer.

The value of the Position parameter decides which buffer the contents of a code should be sent to. By default, all output is sent to the main buffer, unless otherwise indicated by the Position value.

The Position value may also be used to decide the relative position of a text element within the main buffer, e.g. text may be indented, centered, aligned with right margin, printed in tables or columns etc.

and ## for further details.

4.6.1.4 Mode

The second parameter, Mode, accepts the following values:

u
underline
v
double underline
b
bold
q
shadow
j
outlined
w
redline
d
overstruck
i
italics
m
mathfont
f
fine
s
small
l
large
y
subscript
z
superscript
r
slightly above line
x
slightly below line
a
3/4 above line
c
3/4 below line
g
overlay with MarkDel
1
excerpt element (relevant only in layout X, cf. ##)
H
hidden
P
index as phrase
N
don't index
e
capitalize
h
change case
o
change first free character to lower case
n
keep case, even if o-mode active
k
reversed alphabet
p
letterspaced

The way in which all except the last six of these modes are represented will depend on the format chosen for the output file. Only in one of the available formats, i.e. WordPerfect 5.1 format, will all modes be distinguished from each other.

Modes are not mutually exclusive, and therefore a combination of modes may be declared for a code by giving the mode parameter a string value. For multi-element codes different modes may be assigned to different elements by delimiting the mode indicators by bars. E.g. the following profile definition:

p  blort     £  b|u|bi £ £ £ £ £
says that for the poly-element code with the generic identifier 'blort' all elements should be printed to the current buffer, the first element in bold ('b'), the last in bold italics ('bi'), and any elements between the first and the last should be underlined ('u'). With this definition, the input text
[blort/4| first | second | third | fourth ]
will be printed like this:
first second third fourth
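The rule for bar-delimited mode values can be sketched as follows. This is an illustration only: the function name is hypothetical, and the handling of a code with a single element, and the rejection of values with a number of parts other than one or three, are assumptions not covered by the text.

```python
def modes_for_elements(mode_value, count, nil="£"):
    """Assign a mode string to each of `count` elements of a multi-element code."""
    if mode_value == nil:
        return [""] * count              # no mode declared
    parts = mode_value.split("|")
    if len(parts) == 1:
        return parts * count             # one mode string applies to all elements
    if len(parts) != 3:
        raise ValueError("expected 'first|middle|last'")
    first, middle, last = parts
    if count == 1:
        return [first]                   # assumption: a lone element counts as first
    return [first] + [middle] * (count - 2) + [last]


print(modes_for_elements("b|u|bi", 4))
```

With the 'blort' definition above, the four elements receive the modes b, u, u and bi respectively.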


Markers and note indices are always printed in Systems Mode and Note Reference Mode, which are decided by layout and format, - cf. ##.

4.6.1.5 MarkIn, MarkDel and MarkOut

The third, fourth and fifth parameters, i.e. MarkIn, MarkDel and MarkOut, are strings which are printed respectively before, between, or after the elements of a code. (This is the general rule; for exceptions, see the descriptions of the individual code types below.)

MarkIn, MarkDel and MarkOut are printed in Systems Mode, - cf. ##.

Characters not included in the ASCII character set may be indicated by means of a convention borrowed from WordPerfect 5.1: {#n#nnn}, where n is the WordPerfect character set number and nnn the character number (as in '{#6#39}' and '{#8#6}' in the example PDT above).

In other than WordPerfect formats, characters indicated in this way will be printed as the corresponding ASCII character, or, if no corresponding ASCII character exists, as '•' (ASCII 254).

E.g. the following profile definition:

p  blort     3  £ ->         /   <-  £ £
says that for the poly-element code with the generic identifier 'blort' all elements should be printed to the current buffer. An arrow pointing right should be printed before the first element, a slash between each element, and an arrow pointing left after the last element. With this definition, the input text
[blort/4| first | second | third | fourth ]
will be printed like this: -> first / second / third / fourth <-
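The effect of the three markers can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the helper names are hypothetical, and `parse_poly` only handles the short, non-nested form of a poly-element code shown in this example.

```python
def parse_poly(code):
    """Split '[gi/n|e1|e2|...]' into its generic identifier and elements."""
    if not (code.startswith("[") and code.endswith("]")):
        raise ValueError("not a poly-element code")
    head, *elements = code[1:-1].split("|")
    gi, _, count = head.partition("/")
    if int(count) != len(elements):
        raise ValueError("declared and actual number of elements differ")
    return gi, elements


def render(elements, mark_in="", mark_del="", mark_out=""):
    """Print MarkIn before, MarkDel between and MarkOut after the elements."""
    return mark_in + mark_del.join(elements) + mark_out


gi, elements = parse_poly("[blort/4| first | second | third | fourth ]")
print(render(elements, "->", "/", "<-"))
```

Note that the blanks around each element belong to the elements themselves, which is why the markers need no blank indicators here.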

4.6.1.6 NoteNumber and NoteType

The sixth parameter, NoteNumber, refers to one of the fixed notes declared in the beginning of the PDT.

The seventh parameter, NoteType, decides how the fixed note will be indicated in the output text:

a
Print element between x .. x, where x is fixed note index
b
Print fixed note index in bold superscript after element
c
Print fixed note as note to text at end point of code

E.g. the following profile definition:
   p blort 3 £ £ £ £ b a
says that for the poly-element code with the generic identifier 'blort' all elements should be printed to the current buffer. The text should also reference the fixed note b in style a.

For example, the input text

[blort/4| first | second | third | fourth ]
will be printed like this:
b first second third fourth b

where b is a reference to the fixed note b, which is listed at the beginning of the printout.

4.6.2 Declaration of Codes of Different Types

In general, the type of the code in question, the layout and the format decide which values are available for each of the seven parameters in a code's profile definition, and what they mean.

4.6.2.1 No-element Codes

Position:
X, e, g, h, o, s, v available; o and s require MarkDel
X
print code
e
print buffers
g
linebreak, indent to default indentation value
h
linebreak after
o
linebreak before and after
s
centered, linebreak before and after
v
if folioformat then print gi as folio group marker
7
linebreak and tab before element
Mode:
All modes except overlay and excerpt available
MarkIn:
Not available
MarkDel:
Print to current buffer
MarkOut:
Not available
NoteNumber:
Not available
NoteType:
Not available

4.6.2.2 One-element Codes

Position:
m, n, k, p require MarkDel. j, k, p require numeric element (#). f, q, r, m and n available only if minimal margin is 4 or greater (Layout D).
X
print code
a
if folioformat then as position d (suppress element).
b
In note buffer.
c
In current and note buffer.
d
Suppress element.
e
Print buffers on exit from element. If alphaformat then print section marker in first column.
E
As position e, but pagebreak first
f
In left margin buffer, linebreak after element.
g
If folioformat then print element between folio group markers, else suppress element.
h
Linebreak after element.
i
Suppress element, print one blank line.
j
Suppress element, linebreak and indentation to numeric value indicated by element.
k
Suppress element, print number of blank lines indicated by element, each line containing MarkDel.
l
Linebreak, element printed with right justification, linebreak.
m
If fixed margin>0 then print element with MarkDel in left margin
n
If fixed margin>0 then print element with MarkDel in right margin
o
Linebreak before and after element.
p
Suppress element, print number of MarkDel indicated by element.
P
Suppress element, print 1 MarkDel.
q
In left margin buffer.
r
In right margin buffer.
s
Linebreak, element centered, linebreak.
t
Pagebreak before element, linebreak after.
u
Store status and print one blank line before element, print one blank line and restore status after element.
v
If folioformat then print element between folio group markers (even if otherwise suppressed), otherwise suppress.
w
If main line on then write to main line's field 2 else if reference markers then write at top left of main buffer between reference markers else if margin>0 then as position f else as position o.
W
Similar to position w
y
If folioformat then print in subsequent main lines' field 3, else suppress element
z
If folioformat then print in subsequent main lines' field 4
0
If folioformat then print in current and subsequent main lines' field 1, otherwise print in current buffer.
1
if betaformat then print ASCII 1 before element and ASCII 2 after
2
if betaformat then print ASCII 3 before element and ASCII 4 after else if alphaformat then print segment marker ('-') in first column.
3
if alphaformat then replace blanks in element with · (ASCII 250) and print MarkDel before element
4
Print one blank line before element
5
Print one blank line after element
6
Print one blank line and one tab before element
7
Linebreak and one tab before element, suppress element
8
Print element with linebreak before in main and front buffers, if folioformat then link to subsequent main buffers
9
Print element in main and front buffers with linebreak after
!
If links defined then write element as pointer to parallel version
/
If folioformat then print element in current and subsequent main buffers between folio group markers, else suppress element.
Mode:
All modes available, overlay-mode requires MarkDel
MarkIn:
Printed to current buffer after pending marks, modes, and note references
MarkDel:
Printed if position in (m,n,p) or overlay-mode
MarkOut:
Printed to current buffer before pending note references
NoteNumber:
All declared fixed notes available
NoteType:
All types available

4.6.2.3 Multi-element Codes

Position:
Position 7 requires two numeric elements. Note elements of Position 1, 2, and 3 and main elements of position h are delimited by the markers '', '|', and ''
X
Print code
a
if folioformat then print first element between folio link group markers, else suppress first element. Print second element in current buffer. (Cf. position l)
b
If betaformat then all elements in current buffer, delimited by ASCII 5, 6, and 7, else as position 3 (first element in current buffer, all elements in note)
d
Suppress all elements
e
First element in current buffer, all elements in note
f
Suppress first element
g
if folioformat then print first element between folio group markers, else suppress first element. Print subsequent elements in current buffer.
h
all elements in current buffer, delimited by the markers '', '|', and ''
i
print second element in note if graphics markers not defined, otherwise print last element between graphics markers
j
last element in note, all others suppressed
l
if folioformat then print first element between folio link group markers, else suppress first element. Print subsequent elements in current buffer.
n
All except last element in note
p
If alphaformat then suppress first element else if folioformat print first element in current buffer, second between folio group markers, else as position 4 (first element in current buffer)
q
print first element, if folioformat then print second element in subsequent main lines' field 4
r,s,t
if folioformat then print first element between folio group markers in current and all subsequent main buffers, otherwise suppress first element.
v
Overlay between first and last element (i.e., if wpformat or mecsformat then second element is preceded by a number of back markers equal to the number of characters in first element, else second element is preceded by one back marker)
V
As position v, but small back marker used (for small print).
w
Insert blank between elements
c
Each element in a separate column, width of columns decided by contents of each column
0
Each element in a separate column, width of columns decided by page width (i.e. maximum line length)
1
All elements in note
2
Last element in current buffer, all elements in note
3
First element in current buffer, all elements in note
4
First element in current buffer
5
Last element in current buffer
6
All elements in current buffer
7
First element sets default margin, last element sets default indentation
8
First element in current buffer, last element(s) in note
9
First element in note, last element(s) in current buffer
!
If links defined print both elements between link markers, otherwise suppress
Mode:
All modes except overlay available
MarkIn:
Printed to current buffer before element but after pending marks, modes, and note references
MarkDel:
Printed to current buffer, between elements
MarkOut:
Printed to current buffer after last element but before pending note references
NoteNumber:
All declared fixed notes available
NoteType:
All types available

4.6.2.4 Character Codes

Position:
X Print code
Mode:
All modes except overlay and excerpt available for MarkDel
MarkIn:
Not available
MarkDel:
Print to current buffer
MarkOut:
Not available
NoteNumber:
Not available
NoteType:
Not available

4.6.3 Layout and Format

Global features are features which are decided by command line parameters given to MECSPRES. They affect the general layout and format of the output file, and sometimes also the effect of certain profile definition parameters.

4.6.3.1 Layout

The layout of the output document is selected by a command line parameter.

Six different layouts are available - they are simply called layouts B, C, D, N, P, and X.

As explained above, text may be output to five different buffers: The main buffer, the note buffer, the main line buffer, the left margin buffer and the right margin buffer. The layout decides whether and how these buffers are printed in the output file. E.g., in layouts D and X the buffers are laid out as follows:

 

+-Main line buffer
|     +-Left margin buffer
|     |     +- Main buffer
|     |     |                            +-Right margin buffer
|     |     |                            |
v     v     v                            v
----++---++----------------------------++---+
----+|   ||                            ||   |
     |   ||                            ||   |
     |   ||                            ||   |
     |   ||                            ||   |

     +--------------------------------------+
     |                                      |
     |                                      |
     |                                      |

            ^
Note buffer-+

Other layouts are printed similarly, with the following exceptions:

If the left margin width is less than 4 or there is no maximum line length, left and right margin buffers are suppressed.

In some layouts tabulators are represented as blanks, in others as tabulator codes or marks appropriate to the selected output file format. In the latter case, the number of tabs is rounded off to a preset interval value.

In some layouts character disambiguation codes are ignored, in others character representation codes are ignored if they occur in conjunction with character disambiguation codes, and in some layouts both character representation codes and character disambiguation codes are printed.

MarkIn, MarkDel, and MarkOut are always printed in Systems Mode. Character disambiguation codes are printed in Systems Mode in some layouts only.

The layout also affects default values for certain other general features, such as style, maximum line length and positioning of notes.

With some layouts the note buffer is printed at the end of the output file, with others it is printed after each main buffer.

It should be noted that certain formats also affect the positioning of text in the output file. (E.g. fields 2-4 of the main line buffer are only available with format F, while in format A all buffers other than the main buffer are unavailable.)

The effects of the various layout values can be summarized as follows:

 Layout value                        D      B      N      P      C      X

 Available buffers
   Main line field 1                 X      X      X      X      X      X
   Main buffer                       X      X      X      X      X      X
   Left margin buffer                X                                  X
   Right margin buffer               X                                  X
   Note buffer                       X      X      X      X      X      X

 Layout feature
   Systems mode                      b      b      b      -      b      b
   Left margin width                 4      0      0      0      0      4
   Tabs printed as                   Blank  Tab    Tab    Tab    Blank  Blank
   Tabs roundoff value               1      5      4      3      1      1
   Character codes printed           rep    dis    dis    dis    both   rep
   Character disambiguation
     codes in systems mode           N      Y      N      Y      Y      N
   Default notes position            End    End    End    Main   Main   End
   Default style                     CR13   CR13   PL14   PL12   CR13   CR13
   Default max. line length          54     -      -      -      78     70
In layout D and format W, the selected style affects the default maximum line length: With CR12 maximum line length is 57, with CR13 54, and with CR14 48.

Layout X affects the way MECSPRES interprets profile definitions in the PDT in a rather special way: Only codes which have been given the value '1' (excerpt mode) are printed in the output file, and all other parts of the input document are suppressed.
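The rule for suppressing the margin buffers can be checked against the default values in the summary table. The following Python sketch is only an illustration: the data structure and function name are hypothetical, and the values are the table's defaults for left margin width and maximum line length (with `None` standing for "no maximum").

```python
# layout: (left margin width, default maximum line length or None)
LAYOUTS = {
    "D": (4, 54),
    "B": (0, None),
    "N": (0, None),
    "P": (0, None),
    "C": (0, 78),
    "X": (4, 70),
}


def margin_buffers_printed(left_margin, max_line):
    """Margin buffers are suppressed if the left margin width is less than 4
    or there is no maximum line length."""
    return left_margin >= 4 and max_line is not None


with_margins = [name for name, (margin, max_line) in LAYOUTS.items()
                if margin_buffers_printed(margin, max_line)]
print(with_margins)
```

With the default values, only layouts D and X keep their left and right margin buffers.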

4.6.3.2 Format

The format of the output document is selected by a command line parameter.

Seven different formats are available:
A
so-called alpha format
B
so-called beta format
C
so-called plain ASCII format
S
so-called screen display format
M
a MECS-like markup format
F
FolioViews 2.1 markup format
W
Word-Perfect 5.1 format

MECSPRES layouts and profile definitions provide access to various text buffers and text fonts and styles. However, not all buffers, fonts and styles are available in all output formats.

E.g. while bold characters can easily be printed in most word processor formats, it is not possible to print bold characters in flat ASCII files. Therefore, a text element which is assigned mode 'b' (bold) in a PDT will be printed in bold if WordPerfect format is chosen; but if the reformatted file is output in flat ASCII format, the same text will be printed without any indication whatsoever that it was assigned a bold value.

Modes 1, e, h, o, n, k, p are generally available and realized in the same way in all formats:
1
excerpt element (relevant only in layout X, cf. ##)
e
capitalize
h
change case
o
change first free character to lower case
n
keep case, even if o-mode active
k
reversed alphabet
p
letterspaced

The availability of other modes is limited in some of the formats, and their kind of realization may also vary between formats.

Format A (alpha format)

This format has been designed to facilitate preparation of files for input to the program MECSSPEL.

Alpha format suppresses all other buffers than the main buffer, and prints the contents of the main buffer with one word per line, each word being preceded by point or file name, line number and column number.

If a string of valid alpha characters has been defined in the PDT, only characters included in this string will be output to the reformatted text file; all other characters will be suppressed.

There are some position values for one- and multi-element codes which are affected by the choice of alpha format.

In addition to the generally available modes mentioned above, the following modes are available:
Mode  Realization 
overstrike  < ... > 
redline  < ... > 
superscript 

Format B (beta format)

This format has been designed to facilitate preparation of files for input to the program BETATXT.

There are some position values for one- and multi-element codes which are affected by the choice of beta format.

In addition to the generally available modes mentioned above, the following modes are available:
Mode  Realization 
overstrike  < ... > 
redline  < ... > 
superscript 

Note indices are printed in superscript, i.e. in the form '^#', where # is the note number.

Format C (plain ASCII format)

This format has been designed to facilitate preparation of files in so-called flat ASCII or DOS format.

In addition to the generally available modes mentioned above, the following modes are available:
Mode  Realization 
overstrike  < ... > 
redline  < ... > 
superscript 

Note indices are printed in superscript, i.e. in the form '^#', where # is the note number.

Format S (screen display format)

This format has been designed for previewing of output on screen in text mode. (By replacing the third command line parameter with a dash, output is sent to the screen.)

In addition to the generally available modes mentioned above, the following modes are available:
Mode  Realization 
overstrike  < ... > 
redline  < ... > 
underline  underlined 
double underline  underlined 
bold  intense video 
superscript  inverse video 

Note indices are printed in superscript, i.e. inverse video.

Format M (MECS-like markup format)

This format has been designed to facilitate preparation of ASCII files with presentational markup suited for further processing by other programs, e.g. word-processor macros.

In addition to the generally available modes mentioned above, all other modes are available. Text elements printed in these modes are marked '<x/ ... /x>', where x is the relevant MECSPRES mode indicator. All such elements will also be delimited by end and start tags at line endings and buffer limits.

Note indices are printed in superscript, i.e. in the form '<z/#/z>', where # is the note number.
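The marking convention of format M is simple enough to illustrate with a small Python sketch. The function names are hypothetical, and the sketch handles a single mode only; how MECSPRES nests several simultaneous modes is not described here.

```python
def mark_mode(text, mode):
    """Wrap text in format M presentational markup for one mode indicator."""
    return f"<{mode}/{text}/{mode}>"


def note_index(number):
    """Note indices are printed in superscript, i.e. in mode 'z'."""
    return mark_mode(str(number), "z")


print(mark_mode("bla bla", "b"))
print(note_index(1))
```

A word-processor macro can then pick up these tags and apply the corresponding font attributes.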

Format F (FolioViews markup format)

This format has been designed to facilitate preparation of files for reading and processing by Folio corporation's desktop publishing program Views, version 2.1.

FolioViews format prints the main line buffer as a separate line, starting with a FolioViews folio marker and a FolioViews group marker containing a replica of the contents of field 1.

Fields 2-4 of the main line buffer are available only with this format. Field 1 is printed in the left margin, field 2 aligned with the left margin, field 3 centered, and field 4 aligned with the right margin.

There are some position values for one- and multi-element codes which are affected by the choice of FolioViews format.

In addition to the generally available modes mentioned above, the following modes are available:
Mode  Realization 
overstrike  < ... > 
redline  < ... > 
underline  underlined (red) 
double underline  underlined (red) 
bold  bold (blue) 
superscript  ^.^.^. 

Note indices are printed in superscript, i.e. in the form '^#', where # is the note number.

Format W (Word-Perfect 5.1 format)

WordPerfect format is the only format in which all modes and styles are both available and realized as indicated by the various modes' and styles' names. (Mode b, 'bold', is printed in bold, mode i, 'italics', is printed in italics etc.) Note indices are printed in superscript.

4.6.4 Command Line Parameters

MECSPRES accepts 2 - 9 command line parameters:
file_in  (required) 

PDT file name. MECSPRES searches for PDTs in all directories on the path.
read_file  (required) 
Document to be reformatted by MECSPRES.
write_file  (optional) 
Output file.
layout (B|C|D|N|P|X)  (optional) 
This parameter decides the layout of the output file.

   Default value: D
format (A|B|C|F|S|M|W)  (optional) 
This parameter decides the format of the output file.

   Default value: C
   Note: If A (alpha) format is specified, MECSPRES takes only 8 parameters, and the 8th parameter specifies the number of words to be extracted from the document.
(C|D|I)  (optional) 
Default: I

   This parameter decides how MECSPRES deals with undeclared codes encountered in a document, i.e. codes which have not been declared in the PDT file.
I
ignores undeclared codes, i.e. they are treated as if they had been given nil values for all seven parameters in the PDT (cf. below).
C
prints all tags of undeclared codes in Systems Mode
D
as I, but the encoded elements of undeclared one- and poly- element codes are suppressed.
style (CR|PL)(12|13|14)  (optional) 
This parameter decides the print style of the output file, and takes the following values: CR12, CR13, CR14, PL12, PL13, PL14, where CR indicates Courier font, PL indicates Palatino font, and the numbers indicate font size.

   Default value: CR13 for layout D, PL14 for layout N, PL12 for layout P, CR12 for all other layouts.
(#|M)  (optional) 
This parameter decides the maximum line length of the output file, and accepts integers or an 'M' as values. 'M' (for maximum) indicates that no maximum line length is given.

   Default value: Layout- and format-dependent.
Note:

If A (alpha) format is specified, this parameter specifies the number of words to be extracted from the document.
title (string)  (optional) 
Default value: none

   This parameter will be used as first part of document title in Folio format.
 

4.6.5 Examples

The command
   MECSPRES - MYFILE
reformats MYFILE in layout D and displays output on the computer screen in flat ASCII format, ignoring all codes.

   MECSPRES DIPLO.PDT MYFILE NEWFILE
reformats MYFILE according to the profile definition table DIPLO.PDT in layout D and writes output to the file NEWFILE in "flat" ASCII format. Undeclared codes are ignored.

   MECSPRES NORM.PDT /MYFILES NEWFILE N W D
reformats all files listed in MYFILES according to the profile definition table NORM.PDT in layout N and writes output to the file NEWFILE in WordPerfect 5.1 format. Undeclared codes are suppressed.

   MECSPRES PROOF.PDT MYFILE /NEWFILE P W C PL13 - N
reformats MYFILE according to profile definition table PROOF.PDT in layout P and writes output to the file NEWFILE in WordPerfect 5.1 format, in Palatino 13 with no maximum line length and no title page. If NEWFILE already exists, it will be overwritten. Undeclared codes are printed as codes.

   MECSPRES BASE.PDT /MYFILES /NEWFILE B F I - 65
reformats all files listed in /MYFILES according to the profile definition table BASE.PDT in layout B and writes output to the file /NEWFILE in Folio Views markup format, with maximum line length 65. If NEWFILE already exists, it will be overwritten. Undeclared codes are ignored.

   MECSPRES DIPLO.PDT MYFILE - D S
reformats MYFILE according to the profile definition table DIPLO.PDT in layout D and displays output on the computer screen in Screen display format. Undeclared codes are ignored.
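How the positional parameters and their defaults fit together can be sketched in Python. This is an illustration, not part of the MECS package: the function and key names are hypothetical, and it is an assumption (based on the examples above) that a dash in an optional position means "use the default". Layout-dependent defaults for style and maximum line length are left as `None`.

```python
import re

NAMES = ("pdt", "input", "output", "layout", "format",
         "undeclared", "style", "max_line", "title")
DEFAULTS = {"pdt": None, "output": None, "layout": "D", "format": "C",
            "undeclared": "I", "style": None, "max_line": None, "title": None}


def parse_mecspres(command):
    """Map a MECSPRES command line onto named parameters with defaults."""
    args = command.split()
    if args and args[0].upper() == "MECSPRES":
        args = args[1:]
    if not 2 <= len(args) <= 9:
        raise ValueError("MECSPRES accepts 2 - 9 command line parameters")
    opts = dict(DEFAULTS)
    for name, value in zip(NAMES, args):
        if value != "-" or name == "input":   # assumption: '-' keeps the default
            opts[name] = value
    if opts["layout"] not in set("BCDNPX"):
        raise ValueError("unknown layout")
    if opts["format"] not in set("ABCFSMW"):
        raise ValueError("unknown format")
    if opts["undeclared"] not in set("CDI"):
        raise ValueError("unknown undeclared-code option")
    if opts["style"] is not None and not re.fullmatch(r"(CR|PL)(12|13|14)", opts["style"]):
        raise ValueError("unknown style")
    if opts["max_line"] not in (None, "M"):
        # In alpha format this position holds a word count instead.
        opts["max_line"] = int(opts["max_line"])
    return opts


print(parse_mecspres("MECSPRES BASE.PDT /MYFILES /NEWFILE B F I - 65"))
```

For the BASE.PDT example this yields layout B, format F, undeclared codes ignored, the default style and a maximum line length of 65.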

4.7 MECSBETA

MECSBETA is a document analysis program for MECS version 2 documents. The program computes and prints the betatexts of an input document.

Usage:
   MECSBETA file_in read_file file_out
 

Roughly, a betatext is a text resulting from excluding all except one of the elements of specific multi-element codes, defined as substitutions. All the betatexts of a particular document are generated by systematically varying which element to include from each substitution, until all possible combinations have been exhausted. Cf. ## for a fuller explanation of the concept of a betatext.

Since the number of betatexts generated by a document may be very large, MECSBETA allows the user to define reference points within the document, divide the document into segments, and generate all possible betatexts for each segment separately.

MECSBETA requires a profile definition table. All the usual profile definitions are available in this PDT. In addition, the PDT should define certain one-element codes as reference and segmentation codes, and certain multi-element codes as substitutions. This is indicated by the position values of the codes in question, as follows:

 

+------------------+-----------------+-----------------------+
|                  | Code type       | PDT position value    |
+------------------+-----------------+-----------------------+
|Reference code    | one-element     |  1 or w               |
+------------------+-----------------+-----------------------+
|Segment code      | one-element     |  2                    |
+------------------+-----------------+-----------------------+
|Substitution code | multi-element   |  b                    |
+------------------+-----------------+-----------------------+

MECSBETA excerpts all and only segments containing substitutions, prints their references and displays them as follows: If the contents of the last preceding reference differ from the previous reference printed, the contents of the reference are printed on a separate line. The part of the segment which precedes the first substitution of the segment is printed on a separate line, followed by each betatext generated from the substitutions of the segment, each on a separate line starting with '->', followed by the part of the segment succeeding its last substitution on a separate line. If a substitution crosses segment borders, all segments containing the substitution are treated as one segment.

MECSBETA accepts 3 command line parameters:
file_in  required 

PDT file name. MECSBETA searches for PDTs in all directories on the path.
read_file  required 
Input document. The file name should be a 1-8 character DOS file name with no file name extension.
file_out  required 
Output file.

 

 

MECSBETA is a batch program calling two of the other programs included in the MECS program package. Thus, MECSBETA can be included in another batch program either by a CALL command, or by including a copy of it in the other batch program. The contents of MECSBETA.BAT are:

 

 echo off
 if exist %2.err del %2.err
 MECSPRES %1 %2 /TEMPFILE.  B B I
 if exist %2.err goto end
 BETATXT TEMPFILE. %3
 DEL TEMPFILE
 :end

Examples

With the following PDT, MYBETA.PDT:

 

  £ < > < / / > [ / | £ / | / ] £ £ £ £
 n o £ p £ £ £
 £
 o R 1 £ £ £ £ £ £
 o s 2 £ £ £ £ £ £
 p s b £ £ £ £ £ £

and the following document, MYDOC:

 <R/1/R> <s/xxx xxx/s>
 <R/2/R> <s/xx xx [s/2|pp pp/s|qq qq/s] yy yy/s>
 <R/3/R> xxxxx <s/yy yyy/s>
 <R/4/R> mmm <s/lll/s> <s/ttt/s> <s/xx [s/2|aa
         [s/2| bb/s|cc/s] dd /s| ee /s] ff [s/2|gg /s| hh/s]
         yy/s> <s/mmm ttt/s>

the command
   MECSBETA MYBETA.PDT MYDOC DOCBETA
will produce the following output, DOCBETA:

 2
 xx xx
 ->pp pp
 ->qq qq
  yy yy

 4
 xx
 ->aa  bb dd  ff gg
 ->aa  bb dd  ff  hh
 ->aa cc dd  ff gg
 ->aa cc dd  ff  hh
 -> ee  ff gg
 -> ee  ff  hh
  yy

 -----------------------------------
 Beta:                            12

4.8 BETATXT

Like MECSBETA, BETATXT is a document analysis program which computes and prints the betatexts of an input document. Unlike MECSBETA, however, BETATXT requires input not in MECS format but in a special format called beta format.

Usage:
   BETATXT file_in file_out A
 

Beta format files are flat ASCII files containing special markers for references, segments and substitutions. Therefore, documents will normally require preprocessing by some other program before they can be input to BETATXT. From MECS documents such preprocessing can be done by means of MECSPRES. Irrespective of how the input document is created, however, BETATXT expects to find the following beta markers:

 

+--------------------------+--------+-------+
|                          | Default|Alter- |
|                          |   ASCII|native |
|                          |   value|value  |
+--------------------------+--------+-------+
|Reference start           |      1 |   {   |
|Reference end             |      2 |   }   |
+--------------------------+--------+-------+
|Segment start             |      3 |   <   |
|Segment end               |      4 |   >   |
+--------------------------+--------+-------+
|Substitution start        |      5 |   [   |
|Substitution delimiter    |      6 |   |   |
|Substitution end          |      7 |   ]   |
+--------------------------+--------+-------+

BETATXT accepts 2 - 3 command line parameters:
file_in  required 
Input file name.
file_out  required 
Output file name.
optional 
If this parameter is left out, BETATXT will assume the beta markers to be ASCII characters 1..7.

   If the parameter is given the value 'A', BETATXT will look for the alternative beta markers '{', '}', '<', '>', '[', '|', and ']'.
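The correspondence between the two marker sets is a one-to-one character mapping, which the following Python sketch makes explicit. The function names are hypothetical; the mapping itself is taken from the table of beta markers above.

```python
ALTERNATIVE = "{}<>[|]"                                     # alternative beta markers
DEFAULT = "".join(chr(code) for code in range(1, 8))        # ASCII characters 1..7

TO_DEFAULT = str.maketrans(ALTERNATIVE, DEFAULT)
TO_ALTERNATIVE = str.maketrans(DEFAULT, ALTERNATIVE)


def to_default_markers(text):
    """Replace the alternative markers by the default ASCII 1..7 markers."""
    return text.translate(TO_DEFAULT)


def to_alternative_markers(text):
    """Replace the default ASCII 1..7 markers by the readable alternatives."""
    return text.translate(TO_ALTERNATIVE)


line = "{2} <xx xx [pp pp|qq qq] yy yy>"
assert to_alternative_markers(to_default_markers(line)) == line
```

The alternative markers are convenient for files written by hand, but they can only be used if none of these seven characters occurs in the text itself.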
 

Examples

For example, with the following input file BETAFORM:

 {1} <xxx xxx>
 {2} <xx xx [pp pp|qq qq] yy yy>
 {3} xxxxx <yy yyy>
 {4} mmm <lll> <ttt> <xx [aa
         [ bb|cc] dd | ee ] ff [gg | hh]
         yy> <mmm ttt>

The command
   BETATXT BETAFORM BETATXT A
will produce the following output file BETATXT:
 

 2 

 xx xx 

 ->pp pp 

 ->qq qq 

  yy yy 

 

 4 

 xx 

 ->aa  bb dd  ff gg 

 ->aa  bb dd  ff  hh 

 ->aa cc dd  ff gg 

 ->aa cc dd  ff  hh 

 -> ee  ff gg 

 -> ee  ff  hh 

  yy 

 

 ----------------------------------- 

 Beta:                            12 
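
How the alternative readings of a segment multiply out can be illustrated with a short sketch. The Python code below is not BETATXT's implementation; it is a minimal illustration which assumes the alternative markers '[', '|' and ']', and the function names (`expand_segment`, `_sequence`, `_substitution`) are hypothetical. It returns each reading as a complete string, whereas BETATXT prints the shared text once and marks each alternative with '->'.

```python
def _sequence(text, pos, stop):
    """Read text up to a character in `stop`; return (readings, new position)."""
    readings = [""]
    while pos < len(text) and text[pos] not in stop:
        if text[pos] == "[":
            alternatives, pos = _substitution(text, pos + 1)
            # Every reading so far continues with every alternative.
            readings = [r + a for r in readings for a in alternatives]
        else:
            readings = [r + text[pos] for r in readings]
            pos += 1
    return readings, pos


def _substitution(text, pos):
    """Read 'a|b|...]' starting just after '['; return (alternatives, new position)."""
    alternatives = []
    while True:
        branch, pos = _sequence(text, pos, "|]")
        alternatives.extend(branch)
        if pos >= len(text):
            raise ValueError("substitution is not closed by ']'")
        if text[pos] == "]":
            return alternatives, pos + 1
        pos += 1  # skip the '|' delimiter


def expand_segment(segment):
    """Return all readings of one segment, with whitespace normalised."""
    result, _ = _sequence(segment, 0, "")
    return [" ".join(r.split()) for r in result]


for reading in expand_segment("xx [aa [ bb|cc] dd | ee ] ff [gg | hh] yy"):
    print(reading)
```

For the segment of reference 4 in the example, the sketch yields six readings, in the same order as the six '->' lines of the BETATXT output above.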

4.9 MECSSPEL

MECSSPEL is an interactive spell checking program for MECS version 2 documents.

Usage:
   MECSSPEL file_in file_in file_in
 

MECSSPEL reads a master word list, a PDT and a MECS document. If the program encounters a word in the document which is not included in the master word list (i.e. a "new" word), the user is prompted to reject or accept the new word. Finally, the program produces three separate output files - one containing new accepted words, one containing new rejected words, and one containing statistical information on the document.

MECSSPEL calls the reformatting program MECSPRES in alpha format. All the usual profile definitions are available in the PDT input to MECSSPEL. It should be noted that some mode and position values are provided especially for alpha format, or have special functions in this format.

The following modes are of particular relevance to spell checking:

h
change case
o
change first free character to lower case
n
keep case, even if o-mode active

MECSSPEL also allows the user to specify that the contents of certain codes should be regarded as phrases (even though they contain word delimiters). In addition, the PDT may define certain codes as section or segment codes used in the statistical calculations performed by the program. This is indicated by the PDT position values of the codes in question, as follows:

 

+------------------+-----------------+-----------------------+
|                  | Code type       | PDT position value    |
+------------------+-----------------+-----------------------+
|Section code      | one-element     |  e                    |
+------------------+-----------------+-----------------------+
|Segment code      | one-element     |  2                    |
+------------------+-----------------+-----------------------+
|                  | one-element     |  3                    |
|Phrase code       +-----------------+-----------------------+
|                  | multi-element   |  p                    |
+------------------+-----------------+-----------------------+

MECSSPEL requires 3 command line parameters:
file_in  required 
Main word list.
file_in  required 
PDT file name. MECSSPEL searches for PDTs in all directories on the path.
file_in  required 
Input document. The file name should be a 1-8 character DOS file name with no file name extension.

 

 

MECSSPEL is a batch program which calls two of the other programs included in the MECS program package. Thus, MECSSPEL can be included in another batch program either by a CALL command, or by including a copy of it in the other batch program. The contents of MECSSPEL.BAT are:

 

 echo off
 if exist %3.err del %3.err
 MECSPRES %2 %3 %3.TMP B A I
 if exist %3.err goto end
 if not exist %3.TMP goto end
 ALPHATXT -R - %1 - %3.TMP %3.WL %3.CHK %3.STS
 del %3.TMP
 echo.
 echo Accepted words on %3.WL
 echo Rejected words on %3.CHK
 echo Statistics     on %3.STS
 :end

See the section on spell checking in the User Guide for examples of usage of MECSSPEL.

4.10 ALPHATXT

Like MECSSPEL, ALPHATXT is a program which may be used for interactive spell checking of documents. Unlike MECSSPEL, however, ALPHATXT may also perform certain additional tasks, such as the production of word lists sorted according to user-defined character sort criteria, frequency word lists, and simple statistical analyses. ALPHATXT accepts input files in so-called flat ASCII format as well as alpha format.

Usage:
   ALPHATXT (ACEFILNORS) file_in file_in file_in file_in file_out file_out file_out
 

Although ALPHATXT does not itself accept input in MECS format, the program has been developed precisely to satisfy the need for spell checking and vocabulary control on MECS documents. Ordinary spell checkers are not able to distinguish markup from content and are therefore not suitable for use directly on marked-up documents. The normal procedure is rather to spell check reformatted versions of marked-up documents. However, detecting errors in derived rather than primary documents makes it difficult to track their exact sources in the primary, marked-up documents. With a combined use of MECSPRES and ALPHATXT these problems can be overcome.

4.10.1 Command Line Parameters

ALPHATXT accepts 6-8 command line parameters:
options  optional 

This parameter may take any combination of the values listed below. If the parameter is given the value '-', all options will take on their default values.
A
Append statistics information to existing file (cf. parameter 8)

   Default: Overwrite existing statistics file (cf. parameter 8)
C
Merge lower- and uppercase letters during sort (AaBbCc...)

   Default: Sort with uppercase letters first, lower-case letters last (ABCabc...)
E
Ignore input characters not listed in the character sort order file (cf. parameter 2)

   Default: Accept all input characters
F
Sort accepted words (cf. parameter 6) in order of descending frequency

   Default: Sort accepted words (cf. parameter 6) in ascending alphabetical order
I
Ignore case, i.e. convert all input characters to lower case

   Default: Do not ignore case
L
Output statistics as a one-line tabular summary only

   Default: Output full statistics with summary in columns
N
Quote frequencies for accepted words (cf. parameter 6)

   Default: Do not quote frequencies for accepted words (cf. parameter 6)
O
Input master word list (cf. parameter 3) sorted

   Default: Input master word list (cf. parameter 3) not sorted
   (Note: If the master word list is large, processing time may be drastically reduced by presorting the master word list and running ALPHATXT with the O-option on.)
R
Input file (cf. parameter 5) in alpha format, rejected word list (cf. parameter 7) in alpha format quoting input file references.

   Default: Input file (cf. parameter 5) in ASCII format, rejected word list (cf. parameter 7) in alpha format referencing input file.
S
Output statistics summary only (cf. parameter 8)

   Default: Output full statistics with summary (cf. parameter 8)
file_in  optional 
Character sort order file name.

   The sort order file is an ASCII file containing one or more strings defining the character sort order. If no file is specified, the sort order is according to the current ASCII table.
file_in  optional 
Master word list file name. Cf. parameter 1, option O.
file_in  optional 
Additional word list file name.
file_in  optional 
Input file name. The file is either a running ASCII text file or an alpha format file; cf. parameter 1, option R.
file_out  optional 
Accepted word list file name.

   If no file name is specified here, all new words will be written to the list of rejected words (cf. parameter 7).
file_out  optional 
Rejected word list file name.

   If no file name is specified here, all new words will be written to the list of accepted words (cf. parameter 6).
file_out  optional 
Statistics file

 

 

If parameters 6 and 7 are both specified, ALPHATXT enters interactive mode and prompts the user to accept or reject each word in the input file which is not found in the master or additional word lists.

Alpha format files are flat ASCII files containing four strings per line. The first string consists of either a point, a dash or a number sign and/or a file name. The second and third strings are numbers (indicating line number and column number, respectively). The fourth string is a word.
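
The four-string layout can be made concrete with a small reader. The Python sketch below is only an illustration of the line format as described here; the names `AlphaRecord` and `parse_alpha_line` are hypothetical, and the sketch assumes that the strings are separated by runs of blanks and that everything after the third string belongs to the word.

```python
from typing import NamedTuple


class AlphaRecord(NamedTuple):
    marker: str   # point, dash, number sign and/or file name
    line: int     # line number in the source document
    column: int   # column number in the source document
    word: str     # the word (or phrase) itself


def parse_alpha_line(text):
    """Split one alpha format line into its four strings."""
    marker, line, column, word = text.split(None, 3)
    return AlphaRecord(marker, int(line), int(column), word.rstrip())


record = parse_alpha_line(" .     3   9 ist ")
print(record.line, record.column, record.word)
```

A line with fewer than four strings raises a `ValueError` in this sketch; how ALPHATXT itself reacts to such input is not documented here.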

4.10.2 Defining an Alphabetic Sort Order

From the following document, MYFILE:

 

 This is an exercise. The German word for 

 this thing is Übung. 

the command
   ALPHATXT - - - - MYFILE LIST
will produce the following output file LIST:
 

 German 

 The 

 This 

 an 

 exercise. 

 for 

 is 

 thing 

 this 

 word 

 Übung. 

As can be seen, the file is listed according to conventional ASCII sort order, with English capital letters first and the German Umlaut 'Ü' last. Punctuation marks have been included in the word strings. The command
   ALPHATXT C - - - MYFILE /LIST
will produce the following output file LIST:
 

 an 

 exercise. 

 for 

 German 

 is 

 The 

 thing 

 This 

 this 

 word 

 Übung. 

In this case, upper-case and lower-case characters have been merged in the sort order. However, the German Umlaut still comes last in the alphabet, and punctuation marks are still included. In order to avoid these problems, it is useful to define a character sort order file. With the following sort order file ALPHABET:
 

 AaÄäBbCcDdEeFfGgHhIiJjKkLlMmNn 

 OoÖöPpQqRrSsTtUuÜüVvWwXxYyZz 

 0123456789 

the command
   ALPHATXT CE ALPHABET - - MYFILE /LIST
will produce the following output file LIST:
 

 an 

 exercise 

 for 

 German 

 is 

 The 

 thing 

 This 

 this 

 Übung 

 word 

Thanks to the character sort order file, the German Umlaut has been sorted in its proper place, and because option 'E' has been specified on parameter 1, all characters not included in the sort order file (such as punctuation marks) have been excluded.
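
The effect of options C and E together with a sort order file can be imitated in a few lines. The Python sketch below is not ALPHATXT's algorithm: the two-level comparison (letters compared regardless of case first, upper case winning ties) is an assumption inferred from the example output, and the names `strip_unlisted` and `merged_case_key` are hypothetical.

```python
ALPHABET = (
    "AaÄäBbCcDdEeFfGgHhIiJjKkLlMmNn"
    "OoÖöPpQqRrSsTtUuÜüVvWwXxYyZz"
    "0123456789"
)
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def strip_unlisted(word):
    """Option E: drop characters which are missing from the sort order."""
    return "".join(ch for ch in word if ch in RANK)


def merged_case_key(word):
    """Option C (assumed): compare letters regardless of case, then by case."""
    primary = [RANK[ch.lower()] for ch in word]
    secondary = [RANK[ch] for ch in word]
    return (primary, secondary)


text = "This is an exercise. The German word for this thing is Übung."
words = {strip_unlisted(token) for token in text.split()}
words.discard("")
word_list = sorted(words, key=merged_case_key)
print("\n".join(word_list))
```

With the contents of MYFILE as input, the sketch prints the words in the same order as the output file LIST shown above.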

However, the word 'the' with a capital 'T' and the word 'this' with both upper-case and lower-case first letter are still included in the list. It is difficult to implement reliable procedures for correct handling of case in texts which are not marked up in any way. (However, in combination with MECSPRES, ALPHATXT is capable of implementing such distinctions on suitably marked-up files - cf. further below.) It may therefore often be convenient to convert all upper-case characters to lower-case. The command
   ALPHATXT CEI ALPHABET - - MYFILE /LIST
will produce the following output file LIST:

 

 an 

 exercise 

 for 

 german 

 is 

 the 

 thing 

 this 

 übung 

 word 

4.10.3 Frequency Word Lists and Simple Statistical Analyses

ALPHATXT can also be used to produce frequency word lists and simple statistics. The command
    ALPHATXT EFIN ALPHABET - - MYFILE /LIST - /STAT
will produce the following output file LIST:

 

 is                                     2 

 this                                   2 

 an                                     1 

 exercise                               1 

 for                                    1 

 german                                 1 

 the                                    1 

 thing                                  1 

 übung                                  1 

 word                                   1 

The file STAT will contain frequency distribution lists for word lengths, word forms etc. Again, however, this part of the program works best with marked-up files (cf. 4.10.5 below).
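
A frequency word list of this kind amounts to counting word forms and sorting the counts. The Python sketch below reproduces the list for MYFILE under stated assumptions: the string `LOWER_ALPHABET` is the lower-case part of the ALPHABET file, ties between equally frequent words are resolved alphabetically (inferred from the output above), and the function name `frequency_list` is hypothetical.

```python
from collections import Counter

LOWER_ALPHABET = "aäbcdefghijklmnoöpqrstuüvwxyz0123456789"
RANK = {ch: i for i, ch in enumerate(LOWER_ALPHABET)}


def frequency_list(text):
    """Return (word, count) pairs in order of descending frequency."""
    counts = Counter()
    for token in text.lower().split():                      # option I
        word = "".join(ch for ch in token if ch in RANK)    # option E
        if word:
            counts[word] += 1
    # Option F: most frequent first; ties in alphabetical order (assumed).
    return sorted(
        counts.items(),
        key=lambda item: (-item[1], [RANK[ch] for ch in item[0]]),
    )


text = "This is an exercise. The German word for this thing is Übung."
for word, count in frequency_list(text):
    print(f"{word:<38} {count}")
```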

4.10.4 Spell Checking

In the previous example, the file LIST was sorted in order of descending frequency. If we want to produce from MYFILE a word list for use in later spell checking, it must be sorted in either default or user-specified alphabetical order. The command
   ALPHATXT EI ALPHABET - - MYFILE /MASTER
will produce the following output file MASTER:

 

 an 

 exercise 

 for 

 german 

 is 

 the 

 thing 

 this 

 übung 

 word 

In order to show how spell checking works, let us assume that you add some new text to MYFILE, e.g. as follows:
 

 This is an exercise. The German word for 

 this thing is Übung. 

 

 This ist a new exercise. 

The command
   ALPHATXT EIO ALPHABET MASTER - MYFILE /ACCEPT /REJECT
will make ALPHATXT prompt you to accept or reject each of the "new" words 'ist', 'a' and 'new'. On the assumption that you reject 'ist' and accept 'a' and 'new', the file ACCEPT will contain the two latter:
 

 a 

 new 

while REJECT will look like this:
 

 .     3   9 ist 

indicating that the misspelt word 'ist' occurs in line 3, column 9. After you have corrected the misspelt word 'ist' to 'is', you can check the document again with the command
   ALPHATXT EIO ALPHABET MASTER ACCEPT MYFILE /ACCEPT2 /REJECT
 

Since MYFILE should by now not contain any word not included either in MASTER or ACCEPT, you should not be prompted for any words, and the files ACCEPT2 and REJECT should be empty. Alternatively, you may give the command
   ALPHATXT EIO ALPHABET MASTER ACCEPT MYFILE - /REJECT
and check that the file REJECT is empty.
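
The selection of "new" words in this procedure can be pictured as a set lookup against the word lists. The Python sketch below only finds the candidate words; the interactive accept/reject dialogue and the line and column references are left out. The function name `new_words` and the variable names are hypothetical, and the lower-case alphabet stands in for the ALPHABET file with options E and I.

```python
LOWER_ALPHABET = "aäbcdefghijklmnoöpqrstuüvwxyz0123456789"


def new_words(text, known_words, alphabet=LOWER_ALPHABET):
    """Return words in `text` not in `known_words`, in order of first occurrence."""
    allowed = set(alphabet)
    seen, result = set(), []
    for token in text.lower().split():                        # option I
        word = "".join(ch for ch in token if ch in allowed)   # option E
        if word and word not in known_words and word not in seen:
            seen.add(word)
            result.append(word)
    return result


master = {"an", "exercise", "for", "german", "is", "the", "thing",
          "this", "übung", "word"}
document = ("This is an exercise. The German word for\n"
            "this thing is Übung.\n"
            "\n"
            "This ist a new exercise.\n")

candidates = new_words(document, master)
print(candidates)

# After correcting 'ist' to 'is' and adding the accepted words:
accepted = {"a", "new"}
corrected = document.replace("This ist", "This is")
print(new_words(corrected, master | accepted))
```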

Normally, you would now want to include the new accepted words in the file ACCEPT into the master word list MASTER for later use. This can be done with the command:
   ALPHATXT EIO ALPHABET MASTER ACCEPT - /MASTER
which will produce the following new MASTER:

 

 a 

 an 

 exercise 

 for 

 german 

 is 

 new 

 the 

 thing 

 this 

 übung 

 word 

In the examples above, ALPHATXT was run with the O-option active on parameter 1. This may speed up processing with large master word lists quite drastically, but presupposes that the master word list is already ordered in accordance with the specified alphabetical sort order.

4.10.5 Working with Marked-up Documents

So far we have been looking at examples of uses of ALPHATXT with ordinary ASCII format documents (running text files). The strength of ALPHATXT, however, is its ability to work with alpha format files produced from MECS documents with MECSPRES.

With the following MECS document MYCODE:

 

  £ £ £ < / / > £ £ £ £ £ £ £ £ £ £ £ £ 

 <sec/ 

  <s/This is an exercise.> 

  <s/The <nationality/German> word for 

   this thing is <german/Übung>.>> 

 <sec/ 

  <s/This is a new exercise.>> 

and the following PDT, MYOLD.PDT:
 

  £ £ £ < / / > £ £ £ £ £ £ £ £ £ £ £ £ 

 n o # p r d _ 

 £ 

 o sec         4 £ £ £ £ £ £ 

the command
   MECSPRES MYOLD.PDT MYCODE /MYCODE.OLD N C I - 45
will produce the following output file MYCODE.OLD:
 

 This is an exercise. The German word for 

 this thing is Übung. 

 

 This is a new exercise. 

This file is identical to the document MYFILE, which was the point of departure for the previous examples. However, since the source file, MYCODE, is suitably marked up with codes for sections, sentences, foreign words etc., we are in a better position to perform case-sensitive vocabulary control and simple statistical analyses with ALPHATXT. For example, with the following PDT, MYCODE.PDT:
 

  £ £ £ < / / > £ £ £ £ £ £ £ £ £ £ £ £ 

 n o # p r d _ 

 £ 

 o german      d £ £ £ £ £ £ 

 o nationality £ n £ £ £ £ £ 

 o s           2 o £ £ £ £ £ 

 o sec         e £ £ £ £ £ £ 

the command
   MECSPRES MYCODE.PDT MYCODE /MYCODE.TMP B A I
will produce the following output file MYCODE.TMP:
 

 #     3   5 this 

 .     3  10 is 

 .     3  13 an 

 .     3  16 exercise. 

 -     4   5 the 

 .     4  22 German 

 .     4  30 word 

 .     4  35 for 

 .     5   3 this 

 .     5   8 thing 

 .     5  14 is 

 .     5  31 . 

 #     7   5 this 

 .     7  10 is 

 .     7  13 a 

 .     7  15 new 

 .     7  19 exercise. 

This is an alpha format file, which can be input to ALPHATXT. The command
   ALPHATXT ER ALPHABET - - MYCODE.TMP /MYMAIN
will produce the following output file MYMAIN:
 

 a 

 an 

 exercise 

 for 

 German 

 is 

 new 

 the 

 thing 

 this 

 word 

According to the specifications in MYCODE.PDT, upper-case letters at the beginning of sentences have been changed to lower case, the upper-case 'G' in 'German' has been preserved, and the foreign word 'Übung' has been left out.

See the description of MECSSPEL (4.9) for an example of how ALPHATXT can be used in combination with MECSPRES for spell checking of MECS documents.

The command
   ALPHATXT EFNR ALPHABET - - MYCODE.TMP /LIST - /STAT
produces a frequency word list LIST and a statistical analysis file STAT. The contents of LIST are:

 

 is                                     3 

 this                                   3 

 exercise                               2 

 a                                      1 

 an                                     1 

 for                                    1 

 German                                 1 

 new                                    1 

 the                                    1 

 thing                                  1 

 word                                   1 

whereas STAT looks like this:
 

   mycode.tmp 

 ------------------------------ 

 String length in characters 

 

        Chars   Strings 

           1         1 

           2         4 

           3         3 

           4         4 

           5         1 

           6         1 

           8         2 

 Sum: 

          61        16 

 

 ------------------------------ 

 Segment length in strings 

 

      Strings  Segments 

           4         1 

           5         1 

           7         1 

 Sum: 

          16         3 

 

 ------------------------------ 

 Section length in strings 

 

      Strings  Sections 

           5         1 

          11         1 

 Sum: 

          16         2 

 

 ------------------------------ 

 String tokens per string type 

 

       Tokens     Types 

           1         8 

           2         1 

           3         2 

 Sum: 

          16        11 

 

 ------------------------------ 

 Chars:    61 

  Strings: 16    Chars/String: 3.81 Min: 1 Max: 4 StdDv: 1.98 

 Segments:  3 Strings/Segment: 5.33 Min: 1 Max: 1 StdDv: 2.22 

 Sections:  2 Strings/Section: 8.00 Min: 1 Max: 1 StdDv: 2.94 

    Types: 11     Tokens/Type: 1.45 Min: 1 Max: 8 StdDv: 0.97 

In the summary at the bottom of the file, the leftmost column of numbers indicates absolute counts, i.e. MYCODE.TMP contains 61 characters, 16 strings (words), 3 segments (sentences), 2 sections and 11 string types (word forms). The second column of numbers indicates average values, the third and fourth columns indicate minimum and maximum values, and the fifth column indicates standard deviation.
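
The counts and averages in the summary can be checked by hand. The Python sketch below recomputes them from the 16 strings which remain in MYCODE.TMP once punctuation has been excluded; the numbers of segments and sections are simply copied from the STAT file rather than derived. The Min, Max and StdDv columns are left out, since the formulas ALPHATXT uses for them are not stated here. All variable names are hypothetical.

```python
from collections import Counter

words = ("this is an exercise the German word for this thing is "
         "this is a new exercise").split()

chars = sum(len(word) for word in words)   # total number of characters
strings = len(words)                       # string tokens (words)
types = len(Counter(words))                # string types (word forms)
segments, sections = 3, 2                  # copied from the STAT file

summary = {
    "Chars/String": round(chars / strings, 2),
    "Strings/Segment": round(strings / segments, 2),
    "Strings/Section": round(strings / sections, 2),
    "Tokens/Type": round(strings / types, 2),
}

print(chars, strings, types)
for label, value in summary.items():
    print(f"{label:>16}: {value:.2f}")
```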

4.11 MECSSGML

MECSSGML is a code converter for MECS version 2 documents. The program converts MECS-conforming documents to SGML-conforming document instances.

Usage:
   MECSSGML read_file write_file # element (R7)
 

The program accepts 1-5 parameters:
read_file  required 

Input document. The document must contain a MECS header, and it must be fully MECS-conforming.
write_file  optional 
Output file. If the parameter is unspecified or replaced by a dash, output will be sent to the screen instead.
#  optional 
Line length of output file. Default 78, maximum 255.
element  optional 
If this parameter is specified, MECSSGML will enclose the output document between an SGML <element> start tag and an SGML </element> end tag.
(R7)  optional 
This parameter may take any combination of the values 'R' and '7'.
R
Overlapping elements will be modified so that the output file forms a proper hierarchy of elements. E.g., instead of '<a> <b> </a> </b>' MECSSGML will write '<a> <b> </b></a><b> </b>' to the output file.
7
Output 8-bit ASCII characters as SGML entities '&CHRnnn;', where nnn is a number indicating the decimal value of the character in question.
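
The rewriting performed by option R can be pictured as closing and reopening elements around an end tag which arrives out of turn. The Python sketch below illustrates that idea on SGML-style tags without attributes; it is not MECSSGML's code, and the names `TOKEN` and `flatten_overlaps` are hypothetical.

```python
import re

# A tag without attributes, or a run of text between tags.
TOKEN = re.compile(r"</?[A-Za-z][\w.-]*>|[^<]+")


def flatten_overlaps(sgml):
    """Rewrite overlapping elements so that they nest properly."""
    out, stack = [], []
    for token in TOKEN.findall(sgml):
        if token.startswith("</"):
            name = token[2:-1]
            if name not in stack:
                raise ValueError(f"end tag without start tag: {token}")
            reopened = []
            # Close every element opened after `name` ...
            while stack[-1] != name:
                inner = stack.pop()
                out.append(f"</{inner}>")
                reopened.append(inner)
            stack.pop()
            out.append(token)
            # ... and reopen them, outermost first, after the end tag.
            for inner in reversed(reopened):
                out.append(f"<{inner}>")
                stack.append(inner)
        elif token.startswith("<"):
            stack.append(token[1:-1])
            out.append(token)
        else:
            out.append(token)
    return "".join(out)


print(flatten_overlaps("<a> <b> </a> </b>"))
```

Applied to the example above, the sketch returns '<a> <b> </b></a><b> </b>'. Input which is already hierarchical passes through unchanged.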

No matter which delimiters are used in the input file, the delimiters used in the output file will be those of SGML's reference concrete syntax. In the table below, MECS input is exemplified by MECS default delimiters.

 

+----------------------------+--------------------------+
|Input file - MECS           | Output file - SGML       |
|default delimiters          | Reference Concrete Syntax|
+----------------------------+--------------------------+
|No-element code             | Empty element            |
|<tag>                       | <tag>                    |
+----------------------------+--------------------------+
|One-element code            | Element                  |
|<tag/ ... /tag>             | <tag> ... </tag>         |
|<tag/ ... >                 |                          |
+----------------------------+--------------------------+
|Poly-element code           | Elements                 |
|[tag/#| ... /tag| ... /tag] | <p_tag><p_el> ... </p_el>|
|[tag| ...       | ...     ] |        <p_el> ... </p_el>|
|                            | </p_tag>                 |
+----------------------------+--------------------------+
|Character codes             | Entities                 |
|{tag}                       | &tag;                    |
|{tag\tag}                   | &tag.tag;                |
|{"x"\tag}                   | &qxq.tag;                |
+----------------------------+--------------------------+

Examples

The command
   MECSSGML MYFILE NEWFILE
reads MYFILE and writes an SGML document instance to NEWFILE.

The command
   MECSSGML MYFILE NEWFILE 60 document R7
reads MYFILE and writes an SGML document instance called 'document' to NEWFILE. Maximum line length is 60 characters, all overlapping elements will be modified to hierarchical structures, and 8-bit ASCII will be converted to SGML entities.

4.12 SGMLVAL

SGMLVAL is a validating MECS parser for SGML documents. The program validates SGML documents for MECS conformance.

Usage:
   SGMLVAL read_file file_out file_out
 

The program accepts 2-3 parameters:
read_file  required 

Input SGML document.
file_out  required 
Output log file.

   Any existing file with the same name will be overwritten without notice.
file_out  optional 
Output CDT file.

   If this parameter is left out, SGMLVAL will create a CDT file with the same name as the first parameter and 'CDT' as extension. Any existing file with the same name will be overwritten without notice.
 

SGMLVAL is a batch program which calls MECSVAL. Thus, SGMLVAL can be included in another batch program either by a CALL command, or by including a copy of it in the other batch program. The contents of SGMLVAL.BAT are:

 

 ECHO OFF
 COPY C:\MECS\HEADSGML + %1 TEMPFILE.TMP
 MECSVAL - TEMPFILE.TMP %2 SGML
 IF NOT "%3" == "" COPY TEMPFILE.CDT %3
 IF     "%3" == "" COPY TEMPFILE.CDT %1.CDT
 DEL TEMPFILE.TMP
 DEL TEMPFILE.CDT

It should be noted that SGMLVAL presupposes that the MECS Program Package has been installed on drive C:, directory MECS.

SGMLVAL provides a quick and easy-to-use test of SGML documents for MECS conformance. See the description of MECSVAL for further details and for other options provided by that program. Please note also that even if SGMLVAL reports errors, the program SGMLMECS (4.13) may still succeed in converting the document to MECS.

4.13 SGMLMECS

SGMLMECS is a code converter for SGML files. The program converts SGML-conforming files to MECS-conforming documents.

Usage:
   SGMLMECS read_file write_file
 

The program accepts 1-2 parameters:
read_file  required 

Input SGML file.
write_file  optional 
Output MECS file. If the parameter is unspecified or replaced by a dash, output will be sent to the screen instead.

 

 

The input file must use SGML's reference concrete syntax. SGMLMECS will convert the file to a well-formed MECS document using a subset of MECS default delimiters, as follows:

 

+---------------------------------+--------------------------+
|                SGML             |          MECS            |
+-----------------+---------------+------------+-------------+
|     element     |  reference    |  code      |  default    |
|                 |concrete syntax|            | delimiters  |
+-----------------+---------------+------------+-------------+
|empty element    | < >           | no-element | < >         |
+-----------------+---------------+------------+-------------+
|element          | < >     </ >  | one-element| < /    / >  |
+-----------------+---------------+------------+-------------+
|entity           | &  ;          | char.rep.  | {  }        |
+-----------------+---------------+------------+-------------+
|comment          | <!--      --> | comment    | <|--   --|> |
+-----------------+---------------+------------+-------------+
|marked section   | <!  [   ]   > | comment    | <|  [  ] |> |
+-----------------+---------------+------------+-------------+

The output document's MECS header is:
 

  £ < > < / / > £ £ | £ £ £ £ £ { £ £ } 

A successful conversion presupposes that the input SGML file is MECS-conforming. SGML files can be checked for MECS conformance with MECSVAL or SGMLVAL. In particular, SGML files which make use of end tag omission or tag minimization are not MECS-conforming.

However, even SGML documents with occasional tag minimization may often be converted successfully with SGMLMECS. SGMLMECS reads the input file in two passes: In the first pass, it identifies the generic identifiers of all non-minimized end tags. This information is used in order to expand minimized end tags in the second pass. In practice, therefore, it is mostly sufficient that the generic identifier is included in the end tag of at least one of the occurrences of an element.
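
One plausible reading of this two-pass procedure is sketched below in Python. It is an assumption about how such an expansion could work, not a description of SGMLMECS's code: the sketch handles only the empty end tag '</>' and tags without attributes, and the names `TAG` and `expand_short_end_tags` are hypothetical.

```python
import re

# Start tag, full end tag, or empty end tag '</>' (no attributes).
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)?>")


def expand_short_end_tags(sgml):
    """Expand '</>' to a full end tag, using identifiers found in pass 1."""
    # Pass 1: generic identifiers which occur in at least one full end tag.
    with_end_tag = {
        m.group(2) for m in TAG.finditer(sgml) if m.group(1) and m.group(2)
    }

    # Pass 2: keep track of open elements which are known to take an end tag.
    stack = []

    def replace(match):
        closing, name = match.group(1), match.group(2)
        if not closing:
            if name in with_end_tag:
                stack.append(name)
            return match.group(0)
        if name is None:
            if not stack:
                raise ValueError("'</>' without an open element")
            name = stack.pop()
        elif name in stack:
            while stack.pop() != name:
                pass
        return f"</{name}>"

    return TAG.sub(replace, sgml)


print(expand_short_end_tags("<p>one</p><p>two</><br>"))
```

In this sketch, an element whose generic identifier never occurs in a full end tag is treated as empty, which is why at least one occurrence needs its complete end tag.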

It should also be noted that even if an SGML file has been successfully converted to MECS, that does not necessarily mean that MECS applications will interpret the SGML mechanisms in the way that SGML applications do. E.g., SGML declarations (including comments, marked sections, element and entity declarations etc.) will be ignored by MECS software.


Appendix A: ABOUT THE MECS PROGRAM PACKAGE

The MECS Program Package contains programs for the creation, validation, formatting, reformatting and analysis of documents conforming to MECS version 2. The package also contains programs for conversion of MECS version 2 documents to SGML and vice versa.

What is documented here is version 2....

The MECS Program Package is under constant revision and development. Comments, bug reports and suggested improvements are most welcome and will as far as possible be taken into consideration in future versions of the programs. Comments should be addressed to:

Claus Huitfeldt

   The Wittgenstein Archives at the
   University of Bergen
   Harald Haarfagresgt 31
   N-5007 Bergen
   Norway

Email: Claus.Huitfeldt@hd.uib.no

All programs were written and compiled in Borland Corporation's Turbo Pascal, versions 5.5 and 6.0, with Editor Toolbox version 4.0.

The Program Package is made available as copyrighted software free of charge and may be freely redistributed and used. Commercial use, reverse-engineering of executable files, or usage of documentation files in any other form is infringement of copyright. Use of the program package for creating or editing documents shared with third parties should be acknowledged. The copyright holder cannot be held responsible for possible inconvenience, loss or damage which might be caused by the use of the software.


Appendix B: MECSPRES PDT DECLARATION PARAMETERS


Appendix C: MECSPRES PREDEFINED LAYOUTS, FORMATS AND STYLES


Appendix D: MECSPRES USER-DEFINED LAYOUTS, FORMATS AND STYLES


Appendix E: KNOWN BUGS

Memory manager problems with MECSVAL.

Floppy disk drive crashes (certain programs).

Certain programs (such as SGMLMECS) are too strict and abort, but the result is then left in tempfile.tmp.

MECSVAL does not count N-element codes with more than 2 elements, and computes the final summary incorrectly in basic mode.


References
Huitfeldt and Rossvær 1989:
Huitfeldt, Claus and Rossvær, Viggo: "The Norwegian Wittgenstein Project Report 1988", Norwegian Computing Centre for the Humanities, Report Series no 44, Bergen 1989.
ISO 8879-1986:
International Organization for Standardization: "Information Processing - Text and Office Systems - Standard Generalized Markup Language (SGML)", ISO 8879-1986, Geneva 1986.
Huitfeldt 1993:
The ACH-ALLC paper "MECS" from Washington.
TEI P1:
C.M. Sperberg-McQueen and Lou Burnard (eds.): "Guidelines for the Encoding and Interchange of Machine-Readable Texts", Draft Version 1.1, Chicago and Oxford November 1990.
TEI P3:
C.M. Sperberg-McQueen and Lou Burnard (eds.): "Guidelines for the Encoding and Interchange of Machine-Readable Texts (TEI P3)", Chicago and Oxford April 1994.
Huitfeldt 1990:
"Toward a Machine-Readable Version of Wittgenstein's Nachlass", unpublished working paper, August 1989 - February 1990, in private circulation.
Huitfeldt 1997:
"MECS-WIT - A Registration Standard for the Wittgenstein Archives at the University of Bergen", Working Papers from the Wittgenstein Archives at the University of Bergen (forthcoming).
Huitfeldt 1998:
"Tekstkoding og tekststrukturer", in Aarseth et al 1998.
Aarseth 1998:
"Datahåndbok for humanister".
Cripps 1996
Djuric 199
Solstrand 1994
ACH-ALLC 1993


1 The introductory chapter is an updated and slightly revised version of Huitfeldt 1993.
2 Incidentally, this also explains why the present document has been given number 3 in the Wittgenstein Archives' series of working papers - publication was originally planned for 1992.
3 See e.g. Huitfeldt 1998 and TEI P3 Chapter 2 ("A Gentle Introduction to SGML") for elementary introductions.
4 C. Michael Sperberg-McQueen and I are working on a data structure for MECS similar to that of SGML. This work has been promising, but it is too early to make any final judgement, and no implementation exists.
5 Huitfeldt 1995, p 238
6 The few things which are forbidden or mandatory in MECS are implicitly required by the basic syntax, not explicitly stated.
7 Even though the amount of SGML software has increased, no SGML software to my knowledge yet replicates the functionality of the MECS Program Package. In particular, there is still little software that supports the CONCUR feature.
8 Strictly speaking, XML is a subset of SGML. Therefore, the comparisons made here are not between a non-SGML system and SGML, but between the SGML subset XML and "full" or "unrestricted" SGML.
9 However cf. Part II, ##
10 In earlier versions of MECS generic identifiers were called 'code names', and attribute strings were called 'code name extensions'. The new terminology has been adopted in order to approximate to SGML terminology. However, this may have the disadvantage of being slightly misleading: MECS' generic identifiers and attribute strings are not identical to SGML's generic identifiers and attributes.
11 This exception is inconsistent with the general tendency of MECS to allow overlap anywhere, and is the (unintended) result of influence from a constraint enforced at the Wittgenstein Archives.
12 A small reservation is required here: the indicators, i.e. character no 2 and words nos 26..31 of the CDT, cannot be unequivocally deduced from the document alone. However, this is rather insignificant, since these indicators play no role except as housekeeping characters internally within the CDT.
13 However cf. Part II, ##
14 Although the conversion program SGMLMECS (cf. Part II, ##) does handle SGML tag minimization and end tag omission to some extent.
15 The example is taken from TEI P1, page 30. Cf. also TEI P3, section 2.9.2 on pp. 34-35.
16 Documented in Huitfeldt 1990.
17 Documentation in the Code Syntax Part of "Registration Standard for The Wittgenstein Archives at the University of Bergen", unpublished working paper.
18 Documented in an earlier draft of the current document of September 1992, unpublished but widely circulated.
19 In intermediate versions of this document version 2.00 was referred to as version 1.02.
20 SGML software is able to automatically modify SGML documents so that they comply with this condition. Even if your document should for some reason fail to satisfy the condition, it may often be processed with the MECS Program Package after all.
21 For example, SGML attributes and declarations (including comments and marked sections) are regarded by MECSVAL as attribute strings and comments, respectively. This means that MECS programs will simply disregard SGML attributes and declarations, including the entire SGML DTD.
22 Though even if you run MECSVAL in so-called SGML mode, it validates for MECS conformance, not for SGML conformance.
23 MECS-WIT, which is referred to in this example, is described in Huitfeldt 1997.
24 MECSBETA is a batch file which calls the programs MECSPRES and BETATXT as exemplified in the two command lines below.
25 MECSSPEL is a batch file which calls the programs MECSPRES and ALPHATXT as indicated above. In addition, MECSSPEL will write statistical data to a separate file called DOC1.STS.