[Cache from http://helmer.hit.uib.no/claus/mecs/mecs.htm; please use this canonical URL/source if possible.]
ISBN 82-91071-02-0
ISSN 0803-3137
Copyright: Claus Huitfeldt
First version: 1992.
This version: October 1998
1 Introduction: MECS and SGML
1.1 Background
1.2 SGML
1.3 MECS Syntax
1.4 MECS Program Package
1.5 Conclusion
2 PART I MECS - A Multi-Element Code System
Version 2.00, August 1993
2.1 Summary
2.2 Basic Code Syntax, Code Systems and Documents
2.3 Codes, Tags and Elements
2.4 Code Types
2.4.1 No-element Codes
2.4.2 One-element Codes
2.4.3 Poly-element Codes
2.4.4 N-element Codes
2.4.5 Character Representation Codes
2.4.6 Character Disambiguation Codes
2.4.7 MECS Comments
2.5 Generic identifiers and attribute strings
2.6 Markup Reduction
2.7 Document Structure
2.8 Classification of MECS systems
2.9 Character sets
2.9.1 Delimiters
2.9.1.1 Code delimiters
2.9.1.2 String delimiter
2.9.2 Nil character
2.9.3 Free characters
2.9.4 Tag characters
2.9.5 Default character sets
2.9.6 Examples of alternative character sets
2.10 Code Declaration Table (CDT)
2.11 Deducing a Minimal CDT from an Encoded Document
2.12 SGML Compatibility
2.12.1 Some general observations
2.12.2 From SGML to MECS
2.12.3 From MECS to SGML
2.13 Revision History
2.14 Plans for MECS Version 3
3 PART II MECS Program Package
Version 2, August 1994
User Guide
3.1 Installation and System Requirements
3.2 A Note to SGML Users
3.3 Creating and Validating Documents and CDTs
3.4 Formatting Documents
3.5 Reformatting Documents
3.6 Analyzing Documents
3.6.1 Code Status Report
3.6.2 Document Structure and Overlapping Elements
3.6.3 Breakpoints and Recursion
3.6.4 Betatexts (Substitutions)
3.6.5 Spell Checking
3.6.6 Frequency Word Lists and Simple Statistical Analyses
3.6.7 Extracting Elements
3.7 Processing SGML Documents in MECS
3.7.1 Validating SGML documents for MECS Conformance
3.7.2 Converting SGML files to MECS
3.7.3 Converting MECS documents to SGML
3.8 Project management
4 Reference Guide
4.1 General Features and Command Line Parameters
4.2 MECSVAL
4.2.1 Interactive Mode
4.2.2 Command Line Parameters
4.2.3 Examples
4.2.4 MECSVAL Editor Commands
4.3 MECSFORM
4.4 MECSLYSE
4.5 MECSGRAB
4.6 MECSPRES
4.6.1 Profile Definition Table (PDT)
4.6.1.1 Overall Structure
4.6.1.2 Code Declarations
4.6.1.3 Position
4.6.1.4 Mode
4.6.1.5 MarkIn, MarkDel and MarkOut
4.6.1.6 NoteNumber and NoteType
4.6.2 Declaration of Codes of Different Types
4.6.2.1 No-element Codes
4.6.2.2 One-element Codes
4.6.2.3 Multi-element Codes
4.6.2.4 Character Codes
4.6.3 Layout and Format
4.6.3.1 Layout
4.6.3.2 Format
4.6.4 Command Line Parameters
4.6.5 Examples
4.7 MECSBETA
4.8 BETATXT
4.9 MECSSPEL
4.10 ALPHATXT
4.10.1 Command Line Parameters
4.10.2 Defining an Alphabetic Sort Order
4.10.3 Frequency Word Lists and Simple Statistical Analyses
4.10.4 Spell Checking
4.10.5 Working with Marked-up Documents
4.11 MECSSGML
4.12 SGMLVAL
4.13 SGMLMECS
Appendix A: About the MECS Program Package
Appendix B: MECSPRES PDT Declaration Parameters
Appendix C: MECSPRES Predefined Layouts, Formats and Styles
Appendix D: MECSPRES User-Defined Layouts, Formats and Styles
The subject matter of this document is text encoding. It presents what
I have called the Multi-Element Code System, MECS.
Today, text encoding is more or less synonymous with SGML
(Standard Generalized Markup Language). Chapter 1 is an introduction
summarising the rest of the document by way of comparing MECS to SGML. 1
Chapter 2 provides a full description of MECS.
It may be read independently of the rest of the document.
Chapter 3 is a user guide and Chapter 4
a technical documentation of the MECS Program Package, a program package
for the validation, manipulation and analysis of MECS documents.
This is a working paper in the full sense of the term, i.e. a report on work in progress. I have wanted to publish it for a long time, but a new and better version of MECS or the MECS Program Package has always seemed to be around the next corner. 2 Planned changes to MECS are described in 2.14.
Readers who intend to use this document primarily as a practical guide
to the MECS Program Package are advised to start with the Summary in 2.1,
and then proceed directly to the User Guide in Chapter 3.
The rest of Chapter 2 provides reference material and
information of relevance to readers interested in technical aspects of
the MECS syntax, e.g. with a view to redefining the delimiter set or to
finding out whether a given markup syntax is MECS-conforming.
Readers who intend to use the MECS Program Package for
processing of SGML documents are strongly recommended to read the following
sections carefully: 2.12, 3.2, 3.7 and 4.11-13.
No detailed knowledge of any particular text encoding
system is required. But it is presupposed that readers have some acquaintance
with text encoding, or are familiar at least with the rationale behind
text encoding systems in general. 3
What is published here is for the most part a result of work in the
years from 1985 to 1987 for the Norwegian Wittgenstein Project and,
later (since 1990), for the Wittgenstein Archives at the University
of Bergen. I thank these projects for having given me the opportunity
to pursue my work on text encoding, and I thank the Norwegian Research
Council for Science and the Humanities for having permitted me to spend
part of my time as Research Fellow in philosophy on this work.
MECS started as an attempt to revise the code system of
the Norwegian Wittgenstein Project, "CosyTrawma", which was originally
developed by Associate Professor AsbjÁrn Brændeland
(Huitfeldt and Rossvær 1989, pp. 177-200). Brændeland, in turn,
had drawn on work done by the Tübingen Project in Germany.
The result of the revision turned out to be an entirely
different system, the earliest drafts of which were presented in Huitfeldt
and Rossvær 1989, pp. 51-54 and 201-236, and in a number of unpublished
working papers since 1989. I am particularly indebted to Senior Executive
Officer Øystein Reigem at the Norwegian Computing Centre
for the Humanities for comments and criticism of these drafts, as well
as to Professor Stig Johansson at the University of Oslo, who gave
many helpful comments and suggestions.
Much work is currently under way in text encoding. The most important
contribution in the Humanities is arguably that of the Text Encoding Initiative
(TEI). This major international cooperation project has set a standard
for text encoding in the Humanities for a long time to come.
My own participation in TEI has provided me with many
opportunities to learn from discussions with colleagues. In particular,
discussions with Lou Burnard, Michael Sperberg-McQueen, Allen
Renear and Peter Robinson have been a recurrent source of inspiration.
I am also indebted to my colleagues at the Wittgenstein
Archives for criticism, help and encouragement. In recent years, Peter
Cripps has been a particularly rich source of constructive criticism
and inspiring enthusiasm.
I hereby thank the above-mentioned persons and institutions for their help and assistance, criticism and comments. Remaining errors and deficiencies are entirely my responsibility.
Bergen, September 1998
Claus Huitfeldt
The Norwegian Wittgenstein Project (NWP), which started in 1980, aimed at producing a machine-readable version of Ludwig Wittgenstein's Nachlass. Like many similar projects at the time, the NWP developed its own markup system. And like most projects that did so, the NWP enjoyed not only the many advantages of explicitly marked up texts, but also the severe disadvantage of having to develop its own specialized software, even for trivial, non-specialized tasks.
The NWP was discontinued in 1987, and during the preparation for its continuation, which was later (1990) to become the Wittgenstein Archives at the University of Bergen, I set out to improve the markup system. It had turned out that the system suffered from certain deficiencies. Any revision of the markup system necessitated adjustment of the software, which had in the course of several years of ad hoc revisions grown quite complicated. The pause in the project activities was therefore well spent looking for a more viable and flexible solution.
Standard Generalized Markup Language (SGML) was adopted as an international standard for text encoding by the International Organization for Standardization (ISO) in 1986, so at that time (i.e. 1988-1989), SGML was the most natural candidate for consideration. However, despite its many strengths and potential advantages, I found SGML unsuited to our needs. Among the reasons were:
My conclusion therefore was that I had to develop a different system altogether for the Wittgenstein Archives, a system which had to be considerably less demanding concerning software development, to answer the specific needs of our project, and yet be general and flexible enough to allow for extensive revision of the registration system during the course of future work without necessitating revision of application software.
At roughly the same time the Text Encoding Initiative (TEI) had just started (1987). The TEI based itself on SGML. Many of the issues TEI was expected to address were relevant to the problems listed above. Although we could not wait for TEI to be completed, it was therefore also an obvious consideration for my development work to keep as close to SGML as possible.
Consequently, MECS is in many respects similar to SGML. Like SGML, MECS is not itself a markup scheme, but a set of rules for the design of markup schemes. MECS may be adapted to conform to SGML's reference concrete syntax. SGML documents are MECS-conforming, provided that they do not make use of markup reduction or minimization.
MECS markup schemes may be declared in separate "document definitions", similar to the SGML DTDs. Because they lack most of the expressive power of SGML's DTDs, I have chosen a different term: Where SGML speaks of Document Type Definitions (DTDs), MECS speaks of Code Declaration Tables (CDTs). Basically, a CDT is a declaration listing delimiters, other character sets, and codes (tags for elements and entities) to be used in a document. MECS documents may be validated for conformance with a particular CDT. But unlike SGML, no CDT is required in MECS (cf. below).
MECS includes equivalents to SGML's elements and internal entities. In addition, MECS includes syntactical means for the representation of structures which in SGML are treated in a different way. There are seven syntactically distinct types of codes (examples are given in MECS's default character set):
No-element codes: <tag>
One-element codes: <tag/ ... /tag>
Poly-element codes: [tag/2| ... /tag| ... /tag]
N-element codes: [tag/2\ ... /tag| ... /tag]
Character representation codes: {tag}
Character disambiguation codes: {...\tag}
Comments: <| ... |>
All delimiters may be redefined, and tags may be reduced or minimized (though not omitted) according to specific rules.
No-element codes correspond to SGML's empty elements, and mark points within the text. One-element codes correspond to ordinary SGML elements, and mark spans of text.
Multi-element codes, i.e. poly-element and N-element codes, have no obvious parallel in SGML. Poly-element codes mark two or more consecutive spans of text (typically indicating that they stand in a specific relationship to each other, e.g. that of substitution or counterposition). N-element codes are similar to poly-element codes. But whereas the number of spans (elements) marked by a poly-element code may vary from token to token, the number of elements in an N-element code is fixed.
Character representation codes correspond roughly to SGML's internal entities. Character disambiguation codes, which have no direct equivalent in SGML, are used in conjunction with character representation codes, typically to disambiguate homographic graphemes (e.g. characters which in one context may be punctuation marks, in another context logical operators).
In MECS (just as in SGML) parts of a document which should be ignored by the parser are marked as comments.
MECS has no direct parallel to SGML attributes, external entities and declarations. However, multi-element codes may be used for some of the same purposes as attributes, and the MECS Program Package supports a file inclusion mechanism which performs some of the work that SGML external entities do. MECS has corollaries to SGML comments and to marked sections with keyword CDATA, but not to the other SGML declaration types.
MECS documents contain text interspersed with codes. MECS does not presuppose any hierarchical document structure - elements may appear in any order and nest arbitrarily deeply. Multi-element codes may not overlap each other, but one-element codes may overlap all other codes without restriction.
The basic syntactic features of all tags occurring in a MECS-conforming document are directly deducible from their delimiters, even if markup is reduced to its minimum. This has at least three important consequences:
First, it increases human readability of documents. Even if heavily marked up documents are notoriously difficult to read for the human eye, MECS at least has the advantage that you may e.g. tell a no-element from a one-element tag immediately. (In SGML you do not know whether a start tag is associated with an end tag or not, i.e. whether it marks an empty element or not, until you have either inspected the DTD or scanned to the end of the current element, which in the worst case means the rest of the entire document instance.)
Second, the same point applies to software development. There is no need for look-ahead to identify the basic syntactic features of a MECS tag. Therefore, as long as a MECS document includes a one-line header declaring its delimiters, the entire document can be parsed and validated for basic syntax conformance without recourse to any CDT.
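As an illustration of this point, the following Python sketch classifies a tag from its delimiters alone, without consulting any CDT. It assumes the default delimiters used in the examples above and, as a simplification not taken from MECS itself, generic identifiers consisting of ASCII letters and digits only. The function name classify_tag is hypothetical and is not part of the MECS Program Package.

```python
import re

# Assumption: generic identifiers are ASCII letters and digits only.
_GI = re.compile(r"[A-Za-z0-9]+")
_ELEMENT_NUMBER = re.compile(r"/\d+")

# (opening delimiter, delimiter after the generic identifier) -> code type,
# using the default delimiters only.
_KINDS = {
    ("<", ">"): "no-element",
    ("<", "/"): "one-element",
    ("[", "|"): "poly-element",
    ("[", "\\"): "N-element",
    ("{", "}"): "character representation",
    ("{", "\\"): "character representation with disambiguation",
}

def classify_tag(text, pos=0):
    """Classify the tag starting at text[pos] by its delimiters alone."""
    if text.startswith("<|", pos):
        return ("comment", None)
    opener = text[pos:pos + 1]
    gi_match = _GI.match(text, pos + 1)
    if opener not in ("<", "[", "{") or gi_match is None:
        raise ValueError("not the start of a tag")
    gi, i = gi_match.group(), gi_match.end()
    if opener == "[":
        # Skip the optional element number, e.g. the "/2" in "[a/2|".
        number = _ELEMENT_NUMBER.match(text, i)
        if number:
            i = number.end()
    closer = text[i:i + 1]
    if (opener, closer) not in _KINDS:
        raise ValueError("unrecognised tag")
    return (_KINDS[(opener, closer)], gi)
```

Note that the decision never looks beyond the end of the start tag itself, and that reduced start tags such as [a| are classified in the same way as their full forms.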
Third, this means that MECS documents are in a certain fundamental sense self-documenting: If a MECS document includes a header, which is a one-line declaration of its delimiters, then a CDT to which that document conforms may be deduced directly from the document alone. The CDT thus deducible from the document is called the document's minimal CDT.
Although it is unequivocally decidable whether any particular document conforms to any particular CDT, an indefinite number of documents conform to any particular CDT, and any particular document conforms to an indefinite number of CDTs. In this respect the relationship between MECS documents and CDTs is the same as the relationship between SGML document instances and DTDs - it is a many-to-many relationship. What is special about the relationship between a document and its minimal CDT is that it is a many-to-one relationship: any particular MECS-conforming document conforms to one and only one minimal CDT.
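The idea of deducing declarations from a document can be sketched in Python as follows. This is only a toy under stated assumptions: it uses the default delimiters, assumes generic identifiers made of ASCII letters and digits, and collects nothing but the generic identifiers found per code type, whereas a real CDT also lists delimiters and character sets. The function name minimal_code_list is hypothetical.

```python
import re

# Assumption: default delimiters; generic identifiers are ASCII letters/digits.
_START_TAGS = {
    "no-element": r"<([A-Za-z0-9]+)>",
    "one-element": r"<([A-Za-z0-9]+)/",
    "poly-element": r"\[([A-Za-z0-9]+)(?:/\d+)?\|",
    "N-element": r"\[([A-Za-z0-9]+)(?:/\d+)?\\",
    "character": r"\{([A-Za-z0-9]+)[}\\]",
}

def minimal_code_list(document):
    """Return the generic identifiers used in a document, per code type."""
    # Comments are not part of the code structure, so drop them first.
    body = re.sub(r"<\|.*?\|>", "", document, flags=re.DOTALL)
    return {
        kind: sorted(set(re.findall(pattern, body)))
        for kind, pattern in _START_TAGS.items()
    }
```

Two different documents that use the same codes yield the same result, which mirrors the many-to-one relationship described above.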
The MECS Program Package contains programs for the creation, validation, formatting, reformatting, analysis, element extraction and spell checking of MECS-conforming documents, as well as programs for translation between MECS and SGML. All programs in the package run under MS-DOS.
MECSVAL is an interactive, validating parser-editor. MECSVAL checks CDTs and documents for MECS conformance, and may either deduce minimal CDTs from MECS-conforming documents or check that documents conform to particular CDTs.
MECSFORM formats or regularizes MECS-conforming documents by either reducing markup to its minimum or extending it to its standard form, wrapping lines to a user-specified maximum length, removing trailing blanks and trailing blank lines, optionally indenting specified elements and/or inserting reference codes in specified locations etc.
MECSPRES outputs text in various formats (HTML, WordPerfect, Folio Flat File, so-called "plain ASCII", and others). The program offers a number of options for the layout and formatting of elements (margins and marginalia, indentation, tables, columns, notes, section headers etc.; features like bold, italics, single and double underline, capitalization, letter-spacing; markers and special characters; links and anchors etc.). MECSPRES may also reformat text to other MECS-encoded formats, and to formats required by the programs ALPHATXT and BETATXT (cf. below). With MECSPRES the user may not only define stylesheets, but also format, layout and style specifications.
MECSLYSE analyzes relationships between the encoded elements of a document and allows the user to define breakpoints at which to display the code stack, list all recursive or overlapping elements, and create a tabulated list displaying the sequence and nesting level of all elements occurring in a document.
MECSGRAB extracts specified elements from a document and prints them and/or their line and column reference numbers in a separate file. This file may, under certain conditions, itself be a MECS-conforming document subject to further processing by MECSGRAB or other MECS programs.
ALPHATXT may be used for interactive spell checking in general, and spell checking of MECS-encoded documents in particular. The program may also perform a number of other tasks, such as the production of word lists sorted according to user-defined character sort criteria, frequency word lists, and simple statistical analyses.
BETATXT computes and displays all possible combinations of single elements of multi-element codes within segments of a document. For example, if sentences are marked and alternative readings are encoded with multi-element codes, then BETATXT may compute and display all the alternative readings of sentences containing substitutions.
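The combinatorial idea can be illustrated with a few lines of Python. This is not how BETATXT is implemented, and no MECS parsing is attempted: the sentence is assumed to be given already as a list of segments, each either a fixed string or a list of alternative readings. That representation and the function name alternative_readings are hypothetical.

```python
from itertools import product

def alternative_readings(segments):
    """Return every reading obtained by picking one alternative per slot."""
    # Treat a fixed string as a slot with exactly one choice.
    choices = [seg if isinstance(seg, list) else [seg] for seg in segments]
    return ["".join(combination) for combination in product(*choices)]

sentence = ["The ", ["cat", "dog"], " sat on the ", ["mat", "rug"], "."]
readings = alternative_readings(sentence)
```

A sentence with two substitutions of two alternatives each thus yields four readings; the number of readings is the product of the numbers of alternatives.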
MECSSGML converts MECS-conforming documents to SGML-conforming documents. The conversion may or may not lead to a certain loss or distortion of information, depending on the degree to which the document in question includes features specific to MECS, whether or not overlapping elements are retained, etc. (It is possible, though, to restrict MECS so as not to allow features which cannot be translated to SGML without loss of information.)
SGMLMECS converts SGML documents to MECS-conforming documents. Although a number of SGML features will be converted to a form in which they are ignored by other MECS software, in a certain sense the conversion does not lead to loss or distortion of information: Documents converted to MECS with SGMLMECS may always be converted back again to their exact original SGML form with MECSSGML.
Except for MECSVAL and ALPHATXT, none of the programs in the package are interactive. However, Peter Cripps has written a menu-driven user interface, MECSPAC, for interactive use of the program package.
The lack of a rigorously defined document structure (a DTD) and the lack of restrictions against overlapping elements have been taken by some to suggest that writing programs for MECS would be more complicated than writing programs for SGML.
One difference is that where SGML programs may keep track of the document structure by means of a "last in first out" stack, MECS programs have to maintain a doubly linked list. Admittedly, this is a bit more complicated. On the other hand, the fact that the basic syntactical role of each and every tag can be inferred directly from its delimiters without look-ahead serves to simplify other matters considerably.
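The difference can be sketched in Python. With strictly nested elements an end tag always closes the most recently opened element, so a stack suffices; with overlap an end tag may close an element that lies deeper in the list of open elements. The sketch below assumes the document has already been turned into a sequence of ("start", gi) and ("end", gi) events for one-element codes. That event representation, the function name find_overlaps, and the use of a plain Python list in place of a doubly linked list are all assumptions made for illustration.

```python
def find_overlaps(events):
    """Return (closed, still_open) pairs for elements that overlap."""
    open_codes = []   # generic identifiers of open elements, oldest first
    overlaps = []
    for kind, gi in events:
        if kind == "start":
            open_codes.append(gi)
            continue
        # Locate the most recently opened element with this identifier.
        # list.index raises ValueError if no such element is open.
        index = len(open_codes) - 1 - open_codes[::-1].index(gi)
        # Anything opened later and not yet closed overlaps with it.
        for later in open_codes[index + 1:]:
            overlaps.append((gi, later))
        # Removal from the middle is what a plain stack cannot express.
        del open_codes[index]
    if open_codes:
        raise ValueError("unclosed elements: " + ", ".join(open_codes))
    return overlaps
```

For a hierarchical document the list behaves exactly like a stack and the result is empty.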
Another difference is that whereas SGML programs may build internal tree representations of documents to facilitate manipulation of them, no such internal representations are built by MECS programs - because of the occurrence of overlapping elements this has so far seemed too complicated. 4 Therefore, all MECS programs read the entire document from its beginning in order to perform operations on it.
The MECS Program Package does not live up to standards of professional software. But the fact that it was possible for a sheer amateur to write the bulk of these programs as a side-activity during a couple of years indicates that programming for MECS is easy. Altogether the package comprises approximately 13,000 lines of Pascal code (excluding the editor). It is assumed that similar programs for SGML would demand code far in excess of this.
I once said 5 that when it comes to document structure, one of the main differences between SGML and MECS is that in SGML everything is forbidden unless it is explicitly permitted or mandatory, while in MECS everything is permitted unless it is explicitly forbidden.
In retrospect I realize that this is grossly unfair: SGML does after all admit quite permissive DTDs, and MECS does not have any means of forbidding or demanding particular document structures. 6 Still, the formulation points to a difference of emphasis: SGML provides strong mechanisms for exerting control over document structure, whereas MECS sacrifices such control in favor of free overlap and simplified or in-line declaration of elements.
Nine years have passed since the development of MECS started, and it has been used in the encoding of several thousand manuscript pages. The TEI guidelines have been available for quite some time, and have been discussed and used extensively by a large number of projects. The amount and range of SGML-based software has increased considerably.
Is there still a need for MECS? Despite the fact that SGML is a far more sophisticated markup language, I believe that the considerations which led me to dismiss SGML nine years ago still apply. 7 MECS is therefore in my eyes still the preferred choice for a project like the Wittgenstein Archives. However, MECS also has obvious shortcomings. If not all, then at least a number of these shortcomings are eliminated in SGML. Unfortunately, conversion from MECS to SGML without loss of information is notoriously difficult, so we cannot have the best of both worlds.
One recent development (1997) within the SGML area is particularly interesting. Extensible Markup Language (XML), which has received much attention lately, shares a couple of features with MECS: In XML, empty elements are visibly different from elements with content, and tag omission is not allowed. Consequently, a DTD is not required in XML, and a distinction is made between well-formedness ("valid" without DTD) and validity (valid according to some specific DTD) of documents. In all these respects, XML is therefore closer to MECS than SGML is. 8 (It is also interesting to note that one of the arguments often made in favor of XML is that it is easier to write programs for than SGML is.) However, in one important respect XML poses even greater difficulties than SGML: XML does not include SGML's CONCUR feature. And without CONCUR the conversion of MECS documents seems even more difficult.
Some work has been done in order to create a bridge from MECS to SGML. Sunniva Solstrand has developed a method (and a program) for automatically "deducing" DTDs from document instances converted from MECS to SGML (Solstrand 1994). Sascha Djuric has proposed a convention for automatically converting elements with overlap to hierarchical structures in a controlled manner (Djuric 199?). What remains in particular is a method for converting MECS documents with overlap to concurrent hierarchies by using the SGML CONCUR feature, and SGML software which implements this feature. Methods for MECS to SGML conversion are one of the concerns of an ongoing cooperation between C. Michael Sperberg-McQueen and myself.
MECS is a syntax for the design of text encoding systems. Documents which conform to this syntax consist of text interspersed with codes, of which there may be seven syntactically distinct types:
No-element codes: <s>
One-element codes: <a/ ... /a>
Poly-element codes: [a/2| ... /a| ... /a]
N-element codes: [s/2\ ... /s| ... /s]
Character representation codes: {a} or {"---"\a}
Character disambiguation codes: {a\a} or {"---"\a}
Comments: <| xxx |>
In these examples '...' indicate coded elements, i.e. character strings which may or may not contain further codes. 's' and 'a' exemplify generic identifiers, i.e. names of individual codes.
The first four types of codes are sometimes referred to jointly as element codes; poly-element and N-element codes are sometimes referred to jointly as multi-element codes; while character representation and disambiguation codes are sometimes referred to jointly as character codes.
The examples above are given in MECS's default character set. However, all character sets in MECS may be redefined, and there are no restrictions on which characters may be used as code delimiters or which as free characters or tag characters.
Strictly speaking, MECS is therefore not in itself a code system, but a general-purpose set of rules for the design of such systems. MECS specifies how to assign specific syntactic roles to characters and character sets, how to declare generic identifiers for codes, how to use these codes in documents, etc., - in short, how to define and use a code system conforming to the basic code syntax specified by MECS.
This definition is given in the form of a Code Declaration Table (CDT). The CDT starts with a MECS header. The header assigns values to the code delimiters, which decide the most basic general features of any MECS code system. The rest of the CDT declares free characters, tag characters and generic identifiers.
Thus, a MECS-conforming document is a document conforming to a MECS CDT. The document itself may also start with a MECS header. If it does, its minimal CDT can be reconstructed on the basis of the encoded document alone.
The default MECS header is:
£ < > < / / > [ / | \ / | / ] { " \ }
Any order and nesting level of codes in documents is allowed. Codes may be contained within each other wholly (hierarchically) or only partly (overlapping each other). However, there is one restriction against overlapping: multi-element codes may nest hierarchically, but they may not overlap other multi-element codes.
Codes belonging to different code types may have identical generic identifiers, with one exception: neither no-element and one-element codes nor poly-element and N-element codes may share the same generic identifier.
Character disambiguation codes may be used in conjunction with character representation codes only. The generic identifier of the associated character representation code may be replaced by a string of free characters enclosed by character quote delimiters.
Comments may occur anywhere in a document, and they may contain any sequence of legal characters. The contents of a comment are not regarded as part of the code structure of a document.
According to the general rules for markup reduction one-element codes, poly-element codes and N-element codes may be reduced:
Full markup                       Reduced markup
<a/ ... /a>                       <a/ ... >
[a/2| ... /a| ... /a]             [a| ... | ... ]
[a/3| ... /a| ... /a| ... /a]     [a/3| ... | ... | ... ]
[s/2\ ... /s| ... /s]             [s\ ... | ... ]
[t/3\ ... /t| ... /t| ... /t]     [t/3\ ... | ... | ... ]
SGML documents are MECS-conforming, provided that they do not make use of tag minimization or end tag omission 9. Some MECS documents will be well-formed SGML documents, others may easily be converted to SGML, yet others may only be converted to SGML with a certain distortion or loss of information.
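The correspondence between full and reduced markup tabulated above can be reproduced by a small Python sketch. It is based solely on that table, not on the detailed reduction rules of 2.6, and rests on several assumptions: default delimiters, generic identifiers made of ASCII letters and digits, and strictly nested codes. In particular it should not be applied to documents with overlapping elements, where dropping the identifier from an end tag may change which element it closes. The function name reduce_markup is hypothetical.

```python
import re

# Assumption: default delimiters; generic identifiers are ASCII letters/digits.
_TAG = re.compile(
    r"\[(?P<gi>[A-Za-z0-9]+)/(?P<number>\d+)(?P<close>[|\\])"  # multi-element start tag
    r"|/[A-Za-z0-9]+(?P<end>[>|\]])"                           # end or separator tag
)

def reduce_markup(text):
    """Rewrite full markup into the reduced forms shown in the table."""
    def replace(match):
        if match.group("gi") is not None:
            # In the table an element number of 2 is left out of the
            # start tag, while other element numbers are kept.
            if match.group("number") == "2":
                return "[" + match.group("gi") + match.group("close")
            return match.group(0)
        # End and separator tags keep only their closing delimiter.
        return match.group("end")
    return _TAG.sub(replace, text)
```

Applied to each entry in the left-hand column of the table, the function yields the corresponding entry in the right-hand column.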
2.2 Basic Code Syntax, Code Systems and Documents
MECS is a basic code syntax for the design and specification of code systems for markup of electronic documents. Strictly speaking, MECS is therefore not in itself a code system, but a general-purpose set of rules for the design of such systems.
MECS specifies how to assign specific syntactic roles to characters and character sets, how to declare generic identifiers for codes, how to use these codes in documents, etc., - in short, how to define and use a code system conforming to the basic code syntax specified by MECS.
These assignments and declarations are listed in a Code Declaration Table - a CDT. Strictly speaking, again, it is only when adding a CDT to the basic code syntax of MECS that we have a code system. Adding a CDT to the basic code syntax of MECS is like adding an alphabet and a vocabulary to a formal grammar.
The values assigned to the code delimiters decide the most general basic features of any MECS code system. These values are declared in the MECS header. The MECS header is the very first part of the CDT and may also be included as the first part of MECS documents. Any MECS document which contains such a header is self-documenting in the sense that a minimal CDT may be reconstructed on the basis of the document alone.
An electronic document adhering to the specifications of a specific MECS code system, e.g. MECS-XXX, may be called a MECS-XXX-conforming or a MECS system-conforming document. A document adhering to the specifications of some MECS code system or other will be called a MECS-conforming document. All MECS system-conforming documents are MECS-conforming documents, but not vice versa.
2.3 Codes, Tags and Elements
In our context, a computerized text is regarded as a stream of characters.
A MECS document is a string of free characters and codes.
A code is an ordered sequence of tags and (optionally) elements. A code may consist of one single tag, or it may consist of several tags and one or more elements included between the tags.
An element is a string of free characters and tags.
An element occurring between the tags of one and the same code is called the code's coded element.
A tag consists of code delimiter(s) and/or tag characters. More specifically, a tag may consist of a tag open delimiter, a string of tag characters constituting a generic identifier, possibly followed by an attribute string, and a tag close delimiter. Or a tag may consist of a tag close delimiter only.
2.4 Code Types
There are seven types of codes. Using the MECS default delimiters (cf. 2.9.5), examples of these code types will appear as follows:
No-element code: <s>
One-element code: <a/ ... /a>
Poly-element code: [a/2| ... /a| ... /a]
N-element code: [s/2\ ... /s| ... /s]
Character representation code: {a} or {"---"\a}
Character disambiguation code: {a\a} or {"---"\a}
Comment: <| xxx |>
In these examples 's' and 'a' are generic identifiers, '...' indicate elements, '---' indicate strings of free characters, and 'xxx' is any sequence of legal characters. The element(s) occurring between the tags of a code is called its coded element(s). Thus, one-element codes have one coded element, poly-element and N-element codes have several coded elements, while the other code types have no coded elements.
The first four types of codes are sometimes referred to jointly as element codes. Poly-element and N-element codes are sometimes referred to jointly as multi-element codes. The last two types of codes are sometimes referred to jointly as character codes.
2.4.1 No-element Codes
A no-element code consists of one single tag, which is called the no-element
tag.
The no-element tag consists of a no-element code open
delimiter (NCO), a generic identifier (optionally followed by an attribute
string) and a no-element code close delimiter (NCC).
2.4.2 One-element Codes
A one-element code consists of a one-element start tag, a coded element
and a one-element end tag.
The one-element start tag consists of a one-element start
tag open delimiter (OSO), a generic identifier (optionally followed by
an attribute string) and a one-element start tag close delimiter (OSC).
The one-element end tag consists of a one-element end
tag open delimiter (OEO), the same generic identifier as the start tag
and a one-element end tag close delimiter (OEC).
2.4.3 Poly-element Codes
A poly-element code consists of a poly-element start tag, one or more
coded elements separated by multi-element separator tags and a multi-element
end tag.
The poly-element start tag consists of a multi-element
start tag open delimiter (MSO), a generic identifier (optionally followed
by an attribute string), a multi-element number delimiter (MNC), an element
number and a poly-element start tag close delimiter (PSC).
The multi-element separator tag consists of a multi-element
separator tag open delimiter (MDO), the same generic identifier as the
poly-element start tag and a multi-element separator tag close delimiter
(MDC).
The multi-element end tag consists of a multi-element
end tag open delimiter (MEO), the same generic identifier as the poly-element
start tag and a multi-element end tag close delimiter (MEC).
The number of coded elements contained by a poly-element
code is indicated by the element number. The number of multi-element separator
tags contained by a particular poly-element code token equals the number
of coded elements minus one.
Poly-element codes may contain two or more elements and
the number of elements contained by different tokens of the same poly-
element code in a document may vary from token to token.
An N-element code consists of an N-element start tag, one or more coded
elements separated by multi-element separator tags and a multi-element
end tag.
N-element codes are syntactically identical to poly-element
codes, except that: (1) the start tag close delimiter is an N-element start
tag close delimiter (NSC); and (2) the number of elements contained by
different tokens of the same N-element code in a document may not vary
from token to token.
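As an illustration of the tag syntax described above, the following Python sketch assembles full (unreduced) element codes from a delimiter table. The delimiter values are those of the default notation used in the examples of this section. The function names and the DEFAULT dictionary are hypothetical and not part of MECS or its program package, and the blanks placed between tags and elements are only there for readability.

```python
# Hypothetical helper names; delimiter values follow the default notation
# used in the examples of this section.
DEFAULT = {
    "NCO": "<", "NCC": ">",     # no-element code open / close
    "OSO": "<", "OSC": "/",     # one-element start tag open / close
    "OEO": "/", "OEC": ">",     # one-element end tag open / close
    "MSO": "[", "MNC": "/",     # multi-element start tag open, number delimiter
    "PSC": "|", "NSC": "\\",    # poly-element / N-element start tag close
    "MDO": "/", "MDC": "|",     # multi-element separator tag open / close
    "MEO": "/", "MEC": "]",     # multi-element end tag open / close
}


def no_element(gi, d=DEFAULT):
    return d["NCO"] + gi + d["NCC"]


def one_element(gi, element, d=DEFAULT):
    start = d["OSO"] + gi + d["OSC"]
    end = d["OEO"] + gi + d["OEC"]
    return f"{start} {element} {end}"


def multi_element(gi, elements, fixed=False, d=DEFAULT):
    """Poly-element code, or N-element code when fixed=True."""
    close = d["NSC"] if fixed else d["PSC"]
    start = d["MSO"] + gi + d["MNC"] + str(len(elements)) + close
    separator = d["MDO"] + gi + d["MDC"]
    end = d["MEO"] + gi + d["MEC"]
    return f"{start} " + f" {separator} ".join(elements) + f" {end}"


print(no_element("s"))
print(one_element("a", "..."))
print(multi_element("a", ["...", "..."]))
print(multi_element("s", ["...", "..."], fixed=True))
```

Passing a different delimiter dictionary as `d` yields the same codes in another notation, which is the sense in which the code syntax is independent of the particular delimiter values.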
2.4.5 Character Representation Codes
A character representation code consists of one single tag, which is
called the character representation tag.
The character representation tag consists of a character
representation code open delimiter (CRO), a generic identifier and either
a character code close delimiter (CCC) or a character disambiguation code
open delimiter (CDO).
If the generic identifier is followed by a character disambiguation code open delimiter (CDO), the character representation code is used in conjunction with a character representation code immediately succeeding it, like this:
{a\a}
where 'a' is the generic identifier of a character representation code and also of a character disambiguation code. If accompanied by a character disambiguation code, the character representation code may, instead of a generic identifier, contain a string of free characters, enclosed by character quote delimiters (CQDs) - cf. 4.6 for further explanation of this feature.
2.4.6 Character Disambiguation Codes
A character disambiguation code consists of one single tag, which is
called the character disambiguation tag.
The character disambiguation tag consists of a character
disambiguation code open delimiter (CDO), a generic identifier and a character
code close delimiter (CCC).
A character disambiguation code can only be used in conjunction with a character representation code immediately preceding it. The close delimiter of the character representation code is then replaced by the open delimiter of the character disambiguation code.
The preceding character representation code may, instead of a generic identifier, contain a string of free characters, enclosed by character quote delimiters (CQD), like this:
{"---"\a}
where 'a' is the generic identifier of a character disambiguation code and '---' is a string of free characters.
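The three forms of character code described above can be sketched in Python as follows. This is only an illustration under the default delimiters ({ " \ }); the function name and its parameters are hypothetical.

```python
def char_code(rep, dis=None, quoted=False,
              cro="{", cqd='"', cdo="\\", ccc="}"):
    """Build a character representation code, optionally followed by a
    character disambiguation code.

    rep    -- generic identifier, or a free-character string if quoted=True
    dis    -- generic identifier of the disambiguation code, or None
    quoted -- enclose rep in character quote delimiters (CQD)
    """
    if quoted and dis is None:
        # A quoted string is only allowed together with a disambiguation code.
        raise ValueError("a quoted string requires a disambiguation code")
    body = cqd + rep + cqd if quoted else rep
    if dis is None:
        return cro + body + ccc
    # The close delimiter of the representation code is replaced by the
    # open delimiter (CDO) of the disambiguation code.
    return cro + body + cdo + dis + ccc


print(char_code("a"))
print(char_code("a", "a"))
print(char_code("---", "a", quoted=True))
```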
A MECS comment may contain free characters, tag characters and code delimiters, i.e. any legal characters, in any order. The contents of a comment is not regarded as part of the code structure of a document.
A comment starts with a one-element start tag open delimiter (OSO) immediately followed by a poly-element start tag close delimiter (PSC), and ends with a poly-element start tag close delimiter (PSC) immediately followed by a one-element end tag close delimiter (OEC). Thus, in MECS' default character set a comment looks like this:
<| xxx |>
where 'xxx' stands for any sequence of legal characters.
<!-- xxxxxxxx -->
or other SGML declarations like e.g.
<![PCDATA[ >> < < </xxx &]]>
to be valid in SGML-like MECS documents.
2.5 Generic identifiers and attribute strings
A MECS code system may include any number of generic identifiers 10 for each of the different code types except for comments, which do not have generic identifiers.
Neither no-element and one-element codes nor poly-element and N-element codes may share the same generic identifier. Apart from this, codes belonging to different code types may have identical generic identifiers.
Thus, the following examples would be legal and might all be included in one and the same document conforming to a MECS code system:
(1) <s>
(2) <a/ ... /a>
(3) [a/2| ... /a| ... /a]
(4) [s/2\ ... /s| ... /s]
(5) {a}
(6) {a\a}
MECS also allows for the use of numerals as identifiers of one-element codes, so that codes may be used with natural numbers in the place of generic identifiers, e.g.
(7) <1/ ... /1>
(8) <2/ ... /2>
etc.
Start tags of element codes may contain attribute strings: if the tag's
generic identifier is followed by a string delimiter or a nil character
(i.e. in the normal position of the tag close delimiter), then the rest
of the tag may contain any sequence of free characters (except for any
that might be identical to code delimiters), ending with the tag close
delimiter.
I.e., given the examples (1) and (2) above, the following
examples would also be legal:
(9) <s This is an attribute string>
(10) <a attribute=value n=1/ ... /a>
2.6 Markup Reduction
The implications of these rules for each specific code type are as follows:
No-element codes cannot be reduced.
A one-element end tag with a generic identifier identical to the generic identifier of the last preceding unterminated one-element start tag may be reduced to the one-element end tag close delimiter.
A poly-element start tag may, if the code has 2 elements, be reduced to multi-element start tag open, generic identifier and poly-element start tag close.
An N-element start tag may, if the code has 2 elements, be reduced to multi-element start tag open, generic identifier and N-element start tag close.
A multi-element separator tag with a generic identifier
identical to the last preceding unterminated multi-element start tag or
separator tag may be reduced to the multi-element separator tag close.
A multi-element end tag with a generic identifier identical
to the last preceding unterminated multi-element separator tag may be reduced
to the multi-element end tag close delimiter.
Character representation codes, character disambiguation codes and comments cannot be reduced.
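The reduced forms permitted by these rules can be sketched in Python as follows. The sketch assumes the default delimiters and a code that stands alone, i.e. one that is not interleaved with other codes, so that the conditions on the "last preceding unterminated" tag are trivially met. The function names are hypothetical.

```python
def reduced_one_element(gi, element):
    # The start tag stays in full; the end tag shrinks to its close delimiter.
    return f"<{gi}/ {element} >"


def reduced_multi_element(gi, elements, fixed=False):
    # fixed=False: poly-element code; fixed=True: N-element code.
    close = "\\" if fixed else "|"
    # Only two-element codes may drop the element number from the start tag.
    number = "" if len(elements) == 2 else f"/{len(elements)}"
    # Separator tags and the end tag shrink to their close delimiters.
    return f"[{gi}{number}{close} " + " | ".join(elements) + " ]"


print(reduced_one_element("a", "..."))
print(reduced_multi_element("a", ["...", "..."]))
print(reduced_multi_element("s", ["...", "..."], fixed=True))
```

A code with three or more elements keeps its element number in the start tag, since the rules only allow dropping it for two-element codes.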
Thus, the examples (2)-(4) above may be reduced to
(11) <a/ ... >
(12) [a| ... | ... ]
(13) [s\ ... | ... ]
2.7 Document Structure
A MECS document may include the header of a MECS system to which it conforms (cf. 10). If so, the first character of the header must also be the very first character of the document.
Apart from this optional header, a MECS document consists of codes and elements appearing in any order.
That a code A contains a code B means that one or more of B's tags are
contained in A's coded element(s).
B is hierarchically nested within A if A contains B, and
B does not contain A.
A and B overlap if A contains B, and B contains A.
Any order and nesting level of codes is allowed, with one exception: no multi-element code may overlap any other multi-element code 11.
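The definitions of containment, nesting and overlap can be made concrete with a small Python sketch. It assumes that each code token is represented by the positions of its tags in document order (a hypothetical representation chosen for illustration, not a MECS data structure), and that no two tags share a position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Code:
    gi: str
    tags: tuple          # positions of this code's tags, in document order
    multi: bool = False  # True for poly-element and N-element codes


def contains(a, b):
    """True if at least one tag of b lies within a coded element of a."""
    first, last = a.tags[0], a.tags[-1]
    return any(first < pos < last for pos in b.tags)


def relation(a, b):
    a_has_b, b_has_a = contains(a, b), contains(b, a)
    if a_has_b and b_has_a:
        return "overlap"
    if a_has_b:
        return "b nested in a"
    if b_has_a:
        return "a nested in b"
    return "separate"


def legal_pair(a, b):
    # The single exception: multi-element codes may not overlap each other.
    return not (a.multi and b.multi and relation(a, b) == "overlap")


# <a/ <b/ /b> /a>  : tags of a at positions 0 and 3, tags of b at 1 and 2
print(relation(Code("a", (0, 3)), Code("b", (1, 2))))
# <a/ <b/ /a> /b>  : tags of a at positions 0 and 2, tags of b at 1 and 3
print(relation(Code("a", (0, 2)), Code("b", (1, 3))))
```

A no-element code has a single tag and therefore no coded element, so under this representation it never contains another code.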
This means that the following examples are all legal:
(14) <a/ /a> <b/ /b>
(15) <a/ <a/ /a> /a>
(16) <a/ <b/ /b> /a>
(17) <a/ <b/ /a> /b>
(18) <a/ [a/2| /a| /a> /a]
(19) [a/2| [s/2\ /s| /s] /a| /a]
(20) [a/2| [a/3| /a| /a| /a] /a| [t/3\ /t| /t| /t] /a]
(21) [a/2| <a/ <s> {a\a} [b/2| [s/2\ {a\a} /s| {b} /s] <b/ {b\a} /b| /a> /b] <s> /a| /b> /a]
However, the following example is illegal:
[s/2\ [b/2| /s| /b| /s] /b]
It should be noted that overlapping reduces the possibilities for markup reduction. For example, (21) above reduces to:
(22) [a| <a/ <s> {a\a} [b| [s\ {a\a} | {b} ] <b/ {b\a} /b| /a> ] <s> /a| /b> ]
2.8 Classification of MECS systems
Any MECS system is either complete or partial.
A complete MECS system contains all the seven code types
described above (cf. 4).
A partial MECS system lacks one or more of the code types
of a complete system. Partial systems are called N-type systems, where
N is the number of code types contained by the system.
Any MECS system is either reduced, reducible, or irreducible.
A reduced MECS system demands full reduction of all start
tags and separator tags.
A reducible MECS system permits but does not require reduction
of start and separator tags.
An irreducible MECS system requires that no tags are reduced.
Any MECS system is either restricted or unrestricted.
In a restricted MECS system, no codes may overlap.
A system which is not restricted, is unrestricted.
A reduced system is necessarily a restricted system, but not vice versa.
A MECS document consists of legal characters only. The legal characters are the delimiters, free characters and tag characters.
There are several subsets of legal characters, whereof some sets may overlap and others not. MECS assigns a number of different roles to these character sets and to individual characters, and includes rules concerning the relationships between these sets and between particular members of particular sets.
There are 18 code delimiters. They correspond to the first six types of codes (cf. 4) as indicated below.
Assigning values to the code delimiters is one of the most basic operations in the definition of a MECS code system. The values assigned to the code delimiters decide many of the basic syntactical features of the code system (cf. 8, 9.5 and 9.6).
If a code delimiter is assigned the value nil, the delimiter itself
is said to be nil, or undeclared.
That a character which is the value of a code delimiter
is a reserved delimiter value means that it cannot belong to the free
characters of the code system defined. A delimiter is said to be reserved
if its value is a reserved delimiter value.
Values are assigned to code delimiters according to the following rules:
The string delimiter (SD) may occur anywhere in elements. In tags, SD separates generic identifiers from attribute strings.
SD may not be nil. Its value may not be identical to the value of any code delimiter. SD is always a free character.
If SD is assigned a blank character, line endings and start and end of file will be regarded as equivalents to SD.
The nil character may occur anywhere in a document. In tags, the nil character separates generic identifiers from attribute strings.
Its value may not be identical to any of the code delimiters. The nil character is always a free character.
If the nil character is identical to SD, the character value in question will be interpreted as SD, and the system defined is said to contain no nil character.
Free characters may occur anywhere in elements.
All legal characters except those which are reserved delimiter values may be included in the set of free characters.
The string delimiter and the nil character always belong to the free characters.
Generic identifiers consist of tag characters.
All legal characters except those which are values of code delimiters may be included in the set of tag characters.
Default code delimiters
The default MECS code delimiters define a complete, unrestricted, reducible system, with 10 different, whereof 8 reserved, delimiter values.
Header: £ < > < / / > [ / | \ / | / ] { " \ }
Code delimiters: < / > [ | ] { \ } "
Reserved code delimiters: < / > [ | ] { }
Default string delimiter: (blank)
Default nil character: (blank)
Default free characters:
a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 1 2 3 4 5 6 7 8 9 0 , ; . : - ( ) ! ? " ' * % & = + (blank)
Default tag characters:
a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 1 2 3 4 5 6 7 8 9 0 . _ -
2.9.6 Examples of alternative character sets
Example 1
A complete, unrestricted, irreducible system is defined by the following header:
£ * * * | * @ | | * \ | @ @ * @ " / @
The system has 6 code delimiters, whereof 3 are reserved.
Code delimiters: * | @ \ " /
Reserved code delimiters: * | @
The code syntax of the system is:
No-element code: *s*
One-element code: *a| ... *a@
Poly-element code: |a|2* ... |a@ ... @a*
N-element code: |s|2\ ... |s@ ... @s*
Character representation code: @a@ or @"---"/a@
Character disambiguation code: @a/a@ or @"---"/a@
Comment: ** xxx *@
Example 2
A complete, unrestricted, reducible system is defined by the following header:
£ * * * < * > * % [ \ * | * ] * " _ /
The system has 11 code delimiters, whereof 4 are reserved.
Code delimiters: * < > % [ \ | ] " _ /
Reserved code delimiters: * > | ]
The code syntax of the system is:
No-element code: *s*
One-element code: *a< ... *a>
Poly-element code: *a%2[ ... *a| ... *a]
N-element code: *s%2\ ... *s| ... *s]
Character representation code: *a/ or *"---"_a/
Character disambiguation code: *a_a/ or *"---"_a/
Comment: *[ xxx [>
Example 3
A partial (4-type), restricted, reduced system is defined by the following header:
£ < > < / £ > £ £ / £ £ £ £ £ / £ £ /
The system has 3 code delimiters, whereof all are reserved.
Code delimiters: < > /
Reserved code delimiters: < > /
The code syntax of the system is:
No-element code: <s>
One-element code: <a/ ... >
Character representation code: /a/
Comment: </ --- />
Example 4
A partial (4-type), unrestricted, reducible system is defined by the following header:
£ < > < > </ > [ £ ! £ £ £ £ ] & £ £ ;
The system has 8 code delimiters, whereof 6 are reserved.
Code delimiters: < > </ [ ! ] & ;
Reserved code delimiters: < > [ ! ] &
The code syntax of the system is:
No-element code: <s>
One-element code: <a> ... </a>
Character representation code: &a;
Comment: <! xxx [ xxx ] xxx >
2.10 Code Declaration Table (CDT)
It is only when adding a Code Declaration Table (CDT) to MECS' basic code syntax that we have a MECS code system. The CDT assigns values to the delimiters and other character sets, and declares the actual codes of the system. The CDT itself is a file of characters.
The very first character of the CDT declares the system's string delimiter.
The second character of the CDT declares the system's nil indicator, which in the rest of the CDT indicates an assignment of the value nil.
The third character of the CDT declares the system's nil character.
In the rest of the CDT, all character strings delimited by string delimiters are strings.
The first 18 strings of the table declare the system's code delimiters, in the following order:
NCO NCC OSO OSC OEO OEC MSO MNC PSC NSC MDO MDC MEO MEC CRO CQD CDO CCC
Together, the first three characters and the first 18 strings of the CDT form a string defining the system's header. The default MECS header is:
£ < > < / / > [ / | \ / | / ] { " \ }
(Note that the very first character of the header is a blank.)
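The header layout described above (three single characters followed by 18 delimiter strings) can be read with a few lines of Python. This is a sketch, not the parser of the MECS Program Package: the function names are hypothetical, the leading blank of each header is written out explicitly as the note above requires, and a blank string delimiter is handled by splitting on any run of whitespace, which is a simplification.

```python
DELIMITER_NAMES = (
    "NCO NCC OSO OSC OEO OEC MSO MNC PSC NSC MDO MDC MEO MEC CRO CQD CDO CCC"
).split()


def parse_header(header):
    """Return (string delimiter, nil indicator, nil character, delimiters)."""
    sd, nil_indicator, nil_char = header[0], header[1], header[2]
    rest = header[3:]
    strings = rest.split() if sd == " " else [s for s in rest.split(sd) if s]
    if len(strings) < len(DELIMITER_NAMES):
        raise ValueError("a header must declare 18 code delimiters")
    # A delimiter whose string is the nil indicator is nil (undeclared).
    values = [None if s == nil_indicator else s for s in strings[:18]]
    return sd, nil_indicator, nil_char, dict(zip(DELIMITER_NAMES, values))


def distinct_values(delimiters):
    """The set of different non-nil delimiter values of a system."""
    return {value for value in delimiters.values() if value is not None}


DEFAULT_HEADER = r' £ < > < / / > [ / | \ / | / ] { " \ }'
EXAMPLE_4 = " £ < > < > </ > [ £ ! £ £ £ £ ] & £ £ ;"

_, _, _, default = parse_header(DEFAULT_HEADER)
_, _, _, example4 = parse_header(EXAMPLE_4)
print(len(distinct_values(default)), len(distinct_values(example4)))
```

The two counts printed correspond to the "10 different delimiter values" of the default system and the "8 code delimiters" of Example 4 in 2.9.6.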
The next five strings (i.e. strings nos 19, 20, 21, 22 and 23) declare the system's free characters.
Strings number 24 and 25 declare the system's tag characters.
The next six strings (i.e. strings nos 26, 27, 28, 29, 30 and 31) declare the CDT's code type indicators, which in the rest of the CDT indicate which code type a generic identifier belongs to.
String no 26 declares the no-element code indicator.
String no 27 declares the one-element code indicator.
String no 28 declares the numeric indicator.
String no 29 declares the poly-element code indicator.
String no 30 declares the character representation code indicator.
String no 31 declares the character disambiguation code indicator.
The first part of the CDT, i.e. the part beginning with SD and including the first 31 strings, is the code syntax part of the CDT.
The rest of the CDT is the code inventory part. This part consists of pairs of strings, the first of which is a code type indicator and the second a generic identifier. Each pair declares a code of the indicated type and assigns a generic identifier to it.
If the numeric indicator is not nil, numbers indicate N-element codes in the rest of the table, and one-element codes with numeric identifiers (cf. 5) are declared by replacing the generic identifier by a numeric indicator.
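The CDT layout described above (31 code syntax strings followed by indicator/identifier pairs) can be sketched in Python as follows. The function names and the returned dictionary are hypothetical, whitespace splitting is assumed for a blank string delimiter, and the sample text is the minimal CDT given as example (25) in 2.11 with its leading blank restored.

```python
def parse_cdt(text):
    sd, nil = text[0], text[1]          # string delimiter, nil indicator
    rest = text[3:]                     # text[2] is the nil character
    strings = rest.split() if sd.isspace() else [s for s in rest.split(sd) if s]
    if len(strings) < 31 or (len(strings) - 31) % 2:
        raise ValueError("expected 31 syntax strings followed by pairs")

    def value(s):
        return None if s == nil else s

    names = ("no", "one", "num", "poly", "rep", "dis")
    return {
        "delimiters": [value(s) for s in strings[:18]],
        # SD and the nil character are free characters too, but they cannot
        # appear inside the strings themselves.
        "free": "".join(s for s in strings[18:23] if s != nil),
        "tag": "".join(s for s in strings[23:25] if s != nil),
        "indicators": dict(zip(names, map(value, strings[25:31]))),
        "inventory": list(zip(strings[31::2], strings[32::2])),
    }


def declared_codes(cdt):
    ind = cdt["indicators"]
    kinds = {v: k for k, v in ind.items() if v is not None and k != "num"}
    codes = []
    for indicator, gi in cdt["inventory"]:
        if ind["num"] is not None and indicator.isdigit():
            codes.append((f"{indicator}-element", gi))   # N-element code
        elif indicator in kinds:
            # The numeric indicator in GI position declares numeric identifiers.
            codes.append((kinds[indicator], "<numeric>" if gi == ind["num"] else gi))
        else:
            raise ValueError(f"unknown code type indicator: {indicator!r}")
    return codes


CDT_25 = (r' £ < > < / / > [ / | \ / | / ] { " \ } '
          "T abeghilnrstuv 0123456789 . ()= abst 12 n o # p r m "
          "n s o # o a o b p a p b 2 s 3 t r a r b m a")
for kind, gi in declared_codes(parse_cdt(CDT_25)):
    print(kind, gi)
```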
The following is an example of a CDT defining a complete, unrestricted, reducible system with the MECS default character sets defined above (cf. 9.5).
(23)
+---------------------------------------+-----------+---------+
| £ < > < / / > [ / | \ / | / ] { " \ } | Header    |         |
+---------------------------------------+-----------+         |
|                                       |           |         |
|abcdefghijklmnopqrstuvwxyz             | Free      |         |
|ABCDEFGHIJKLMNOPQRSTUVWXYZ             | char-     |         |
|1234567890                             | acters    |         |
|,;.:-()!?"'                            |           |Code     |
|*%&=+                                  |           |Syntax   |
+---------------------------------------+-----------+Part     |
|abcdefghijklmnopqrstuvwxyz             | Tag char- |         |
|ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890._-| acters    |         |
+---------------------------------------+-----------+         |
|no one num poly rep dis                | Code type +---------+
|                                       | indicators|         |
+---------------------------------------+-----------+         |
|no s                                   |           |         |
|one a                                  |           |         |
|one b                                  |           |         |
|one num                                |           |         |
|poly a                                 |           |         |
|poly b                                 |           |Code     |
|2 s                                    |           |Inventory|
|3 t                                    |           |Part     |
|rep a                                  |           |         |
|rep b                                  |           |         |
|dis a                                  |           |         |
+---------------------------------------+-----------+---------+
The following document, which contains the examples (1)..(22) above, conforms to the above CDT.
(24)
(1) <s>
(2) <a/ ... /a>
(3) [a/2| ... /a| ... /a]
(4) [s/2\ ... /s| ... /s]
(5) {a}
(6) {a\a}
(7) <1/ ... /1>
(8) <2/ ... /2>
(9) <s This is an attribute string>
(10) <a attribute=value n=1/ ... /a>
(11) <a/ ... >
(12) [a| ... | ... ]
(13) [s\ ... | ... ]
(14) <a/ /a> <b/ /b>
(15) <a/ <a/ /a> /a>
(16) <a/ <b/ /b> /a>
(17) <a/ <b/ /a> /b>
(18) <a/ [a/2| /a| /a> /a]
(19) [a/2| [s/2\ /s| /s] /a| /a]
(20) [a/2| [a/3| /a| /a| /a] /a| [t/3\ /t| /t| /t] /a]
(21) [a/2| <a/ <s> {a\a} [b/2| [s/2\ {a\a} /s| {b} /s] <b/ {b\a} /b| /a> /b] <s> /a| /b> /a]
(22) [a| <a/ <s> {a\a} [b| [s\ {a\a} | {b} ] <b/ {b\a} /b| /a> ] <s> /a| /b> ]
2.11 Deducing a Minimal CDT from an Encoded Document
Although it is unequivocally decidable whether any particular document conforms to the MECS code system defined by any particular CDT, an indefinite number of documents conform to any particular CDT, and any particular document conforms to an indefinite number of CDTs.
However, if a document contains the header of any CDT to which it conforms (cf. 9.1.1), one particular CDT to which the document conforms may be deduced directly from the document alone.
In virtue of the rules for assigning values to code delimiters (cf. 9.1.1), the basic syntactic features of all tags occurring in a MECS-conforming document are directly deducible from their code delimiters. The deduction can be done without look-ahead, unless the delimiter pairs NCO - NCC and OSO - OSC are identical (cf. 9.1.1, exception to rule no 4).
This holds true for all MECS-conforming documents, whether the system to which they conform is partial or complete, whether it is restricted or unrestricted, and whether it is reduced, reducible, or irreducible.
The CDT thus deducible from the document is called the document's minimal CDT. Any particular MECS-conforming document has one and only one minimal CDT 12.
It is therefore recommended that all MECS documents contain a header. Some examples follow below.
Document (24) has the following minimal CDT:
(25)
£ < > < / / > [ / | \ / | / ] { " \ } T abeghilnrstuv 0123456789 . ()= abst 12 n o # p r m n s o # o a o b p a p b 2 s 3 t r a r b m a
The following document
_ [ ] [ | _ ] _ _ | _ _ _ _ _ _ _ _ _
This document contains no-element [NO] and [ONE| one-element ] codes, and [|comments|]. No more. Its minimal CDT will define a system which is partial, restricted (since there is no one-element end tag open delimiter), and reduced (for the same reason). This document contains no-element [NO] and [ONE| one-element ] codes, and [|comments|]. No fun.
has the following minimal CDT
_ [ ] [ | _ ] _ _ | _ _ _ _ _ _ _ _ _ CDINT acdefghilmnoprstuwyz _ ,. ()- _ ENO n o _ _ _ _ n NO o ONE
2.12 SGML Compatibility
2.12.1 Some general observations
As a first and very rough approximation, it may be said that:
1. SGML documents are MECS-conforming, provided they do not make use of tag minimization or end tag omission 13.
2. Some MECS documents are well-formed SGML documents, others may easily be converted to SGML, yet others may only be converted to SGML with a certain distortion or loss of information.
MECS no-element codes correspond to SGML empty elements (so-called milestones). MECS one-element codes correspond to SGML elements. MECS character representation codes correspond roughly to SGML internal entities. There is nothing in SGML which corresponds directly to the MECS poly-element codes, N-element codes and character disambiguation codes. MECS-aware software will accept, but ignore, SGML attributes and declarations, and interpret SGML entity references differently from SGML applications.
MECS markup reduction rules differ from SGML markup reduction or minimization rules.
Functionally, a MECS document, if stripped of its optional MECS header, corresponds to the SGML document instance. But while the MECS CDT corresponds roughly to the SGML Document Type Definition (DTD), there are fundamental differences between MECS CDTs and SGML DTDs.
The following features of MECS are exceptions and deviations from the main outline of the basic code syntax which have been made in order to enhance SGML compatibility.
An SGML document is a MECS-conforming document if no tag minimization or end tag omission has been used 14.
However, MECS software will interpret attributes, declarations and entity references differently from the way they are interpreted by an SGML application. In MECS, SGML attributes will be regarded as attribute strings and ignored. All SGML declarations, including the DTD as well as marked sections and comments, will be regarded simply as MECS comments and thus also ignored. SGML entities will be interpreted as character representation codes.
It is possible to define MECS code delimiters so that they agree closely with the corresponding parts of the SGML concrete reference syntax (cf. 9.6, example 4.)
For example, the following SGML document 15
<!DOCTYPE TEI.1 SYSTEM "c:\tei\public\tei1.dtd" [
<!ENTITY tla "Three Letter ACROnym">
<!ELEMENT my.tag - - (#PCDATA)>
<!-- following line added by C.H. -->
<!ELEMENT my.stone - o EMPTY>
<!-- any other special-purpose declarations or re-definitions go in here -->
]>
<tei.1>
This is an instance of a modified TEI.1 type document, which may contain <my.tag>my special tags</my.tag>,
<!-- following line added by C.H. -->
including milestones such as <my.stone>, and references to my usual entities such as &tla;.
</tei.1>
is a MECS-conforming document, with the following minimal CDT
£ < > < > </ > [ £ ! £ £ £ £ ] & £ £ ; EIT acdefghilmnoprstuwy 1 ,. £ aegilmnosty .1 n o £ £ r £ n my.stone o my.tag o tei.1 r tla
2.12.3 From MECS to SGML
A MECS document is a well-formed SGML document provided that:
The conversion of MECS documents not satisfying conditions 1 to 7 to SGML-conforming document instances is a straightforward process, provided they satisfy condition 8.
The conversion of MECS documents not satisfying condition 8 to SGML-conforming documents is likely to be a rather complicated process, and may lead to distortion or loss of information. There are two ways in which such documents can be converted: 1) either all occurrences of overlap can be eliminated (cf. Part II, ##); 2) or one has to identify sets of codes in the document which do not overlap, and define concurrent DTDs for each of these sets.
SGML applications are likely to interpret attribute strings and character disambiguation codes differently from the way they are interpreted by MECS software.
A preliminary version of MECS was drafted in February 1990 16. Version 1.00 was finished in February 1991 17. Version 1.01, of June 1992 18, consisted in a slight revision of the CDT format. The revision did not necessitate changes to version 1.00 documents.
Version 2.00, which was finished in August 1993 19, includes minor changes in the CDT format, the MECS header, and the basic syntax of one of the code types, i.e. the N-element codes. Therefore, transition from earlier versions to version 2.00 necessitates changes to CDTs and may also require changes to MECS documents encoded according to these versions.
MECS version 3 will represent a simplification of the structures already present in earlier versions. At the same time, version 3 will offer new capabilities and new and more powerful mechanisms.
MECS version 2 no-element codes, one-element codes and N-element codes have one thing in common: their number of elements is fixed. In version 3, therefore, they will all be subsumed under one category, which will be called N-element codes. Poly-element codes will be retained, with the modification that they may contain any number of elements including 0 or 1 (i.e., the number of elements does not any more have to be higher than 1).
Character representation codes and character disambiguation codes will be retained.
The four remaining code types of version 3 may be exemplified as follows, in default notation:
                     Full markup                    Reduced markup
N-element codes:
                     <tag>
                     <tag/ ... /tag>                <tag/ ... >
                     <tag/ ... /tag| ... /tag>      <tag/ ... | ... >
Poly-element codes:
                     <tag_0>
                     <tag_1/ ... /tag>              <tag< ... >
                     <tag_2/ ... /tag| ... /tag>    <tag| ... | ... >
Character codes:
                     {tag}
                     {tag_tag}
The character code close delimiter may be left out if immediately followed by a string delimiter or a reserved code delimiter, as follows:
{tag}     {tag
{tag}<    {tag<
{tag}/    {tag/
{tag}|    {tag|
{tag}>    {tag>
{tag}{    {tag{
Inclusion of mechanisms similar to those of SGML external entities will be considered.
Comments and marked sections
Comments will be similar to version 2 comments, but the syntax will be changed so as to facilitate processing of SGML documents:
<|-- ... --|>
Inclusion of mechanisms similar to the SGML marked sections (with keywords IGNORE, CDATA, RCDATA and INCLUDE) will be considered:
<|[IGNORE[ ... ]]|>
<|[CDATA[[ ... ]]|>
<|[RCDATA[ ... ]]|>
<|[INCLUDE[ ... ]]|>
Attributes
In earlier versions, a tag consists of a generic identifier which may be followed by an attribute string. The attribute strings play no role in the earlier versions, except to increase compatibility with SGML by leaving a space open for SGML attributes. Version 3 will follow up this strategy by incorporating most or all the syntactical features of SGML attributes.
In addition, a syntax for structured attributes proposed by Peter Cripps will be considered for inclusion in MECS (cf. Cripps 1996).
Discontinuation
An element opened by a start tag or delimiter tag may be discontinued by
_tag|
and then resumed again by
|tag_
e.g. like this:
<tag/ ... _tag| --- |tag_ ... /tag>
(In the example, the element indicated by '---' does not belong to the code's coded elements.)
Overlap
In version 3, codes of all types may overlap codes of any type (whereas in version 2 multi-element codes cannot overlap each other).
One problem with earlier versions is that tokens of the same code type cannot overlap:
<s/ <s/ /s> /s>
will necessarily be interpreted as two hierarchically nested codes. In MECS version 3, a tag may include a special code token identifier which serves to overcome this limitation, e.g. as follows:
<s #1/ <s #2/ /s #1> /s #2>
Document Structure
In version 3, there will be no restrictions on combinations of codes whatsoever: all codes may nest arbitrarily deep and codes of all types may overlap with each other.
Master Documents
As a result of the changes described above the MECS header format will be simplified.
As in earlier versions, the syntactical role of every tag can be deduced directly from its delimiters. If a document includes a MECS header it will therefore still be possible to deduce a document's entire code syntax, including its code inventory, from the encoded document itself.
Unlike earlier versions, however, the formal specification of the code system will not be contained in a Code Declaration Table (CDT), but in a Master Document, which is itself a well-formed MECS document. Correspondingly, the master document deducible from any well-formed MECS document is called its Minimal Master Document. It also follows that any Minimal Master Document is its own Minimal Master Document.
In addition, Format Master Documents will allow for the inclusion of element format declarations. A Format Master Document specifies for each of the codes in a code system whether its coded element should correspond to some specific format such as free text, numeric characters only, a date in some standard format, a closed list of string values, and so on.
As with version 2, all SGML documents will be formally MECS-conforming documents. However, the functional compatibility of version 3 with SGML will be improved.
This User Guide is meant to help users get started quickly with the program package. It does not cover all aspects or details of the programs. For more detailed technical information, cf. 3 below. Some knowledge of the basic MECS syntax is presupposed - cf. Part I section 1 for a brief introduction.
Peter Cripps of the Wittgenstein Archives has written a menu-driven user interface integrating all aspects of the MECS Program Package. This user interface will be documented separately and made available later.
3.1 Installation and System Requirements
All programs in the package run on IBM PCs with DOS version 3.x or later, and compatibles.
Users will normally receive a copy of the package on a floppy disk or
as a zip archive containing a directory called MECS. To install the package,
copy all files to a separate directory on your hard disk called e.g. 'C:\MECS',
and add the full path name to your path string, e.g. by adding 'C:\MECS'
to the path command in your AUTOEXEC.BAT file as follows:
PATH=C:\;C:\DOS;...;C:\MECS
In all examples given below it will be assumed that this installation
procedure has been followed.
The package occupies less than 700 Kb of disk space. A hard disk is recommended, although the package will also run on a floppy disk system. Memory requirements depend on the size of your documents. It is possible to run the programs with less than 200 Kb available memory, but in most cases you will need more. The programs do not make use of extended or expanded memory.
Users with no intention to use the MECS Program Package for processing of SGML documents or conversion of MECS documents to SGML may skip this section.
Users who do have such intentions will find the rest of this User Guide to be of help as an introduction to the MECS Program Package, even though the examples discussed here are not SGML examples.
Roughly, all SGML documents are MECS-conforming, and all documents created or modified in MECS can be converted to SGML.
However, this requires some qualification: SGML documents are MECS-conforming only provided that they do not make use of tag minimization or end tag omission 20. The MECS Program Package provides tools which ensure that any document you create in MECS either is or can be converted to an SGML-conforming document instance. The Program Package also allows you to take steps to ensure that such conversion leads to no loss or distortion of information.
You can test your SGML documents for MECS conformance with the program MECSVAL. SGML documents are MECS-conforming in virtue of certain exceptions and deviations from the main outline of the basic syntax of MECS which have been made precisely in order to enhance SGML compatibility (cf. Part I, ##) 21.
MECSVAL is the only program in the MECS Program Package which takes all of these exceptions and deviations fully into consideration 22.
Therefore, if you intend to do any serious work at all with SGML documents by means of the MECS Program Package, it is highly recommended that you first use the program SGMLMECS to convert them to a format accepted also by the other MECS programs. You may convert your documents back to SGML again with the program MECSSGML.
SGML has features and capabilities which MECS does not have, and vice versa. But while MECS 'knows' at least something about SGML, SGML does not 'know about' MECS at all. Features and capabilities of MECS which are not shared by SGML may create SGML syntax errors. MECS, on the other hand, is designed simply to accept and ignore those features of SGML which it does not share with MECS.
If you want to use MECS primarily as a tool to process SGML documents, you should be aware that there are certain features of SGML which, though accepted, are not supported by MECS.
If you use MECS to create SGML documents, or want to be able to convert your MECS documents to SGML, you should avoid using MECS features which have no counterpart in SGML, or be aware of the consequences of doing so.
3.3 Creating and Validating Documents and CDTs
To create your first MECS document, type the command
MECSVAL
at the DOS prompt and press return. The following menu will be displayed
in the upper part of the screen:
+------------------------------------------------------------+
|C:\MYDIR          Mem: 433018           MECSVAL version 2.01|
+------------------------------------------------------------+
|L LOG:              I Info                1 List directory  |
|C CDT:              S Switches            2 Change directory|
|T TXT:              M Create Minimal CDT  3 Copy file       |
|E EDT:              D Check CDT           4 Print file      |
|Q Quit              V Check CDT and TXT   5 Delete file     |
+------------------------------------------------------------+
Press 'E', and MECSVAL will prompt you for a file name. Type a file name, e.g. 'DOC1', and press Enter to activate the MECSVAL editor.
The first thing you need to do is to include a MECS header at the very beginning of your document. We will assume that you intend to use the MECS default delimiters. To save yourself some typing, you may include the default MECS header by pressing Ctrl+K, then R. When prompted for a file name, type 'C:\MECS\HEADMECS' (assuming that you installed the MECS Program Package in a directory called 'C:\MECS'), and press Enter. The top of your screen will now look like this:
+------------------------------------------------------------+
|+----------------------------------------------------------+|
||DOC1.     Line 1   Col 1  Byte 1     Insert Indent    Save||
|+----------------------------------------------------------+|
| £ < > < / / > [ / | \ / | / ] { " \ }                      |
|                                                            |
|                                                            |
Go to the line below the header and include the file C:\MECS\EX1 (or, alternatively, type the text below in). Your document should now look like this:
£ < > < / / > [ / | \ / | / ] { " \ }
<|From EX1: |>
[dmi\0|6]
<paragraph/<title/Sample MECS Document>>>
<intro/<paragraph/<indent/3>This is a sample
<b/MECS> document which is intended to demonstrate
the use of currently available <b/MECS>
software./paragraph>/intro>
Press F2 to store the text and exit the editor. Note that on the main menu DOC1 is now indicated as the current editor (EDT) file. If you need to review DOC1 again before proceeding, press 'E' and then Enter. To exit the editor and save, press F2. To exit without saving, press Ctrl+K, then Q.
You need to check your text for coding errors, but you have not yet created any Code Declaration Table (CDT). You may do both these things in one operation: press 'M' and type 'DOC1' when prompted for a text (TXT) file name. Because the text contained an error, you will get an error message and an indication of the line and column number where the error was detected:
+------------------------------------------------------------+
|C:\MYDIR          Mem: 433018           MECSVAL version 2.01|
+------------------------------------------------------------+
|L LOG:              I Info                1 List directory  |
|C CDT:              S Switches            2 Change directory|
|T TXT:              M Create Minimal CDT  3 Copy file       |
|E EDT: DOC1         D Check CDT           4 Print file      |
|Q Quit              V Check CDT and TXT   5 Delete file     |
+------------------------------------------------------------+
|Text file: C:\MYDIR\DOC1                                    |
|Writing code declaration table: DOC1.CDT                    |
|            Report from MECSVAL 25.8.1994, 22:30            |
|                                                            |
|                                                            |
|                                                            |
|Errors in DOC1:                                             |
|                                                            |
|   3 [dmi\0|6]                                              |
|   4 <paragraph/<title/Sample MECS Document>>>              |
|                                             ^              |
|Error 68: No one-element code active                        |
|                                                            |
|1 errors encountered in DOC1                                |
|WARNING: CDT file may contain ERRORS                        |
|Press Q to quit, any key to edit                            |
+------------------------------------------------------------+
The error message 'No one-element code active' indicates that you have included a superfluous one-element end tag close delimiter, i.e. one '>' too many. Press any key (except 'Q'), and the editor will be activated with the cursor positioned at the exact location of the error. Correct the error (by deleting the superfluous '>'), then exit and save by pressing F2. Repeat this process until you get the message 'No errors' on pressing 'M' at the main menu. At this stage, your screen should look like this:
+------------------------------------------------------------+
|C:\MYDIR          Mem: 430456           MECSVAL version 2.01|
+------------------------------------------------------------+
|L LOG:              I Info                1 List directory  |
|C CDT: DOC1.CDT     S Switches            2 Change directory|
|T TXT: DOC1         M Create Minimal CDT  3 Copy file       |
|E EDT: DOC1         D Check CDT           4 Print file      |
|Q Quit              V Check CDT and TXT   5 Delete file     |
+------------------------------------------------------------+
|Text file: C:\MYDIR\DOC1                                    |
|Writing code declaration table: DOC1.CDT                    |
|            Report from MECSVAL 25.8.1994, 22:30            |
|                                                            |
|                                                            |
|                                                            |
|Errors in DOC1:                                             |
|                                                            |
|No errors encountered in text file DOC1                     |
|                                                            |
|                                                            |
|                                                            |
|                                                            |
|                                                            |
|                                                            |
|                                                            |
+------------------------------------------------------------+
You have now created a (minimal) CDT called DOC1.CDT on the basis of DOC1. Review your minimal CDT by pressing 'E' and entering 'DOC1.CDT'. It will look like this:
£ < > < / / > [ / | \ / | / ] { " \ }
CDEMST
abcdefhilmnoprstuvwy
036
.
£
abdeghilmnoprt
£
n o £ p r m
o b
o indent
o intro
o paragraph
o title
2 dmi
As you can see, all the free characters and codes you used in DOC1 have been declared. Exit by pressing Ctrl+K, then Q. Assuming that you conclude from this inspection that you need to declare additional characters and codes used in the rest of this example, it is suggested that you extend the minimal CDT. An example CDT is supplied with the Program Package under the file name EX.CDT, so in this case you may save some typing by simply copying C:\MECS\EX.CDT. Normally, however, you would have to work some other way to create your extended CDT, e.g.: press '3' on the main menu and copy DOC1.CDT to a file called EX.CDT, and then edit EX.CDT to suit. The example EX.CDT looks like this:
£ < > < / / > [ / | \ / | / ] { " \ }
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
1234567890
,;.:-()!?"'
*%&=+ß
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890._-
n o # p r d
o b
o indent
o intro
o paragraph
o title
2 dmi
o REF
n ind
n l
o example
o i
o note
o s
o u
p s
r reverse_E
r reverse_A
d exist
Press F2 to store and exit EX.CDT. You may check EX.CDT for errors by pressing 'C' at the main menu, entering 'EX.CDT' when prompted for a file name. Then press 'D'. If errors are encountered, an error message will be displayed in the lower part of the screen:
+------------------------------------------------------------+
|C:\MYDIR          Mem: 430456           MECSVAL version 2.01|
+------------------------------------------------------------+
|L LOG:              I Info                1 List directory  |
|C CDT: EX.CDT       S Switches            2 Change directory|
|T TXT: DOC1         M Create Minimal CDT  3 Copy file       |
|E EDT: EX.CDT       D Check CDT           4 Print file      |
|Q Quit              V Check CDT and TXT   5 Delete file     |
+------------------------------------------------------------+
|Reading code declaration table: EX.CDT                      |
|Text file: C:\MYDIR\DOC1                                    |
|            Report from MECSVAL 25.8.1994, 22:33            |
|                                                            |
|                                                            |
|  20 o i                                                    |
|  21 o note/                                                |
|           ^                                                |
|Error 39: Illegal character in generic identifier           |
|                                                            |
|1 errors encountered in code declaration table EX.CDT       |
|Press Q to quit, any key to edit                            |
|                                                            |
|                                                            |
|                                                            |
|                                                            |
+------------------------------------------------------------+
In this case, you had mistakenly included the tag close delimiter in the declaration of a generic identifier. Press any key (except 'Q'), and the editor will be activated with the cursor positioned at the location in the file where the error was detected. Correct the error, store and exit, and press 'D' at the main menu again. Repeat this process until you get no error messages. Edit DOC1 again and add the following text (by typing it in, or by including the file C:\MECS\EX2) 23:
<|From EX2: |>
<paragraph/<s/We <s/will see <s/some examples>
of recursive codes, of <b/elements <u/which/b>
overlap/u>, of/s> special characters<ind> like
{reverse_A}, {reverse_E}, {reverse_E\exist},
and {"E"\exist}, and of<i> substitutions in
<note/simplified/note>/s> MECS-WIT style:/paragraph>
<exmple/<paragraph/
<ind><s/Ich besuche gern das alte <i/kleine>
[s|Schloß|<i/Haus>] meines [s|Onkels.|<i/Vaters.>]/s>
<ind><s/Ich besuche gern das
[s|alte Schloß meines Onkels|<i/kleine Haus> meines
<i/Vaters>]/s>/paragraph>
/example>
<paragraph/This is the end of our <note/very
artificial> example./paragraph>
Press F2 to store and exit DOC1. Instead of creating yet another minimal CDT on the basis of the new version of DOC1, you may check it against EX.CDT by pressing 'V' on the main menu. An error message will be displayed in the lower part of the screen:
+------------------------------------------------------------+
|C:\MYDIR          Mem: 433018           MECSVAL version 2.01|
+------------------------------------------------------------+
|L LOG:              I Info                1 List directory  |
|C CDT: EX.CDT       S Switches            2 Change directory|
|T TXT: DOC1         M Create Minimal CDT  3 Copy file       |
|E EDT: DOC1         D Check CDT           4 Print file      |
|Q Quit              V Check CDT and TXT   5 Delete file     |
+------------------------------------------------------------+
|Reading code declaration table: EX.CDT                      |
|Text file: C:\MYDIR\DOC1                                    |
|            Report from MECSVAL 25.8.1994, 22:34            |
|                                                            |
|                                                            |
|No errors encountered in code declaration table EX.CDT      |
|                                                            |
|                                                            |
|Errors in DOC1:                                             |
|                                                            |
|  13 {reverse_A}, {reverse_E}, {reverse_E\exist},           |
|  14 and {"E"\exist}, and of<i> substitutions in            |
|                              ^                             |
|Error 55: Wrong type                                        |
|                                                            |
|1 errors encountered in DOC1                                |
|Press Q to quit, any key to edit                            |
+------------------------------------------------------------+
In this case, you have used 'i', which according to EX.CDT should be a one-element code, as a no-element code. We will assume that the no-element code 'l' was what you intended. Press any key (except 'Q'), and the editor will be activated with the cursor positioned at the location where the error was detected. Correct the error by changing <i> to <l>, store and exit.
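The 'Wrong type' check can be pictured as a lookup of each tag's form against the type declared for its generic identifier. The following Python sketch is an illustration only and is not part of the MECS Program Package: the function name wrong_type_uses, the regular expression and the small DECLARED table (an excerpt modelled on EX.CDT) are assumptions, and only the <gi> and <gi/ tag forms of the default delimiters are recognized.

```python
import re

# Hypothetical excerpt of EX.CDT: generic identifier -> declared code type
# ('n' = no-element code, 'o' = one-element code).
DECLARED = {"i": "o", "l": "n", "ind": "n", "note": "o"}

# With the default delimiters a no-element code is written <gi>,
# and a one-element start tag is written <gi/ .
TAG = re.compile(r"<([A-Za-z_]+)(/|>)")


def wrong_type_uses(text, declared=DECLARED):
    """Return (generic identifier, used type, declared type) per mismatch."""
    problems = []
    for match in TAG.finditer(text):
        gi, closer = match.group(1), match.group(2)
        used = "o" if closer == "/" else "n"
        expected = declared.get(gi)
        if expected is not None and expected != used:
            problems.append((gi, used, expected))
    return problems


before = wrong_type_uses("and of<i> substitutions in <note/simplified/note>")
after = wrong_type_uses("and of<l> substitutions in <note/simplified/note>")
print(before)  # 'i' is used as a no-element code but declared one-element
print(after)   # no mismatch once <i> has been changed to <l>
```

The sketch reports the mismatch for <i> and nothing once the tag has been changed to <l>, which mirrors the correction described above.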
You have now been through the process of creating a MECS document, reconstructing a minimal CDT, creating and validating your own CDT, and editing and validating a document in relation to a CDT. The entire process has been carried out interactively, and errors have been detected and corrected one by one.
At this stage, you may go on and repeat the process of validating and editing DOC1 until you get no more error messages. Instead, you may have MECSVAL go through the entire document and write all remaining error messages to a log file: press 'L' on the main menu and enter a log file name, e.g. DOC1.LOG. Press 'V' to check the document again. If errors are detected, you will be prompted as follows:
+------------------------------------------------------------+
|C:\MYDIR          Mem: 427416           MECSVAL version 2.01|
+------------------------------------------------------------+
|L LOG: DOC1.LOG     I Info                1 List directory  |
|C CDT: EX.CDT       S Switches            2 Change directory|
|T TXT: DOC1         M Create Minimal CDT  3 Copy file       |
|E EDT: DOC1         D Check CDT           4 Print file      |
|Q Quit              V Check CDT and TXT   5 Delete file     |
+------------------------------------------------------------+
|Reading code declaration table: EX.CDT                      |
|Text file: C:\MYDIR\DOC1                                    |
|Log file: C:\MYDIR\DOC1.LOG                                 |
|                                                            |
|No errors encountered in code declaration table EX.CDT      |
|                                                            |
|3 errors encountered in DOC1                                |
|Press Q to quit, any key to edit                            |
|                                                            |
|                                                            |
|                                                            |
|                                                            |
|                                                            |
|                                                            |
+------------------------------------------------------------+
A complete list of error messages has been written to the log file. Press any key, and MECSVAL will activate the editor with a split screen: the log file will be displayed in the lower window, the document file in the upper window:
+------------------------------------------------------------+
|+----------------------------------------------------------+|
||DOC1.     Line 16  Col 8  Byte 590   Insert Indent        ||
|+----------------------------------------------------------+|
|<b/MECS> document which is intended to demonstrate          |
|the use of currently available <b/MECS>                     |
|software./paragraph>/intro>                                 |
|<|From EX2: |>                                              |
|<paragraph/<s/We <s/will see <s/some examples>              |
|of recursive codes, of <b/elements <u/which/b>              |
|overlap/u>, of/s> special characters<ind> like              |
|{reverse_A}, {reverse_E}, {reverse_E\exist},                |
|and {"E"\exist}, and of<l> substitutions in                 |
|<note/simplified/note>/s> MECS-WIT style:/paragraph>        |
|<exmple/<paragraph/                                         |
|+----------------------------------------------------------+|
||DOC1.LOG  Line 19  Col 1  Byte 701   Insert Indent    Save||
|+----------------------------------------------------------+|
|  15 <note/simplified/note>/s> MECS-WIT style:/paragraph>   |
|  16 <exmple/<paragraph/                                    |
|            ^                                               |
|Error 62: Illegal generic identifier                        |
|                                                            |
|  21 <i/Vaters>]/s>/paragraph>                              |
|  22 /example>                                              |
|             ^                                              |
|Error 59: START tag missing                                 |
|                                                            |
|Error 99: exmple started at line 16 8 - end tag missing     |
+------------------------------------------------------------+
The log file is the active window. (You may have to scroll the lower window with the arrow keys in order to see the exact part of the log file displayed above.) If you switch to the upper window (by pressing F6), you will see that the cursor is positioned at the location of the last error reported in the log file.
It is frequently the case that one error causes several error messages. In this particular case, all three error messages are caused by one and the same error: you have mistyped <exmple/ for the start tag <example/ . MECSVAL first reports that <exmple/ is not a legal generic identifier, then indicates that /example> closes a code that has not been opened (because the start tag was mistyped), and finally, that the erroneous code <exmple/ has not been closed.
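How a single mistyped start tag can give rise to three messages may be easier to see in a small sketch. The Python code below is an illustration under stated assumptions, not MECSVAL's actual algorithm: the function name validate, the message wording and the DECLARED set are hypothetical, and the sketch handles only one-element codes written with full (non-minimized) end tags.

```python
import re

# Hypothetical set of declared one-element generic identifiers (cf. EX.CDT).
DECLARED = {"example", "paragraph", "s", "i", "note"}

# Start tags look like <gi/ and full (non-minimized) end tags like /gi> .
TOKEN = re.compile(r"<([A-Za-z_]+)/|/([A-Za-z_]+)>")


def validate(text, declared=DECLARED):
    """Return the list of messages produced for the one-element codes in text."""
    messages = []
    active = []  # generic identifiers of the codes currently open
    for match in TOKEN.finditer(text):
        start, end = match.group(1), match.group(2)
        if start is not None:
            if start not in declared:
                messages.append(f"Illegal generic identifier: {start}")
            active.append(start)  # keep tracking it, even if undeclared
        elif end in active:
            # Close the most recently opened code with this identifier;
            # other codes may stay open, so overlapping elements are allowed.
            index = len(active) - 1 - active[::-1].index(end)
            del active[index]
        else:
            messages.append(f"START tag missing: {end}")
    messages.extend(f"{gi}: end tag missing" for gi in active)
    return messages


report = validate("<exmple/<paragraph/Ich besuche gern/paragraph>/example>")
for message in report:
    print(message)
```

Run on the mistyped fragment, the sketch yields one message for the undeclared identifier, one for the unmatched end tag and one for the start tag that is never closed, in the same order as the three entries in the log file.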
In many cases where the text file contains more than just a few errors, it may be convenient to use the log file option. You may then correct all errors in one pass by switching between the two windows, scrolling the log file window to see the error messages and editing the text window to correct the errors.
Once you have corrected all errors, check DOC1 again by pressing 'V' on the main menu. If you get no error messages you may exit from MECSVAL by pressing 'Q' at the main menu.
MECSVAL may also be run in batch mode. You can check that DOC1 conforms
to the basic syntax of MECS by entering the DOS command line
MECSVAL - DOC1 DOC1.LOG
If DOC1 is a MECS-conforming document, this command will cause MECSVAL
to display a 'No errors' message and to write DOC1's minimal CDT to a new
file called DOC1.CDT. If DOC1 is not MECS-conforming, it has no minimal
CDT. MECSVAL will then display an error message, and a list of errors will
be found in DOC1.LOG.
If you want to check that DOC1 conforms to a specific CDT, e.g. EX.CDT,
you should enter the following command at the DOS prompt:
MECSVAL EX.CDT DOC1 DOC1.LOG
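If many documents are to be checked, the two command lines above can be generated by a small helper. The Python sketch below only builds the argument list; the function name mecsval_batch_args is hypothetical, and actually running the command (for instance with subprocess.run) assumes a system on which the MECSVAL program is installed and can be found.

```python
def mecsval_batch_args(document, log, cdt=None):
    """Build the argument list for a batch run of MECSVAL.

    A dash in the CDT position corresponds to the first command shown above,
    which checks the basic syntax only; naming a CDT corresponds to the second.
    """
    return ["MECSVAL", cdt if cdt is not None else "-", document, log]


basic = mecsval_batch_args("DOC1", "DOC1.LOG")
against_cdt = mecsval_batch_args("DOC1", "DOC1.LOG", cdt="EX.CDT")
print(" ".join(basic))
print(" ".join(against_cdt))
```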
MECSVAL includes several options and features which have not been mentioned
here. Cf. 2.6 for more information on MECSVAL log files, and 3 for a comprehensive
documentation of the program.
Note: Any use of other programs in the MECS Program Package presupposes that the input document is a well-formed MECS document, i.e. that it has a minimal CDT. Therefore, you should make sure that MECSVAL reports no error messages when reconstructing a minimal CDT from your documents, before processing them with any other program in the package. If used on texts that contain MECS syntax errors, the other programs in the package may cause unpredictable results.
3.4 Formatting Documents
Now that you have corrected all errors, DOC1 will look like this:
£ < > < / / > [ / | \ / | / ] { " \ }
<|From EX1: |>
[dmi\0|6]
<paragraph/<title/Sample MECS Document>>
<intro/<paragraph/<indent/3>This is a sample
<b/MECS> document which is intended to demonstrate
the use of currently available <b/MECS>
software./paragraph>/intro>
<|From EX2: |>
<paragraph/<s/We <s/will see <s/some examples>
of recursive codes, of <b/elements <u/which/b>
overlap/u>, of/s> special characters<ind> like
{reverse_A}, {reverse_E}, {reverse_E\exist},
and {"E"\exist}, and of<l> substitutions in
<note/simplified/note>/s> MECS-WIT style:/paragraph>
<example/<paragraph/
<ind><s/Ich besuche gern das alte <i/kleine>
[s|Schloß|<i/Haus>] meines [s|Onkels.|<i/Vaters.>]/s>
<ind><s/Ich besuche gern das
[s|alte Schloß meines Onkels|<i/kleine Haus> meines
<i/Vaters>]/s>/paragraph>
/example>
<paragraph/This is the end of our <note/very
artificial> example./paragraph>
Let us assume that you would like to tidy up the layout of DOC1, and that you want to insert codes containing reference numbers (which may be useful for a variety of reasons) at intervals within the text, e.g. preceding all 'paragraph' codes. Type the following command at the DOS prompt:
£ < > < / / > [ / | \ / | / ] { " \ }
<|From EX1: |> [dmi/2\0/dmi|6/dmi]
<REF/1/REF><paragraph/<title/Sample MECS Document/title>/paragraph>

  <intro/
<REF/2/REF><paragraph/<indent/3/indent>This is a sample
  <b/MECS/b> document which is intended to demonstrate
  the use of currently available <b/MECS/b>
  software./paragraph>
  /intro> <|From EX2: |>
<REF/3/REF><paragraph/
  <s/We
    <s/will see
      <s/some examples/s> of recursive codes, of
      <b/elements <u/which/b> overlap/u>, of/s>
    special characters<ind> like {reverse_A},
    {reverse_E}, {reverse_E\exist}, and {"E"\exist},
    and of<l> substitutions in
    <note/simplified/note>/s> MECS-WIT
  style:/paragraph>
  <example/
<REF/4/REF><paragraph/<ind>
  <s/Ich besuche gern das alte <i/kleine/i>
    [s/2|Schloß/s|<i/Haus/i>/s] meines
    [s/2|Onkels./s|<i/Vaters./i>/s]/s> <ind>
  <s/Ich besuche gern das [s/2|alte Schloß meines
    Onkels/s|<i/kleine Haus/i> meines
    <i/Vaters/i>/s]/s> /paragraph>
  /example>
<REF/5/REF><paragraph/This is the end of our <note/very
  artificial/note> example./paragraph>
If we assume that you would now like to format the text of DOC1 into a more compact, even if less conspicuous, form, you may type the following command at the DOS prompt:
£ < > < / / > [ / | \ / | / ] { " \ }
<|From EX1: |> [dmi\0|6]
<REF/1><paragraph/<title/Sample MECS Document>>

  <intro/
<REF/2><paragraph/<indent/3>This is a sample
  <b/MECS> document which is intended to demonstrate
  the use of currently available <b/MECS>
  software.>
  > <|From EX2: |>
<REF/3><paragraph/
  <s/We
    <s/will see
      <s/some examples> of recursive codes, of
      <b/elements <u/which/b> overlap>, of>
    special characters<ind> like {reverse_A},
    {reverse_E}, {reverse_E\exist}, and {"E"\exist},
    and of<l> substitutions in
    <note/simplified>> MECS-WIT
  style:>
  <example/
<REF/4><paragraph/<ind>
  <s/Ich besuche gern das alte <i/kleine>
    [s|Schloß|<i/Haus>] meines
    [s|Onkels.|<i/Vaters.>]> <ind>
  <s/Ich besuche gern das [s|alte Schloß meines
    Onkels|<i/kleine Haus> meines
    <i/Vaters>]> >
  >
<REF/5><paragraph/This is the end of our <note/very
  artificial> example.>
It should be noted that DOC1 and DOC1.MIN are equivalent, and that they will produce identical output from all other MECS programs.
3.5 Reformatting Documents
The program MECSPRES enables you to create reformatted versions of your document in a number of different word processor formats and with a variety of different typographic features. First, however, you will have to create a profile definition table (PDT). A PDT contains specifications as to how the codes in a document should be realized in the output from the program MECSPRES.
An example PDT is supplied with the Program Package under the file name EX-A.PDT, so in this case you may save some typing by simply copying C:\MECS\EX-A.PDT. Normally, however, you would have to go about it some other way to create your PDT, e.g.: press '3' on the main menu and copy EX.CDT to a file called EX-A.PDT, and then edit EX-A.PDT to suit. (You may use the MECSVAL editor or any other word processor which allows you to store files in so-called plain ASCII or DOS format.)
The example EX-A.PDT looks like this:
£ < > < / / > [ / | \ / | / ] { " \ }
n o # p r m _ £
o b £ b £ £ £ £ £
o indent j £ £ £ £ £ £
o intro £ i £ £ £ £ £
o paragraph e £ £ £ £ £ £
o title s le £ £ £ £ £
2 dmi 7 £ £ £ £ £ £
o REF q b £ £ £ £ £
n ind g £ £ £ £ £ £
o example £ m £ £ £ £ £
o note £ £ ( £ ) £ £
o u £ u £ £ £ £ £
p s 5 £ £ £ £ £ £
r reverse_E £ £ £ {#6#121} £ £ £
r reverse_A £ £ £ {#6#122} £ £ £
m exist £ £ £ {#6#121} £ £ £
Having stored EX-A.PDT, type the following command at the DOS prompt:
[Not reproducible in HTML; see PostScript version of this text]
By revising the PDT you may change the layout of the output file. E.g., with the following profile definition table, EX-B.PDT (also included with the MECS Program Package):
£ < > < / / > [ / | \ / | / ] { " \ }
n o # p r m _ £
# a NB:_
o b £ bp £ £ £ £ £
o indent j £ £ £ £ £ £
o paragraph e £ £ £ £ £ £
o title £ s £ £ £ £ £
2 dmi 7 £ £ £ £ £ £
o REF d £ £ £ £ £ £
n ind g £ £ £ £ £ £
n l h £ £ £ £ £ £
o i £ r £ £ £ £ £
o note b £ ( £ ) a c
o u £ u £ £ £ £ £
p s 2 £ £ £ £ £ £
r reverse_E £ £ £ {#6#121} £ £ £
r reverse_A £ £ £ {#6#122} £ £ £
m exist £ £ £ {#6#121} £ £ £
the command
will create the following WordPerfect 5.1 document, DOC1.BDW:
[Not reproducible in HTML; see PostScript version of this text]
In addition to WordPerfect 5.1, available output formats are so-called plain ASCII, MECS-like presentational markup format, Folio Views markup format, HTML format, so-called screen display format as well as a number of other formats and layouts. E.g., the command
1 SAMPLE MECS DOCUMENT
2 This is a sample MECS document which is intended to demonstrate the use of currently available MECS software.
3 We will see some examples of recursive codes, of elements which overlap, of special characters like •, •, •, and E, and of substitutions in (simplified) MECS-WIT style:
4 Ich besuche gern das alte kleine Haus meines Vaters. Ich besuche gern das kleine Haus meines Vaters
5 This is the end of our (very artificial) example.
while the command
will create a document DOC1.ANM formatted with MECSPRES' own, MECS-like presentational markup:
<C/<l/<e/SAMPLE MECS DOCUMENT/e>/l>/C>
<T><i/This is a sample <b/MECS/b> document which is/i>
<i/intended to demonstrate the use of currently/i>
<i/available <b/MECS/b> software./i>
We will see some examples of recursive codes, of <b/elements <u/which/b> overlap/u>, of special characters
<T>like {#6#122}, {#6#121}, {#6#121}, and {#6#121}, and of substitutions in <b/(/b>simplified<b/)/b> MECS-WIT style:
<T>Ich besuche gern das alte kleine Haus meines Vaters.
<T>Ich besuche gern das kleine Haus meines Vaters
This is the end of our <b/(/b>very artificial<b/)/b> example.
If you would like to review output on screen before deciding to create an output file, replace the output file name with a dash. If you want to avoid word processor-specific formatting codes displayed on screen, replace the fifth parameter with an 's'. Examples:
3.6 Analyzing Documents
Several of the programs in the package may help you analyze the structure of your documents: MECSVAL and MECSPRES, which we have already looked at, as well as MECSLYSE, MECSBETA, BETATXT, MECSSPEL, ALPHATXT and MECSGRAB.
3.6.1 Code Status Report
As already mentioned (cf. 2.3), MECSVAL allows you to store a log of
the validation of a document. This log also contains useful status information
on the document in question. The log file may be created without entering
MECSVAL's interactive mode. E.g., if you type the following command at
the DOS prompt:
MECSVAL - DOC1 DOC1.LOG
the log file DOC1.LOG will look like this:
Report from MECSVAL 26.8.1994, 23:15
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Errors in DOC1:
No errors encountered in text file DOC1
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Status report:
No-element codes:
  ind          3
  l            1
One-element codes:
  REF          5
  b            3
  example      1
  i            5
  indent       1
  intro        1
  note         2
  paragraph    5
  s            5
  title        1
  u            1
Poly-element codes:
  s            3
N-element codes:
  dmi          1
Character representation codes:
  reverse_A    1
  reverse_E    2
Character disambiguation codes:
  exist        2
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
S U M M A R Y :
Number of codes:                   types | tokens
No-element codes:                      2        4
One-element codes:                    11       30
Poly-element codes:                    1        3
Character representation codes:        2        4
Character disambiguation codes:        1        2
N-element codes:                       1        1
Sum total:                            18       44
Maximum nesting level: 5
Overlapping codes: 1
The first part of the log file lists all errors in the text file, if any (in this case, no errors have been found). The second part lists all codes found in the document, and indicates the number of occurrences of each code. The third part indicates the number of types and tokens of codes of each code type found in the document. The last two lines indicate the maximum nesting level of codes and the number of overlapping codes in the document.
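The 'Sum total' line of the summary is simply the column-wise sum of the rows above it. As a quick cross-check, the figures can be added up in a few lines of Python; the dictionary below is copied by hand from the log file shown above, and the names summary and totals are illustrative only.

```python
# Types and tokens per code type, copied from the SUMMARY part of DOC1.LOG.
summary = {
    "No-element codes": (2, 4),
    "One-element codes": (11, 30),
    "Poly-element codes": (1, 3),
    "Character representation codes": (2, 4),
    "Character disambiguation codes": (1, 2),
    "N-element codes": (1, 1),
}


def totals(rows):
    """Return the column sums (types, tokens) over all code types."""
    types = sum(type_count for type_count, _ in rows.values())
    tokens = sum(token_count for _, token_count in rows.values())
    return types, tokens


sum_total = totals(summary)
print(sum_total)  # compare with the 'Sum total' line of the log file
```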
3.6.2 Document Structure and Overlapping Elements
The program MECSLYSE gives additional information on the structure of
documents. Type the following command at the DOS prompt:
MECSLYSE DOC1 DOC1.TR1 - - O REF
The output file DOC1.TR1 will contain a list of all overlapping codes,
as well as a complete listing of the document's element structure in the
form of an indented table:
MECSLYSE File in: DOC1 File out: DOC1.TR1
OVERLAP: <u/ /b> 15 21 15 29 <b/ started at 15 9 Position 15 29 Level 1
DOCUMENT STRUCTURE
 2 22 |[dmi\
 3  5 |<REF/1>
 3 22 |<paragraph/
 3 29 | . <title/
 5  9 |<intro/
 6  5 | . <REF/2>
 6 22 | . <paragraph/
 6 30 | . . <indent/
 7  5 | . . <b/
 8 36 | . . <b/
11  5 |<REF/3>
11 22 |<paragraph/
12  5 | . <s/
13  7 | . . <s/
14  9 | . . . <s/
15  9 | . . . <b/
15 21 | . . . . <u/
19 10 | . . <note/
21 11 |<example/
22  5 | . <REF/4>
22 22 | . <paragraph/
23  5 | . . <s/
23 34 | . . . <i/
24  9 | . . . [s|
24 21 | . . . . <i/
25  9 | . . . [s|
25 22 | . . . . <i/
26  5 | . . <s/
26 31 | . . . [s|
27 16 | . . . . <i/
28  7 | . . . . <i/
30  5 |<REF/5>
30 22 |<paragraph/
30 51 | . <note/
SUMMARY
Overlapping code types <b/ <u/ 1
Overlapping codes: 1
Max. depth of overlapping codes: 1
Max. no of overlapping codes at 0
Number of pairs of overlapping codes 1
3.6.3 Breakpoints and Recursion
The command
MECSLYSE DOC1 DOC1.TR2 o paragraph R
will cause the output file DOC1.TR2 to contain a list of all codes
active (if any) at the start and end points of all occurrences of the one-element
code 'paragraph', as well as a list of all occurrences of recursive codes:
MECSLYSE File in: DOC1 File out: DOC1.TR2
BREAKPOINT: <paragraph/ at 6 22, <intro/ started at 5 9, still active
BREAKPOINT: /paragraph> at 9 22, <intro/ started at 5 9, still active
RECURSION: <s/ at 13 7 and at 12 5
RECURSION: <s/ at 14 9 and at 13 7 and at 12 5
BREAKPOINT: <paragraph/ at 22 22, <example/ started at 21 11, still active
BREAKPOINT: /paragraph> at 28 34, <example/ started at 21 11, still active
SUMMARY:
Codes at breakpoints: 4
Recursive codes: 3
3.6.4 Betatexts (Substitutions)
The concept of a betatext was invented in the course of work related to the development of a registration standard for the Wittgenstein Archives at the University of Bergen, MECS-WIT. In MECS-WIT, multi-element codes are used to indicate substitutions (variants, parallel texts) in manuscripts. It belongs to the definition of a substitution that each of its elements is incompatible with every other element, but at the same time it is a requirement that every element can be embedded in the context of the rest of the text.
A betatext is a version of the text corresponding to one particular combination of such substitution elements. A text with many substitutions therefore has a quite considerable number of betatexts. The programs MECSBETA and BETATXT were written in order to help identify and check these possible combinations of text elements.
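The growth in the number of betatexts can be illustrated with the two German sentences of DOC1. The Python sketch below is not how MECSBETA or BETATXT are implemented; it assumes that each substitution contributes one independent choice, so that the readings of a sentence are the Cartesian product of its substitution elements and the counts of independent sentences multiply. The data structure and the names readings and betatext_count are hypothetical, and the 'i' codes of the original are left out for brevity.

```python
from itertools import product
from math import prod

# Each sentence is a list of parts: a plain string is fixed text, a tuple
# holds the mutually exclusive elements of one substitution.
sentences = [
    ["Ich besuche gern das alte kleine ", ("Schloß", "Haus"),
     " meines ", ("Onkels.", "Vaters.")],
    ["Ich besuche gern das ",
     ("alte Schloß meines Onkels", "kleine Haus meines Vaters")],
]


def readings(parts):
    """Return every reading of one sentence, one element per substitution."""
    choices = [part if isinstance(part, tuple) else (part,) for part in parts]
    return ["".join(combination) for combination in product(*choices)]


per_sentence = [readings(parts) for parts in sentences]
# Choices made in different sentences are independent, so the counts multiply.
betatext_count = prod(len(variants) for variants in per_sentence)

for variants in per_sentence:
    for variant in variants:
        print(variant)
print("Beta:", betatext_count)
```

Under these assumptions the first sentence has four readings and the second two, so the two sentences together admit eight combinations.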
BETATXT serves to compute the number and display all possible combinations of elements of specified multi-element codes in a document. In order to achieve this, the document needs preprocessing by MECSPRES. The following profile definition table EX-BETA.PDT (supplied with the Program Package) specifies a profile suitable for the required preprocessing:
£ < > < / / > [ / | \ / | / ] { " \ }
n o # p r m _ £
o indent d £ £ £ £ £ £
o paragraph e £ £ £ £ £ £
2 dmi d £ £ £ £ £ £
o REF w £ £ £ £ £ £
o s 2 £ £ £ £ £ £
p s b £ £ £ £ £ £
r reverse_E £ £ £ {#6#121} £ £ £
r reverse_A £ £ £ {#6#122} £ £ £
m exist £ £ £ {#6#121} £ £ £
Type the following command at the DOS prompt:
4 Ich besuche gern das alte kleine
->Schloß meines Onkels.
->Schloß meines Vaters.
->Haus meines Onkels.
->Haus meines Vaters.
Ich besuche gern das
->alte Schloß meines Onkels
->kleine Haus meines Vaters
-----------------------------------
Beta: 8
The last line indicates that the total number of "betatexts" generated by the document is 8.
3.6.5 Spell Checking
Spell checking may be a problem with heavily marked-up files: ordinary spell checkers are not able to distinguish markup from text and are therefore also unable to identify the strings to be checked for spelling. In general it is therefore necessary to perform spell checking on reformatted versions of the encoded documents, with the considerable disadvantage of having to trace the sources of errors in the marked-up files manually.
The Program Package contains two programs which may help to remedy this problem: MECSPRES and ALPHATXT. First, a word list is created by designing an appropriate profile definition table. EX-ALPHA.PDT (which comes with the Program Package) is an example of such a profile definition:
£ < > < / / > [ / | \ / | / ] { " \ }
n o # p r m _ ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz%&/-•
o example d £ £ £ £ £ £
o REF d £ £ £ £ £ £
o indent d £ £ £ £ £ £
o title d £ £ £ £ £ £
o intro d £ £ £ £ £ £
o paragraph e o £ £ £ £ £
o s 2 £ £ £ £ £ £
r reverse_E £ £ £ {#6#121} £ £ £
r reverse_A £ £ £ {#6#122} £ £ £
m exist £ £ £ {#6#121} £ £ £
This profile suppresses all one-element 'example', 'REF', 'indent', 'title' and 'intro' codes, and changes the first free character of all 'paragraph' codes to lower case. It also defines 'paragraph' as a section code and 's' as a segment code. In effect, the profile suppresses the title and intro of DOC1 and extracts the English text from the rest of the document. Alternatively, we could have defined a different filtering profile to extract only the German text of the document. The command
# 12 6 we
- 13 8 will
. 13 13 see
- 14 10 some
. 14 15 examples
. 14 27 of
. 14 30 recursive
. 14 40 codes
. 14 47 of
. 15 10 elements
. 15 22 which
. 15 31 overlap
. 15 43 of
. 16 5 special
. 16 13 characters
. 16 29 like
. 16 44 •
. 17 15 •
. 17 34 •
. 17 37 and
. 17 51 •
. 18 5 and
. 18 9 of
. 18 15 substitutions
. 18 29 in
. 19 11 simplified
. 19 31 MECS-WIT
. 20 3 style
# 30 23 this
. 30 28 is
. 30 31 the
. 30 35 end
. 30 39 of
. 30 42 our
. 30 52 very
. 31 3 artificial
. 31 20 example
This format is accepted by the program ALPHATXT. Assuming that the file EX-ENG.LIS contains a master list of English words, the command
Assuming that all words except 'overlap', 'artificial', 'recursive', and 'MECS-WIT' are already included in EX-ENG.LIS, and that the first two are accepted whereas the last two are rejected, DOC1.CHK will look like this:
. 14 30 recursive
. 19 31 MECS-WIT
while DOC1.OK will look like this:
overlap
artificial
To perform both the above steps in one operation, you can give the command:
MECSSPEL EX-ENG.LIS EX-ALPHA.PDT DOC1 25
Since DOC1.CHK contains references by line and column number to relevant locations in DOC1, it is easy to retrieve these locations in DOC1 by displaying DOC1.CHK in a parallel window while correcting DOC1.
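The comparison of a word list against a master list can be sketched as follows. This is an illustration only, not ALPHATXT's or MECSSPEL's implementation: the functions parse_word_line and unknown_words are hypothetical, the line format (flag, line number, column number, word) is taken from the listings above, and the interactive step in which the user accepts or rejects each unknown word is left out.

```python
def parse_word_line(line):
    """Split a word-list line of the form 'flag line column word'."""
    _flag, line_no, column, word = line.split(maxsplit=3)
    return int(line_no), int(column), word


def unknown_words(word_lines, master_words):
    """Return (line, column, word) for every word missing from the master list."""
    known = set(master_words)
    entries = (parse_word_line(line) for line in word_lines)
    return [entry for entry in entries if entry[2] not in known]


word_lines = [
    ". 14 30 recursive",
    ". 14 40 codes",
    ". 15 31 overlap",
    ". 19 31 MECS-WIT",
]
master = ["codes", "of", "the"]  # stand-in for a master list such as EX-ENG.LIS
to_review = unknown_words(word_lines, master)
for line_no, column, word in to_review:
    print(line_no, column, word)
```

Because each entry keeps its line and column number, the words that remain to be reviewed can be located in the document directly.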
Since the words listed in DOC1.OK are accepted by the user it may be
convenient to add them to EX-ENG.LIS for use in later spell checking. This
can be done simply by appending the file DOC1.OK to EX-ENG.LIS. However,
in order to make EX-ENG.LIS an ordered list, the new words in DOC1.OK may
be inserted in their proper places with the following command:
ALPHATXT OR - EX-ENG.LIS DOC1.OK - /EX-ENG.LIS
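The effect of inserting the accepted words in their proper places, rather than appending them, can be sketched in Python. The function merge_into_master is hypothetical, and the case-insensitive ordering used here is an assumption; ALPHATXT supports user-defined sort procedures, which this sketch does not attempt to reproduce.

```python
def merge_into_master(master_words, accepted_words):
    """Return one sorted word list without duplicates.

    The sort key is case-insensitive, with the exact spelling as tie-breaker;
    this ordering is an assumption made for the sketch.
    """
    combined = set(master_words) | set(accepted_words)
    return sorted(combined, key=lambda word: (word.lower(), word))


master = ["and", "codes", "of", "the"]   # stand-in for EX-ENG.LIS
accepted = ["overlap", "artificial"]     # the words listed in DOC1.OK
updated = merge_into_master(master, accepted)
print(updated)
```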
3.6.6 Frequency Word Lists and Simple Statistical Analyses
MECSPRES and ALPHATXT can be used in a variety of ways and combinations to build up and maintain master word lists and check individual document files.
ALPHATXT allows for user-defined character sort procedures and can therefore
produce a wide range of differently sorted alphabetic word lists from a
document - for details on this. ALPHATXT can also produce frequency word
lists and simple statistical analyses. The following command:
ALPHATXT FNRS - - - DOC1.ALF DOC1.WL - DOC1.STS
will write a frequency word list to DOC1.WL and a summary of statistical
information to DOC1.STS. The frequency word list is sorted according to
descending frequency. The first few lines of the file DOC1.WL will look
like this:
of 5
• 4
and 2
MECS-WIT 1
artificial 1
characters 1
codes 1
The statistical summary DOC1.STS indicates total number of characters, strings (words), string types (word forms), sections and segments; their mean, maximum and minimum values and standard deviation:
Chars: 170 Strings: 37 Chars/String: 4.59 Min: 1 Max: 8 StdDv: 2.39
Segments: 4 Strings/Segment: 9.25 Min: 1 Max: 1 StdDv: 4.33
Sections: 2 Strings/Section: 18.50 Min: 1 Max: 1 StdDv: 4.80
Types: 29 Tokens/Type: 1.28 Min: 1 Max: 26 StdDv: 0.97
3.6.7 Extracting Elements
The program MECSGRAB serves to extract specified elements from a document.
E.g. the command
MECSGRAB DOC1 DOC1.GRB o note RT
will extract from DOC1 all one-element 'note' codes, preceded by their
line and column reference numbers, to the file DOC1.GRB:
. 19 10 <note/simplified/note>
. 30 51 <note/very artificial/note>
Similarly, the command
[s/2|Schloß/s|<i/Haus/i>/s]
[s/2|Onkels./s|<i/Vaters./i>/s]
[s/2|alte Schloß meines Onkels/s|<i/kleine Haus/i> meines <i/Vaters/i>/s]
Output from MECSGRAB may in turn be used as input to other MECS programs.
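To make the idea of element extraction concrete, here is a deliberately simplified Python sketch. It is not MECSGRAB: it only finds one-element codes written in full form (start tag and end tag both present) that begin and end on the same line, and it assumes MECS default delimiters and 1-based column numbers. The function name grab_one_element is hypothetical.

```python
import re


def grab_one_element(text, gi):
    """Find one-element codes of the form <gi/ ... /gi> on single lines.

    Returns (line, column, element) tuples. Line and column numbers are
    1-based (an assumption). Reduced end tags, elements spanning several
    lines and nested codes with the same generic identifier are ignored.
    """
    name = re.escape(gi)
    pattern = re.compile(r"<%s/.*?/%s>" % (name, name))
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((line_no, match.start() + 1, match.group(0)))
    return hits


if __name__ == "__main__":
    doc = "This is the end of our\n  <note/very artificial/note> example."
    for line_no, column, element in grab_one_element(doc, "note"):
        print(".", line_no, column, element)
```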
3.7 Processing SGML Documents in MECS
3.7.1 Validating SGML documents for MECS Conformance
It has been explained elsewhere (cf. Part I) that SGML documents are MECS-conforming, provided that they do not make use of tag minimization or end tag omission. E.g., the following SGML document, EXSGML:
<!DOCTYPE TEI.1 SYSTEM "c:\tei\public\tei1.dtd" [
<!ENTITY tla "Three Letter Acronym">
<!ELEMENT my.tag - - (#PCDATA)>
<!-- following line added by C.H. -->
<!ELEMENT my.stone - o EMPTY>
<!-- any other special-purpose declarations or
re-definitions go in here -->
]>
<tei.1>
This is an instance of a modified TEI.1 type document,
which may contain <my.tag>my special tags</my.tag>,
<!-- following line added by C.H. -->
including milestones such as <my.stone>, and
references to my usual entities such as &tla;.
</tei.1>
can be validated with the following command:
SGMLVAL EXSGML EXSGML.LOG
You have now validated EXSGML and generated its minimal CDT (which has automatically been called EXSGML.CDT) without in any way interfering with the original SGML file.
Alternatively, you may perform the validation interactively. Copy the
document, which is included with the MECS Program Package under the file
name 'EXSGML', to the current directory. Start MECSVAL by typing the command
MECSVAL
at the DOS prompt. Press 'S', and MECSVAL will display the following
menu:
+------------------------------------------------------------+
|C:\MYDIR          Mem: 432266           MECSVAL version 2.01|
+------------------------------------------------------------+
|L LOG:      I Info                        1 List directory  |
|C CDT:      S Switches                    2 Change directory|
|T TXT:      M Create Minimal CDT          3 Copy file       |
|E EDT:      D Check CDT                   4 Print file      |
|Q Quit      V Check CDT and TXT           5 Delete file     |
+------------------------------------------------------------+
|            +---------------------------------+             |
|            |s SGML-mode                   OFF|             |
|            |n Strict hierarchical nesting OFF|             |
|            |7 Low ASCII                   OFF|             |
|            |r End tag reduction      OPTIONAL|             |
|            |d Reset all values to default    |             |
|            |q Quit                           |             |
|            +---------------------------------+             |
|                                                            |
|                                                            |
+------------------------------------------------------------+
Press 'S' to turn "SGML mode" on, and then 'Q' to exit from the Switches menu. Load the document into the MECSVAL editor (i.e., press 'E', and type 'EXSGML' when prompted for a file name). Type the following MECS header at the beginning of the document (or, alternatively, press Ctrl+K, then R, and type C:\MECS\HEADSGML):
£ < > < > </ > [ £ ! £ £ £ £ ] & £ £ ;
Press F2 to store and exit EXSGML. Then press 'M' at the main menu and type 'EXSGML' when prompted for a text file name. If EXSGML is an SGML-conforming document, you should get no error messages. With this interactive process, however (and unlike SGMLVAL), you have also changed the original SGML document by adding a MECS header to it.
As has been explained elsewhere, an SGML file is MECS-conforming only in virtue of the exceptions and deviations from the main outline of the basic code syntax of MECS which have been made precisely in order to enhance SGML compatibility. It should also be noted that even if run in so-called SGML mode, MECSVAL validates for MECS conformance, not for SGML conformance.
Experience has shown that if an SGML document is not MECS-conforming, i.e. if MECSVAL reports errors, it is not entirely unlikely that the document is not properly SGML-conforming either. So if MECSVAL reports errors in your SGML documents, it may be a good idea to check whether the error is in fact also an SGML error.
3.7.2 Converting SGML files to MECS
The other programs in the MECS Program Package will be able to process SGML documents to some extent only. If you intend to do any serious work at all with SGML documents by means of the MECS Program Package, it is highly recommended that you first convert them to MECS default notation by means of the conversion program SGMLMECS. (You may then convert them back into SGML again with another program, MECSSGML - see below.)
In order to convert the example discussed above, EXSGML, to MECS, you
may type the following command at the DOS prompt:
SGMLMECS EXSGML EXMECS
The output file, EXMECS, which is a fully MECS-conforming document,
will look like this:
£ < > < / / > £ £ | £ £ £ £ £ { £ £ }
<|DOCTYPE TEI.1 SYSTEM "c:\tei\public\tei1.dtd" [
<!ENTITY tla "Three Letter Acronym">
<!ELEMENT my.tag - - (#PCDATA)>
<!-- following line added by C.H. -->
<!ELEMENT my.stone - o EMPTY>
<!-- any other special-purpose declarations or
re-definitions go in here -->
]|>
<tei.1/
This is an instance of a modified TEI.1 type document,
which may contain <my.tag/my special tags/my.tag>,
<|-- following line added by C.H. --|>
including milestones such as <my.stone>, and
references to my usual entities such as {tla}.
/tei.1>
It has been mentioned several times that SGML documents which make use of end tag omission or tag minimization are not MECS-conforming. However, even SGML documents with occasional tag minimization may be converted to MECS-conforming documents. E.g. from the following SGML document instance which contains both start tag and end tag minimization:
+------------------------------------------------------------+
|<tei.1>                                                     |
| This is an instance of a modified TEI.1 type document,     |
| which may contain <my.tag>my special tags</my.tag>,        |
| <!-- following 2 lines added by C.H. -->                   |
| <my.tag>also</>                                            |
| including milestones <>such</> as <my.stone>, and          |
| references to my usual entities such as &tla;.             |
|</tei.1>                                                    |
+------------------------------------------------------------+
SGMLMECS will produce the following:
£ < > < / / > £ £ | £ £ £ £ £ { £ £ }
<tei.1/
This is an instance of a modified TEI.1 type document,
which may contain <my.tag/my special tags/my.tag>,
<|-- following 2 lines added by C.H. --|>
<my.tag/also>
including milestones <my.tag/such> as <my.stone>, and
references to my usual entities such as {tla}.
/tei.1>
This is a fully MECS-conforming document.
Since MECSVAL will detect all missing end tags, SGML documents making more extensive use of end tag omission and minimization can also (after conversion by SGMLMECS) in most cases be brought to MECS conformance, even if sometimes only with some amount of manual post-editing.
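The notational correspondences visible in the EXSGML and EXMECS examples can be summarized in a short Python sketch. This is an illustration only, not the algorithm used by SGMLMECS. It assumes an SGML document instance without a DOCTYPE declaration, without attributes and without any tag minimization, and it needs to be told which generic identifiers denote empty elements, since these stay unchanged. The function name sgml_to_mecs and its parameters are hypothetical.

```python
import re

NAME = r"[A-Za-z][\w.-]*"


def sgml_to_mecs(text, empty=frozenset()):
    """Rewrite fully tagged SGML markup in MECS default notation.

    <!-- ... -->  becomes  <|-- ... --|>
    &name;        becomes  {name}
    </gi>         becomes  /gi>
    <gi>          becomes  <gi/   (unless gi is listed in 'empty')

    Limitation: markup-like strings inside comments are converted too.
    """
    text = re.sub(r"<!--(.*?)-->", r"<|--\1--|>", text, flags=re.S)
    text = re.sub(r"&(%s);" % NAME, r"{\1}", text)
    text = re.sub(r"</(%s)>" % NAME, r"/\1>", text)

    def start_tag(match):
        gi = match.group(1)
        return match.group(0) if gi in empty else "<" + gi + "/"

    return re.sub(r"<(%s)>" % NAME, start_tag, text)


if __name__ == "__main__":
    sample = ("<tei.1> may contain <my.tag>my special tags</my.tag>, "
              "<!-- added --> such as <my.stone>, and &tla;. </tei.1>")
    print(sgml_to_mecs(sample, empty={"my.stone"}))
```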
3.7.3 Converting MECS documents to SGML
You may convert EXMECS back to SGML by the following command:
MECSSGML EXMECS EXSGML2
The new file, EXSGML2, is identical to the original file, EXSGML. This
is so because EXMECS contained none of the features peculiar to MECS (naturally,
since we made no changes to the document at all except converting it from
SGML to MECS and then back again to SGML). Documents such as our previous
example, DOC1, which do make use of these special MECS features, however,
can sometimes only be converted to SGML at the cost of some loss or distortion
of information. The command
MECSSGML DOC1 DOC1SGML - MECSDOC R
will give the following result:
+------------------------------------------------------------+
|<MECSDOC>                                                   |
| <!--From EX1: -->                                          |
|<p_dmi><p_el>0</p_el><p_el>6</p_el></p_dmi>                 |
|<REF>1</REF><paragraph><title>Sample MECS                   |
| Document</title></paragraph>                               |
| <intro>                                                    |
|<REF>2</REF><paragraph><indent>3</indent>This is a          |
| sample <b>MECS</b> document which is intended to           |
| demonstrate the use of currently available                 |
| <b>MECS</b> software.</paragraph>                          |
| </intro> <!--From EX2: -->                                 |
|<REF>3</REF><paragraph>                                     |
| <s>We                                                      |
| <s>will see                                                |
| <s>some examples</s> of recursive codes, of                |
| <b>elements <u>which</u></b><u> overlap</u>, of</s>        |
| special characters<ind> like &reverse_A;,                  |
| &reverse_E;, &reverse_E.exist;, and &qEq.exist;,           |
| and of<l> substitutions in                                 |
| <note>simplified</note></s> MECS-WIT                       |
| style:</paragraph>                                         |
| <example>                                                  |
|<REF>4</REF><paragraph><ind>                                |
| <s>Ich besuche gern das alte <i>kleine</i>                 |
| <p_s><p_el>Schloß</p_el><p_el><i>Haus</i></p_el></p_s>     |
| meines <p_s><p_el>Onkels.</p_el><p_el><i>Vaters.</i>       |
| </p_el></p_s></s> <ind>                                    |
| <s>Ich besuche gern das <p_s><p_el>alte Schloß meines      |
| Onkels</p_el><p_el><i>kleine Haus</i> meines               |
| <i>Vaters</i></p_el></p_s></s> </paragraph>                |
| </example>                                                 |
|<REF>5</REF><paragraph>This is the end of our               |
| <note>very artificial</note> example.</paragraph>          |
|</MECSDOC>                                                  |
+------------------------------------------------------------+
This is a well-formed SGML document instance. However, the document originated in MECS and therefore does not contain any SGML DTD. Multi-element tags have been converted to SGML elements, and character disambiguation codes have been merged with character representation codes to form SGML entities. Moreover, the document originally contained overlapping elements and MECSSGML has enforced a hierarchical structure on the output file (cf. 3.).
(By omitting the last parameter 'R' you might have instructed MECSSGML not to do this, but you would then have had to design at least two DTDs and use the CONCUR feature of SGML in order to make the document completely SGML-conforming.)
Consequently, so much of the information in DOC1 may have been distorted or even lost in the conversion to SGML that there may be no way you can automatically convert DOC1SGML back to MECS and obtain a result equivalent to the document you started with.
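One common way of forcing overlapping elements into a hierarchy is to close and immediately reopen the inner element at the point of overlap, which is what the 'b' and 'u' elements in the output above look like. The Python sketch below illustrates this technique; it is not taken from MECSSGML, and the event representation and the function name enforce_hierarchy are assumptions made for the example.

```python
def enforce_hierarchy(events):
    """Serialize a stream of markup events as properly nested SGML.

    'events' is a list of (kind, value) pairs, where kind is "open",
    "close" or "text". When an element is closed while other elements
    opened after it are still open, those elements are closed first
    and reopened afterwards.
    """
    stack, out = [], []
    for kind, value in events:
        if kind == "text":
            out.append(value)
        elif kind == "open":
            stack.append(value)
            out.append("<%s>" % value)
        elif kind == "close":
            if value not in stack:
                raise ValueError("end tag without start tag: " + value)
            reopened = []
            while stack[-1] != value:
                inner = stack.pop()
                out.append("</%s>" % inner)
                reopened.append(inner)
            stack.pop()
            out.append("</%s>" % value)
            for inner in reversed(reopened):
                stack.append(inner)
                out.append("<%s>" % inner)
    return "".join(out)


if __name__ == "__main__":
    overlapping = [("open", "b"), ("text", "elements "), ("open", "u"),
                   ("text", "which"), ("close", "b"),
                   ("text", " overlap"), ("close", "u")]
    print(enforce_hierarchy(overlapping))
```

The information that 'which' and ' overlap' once belonged to a single 'u' element is no longer recoverable from the output, which illustrates the kind of distortion described above.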
In sum, this calls for the following precautions: if you use the MECS Program Package to process documents with the intention to convert them to SGML without loss or distortion of information, you should:
£ < > < / / > £ £ ! £ £ £ £ £ { £ £ }
3.8 Project management
If you work with the MECS Program Package for a while, you will probably build up a number of CDTs, each of which contains a substantial number of code declarations. You will tend to define in the course of your work a corresponding or probably larger number of PDTs, and your document files may become numerous, many of them larger than just a couple of hundred Kb.
At some stage of this process you may easily lose control unless you have taken steps to prevent things from getting out of hand.
Large CDTs and PDTs are most conveniently created, maintained and documented by means of a database program. Almost any database program will serve the purpose, as long as it enables you to output your CDTs and PDTs in flat ASCII files. Maintaining them in a database has the additional advantage of enabling you to add vital information such as free text descriptions of application criteria for codes, examples of usage etc.
The Registration Standard of the Wittgenstein Archives at the University of Bergen, MECS-WIT (Huitfeldt 1997), provides an example. It is stored in a database, and the CDT, all PDTs, as well as the so-called Code Book, which includes a full description of all codes, an alphabetical summary etc., are output from this database.
CDTs and PDTs are most conveniently held in a separate directory included in the path string of your AUTOEXEC.BAT file. Both MECSVAL and MECSPRES will retrieve all files in path directories.
The MECSVAL editor is not a swapping editor, i.e. it is unable to edit files larger than the size of conventional DOS memory available. Since MECSVAL also holds the entire CDT in memory, the amount of memory available to the editor will depend on the size of your CDT.
A simple solution, if you run out of memory, is to use another editor. Any editor which enables you to output your files in so-called flat ASCII files with a maximum line length of 255 bytes may be used.
The MECS Program Package also provides another solution: you may input a sequence of document files to any of the programs by replacing the command line input file name with a slash immediately followed by the name of a file which contains a list of the document files you intend for processing. The programs will create *.ERR-files if errors are encountered.
There is no limit to the number of files you may include in such an input file list, but each file must be a well-formed MECS document on its own. E.g., you cannot include the start tag of a code in one file and the end tag of the same code in a succeeding file. If the files contain MECS headers, their headers should be identical.
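The slash convention for input file lists is easy to mimic in scripts that prepare or post-process MECS runs. The following Python sketch shows one way of resolving such a parameter into a list of document file names. The function name expand_input and the optional read_text parameter (used here so that the example can run without touching the file system) are hypothetical and not part of the MECS Program Package.

```python
from pathlib import Path


def expand_input(arg, read_text=None):
    """Resolve an input file parameter into a list of file names.

    A parameter starting with a slash names a file which lists one
    document file per line; any other parameter is a single file.
    """
    if not arg.startswith("/"):
        return [arg]
    if read_text is None:
        read_text = lambda name: Path(name).read_text()
    listing = read_text(arg[1:])
    return [line.strip() for line in listing.splitlines() if line.strip()]


if __name__ == "__main__":
    fake_listing = "C:\\MECSDOC\\DOC1\nC:\\MECSDOC\\DOC2\nC:\\MECSDOC\\DOC3\n"
    print(expand_input("DOC1"))
    print(expand_input("/DOC", read_text=lambda name: fake_listing))
```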
Some of the programs in the package require or accept a large number of command line parameters. Within one and the same project most of the parameters will often or always be identical, and several programs will be run over and over again in identical sequences. This naturally calls for batch processing.
Let us assume that the current path string contains the directories C:\MECS and C:\MECSUTIL, that the directory C:\MECS stores the programs of the MECS Program Package, that the directory C:\MECSUTIL stores the files PROJECT.CDT, PRO-N.PDT, PRO-D.PDT, PRO-BETA.PDT and MYJOB.BAT, and that the contents of MYJOB.BAT is:
echo off
if exist %1.err del %1.err
MECSVAL PROJECT.CDT /%1 %1.LOG
if exist %1.err goto end
MECSFORM /%1 / 75 R paragraph s 3 REF 1
if exist %1.err goto end
MECSLYSE /%1 /%1.tr1 o paragraph R
if exist %1.err goto end
MECSLYSE /%1 /%1.tr2 - - O REF
if exist %1.err goto end
MECSPRES PRO-BETA.PDT /%1 TEMPFILE.TMP B A I
if exist %1.err goto end
BETATXT TEMPFILE.TMP %1.BET
if exist %1.err goto end
DEL TEMPFILE.TMP
MECSPRES PRO-N.PDT /%1 /%1.ANW N W I
if exist %1.err goto end
MECSPRES PRO-D.PDT /%1 /%1.BDW D W I
if exist %1.err goto end
MECSSGML /%1 /%1.GML
:end
if exist %1.err type %1.err
if not exist %1.err echo Normal termination
Let us further assume that the directory C:\MECSDOC stores the MECS document files DOC1, DOC2 and DOC3, that the current directory stores the file DOC, and that the contents of DOC is:
C:\MECSDOC\DOC1
C:\MECSDOC\DOC2
C:\MECSDOC\DOC3
With the assumed configuration, the command
4.1 General Features and Command Line Parameters
All files input to the programs in the MECS Program Package must be so-called flat ASCII files with a maximum line length of 255 characters. Some of the programs may create output files of different formats and line lengths, depending on specifications given by the user. Maximum length of generic identifiers is 50 characters for MECSVAL, 30 characters for the other programs.
All programs accept or require a number of command line parameters. Optional parameters may be omitted or replaced by a dash. The following conventions are used to indicate the format of such parameters:
MECSVAL should be used to check documents for syntax errors before they are treated by any other program in the package. The other programs may produce unpredictable results if used on documents which are not MECS-conforming. (ALPHATXT and BETATXT are the only exceptions to this rule.)
MECSVAL is the only program in the package accepting documents in so-called SGML mode. All other programs can only be used with documents which begin with a MECS header (except MECSPRES), which are fully MECS-conforming, and which do not require parsing in SGML mode. (ALPHATXT and BETATXT are the only exceptions to this rule.)
All the programs in the package create so-called 'err-files' if errors are encountered. I.e., if a program reports an error, it also creates a new file with the same name as the current input file and the file name extension 'err'. If the err-file already exists, the error message will be appended to the existing err-file.
All programs may create temporary files called 'TEMPFILE.*' or '*.TMP' during execution. Normally, such temporary files are deleted before the program terminates, but if the program terminates abnormally they may still remain in the local directory.
MECSVAL is a validating parser-editor for MECS version 2 documents. The program checks Code Declaration Tables (CDTs) and documents for MECS conformance, and deduces CDTs from MECS-conforming documents. The program may be run either in batch mode or in interactive mode.
Usage:
MECSVAL file_in read_file file_out SGML STRICT (RED|NORED)
7|com
The simplest way to start MECSVAL in interactive mode is to enter the command MECSVAL and press return. MECSVAL's main menu looks like this:
+------------------------------------------------------------+
|C:\MYDIR          Mem: 433018           MECSVAL version 2.01|
+------------------------------------------------------------+
|L LOG:      I Info                        1 List directory  |
|C CDT:      S Switches                    2 Change directory|
|T TXT:      M Create Minimal CDT          3 Copy file       |
|E EDT:      D Check CDT                   4 Print file      |
|Q Quit      V Check CDT and TXT           5 Delete file     |
+------------------------------------------------------------+
The following commands and options are available from the main menu:
The menu options 1-5 in the rightmost menu column provide access from MECSVAL to some of the most frequently used DOS operating system commands, and should be self-explanatory.
Many of the above-mentioned file specifications and other options may be initialized from the command line. MECSVAL accepts 1-7 command line parameters:
1 | file_in |
2 | read_file |
3 | file_out |
4, 5, 6, 7 |
If the third command line parameter is omitted (or replaced by a dash), MECSVAL enters interactive mode, and the effect of any other command line parameter is to initialize the corresponding option listed on the main menu.
The command
MECSVAL
simply starts MECSVAL; while
MECSVAL MYDEF.CDT MYTXT.TXT
starts MECSVAL with MYDEF.CDT as default CDT and MYTXT.TXT as default
document file; and
MECSVAL MYDEF.CDT MYTXT.TXT - 7 STRICT
starts MECSVAL with MYDEF.CDT as default CDT and MYTXT.TXT as default
document file. 8-bit ASCII is not allowed, and all codes must be hierarchically
nested.
MECSVAL - MYTXT.TXT
deduces a minimal CDT called MYTXT.CDT (any existing file by the name
MYTXT.CDT will be overwritten) from MYTXT.TXT, and starts MECSVAL with
MYTXT.CDT as default CDT and MYTXT.TXT as default document file.
The program will enter batch mode if and only if a log file name (third command line parameter) has been specified. Any existing file with the same name will be overwritten.
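The rules for choosing between interactive and batch mode can be restated as a small decision procedure. The Python sketch below is an interpretation of the rules as described in this section, not code from MECSVAL. The function name parse_mecsval_args and the returned field names are hypothetical, and only the switches SGML, STRICT, RED, NORED and 7 are recognized.

```python
import ntpath

SWITCHES = {"SGML", "STRICT", "RED", "NORED", "7"}


def _value(args, index):
    """Return parameter 'index', or None if omitted or given as a dash."""
    if index < len(args) and args[index] != "-":
        return args[index]
    return None


def parse_mecsval_args(args):
    """Interpret a MECSVAL-style parameter list (without program name)."""
    cdt, text, log = (_value(args, i) for i in range(3))
    switches = {a.upper() for a in args[3:] if a.upper() in SWITCHES}
    minimal_cdt = None
    if cdt is None and text is not None:
        # No CDT given: a minimal CDT is deduced and named after the
        # document file, with the extension .CDT.
        minimal_cdt = ntpath.splitext(text)[0] + ".CDT"
    return {
        "mode": "batch" if log is not None else "interactive",
        "cdt": cdt,
        "text": text,
        "log": log,
        "minimal_cdt": minimal_cdt,
        "switches": switches,
    }


if __name__ == "__main__":
    print(parse_mecsval_args(["MYDEF.CDT", "MYTXT.TXT", "MYTXT.LOG", "NORED", "7"]))
    print(parse_mecsval_args(["-", "MYTXT.TXT"]))
```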
The command
MECSVAL MYDEF.CDT MYTXT.TXT MYTXT.LOG NORED 7
makes MECSVAL read the CDT MYDEF.CDT and check it for errors. If no
errors are found in the CDT, the document file MYTXT.TXT is checked against
the declarations given in MYDEF.CDT. Status information and possible error
messages are written to MYTXT.LOG. Any markup reduction and ASCII values
above 127 will produce error messages.
MECSVAL - MYTXT.TXT MYTXT.LOG
makes MECSVAL read the document file MYTXT.TXT and write MYTXT.TXT's
minimal CDT to a new file, MYTXT.CDT (note that any existing file with
the same name will be overwritten). Status information and possible error
messages are written to MYTXT.LOG.
The command
MECSVAL MYDEF.CDT /MYTEXTS MYTEXTS.LOG
makes MECSVAL read and check the CDT MYDEF.CDT, and then read and check
all files listed in the file MYTEXTS. Status information and possible error
messages are written to MYTEXTS.LOG.
The file MYTEXTS must consist of a list of document file
names, each file name on a separate line, e.g. like this:
C:\MYDIR\MYFIRST
C:\MYDIR\MYSECOND
C:\MYDIR\MYLAST
If drives and directories are not specified, MECSVAL will assume that
the files are to be found in the current drive and directory.
If a minimal CDT is deduced (i.e., the first parameter
is a dash), the first of the files listed should contain a MECS header,
and the other files should contain either an identical header or no header
at all.
If and only if errors are found MECSVAL will also write a brief message to a file with a name identical to the document file and the extension '.ERR'. If the file already exists, the message will be appended to it.
Character left | Left arrow or Ctrl+S |
Character right | Right arrow or Ctrl+D |
Word left | Ctrl+Left arrow or Ctrl+A |
Word right | Ctrl+Right arrow or Ctrl+F |
Line up | Up arrow or Ctrl+E |
Line down | Down arrow or Ctrl+X |
Scroll up | Ctrl+W |
Scroll down | Ctrl+Z |
Page up | PgUp or Ctrl+R |
Page Down | PgDn or Ctrl+C |
Beginning of file | Ctrl+PgUp or Ctrl+Q R |
End of file | Ctrl+PgDn or Ctrl+Q C |
Beginning of line | Home or Ctrl+Q S |
End of line | End or Ctrl+Q D |
Top of screen | Ctrl+Home or Ctrl+Q E |
Bottom of screen | Ctrl+End or Ctrl+Q X |
Go to line | Ctrl+J L |
Go to column | Ctrl+J C |
Top of block | Ctrl+Q B |
Bottom of block | Ctrl+Q K |
Jump to marker | Ctrl+Q 0..Ctrl+Q 9 |
Set marker | Ctrl+K 0..Ctrl+K 9 |
Previous cursor position | Ctrl+Q P |
New line | Enter or Ctrl+M |
Insert line | Ctrl+N |
Insert control character | Ctrl+P |
Tab | Tab or Ctrl+I |
Delete current character | Del or Ctrl+G |
Delete character left | Backspace or Ctrl+H |
Delete word | Ctrl+T |
Delete to end of line | Ctrl+Q Y |
Delete line | Ctrl+Y |
Find pattern | Ctrl+Q F |
Find and replace | Ctrl+Q A |
Find next | Ctrl+L |
Abandon file | Ctrl+K Q |
Save and continue edit | Ctrl+K S |
Save and exit | Ctrl+K X or F2 |
Save to file | Ctrl+K N |
Add window | Ctrl+O A or Shift+F3 |
Next window | Ctrl+O N or F6 |
Previous window | Ctrl+O P or Shift+F6 |
Resize current window | Ctrl+O S |
Begin block | Ctrl+K B or F7 |
End block | Ctrl+K K or F8 |
Copy block | Ctrl+K C |
Move block | Ctrl+K V |
Delete block | Ctrl+K Y |
Hide block | Ctrl+K H |
Mark current word as block | Ctrl+K T |
Read block from file | Ctrl+K R |
Write block to file | Ctrl+K W |
Toggle insert mode | Ctrl+V or Ins |
Toggle autoindent mode | Ctrl+Q I |
Toggle marker display | Ctrl+K M |
Change directory | Ctrl+J D |
Show version | Ctrl+J V |
Show available memory | Ctrl+J R |
Set undo limit | Ctrl+J U |
Set default extension | Ctrl+J E |
Abort command | Ctrl+U |
Undo last deletion | Ctrl+Q U |
Restore line | Ctrl+Q L |
MECSFORM is a code formatter for MECS version 2 documents. The program formats MECS-conforming documents by either extending, retaining, or reducing codes, removing trailing blanks and trailing blank lines, optionally indenting specified codes and/or inserting specified reference codes.
Usage:
MECSFORM read_file write_file # (E|R|X) gi gi # gi # gi gi
The program accepts 1-11 command line parameters:
1 | read_file | required |
2 | write_file | optional |
3 | # | optional |
4 | (E|R|X|-) | optional |
5 | gi | optional |
6 | gi | optional |
7 | # | required if 5 and/or 6 are set |
8 | gi | optional if parameter no 5 set |
9 | # | required if parameter no 8 set |
10 | gi | optional if parameter no 8 set |
11 | gi | optional if parameter no 7 set |
Examples
The command
MECSFORM MYFILE NEWFILE 60 R
reads MYFILE and creates a new file called NEWFILE, with a maximum
line length of 60 characters, removes trailing blanks and trailing blank
lines, and reduces codes to their minimal form wherever possible.
The command
MECSFORM /MYFILES /NEWFILE 78 E
reads all files specified in the file MYFILES and writes all output
to one file called NEWFILE, which will be overwritten without notice if
it already exists. The document is formatted to a maximum line length of
78, and all reduced codes are extended.
The command
MECSFORM /MYFILES / 65 E sec s 2 R 1 doc
reads and overwrites all files specified in the file MYFILES, with
a maximum line length of 65 characters, removes trailing blanks and trailing
blank lines, extends all reduced codes to their full form, inserts linebreaks
before every one-element s- and sec-codes, indents all s- and sec- elements
2 characters, inserts one-element 'R'-codes containing numbers (incrementally,
starting with 1) immediately before every sec-code, and resets the reference
numbering sequence to 1 after every occurrence of a 'doc'-code.
MECSLYSE is a document analyzer for MECS version 2 documents. The program analyzes relationships between the codes of a MECS-conforming document and allows the user to define breakpoints, list all recursive or overlapping codes, and create a tabulated list of the structure of the encoded elements of a document.
Usage:
MECSLYSE read_file write_file type gi (R|O) gi
The program accepts 1 - 8 command line parameters:
1 | read_file | required |
2 | write_file | optional |
3 | type | optional |
4 | gi | required if parameter no 3 set |
5 | (R|O) | optional |
6 | gi | optional |
7 | gi | optional if parameter no 5 set to 'O' |
8 | gi | optional if parameter no 5 set to 'O' |
Examples
MECSLYSE MYFILE MYFILE.OUT - - O
writes a list of all overlapping codes in MYFILE to MYFILE.OUT
MECSLYSE /MYFILES /RESULTS o sentence R ref
reads all files specified in the file MYFILES and writes all output
to one file called RESULTS, which will be overwritten without notice if
it already exists. RESULTS will contain a list of all codes active at start
and end-points of the one-element code 'sentence', all occurrences of recursive
codes, and a code list segmented at all occurrences of the one-element
code 'ref'.
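What it means for two codes to overlap can be stated precisely in terms of a stack of open codes: a code overlaps another if it is closed while a code opened after it is still open. The Python sketch below illustrates this definition. It is not the algorithm used by MECSLYSE, and the event representation and the function name find_overlaps are assumptions made for the example.

```python
def find_overlaps(events):
    """List (closed, still_open) pairs of overlapping codes.

    'events' is a list of (kind, gi) pairs with kind "open" or "close";
    pairs of any other kind (e.g. text) are ignored. An end tag for a
    code that is not open raises ValueError.
    """
    open_codes, overlaps = [], []
    for kind, gi in events:
        if kind == "open":
            open_codes.append(gi)
        elif kind == "close":
            # Position of the innermost open occurrence of this code.
            index = len(open_codes) - 1 - open_codes[::-1].index(gi)
            for crossed in open_codes[index + 1:]:
                overlaps.append((gi, crossed))
            del open_codes[index]
    return overlaps


if __name__ == "__main__":
    events = [("open", "b"), ("open", "u"), ("close", "b"), ("close", "u")]
    print(find_overlaps(events))
```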
MECSGRAB is an element extraction program for MECS version 2 documents. The program 'grabs' specified elements from a document and prints them and/or their line and column reference numbers in a separate file.
Usage:
MECSGRAB read_file write_file type gi (AKRT)
The program accepts 5 command line parameters:
1 | read_file | required |
2 | write_file | optional |
3 | type | required |
4 | gi | required |
5 | (AKRT) | required |
See 3.6.7 for examples of usage of MECSGRAB.
MECSPRES is a reformatter for MECS version 2 documents. The program reads a profile definition table and a MECS-conforming document, and produces a new text formatted according to the profile definition table.
Usage:
MECSPRES file_in read_file write_file (layout) (format) (C|D|I)
(style) (#|M) (title)
4.6.1 Profile Definition Table (PDT)
A profile definition table (PDT) consists of a MECS header, a list of code type indicators, a declaration of valid output characters, optionally a list of fixed notes, and finally a series of profile definitions for individual codes.
A PDT should always start with a MECS header. The header should be followed by a declaration of six code type indicators. The code type indicators are identical to the code type indicators found in Code Declaration Tables, and in the rest of the PDT they serve to reference codes by indicating their type followed by their generic identifier.
The code type indicators are followed by a blank indicator. In the rest of the PDT the blank indicator serves to indicate blank characters for output to the reformatted text.
The blank indicator should be followed by a string declaring valid output characters for output in alpha-format. If this string is replaced by a nil indicator, all input characters are valid output characters.
Optionally, the PDT part may also contain declarations of so-called fixed notes. Fixed notes are strings which may be listed in the beginning of the output text for later reference by letter indexes, or printed in notes. Fixed notes may be referenced with letters 'a'-'u'. (I.e., maximum number of fixed notes is 20.)
In the following example PDT
£ < > < / / > [ / | \ / | / ] { " / }
n o # p r d _
£
# a Linker_Rand
# b Rechter_Rand
o comment b bi Comment:_ £ ! £ £
o doc e b £ £ £ £ £
o vpline n £ £ {#6#39} £ £ £
p blort 3 b|u|i -> / <- b a
r GAMMA £ £ £ {#8#6} £ £ £
the first line contains a MECS header. The second line declares the six code type indicators 'n', 'o', '#', 'p', 'r' and 'd', followed by a declaration of the blank indicator, '_'. The third line contains a declaration of the valid output characters (in this case they are nil, which means that all characters are valid output characters). The next two lines begin with a numeric indicator, which indicates that they declare fixed notes (notes 'a' and 'b', with the values 'Linker_Rand' and 'Rechter_Rand', respectively). The last five lines of the table define the profiles for the one-element codes 'comment', 'doc' and 'vpline', the poly-element code 'blort' and the character representation code 'GAMMA'.
Each profile definition references a code and declares its values for seven parameters. These parameters are called Position, Mode, MarkIn, MarkDel, MarkOut, NoteNumber and NoteType.
The following line from the above example PDT contains a code type and a generic identifier identifying the code in question, followed by values for the seven parameters:
o comment b bi Comment:_ £ ! £ £
This definition says that a one-element code ('o') with the generic identifier 'comment' should be printed in a note ('b'), in bold italics ('bi'), that it should be preceded by the string 'Comment: ' and succeeded by an exclamation mark. Since '£' is the nil indicator in this example PDT, the character '£' in the fourth, sixth and seventh parameter positions indicates that the code is not given any value for these parameters.
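The structure of a profile definition line can be made explicit with a small parsing sketch in Python. It is an illustration based on the description above, not code from MECSPRES: it assumes that a definition consists of exactly nine blank-separated fields, and that the nil indicator and blank indicator are those of the example PDT ('£' and '_'). The names Profile and parse_profile are hypothetical.

```python
from typing import NamedTuple, Optional


class Profile(NamedTuple):
    code_type: str
    gi: str
    position: Optional[str]
    mode: Optional[str]
    mark_in: Optional[str]
    mark_del: Optional[str]
    mark_out: Optional[str]
    note_number: Optional[str]
    note_type: Optional[str]


def parse_profile(line, nil="£", blank="_"):
    """Split a profile definition into its code reference and parameters.

    Nil indicators become None, and blank indicators in parameter
    values are turned into ordinary blanks.
    """
    fields = line.split()
    if len(fields) != 9:
        raise ValueError("expected code type, identifier and 7 parameters")
    code_type, gi, *values = fields
    decoded = [None if v == nil else v.replace(blank, " ") for v in values]
    return Profile(code_type, gi, *decoded)


if __name__ == "__main__":
    print(parse_profile("o comment b bi Comment:_ £ ! £ £"))
```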
With this definition, the input text
xxx <comment/bla bla bla> xxx
will be printed like this:
The first parameter, Position, accepts single character values. Since positions are mutually exclusive, only one position can be declared for any code. The type of the code in question, the layout and the format decide which values are available, and what they mean.
Text may be output to five different buffers: The main buffer, the note buffer, the main line buffer, the left margin buffer and the right margin buffer.
The value of the Position parameter decides which buffer the contents of a code should be sent to. By default, all output is sent to the main buffer, unless otherwise indicated by the Position value.
The Position value may also be used to decide the relative position of a text element within the main buffer, e.g. text may be indented, centered, aligned with right margin, printed in tables or columns etc.
and ## for further details.
The second parameter, Mode, accepts the following values:
The way in which all except the last six of these modes are represented will depend on the format chosen for the output file. Only in one of the available formats, i.e. WordPerfect 5.1 format, will all modes be distinguished from each other.
Modes are not mutually exclusive, and therefore a combination of modes may be declared for a code by giving the mode parameter a string value. For multi-element codes different modes may be assigned to different elements by delimiting the mode indicators by bars. E.g. the following profile definition:
p blort £ b|u|bi £ £ £ £ £
says that for the poly-element code with the generic identifier 'blort' all elements should be printed to the current buffer, the first element in bold ('b'), the last in bold italics ('bi'), and any elements between the first and the last should be underlined ('u'). With this definition, the input text
[blort/4| first | second | third | fourth ]
will be printed like this:
Markers and note indices are always printed in Systems Mode and
Note Reference Mode, which are decided by layout and format.
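As a rough illustration of the bar-delimited Mode value described above, the following Python sketch assigns one mode indicator to each element of a multi-element code. The function name `assign_modes` is hypothetical. The text only documents the three-part case (first | between | last); treating a single indicator as applying to every element is an assumption, and other shapes are rejected.

```python
def assign_modes(mode_string, element_count):
    """Return one mode indicator per element for a bar-delimited Mode value.

    Assumption: "first|middle|last" gives the first indicator to the first
    element, the last indicator to the last element, and the middle one to
    every element in between, as in the 'blort' example. A single indicator
    is applied to all elements (assumed, not documented in the source).
    """
    if element_count < 1:
        raise ValueError("a code needs at least one element")
    parts = mode_string.split("|")
    if len(parts) == 1:
        return parts * element_count
    if len(parts) == 3:
        first, middle, last = parts
        if element_count == 1:
            return [first]
        return [first] + [middle] * (element_count - 2) + [last]
    raise ValueError("unsupported mode string: " + mode_string)


print(assign_modes("b|u|bi", 4))  # ['b', 'u', 'u', 'bi']
```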
4.6.1.5 MarkIn, MarkDel and MarkOut
The third, fourth and fifth parameters, i.e. MarkIn, MarkDel and MarkOut, are strings which are printed respectively before, between, or after the elements of a code. (This is the general rule; there are exceptions.)
MarkIn, MarkDel and MarkOut are printed in Systems Mode.
Characters not included in the ASCII character set may be indicated by means of a convention borrowed from WordPerfect 5.1: &A.-35-n-35-nnn;, where n is the WordPerfect character set number and nnn the character number.
In other than WordPerfect formats, characters indicated in this way will be printed as the corresponding ASCII character, or, if no corresponding ASCII character exists, as '•' (ASCII 254).
E.g. the following profile definition:
p blort 3 £ -> / <- £ £
says that for the poly-element code with the generic identifier 'blort' all elements should be printed to the current buffer. An arrow pointing right should be printed before the first element, a slash between each element, and an arrow pointing left after the last element. With this definition, the input text
[blort/4| first | second | third | fourth ]
will be printed like this:
-> first / second / third / fourth <-
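The MarkIn, MarkDel and MarkOut rule can be sketched in a few lines of Python. This is only an illustration, not the MECSPRES implementation: the function name `render_poly` is hypothetical, the default delimiters '[', '|' and ']' are assumed, and nested codes are not handled.

```python
def render_poly(code, mark_in, mark_del, mark_out):
    """Render a poly-element code such as '[blort/4| a | b ]'.

    Assumes default delimiters and no nested codes. The header part
    ('blort/4': generic identifier and element count) is not printed.
    """
    if not (code.startswith("[") and code.endswith("]")):
        raise ValueError("not a poly-element code: " + code)
    header, *elements = code[1:-1].split("|")
    # MarkIn before the first element, MarkDel between elements,
    # MarkOut after the last element.
    return mark_in + mark_del.join(elements) + mark_out


print(render_poly("[blort/4| first | second | third | fourth ]", "->", "/", "<-"))
# -> first / second / third / fourth <-
```

Because the elements keep their own leading and trailing blanks, the marks need no extra padding to reproduce the spacing of the example.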
4.6.1.6 NoteNumber and NoteType
The sixth parameter, NoteNumber, refers to one of the fixed notes declared in the beginning of the PDT.
The seventh parameter, NoteType, decides how the fixed note will be indicated in the output text:
E.g. the following profile definition:
p blort 3 £ £ £ £ b a
says that for the poly-element code with the generic identifier 'blort'
all elements should be printed to the current buffer. The text should also
reference the fixed note b in style a.
For example, the input text
[blort/4| first | second | third | fourth ]
will be printed like this:
4.6.2 Declaration of Codes of Different Types
In general, the type of the code in question, the layout and the format decide which values are available for each of the seven parameters in a code's profile definition, and what they mean.
Global features are features which are decided by command line parameters given to MECSPRES. They affect the general layout and format of the output file, and sometimes also the effect of certain profile definition parameters.
The layout of the output document is selected by a command line parameter.
Six different layouts are available - they are simply called layouts B, C, D, N, P, and X.
As explained above, text may be output to five different buffers: The main buffer, the note buffer, the main line buffer, the left margin buffer and the right margin buffer. The layout decides whether and how these buffers are printed in the output file. E.g., in layouts D and X the buffers are laid out as follows:
 +-> Main line buffer
 |     +-> Left margin buffer
 |     |    +-> Main buffer
 |     |    |                             +-> Right margin buffer
 |     |    |                             |
 v     v    v                             v
----++---++----------------------------++---+
----+|   ||                            ||   |
     |   ||                            ||   |
     |   ||                            ||   |
     |   ||                            ||   |
     +--------------------------------------+
     |                                      |
     |                                      |
     |                                      | <- Note buffer
Other layouts are printed similarly, with the following exceptions:
If the left margin width is less than 4 or there is no maximum line length, left and right margin buffers are suppressed.
In some layouts tabulators are represented as blanks, in others as tabulator codes or marks appropriate to the selected output file format. In the latter case, the number of tabs is rounded off to a preset interval value.
In some layouts character disambiguation codes are ignored, in others character representation codes are ignored if they occur in conjunction with character disambiguation codes, and in some layouts both character representation codes and character disambiguation codes are printed.
MarkIn, MarkDel, and MarkOut are always printed in Systems Mode; character disambiguation codes are printed in Systems Mode in some layouts.
The layout also affects default values for certain other general features, such as style, maximum line length and positioning of notes.
With some layouts the note buffer is printed at the end of the output file, with others it is printed after each main buffer.
It should be noted that certain formats also affect the positioning of text in the output file. (E.g. fields 2-4 of the main line buffer are only available with format F, while in format A all buffers other than the main buffer are unavailable.)
The effects of the various layout values can be summarized as follows:
Layout value | D | B | N | P | C | X |
Available buffers | | | | | | |
Main line field 1 | X | X | X | X | X | X |
Main buffer | X | X | X | X | X | X |
Left margin buffer | X | | | | | X |
Right margin buffer | X | | | | | X |
Note buffer | X | X | X | X | X | X |
Layout feature | | | | | | |
Systems mode | b | b | b | - | b | b |
Left margin width | 4 | 0 | 0 | 0 | 0 | 4 |
Tabs printed as | Blank | Tab | Tab | Tab | Blank | Blank |
Tabs roundoff value | 1 | 5 | 4 | 3 | 1 | 1 |
Character codes printed | rep | dis | dis | dis | both | rep |
Character disambiguation codes in systems mode | N | Y | N | Y | Y | N |
Default notes position | End | End | End | Main | Main | End |
Default style | CR13 | CR13 | PL14 | PL12 | CR13 | CR13 |
Default max. line length | 54 | - | - | - | 78 | 70 |
In layout D and format W, the selected style affects the default maximum line length: With CR12 maximum line length is 57, with CR13 54, and with CR14 48.
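The interaction between layout, format and style for the default maximum line length can be expressed as a small lookup. The Python sketch below only restates the figures given above; the names `DEFAULT_MAX_LINE_LENGTH`, `LAYOUT_D_FORMAT_W` and `default_max_line_length` are hypothetical, and `None` is used here to stand for "no maximum".

```python
# Default maximum line length per layout, copied from the summary table.
# None stands for "no maximum line length" (the '-' entries).
DEFAULT_MAX_LINE_LENGTH = {"D": 54, "B": None, "N": None,
                           "P": None, "C": 78, "X": 70}

# In layout D with format W the selected style decides the default.
LAYOUT_D_FORMAT_W = {"CR12": 57, "CR13": 54, "CR14": 48}


def default_max_line_length(layout, fmt, style=None):
    """Return the default maximum line length, or None if there is none."""
    if layout == "D" and fmt == "W" and style in LAYOUT_D_FORMAT_W:
        return LAYOUT_D_FORMAT_W[style]
    return DEFAULT_MAX_LINE_LENGTH[layout]


print(default_max_line_length("D", "W", "CR12"))  # 57
print(default_max_line_length("C", "C"))          # 78
```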
Layout X affects the way MECSPRES interprets profile definitions in the PDT in a rather special way: Only codes which have been given the value '1' (excerpt mode) are printed in the output file, and all other parts of the input document are suppressed.
The format of the output document is selected by a command line parameter.
MECSPRES layouts and profile definitions provide access to various text buffers and text fonts and styles. However, not all buffers, fonts and styles are available in all output formats.
E.g. while bold characters can easily be printed in most word processor formats, it is not possible to print bold characters in flat ASCII files. Therefore, a text element which is assigned mode 'b' (bold) in a PDT will be printed in bold if WordPerfect format is chosen; but if the reformatted file is output in flat ASCII format, the same text will be printed without any indication whatsoever that it was assigned a bold value.
The availability of other modes is limited in some of the formats, and their kind of realization may also vary between formats.
Format A (alpha format)
This format has been designed to facilitate preparation of files for input to the program MECSSPEL.
Alpha format suppresses all other buffers than the main buffer, and prints the contents of the main buffer with one word per line, each word being preceded by point or file name, line number and column number.
If a string of valid alpha characters has been defined in the PDT, only characters included in this string will be output to the reformatted text file; all other characters will be suppressed.
There are some position values for one- and multi-element codes which are affected by the choice of alpha format.
In addition to the generally available modes mentioned above, the following modes are available:
Mode | Name | Realization |
d | overstrike | < ... > |
w | redline | < ... > |
z | superscript | ^ |
Format B (beta format)
This format has been designed to facilitate preparation of files for input to the program BETATXT.
There are some position values for one- and multi-element codes which are affected by the choice of beta format.
In addition to the generally available modes mentioned above, the following modes are available:
Mode | Name | Realization |
d | overstrike | < ... > |
w | redline | < ... > |
z | superscript | ^ |
Note indices are printed in superscript, i.e. in the form '^#', where # is the note number.
Format C (plain ASCII format)
This format has been designed to facilitate preparation of files in so-called flat ASCII or DOS format.
In addition to the generally available modes mentioned above, the following modes are available:
Mode | Name | Realization |
d | overstrike | < ... > |
w | redline | < ... > |
z | superscript | ^ |
Note indices are printed in superscript, i.e. in the form '^#', where # is the note number.
Format S (screen display format)
This format has been designed for previewing of output on screen in text mode. (By replacing the third command line parameter with a dash, output is sent to the screen.)
In addition to the generally available modes mentioned above, the following modes are available:
Mode | Name | Realization |
d | overstrike | < ... > |
w | redline | < ... > |
u | underline | underlined |
v | double underline | underlined |
b | bold | intense video |
z | superscript | inverse video |
Note indices are printed in superscript, i.e. inverse video.
Format M (MECS-like markup format)
This format has been designed to facilitate preparation of ASCII files with presentational markup suited for further processing by other programs, e.g. word-processor macros.
In addition to the generally available modes mentioned above, all other modes are available. Text elements printed in these modes are marked '<x/ ... /x>', where x is the relevant MECSPRES mode indicator. All such elements will also be delimited by end and start tags at line endings and buffer limits.
Note indices are printed in superscript, i.e. in the form '<z/#/z>', where # is the note number.
Format F (FolioViews markup format)
This format has been designed to facilitate preparation of files for reading and processing by Folio corporation's desktop publishing program Views, version 2.1.
FolioViews format prints the main line buffer as a separate line, starting with a FolioViews folio marker and a FolioViews group marker containing a replica of the contents of field 1.
Fields 2-4 of the main line buffer are available only with this format. Field 1 is printed in the left margin, field 2 aligned with the left margin, field 3 centered, and field 4 aligned with the right margin.
There are some position values for one- and multi-element codes which are affected by the choice of FolioViews format.
In addition to the generally available modes mentioned above, the following modes are available:
Mode | Name | Realization |
d | overstrike | < ... > |
w | redline | < ... > |
u | underline | underlined (red) |
v | double underline | underlined (red) |
b | bold | bold (blue) |
z | superscript | ^.^.^. |
Note indices are printed in superscript, i.e. in the form '^#', where # is the note number.
Format W (WordPerfect 5.1 format)
WordPerfect format is the only format in which all modes and styles are both available and realized as indicated by the various modes' and styles' names. (Mode b, 'bold', is printed in bold, mode i, 'italics', is printed in italics etc.) Note indices are printed in superscript.
MECSPRES accepts 2 - 9 command line parameters:
1 | file_in | (required) |
2 | read_file | (required) |
3 | write_file | (optional) |
4 | layout (B|C|D|N|P|X) | (optional) |
5 | format (A|B|C|F|S|M|W) | (optional) |
6 | (C|D|I) | (optional) |
7 | style (CR|PL)(12|13|14) | (optional) |
8 | (#|M) | (optional) |
9 | title (string) | (optional) |
The command
MECSPRES - MYFILE
reformats MYFILE in layout D and displays output on the computer screen
in flat ASCII format, ignoring all codes.
MECSPRES DIPLO.PDT MYFILE NEWFILE
reformats MYFILE according to the profile definition table DIPLO.PDT
in layout D and writes output to the file NEWFILE in "flat" ASCII format.
Undeclared codes are ignored.
MECSPRES NORM.PDT /MYFILES NEWFILE N W D
reformats all files listed in MYFILES according to the profile definition
table NORM.PDT in layout N and writes output to the file NEWFILE in WordPerfect
5.1 format. Undeclared codes are suppressed.
MECSPRES PROOF.PDT MYFILE /NEWFILE P W C PL13 - N
reformats MYFILE according to profile definition table PROOF.PDT in
layout P and writes output to the file NEWFILE in WordPerfect 5.1 format,
in Palatino 13 with no maximum line length and no title page. If NEWFILE
already exists, it will be overwritten. Undeclared codes are printed as
codes.
MECSPRES BASE.PDT /MYFILES /NEWFILE B F I - 65
reformats all files listed in /MYFILES according to the profile definition
table BASE.PDT in layout B and writes output to the file /NEWFILE in Folio
Views markup format, with maximum line length 65. If NEWFILE already exists,
it will be overwritten. Undeclared codes are ignored.
MECSPRES DIPLO.PDT MYFILE - D S
reformats MYFILE according to the profile definition table DIPLO.PDT
in layout D and displays output on the computer screen in Screen display
format. Undeclared codes are ignored.
MECSBETA is a document analysis program for MECS version 2 documents. The program computes and prints the betatexts of an input document.
Usage:
MECSBETA file_in read_file file_out
Roughly, a betatext is a text resulting from excluding all except one of the elements of specific multi-element codes, defined as substitutions. All the betatexts of a particular document are generated by systematically varying which element to include from each substitution, until all possible combinations have been exhausted. A fuller explanation of the concept of a betatext is given elsewhere in this guide.
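To make the combination rule concrete, here is a minimal Python sketch that enumerates all betatexts of a single segment. It is not the algorithm used by MECSBETA or BETATXT, which is not documented here; it assumes, purely for illustration, that substitutions are written as `[first|second|...]`, that they may be nested, and that runs of blanks can be collapsed in the result. The names `expand`, `_sequence` and `_substitution` are hypothetical.

```python
def expand(text):
    """Return all betatexts of text; '[a|b]' marks a substitution."""
    variants, pos = _sequence(text, 0)
    if pos != len(text):
        raise ValueError("unbalanced substitution markers")
    # Collapse runs of blanks so that variants are easy to compare.
    return [" ".join(v.split()) for v in variants]


def _sequence(text, pos):
    """Read plain text and substitutions up to '|', ']' or end of text."""
    variants = [""]
    while pos < len(text) and text[pos] not in "|]":
        if text[pos] == "[":
            alternatives, pos = _substitution(text, pos + 1)
            # Every variant so far is combined with every alternative.
            variants = [v + a for v in variants for a in alternatives]
        else:
            variants = [v + text[pos] for v in variants]
            pos += 1
    return variants, pos


def _substitution(text, pos):
    """Read the elements of one substitution, starting after '['."""
    alternatives = []
    while True:
        variants, pos = _sequence(text, pos)
        alternatives.extend(variants)
        if pos >= len(text):
            raise ValueError("unclosed substitution")
        if text[pos] == "]":
            return alternatives, pos + 1
        pos += 1  # skip the '|' between two elements


for betatext in expand("xx xx [pp pp|qq qq] yy yy"):
    print(betatext)
# xx xx pp pp yy yy
# xx xx qq qq yy yy
```

Since every combination is enumerated, the number of betatexts is the product of the numbers of alternatives of the independent substitutions, which is why dividing a document into segments keeps the output manageable.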
Since the number of betatexts generated by a document may be very large, MECSBETA allows the user to define reference points within the document, divide the document into segments, and generate all possible betatexts for each segment separately.
MECSBETA requires a profile definition table. All the usual profile definitions are available in this PDT. In addition, the PDT should define certain one-element codes as reference and segmentation codes, and certain multi-element codes as substitutions. This is indicated by the position values of the codes in question, as follows:
+------------------+-----------------+-----------------------+
|                  | Code type       | PDT position value    |
+------------------+-----------------+-----------------------+
|Reference code    | one-element     | 1 or w                |
+------------------+-----------------+-----------------------+
|Segment code      | one-element     | 2                     |
+------------------+-----------------+-----------------------+
|Substitution code | multi-element   | b                     |
+------------------+-----------------+-----------------------+
MECSBETA excerpts all and only segments containing substitutions, prints their references and displays them as follows: If the contents of the last preceding reference is different from the previous reference printed, the contents of the reference will be printed on a separate line. The part of the segment which precedes the first substitution of the segment will be printed on a separate line, followed by each betatext generated from the substitutions of the segment, each on a separate line starting with '->', followed by the part of the segment succeeding its last substitution on a separate line. If a substitution crosses segment borders all segments containing the substitution will be treated as one segment.
MECSBETA accepts 3 command line parameters:
1 | file_in | required |
2 | read_file | required |
3 | file_out | required |
MECSBETA is a batch program calling two of the other programs included in the MECS program package. Thus, MECSBETA can be included in another batch program either by a CALL command, or by including a copy of it in the other batch program. The contents of MECSBETA.BAT is:
echo off
if exist %2.err del %2.err
MECSPRES %1 %2 /TEMPFILE. B B I
if exist %2.err goto end
BETATXT TEMPFILE. %3
DEL TEMPFILE
:end
Examples
With the following PDT, MYBETA.PDT:
£ < > < / / > [ / | £ / | / ] £ £ £ £
n o £ p £ £ £ £
o R 1 £ £ £ £ £ £
o s 2 £ £ £ £ £ £
p s b £ £ £ £ £ £
and the following document, MYDOC:
<R/1/R> <s/xxx xxx/s>
<R/2/R> <s/xx xx [s/2|pp pp/s|qq qq/s] yy yy/s>
<R/3/R> xxxxx <s/yy yyy/s>
<R/4/R> mmm <s/lll/s> <s/ttt/s>
<s/xx [s/2|aa [s/2| bb/s|cc/s] dd /s| ee /s] ff [s/2|gg /s| hh/s] yy/s>
<s/mmm ttt/s>
the command
2
xx xx
->pp pp
->qq qq
yy yy
4
xx
->aa bb dd ff gg
->aa bb dd ff hh
->aa cc dd ff gg
->aa cc dd ff hh
-> ee ff gg
-> ee ff hh
yy
-----------------------------------
Beta: 12
4.8 BETATXT
Like MECSBETA, BETATXT is a document analysis program which computes and prints the betatexts of an input document. Unlike MECSBETA, however, BETATXT requires input not in MECS format but in a special format called beta format.
Usage:
BETATXT file_in file_out A
Beta format files are flat ASCII files containing special markers for references, segments and substitutions. Therefore, documents will normally require preprocessing by some other program before they can be input to BETATXT. From MECS documents such preprocessing can be done by means of MECSPRES. Irrespective of how the input document is created, however, BETATXT expects to find the following beta markers:
+--------------------------+--------+-------+
|                          | Default|Alter- |
|                          |   ASCII|native |
|                          |   value|value  |
+--------------------------+--------+-------+
|Reference start           |      1 |   {   |
|Reference end             |      2 |   }   |
+--------------------------+--------+-------+
|Segment start             |      3 |   <   |
|Segment end               |      4 |   >   |
+--------------------------+--------+-------+
|Substitution start        |      5 |   [   |
|Substitution delimiter    |      6 |   |   |
|Substitution end          |      7 |   ]   |
+--------------------------+--------+-------+
BETATXT accepts 2 - 3 command line parameters:
1 | file_in | required |
2 | file_out | required |
3 | A | optional |
Examples
For example, with the following input file BETAFORM:
{1} <xxx xxx>
{2} <xx xx [pp pp|qq qq] yy yy>
{3} xxxxx <yy yyy>
{4} mmm <lll> <ttt>
<xx [aa [ bb|cc] dd | ee ] ff [gg | hh] yy>
<mmm ttt>
The command
2
xx xx
->pp pp
->qq qq
yy yy
4
xx
->aa bb dd ff gg
->aa bb dd ff hh
->aa cc dd ff gg
->aa cc dd ff hh
-> ee ff gg
-> ee ff hh
yy
-----------------------------------
Beta: 12
4.9 MECSSPEL
MECSSPEL is an interactive spell checking program for MECS version 2 documents.
Usage:
MECSSPEL file_in file_in file_in
MECSSPEL reads a master word list, a PDT and a MECS document. If the program encounters a word in the document which is not included in the master word list (i.e. a "new" word), the user is prompted to reject or accept the new word. Finally, the program produces three separate output files - one containing new accepted words, one containing new rejected words, and one containing statistical information on the document.
MECSSPEL calls the reformatting program MECSPRES in alpha format. All the usual profile definitions are available in the PDT input to MECSSPEL. It should be noted that some mode and position values are provided especially for alpha format, or have special functions in this format:
The following modes are of particular relevance to spell checking:
MECSSPEL also allows the user to specify that the contents of certain codes should be regarded as phrases (even though they contain word delimiters). In addition, the PDT may define certain codes as section or segment codes used in the statistical calculations performed by the program. This is indicated by the PDT position values of the codes in question, as follows:
+------------------+-----------------+-----------------------+
|                  | Code type       | PDT position value    |
+------------------+-----------------+-----------------------+
|Section code      | one-element     | e                     |
+------------------+-----------------+-----------------------+
|Segment code      | one-element     | 2                     |
+------------------+-----------------+-----------------------+
|                  | one-element     | 3                     |
|Phrase code       +-----------------+-----------------------+
|                  | multi-element   | p                     |
+------------------+-----------------+-----------------------+
MECSSPEL requires 3 command line parameters:
1 | file_in | required |
2 | file_in | required |
3 | file_in | required |
MECSSPEL is a batch program calling two of the other programs included in the MECS program package. Thus, MECSSPEL can be included in another batch program either by a CALL command, or by including a copy of it in the other batch program. The contents of MECSSPEL.BAT is:
echo off
if exist %3.err del %3.err
MECSPRES %2 %3 %3.TMP B A I
if exist %3.err goto end
if not exist %3.TMP goto end
ALPHATXT -R - %1 - %3.TMP %3.WL %3.CHK %3.STS
del %3.TMP
echo.
echo Accepted words on %3.WL
echo Rejected words on %3.CHK
echo Statistics on %3.STS
:end
Examples of usage of MECSSPEL are given elsewhere in this guide.
Like MECSSPEL, ALPHATXT is a program which may be used for interactive spell checking of documents. Unlike MECSSPEL, however, ALPHATXT may also perform certain additional tasks, such as the production of word lists sorted according to user-defined character sort criteria, frequency word lists, and simple statistical analyses. ALPHATXT accepts input files in so-called flat ASCII format as well as alpha format.
Usage:
ALPHATXT (ACEFILNORS) file_in file_in file_in file_in
file_out file_out file_out
Although ALPHATXT does not itself accept input in MECS format, the program has been developed precisely to satisfy the need for spell checking and vocabulary control on MECS documents. Ordinary spell checkers are not able to distinguish markup from content and are therefore not suitable for use directly on marked-up documents. The normal procedure is rather to spell check reformatted versions of marked-up documents. However, detecting errors in derived rather than primary documents leads to problems in tracking their exact sources in the primary, marked-up documents. With a combined use of MECSPRES and ALPHATXT these problems can be overcome.
4.10.1 Command Line Parameters
ALPHATXT accepts 6-8 command line parameters:
1 | options | optional |
2 | file_in | optional |
3 | file_in | optional |
4 | file_in | optional |
5 | file_in | optional |
6 | file_out | optional |
7 | file_out | optional |
8 | file_out | optional |
If parameters 6 and 7 are both specified, ALPHATXT enters interactive mode and prompts the user to accept or reject each word in the input file which is not found in the master or additional word lists.
Alpha format files are flat ASCII files containing four strings per line. The first string consists of either a point, a dash or a number sign and/or a file name. The second and third strings are numbers (indicating line number and column number, respectively). The fourth string is a word.
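Reading such a line is straightforward. The following Python sketch splits one alpha format line into its four strings; the names `AlphaEntry` and `parse_alpha_line` are hypothetical, and the decision to keep any remaining blanks inside the fourth string (so that phrases stay in one piece) is an assumption.

```python
from typing import NamedTuple


class AlphaEntry(NamedTuple):
    marker: str   # point, dash, number sign and/or file name
    line: int     # line number in the source document
    column: int   # column number in the source document
    word: str     # the word (or phrase) itself


def parse_alpha_line(text):
    """Split one alpha format line into marker, line, column and word."""
    fields = text.strip().split(None, 3)
    if len(fields) != 4:
        raise ValueError("expected four strings: " + repr(text))
    marker, line, column, word = fields
    return AlphaEntry(marker, int(line), int(column), word)


entry = parse_alpha_line(". 3 9 ist")
print(entry.line, entry.column, entry.word)  # 3 9 ist
```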
4.10.2 Defining an Alphabetic Sort Order
From the following document, MYFILE:
This is an exercise.
The German word for this thing is Übung.
the command
German
The
This
an
exercise.
for
is
thing
this
word
Übung.
As can be seen, the file is listed according to conventional ASCII sort order, with English capital letters first and the German Umlaut 'Ü' last. Punctuation marks have been included in the word strings. The command
an
exercise.
for
German
is
The
thing
This
this
word
Übung.
In this case, upper-case and lower-case characters have been merged in the sort order. However, the German Umlaut still comes last in the alphabet, and punctuation marks are still included. In order to avoid these problems, it is useful to define a character sort order file. With the following sort order file ALPHABET:
AaÄäBbCcDdEeFfGgHhIiJjKkLlMmNn
OoÖöPpQqRrSsTtUuÜüVvWwXxYyZz
0123456789
the command
an
exercise
for
German
is
The
thing
This
this
Übung
word
Thanks to the character sort order file the German Umlaut has been sorted in its proper place, and because option 'e' has been specified on parameter 1 all characters not included in the sort order file (such as punctuation marks) have been excluded.
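The effect of a character sort order file can be imitated with a custom sort key. The Python sketch below is not ALPHATXT's actual algorithm, which is not documented here. It assumes that words are compared first without regard to case, that case only breaks ties, and that characters missing from the sort order are dropped. The names `ALPHABET` and `sorted_word_list` are hypothetical.

```python
# Character sort order, taken from the example file ALPHABET.
ALPHABET = ("AaÄäBbCcDdEeFfGgHhIiJjKkLlMmNn"
            "OoÖöPpQqRrSsTtUuÜüVvWwXxYyZz"
            "0123456789")


def sorted_word_list(text, alphabet=ALPHABET):
    """Return the distinct words of text, sorted by the given alphabet."""
    rank = {ch: i for i, ch in enumerate(alphabet)}
    words = set()
    for token in text.split():
        # Drop every character that is not part of the sort order.
        word = "".join(ch for ch in token if ch in rank)
        if word:
            words.add(word)

    def key(word):
        # Primary key ignores case, secondary key puts 'T' before 't'.
        folded = [rank.get(ch.lower(), rank[ch]) for ch in word]
        exact = [rank[ch] for ch in word]
        return (folded, exact)

    return sorted(words, key=key)


text = "This is an exercise. The German word for this thing is Übung."
print(" ".join(sorted_word_list(text)))
# an exercise for German is The thing This this Übung word
```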
However, the word 'the' with a capital 'T' and the word 'this' with
both upper-case and lower-case first letter are still included in the list.
It is difficult to implement reliable procedures for correct handling of
case in texts which are not marked up in any way. (However, in combination
with MECSPRES, ALPHATXT is capable of implementing such distinctions on
suitably marked-up files - cf. further below.) It may therefore often be
convenient to convert all upper-case characters to lower-case. The command
ALPHATXT CEI ALPHABET - - MYFILE /LIST
will produce the following output file LIST:
an
exercise
for
german
is
the
thing
this
übung
word
4.10.3 Frequency Word Lists and Simple Statistical Analyses
ALPHATXT can also be used to produce frequency word lists and simple
statistics. The command
ALPHATXT EFIN ALPHABET - - MYFILE /LIST - /STAT
will produce the following output file LIST:
is 2
this 2
an 1
exercise 1
for 1
german 1
the 1
thing 1
übung 1
word 1
The file STAT will contain frequency distribution lists for word lengths and word forms etc. Again, however, this part of the program works best with marked-up files.
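A frequency word list of this kind can be sketched in Python with `collections.Counter`. This is an illustration only: the names `ALPHABET` and `frequency_list` are hypothetical, and ordering words of equal frequency by the character sort order is an assumption that happens to agree with the listing above.

```python
from collections import Counter

# Character sort order, taken from the example file ALPHABET.
ALPHABET = ("AaÄäBbCcDdEeFfGgHhIiJjKkLlMmNn"
            "OoÖöPpQqRrSsTtUuÜüVvWwXxYyZz"
            "0123456789")


def frequency_list(words, alphabet=ALPHABET):
    """Return (word, count) pairs, most frequent first."""
    rank = {ch: i for i, ch in enumerate(alphabet)}
    counts = Counter(words)

    def key(item):
        word, count = item
        # Descending frequency; ties follow the character sort order.
        # Characters outside the alphabet sort after all others.
        return (-count, [rank.get(ch, len(alphabet)) for ch in word])

    return sorted(counts.items(), key=key)


words = "this is an exercise the german word for this thing is übung".split()
for word, count in frequency_list(words):
    print(word, count)
```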
In the previous example, the file LIST was sorted in order of descending
frequency. If we want to produce from MYFILE a word list for use in later
spell checking, it must be sorted in either default or user-specified alphabetical
order. The command
ALPHATXT EI ALPHABET - - MYFILE /MASTER
will produce the following output file MASTER:
an
exercise
for
german
is
the
thing
this
übung
word
In order to show how spell checking works, let us assume that you add some new text to MYFILE, e.g. as follows:
This is an exercise.
The German word for this thing is Übung.
This ist a new exercise.
The command
a
new
while REJECT will look like this:
. 3 9 ist
indicating that the misspelt word 'ist' occurs in line 3, column 9. After you have corrected the misspelt word 'ist' to 'is', you can check the document again with the command
Since MYFILE should by now not contain any word not included either
in MASTER or ACCEPT, you should not be prompted for any words, and the
files ACCEPT2 and REJECT should be empty. Alternatively, you may give the
command
ALPHATXT EIO ALPHABET MASTER ACCEPT MYFILE - /REJECT
and check that the file REJECT is empty.
Normally, you would now want to include the new accepted words in the
file ACCEPT into the master word list MASTER for later use. This can be
done with the command:
ALPHATXT EIO ALPHABET MASTER ACCEPT - /MASTER
which will produce the following new MASTER:
a
an
exercise
for
german
is
new
the
thing
this
übung
word
In the examples above, ALPHATXT was run with the O-option active on parameter 1. This may speed up processing with large master word lists quite drastically, but presupposes that the master word list is already ordered in accordance with the specified alphabetical sort order.
4.10.5 Working with Marked-up Documents
So far we have been looking at examples of uses of ALPHATXT with ordinary ASCII format documents (running text files). The strength of ALPHATXT, however, is its ability to work with alpha format files produced from MECS documents with MECSPRES.
With the following MECS document MYCODE:
£ £ £ < / / > £ £ £ £ £ £ £ £ £ £ £ £
<sec/
 <s/This is an exercise.>
 <s/The <nationality/German> word for
  this thing is <german/Übung>.>>
<sec/
 <s/This is a new exercise.>>
and the following PDT, MYOLD.PDT:
£ £ £ < / / > £ £ £ £ £ £ £ £ £ £ £ £
n o # p r d _ £
o sec 4 £ £ £ £ £ £
the command
This is an exercise.
The German word for this thing is Übung.
This is a new exercise.
This file is exactly identical to the document MYFILE, which was the departure point for previous examples. However, since the source file, MYCODE, is suitably marked up with codes for sections, sentences, foreign words etc., we are in a better position to perform case sensitive vocabulary control and simple statistical analyses with ALPHATXT. For example, with the following PDT, MYCODE.PDT:
£ £ £ < / / > £ £ £ £ £ £ £ £ £ £ £ £
n o # p r d _ £
o german d £ £ £ £ £ £
o nationality £ n £ £ £ £ £
o s 2 o £ £ £ £ £
o sec e £ £ £ £ £ £
the command
# 3 5 this
. 3 10 is
. 3 13 an
. 3 16 exercise.
- 4 5 the
. 4 22 German
. 4 30 word
. 4 35 for
. 5 3 this
. 5 8 thing
. 5 14 is
. 5 31 .
# 7 5 this
. 7 10 is
. 7 13 a
. 7 15 new
. 7 19 exercise.
This is an alpha format file, which can be input to ALPHATXT. The command
a
an
exercise
for
German
is
new
the
thing
this
word
According to the specifications in MYCODE.PDT, upper-case letters at the beginning of sentences have been changed to lower case, the upper-case 'G' in 'German' has been preserved, and the foreign word 'Übung' has been left out.
ALPHATXT can also be used in combination with MECSPRES for spell checking of MECS documents; examples are given elsewhere in this guide.
The command
ALPHATXT EFNR ALPHABET - - MYCODE.TMP /LIST - /STAT
produces a frequency word list LIST and a statistical analysis file
STAT. The contents of LIST is:
is 3
this 3
exercise 2
a 1
an 1
for 1
German 1
new 1
the 1
thing 1
word 1
whereas STAT looks like this:
mycode.tmp
------------------------------
String length in characters
Chars Strings
1 1
2 4
3 3
4 4
5 1
6 1
8 2
Sum: 61 16
------------------------------
Segment length in strings
Strings Segments
4 1
5 1
7 1
Sum: 16 3
------------------------------
Section length in strings
Strings Sections
5 1
11 1
Sum: 16 2
------------------------------
String tokens per string type
Tokens Types
1 8
2 1
3 2
Sum: 16 11
------------------------------
Chars: 61
Strings: 16 Chars/String: 3.81 Min: 1 Max: 4 StdDv: 1.98
Segments: 3 Strings/Segment: 5.33 Min: 1 Max: 1 StdDv: 2.22
Sections: 2 Strings/Section: 8.00 Min: 1 Max: 1 StdDv: 2.94
Types: 11 Tokens/Type: 1.45 Min: 1 Max: 8 StdDv: 0.97
In the summary at the bottom of the file, the leftmost column of numbers indicates absolute counts, i.e. MYCODE.TMP contains 61 characters, 16 strings (words), 3 segments (sentences), 2 sections and 11 string types (word forms). The second column of numbers indicates average values, the third and fourth columns indicate minimum and maximum values, and the fifth column indicates standard deviation.
MECSSGML is a code converter for MECS version 2 documents. The program converts MECS-conforming documents to SGML-conforming document instances.
Usage:
MECSSGML read_file write_file # element (R7)
The program accepts 1-5 parameters:
1 | read_file | required |
2 | write_file | optional |
3 | # | optional |
4 | element | optional |
5 | (R7) | optional |
No matter which delimiters are used in the input file, the delimiters used in the output file will be those of SGML's concrete reference syntax. In the table below, MECS input is exemplified by MECS default delimiters.
+----------------------------+--------------------------+
|Input file - MECS           | Output file - SGML       |
|default delimiters          | Concrete Reference Syntax|
+----------------------------+--------------------------+
|No-element code             | Empty element            |
|<tag>                       | <tag>                    |
+----------------------------+--------------------------+
|One-element code            | Element                  |
|<tag/ ... /tag>             | <tag> ... </tag>         |
|<tag/ ... >                 |                          |
+----------------------------+--------------------------+
|Poly-element code           | Elements                 |
|[tag/#| ... /tag| ... /tag] | <p_tag><p_el> ... </p_el>|
|[tag| ... | ... ]           |        <p_el> ... </p_el>|
|                            | </p_tag>                 |
+----------------------------+--------------------------+
|Character codes             | Entities                 |
|{tag}                       | &tag;                    |
|{tag\tag}                   | &tag.tag;                |
|{"x"\tag}                   | &qxq.tag;                |
+----------------------------+--------------------------+
Examples
The command
MECSSGML MYFILE NEWFILE
reads MYFILE and writes an SGML document instance to NEWFILE.
The command
MECSSGML MYFILE NEWFILE 60 document R7
reads MYFILE and writes an SGML document instance called 'document'
to NEWFILE. Maximum line length is 60 characters, all overlapping elements
will be modified to hierarchical structures, and 8-bit ASCII will be converted
to SGML entities.
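Part of the delimiter mapping performed by MECSSGML can be imitated with regular expressions. The Python sketch below is a simplification and not the program's implementation: it assumes MECS default delimiters, handles only one-element codes with explicit end tags and simple character representation codes, and ignores poly-element codes, minimized end tags and overlapping elements. The function name `mecs_to_sgml` is hypothetical.

```python
import re


def mecs_to_sgml(text):
    """Convert a small subset of MECS markup to SGML-style markup."""
    # Character representation codes: {tag} becomes &tag;
    text = re.sub(r"\{(\w+)\}", r"&\1;", text)
    # Start tags of one-element codes: <tag/ becomes <tag>
    text = re.sub(r"<(\w+)/", r"<\1>", text)
    # Explicit end tags of one-element codes: /tag> becomes </tag>
    text = re.sub(r"/(\w+)>", r"</\1>", text)
    return text


print(mecs_to_sgml("<s/Gr{uuml}n/s>"))  # <s>Gr&uuml;n</s>
```

No-element codes such as `<tag>` pass through unchanged, which matches the first row of the table above.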
SGMLVAL is a validating MECS parser for SGML documents. The program validates SGML documents for MECS conformance.
Usage:
SGMLVAL read_file file_out file_out
The program accepts 2-3 parameters:
1 | read_file | required |
2 | file_out | required |
3 | file_out | optional |
SGMLVAL is a batch program calling MECSVAL. Thus, SGMLVAL can be included in another batch program either by a CALL command, or by including a copy of it in the other batch program. The contents of SGMLVAL.BAT is:
ECHO OFF COPY C:\MECS\HEADSGML + %1 TEMPFILE.TMP MECSVAL - TEMPFILE.TMP %2 SGML IF NOT "%3" == "" COPY TEMPFILE.CDT %3 IF "%3" == "" COPY TEMPFILE.CDT %1.CDT DEL TEMPFILE.TMP DEL TEMPFILE.CDTIt should be noted that SGMLVAL presupposes that the MECS Program Package has been installed on drive C:, directory MECS ().
SGMLVAL provides a quick and easy-to-use test of SGML documents for MECS conformance. See the description of MECSVAL for further details and for other options provided by that program. Please note also that even if SGMLVAL reports errors, the program SGMLMECS may still convert the document to MECS successfully.
SGMLMECS is a code converter for SGML files. The program converts SGML-conforming files to MECS-conforming documents.
Usage:
SGMLMECS read_file write_file
The program accepts 1-2 parameters:
1 | read_file | required |
2 | write_file | optional |
The input file must use SGML's reference concrete syntax. SGMLMECS will convert the file to a well-formed MECS document using a subset of MECS default delimiters, as follows:
+---------------------------------+--------------------------+
|              SGML               |           MECS           |
+-----------------+---------------+------------+-------------+
| element         | reference     | code       | default     |
|                 |concrete syntax|            | delimiters  |
+-----------------+---------------+------------+-------------+
|empty element    | < >           | no-element | < >         |
+-----------------+---------------+------------+-------------+
|element          | < > </ >      | one-element| < / / >     |
+-----------------+---------------+------------+-------------+
|entity           | & ;           | char.rep.  | { }         |
+-----------------+---------------+------------+-------------+
|comment          | <!-- -->      | comment    | <|-- --|>   |
+-----------------+---------------+------------+-------------+
|marked section   | <! [ ] >      | comment    | <| [ ] |>   |
+-----------------+---------------+------------+-------------+
The output document's MECS header is:
£ < > < / / > £ £ | £ £ £ £ £ { £ £ }
A successful conversion presupposes that the input SGML file is MECS-conforming. SGML files can be checked for MECS conformance with MECSVAL or SGMLVAL. In particular, SGML files which make use of end tag omission or tag minimization are not MECS-conforming.
However, even SGML documents with occasional tag minimization can often be converted successfully with SGMLMECS. SGMLMECS reads the input file in two passes. In the first pass, it identifies the generic identifiers of all non-minimized end tags. In the second pass, it uses this information to expand minimized end tags. In practice, therefore, it is usually sufficient that the generic identifier is included in the end tag of at least one of the occurrences of an element.
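The two-pass idea described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SGMLMECS algorithm itself: the function name `sgml_to_mecs` is hypothetical, start tags with attributes and marked sections are left untouched, and the rule that an empty end tag `</>` closes the most recently opened element is an assumption of the sketch, not something stated in this document.

```python
import re

# comment | end tag (possibly minimized as </>) | attribute-free start tag | entity
TOKEN = re.compile(r"<!--(.*?)-->|</(\w*)>|<(\w+)>|&(\w+);", re.S)


def sgml_to_mecs(sgml: str) -> str:
    # Pass 1: generic identifiers that occur in at least one full end tag.
    paired = set(re.findall(r"</(\w+)>", sgml))
    open_elements = []  # assumption: </> closes the most recently opened element

    def replace(match):
        comment, end_gi, start_gi, entity = match.groups()
        if comment is not None:
            return "<|--" + comment + "--|>"
        if entity is not None:
            return "{" + entity + "}"
        if start_gi is not None:
            if start_gi not in paired:
                return "<" + start_gi + ">"   # never closed: no-element code
            open_elements.append(start_gi)
            return "<" + start_gi + "/"       # start tag of a one-element code
        # End tag: expand a minimized </> from the list of open elements.
        gi = end_gi or (open_elements[-1] if open_elements else "")
        if not gi:
            return match.group(0)             # nothing to expand; leave as is
        for i in range(len(open_elements) - 1, -1, -1):
            if open_elements[i] == gi:
                del open_elements[i]
                break
        return "/" + gi + ">"

    # Pass 2: rewrite delimiters and expand minimized end tags.
    return TOKEN.sub(replace, sgml)
```

For example, in `<p>Caf&eacute; <hi>a</hi> <hi>b</> <br></p>` the identifier `hi` appears in one full end tag, so the sketch can expand the later `</>`, giving `<p/Caf{eacute} <hi/a/hi> <hi/b/hi> <br>/p>`. If no occurrence of `hi` had a full end tag, the first pass would not recognise it as an element with content, and the sketch would treat `<hi>` as a no-element code.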
It should also be noted that even if an SGML file has been successfully converted to MECS, that does not necessarily mean that MECS applications will interpret the SGML mechanisms in the way that SGML applications do. E.g., SGML declarations (including comments, marked sections, element and entity declarations etc.) will be ignored by MECS software.
The MECS Program Package contains programs for the creation, validation, formatting, reformatting and analysis of documents conforming to MECS version 2. The package also contains programs for conversion of MECS version 2 documents to SGML and vice versa.
What is documented here is version 2....
The MECS Program Package is under constant revision and development. Comments, bug reports and suggested improvements are most welcome and will as far as possible be taken into consideration in future versions of the programs. Comments should be addressed to:
Email: Claus.Huitfeldt@hd.uib.no
All programs were written and compiled in Borland Corporation's Turbo Pascal, versions 5.5 and 6.0, with Editor Toolbox version 4.0.
The Program Package is made available as copyrighted software free of charge and may be freely redistributed and used. Commercial use, reverse-engineering of executable files, or usage of documentation files in any other form is infringement of copyright. Use of the program package for creating or editing documents shared with third parties should be acknowledged. The copyright holder cannot be held responsible for possible inconvenience, loss or damage which might be caused by the use of the software.
Memory manager problems for MECSVAL.
Floppy disk drive lock-ups (certain programs).
Certain programs (such as SGMLMECS) are too strict: they abort, but the result is then left in tempfile.tmp.
MECSVAL does not count N-element codes with more than 2 elements, and computes the final summary incorrectly in basic mode.
1 The introductory chapter is an updated and slightly revised version of
Huitfeldt 1993.
2 Incidentally, this also explains why the present document has been given
number 3 in the Wittgenstein Archives' series of working papers - publication
was originally planned for 1992.
3 See e.g. Huitfeldt 1998 and TEI P3 Chapter 2 ("A Gentle Introduction to
SGML") for elementary introductions.
4 C. Michael Sperberg-McQueen and I are working on a data structure for MECS
similar to that of SGML. This work has been promising, but it is too early
to make any final judgement, and no implementation exists.
5 Huitfeldt 1995, p. 238
6 The few things which are forbidden or mandatory in MECS are implicitly
required by the basic syntax, not explicitly stated.
7 Even though the amount of SGML software has increased, no SGML software to
my knowledge yet replicates the functionality of the MECS Program Package.
In particular, there is still little software that supports the CONCUR
feature.
8 Strictly speaking, XML is a subset of SGML. Therefore, the comparisons made
here are not between a non-SGML system and SGML, but between the SGML subset
XML and "full" or "unrestricted" SGML.
9 However, cf. Part II, ##
10 In earlier versions of MECS generic identifiers were called 'code names',
and attribute strings were called 'code name extensions'. The new terminology
has been adopted in order to approximate to SGML terminology. However,
this may have the disadvantage of being slightly misleading: MECS' generic
identifiers and attribute strings are not identical to SGML's generic identifiers
and attributes.
11 This exception is inconsistent with the general tendency of MECS to allow
overlap anywhere, and is the (unintended) result of influence from a constraint
enforced at the Wittgenstein Archives.
12 A small reservation is required here: the indicators, i.e. character no 2
and words nos 26..31 of the CDT, cannot be unequivocally deduced from the
document alone. However, this is rather insignificant since these indicators
play no role except as housekeeping characters internally within the CDT.
13 However, cf. Part II, ##
14 Although the conversion program SGMLMECS (cf. Part II, ##) does handle SGML
tag minimization and end tag omission to some extent.
15 The example is taken from TEI P1, page 30. Cf. also TEI P3, section 2.9.2 on
pp. 34-35.
16 Documented in Huitfeldt 1990.
17 Documentation in the Code Syntax Part of "Registration Standard for The
Wittgenstein Archives at the University of Bergen", unpublished working paper.
18 Documented in an earlier draft of the current document of September 1992,
unpublished but widely circulated.
19 In intermediate versions of this document version 2.00 was referred to as
version 1.02.
20 SGML software is able to automatically modify SGML documents so that they
comply with this condition. Even if your document should for some reason fail
to satisfy the condition, it may often be processed with the MECS Program
Package after all.
21 For example, SGML attributes and declarations (including comments and marked
sections) are regarded by MECSVAL as attribute strings and comments, respectively.
This means that MECS programs will simply disregard SGML attributes and
declarations, including the entire SGML DTD.
22 Though even if you run MECSVAL in so-called SGML mode, it validates for MECS
conformance, not for SGML conformance.
23 MECS-WIT, which is referred to in this example, is described in Huitfeldt 1997.
24 MECSBETA is a batch file which calls the programs MECSPRES and BETATXT as
exemplified in the two command lines below.
25 MECSSPEL is a batch file which calls the programs MECSPRES and ALPHATXT as
indicated above. In addition, MECSSPEL will write statistical data to a separate
file called DOC1.STS.