SGML: encoding theory and practice

From owner-humanist@lists.Princeton.EDU Thu May 16 19:44:37 1996
Date: 	 Thu, 16 May 1996 18:35:59 -0400 (EDT)
Sender: owner-humanist@lists.Princeton.EDU
From: Humanist <mccarty@phoenix.Princeton.EDU>
To: Humanist Discussion Group <humanist@lists.Princeton.EDU>
Subject: 10.33 encoding theory and practice

               Humanist Discussion Group, Vol. 10, No. 33.
    Center for Electronic Texts in the Humanities (Princeton/Rutgers)
        Information at http://www.princeton.edu/~mccarty/humanist/

  [1]   From:    Michael Sperberg-McQueen                           (117)
                <U35395%UICVM.BitNet@pucc.Princeton.EDU>
        Subject: Orlandi on theory and practice of MS encoding


-------------------------------------------------------------------------

Many thanks to Tito Orlandi for calling our attention (in Humanist
10.1.1 of 7 May 1996) to his paper "Teoria e prassi della codifica dei
manoscritti" at http://rmcisadu.let.uniroma1.it/~orlandi/encod.html.
This paper is very much to the point, as regards the theory of encoding
and its practical implications, and I recommend it to anyone interested
in the field.

If I may trespass a bit on the patience of those who are not thus
interested in the field, I'd like to respond here to some issues
raised by Orlandi's essay.

In particular, I like Orlandi's attempt to bring order to the problem by
distinguishing systematically the 'ideal text', the 'virtual text', and
the 'material text' (testo ideale, testo virtuale, testo materiale),
which correspond approximately with what I would call the authorial
conception, the text as an abstract linguistic/cultural object, and the
book or witness to the text.

Orlandi usefully applies some basic semiotic thinking to the theory of
encoding and the practical implications of that theory.  On one point,
however, I wonder whether Orlandi is not dismissing too quickly the
position taken by Ian Lancashire in his postings to Humanist last
December.  Orlandi says in his concluding paragraph:

  Tuttavia, quale che sia lo scopo che ci si propone, la codifica
  su supporto magnetico non è la codifica del testo materiale,
  ma quella del testo virtuale, che si ottiene esaminando il testo
  materiale alla luce della competenza di chi lo ha prodotto.  Solo
  questo permetterà di identificare tutti gli elementi singoli,
  atomici, che formeranno l'oggetto della codifica, e di formulare
  una tabella convenzionale di corrispondenza fra i codici, cioè
  i simboli della codifica, e quegli elementi.

  At any rate, whatever purpose we propose for the encoding of a text,
  encoding in magnetic form is not the encoding of the material text,
  but that of the virtual text, which is obtained by examining the
  material text in the light of the competence of the creator.  Only
  the encoding of the virtual text will make it possible to identify
  all the individual, atomic elements which will form the object of
  encoding, and to formulate a conventional table of correspondences
  between the tags, i.e. the symbols of the encoding, and those
  elements.  (my translation; take with a grain of salt)

This seems to me a perfectly acceptable approach, in many cases.  But it
does present some problems for those readers faced with material texts
(MSS, inscriptions in the stone of ancient ruins, ...) which we do *not*
wholly understand, and from which we can reconstruct the virtual text
only partially.

Sometimes, we can understand nothing at all.  In practice, it seems to
me, we tend to do two things in such cases:
  * in editions, we record as much detail of the physical state of the
original artefact as seems (a) economically feasible and (b) potentially
significant -- by means of detailed transcription, or by images of the
artefact, or both
  * we retain the original artefact in a museum, archive, or library, in
order that it can be consulted in cases of need.

If we can *partially* unravel the virtual text, then we need an
electronic representation which will
  * allow us to express our understanding of the virtual text (such as
it is, given the faulty state of the material text and our own faulty
competence) as far as possible
  * allow us to record as much of the material manifestation of the text
as we think *might be* significant (a rough sketch follows below).
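
The TEI Guidelines already offer the beginnings of such a representation
in the tag set for transcription of primary sources.  By way of
illustration only -- the passage, the attribute values, and the
responsibility codes are all invented, not taken from any real edition --
an encoder might record both a partial reading and the material state of
a damaged line roughly like this:

  <!-- hypothetical fragment: one damaged line of an inscription.   -->
  <!-- <damage> records the material state; <unclear>, <gap>, and   -->
  <!-- <supplied> record how far the reading itself can be trusted. -->
  <lb n="3">
  <damage type="abrasion">
    <unclear reason="surface worn">IMP CAES</unclear>
    <gap reason="illegible" extent="4 letters">
  </damage>
  <supplied reason="restored from parallel formula" resp="ed">AVG</supplied>

The point is simply that the two kinds of information -- what we read and
what we see -- need not compete for the same markup.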

I believe Prof. Lancashire is concerned in part with such situations,
and it is difficult to dismiss entirely the desire to record the
material conditions of the text in such cases.

Even in cases where we think we understand a (virtual) text
satisfactorily, the historical vicissitudes of text transmission by
print and manuscript do tend to encourage the multiplication of text
versions -- and oral tradition is even more prolific of variation.  And
the material transmission of the text is itself an object of study, even
for those of us with an 'allographic' understanding of the text.  And
therefore it is necessary that scholarship possess a method of encoding
which can record *both* the virtual text *and* its historically
important material manifestations.

In this sense, I have to agree that one of Prof. Lancashire's premises
is correct, even while the other one (the claim that the TEI *requires*
a focus on the virtual text and *forbids* the encoding of the material
text) is false.

It is perhaps worth pointing out, however, that recording the material
manifestation of a text when we do not understand it is fraught with
risks:  if we don't understand the text, then we cannot guarantee that
our recording of the text's material manifestation will capture its
every significant aspect.  We are likely to lose something which later
analysts will think bears meaning -- just as Thomas Johnson, the editor
of Emily Dickinson, may possibly have lost significant distinctions in
her punctuation despite his very conservative transcription policies.
(At least one later scholar says the marks transcribed by Johnson as
dashes are rhetorical marks for rising, sustained, or falling tones, and
need to be transcribed as at least three distinct symbols.)
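
If an encoder did wish to preserve such distinctions, plain SGML already
supplies a cheap mechanism: declare one entity per class of mark, so that
the conventional table of correspondences Orlandi speaks of is written
down explicitly in the declarations.  A purely hypothetical sketch -- the
entity names and the classification of the marks are mine, not a proposal
for an edition of Dickinson:

  <!-- one entity per class of mark; the system data is only a label -->
  <!ENTITY dash.rising    SDATA "[dash: rising tone]"    >
  <!ENTITY dash.sustained SDATA "[dash: sustained tone]" >
  <!ENTITY dash.falling   SDATA "[dash: falling tone]"   >

  <!-- in the transcription, each mark keeps its identity -->
  <l>I heard a Fly buzz &dash.sustained; when I died &dash.falling;</l>

A later analyst who disagrees with the classification can at least see
what was claimed, and revise the declarations rather than re-collate the
manuscripts.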

Equally to the point, the risks of omission are not limited to texts we
are conscious of not understanding.  There are no detectable limits to
the ingenuity of scholarship in drawing inferences from texts and the
circumstances of their transmission, so there is no detectable limit to
the set of features which *might* be significant in some context or
other, or under some analytical microscope or other.  From this I draw
the inference that *any* representation of a text, like any
representation of any object, is likely to lose some information:

  Representations are inevitably partial, never disinterested;
  inevitably they reveal their authors' conscious and unconscious
  judgments and biases.  Representations obscure what they do
  not reveal, and without them nothing can be revealed at all.
  ("Text in the Electronic Age," L&LC 6.1 (1991): 34)

This is one reason the notion of extensibility is built so deeply into
the TEI Guidelines.
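
Concretely, in the current DTDs extension is done through a pair of
parameter entities reserved for local modifications; a document using a
locally extended DTD begins roughly like the sketch below.  The system
identifiers and the choice of base and additional tag sets are of course
local decisions; treat the details as a sketch rather than a recipe.

  <!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
    <!-- hook in the local extension files through the reserved -->
    <!-- parameter entities                                     -->
    <!ENTITY % TEI.extensions.ent SYSTEM "local.ent">
    <!ENTITY % TEI.extensions.dtd SYSTEM "local.dtd">
    <!-- select the base and additional tag sets wanted         -->
    <!ENTITY % TEI.prose   'INCLUDE'>
    <!ENTITY % TEI.transcr 'INCLUDE'>
  ]>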

Under the circumstances, it is futile to expect any encoding of the
physical manifestations of a text to be complete, just as it is futile
to expect the physical manifestation to exhaust the significance of the
virtual text, still less of the ideal text.  All that can be expected
by later users, and all that can be hoped for by encoders, is that
an encoding capture without excessive distortion some of the features
of a text the encoders believe they understand.  As Orlandi says (in
private correspondence):

    All in all: we can encode only what we understand; and what
    we do not understand, may be reproduced and communicated
    by means of analogical rather than digital devices, so
    photography, autopsy, etc. Or more precisely: by means of
    devices used analogically, including digital images etc.

(Even this may be too optimistic:  analogical reproductions lose
information, too, because they choose some features of the original,
rather than others, for reproduction.)

Many thanks again to Tito Orlandi for this paper.


Best regards,

C. M. Sperberg-McQueen