[This local archive copy mirrored from the canonical site: http://www.indiana.edu/~letrs/vwwp/vwwp-general.html; links may not have complete integrity, so use the canonical document at this URL if possible.]

Victorian Women Writers Project

Encoding Guidelines

Listed below are some general editorial principles to follow when transcribing and encoding texts for this project. The most important principle is that when you have questions about transcribing or encoding, ask the general editor (PWILLETT@indiana.edu). I will try to collect questions and distribute them to the group.

Generally speaking, if something is there on the page as part of the text, include it in the electronic transcription, using the appropriate tag or tags to encode it. However, there may be things on the page that are not part of the textual matter, but instead belong to the bibliographic features of a book. These include such things as the recurring titles that may appear at the top of each page, or printer's marks (a "B" for instance) that may appear at the bottom of a page. Other typographic features may be included in the text, such as a recurring picture of an ivy vine at the end of each poem. Such bibliographic or typographic features can be left out, but noted in the header as a feature of the text. If you have questions about any such feature, please ask the editor.

General Procedures

You will be assigned a text to transcribe and encode. You will be working from printouts, either from texts available in the RSCH collection, or from another library. You may mark the printouts with notes, but please use pencil.

You will first discuss the general structure of the text with the editor, and after taking a quick look through the text, discuss any features about which you are not sure. There are a number of different word processors and SGML-editors--use whichever one you find most productive. Be sure to save the text as an ASCII file in any case. If you run into problems along the way, either contact the general editor or note the problem in some way, and inform the editor upon completion of the transcription. I will parse and proof the transcriptions (using Author/Editor and a spell-checker). A printout of the SGML file and one of the text without markup will be returned to you for more careful proofreading.
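
As a rough aid for the ASCII requirement above, the following Python sketch lists any non-ASCII characters left in a transcription and suggests an entity reference for each. The function name non_ascii_report is hypothetical, and the entity names come from Python's built-in HTML entity table; it is an assumption that these names match the entity sets used by the project, so check any suggestion with the editor.

```python
from html.entities import codepoint2name


def non_ascii_report(text):
    """Return (line_number, character, suggested_entity) for each non-ASCII character.

    suggested_entity is None when Python's HTML entity table has no name
    for the character.
    """
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for char in line:
            if ord(char) > 127:
                name = codepoint2name.get(ord(char))
                suggestion = f"&{name};" if name else None
                findings.append((line_number, char, suggestion))
    return findings


sample = "And bruisèd were her pilgrim feet."
for line_number, char, suggestion in non_ascii_report(sample):
    print(line_number, char, suggestion)
```

An empty result means the text is already plain ASCII.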


In general, it does not matter if tags are encoded in all upper- or lower-case letters. It will perhaps be easier, if you are using a word processor or basic text editor, to use lower-case for tags. You should use double quotation marks around attribute values: <lg type="stanza">
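
The double-quotation-mark convention shown in <lg type="stanza"> can be checked mechanically before a file is handed in. The Python sketch below is only an illustration: the function name unquoted_attributes is hypothetical, and the regular expressions are a simplification that does not parse SGML properly.

```python
import re

# A start tag: "<", a letter, then anything up to the closing ">".
TAG = re.compile(r"<[A-Za-z][^<>]*>")
# An attribute name followed by "=" that is NOT followed by a double quote.
UNQUOTED = re.compile(r'\b([A-Za-z][\w.-]*)=(?!")')


def unquoted_attributes(markup):
    """Return (tag, attribute_name) pairs whose value lacks double quotation marks."""
    problems = []
    for tag in TAG.findall(markup):
        # Blank out quoted values so an "=" inside a value is not misread.
        stripped = re.sub(r'"[^"]*"', '""', tag)
        for match in UNQUOTED.finditer(stripped):
            problems.append((tag, match.group(1)))
    return problems


print(unquoted_attributes('<div0 type=poem><lg type="stanza">'))
```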

Page Breaks

Page breaks are encoded as <pb> at the place where they occur in the text. The page number is noted using the "n=" attribute, with the actual page number as the value. If the page number is in Roman numerals, transcribe it in Roman numerals, e.g. n="vi". If a poem starts on a new page, then the <pb> tag should precede the beginning of the poem:
<pb n="3">
<div0 type="poem">
etc., etc. Otherwise, if a page break occurs within a poem, then place the <pb> tag exactly where it occurs, e.g. between lines, or between stanzas.

If the text has a table of contents, then use an extra attribute "id=" to refer back to the table of contents entry. The value of the "id" attribute will be the actual page number, preceded by a "p". In this case, the page break would look like this:
<pb id="p3" n="3">
Again, if the page number is in Roman numerals, the "p" prefix is not necessary:
<pb id="vii" n="vii">
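
The page-break rules above can be summarized in a short Python sketch. The function name page_break_tag and its has_contents parameter are hypothetical, introduced only for illustration; the sketch assumes a page number is either all digits or all Roman-numeral letters.

```python
import re

ARABIC = re.compile(r"[0-9]+")
ROMAN = re.compile(r"[ivxlcdm]+", re.IGNORECASE)


def page_break_tag(page_number, has_contents=False):
    """Build a <pb> tag for one page.

    has_contents: True when the text has a table of contents, in which
    case an "id" attribute is added alongside "n".
    """
    page = str(page_number).strip()
    if ARABIC.fullmatch(page):
        identifier = "p" + page      # Arabic numbers take the "p" prefix
    elif ROMAN.fullmatch(page):
        identifier = page            # Roman numerals are used unchanged
    else:
        raise ValueError(f"unrecognized page number: {page!r}")
    if has_contents:
        return f'<pb id="{identifier}" n="{page}">'
    return f'<pb n="{page}">'


print(page_break_tag(3))
print(page_break_tag(3, has_contents=True))
print(page_break_tag("vii", has_contents=True))
```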

Notes and Annotations

Occasionally, notes will occur within prefaces, titles, poems or appendices that should be marked. Place the note tag exactly in the text where it occurs, using these attributes (and see P3, p.1072 for more discussion):

Errors and Corrections

Even publishers and printers make mistakes, and you will occasionally come across errors in the texts you encode. They should be noted using the <corr> tag. This tag has three attributes: "SIC=" for the erroneous wording or spelling as present in the printed text; "RESP=" to note your initials; and "CERT=" to note your uncertainty. Errors can be very difficult to spot--if you are uncertain about a potential error, note it in the text using the <corr> tag and the "cert=n" attribute. If you are certain, then there is no need to use the "cert=" attribute. Here are a few real-life examples:

And, as she turned, they saw how bare
And bruisèd where her pilgrim feet.

<L>And, as she turned, they saw how bare</L>
<L>And bruis&egrave;d <CORR SIC="where" RESP="PW">were</CORR> her pilgrim feet.</L>

For wings and probocis can go their own way.
<L>For wings and <CORR SIC="probocis" RESP="PW">proboscis</CORR> can go their own way.</L>

And if if you give it dinner, yet a further pack or two.
<L>And <CORR SIC="if if" RESP="FJ">if</CORR> you give it dinner, yet a further pack or two.</L>

Note that the corrected text occurs within the <corr></corr> tag pair, with the original erroneous text within the <corr> tag as the value of the "sic" attribute.
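
As a hedged illustration of how the pieces of a correction fit together, the Python sketch below assembles a <CORR> element from the printed reading, the corrected reading, and the encoder's initials. The function name corr_tag and its certain parameter are hypothetical; the sketch simply refuses attribute values that contain a double quotation mark rather than trying to escape them.

```python
def corr_tag(sic, corrected, resp, certain=True):
    """Build a <CORR> element.

    sic: the erroneous text as printed; corrected: the corrected text;
    resp: the encoder's initials; certain: False adds CERT="n".
    """
    for value in (sic, resp):
        if '"' in value:
            raise ValueError("attribute values must not contain a double quotation mark")
    attributes = f'SIC="{sic}" RESP="{resp}"'
    if not certain:
        attributes += ' CERT="n"'
    return f"<CORR {attributes}>{corrected}</CORR>"


line = "<L>And bruis&egrave;d " + corr_tag("where", "were", "PW") + " her pilgrim feet.</L>"
print(line)
```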

Emphasized, Foreign and Highlighted Words

There are several different ways to indicate text that is highlighted in some way, usually by italics or bold type. Each of these tags has attributes that allow for noting the typographic rendering and language. The most common attributes for rendering:

In general, don't spend too much time trying to mark all the typographic features of a text, especially if they do not seem to add to an understanding of the text.

A word or phrase that is marked by the author for rhetorical or linguistic effect would use the <emph> tag:

Say rather, why not? It is easier so;
<L>Say rather, why <emph rend="italics">not</emph>? It is easier so;</L>

For a word or phrase that is typographically distinct from the surrounding text, for which there is no clear rhetorical or linguistic meaning, use the <hi> tag. This is most commonly used when the first letter or first word of a poem is highlighted as a typographic convention.

A Minor Poet.

HERE is the phial; here I turn the key

<div0 type="poem"><head>A Minor Poet.</head>
<L><HI rend="bold">HERE</HI> is the phial; here I turn the key</L>

Foreign words may also be highlighted in the text; these may be tagged using the <foreign> tag, using the "lang" attribute to identify the language used.

‘Eh ? what ? baffled by a woman ! Ah, sapristi ! she can run !
<L>&lsquo;Eh ? what ? baffled by a woman ! Ah, <FOREIGN LANG="fre" rend="italics">sapristi</FOREIGN> ! she can run !</L>

There are ways to combine these characteristics, using a combination of tags and attributes.

Thousands of people heard him say it.
<emph><hi rend="italics">Thousands</hi></emph> of people heard him say it.

Aimée, most patient listener, most true friend,
<L><name lang="fre">Aim&eacute;e</name>, most patient listener, most true friend,</L>


We will in some cases include graphical representations of illustrations. If you come across an illustration that you think is important to the text, please note it and inform the editor.
