[From Matthias Clasen (Institut für Mathematik, Albert-Ludwigs-Universitaet Freiburg). Created as a DSSSL-learning project. The package contains a DTD, DSSSL stylesheets, SGML source, etc. Available via FTP: ftp://peano.mathematik.uni-freiburg.de/pub/tutorial.tgz]

Bibliographical references meet DSSSL

To learn more about DSSSL, I posed myself the following problem:

How are bibliographical references handled in DSSSL?

I wanted to achieve roughly the functionality of LaTeX + BibTeX. This includes:

You can find all mentioned files, including this document, in the gzipped tar file bib.tgz.

The DTD

This is only a vastly simplified model. BibTeX knows 13 different types of references, with many more fields than our two examples, book and article.

The example DTD
<!element doc1 - - ( bibliography, p* ) >
<!element doc2 - - ( p*, references? ) >
<!element p - o ( #PCDATA ) +( bibref, bibnoref ) >
<!element ( bibliography, references ) - - ( article | book )* >
<!element ( article, book ) - - ( author & title ) >
<!attlist ( article, book ) id ID #REQUIRED >
<!element ( author, title ) - - ( #PCDATA ) >
<!element bibref - - EMPTY >
<!attlist bibref cite IDREF #REQUIRED
                 nocite (nocite) #IMPLIED >
	  

As you can see, this contains two document types: users will normally create <doc1> documents with no explicit list of references and the bibliographical data in the <bibliography> element. A first transformation step will convert this into a <doc2> document containing an explicit list of references in the <references> element. Users could also create a <doc2> document manually (this was one of the goals).

The intermediate document will then be subject to a formatting process.

An example document

A typical document as entered by a user thus looks like:

A typical document instance
<!doctype doc1 system "bib.dtd" [
<!entity bibdata system "bibdata.sgm" >
]>
<doc1>
  <bibliography>&bibdata;</bibliography>
  <p>This is just some example text. There are probably better examples 
     in <bibref cite="handbook">.
<bibref nocite cite="boo">
<bibref nocite cite="bar">
<bibref nocite cite="baz">
<bibref nocite cite="foo">
  <p>Now let's cite the DSSSL standard <bibref cite="dsssl">! And never forget
  to have the handbook <bibref cite="handbook"> under your pillow.
  <p>And now I even cite a faked article <bibref cite="myarticle"> of my own! 
</doc1>
	  
The bibliographical database bibdata.sgm
<article id="myarticle">
  <author>Matthias Clasen</author>
  <title>Wish I had one</title>
</article>
<book id="dsssl">
  <title>ISO/IEC 10179:1996</title>
  <author>Joint Technical Committee ISO/IEC JTC1, 
      Information technology</author>
</book>
<book id="handbook">
  <author>Charles F. Goldfarb</author>
  <title>The SGML Handbook</title>
</book>
<article id="boo">
  <author>D</author>
  <title>Article by D</title>
</article>
<article id="foo">
  <author>A</author>
  <title>Article by A</title>
</article>
<article id="baz">
  <author>C</author>
  <title>Article by C</title>
</article>
<article id="bar">
  <author>B</author>
  <title>Article by B</title>
</article>
<article id="uncited1">
  <author>E</author>
  <title>Article by E</title>
</article>
<article id="uncited2">
  <author>F</author>
  <title>Article by F</title>
</article>
	  

This example shows how the bibliographical information can be kept separate from the document instance. The approach taken here might become problematic with very large databases, since all the IDs come from the same name space. Using subdoc entities instead of text entities would avoid that problem, but at the cost of abandoning the simple ID-IDREF mechanism completely.
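
The name-space concern can be illustrated outside SGML. The following is a minimal Python sketch, not part of the package: it treats each bibliographical database as a plain list of ID strings and reports the IDs that occur more than once when several databases are pulled into one document. The second database and the name duplicate_ids are invented for the illustration.

```python
from collections import Counter

def duplicate_ids(*databases):
    """Return the IDs that occur more than once across all databases, sorted."""
    counts = Counter(ref_id for db in databases for ref_id in db)
    return sorted(ref_id for ref_id, n in counts.items() if n > 1)

# IDs from bibdata.sgm (abridged) and a hypothetical second database.
bibdata = ["myarticle", "dsssl", "handbook", "boo", "foo"]
other = ["knuth", "handbook"]  # assumption: happens to reuse the ID "handbook"

print(duplicate_ids(bibdata, other))
```

With text entities both databases end up in the same document, so a clash like this would surface as a duplicate ID when the document is parsed.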

The transformation process

In the absence of an implementation of the transformation language of DSSSL, I had to use the nonstandard flow object classes of Jade to do the transformation with the style language. I intend to translate this to the transformation language when implementations become available.

The missing-standard-procedures entity contains some functions which are part of the style language, but are not yet implemented in Jade: map, node-list-reduce, node-list-contains?, node-list-remove-duplicates, node-list-filter and node-list->list. I have simply copied the sample implementations from the standard.
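
To give an idea of what these procedures do, here is a rough Python model of three of them. It is only a sketch: node lists are modelled as tuples and nodes as plain strings compared by equality, whereas DSSSL compares nodes by identity. The argument order is meant to follow the prototypes in the standard; the Python names are of course not part of DSSSL.

```python
from functools import reduce

def node_list_reduce(nl, combine, init):
    """Fold combine(result, node) over the node list, starting from init."""
    return reduce(combine, nl, init)

def node_list_contains(nl, node):
    """True if node is a member of the node list."""
    return any(member == node for member in nl)

def node_list_remove_duplicates(nl):
    """Keep only the first occurrence of each node, preserving order."""
    return node_list_reduce(
        nl,
        lambda result, node: result if node_list_contains(result, node) else result + (node,),
        (),
    )

def node_list_filter(pred, nl):
    """Keep the nodes for which pred returns true, preserving order."""
    return node_list_reduce(
        nl,
        lambda result, node: result + (node,) if pred(node) else result,
        (),
    )

cited = ("handbook", "boo", "dsssl", "handbook", "myarticle")
print(node_list_remove_duplicates(cited))
print(node_list_filter(lambda node: node.startswith("b"), cited))
```

Building the other procedures on top of a single reduce keeps the model short; it is quadratic in the length of the list, which does not matter for a handful of references.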

The copy-attributes procedure is from James Clark, the identity transformation in the default construction rule is from W. Eliot Kimber's dovalueref.dsl, and the reference sorting is lifted from the index sorting functions in Mark Burton's dbtohtml.dsl.

It would be nice to make the sorting function customizable, e.g. by introducing a class or role attribute on the <bibref> elements and creating multiple lists of references according to that.
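
As a thought experiment, the grouping could work roughly as in the following Python sketch. Nothing like this exists in the package: the role values, the pair representation of a citation and the name references_by_role are all assumptions made for the illustration, and the sort is a plain case-insensitive one.

```python
from collections import defaultdict

def references_by_role(cites, authors):
    """Group cited IDs by role and sort each group by author.

    cites   -- list of (ref_id, role) pairs in document order
    authors -- dictionary mapping ref_id to the author string
    """
    groups = defaultdict(list)
    for ref_id, role in cites:
        if ref_id not in groups[role]:  # drop repeated citations within a role
            groups[role].append(ref_id)
    return {
        role: sorted(ids, key=lambda ref_id: authors[ref_id].lower())
        for role, ids in groups.items()
    }

authors = {
    "handbook": "Charles F. Goldfarb",
    "dsssl": "Joint Technical Committee ISO/IEC JTC1",
    "myarticle": "Matthias Clasen",
}
# Hypothetical role values; the DTD shown above has no such attribute.
cites = [
    ("handbook", "primary"),
    ("dsssl", "standard"),
    ("myarticle", "primary"),
    ("handbook", "primary"),
]
print(references_by_role(cites, authors))
```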

Another nice thing to implement would be an index of the citations.
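
Such an index might simply map each reference to the paragraphs that cite it. The Python sketch below shows the idea under simple assumptions: the document is reduced to one list of cited IDs per paragraph, nocite entries are left out, and paragraphs are numbered from 1. The name citation_index is invented for the illustration.

```python
def citation_index(paragraphs):
    """Map each cited ID to the numbers of the paragraphs citing it."""
    index = {}
    for number, cites in enumerate(paragraphs, start=1):
        for ref_id in cites:
            places = index.setdefault(ref_id, [])
            if number not in places:  # list a paragraph only once per reference
                places.append(number)
    return index

# Visible citations of the example document, paragraph by paragraph.
paragraphs = [
    ["handbook"],
    ["dsssl", "handbook"],
    ["myarticle"],
]
print(citation_index(paragraphs))
```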

The transformation specification
<!DOCTYPE style-sheet PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN" [
<!ENTITY missing-standard-procedures system "std-proc.dsl">
]>
<style-specification>

(declare-flow-object-class element
  "UNREGISTERED::James Clark//Flow Object Class::element")
(declare-flow-object-class empty-element
  "UNREGISTERED::James Clark//Flow Object Class::empty-element")
(declare-flow-object-class document-type
  "UNREGISTERED::James Clark//Flow Object Class::document-type")
(declare-flow-object-class entity
  "UNREGISTERED::James Clark//Flow Object Class::entity")

&missing-standard-procedures; 

; Suspiciously missing counterpart of node-list->list.
(define (list->node-list l)
  (apply node-list l))

(define (copy-attributes #!optional (nd (current-node)))
  (let loop ((atts (named-node-list-names (attributes nd))))
    (if (null? atts)
        '()
        (let* ((name (car atts))
               (value (attribute-string name nd)))
          (if value
              (cons (list name value)
                    (loop (cdr atts)))
              (loop (cdr atts)))))))

(define (string-ci>? s1 s2)
  (let ((len1 (string-length s1))
	(len2 (string-length s2)))
    (let loop ((i 0))
      (cond ((= i len1) #f)
	    ((= i len2) #t)
	    (#t (let ((c1 (index-char-val (string-ref s1 i)))
		      (c2 (index-char-val (string-ref s2 i))))
		  (cond
		   ((= c1 c2) (loop (+ i 1)))
		   (#t (> c1 c2)))))))))

(define (index-char-val ch)
  (case ch
    ((#\A #\a) 65)
    ((#\B #\b) 66)
    ((#\C #\c) 67)
    ((#\D #\d) 68)
    ((#\E #\e) 69)
    ((#\F #\f) 70)
    ((#\G #\g) 71)
    ((#\H #\h) 72)
    ((#\I #\i) 73)
    ((#\J #\j) 74)
    ((#\K #\k) 75)
    ((#\L #\l) 76)
    ((#\M #\m) 77)
    ((#\N #\n) 78)
    ((#\O #\o) 79)
    ((#\P #\p) 80)
    ((#\Q #\q) 81)
    ((#\R #\r) 82)
    ((#\S #\s) 83)
    ((#\T #\t) 84)
    ((#\U #\u) 85)
    ((#\V #\v) 86)
    ((#\W #\w) 87)
    ((#\X #\x) 88)
    ((#\Y #\y) 89)
    ((#\Z #\z) 90)

    ((#\ ) 32)

    ((#\0) 48)
    ((#\1) 49)
    ((#\2) 50)
    ((#\3) 51)
    ((#\4) 52)
    ((#\5) 53)
    ((#\6) 54)
    ((#\7) 55)
    ((#\8) 56)
    ((#\9) 57)

    (else 0)))

; Sort an alist. Each member of al should be a pair whose car 
; must be a string which is used as sort key.
(define (sort-alist al)
  (letrec ((list-head (lambda (l n)
			(if (> n 0)
			    (cons (car l) (list-head (cdr l) (- n 1)))
			    '())))
	   (merge (lambda (al1 al2)
		    (cond ((null? al1) al2)
			  ((null? al2) al1)
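			  ; Emit the head of al2 first only if the head of al1 sorts
			  ; after it, so that the result is in ascending order.
			  ((string-ci>? (car (car al1)) (car (car al2)))
			   (cons (car al2) (merge al1 (cdr al2))))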
			  (#t (cons (car al1) (merge (cdr al1) al2)))))))
    (let* ((ll (length al))
	   (ldiv2 (quotient ll 2)))
      (if (> 2 ll)
	  al
	  (merge (sort-alist (list-head al ldiv2))
		 (sort-alist (list-tail al ldiv2)))))))

; Turn a node list into an alist with the help of proc which must 
; be a function turning a singleton node list into a pair.
(define (node-list->alist proc nl)
  (map proc (node-list->list nl)))

; Turn an alist into a node list.
(define (alist->node-list al)
  (list->node-list (map cdr al))) 

; Return a node list containing all nodes whose id's occur as the value 
; of a bibref element in the subtree below the current node. The node 
; list is sorted alphabetically by author.
(define (references) 
  (alist->node-list 
   (sort-alist 
    (node-list->alist 
     (lambda (snl) (cons (data (select-elements (descendants snl) "AUTHOR")) snl)) 
     (node-list-remove-duplicates
      (node-list-map 
       (lambda (snl) (element-with-id (attribute-string "CITE" snl)))
       (select-elements (descendants (current-node)) "BIBREF")))))))

; parameters

(define instance-sysid "bibdoc.sgm2")
(define dtd-sysid "bib.dtd")

; construction rules

(root 
 (case (gi (node-property 'docelem (current-node))) 
   (("DOC1") (make entity
		   system-id: instance-sysid
		   (make document-type
			 name:  "DOC2" 
			 system-id: dtd-sysid)
		   (process-children)))
   (("DOC2") (error "I can only work on documents of type DOC1."))))
 
(default
  (cond
   ((node-list-empty? (node-property 'content (current-node)))
    (make empty-element
	  attributes: (copy-attributes)))
   (else
    (make element
	  attributes: (copy-attributes)))))

(element BIBLIOGRAPHY
  (empty-sosofo))

(element DOC1
  (make element
	gi: "DOC2"
	attributes: (copy-attributes)
	(process-children)
	(make element
	      gi: "REFERENCES"
	      (process-node-list (references)))))
 
</style-specification>
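
The logic of the references procedure can be modelled in a few lines of Python. This is only a sketch under simplifying assumptions: the document is reduced to the list of cite values in document order, the bibliography to a dictionary from ID to author, and the sort is ascending, as the comment on the procedure says. The names sort_key and references_model are invented for the illustration.

```python
def index_char_val(ch):
    """Collation weight modelled on index-char-val: letters compare
    case-insensitively, space and digits keep their ASCII code, all else is 0."""
    if ch.isascii() and ch.isalpha():
        return ord(ch.upper())
    if ch == " " or (ch.isascii() and ch.isdigit()):
        return ord(ch)
    return 0

def sort_key(author):
    """Turn an author string into a list of weights; lists compare
    element by element, and a proper prefix sorts first."""
    return [index_char_val(ch) for ch in author]

def references_model(cites, authors):
    """Return the cited IDs without duplicates, sorted by author."""
    unique = []
    for ref_id in cites:
        if ref_id not in unique:
            unique.append(ref_id)
    return sorted(unique, key=lambda ref_id: sort_key(authors[ref_id]))

# Data taken from the example document and bibdata.sgm.
authors = {
    "myarticle": "Matthias Clasen",
    "dsssl": "Joint Technical Committee ISO/IEC JTC1",
    "handbook": "Charles F. Goldfarb",
    "boo": "D", "foo": "A", "baz": "C", "bar": "B",
    "uncited1": "E", "uncited2": "F",
}
cites = ["handbook", "boo", "bar", "baz", "foo", "dsssl", "handbook", "myarticle"]

print(references_model(cites, authors))
```

Entries that are never cited (uncited1 and uncited2 here) do not appear in the result, and nocite entries are treated like any other citation.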
	  

You should take a look at the result of running Jade's SGML backend on the example document with this style sheet.

The formatting process

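The transformation process has already done most of the work for us; the only remaining task is to create the keys for the references. This is done in the procedure bibkey. There are three parameters involved: bibkey-style, which selects the kind of key ("numeric", "full-id" or "short-id"), and bibkey-open and bibkey-close, the strings placed before and after the key.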

The formatting process
<!DOCTYPE style-sheet PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN" [
<!ENTITY missing-standard-procedures system "std-proc.dsl">
]>

<style-specification>

; parameters 

(define bibkey-style "numeric")
(define bibkey-open "[")
(define bibkey-close "]")

(define (bibkey snl) 
  (string-append 
   bibkey-open
   (case bibkey-style		  
     (("full-id") (attribute-string "ID" snl))
     (("short-id") (substring (attribute-string "ID" snl) 0 3))
     (("numeric") (number->string (+ 1 (node-list-length (preced snl))))))
   bibkey-close))

; construction rules

(root 
 (case (gi (node-property 'docelem (current-node)))
   (("DOC1") (error "I can only work on documents of type DOC2."))
   (("DOC2") (process-children))))

(element DOC2
  (make simple-page-sequence
	left-margin: 1in
	right-margin: 1in
	top-margin: 1in
	bottom-margin: 1in
	(process-children)))

(element P
  (make paragraph
	(process-children)))

(element BIBREF
  (if (attribute-string "NOCITE") (empty-sosofo)
      (make sequence
	    (literal (bibkey (element-with-id (attribute-string "CITE")))))))
  
(element REFERENCES
  (make sequence
	(make paragraph 
	      font-size: 12pt
	      font-weight: 'bold
	      space-before: 0.5cm
	      space-after: 0.2cm
	      (literal "References"))
	(process-children)))

(element BOOK
  (make paragraph 
	(literal (bibkey (current-node))) 
	(literal " ")
	(process-matching-children 'author)
	(literal ", ")
	(make sequence
	      font-posture: 'italic
	      (process-matching-children 'title))))
  
(element ARTICLE
  (make paragraph
	(literal (bibkey (current-node))) 
	(literal " ")
	(process-matching-children 'author)
	(literal ", ")
	(make sequence
	      font-posture: 'upright
	      (process-matching-children 'title))))

</style-specification>
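
The effect of the three key styles can be tried out with a small Python model of bibkey. It is a sketch, not a translation: the list of references stands in for the children of the <references> element, the position in that list replaces the count of preceding siblings, and any case folding of ID values by the SGML parser is ignored. The parameter names are chosen for the illustration.

```python
def bibkey(ref_id, reference_ids, style="numeric", open_="[", close="]"):
    """Build the key for ref_id, given the sorted list of all reference IDs."""
    if style == "full-id":
        key = ref_id
    elif style == "short-id":
        key = ref_id[:3]  # first three characters of the ID
    elif style == "numeric":
        key = str(reference_ids.index(ref_id) + 1)  # 1-based position in the list
    else:
        raise ValueError("unknown bibkey style: " + style)
    return open_ + key + close

# Reference list in the order produced by sorting on author.
refs = ["foo", "bar", "baz", "handbook", "boo", "dsssl", "myarticle"]

print(bibkey("handbook", refs))
print(bibkey("handbook", refs, style="short-id"))
print(bibkey("handbook", refs, style="full-id", open_="(", close=")"))
```

One difference is worth noting: the Python slice silently accepts IDs shorter than three characters, while the substring call in the style sheet assumes that every ID has at least three.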
	  

You can have a look at the result of applying this style sheet to the result of the first pass on the example document. It is a DVI file created by Jade's TeX backend and Sebastian Rahtz's JadeTeX macros.


I would appreciate comments on the DSSSL code.

Matthias Clasen

mclasen@sun2.mathematik.uni-freiburg.de