Conversations by Dan Connolly, Andrea Asperti, and Philip Wadler on the the public W3C 'spec-prod@w3.org' mailing list [forum for discussion of W3C Spec Production Issues] have (incidentally) referenced MSL (Model Schema Language), which represents "an attempt to formalize some of the core idea in XML Schema." MSL, as with Hypertextual Electronic Library of Mathematics (HEML) and the online The COQ proof assistant, may be of interest to XML developers having expertise in mathematics and formal logic.
A presentation entitled "MSL: A model for W3C XML Schema" will be given in an XML Foundations session at the Tenth International World Wide Web Conference (WWW10). The same WWW10 session, chaired by Carl Lagoze of Cornell University, will feature also "A Unified Constraint Model for XML" (Wenfei Fan, Gabriel M. Kuper, Jerome Simeon) and "Keys for XML" (Peter Buneman, Susan Davidson, Wenfei Fan,Carmem Hara, Wang-Chiew Tan). An online draft paper indicates that "MSL has already proved helpful in work on the design of XML Query; we expect that similar techniques can be used to extend MSL to include most or all of XML Schema." According to one online authority, MSL (Model Schema Language) is "an initiative by members of the W3C's XML Schema Working Group, to provide a formal model for the XML Schema language." Cf. the Software AG Glossary, which notes also a 'New MSL Draft by Jonathan Robie, published 15.09.2000 on a W3C mailing list.' Appendix A of the online paper lists problems with the current XML Schema Working Drafts, uncovered during the MSL work; Appendix B lists suggestions for improving XML Schema, based on the MSL work.
"MSL: A Model for W3C XML Schema." By Allen Brown [Microsoft, allenbr@microsoft.com], Matthew Fuchs [Commerce One, matthew.fuchs@commerceone.com], Jonathan Robie [Software AG, jonathan.robie@SoftwareAG-USA.com], and Philip Wadler [Avaya Labs, wadler@avaya.com]. November 2000. 22 pages, with 13 references. Accepted for WWW10, Hong Kong, May 2001. Abstract: "MSL (Model Schema Language) is an attempt to formalize some of the core idea in XML Schema. The benefits of a formal description is that it is both concise and precise. MSL has already proved helpful in work on the design of XML Query. We expect that similar techniques can be used to extend MSL to include most or all of XML Schema. Referenced in Philip Wadler's XML resources. See also the sources of the MSL paper (mv to msl-www.tar, 'tar' format). [cache PDF, cache sources]
Philip Wadler, commenting on the 'inference rule markup' thread: "The HELM work looks very interesting. Please take a look at MSL, and let me know what you think about how easy or hard it would be to formalize in HELM/COQ... Currently we are trying to maintain the MSL document in HTML, and that is a big mess. Appropriate MathML tools would be a big step up. And the chance to directly verify properties of the formalism from the same data that is published would be a huge step up. (Babbage designed the analytic engine so that it would typeset log and trig tables directly, to avoid the possibility of transcription error... I would be interested in formalizing both MSL and the query algebra." (1) MSL: http://www.cs.bell-labs.com/who/wadler/topics/xml.html#msl; (2) Query algebra: http://www.cs.bell-labs.com/who/wadler/topics/xml.html#xalgebra-icdt and http://www.w3.org/TR/query-algebra.
From the MSL document 'Introduction':
"XML is based on two simple ideas: represent documents and data as trees, and represent the types of documents and data using tree grammars. Tree grammars are represented using DTDs or XML Schema. XML Schema is being developed under the auspices of the World Wide Web Consortium (W3C), the body responsible for HTML and XML, among other things. As of this writing, XML Schema has just entered candidate recommendation status.
XML Schema is more powerful than DTDs. Among other things, it uses a uniform XML syntax, supports a sophisticated notion of derivation of document types, permits all groups and nested definitions, and provides atomic data types (such as integers, oating point, dates) in addition to character data. XML Schema is also more complex than DTDs, requiring around a couple of hundred pages to describe, as opposed to the thirty or so in the original specification of XML 1.0 (which included DTDs). The remainder of this paper assumes some familiarity with XML Schema.
MSL (Model Schema Language) is an attempt to formalize some of the core idea in XML Schema. The benefits of a formal description is that it is both concise and precise, although it does require some familiarity with mathematical notation.
MSL is described with an inference rule notation. Originally developed by logicians, this notation is now widely used for describing type systems and semantics of programming languages. A basic understanding of grammar rules and rst-order predicate logic should be adequate to understand this paper; all other notations are defined before they are used. We hope our work on MSL may make XML Schema easier to understand, and may aid in the process of designing other specifications and tools that build on XML Schema. In particular, MSL has already proved helpful in work on the design of XML Query, another W3C standard currently under development. We expect that similar techniques can be used to extend MSL to include most or all of XML Schema.
MSL (like XML Schema) draws on standard ideas about type systems for semistructured data as described in the literature refs], notably the use of regular expressions and tree grammars. In particular, MSL closely resembles the type system in Xduce..."