[This local archive copy mirrored from: http://www.webweek.com/1997/11/10/software/19971110-wood.html; see the canonical version of the document.]

November 10, 1997

Q&A: Lauren Wood, Chair, W3C DOM

By Nate Zelnick

W3C Spec Adds Structure to Web

No task taken up by the World Wide Web Consortium is as ambitious or far-reaching as the attempt to define a basic structure for all Web documents. As the core piece of Dynamic HTML--both Microsoft and Netscape versions--the Document Object Model promises to be the primary way applications and Web developers will work with pages in the future. If the effort succeeds, a host of new ways to automate information retrieval, transactions, and other services will be more easily achieved.

Lauren Wood, technical product manager for SoftQuad Corp., has chaired the W3C's DOM working group since its inception last spring. The group, composed of representatives from browser makers Microsoft and Netscape, SGML experts Inso and Arbortext, as well as Sun Microsystems Inc., Novell, IBM, and others, released a draft of the first portion of the DOM specification--called the Level 1 Core--last month.

Web Week: What's not in the Level 1 Core DOM specification that will be in the final spec?

Lauren Wood: It doesn't contain all of those things that people are used to in current dynamic HTML--the HTML convenience functions. In some versions of current DHTML, such as Netscape 3 or what we're calling Level 0, you can say "Give me all of the images" and the application knows that an image is actually an IMG element. We haven't done those yet--they'll be in the HTML part that we're working on at the moment. This is because we have to also come up with a way for this to work in XML where you don't know that an image is called IMG, necessarily. We don't want to put out something publicly until we are reasonably sure that we're not going to have to change it too much later.

WW: So when you look at something like how Internet Explorer 4.0 exposes all elements, that's not in this level?

Wood: No, not in the Level 1 Core. Level 1 Core you can use on HTML and XML documents. It knows nothing about HTML and it knows nothing about DTDs either because DTD stuff--like how do I get DTDs, and what's the content model--that stuff is all XML, really. What we've released right now is more of a frame than anything else. It sets up what is an element object, what is an attribute, how do I reach an element, how do I reach an attribute in a general way. All of the rest will sit on top of that. The first thing you need to know is if I need to get to an element from in a document somewhere, how do I get to this element.
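To make the idea of generic element and attribute access concrete, here is a minimal sketch in Python using the standard library's xml.dom.minidom. That module implements the DOM as it was later standardized, not the October 1997 draft discussed in this interview. The sample markup and the function name attribute_values are invented for illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document; it is well-formed so a generic parser accepts it.
SAMPLE = """<html><body>
<h1 id="top">Report</h1>
<p>See <img src="chart.gif" alt="Chart"/> and <img src="logo.gif" alt="Logo"/>.</p>
</body></html>"""


def attribute_values(markup, tag_name, attr_name):
    """Return attr_name from every tag_name element, in document order."""
    doc = parseString(markup)
    # The generic interface knows nothing about what an "img" means;
    # it only matches elements by name.
    elements = doc.getElementsByTagName(tag_name)
    return [el.getAttribute(attr_name) for el in elements]


print(attribute_values(SAMPLE, "img", "src"))  # ['chart.gif', 'logo.gif']
```

The same function works unchanged on any element and attribute name, which is the sense in which this layer "knows nothing about HTML." A convenience such as "give me all of the images" would sit on top of a call like this one.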

WW: Is this the most noncontroversial part of the DOM? The easiest part?

Wood: I wouldn't call it the easiest part--not because of controversy, but because it's the foundation, the framework. So if we make mistakes here, they're just going to multiply.

So we had to spend a lot of time on this. That's why it's taken a little bit longer than we had originally anticipated. We had to spend time on it to make sure we actually get it right. Because we don't want to have to redo it in two months or three months when we're working on level two or we're working on the event model, and determining what sort of HTML conveniences people are going to want for navigating their way through documents and what sort of extra XML stuff is needed. If you look at our requirements, which are available to the public on the W3C's Web site, there's a little notation that says "After Level 1" on the things we know we aren't putting into Level 1. A fair amount of it will be there, although we might end up pushing some of it off.

WW: Will this end up being derived out of the sort of hodgepodge that ended up being HTML?

Wood: Well, it's not only HTML, it's also SGML (now XML) and there've been these two worlds of experience as to how to look at a document and what is a document. The HTML one with fixed tags--fixed in principle, anyway--and the SGML or XML where you can do what you want, but you have to tell people what you're doing. There's been a lot of experience in both things. For example, we've taken a lot of ideas from SoftQuad's authoring tools. We have experience in, to some extent, a DOM-like thing with our SGML editor. Author/Editor has a tool associated with it called Sculptor you can use to manipulate documents. And obviously, Microsoft and Netscape have experience in their directions.

We've taken a lot of those ideas and put them together.

WW: It seems like there are two frameworks--RDF and the hierarchy in a tree structure.

Wood: Even an HTML document isn't a completely hierarchical tree structure, which is why we're all so careful to use the term "structure model." In HTML you might have a named anchor and the link to that anchor. These point at each other, so it's not a hierarchy. And then when you come to add style information on top, you have other pointers to your rendering engine. We try in the DOM not to specify the model underneath. What we're specifying is the interface by which you interact with the various objects.
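The named-anchor case can be sketched as follows, again assuming Python's xml.dom.minidom rather than any interface from the 1997 draft. The sample page and the function name resolve_internal_links are hypothetical. The link and its target sit in different branches of the tree, and the relationship between them is carried by attribute values, not by parent-child structure.

```python
from xml.dom.minidom import parseString

# Hypothetical page: a link near the top points at a named anchor further down.
SAMPLE = """<html><body>
<p><a href="#notes">Jump to notes</a></p>
<p>Body text.</p>
<h2><a name="notes">Notes</a></h2>
</body></html>"""


def resolve_internal_links(markup):
    """Pair each internal href with the tag name of its target's parent element."""
    doc = parseString(markup)
    anchors = doc.getElementsByTagName("a")
    # Index the named anchors so links can be matched to them by name.
    targets = {a.getAttribute("name"): a for a in anchors if a.hasAttribute("name")}
    resolved = []
    for a in anchors:
        href = a.getAttribute("href")  # empty string when the attribute is absent
        if href.startswith("#") and href[1:] in targets:
            target = targets[href[1:]]
            resolved.append((href, target.parentNode.tagName))
    return resolved


print(resolve_internal_links(SAMPLE))  # [('#notes', 'h2')]
```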

Obviously, all of us have a different mental model to help us figure out what the actual functions should be to get from one part of the document to the other. And obviously, sometimes in our discussions we need to drop down and think about how we would implement it and talk about how to make it concrete. We don't want to specify what the implementation is underneath, though. Go and implement it the way you want, because different implementations might have performance issues or other manifestations. We don't want to get into how to specify that. That's up to every vendor to fight out amongst themselves.

WW: Isn't there a predisposition toward what Microsoft and Netscape have already put out there?

Wood: We obviously take into account everything they've done and talk about it. We ask ourselves, "Can we do this in a nice general way?" and "What is what has been done for HTML going to mean in terms of XML?" and "Can we generalize this for XML?" They've spent so much time implementing this, and they have so much customer experience, they can tell us what works and what doesn't. On the other hand, the SGML vendors also have years of experience listening to customers like Boeing saying what they need. We need to make our object model flexible enough to deal with both.

WW: I think people have a hard time grasping what XML is all about and how this sort of thing applies. How do you explain it?

Wood: Part of the problem is that people tend to think of HTML not in structured document terms. If you put H1 tags around some bit of text, that's saying to an application: "This is an H1"; what you're doing is labeling this piece of text so the application knows what this piece of text is and can do whatever is appropriate. For an H1, that could be pulling it out into a table of contents or changing the formatting so it's bold and big.
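The table-of-contents use of H1 labels might look like this in practice. This is a minimal sketch assuming Python's xml.dom.minidom and an invented sample page; the helper names text_of and table_of_contents are not from any specification.

```python
from xml.dom.minidom import parseString

# Hypothetical page with two labeled headings.
SAMPLE = """<html><body>
<h1>Introduction</h1><p>Some text.</p>
<h1>Results <em>so far</em></h1><p>More text.</p>
</body></html>"""


def text_of(node):
    """Concatenate all text beneath node, descending through child elements."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


def table_of_contents(markup):
    """Collect the text of every h1, treating the tag as a label, not formatting."""
    doc = parseString(markup)
    return [text_of(h) for h in doc.getElementsByTagName("h1")]


print(table_of_contents(SAMPLE))  # ['Introduction', 'Results so far']
```

Nothing here depends on how an H1 is displayed; the application only relies on the fact that the text has been labeled.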

A lot of people come from the desktop publishing world and they see the start H1 tag and think, "Ah, this turns on formatting." And then the end H1 tag, "This turns off formatting." That makes it difficult to see the point of doing extra stuff, which is what XML is. But if you look at the way a lot of professional Web site designers put a lot of comments through their pages to say, for instance, "the author affiliation section starts here"; in XML, they would just make up an element for author affiliation and put that in there.

And then they could have some application that could pull out all of the author affiliation information from every page they happen to have on their site.
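An application of that kind could be sketched as below, assuming Python's xml.dom.minidom. The element name author-affiliation, the page names, and the page contents are all made up for illustration, since the interview does not name a specific vocabulary.

```python
from xml.dom.minidom import parseString

# Hypothetical site: page name -> XML source using a made-up element name.
PAGES = {
    "a.xml": "<article><title>One</title>"
             "<author-affiliation>Example University</author-affiliation></article>",
    "b.xml": "<article><title>Two</title>"
             "<author-affiliation> Example Corp </author-affiliation></article>",
    "c.xml": "<article><title>Three</title></article>",
}


def affiliations(pages):
    """Map each page name to the list of author affiliations found in it."""
    result = {}
    for name, markup in pages.items():
        doc = parseString(markup)
        found = []
        for el in doc.getElementsByTagName("author-affiliation"):
            # Join the element's direct text children and trim whitespace.
            text = "".join(
                c.data for c in el.childNodes if c.nodeType == c.TEXT_NODE
            )
            found.append(text.strip())
        result[name] = found
    return result


print(affiliations(PAGES))
```

With comments in plain HTML, the same task would mean searching page text for whatever wording each author happened to use. With a dedicated element, the lookup is a single query by name.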



Keywords: html, standards
Date: 19971110