IP Telephony: Toward a Telephony Markup Language. XML is CT’s Next Big Breakthrough: A Tool for Making the Web into a Framework for Distributed CT and Messaging Apps.

[This archive copy from: http://www.telecomlibrary.com/db_area/archives_rd/ComputerTelephony/1999/August/ip.telephony.html; please use this canonical source if possible.]

First-generation computer telephony is crippled by integration headaches. Hooking voicemail to a PBX can still be an all-day undertaking (tweaking MF frequencies, fiddling around with ringback cadences, etc.). On the network side, things are only slightly easier, thanks to halfway-manageable standards such as SQL, ODBC, CORBA, Microsoft’s Component Object Model (COM), Enterprise JavaBeans, and other ways of bringing data producers and consumers onto the same page.

Any time you need to get machines talking to each other, it seems an awful lot of work (not to mention a good deal of arcane knowledge) is required. That increases the cost of CT apps, complicates delivery, causes interdepartmental and political problems, and prevents some of the niftiest apps from ever seeing the light of day.

Part of the promise of convergence — second-generation CT — is that it starts making those problems go away. Or it at least puts voice and data on the same pipe, under the same protocol, making both accessible to software. The Web, too, is an incredible win for telephony, providing a ubiquitous infrastructure of readily-accessible, multi-purpose servers that transfer data on demand — both to people, and, in principle, to machines.

The people side is relatively easy to understand and implement. If you’re smart enough to write a unified messaging system, it’s trivial to write the HTML and CGI scripts needed to display the inbox on a Web browser. It’s much harder to write a client application that automatically grabs data off the Web and does something useful with it: for example, a program that can browse ComputerTelephony.com, find Richard’s phone number, and stick it into GoldMine.

Yet apps like these — machine-to-machine apps — can really add intelligence to the phone call. Research agents, comparison-shopping engines, supply-chain automation, simple data sources feeding complex data sources, open-standard-based third-party telephony client applications, my messaging system talking to your messaging system ... Indeed, the whole concept of the open organization, in which contact protocols and hierarchy are “published” in machine-readable form, facilitating contact and better customer service, hangs on making machines talk to each other more fluently, with less custom programming, fiddling, debugging, and experimentation.

Ideally, you want a system in which data producers and consumers can exchange information fluently, but still evolve independently. I want to publish my dynamically-assigned IP address on my Web site so you can call me with your VoIP client. You want your app to grab my IP address and dial. I don’t want to consult with you every time I change my home page. You don’t want to debug your application every time my art director makes a font change. What’s a body to do?

Enter XML, a simplified descendant of the enormously complex Standard Generalized Markup Language (SGML) that’s been used for high-end, highly structured publishing apps for the past 10 years.

Unlike HTML, XML lets you create your own tags (which is what “extensible” means). That’s not to say XML is a replacement for HTML. It isn’t. Although it’s possible that a future version of HTML might be specified in XML, the two markup languages have completely different purposes. HTML tags specify the presentation of your Web information (e.g., <B>917-305-3000</B>), whereas XML tags, which can exist on the same page as HTML, describe what the content means (e.g., <TAB>917-305-3000</TAB>).
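
To make the point concrete, here is a minimal sketch in Python of a client that pulls a phone number and a VoIP address off a page by what the tags mean rather than by where they sit in the layout. The tag names <phone> and <voip-address>, the two sample pages, and the extract() helper are all invented for illustration; nothing here is a published vocabulary. Both pages are kept well-formed so a stock XML parser can read them.

```python
import xml.etree.ElementTree as ET

# Two renderings of the same home page. Only the presentation differs.
# <phone> and <voip-address> are hypothetical tags made up for this sketch.
PAGE_V1 = """<html><body>
<p>Call me: <b><phone>917-305-3000</phone></b></p>
<p>VoIP: <voip-address>192.0.2.17</voip-address></p>
</body></html>"""

PAGE_V2 = """<html><body>
<table><tr>
<td><i><voip-address>192.0.2.17</voip-address></i></td>
<td><phone>917-305-3000</phone></td>
</tr></table>
</body></html>"""


def extract(page, tag):
    """Return the text of the first element named `tag`, or None if absent."""
    root = ET.fromstring(page)
    node = next(root.iter(tag), None)  # search the whole tree, any depth
    if node is None or not node.text:
        return None
    return node.text.strip()


if __name__ == "__main__":
    for page in (PAGE_V1, PAGE_V2):
        print(extract(page, "phone"), extract(page, "voip-address"))
```

The art director can move the number from a paragraph to a table, or swap bold for italic, and the client keeps working, because it never looks at the presentation tags at all.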

Because XML specifies the semantics of data, independent from presentation, it naturally lends itself to business-to-consumer and business-to-business interactions, but more importantly to intelligent search and navigation applications. Because it’s extensible, XML is really a higher- or meta-level standard, a language that can define other markup languages to facilitate communication and transactions between consenting networked partners in any particular domain, such as e-commerce, astronomy, physics, biology or what-not, provided everyone agrees on the tagging vocabulary. The standard is being rapidly integrated into software products, even from longtime enemies such as Microsoft, Oracle, and IBM. Microsoft in particular has fallen in love with XML, calling it a cornerstone of its BizTalk initiative.

XML and HTML work together nicely on the Web as we now know it — the former handling semantics, the latter, presentation. But XML isn’t rigidly attached to the PC-based, browser-centric Web — any data consumer can be outfitted with an XML parser. That includes PDAs, cell phones with displays, or even such non-display devices as telephony servers.

Indeed, Motorola (and partners) have already engineered version 1.0 of an XML-based standard they call VoxML (www.voxml.com), aimed at helping Webmasters provide IVR access over the phone, as a sideline to Web-based e-commerce. The system prescribes a set of tags identifying nodes, prompts, menus, and other components of an IVR app (think of it as an HTML-like app-gen, in the same sense that Artisoft’s Visual Voice is a Visual Basic-like app-gen). You script your IVR app using those tags, using a word processor to create a nice set of human-readable files. Then you load your tagged pages onto a standard Web server, and they get served out under normal HTTP protocols to voice browsers — VoxML-aware processes running on a telephony server equipped with TTS and speech-rec facilities. When a caller dials into the server, it fires up a browser process that works like a proxy, accessing your Web server for VoxML documents, reading out the prompts, interpreting speech and DTMF commands, and returning results (again using standard browser protocols) to whatever CGI scripts are running in the background.
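
The voice-browser idea is easier to picture with a toy. The sketch below, in Python, walks a tagged IVR script and returns the prompts a caller would hear. The element and attribute names (<dialog>, <step>, <prompt>, <choice>, key, next) are invented for this example and are not taken from the VoxML specification. A string stands in for the document a real voice browser would fetch over HTTP, and a sequence of key presses stands in for DTMF input; there is no TTS or speech-rec here.

```python
import xml.etree.ElementTree as ET

# Hypothetical IVR markup. These names are NOT VoxML's actual vocabulary.
IVR_DOC = """<dialog start="main">
  <step name="main">
    <prompt>Press 1 for sales or 2 for support.</prompt>
    <choice key="1" next="sales"/>
    <choice key="2" next="support"/>
  </step>
  <step name="sales">
    <prompt>Connecting you to sales.</prompt>
  </step>
  <step name="support">
    <prompt>Connecting you to support.</prompt>
  </step>
</dialog>"""


def run_dialog(doc, keys):
    """Walk the dialog, consuming key presses; return the prompts 'spoken'."""
    root = ET.fromstring(doc)
    steps = {step.get("name"): step for step in root.findall("step")}
    current = root.get("start")
    pending = list(keys)
    spoken = []
    while current is not None:
        step = steps[current]
        spoken.append(step.findtext("prompt"))
        choices = {c.get("key"): c.get("next") for c in step.findall("choice")}
        if not choices or not pending:
            break  # leaf step reached, or the caller stopped pressing keys
        current = choices.get(pending.pop(0))  # an unknown key ends the call
    return spoken


if __name__ == "__main__":
    for line in run_dialog(IVR_DOC, "2"):
        print(line)
```

The script stays a plain text file that a Webmaster can edit in a word processor; all the telephony smarts live in the process that interprets it.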


Several months back, we were sitting around the Computer Telephony offices, writing yet another diatribe about how CT makers have to educate the channel and provide channel-ready solutions. It suddenly occurred to us that what the world needs is an XML vocabulary for a telephony markup language (TML) to handle telephony and unified messaging. Such a standard would let messaging system makers present a single front-end interface that could drive standard, non-XML browsers, but also (via XML tagging) inform XML-aware client apps or ‘Net-linked wired or wireless devices.

Conceivably, TML might grow to embrace related applications, for example, the propagation of corporate directory data (a local extension map, home phone and pager numbers) to telephony servers, VoIP clients or other facilities. TML would facilitate management of multiple devices, unbundle clients and servers, work on intranets and extranets, and open the market to real innovation at numerous points.

It seemed like such a good idea that we were sure somebody, somewhere, was already doing it. But all we could find was VoxML — a cool thing, just not what we were looking for. So we decided to start bruiting the idea around the industry and see who bit.

Around this time, SoloPoint’s President and CEO, Arthur Chang, was in our East Coast Lab demonstrating SoloPoint’s amazing new Teleputing network architecture for unified messaging (among other things). Chang brought up a unified messaging interface running on a laptop, then said, “At the moment this is written in Visual Basic, but in the future we’d like to do it in XML. You know, what this industry needs is something like —”

We finished the sentence for him: “— a telephony markup language! You know anybody who’s doing this?”

Chang didn’t. And he was so taken with the idea, he said that SoloPoint would be willing to join the discussion to develop TML and would want to be the first company to incorporate it in its product line. We had a momentary vision of becoming Internet millionaires, but then John “Buzz Kill” Jainschigg realized that as bona fide journalists, we had an ethical obligation to recuse ourselves from direct involvement in the marketplace we report on. So we decided (since it’s so much more pleasant to do paperwork and sit in committee meetings than to earn millions of dollars) that we would simply nurture TML along, report on progress, encourage companies to participate in the standard-making process, and help with coordination.

We next turned to Dr. Setrag Khoshafian at Technology Deployment International (Santa Clara, CA — 408-330-3400). TDI is doing massive amounts of research into XML and offers a basic service pack that consists of strategic training for XML as well as an analysis of how TDI’s expertise can customize and emphasize XML technologies for your company.

Khoshafian and TDI are members of the XML Schema Working Group of the W3C. Schemas are designed to replace document type definitions (DTDs). A DTD is a set of syntax rules for tags that tells computers how to interpret an XML vocabulary: which tags you can use in a document, which order they should appear in, which tags can appear inside other ones, which tags have attributes, etc. Originally developed for SGML, a DTD can be part of an XML document, but it’s usually a separate file. Because XML is a system for defining languages and not itself a language, it has no universal DTD the way HTML does. Instead, each industry or scientific discipline that wants to use XML defines its own DTDs.

The new schemas can express rules that DTDs can’t, and are themselves written in XML, which makes it easy to integrate parts of various schemas. Still, DTDs will be around for a while yet.
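
For a feel of the kind of rules a DTD lays down (which tags are allowed, in what order, nested inside what, with which attributes), here is a rough sketch in Python. It is not a DTD and not a validating parser; Python’s standard xml.etree.ElementTree does not validate against DTDs, so a small hand-written table stands in for the rules. The <message>, <from>, <to>, and <body> tags and the type attribute are hypothetical, invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: each tag maps to its children, in required order.
CONTENT_MODEL = {
    "message": ["from", "to", "body"],
    "from": [],
    "to": [],
    "body": [],
}
# Hypothetical required attributes per tag.
REQUIRED_ATTRS = {"message": ["type"]}


def check_element(elem):
    """Return a list of rule violations for elem and everything inside it."""
    if elem.tag not in CONTENT_MODEL:
        return [f"unknown tag <{elem.tag}>"]
    errors = []
    children = [child.tag for child in elem]
    expected = CONTENT_MODEL[elem.tag]
    if children != expected:
        errors.append(f"<{elem.tag}> contains {children}, expected {expected}")
    for attr in REQUIRED_ATTRS.get(elem.tag, []):
        if attr not in elem.attrib:
            errors.append(f"<{elem.tag}> is missing attribute '{attr}'")
    for child in elem:
        errors.extend(check_element(child))
    return errors


def check_document(xml_text):
    """Parse a document and check it against the rule tables above."""
    return check_element(ET.fromstring(xml_text))


if __name__ == "__main__":
    good = ('<message type="voice"><from>Richard</from>'
            '<to>John</to><body>Call me.</body></message>')
    bad = ('<message><to>John</to><from>Richard</from>'
           '<body>Call me.</body></message>')
    print(check_document(good))
    print(check_document(bad))
```

A real DTD or schema states these rules declaratively, so that any validating parser can enforce them without custom code on either end.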

On a conference call with Chang and Khoshafian, Computer Telephony did some brainstorming as to how to get our telephony markup language initiative going. Here’s the plan as of mid-July:

TDI and SoloPoint will contribute to our initiative by atomizing call and messaging functions in an effort to start formulating a list of possible telephony and messaging tags. The organizational structure of the language must also be considered, and the base types and enumeration types need to be listed. Those help define constraints, such as the default values that attributes assume when they are unspecified in a particular document.
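
As a rough illustration of enumeration types and default values, the Python sketch below resolves the attributes of a single element against a small table of declarations. Since TML doesn’t exist yet, everything here is an assumption made up for the example: the <message> element, the type and priority attributes, and their allowed and default values.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute declarations for a <message> element: the set of
# allowed values (an enumeration) and the default used when unspecified.
ATTR_DECLS = {
    "type": {"allowed": {"voice", "fax", "email"}, "default": "voice"},
    "priority": {"allowed": {"normal", "urgent"}, "default": "normal"},
}


def resolve_attrs(xml_text):
    """Return the element's attributes with defaults filled in.

    Raises ValueError if a value falls outside its enumeration.
    """
    elem = ET.fromstring(xml_text)
    resolved = {}
    for name, decl in ATTR_DECLS.items():
        value = elem.get(name, decl["default"])  # fall back to the default
        if value not in decl["allowed"]:
            allowed = sorted(decl["allowed"])
            raise ValueError(f"{name}={value!r} is not one of {allowed}")
        resolved[name] = value
    return resolved


if __name__ == "__main__":
    print(resolve_attrs('<message type="fax"/>'))
```

Because the defaults live in the shared definition rather than in each document, a sender can leave out the common case and every receiver still reads the message the same way.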

Our little group will hold a working session soon, and with our knowledge and expertise in the various domains, we can come up with a first-order approximation of TML. Then we’ll go through an RFC process, revisions, you know the drill, and finally come up with something we can show to grown-ups. Meanwhile, if anybody reading this would like to get involved in the great TML crusade, send us an e-mail (John says, “Send Richard an e-mail.”). Like the old Steve Allen song, “this could be the start of something big.”