Microsoft has announced several new lines of support for open-standards-based speech technology, including a Speech Server, updated Speech Application Software Development Kit (SASDK), Microsoft Speech Server Beta Program, Early Adopter Program, and specialized training courses. Based upon the Speech Application Language Tags (SALT) specification, the speech server supports unified telephony and multimodal applications. Its key components include Speech Engine Services (Speech Recognition Engine, Prompt Engine, Text-to-Speech Engine) and Telephony Application Services (SALT Interpreter, Media and Speech Manager, SALT Interpreter Controller). With these technology offerings, "customers can use speech to access information from standard telephones and cell phones as well as GUI-based devices like PDAs, Tablet PCs and smart phones. For connectivity into the enterprise telephony infrastructure and call-control functionality, Intel Corp. and Intervoice Inc. will provide a Telephony Interface Manager (TIM) that supports Microsoft Speech Server. The TIM will provide fast and easy integration of the speech server with the Intel NetStructure communications boards, enabling deployment of robust speech processing applications."
From the Announcements
Microsoft Corp. today announced several key milestones for bringing speech technology to the mainstream including the first public beta release of the highly anticipated Microsoft Speech Server, a Windows Server System, and the beta 3 release of the Speech Application Software Development Kit (SASDK). In addition, Microsoft established several new resources to engage with enterprises looking to adopt Microsoft's Speech Application Language Tags (SALT)-based speech offerings. The resources include the Microsoft Speech Server Beta Program; an Early Adopter Program (EAP); and specialized training courses on Speech Server, the SASDK, and voice user interface (VUI) and speech application design. To build a solid foundation for companies interested in becoming partners, Microsoft also today introduced the Microsoft Speech Partner Program.
Microsoft Speech Server Beta v1.0
Designed to run on the Windows Server 2003 operating system, Microsoft Speech Server is the most flexible and integrated platform for delivering low total cost of ownership for speech deployments. Taking advantage of the improved secure architecture and new security-aware features of Windows Server 2003, Microsoft Speech Server includes additional security features to help protect and defend systems, resources and users from potential security threats. Built on SALT, an open industry standard, Microsoft Speech Server extends existing Web markup languages by adding speech recognition and prompt functionality to both telephony and multimodal applications.
For connectivity into the enterprise telephony infrastructure and call-control functionality, Intel Corp. and Intervoice Inc. will provide a Telephony Interface Manager (TIM) that supports Microsoft Speech Server. The TIM will provide fast and easy integration of the speech server with the Intel NetStructure communications boards, enabling deployment of robust speech processing applications. Multimodal applications do not require a TIM.
The following are additional key components of the Microsoft Speech Server:
Speech Engine Services (SES)
- Speech Recognition Engine. This component includes the state-of-the-art Microsoft Speech Recognition Engine for accurately handling users' speech inputs.
- Prompt Engine. The Prompt Engine joins prerecorded prompts from a database and plays them back so that users hear a human voice.
- Text-to-Speech Engine. When prerecorded prompts are unavailable, SpeechWorks' Speechify Text-to-Speech Engine synthesizes audio output from a text string.
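The division of labor between the Prompt Engine and the Text-to-Speech Engine can be illustrated with a minimal sketch. It is not Microsoft's implementation: the `PromptPlanner` class, the phrase-to-file mapping and the file names are hypothetical, chosen only to show recorded audio being preferred and synthesis used as the fallback.

```python
from dataclasses import dataclass, field


@dataclass
class PromptPlanner:
    """Hypothetical planner: prefer prerecorded prompts, fall back to TTS."""

    # Maps a normalized phrase to the name of a prerecorded audio file.
    recorded: dict[str, str] = field(default_factory=dict)

    def plan(self, phrases: list[str]) -> list[tuple[str, str]]:
        steps = []
        for phrase in phrases:
            key = phrase.strip().lower()
            if key in self.recorded:
                # A human recording exists, so it is played back as-is.
                steps.append(("recorded", self.recorded[key]))
            else:
                # No recording: the text is handed to the synthesizer.
                steps.append(("tts", phrase))
        return steps


planner = PromptPlanner(
    recorded={"your balance is": "balance_intro.wav", "dollars": "dollars.wav"}
)
plan = planner.plan(["Your balance is", "forty-two", "dollars"])
for kind, payload in plan:
    print(kind, payload)
```

In this sketch only the variable part of the prompt ("forty-two") would be synthesized; the fixed phrases around it come from recordings.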
Telephony Application Services (TAS)
- SALT Interpreter. This component deals with all the speech interface and presentation logic (input and output). In addition, the SALT Interpreter handles interactions between the speech application and the telephony components of the architecture.
- Media and Speech Manager. The Media and Speech Manager handles requests made by SALT Interpreters to SES for speech recognition and prompt playback, and manages interfaces with the third-party TIM to deliver audio to and from the telephone user.
- SALT Interpreter Controller. The SALT Interpreter Controller manages creation, deletion and resetting of the multiple instances of the SALT Interpreter that are managing dialogs with individual callers.
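The controller's role described above (creating, deleting and resetting one interpreter instance per caller) can be sketched as follows. All names are hypothetical, and the `max_instances` capacity limit and the example URL are assumptions added for illustration; the announcement does not describe the actual interfaces.

```python
from dataclasses import dataclass


@dataclass
class InterpreterInstance:
    """Hypothetical stand-in for one SALT Interpreter handling one caller."""

    call_id: str
    page_url: str
    turns: int = 0

    def reset(self) -> None:
        # Return the dialog to its starting state without dropping the call.
        self.turns = 0


class InterpreterController:
    """Creates, resets and deletes interpreter instances, one per active call."""

    def __init__(self, max_instances: int) -> None:
        self.max_instances = max_instances  # assumed limit, not from the source
        self._instances: dict[str, InterpreterInstance] = {}

    def create(self, call_id: str, page_url: str) -> InterpreterInstance:
        if call_id in self._instances:
            raise ValueError(f"call {call_id} already has an interpreter")
        if len(self._instances) >= self.max_instances:
            raise RuntimeError("no free interpreter instances")
        instance = InterpreterInstance(call_id, page_url)
        self._instances[call_id] = instance
        return instance

    def reset(self, call_id: str) -> None:
        self._instances[call_id].reset()

    def delete(self, call_id: str) -> None:
        del self._instances[call_id]

    def active_calls(self) -> list[str]:
        return sorted(self._instances)


controller = InterpreterController(max_instances=2)
first = controller.create("call-1", "http://example.com/banking.aspx")
controller.create("call-2", "http://example.com/banking.aspx")
first.turns = 3
controller.reset("call-1")   # dialog restarts, the call stays connected
controller.delete("call-2")  # caller hung up
print(controller.active_calls())  # ['call-1']
```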
"Microsoft Speech Server is unique to the marketplace in that it is the only speech server that supports both unified telephony and multimodal applications. By building our speech technology offerings upon the open, industry-standard SALT specification, customers can use speech to access information from standard telephones and cell phones as well as GUI-based devices like PDAs, Tablet PCs and 'smart' phones," said Xuedong Huang, general manger of the Speech Technologies group at Microsoft.
Microsoft Corp. has introduced the Speech Partner Program (SPP), designed to provide additional revenue and profit opportunities for new and existing partners interested in developing, deploying and reselling enterprise-grade speech technology solutions based on Microsoft technologies. Through the new program, Microsoft Speech partners will gain competitive advantage by providing large and medium-sized enterprise customers with new and value-added product and service offerings based on the Microsoft Speech Server. The Microsoft Speech Server, the most recent addition to the Windows Server System family, is designed to enable the delivery of enterprise-grade, mission-critical speech technology solutions for both telephony (voice-only) and multimodal (speech and visual) applications.
From the July 2003 White Paper: The Microsoft Speech Server
The Microsoft Speech Server and Toolset address the needs of those exploring interactive voice response (IVR) systems for the first time, call center veterans, and IT managers exploring breakthrough speech applications for customers and employees. The solution is well suited to a variety of speech scenarios ranging from traditional call center applications that target customers with phones to newer IT applications targeted at customers or employees with Pocket PC devices. Speech can be used to do these things:
- Bring in lower-cost, automated IVR systems to improve customer service.
- Fix an ineffective touch-tone IVR system (speech has been shown by many third-party studies to increase transaction completion 50 percent to 100 percent).
- Integrate Web applications with call center applications using speech where appropriate.
- Create new multimodal (speech plus graphical user interface) speech applications using speech for filling out forms or navigating through applications in settings where people have a high degree of mobility, such as hospitals, offices, or factories.
- Address the needs of customers and employees with mobile devices such as Tablet PCs and Pocket PCs.
The server-side components of Microsoft Speech Server (MSS) enable telephones, cell phones, desktop computers, and Pocket PC devices to access speech applications. Specifically, the Microsoft Speech Server includes speech recognition, speech synthesis, connectivity interfaces, and call management interfaces for telephones, cell phones, computers, and Pocket PC devices. MSS includes the following components: (1) Speech Engine Services; (2) Telephony Application Services...
Speech Engine Services (SES) provides server-side speech recognition and speech playback services for multimodal and telephony clients using the Microsoft Speech Server. A multimodal client on a Pocket PC device accesses SES directly for both speech recognition and speech playback. Desktop and tablet computers perform speech recognition and speech playback locally. A person using a telephone accesses SES through TAS, which serves as a proxy for a telephone and allows telephony endpoints to use the same application framework and speech services as do Pocket PCs. SES provides clear benefits for end users, call center managers, and IT managers...
Telephony Application Services (TAS) handles the connectivity and call management necessary to support traditional telephones and cell phones connecting to the Microsoft Speech Server. TAS manages a set of SALT interpreters that enable phones and cell phones to communicate with Web applications with embedded speech tags. TAS further brokers the communication link between the telephone system, speech services, and Web server application. TAS works with third-party Telephony Interface Manager (TIM) software for connectivity to phone lines and PBXs. TAS and TIM are not necessary for multimodal endpoints such as Pocket PC devices. TAS serves as the application proxy that enables telephones to use the same kinds of Web applications (with embedded grammars, dialog and prompts) that multimodal clients such as Pocket PCs use. TAS overcomes the limitation that telephones do not understand Web pages: TAS does understand Web pages developed with embedded speech tags for call management, speech recognition, and speech output...
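The routing described in the two paragraphs above can be summarized in a short sketch. The enum, the function name and the string labels are hypothetical; only the paths themselves (telephones through the TIM and TAS to SES, Pocket PCs directly to SES, desktops and tablets using local speech processing) come from the white paper excerpt.

```python
from enum import Enum


class Client(Enum):
    TELEPHONE = "telephone"
    CELL_PHONE = "cell phone"
    POCKET_PC = "Pocket PC"
    DESKTOP = "desktop computer"
    TABLET = "tablet computer"


def speech_path(client: Client) -> list[str]:
    """Return the components a client's audio passes through, in order."""
    if client in (Client.TELEPHONE, Client.CELL_PHONE):
        # Phones cannot render Web pages, so TAS acts as their proxy and
        # a third-party TIM connects TAS to phone lines and PBXs.
        return ["TIM", "TAS", "SES"]
    if client is Client.POCKET_PC:
        # Multimodal Pocket PC clients call SES directly; no TAS or TIM.
        return ["SES"]
    # Desktop and tablet computers recognize and play back speech locally.
    return ["local speech engine"]


for client in Client:
    print(f"{client.value}: {' -> '.join(speech_path(client))}")
```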
The Microsoft Speech Server and Toolset offer a current generation of call center managers and IT managers the ability to develop and deploy cost-effective telephony and multimodal applications. The future looks bright for speech as a mainstream and pervasive technology.
About Speech Application Language Tags (SALT)
"Speech Application Language Tags (SALT) are a lightweight set of extensions to existing markup languages, in particular HTML and XHTML that enable multimodal and telephony access to information, applications and Web services from PCs, telephones, Tablet PCs and wireless personal digital assistants (PDAs). SALT consists of a small set of XML elements, with associated attributes and Document Object Model (DOM) object properties, events and methods, that apply a speech interface to Web pages. The example below presumes a device that can be clicked with a mouse or tapped with a stylus, and a browser that supports events, simple objects and method calling. A SALT Interpreter interprets HTML, SALT, and script on speech-enabled Web pages. similar to a graphical browser, except that it recognizes only voice commands (rather than mouse clicks or text input). Like graphical browsers, it completes forms on the page and submits information to the Web server..." [from the Glossary]
- Announcement 2003-07-09: "Microsoft Speech Technologies Enable Enterprise Competitive Advantage. Microsoft Releases Speech Server Beta, Enhancements to Speech Application SDK, New Programs for Enterprise Companies."
- "Microsoft Offers New Program for Partners to Capitalize On Growing Speech Industry. Microsoft Speech Partner Program Enables Business Opportunities With Microsoft Speech Server in Enterprise Solutions."
- Microsoft Speech Server (MSS)
- Microsoft Speech Server Frequently Asked Questions
- Microsoft Speech Server Beta Program
- Microsoft Speech Technologies Virtual Press Kit
- "Microsoft Speech Server." White paper. July 2003. "Enabling people to use speech as part of their everyday interactions with software and services whether they are using telephones, mobile devices, or PCs."
- Microsoft speech technologies news
- "Microsoft Pitches Voice Specification. SALT Support Trumps Voice XML as Speech Server Sounds Return of Enterprise Voice." By Ephraim Schwartz. In InfoWorld (July 11, 2003).
- "Microsoft Releases Speech Server Beta." By Stephen Lawson. In Network World (July 10, 2003).
- "Microsoft Begins Broader Test Of Speech Software. Speech Server will compete with VoiceXML standard, which is used by several other vendors." By Aaron Ricadela. In InformationWeek (July 9, 2003).
- "Talking Computers Nearing Reality." By Michael Kanellos. In CNET News.com (July 9, 2003).
- SALT: Speech Application Language Tags (SALT). Version 1.0 Specification.
- See also: W3C Multimodal Interaction Activity
- See also: W3C Voice Browser Activity
- "Speech Application Language Tags (SALT)" - Main reference page.