Version 2 of the VoiceXML markup language is all but signed and sealed, but not quite delivered, owing to a snag in nailing down intellectual property (IP) rights.
According to an industry analyst familiar with the issues under discussion at the VoiceXML Forum, all the specifications have been agreed upon, but concern remains that a developer using VXML could later be sued by a Forum member for infringement of IP rights.
“I wouldn’t classify it as a problem, but the members need to clarify the IP situation, which makes it easier for everyone,” said Bill Meisel, president of TMA Associates, in Tarzana, Calif.
One solution, Meisel added, may be for member companies [the Forum currently counts 55] to provide license-free use or forgo patent rights.
Sources throughout the speech technology industry see VXML as a boon because it is built on XML, a standard language already familiar to Web developers.
“VoiceXML is to voice applications what HTML is to Web applications. HTML displays Web pages and VXML is used to render the voice interface, including dialogs and prompts,” said Steve Ehrlich, vice president of marketing at speech engine vendor Nuance Communications, in Menlo Park, Calif.
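In practice (a minimal sketch, not drawn from the article, with a hypothetical form name), a VoiceXML document marks up a dialog much the way an HTML document marks up a page: a <form> holds the dialog, and <prompt> elements supply the spoken output.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal illustrative VoiceXML 2.0 document: speaks a prompt
     and collects a yes/no answer from the caller. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="confirm_order">
    <field name="answer" type="boolean">
      <prompt>Would you like to place the order?</prompt>
      <filled>
        <if cond="answer">
          <prompt>Your order has been placed.</prompt>
        <else/>
          <prompt>Okay, the order has been cancelled.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>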
Version 2, expected to ship by the end of the year, is in its final development stages, according to Forum chairman Bill Dykas.
“The biggest change in version 2.0 is the creation of a grammar format standard,” Dykas said.
Until now, developers creating speech applications have used proprietary formats for writing speech grammars. A speech grammar is needed to map a wide range of responses into a narrower range, Dykas explained. For example, in a “yes/no grammar” there may be a dozen ways for a caller to respond in the affirmative to a question, including yeah, yes, okay, please, and all right, all of which can be mapped to Yes. Version 2 of VXML will define a common format so that the application has to deal with only a single response.
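The common format is an XML grammar specification being standardized through the W3C. As a rough sketch of what such a yes/no grammar could look like (the out = ... notation for attaching the normalized value follows a W3C semantic-interpretation draft and is an assumption here):

<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative XML-format speech grammar: several spoken
     affirmatives and negatives each map to a single value. -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" version="1.0" root="yesno">
  <rule id="yesno" scope="public">
    <one-of>
      <item>yes <tag>out = "yes";</tag></item>
      <item>yeah <tag>out = "yes";</tag></item>
      <item>okay <tag>out = "yes";</tag></item>
      <item>all right <tag>out = "yes";</tag></item>
      <item>no <tag>out = "no";</tag></item>
      <item>nope <tag>out = "no";</tag></item>
    </one-of>
  </rule>
</grammar>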
The second major addition to the standard, which the VoiceXML Forum is developing in cooperation with the W3C standards body, is the clarification of the call transfer tags.
“If you are inside the VXML application, there needs to be a function that you could call that would trigger a call to get a live operator. The transfer tag is a step toward expanding that full [telephony] application scope,” Dykas said.
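As a rough sketch of how the transfer tag might be used to reach a live operator (the form name, destination number, and timeout here are hypothetical, and the tag was still being finalized at the time):

<!-- Illustrative VoiceXML 2.0 transfer to a live operator;
     the destination number is made up. -->
<form id="operator">
  <transfer name="helpcall" dest="tel:+18005551234"
            bridge="true" connecttimeout="30s">
    <prompt>Please hold while I transfer you to an operator.</prompt>
    <filled>
      <if cond="helpcall == 'busy'">
        <prompt>All of our operators are busy. Please try again later.</prompt>
      </if>
    </filled>
  </transfer>
</form>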
The VoiceXML Forum works through a network of subcommittees, each addressing a particular technology issue. The Forum then decides which technologies from the subcommittees will be included in the next version.
For Version 3.0 and beyond, subcommittees are currently working on speech technology components, for example in telephony: how to manipulate voice mail, and how to balance the load across systems when a large number of calls come in simultaneously.
“If you have a message left on your phone, a red light turns on on the phone. Currently, there is no way to do that in VXML,” Dykas said.
Other areas include natural language understanding and multimodal interfaces for handhelds and cellular handsets. Using a multimodal interface, for example, a mobile worker might speak a database query for customers matching a certain set of parameters but have the results displayed rather than spoken.
“Eventually VoiceXML needs to meet up with WML (Wireless Markup Language),” Nuance’s Ehrlich said.
As VXML extends its standard approach to increasingly sophisticated application features, the hope is that speech technology will become an everyday part of most applications.
“Prior to VoiceXML, ISVs needed a team of phone people, speech people, Web people. VXML is almost to the point where all I need is a single developer,” Dykas said.