The VoiceXML Language


VoiceXML (or simply VXML) stands for Voice Extensible Markup Language. Just as HTML is a markup language for creating distributed visual applications, VoiceXML is an XML-based markup language for creating distributed voice applications. It is a standard that lets Web applications and content be accessed over a phone, so you can use it to develop speech-based telephony applications.

Goals of VoiceXML

VoiceXML’s main goal is to bring the full power of Web development and content delivery to voice response applications, and to free the authors of such applications from low-level programming and resource management. It enables integration of voice services with data services using the familiar client-server paradigm. A voice service is viewed as a sequence of interaction dialogs between a user and an implementation platform. The dialogs are provided by document servers, which may be external to the implementation platform. Document servers maintain overall service logic, perform database and legacy system operations, and produce dialogs. A VoiceXML document specifies each interaction dialog to be conducted by a VoiceXML interpreter. User input affects dialog interpretation and is collected into requests submitted to a document server. The document server replies with another VoiceXML document to continue the user’s session with other dialogs.
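The request and reply cycle described above can be sketched with a single dialog. This is a minimal illustration, not a complete service: the form id, the field name, and the http://example.com/order address are hypothetical, and the built-in "digits" type is assumed to be supported by the platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <!-- One interaction dialog: ask for a quantity and wait for spoken or keypad input -->
    <field name="quantity" type="digits">
      <prompt>How many tickets would you like?</prompt>
    </field>
    <!-- Once the field is filled, the input is collected into a request to the document server -->
    <block>
      <!-- Hypothetical URL; the server is expected to reply with another VoiceXML document -->
      <submit next="http://example.com/order" namelist="quantity" method="post"/>
    </block>
  </form>
</vxml>
```

The interpreter visits the field first, then the block, so the submit only runs after the caller has answered.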

VoiceXML is a markup language that:


  • Minimizes client/server interactions by specifying multiple interactions per document.
  • Shields application authors from low-level and platform-specific details.
  • Separates user interaction code (in VoiceXML) from service logic (e.g. CGI scripts).
  • Promotes service portability across implementation platforms.
  • Provides a common language for content providers, tool providers, and platform providers.
  • Is easy to use for simple interactions, and yet provides language features to support complex dialogs.
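As a rough illustration of the first point in the list, one document can hold several dialogs, so the interpreter does not need to contact the server between them. The dialog ids and prompt texts below are made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- First dialog: a menu that routes to other dialogs in this same document -->
  <menu id="main">
    <prompt>Say hours or directions.</prompt>
    <choice next="#hours">hours</choice>
    <choice next="#directions">directions</choice>
  </menu>

  <!-- Second dialog: reached through the fragment identifier #hours, no server round trip -->
  <form id="hours">
    <block>
      <prompt>We are open from nine to five.</prompt>
      <goto next="#main"/>
    </block>
  </form>

  <!-- Third dialog: ends the session after speaking -->
  <form id="directions">
    <block>
      <prompt>We are next to the train station.</prompt>
      <exit/>
    </block>
  </form>
</vxml>
```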

While VoiceXML strives to accommodate the requirements of a majority of voice response services, services with stringent requirements may best be served by dedicated applications that employ a finer level of control.

Scope of VoiceXML

The language describes the human-machine interaction provided by voice response systems, which includes:

  • Output of synthesized speech (text-to-speech).
  • Output of audio files.
  • Recognition of spoken input.
  • Recognition of DTMF input.
  • Recording of spoken input.
  • Control of dialog flow.
  • Telephony features such as call transfer and disconnect.
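The capabilities listed above map onto a small set of elements. The sketch below touches each one in turn; the audio file name, the telephone number, and the variable names are placeholders, and actual behavior (for example, supported audio formats or transfer types) depends on the platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="demo">
    <block>
      <!-- Synthesized speech (text-to-speech) -->
      <prompt>Welcome to the demo line.</prompt>
      <!-- Audio file output; the enclosed text is spoken if the (hypothetical) file cannot be played -->
      <prompt><audio src="welcome.wav">Welcome.</audio></prompt>
    </block>

    <!-- Recognition of spoken or DTMF input, using a built-in grammar for four digits -->
    <field name="pin" type="digits?length=4">
      <prompt>Say or key in your four digit code.</prompt>
    </field>

    <!-- Recording of spoken input, ended by a key press or after 30 seconds -->
    <record name="message" beep="true" maxtime="30s" dtmfterm="true">
      <prompt>Leave a message after the beep.</prompt>
    </record>

    <!-- Telephony: bridge the caller to another party (placeholder number) -->
    <transfer name="call" dest="tel:+15555550100" bridge="true">
      <prompt>Connecting you to an operator.</prompt>
    </transfer>

    <!-- Dialog flow continues here when the bridged call ends; then hang up -->
    <block>
      <prompt>Goodbye.</prompt>
      <disconnect/>
    </block>
  </form>
</vxml>
```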

The language provides means for collecting keypad (DTMF) and/or spoken input, assigning the input results to document-defined request variables, and making decisions that affect the interpretation of documents written in the language. A document may be linked to other documents through Uniform Resource Identifiers (URIs).
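A short sketch of these three mechanisms together: input is collected into a variable, a condition is evaluated on it, and each branch links to another document by URI. The field name and both target URIs (thanks.vxml and http://example.com/feedback.vxml) are hypothetical, and the built-in "boolean" type is assumed to be available on the platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="survey">
    <!-- The recognized answer is assigned to the variable "satisfied" -->
    <field name="satisfied" type="boolean">
      <prompt>Were you satisfied with our service? Say yes or press 1. Say no or press 2.</prompt>
      <filled>
        <!-- Decision based on the collected input -->
        <if cond="satisfied">
          <prompt>Thank you.</prompt>
          <!-- Relative URI: another document on the same server (hypothetical) -->
          <goto next="thanks.vxml"/>
        <else/>
          <!-- Absolute URI: a document on another server (hypothetical) -->
          <goto next="http://example.com/feedback.vxml"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```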


Hello world!

<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>Hello world!</block>
  </form>