Driven by globalization and the rapid development of Internet and multimedia technologies, multilingual multi-modal dialogue systems have become one of the most popular services and have broad prospects. Such a system consists mainly of a media server and an application server. The media server provides multimedia engines for machine translation, speech recognition, speech synthesis, and session management. The application server is the core of the system: it provides the control commands and service logic. Against this background, this article analyzes the current state of multi-modal systems, introduces the specific requirements of multilingual speech services, and presents a concrete network scheme built on an IMS server. SIP (Session Initiation Protocol), the signaling control protocol for multimedia sessions developed by the IETF, is increasingly dominant because it is simple, flexible, and scalable, so the system uses SIP. The system adopts a distributed architecture, which gives it a high degree of flexibility and scalability. The article focuses on the implementation of the SIP application server, which uses MSCML, STML, and other XML-based markup languages to control the machine translation, speech recognition, speech synthesis, and session management engines. As part of this work, the SDML markup language was designed and implemented, and XML parsers were written for the different XML languages. The SIP stack is implemented on top of Asterisk. The system follows the SIP specification (RFC 3261) and its associated drafts, extends it where needed, and retains good versatility.

【keywords】 multi-modal, SIP, XML, application server