Book chapter
Audio-Visual Speech Processing for Human Computer Interaction
Advances in Robotics and Virtual Reality, pp.135-165
Intelligent Systems Reference Library, 26, Springer-Verlag
2012
Abstract
This chapter presents an audio-visual speech recognition (AVSR) system for Human Computer Interaction (HCI) built around three modules: (i) radial basis function neural network (RBF-NN) voice activity detection (VAD), (ii) watershed lips detection with H∞ lips tracking, and (iii) multi-stream audio-visual back-end processing. The importance of AVSR as the pipeline for HCI and the background studies of the respective modules are discussed first, followed by the design details of the overall proposed AVSR system. Compared to the conventional lips detection approach, which requires prerequisite skin/non-skin detection and face localization, the proposed watershed lips detection aided by H∞ lips tracking provides a potentially time-saving direct lips detection technique that renders those preliminary steps unnecessary. In addition, the proposed RBF-NN VAD offers better noise compensation and more precise speech localization than the conventional zero-crossing rate and short-term signal energy methods, which yields higher recognition performance through the audio modality. Lastly, the developed AVSR system, which integrates the audio and visual information as well as the temporal synchrony of the audio-visual data stream, is shown to obtain a significant improvement over unimodal speech recognition and over the decision and feature integration approaches.
Details
- Title
- Audio-Visual Speech Processing for Human Computer Interaction
- Authors
- Siew Wen Chin (Author) - University of Nottingham Malaysia Campus; K P Seng (Author) - University of Nottingham Malaysia Campus; Li-Minn Ang (Author) - University of Nottingham Malaysia Campus
- Contributors
- Tauseef Gulrez (Editor); Aboul Ella Hassanien (Editor)
- Publication details
- Advances in Robotics and Virtual Reality, pp.135-165
- Series
- Intelligent Systems Reference Library; 26
- Publisher
- Springer-Verlag
- Date published
- 2012
- DOI
- 10.1007/978-3-642-23363-0_6; 10.1007/978-3-642-23363-0
- ISBN
- 9783642233630
- Organisation Unit
- University of the Sunshine Coast, Queensland; School of Science, Technology and Engineering; Engage Research Lab
- Language
- English
- Record Identifier
- 99513801302621
- Output Type
- Book chapter