Article

Appearance and shape-based hybrid visual feature extraction: toward audio-visual automatic speech recognition

Journal

SIGNAL IMAGE AND VIDEO PROCESSING
Volume 15, Issue 1, Pages 25-32

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s11760-020-01717-0

Keywords

AV-ASR; Appearance and shape-based hybrid visual speech features; LBP-TOP; DCT; PZM; Hybrid classifier (classifier combination)

This research introduces a new set of hybrid visual features that combine shape-based and appearance-based features to enhance the performance of visual speech recognition systems. By calculating features such as Pseudo-Zernike Moment, Local Binary Pattern-three orthogonal planes, and Discrete Cosine Transform, the goal is to embed global and local visual information into a compact feature set.
Audio-visual automatic speech recognition (AV-ASR) is an emerging field of research, but there is still a lack of suitable visual features for visual speech recognition. Visual features are mainly categorized as shape-based or appearance-based. Because shape and appearance features embed different information, this paper proposes a new set of hybrid visual features that leads to a better visual speech recognition system. Pseudo-Zernike Moment (PZM) is calculated as the shape-based visual feature, while Local Binary Pattern-three orthogonal planes (LBP-TOP) and Discrete Cosine Transform (DCT) are calculated as the appearance-based features. The proposed method also gathers both global and local visual information. The objective of the proposed system is thus to embed all of this visual information into a compact feature set. For audio speech recognition, the proposed system uses Mel-frequency cepstral coefficients (MFCC). We also propose a hybrid classification method to carry out all the AV-ASR experiments. Artificial Neural Network (ANN), multiclass Support Vector Machine (SVM) and Naive Bayes (NB) classifiers are used for classifier hybridization. It is shown that the proposed AV-ASR system with a hybrid classifier significantly improves the recognition rate.
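
The abstract does not say how the ANN, SVM and NB outputs are combined. As a rough illustration of classifier hybridization, the sketch below assumes simple majority voting over per-sample labels, which is one common combination rule and not necessarily the one used in the paper. The function names, the tie-breaking rule and the example labels are all hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label among the classifiers' votes.

    Assumption: ties are broken in favor of the classifier listed first.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in the original order so the earliest classifier wins a tie.
    for label in predictions:
        if counts[label] == best:
            return label


def combine(per_classifier: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Fuse label sequences (one per classifier) sample by sample."""
    return [majority_vote(votes) for votes in zip(*per_classifier)]


# Hypothetical labels for four utterances from three classifiers.
ann = ["one", "two", "three", "two"]
svm = ["one", "three", "three", "one"]
nb = ["two", "two", "three", "three"]

fused = combine([ann, svm, nb])
print(fused)  # ['one', 'two', 'three', 'two']
```

In the last sample all three classifiers disagree, so the assumed tie-break returns the first classifier's label. A weighted or probability-based rule would be an equally plausible choice.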
