Appearance and shape-based hybrid visual feature extraction: toward audio-visual automatic speech recognition

SIGNAL IMAGE AND VIDEO PROCESSING (2021)

Journal

SIGNAL IMAGE AND VIDEO PROCESSING

Volume 15, Issue 1, Pages 25-32

Publisher

SPRINGER LONDON LTD

DOI: 10.1007/s11760-020-01717-0

Keywords

AV-ASR; Appearance and shape-based hybrid visual speech features; LBP-TOP; DCT; PZM; Hybrid classifier (classifier combination)

Automated Summary

This research introduces a new set of hybrid visual features that combine shape-based and appearance-based features to enhance the performance of visual speech recognition systems. By calculating features such as Pseudo-Zernike Moment, Local Binary Pattern-three orthogonal planes, and Discrete Cosine Transform, the goal is to embed global and local visual information into a compact feature set.

Abstract

Audio-visual automatic speech recognition (AV-ASR) is an emerging field of research, but there is still a lack of suitable visual features for visual speech recognition. Visual features are mainly categorized as shape-based or appearance-based. Because shape and appearance features embed different information, this paper proposes a new set of hybrid visual features that leads to a better visual speech recognition system. Pseudo-Zernike Moment (PZM) is calculated as the shape-based visual feature, while Local Binary Pattern-three orthogonal planes (LBP-TOP) and Discrete Cosine Transform (DCT) are calculated as the appearance-based features. The proposed method also gathers both global and local visual information. The objective of the proposed system is thus to embed all of this visual information into a compact feature set. For audio speech recognition, the proposed system uses Mel-frequency cepstral coefficients (MFCC). We also propose a hybrid classification method to carry out all the AV-ASR experiments. Artificial Neural Network (ANN), multiclass Support Vector Machine (SVM), and Naive Bayes (NB) classifiers are used for classifier hybridization. It is shown that the proposed AV-ASR system with a hybrid classifier significantly improves the recognition rate.
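To make the two ideas in the abstract concrete (fusing several descriptors into one feature vector, then combining several classifiers), here is a minimal sketch. It is not the authors' implementation: the abstract does not say how the features are merged or how the ANN, SVM, and NB outputs are combined, so plain concatenation and majority voting are assumptions. The names `fuse_features` and `majority_vote`, the toy feature values, and the three stand-in classifiers are all hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence

Vector = List[float]
Classifier = Callable[[Vector], str]


def fuse_features(*blocks: Sequence[float]) -> Vector:
    """Concatenate per-descriptor feature blocks into a single vector.

    Assumption: simple concatenation; the paper may fuse features differently.
    """
    fused: Vector = []
    for block in blocks:
        fused.extend(float(x) for x in block)
    return fused


def majority_vote(classifiers: Sequence[Classifier], features: Vector) -> str:
    """Return the label chosen by the most classifiers.

    Assumption: majority voting as the combination rule. On a tie, the
    label predicted by the earliest classifier in the list wins.
    """
    if not classifiers:
        raise ValueError("at least one classifier is required")
    votes = [clf(features) for clf in classifiers]
    counts = Counter(votes)
    best = max(counts.values())
    for label in votes:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable: votes is non-empty")


# Toy stand-ins for the PZM, LBP-TOP, and DCT descriptors (made-up values).
pzm = [0.12, 0.40]
lbp_top = [3, 1, 0, 2]
dct = [5.1, -0.7]
features = fuse_features(pzm, lbp_top, dct)

# Hypothetical rule-based stand-ins for trained ANN, SVM, and NB models.
ann = lambda v: "yes" if v[0] > 0.1 else "no"
svm = lambda v: "yes" if sum(v) > 10 else "no"
nb = lambda v: "no"

prediction = majority_vote([ann, svm, nb], features)
```

In a real system each stand-in would be a trained model, and a weighted or probability-based combination rule could replace the simple vote.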
