Journal
PATTERN RECOGNITION LETTERS
Volume 88, Issue -, Pages 49-56
Publisher
ELSEVIER
DOI: 10.1016/j.patrec.2017.01.013
Keywords
Audio classification; Texture; Image processing; Acoustic features; Ensemble of classifiers; Pattern recognition
This paper presents a novel and effective approach to automated audio classification based on the fusion of different sets of features, both visual and acoustic. A number of acoustic and visual features of sounds are evaluated and compared. These features are then fused in an ensemble that produces better classification accuracy than other state-of-the-art approaches. The visual features are built starting from the audio file and are taken from images constructed from different spectrograms, a gammatonegram, and a rhythm image. These images are divided into subwindows from which a set of texture descriptors is extracted. For each feature descriptor a different Support Vector Machine (SVM) is trained. The SVM outputs are summed for a final decision. The proposed ensemble is evaluated on three well-known databases for music genre classification (the Latin Music Database, the ISMIR 2004 database, and the GTZAN genre collection), a dataset of bird vocalizations aimed at species recognition, and a dataset of right whale calls aimed at whale detection. The MATLAB code for the ensemble of classifiers and for the extraction of the features will be publicly available (https://www.dei.unipd.it/node/2357 +Pattern Recognition and Ensemble Classifiers). (C) 2017 Elsevier B.V. All rights reserved.
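The fusion step described in the abstract, where the outputs of the per-descriptor SVMs are summed for a final decision, can be illustrated with a minimal sketch. This is not the authors' MATLAB code: the function name `sum_rule_fusion`, the nested-list score layout, and the toy scores are assumptions made for illustration, and any score normalization the paper may apply before summing is omitted.

```python
from typing import List, Sequence


def sum_rule_fusion(
    scores_per_classifier: Sequence[Sequence[Sequence[float]]],
) -> List[int]:
    """Fuse classifier outputs with the sum rule.

    scores_per_classifier[k][i][c] is the score that classifier k
    (hypothetically, one SVM per texture or acoustic descriptor)
    assigns to class c for sample i. Returns one class index per sample.
    """
    n_samples = len(scores_per_classifier[0])
    n_classes = len(scores_per_classifier[0][0])
    decisions = []
    for i in range(n_samples):
        # Accumulate the scores of every classifier, class by class.
        fused = [0.0] * n_classes
        for clf_scores in scores_per_classifier:
            for c, score in enumerate(clf_scores[i]):
                fused[c] += score
        # The final decision is the class with the largest summed score.
        decisions.append(max(range(n_classes), key=fused.__getitem__))
    return decisions


# Toy scores (invented for illustration): 2 classifiers, 2 samples, 3 classes.
svm_a = [[0.2, 0.5, 0.3], [0.6, 0.3, 0.1]]
svm_b = [[0.1, 0.3, 0.6], [0.3, 0.5, 0.2]]
predictions = sum_rule_fusion([svm_a, svm_b])
print(predictions)
```

In the toy data the two classifiers disagree on both samples when taken alone, and the summed scores settle each case in favor of the class with the stronger combined support.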