Bag-of-Features Methods for Acoustic Event Detection and Classification

Rene Grzeszick, Axel Plinge and Gernot A. Fink
IEEE/ACM Trans. Audio, Speech and Language Processing, 25(6), pages 1242-1252, 2017.

Abstract

The detection and classification of acoustic events in various environments is an important task. Its applications range from video and multimedia analysis to the surveillance of humans or even animal life. Several of these tasks require online processing. Among the many approaches to acoustic event detection, methods based on the well-known Bag-of-Features principle have recently emerged in the field. Features are calculated for all frames in a given window. Then, following the Bag-of-Features concept, the features are quantized with respect to a learned codebook and a histogram representation is computed. Bag-of-Features approaches are particularly interesting because they can be computed efficiently and applied online. In this paper, the Bag-of-Features principle and various extensions are reviewed, including soft quantization, supervised codebook learning and temporal modeling. Furthermore, Mel and Gammatone frequency cepstral coefficients, which originate from psychoacoustic models, are used as input features for the Bag-of-Features representation. The possibility of fusing the results of multiple channels in order to improve robustness is shown. Two different databases are used for the experiments: the DCASE 2013 office live dataset and the ITC-Irst multichannel dataset.
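The basic pipeline described in the abstract (per-frame features, quantization against a learned codebook, histogram per window) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `learn_codebook` and `bof_histogram`, the use of plain k-means for codebook learning, the codebook size of 8, and the random 13-dimensional vectors standing in for cepstral features are all assumptions made for illustration. Only hard quantization is shown; the extensions named in the abstract are not covered.

```python
import numpy as np


def learn_codebook(frames, k, iters=20, seed=0):
    """Learn k codewords with plain Lloyd's k-means (an assumed choice).

    frames: array of shape (n_frames, n_dims) holding per-frame features.
    """
    rng = np.random.default_rng(seed)
    # Initialize the codewords with k distinct training frames.
    centroids = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(iters):
        # Distance of every frame to every codeword: shape (n_frames, k).
        dists = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = frames[labels == j]
            if len(members) > 0:  # keep the old codeword if its cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def bof_histogram(window, codebook):
    """Hard-quantize each frame in a window and return a normalized histogram."""
    dists = np.linalg.norm(window[:, None, :] - codebook[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(codebook)).astype(float)
    return hist / hist.sum()


# Hypothetical stand-in data: random vectors instead of real MFCC/GFCC frames.
rng = np.random.default_rng(0)
train_frames = rng.normal(size=(500, 13))
codebook = learn_codebook(train_frames, k=8)

window = rng.normal(size=(60, 13))  # one analysis window of 60 frames
hist = bof_histogram(window, codebook)
print(hist)
```

The resulting fixed-length histogram is what a classifier would take as input for each window. Because it only needs nearest-codeword lookups and a count, it can be updated window by window, which is consistent with the online use the abstract mentions.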