Detection of Overlapped Acoustic Events using Fusion of Audio and Video Modalities

Taras Butko, Climent Nadeu

Abstract: Acoustic event detection (AED) may help to describe acoustic scenes, provide a better (smart) service to the users of a meeting room, and contribute to improving the robustness of speech technologies. Even when the number of considered events is not large, detection becomes a difficult task in scenarios where the acoustic events (AEs) are produced rather spontaneously and often overlap in time with speech. In this work, audio and video information are fused at either the feature level or the decision level, and the results are compared for different degrees of signal overlap. The best improvement was obtained with feature-level fusion. Furthermore, we report a significant improvement of the recognition rate in conditions where the AEs are overlapped with loud speech, mainly because the video modality remains unaffected.
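As a rough illustration of the two fusion strategies mentioned in the abstract, the sketch below contrasts feature-level fusion (concatenating both modalities before a single classifier) with decision-level fusion (combining per-modality scores). It is a minimal example only: the feature names, dimensions, classifier, and weights are illustrative assumptions, and a simple weighted average stands in for the fuzzy-integral combination referenced in the index terms.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-frame features (dimensions are illustrative, not from the paper)
rng = np.random.default_rng(0)
audio_feats = rng.normal(size=(200, 13))   # e.g. spectro-temporal features per frame
video_feats = rng.normal(size=(200, 4))    # e.g. localization / motion features per frame
labels = rng.integers(0, 2, size=200)      # 1 = target acoustic event present

# Feature-level fusion: concatenate modalities, then train one classifier.
fused = np.hstack([audio_feats, video_feats])
clf_feature = LogisticRegression(max_iter=1000).fit(fused, labels)

# Decision-level fusion: one classifier per modality, then combine their scores.
clf_audio = LogisticRegression(max_iter=1000).fit(audio_feats, labels)
clf_video = LogisticRegression(max_iter=1000).fit(video_feats, labels)
p_audio = clf_audio.predict_proba(audio_feats)[:, 1]
p_video = clf_video.predict_proba(video_feats)[:, 1]
w_audio = 0.6                               # illustrative weight; the paper uses a fuzzy integral
p_fused = w_audio * p_audio + (1 - w_audio) * p_video
decisions = (p_fused > 0.5).astype(int)
```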

Index Terms: Acoustic Event Detection, Multimodal Fusion, Fuzzy Integral, Acoustic Localization.
