Temporal patterns and ensemble learning for environmental sound recognition
Thesis posted on 23.05.2021, 11:48 by Wenjun Yang
This thesis explores features that characterize temporal dynamics, together with ensemble techniques, to improve the performance of environmental sound recognition (ESR) systems. First, for acoustic scene classification (ASC), the local binary pattern (LBP) technique is applied to extract the temporal evolution of Mel-frequency cepstral coefficient (MFCC) features, and the D3C ensemble classifier is adopted to optimize system performance. The results show that the proposed method improves classification by 8% over the baseline system. Second, a new approach to sound event detection (SED) using Nonnegative Matrix Factor 2-D Deconvolution (NMF2D) and RUSBoost is presented. The idea is to capture the joint two-dimensional spectral and temporal information in the time-frequency representation (TFR) while possibly separating the sound mixture into several sources. In addition, the RUSBoost ensemble technique is used in the event detection process to alleviate class imbalance in the training data. This method reduces the total error rate by 5% compared to the baseline method.
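To make the first idea concrete, the following is a minimal sketch of how a one-dimensional LBP could be applied along the time axis of an MFCC matrix. The abstract does not state which LBP variant, neighbourhood size, or pooling the thesis uses, so the function name `temporal_lbp_histograms`, the `radius` parameter, and the histogram pooling are all assumptions made for illustration, not the author's implementation. The input is taken to be a precomputed array of shape (coefficients, frames).

```python
import numpy as np


def temporal_lbp_histograms(features, radius=4):
    """Encode the temporal evolution of each coefficient with a 1-D LBP.

    features: array of shape (n_coeffs, n_frames), e.g. precomputed MFCCs.
    radius:   number of frames compared on each side of the centre frame
              (hypothetical default, not taken from the thesis).
    Returns an array of shape (n_coeffs, 2 ** (2 * radius)) whose rows are
    normalized histograms of the LBP codes for each coefficient.
    """
    features = np.asarray(features, dtype=float)
    n_coeffs, n_frames = features.shape
    if n_frames < 2 * radius + 1:
        raise ValueError("need at least 2 * radius + 1 frames")

    n_bits = 2 * radius
    # Centre frames that have a full neighbourhood on both sides.
    centre = features[:, radius:n_frames - radius]
    codes = np.zeros(centre.shape, dtype=np.int64)

    # One bit per neighbour: set when the neighbour is >= the centre value.
    offsets = [o for o in range(-radius, radius + 1) if o != 0]
    for bit, off in enumerate(offsets):
        neighbour = features[:, radius + off:n_frames - radius + off]
        codes |= (neighbour >= centre).astype(np.int64) << bit

    # Pool the codes of each coefficient into a fixed-length histogram.
    hists = np.stack(
        [np.bincount(row, minlength=2 ** n_bits) for row in codes]
    )
    return hists / hists.sum(axis=1, keepdims=True)
```

The resulting histograms give a fixed-length description of each recording regardless of its duration, which could then be passed to any classifier or ensemble. The D3C classifier named in the abstract is not reproduced here.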