A discriminative analysis framework for multi-modal information fusion
Thesis posted on 24.05.2021, 09:49 by Lei Gao
Multi-modal data contain rich information about the semantics present in sensory and media data, so valid interpretation and integration of multi-modal information is recognized as a central issue for the successful use of multimedia in a wide range of applications. Multi-modal information analysis is therefore becoming an increasingly important research topic in the multimedia community. However, effective integration of multi-modal information is a difficult problem, with major challenges in identifying and extracting complementary and discriminatory features and in fusing information from multiple channels effectively. To address these challenges, this thesis proposes a discriminative analysis framework (DAF) for high-performance multi-modal information fusion. The proposed framework has two realizations. We first introduce Discriminative Multiple Canonical Correlation Analysis (DMCCA) as the fusion component of the framework. DMCCA is capable of extracting more discriminative characteristics from multi-modal information. We demonstrate that the optimal performance of DMCCA can be verified analytically and graphically, and that Canonical Correlation Analysis (CCA), Multiple Canonical Correlation Analysis (MCCA) and Discriminative Canonical Correlation Analysis (DCCA) are special cases of DMCCA, thus establishing a unified framework for canonical correlation analysis. To further enhance the performance of discriminative analysis in multi-modal information fusion, Kernel Entropy Component Analysis (KECA) is brought in to analyze the projected vectors in DMCCA space, forming the second realization of the framework. By doing so, not only is the discriminative relation considered in DMCCA space, but the inherent complementary representation of the input data is also revealed by entropy estimation, leading to better utilization of the multi-modal information and better pattern recognition performance.
Finally, we implement a prototype of the proposed DAF to demonstrate its performance in handwritten digit recognition, face recognition and human emotion recognition. Extensive experiments show that the proposed framework outperforms existing methods based on similar principles, clearly demonstrating the generic nature of the framework. Furthermore, this work offers a promising direction for designing advanced multi-modal information fusion systems, with great potential to impact the development of intelligent human-computer interaction systems.
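For readers unfamiliar with the canonical correlation terminology used in the abstract, the following is a minimal sketch of plain two-view CCA, which the abstract names as a special case of DMCCA. It is not the thesis's DMCCA or KECA implementation: the abstract does not give those formulations, and this sketch uses no class labels. The function name `cca`, the regularization constant `reg`, the synthetic data and the fusion by concatenating the projected features are all assumptions made for illustration.

```python
import numpy as np


def cca(X, Y, n_components, reg=1e-6):
    """Plain two-view CCA (illustrative; not the thesis's DMCCA).

    X: (n_samples, p) features from modality 1
    Y: (n_samples, q) features from modality 2
    Returns projection matrices Wx (p, k), Wy (q, k) and the
    k largest canonical correlations.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Sample covariance matrices; `reg` is an assumed small ridge term
    # that keeps the within-view covariances invertible.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)

    # Singular values of the whitened cross-covariance are the
    # canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :n_components]
    Wy = Ky @ Vt.T[:, :n_components]
    return Wx, Wy, s[:n_components]


# Synthetic two-modality data sharing one latent factor (hypothetical).
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = z @ np.array([[1.0, -0.5, 0.3, 0.8]]) + 0.1 * rng.normal(size=(500, 4))
Y = z @ np.array([[0.7, 1.2, -0.4]]) + 0.1 * rng.normal(size=(500, 3))

Wx, Wy, corrs = cca(X, Y, n_components=2)

# Assumed fusion strategy: concatenate the projected features of both views.
fused = np.hstack([(X - X.mean(axis=0)) @ Wx, (Y - Y.mean(axis=0)) @ Wy])
```

The fused matrix could then be passed to any classifier. A discriminative variant such as the DMCCA described above would additionally use class labels when forming the correlation matrices, which this sketch does not attempt.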