
Yu, G., Li, Q., Wang, J., Zhang, D., & Liu, Y. (2020). A multimodal generative and fusion framework for recognizing faculty homepages. Information Sciences, 525, 205-220.

Multimodal data consist of several data modes, where each mode is a group of similar data sharing the same attributes. Recognizing faculty homepages is essentially a multimodal classification problem in which a target faculty homepage is determined from three information sources: text, images, and layout. Conventional strategies in previous studies have been either to concatenate the features from the various information sources into a compound vector or to feed them separately into several classifiers that are then ensembled into a stronger classifier for the final prediction. However, both approaches ignore the connections among the different feature sets, and we argue that these relations are essential for effective multimodal classification. In addition, recognizing faculty homepages is a class imbalance problem in which the number of samples in the minority class is far smaller than that in the other classes. In this study, we propose a multimodal generative and fusion framework for multimodal learning under imbalanced data and mutually dependent feature modes. Specifically, a multimodal generative adversarial network is first introduced to rebalance the dataset by generating pseudo features for each mode and combining them to describe a fake sample. Then, a gated fusion network is presented whose gate mechanism reduces noise to improve generalization and whose fusion mechanism captures the links among the different feature modes. Experiments on a faculty homepage dataset show the superiority of the proposed framework.
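
The rebalancing step described in the abstract, generating per-mode pseudo features that jointly describe one fake minority-class sample, can be pictured with a minimal PyTorch sketch. The generator below is an assumption for illustration: the class name MultimodalGenerator, the head sizes, and the 100-dimensional noise vector are not from the paper, and the adversarial discriminator and training loop are omitted. The point it shows is that all mode heads decode the same noise vector, so the text, image, and layout features of a synthetic sample stay mutually consistent.

```python
import torch
import torch.nn as nn

class MultimodalGenerator(nn.Module):
    """Sketch: one generator head per mode, all conditioned on a shared
    noise vector, so the heads' outputs together describe one fake sample.
    Layer sizes and head layout are illustrative assumptions."""

    def __init__(self, noise_dim, mode_dims):
        super().__init__()
        # One small decoder head per mode (mode_dims: feature size per mode).
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(),
                          nn.Linear(256, d))
            for d in mode_dims)

    def forward(self, z):
        # Same z feeds every head, tying the modes of a fake sample together.
        return [head(z) for head in self.heads]

# Oversampling the minority class: draw noise, synthesize per-mode features
# (hypothetical sizes: 300-d text, 512-d image, 64-d layout).
gen = MultimodalGenerator(noise_dim=100, mode_dims=[300, 512, 64])
fake_text, fake_image, fake_layout = gen(torch.randn(16, 100))
```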
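The gated fusion idea, a gate that suppresses noisy feature dimensions and a fusion step that lets the modes interact, can likewise be sketched under stated assumptions. The sigmoid-gate/tanh-projection form, the shared 128-dimensional hidden size, and the summation fusion below are common design choices, not necessarily the paper's exact architecture; note that each mode's gate is computed from the concatenation of all modes, which is one way to make the gating capture cross-mode links.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Sketch: per-mode sigmoid gates computed from the joint feature vector
    suppress noisy dimensions, and the gated projections are summed so the
    classifier sees cross-mode interactions. Gating form is an assumption."""

    def __init__(self, dims, hidden=128):
        super().__init__()
        # One projection and one gate per mode (dims: feature size per mode).
        self.proj = nn.ModuleList(nn.Linear(d, hidden) for d in dims)
        self.gate = nn.ModuleList(nn.Linear(sum(dims), hidden) for _ in dims)

    def forward(self, feats):
        # feats: list of per-mode tensors, each of shape (batch, dims[i]).
        joint = torch.cat(feats, dim=-1)  # shared context for every gate
        fused = [torch.sigmoid(g(joint)) * torch.tanh(p(f))
                 for f, p, g in zip(feats, self.proj, self.gate)]
        return torch.stack(fused, dim=0).sum(dim=0)  # fuse the gated modes

# Usage with the same hypothetical mode sizes as above.
fusion = GatedFusion([300, 512, 64])
out = fusion([torch.randn(8, 300), torch.randn(8, 512), torch.randn(8, 64)])
print(out.shape)  # torch.Size([8, 128])
```

Computing each gate from the concatenated modes rather than from its own mode alone is the design choice that ties noise reduction to cross-mode dependence: a dimension in one mode can be down-weighted because the other modes already carry that signal.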