Cross-scenario Multi-modal Knowledge Fusion and Knowledge Recommendation Based on an MDR-DKD Model
DOI: https://doi.org/10.13052/jwe1540-9589.2523

Keywords: Knowledge distillation, cross-scenario multi-modal, feature extraction, knowledge fusion, knowledge recommendation

Abstract
With the widespread application of recommendation systems in e-commerce, education, and other fields, the heterogeneity of cross-scenario data and the insufficient integration of multi-modal information such as text, images, and user behavior have become increasingly prominent. To achieve cross-scenario multi-modal knowledge fusion and knowledge recommendation, a meta doubly robust debiasing knowledge distillation (MDR-DKD) model is proposed. Through a meta-learning mechanism, the model efficiently extracts universal cross-scenario features from a small amount of unbiased data and is further optimized with knowledge distillation techniques. A knowledge recommendation module then delivers targeted recommendations by computing the matching degree between user interests and knowledge nodes. Experimental results show that multi-modal feature extraction takes 18.61 ms on average, parameter utilization during feature extraction reaches 91.3%, feature extraction throughput reaches 2460 samples/s, and knowledge recommendation accuracy is 97.84%. The model can therefore extract cross-scenario multi-modal features effectively and support accurate knowledge recommendation. This research provides a practical technical path for cross-domain knowledge recommendation, supporting the deployment of recommendation systems in multi-scenario, multi-modal settings and improving users' personalized recommendation experience.
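The two quantitative ingredients named in the abstract, doubly robust debiasing and interest-to-node matching, can be made concrete with a short sketch. The snippet below is a minimal illustration rather than the paper's implementation: it assumes the textbook doubly robust error estimator (imputed error plus a propensity-corrected residual on observed entries) and cosine similarity as the matching degree; all function names and the toy data are hypothetical.

```python
# Illustrative sketch (assumed formulation, not the paper's MDR-DKD code):
# (1) a doubly robust (DR) loss estimate that debiases observed feedback
#     via imputed errors and estimated propensities, and
# (2) a matching degree between a user-interest vector and knowledge-node
#     embeddings for top-k recommendation.
import numpy as np

def doubly_robust_loss(pred_error, imputed_error, observed, propensity):
    """Textbook DR estimator over the full user-item error matrix:
    E_DR = mean( e_hat + o * (e - e_hat) / p_hat ),
    where pred_error is only trusted where observed == 1."""
    correction = observed * (pred_error - imputed_error) / np.clip(propensity, 1e-6, 1.0)
    return float(np.mean(imputed_error + correction))

def match_top_k(user_interest, node_embeddings, k=5):
    """Rank knowledge nodes by cosine similarity (the matching degree)."""
    u = user_interest / np.linalg.norm(user_interest)
    n = node_embeddings / np.linalg.norm(node_embeddings, axis=1, keepdims=True)
    scores = n @ u
    top = np.argsort(-scores)[:k]
    return top, scores[top]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 4-user x 6-item matrices: true errors, imputed errors,
    # observation indicators, and propensity scores.
    e, e_hat = rng.random((4, 6)), rng.random((4, 6))
    o = rng.integers(0, 2, (4, 6)).astype(float)
    p = rng.uniform(0.2, 0.9, (4, 6))
    print("DR loss estimate:", doubly_robust_loss(e, e_hat, o, p))
    nodes, scores = match_top_k(rng.random(8), rng.random((20, 8)), k=3)
    print("top-3 knowledge nodes:", nodes, scores)
```

In this reading, the meta-learning step of the abstract would fit the imputation and propensity models on the small unbiased set, while the distilled student is trained against the DR-corrected loss; the sketch only fixes the estimator and the matching rule, which are the standard choices.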