MULTIMODAL INFORMATION PROCESSING AND BEHAVIOUR GENERATION OF HUMANOID ROBOTS BASED ON PALM 2 MODEL AND MULTIMODAL TRANSFORMER ARCHITECTURE

doi:10.2316/J.2025.206-1233

MULTIMODAL INFORMATION PROCESSING AND BEHAVIOUR GENERATION OF HUMANOID ROBOTS BASED ON PALM 2 MODEL AND MULTIMODAL TRANSFORMER ARCHITECTURE

Jie Fang,∗ Wenliang Ju,∗∗ Feng Xiao,∗ Rubing Huang,∗ and Yanjun Liu∗

References

[1] T. Wang, T. Jin, W. Lin, Y. Lin, H. Liu, T. Yue, Y. Tian,L. Li, Q. Zhang, and C. Lee, Multimodal sensors enabledautonomous soft robotic 18(14), 2024, 9980-9996.
[2] A. Roychoudhury, S. Khorshidi, S. Agrawal, and M. Bennewitz,Perception for humanoid robots. Current Robotics Reports,4(4), 2023, 127–140.
[3] Z. Zhao, Q. Wu, J. Wang, B. Zhang, C. Zhong, and A.A.Zhilenkov, Exploring embodied intelligence in soft robotics: Areview, Biomimetics, 9(4), 2024, 248–264.
[4] Z. Ding, G. Chen, Z. Wang, and L. Sun, Robot graspingand manipulation combining vision and touch, InternationalJournal of Robotics and Automation, 39(3), 2024, 181–194.
[5] M. Andronie, G. L˘az˘aroiu, O.L. Karabolevski, R. S.tef˘anescu,I. Hurloiu, A. Dijm˘arescu, and I. Dijm˘arescu, Remote big datamanagement tools, sensing and computing technologies, andvisual perception and environment mapping algorithms in theInternet of Robotic Things, Electronics, 12(1), 2022, 22–53.
[6] S. Davarzani and M.T. Ejaz, A 2D path-planning performancecomparison of RRT and RRT for unmanned ground vehicle,IAES International Journal of Robotics and Automation, 13(1),2024, 105–112.
[7] S.A. Li, Y.Y. Liu, Y.C. Chen, H.M. Feng, P.K. Shen, andY.C. Wu, Voice interaction recognition design in real-lifescenario mobile robot applications, Applied Sciences, 13(5),2023, 3359–3379.
[8] D.T. Tran, D.H. Truong, H.S. Le, and J.H. Huh, Mobile robot:Automatic speech recognition application for automation andSTEM education, Soft Computing 27(15), 2023, 10789–10805.
[9] T.V. Nguyen, M.H. Do, and J. Jo, Robust-adaptive-behaviorstrategy for human-following robots in unknown environmentsbased on fuzzy inference mechanism, Industrial Robot: TheInternational Journal of Robotics Research and Application,49(6), 2022, 1089–1100.
[10] H.Y. Zhou, Y. Yu, C. Wang, S. Zhang, Y. Gao, J. Pan,J. Shao, G. Lu, K. Zhang, and W. Li, A Transformer-based representation-learning model with uniﬁed processing ofmultimodal input for clinical diagnostics, Nature BiomedicalEngineering, 7(6), 2023, 743–755.
[11] R. Koshy, and S. Elango, Multimodal tweet classiﬁcation indisaster response systems using transformer-based bidirectionalattention model, Neural Computing and Applications, 35(2),2023, 1607–1627.
[12] J. Qian, Z. Jin, Q. Zhang, G. Cai, and B. Liu, A livercancer question-answering system based on next-generationintelligence and the large model med-PaLM 2, InternationalJournal of Computer Science and Information Technology,2(1), 2024, 28–35.
[13] R. Yang, T.F. Tan, W. Lu, A.J. Thirunavukarasu, D.S.W. Ting,and N. Liu, Large language models in health care: Development,applications, and challenges, Health Care Science, 2(4), 2023,255–263.
[14] P. Xu, X. Zhu, and D.A. Clifton, Multimodal learn-ing with transformers: A survey, IEEE Transactionson Pattern Analysis and Machine Intelligence,45(10), 2023, 12113–12132.
[15] X. Han, Y.T. Wang, J.L. Feng, C. Deng, Z.H. Chen, Y.A.Huang, H. Su, L. Hu, and P.W. Hu, A survey of transformer-based multimodal pre-trained models, Neurocomputing, 515,2023, 89–106.
[16] H. Akbari, L. Yuan, R. Qian, W.H. Chuang, S.F. Chang,Y. Cui, and B. Gong, VATT: Transformers for multimodalself-supervised learning from raw video, audio and text,Advances in Neural Information Processing Systems, 34(1),2021, 24206–24221.
[17] A. Jebelli, A. Najaﬁyanfar, A. Mahabadi, and M. C. Yagoub,Fault-tolerant control of a quadrotor despite the complete rotorfailure, International Journal of Robotics and Automation,39(4), 2024, 258–269.
[18] Z. Niu, G. Zhong, and H. Yu, A Review on the AttentionMechanism of Deep Learning, Neurocomputing, 452(1), 2021,48–62.
[19] C. Xue, M. Yu, G. Yan, M. Qin, Y. Liu, and J. Jia, Amulti-modal fusion framework for continuous sign languagerecognition based on multi-layer self-attention mechanism,Journal of Intelligent & Fuzzy Systems, 43(4), 2022, 4303–4316.
[20] M.B. Rashid, M.S. Rahaman, and P. Rivas, Navigatingthe multimodal landscape: A review on integration of textand image data in machine learning architectures, MachineLearning and Knowledge Extraction, 6(3), 2024, 1545–1563.
[21] A. Gandhi, K. Adhvaryu, S. Poria, E. Cambria, and A.Hussain, Multimodal sentiment analysis: A systematic reviewof history, datasets, multimodal fusion methods, applications,challenges, and future directions, Information Fusion, 91(1),2023, 424–444.
[22] A. Staroverov, A.S. Gorodetsky, A.S. Krishtopik, U.A.Izmesteva, D.A. Yudin, A.K. Kovalev, and A.I. Panov, Fine-tuning multimodal transformer models for generating actionsin virtual and real environments, IEEE Access, 11(1), 2023,130548–130559.
[23] B. Wang, Z. Wang, X. Wang, Y. Cao, R.A. Saurous, andY. Kim, Grammar prompting for domain-speciﬁc languagegeneration with large language models, Advances in NeuralInformation Processing Systems, 36(1), 2024, 1–26.
[24] J. Li, T. Tang, W.X. Zhao, J.Y. Nie, and J.R. Wen, Pre-trained language models for text generation: A survey, ACMComputing Surveys, 56(9), 2024, 1–39.
[25] J. Dagdelen, A. Dunn, S. Lee, N. Walker, A.S. Rosen, G.Ceder, K.A. Persson, and A. Jain, Structured informationextraction from scientiﬁc text with large language models,Nature Communications, 15(1), 2024, 1418–1432.
[26] A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A.Roberts, P. Barham, H.W. Chung, C. Sutton, S. Gehrmann,and P. Schuh, PaLM: Scaling language modeling with pathways,Journal of Machine Learning Research, 24(240), 2023, 1–113.
[27] H.W. Chung, L. Hou, S. Longpre, B. Zoph, Y. Tay, W. Fedus, Y.Li, X. Wang, M. Dehghani, S. Brahma, and A. Webson, Scalinginstruction-ﬁnetuned language models, Journal of MachineLearning Research, 25(70), 2024, 1–53.
[28] F. Ming, F. Gao, K. Liu, and C. Zhao, Cooperative modularreinforcement learning for large discrete action space problem,Neural Networks, 161(1), 2023, 281–296.

Important Links:

Abstract
DOI: 10.2316/J.2025.206-1233
From Journal (206) International Journal of Robotics and Automation - 2025

Go Back