Pattern Recognition and Artificial Intelligence
Pattern Recognition and Artificial Intelligence, 2024, Vol. 37, Issue 9: 839-849. DOI: 10.16451/j.cnki.issn1003-6059.202409007
Research and Applications
Semantic Topological Maps-Based Reasoning for Vision-and-Language Navigation in Continuous Environments
XIE Zilong1, XU Ming1
1. Software College, Liaoning Technical University, Huludao 125105

Abstract: To address the inadequate reasoning ability of existing vision-and-language navigation methods in continuous environments, a reasoning model for vision-and-language navigation based on semantic topological maps is proposed. First, regions and objects in the navigation environment are identified through scene-understanding auxiliary tasks, and a spatial proximity knowledge base is constructed. Then, while navigating, the agent interacts with the environment in real time, collecting location information, encoding visual features, and predicting semantic labels for regions and objects, so that a semantic topological map is generated incrementally. On this basis, an auxiliary reasoning localization strategy is proposed: a self-attention mechanism extracts object and region information from the navigation instruction, and this information is combined with the spatial proximity knowledge base and the semantic topological map to infer and localize objects and regions. The localization results assist navigation decisions and keep the agent's trajectory aligned with the instruction. Experiments on the public datasets R2R-CE and RxR-CE show that the proposed model achieves a relatively high navigation success rate.
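The interplay between the spatial proximity knowledge base and the semantic topological map can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Node` class, the co-occurrence-based knowledge base, and the additive scoring heuristic are all assumptions made for illustration, and the target object and region are assumed to have been extracted from the instruction already (the paper uses a self-attention mechanism for that step, which is omitted here).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """One waypoint in a hypothetical semantic topological map."""
    node_id: int
    position: tuple                  # (x, y) location collected during navigation
    region: str                      # predicted region label, e.g. "kitchen"
    objects: set = field(default_factory=set)  # predicted object labels


def build_proximity_kb(scenes):
    """Estimate P(region | object) from (region, objects) observations.

    This co-occurrence count is an assumed stand-in for the paper's
    spatial proximity knowledge base.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for region, objects in scenes:
        for obj in objects:
            counts[obj][region] += 1
    kb = {}
    for obj, per_region in counts.items():
        total = sum(per_region.values())
        kb[obj] = {region: n / total for region, n in per_region.items()}
    return kb


def score_node(node, target_object, target_region, kb):
    """Illustrative heuristic: direct matches plus a knowledge-base prior."""
    score = 0.0
    if target_object in node.objects:
        score += 1.0
    if target_region is not None and node.region == target_region:
        score += 1.0
    # Prior that the target object is found in this node's region.
    score += kb.get(target_object, {}).get(node.region, 0.0)
    return score


def localize(nodes, target_object, target_region, kb):
    """Return the node that best matches the instruction's object and region."""
    return max(nodes, key=lambda n: score_node(n, target_object, target_region, kb))


if __name__ == "__main__":
    scenes = [
        ("kitchen", {"sink", "oven"}),
        ("kitchen", {"sink"}),
        ("bathroom", {"sink", "toilet"}),
    ]
    kb = build_proximity_kb(scenes)
    nodes = [
        Node(0, (0.0, 0.0), "hallway"),
        Node(1, (2.0, 1.0), "kitchen"),
        Node(2, (4.0, 0.5), "bathroom", {"toilet"}),
    ]
    # The oven has not been seen yet, so the prior points to the kitchen node.
    print(localize(nodes, "oven", None, kb).node_id)
```

In this toy setup the knowledge base lets the agent prefer the kitchen node for an unseen oven, which is the kind of inference the abstract attributes to combining prior spatial knowledge with the incrementally built map.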
Keywords: Vision-and-Language Navigation; Visual Reasoning; Multi-modal Data; Embodied Intelligence
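Received: 2024-05-09
CLC number: TP391.41
Funding: Supported by the Doctoral Research Fund of Liaoning Technical University (No. 21-1027)
Corresponding author: XU Ming, Ph.D., associate professor. Main research interests: spatio-temporal data mining, deep learning, and intelligent transportation. E-mail: xum.2016@tsinghua.org.cn.
About the author: XIE Zilong, master's student. Main research interests: embodied intelligence and robot navigation. E-mail: zilong6037@gmail.com.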
Cite this article:
XIE Zilong, XU Ming. Semantic Topological Maps-Based Reasoning for Vision-and-Language Navigation in Continuous Environments[J]. Pattern Recognition and Artificial Intelligence, 2024, 37(9): 839-849.
Link to this article:
http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202409007 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2024/V37/I9/839
Copyright © Editorial Office of Pattern Recognition and Artificial Intelligence
Address: 350 Shushanhu Road, Hefei, Anhui Province. Tel: 0551-65591176. Fax: 0551-65591176. Email: bjb@iim.ac.cn