Semantic Topological Maps-Based Reasoning for Vision-and-Language Navigation in Continuous Environments

doi:10.16451/j.cnki.issn1003-6059.202409007

Abstract Existing vision-and-language navigation methods show inadequate reasoning ability in continuous environments. To address this, a reasoning method based on semantic topological maps is proposed for vision-and-language navigation in continuous environments. First, regions and objects in the navigation environment are identified through scene understanding auxiliary tasks, and a spatial proximity knowledge base is constructed. Second, the agent interacts with the environment in real time during navigation, collecting location information, encoding visual features, and predicting semantic labels of regions and objects, thereby gradually generating a semantic topological map. On this basis, an auxiliary reasoning localization strategy is designed: a self-attention mechanism extracts object and region information from the navigation instructions, and the spatial proximity knowledge base is combined with the semantic topological map to infer and localize those objects and regions. This strategy assists navigation decisions and keeps the agent's trajectory aligned with the instructions. Experimental results on the public datasets R2R-CE and RxR-CE demonstrate that the proposed method achieves a higher navigation success rate.
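
The abstract gives no implementation details, so the following is only a minimal illustrative sketch of the general idea (a topological map whose nodes carry region and object labels, queried through a proximity knowledge base), not the authors' implementation. All names (`Node`, `SemanticTopoMap`, `PROXIMITY`, `localize`), the merge radius, and the hand-written proximity scores are hypothetical. The learned label prediction and the self-attention extraction of targets from instructions are replaced here by labels and a target string supplied directly.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class Node:
    """One location in the map (hypothetical structure)."""
    position: tuple                      # (x, y) coordinates of the location
    region: str                          # predicted region label, e.g. "kitchen"
    objects: set = field(default_factory=set)  # predicted object labels


class SemanticTopoMap:
    """Toy semantic topological map grown incrementally during navigation."""

    def __init__(self, merge_radius=0.5):
        self.nodes = []
        self.edges = set()
        self.merge_radius = merge_radius  # assumed threshold for reusing a node

    def add_observation(self, position, region, objects, prev=None):
        """Merge into a nearby node or create a new one, then link it to `prev`."""
        for i, node in enumerate(self.nodes):
            if dist(node.position, position) <= self.merge_radius:
                node.objects |= set(objects)
                idx = i
                break
        else:
            self.nodes.append(Node(tuple(position), region, set(objects)))
            idx = len(self.nodes) - 1
        if prev is not None and prev != idx:
            self.edges.add((min(prev, idx), max(prev, idx)))
        return idx


# Hand-written stand-in for a spatial proximity knowledge base (assumed values).
PROXIMITY = {
    ("sink", "kitchen"): 0.9,
    ("sink", "bathroom"): 0.8,
    ("sink", "stove"): 0.7,
    ("sofa", "living room"): 0.9,
}


def proximity(a, b):
    """Symmetric lookup; identical labels score 1.0, unknown pairs 0.0."""
    if a == b:
        return 1.0
    return PROXIMITY.get((a, b), PROXIMITY.get((b, a), 0.0))


def localize(topo, target):
    """Return the index of the node whose labels are most proximate to `target`."""
    def score(node):
        return max(proximity(target, label) for label in node.objects | {node.region})
    return max(range(len(topo.nodes)), key=lambda i: score(topo.nodes[i]))


topo = SemanticTopoMap()
hall = topo.add_observation((0.0, 0.0), "hallway", {"door"})
kitchen = topo.add_observation((3.0, 0.0), "kitchen", {"stove"}, prev=hall)
living = topo.add_observation((3.0, 4.0), "living room", {"sofa"}, prev=kitchen)
goal = localize(topo, "sink")  # the sink was never seen; the kitchen is inferred
```

In this toy run the target "sink" has not been observed, yet the lookup still points at the kitchen node because the knowledge base associates sinks with kitchens and stoves. That is the kind of inference the abstract attributes to combining the knowledge base with the map.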

Key words: Vision-and-Language Navigation; Visual Reasoning; Multi-modal Data; Embodied Intelligence

Received: 09 May 2024

CLC number: TP391.41

Fund: Doctoral Scientific Research Foundation of Liaoning Technical University (No. 21-1027)

Corresponding author: XU Ming, Ph.D., associate professor. His research interests include spatiotemporal data mining, deep learning and intelligent transportation.

About the author: XIE Zilong, master's student. His research interests include embodied intelligence and robot navigation.

Cite this article:

XIE Zilong, XU Ming. Semantic Topological Maps-Based Reasoning for Vision-and-Language Navigation in Continuous Environments[J]. Pattern Recognition and Artificial Intelligence, 2024, 37(9): 839-849.

URL:

http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.202409007 OR http://manu46.magtech.com.cn/Jweb_prai/EN/Y2024/V37/I9/839