A Generalized Model for Multimodal Perception
We develop a graphical model for fusing object recognition results from two different modalities: computer vision and verbal descriptions. In this paper, we specifically focus on three types of verbal descriptions, namely egocentric positions, positions relative to a landmark, and numeric constraints.
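The general idea of fusing a vision score with verbal constraints can be illustrated with a minimal sketch. It is not the paper's graphical model. It assumes a naive product-of-factors fusion, in which each verbal description contributes an independent soft likelihood that multiplies the detector's score before normalization. The `Candidate` class, the factor functions, the 0.9/0.1 likelihood values, and the reading of a numeric constraint as "the k-th object from the left" are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    """One detected object (hypothetical representation, not from the paper)."""
    label: str
    x: float             # horizontal position: 0.0 = left edge, 1.0 = right edge
    vision_score: float  # detector confidence that this object matches the query


def egocentric_factor(c: Candidate, side: str) -> float:
    # Assumed soft likelihood for "on the left" / "on the right" of the viewer.
    return 1.0 - c.x if side == "left" else c.x


def relative_factor(c: Candidate, landmark: Candidate, relation: str) -> float:
    # Assumed soft likelihood for "left of" / "right of" a landmark object.
    is_left = c.x < landmark.x
    satisfied = is_left if relation == "left_of" else not is_left
    return 0.9 if satisfied else 0.1


def numeric_factor(c: Candidate, candidates: list[Candidate], k: int) -> float:
    # Assumed reading of a numeric constraint: "the k-th one from the left" (1-based).
    ordered = sorted(candidates, key=lambda o: o.x)
    rank = next(i for i, o in enumerate(ordered, start=1) if o is c)
    return 0.9 if rank == k else 0.1


def fuse(candidates: list[Candidate],
         factors: list[Callable[[Candidate], float]]) -> list[float]:
    # Naive fusion (an assumption): multiply the vision score by every verbal
    # factor, then normalize the products into a distribution over candidates.
    scores = []
    for c in candidates:
        s = c.vision_score
        for f in factors:
            s *= f(c)
        scores.append(s)
    total = sum(scores)
    return [s / total for s in scores]


# Usage: "the cup on the left, to the left of the plate"
cups = [Candidate("cup", 0.2, 0.6), Candidate("cup", 0.5, 0.7), Candidate("cup", 0.8, 0.5)]
plate = Candidate("plate", 0.6, 0.95)
posterior = fuse(cups, [
    lambda c: egocentric_factor(c, "left"),
    lambda c: relative_factor(c, plate, "left_of"),
])
print([round(p, 3) for p in posterior])
```

Here the middle cup has the highest vision score, but the two verbal factors shift most of the probability mass to the leftmost cup. A product of independent factors is the simplest fusion rule; a full graphical model could instead capture dependencies between the descriptions.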