Can Large Language Models Help Robots To Navigate?

MIT and MIT-IBM Watson AI Lab researchers have created a navigation method that converts visual inputs into text to guide robots through tasks using a language model.

A new navigation method uses language-based inputs to direct a robot through a multistep navigation task like doing laundry.
Credit: iStock

Someday, you might want a home robot to carry laundry to the basement, a task that requires it to combine verbal instructions with visual cues. This is hard for AI agents: current systems rely on multiple complex machine-learning models and on extensive visual training data, which is difficult to obtain.

Researchers from MIT and the MIT-IBM Watson AI Lab have developed a navigation method that translates visual inputs into text descriptions. A large language model then processes these descriptions to guide a robot through multistep tasks. Because the approach works with text captions instead of computationally intensive visual representations, a language model can be used to generate extensive synthetic training data efficiently.

Solving a vision problem with language

The method uses a simple captioning model to translate the robot's visual observations into text descriptions. These descriptions, together with the verbal instructions, are fed into a large language model, which decides the robot's next step. After each step, the model generates a scene caption that is used to update the robot's trajectory, continually guiding it toward its goal. To streamline decision-making, the information is standardized in templates that present it to the model as a series of choices based on the robot's surroundings, such as moving toward a door or an office.
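
The loop described above can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the prompt template, the function names (`build_prompt`, `choose_step`, `navigate`), the toy scenes, and the stub captioner and language model are all hypothetical stand-ins, chosen only to show how captions, instructions, and a list of choices might be combined into text and turned into a next step.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for the robot's surroundings: each entry pairs a caption
# that a captioning model might produce with the moves available at that point.
SCENES: List[Tuple[str, List[str]]] = [
    ("A hallway with a door on the left and an office ahead.",
     ["move toward the door", "move toward the office"]),
    ("The top of a staircase leading down.",
     ["go down the stairs", "turn back"]),
    ("A basement with a washing machine.",
     ["stop", "go back up the stairs"]),
]


def toy_observe(step: int) -> Tuple[str, List[str]]:
    """Stub captioner (assumption): returns a canned caption and its options."""
    return SCENES[min(step, len(SCENES) - 1)]


def toy_llm(prompt: str) -> str:
    """Stub language model (assumption): always answers with option 0."""
    return "0"


def build_prompt(instruction: str, history: List[str],
                 caption: str, options: List[str]) -> str:
    """Fill a fixed text template with the instruction, past steps, and choices."""
    lines = [
        f"Instruction: {instruction}",
        "Trajectory so far: " + (" -> ".join(history) if history else "(start)"),
        f"Current observation: {caption}",
        "Options:",
    ]
    lines += [f"  {i}. {opt}" for i, opt in enumerate(options)]
    lines.append("Answer with the number of the best option.")
    return "\n".join(lines)


def choose_step(llm: Callable[[str], str], prompt: str, options: List[str]) -> str:
    """Ask the model for an option number; fall back to option 0 if unparsable."""
    reply = llm(prompt).strip()
    try:
        idx = int(reply.split()[0].rstrip("."))
    except (ValueError, IndexError):
        idx = 0
    if not 0 <= idx < len(options):
        idx = 0
    return options[idx]


def navigate(instruction: str,
             observe: Callable[[int], Tuple[str, List[str]]],
             llm: Callable[[str], str],
             max_steps: int = 10) -> List[str]:
    """Caption the scene, prompt the model, record the chosen step, and repeat."""
    history: List[str] = []
    for step in range(max_steps):
        caption, options = observe(step)
        choice = choose_step(llm, build_prompt(instruction, history, caption, options), options)
        history.append(choice)  # the text history is the robot's record of its path
        if choice == "stop":
            break
    return history


if __name__ == "__main__":
    print(navigate("Carry the laundry to the basement.", toy_observe, toy_llm))
```

In a real system the two stubs would be replaced by an actual captioning model and an actual language model; the sketch only shows how everything the model sees can be kept as plain text.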

Advantages of language

In tests, the language-based approach did not outperform vision-based methods, but it offered distinct advantages. It uses fewer resources, which allows rapid synthetic data generation: for instance, 10,000 synthetic trajectories were created from only 10 real-world ones. Because it relies on natural language, the system is easier for humans to understand, and because it uses a single type of input, it can be applied across different tasks. However, it loses some information that vision-based models capture, such as depth. Surprisingly, combining the language-based approach with vision-based methods improves navigation performance.

Researchers aim to enhance their method by developing a navigation-focused captioner and exploring how large language models can demonstrate spatial awareness to improve navigation.

Nidhi Agarwal
Nidhi Agarwal is a Senior Technology Journalist at Electronics For You, specialising in embedded systems, development boards, and IoT cloud solutions. With a Master’s degree in Signal Processing, she combines strong technical knowledge with hands-on industry experience to deliver clear, insightful, and application-focused content. Nidhi began her career in engineering roles, working as a Product Engineer at Makerdemy, where she gained practical exposure to IoT systems, development platforms, and real-world implementation challenges. She has also worked as an IoT intern and robotics developer, building a solid foundation in hardware-software integration and emerging technologies. Before transitioning fully into technology journalism, she spent several years in academia as an Assistant Professor and Lecturer, teaching electronics and related subjects. This background reflects in her writing, which is structured, easy to understand, and highly educational for both students and professionals. At Electronics For You, Nidhi covers a wide range of topics including embedded development, cloud-connected devices, and next-generation electronics platforms. Her work focuses on simplifying complex technologies while maintaining technical accuracy, helping engineers, developers, and learners stay updated in a rapidly evolving ecosystem.
