Saturday, July 27, 2024

Open-Source Model For Robot Manipulation


Researchers from UC Berkeley, Stanford, and Carnegie Mellon have launched Octo, an open-source generalist model for robotic manipulation.

These are the robots we tested Octo on – you can see that there is a wide range of different robot arms, from small to large, single arm to bimanual. Octo was able to control all these robots. Credit: Team et al.

The release of ChatGPT and similar large language models (LLMs) has allowed developers worldwide to make their systems more interactive. However, comparable models for robotic manipulation remain scarce.

Researchers from the University of California, Berkeley, Stanford University, and Carnegie Mellon University have introduced Octo, an open-source generalist model for robotic manipulation. Described in a preprint on the arXiv server, the model could enable various robotic systems to handle diverse objects effectively, potentially paving the way for robots capable of performing a broad spectrum of manual tasks.


Octo is characterized as a “generalist” robot model by its creators. It is a type of neural network that can control various robots to execute tasks like ‘pick up the spoon,’ ‘close the drawer,’ and ‘wipe the table.’ They highlight the significance of Octo’s ability to function across multiple robotic platforms. This versatility is essential, they explain, because research labs worldwide employ different types of robots. Ensuring that Octo can be used universally is vital for its adoption by the global research community.

The recent project undertaken by the research team had two primary goals. The first was to develop a versatile generalist robotics model suitable for a range of robots, and the second was to produce open-source code to enable other researchers to develop similar models in the future.

In the technology research and development community, high-performance computational tools that can be used across various systems are commonly known as foundation models. ChatGPT, for instance, provides natural language processing (NLP) capabilities to various agents and systems.

Octo, created by the research team, uses transformers, the same neural network architecture behind ChatGPT. Its standout features include its training on the Open X-Embodiment dataset, the most extensive collection of robotic manipulation trajectories assembled to date, and its ability to process a wide array of inputs. These inputs range from various image types and robot joint readings to language instructions and goal images that specify the task, which enhances its versatility.
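To illustrate the idea of feeding mixed inputs to a single transformer, here is a minimal sketch in Python. It is not Octo's actual code or API: the `Token` class, the `build_token_sequence` function, and the word-level splitting of the instruction are all hypothetical, chosen only to show how a language instruction or goal image, camera images, and joint readings could be flattened into one ordered sequence.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence


@dataclass
class Token:
    """Hypothetical container for one element of the model's input sequence."""
    kind: str     # "task" or "observation"
    source: str   # "language", "goal_image", "camera", or "joints"
    payload: Any  # raw content; a real model would use a learned embedding


def build_token_sequence(
    camera_images: Sequence[Any],
    joint_readings: Optional[Sequence[float]] = None,
    instruction: Optional[str] = None,
    goal_image: Optional[Any] = None,
) -> List[Token]:
    """Flatten mixed inputs into one sequence: task tokens, then observations."""
    if instruction is None and goal_image is None:
        raise ValueError("provide a language instruction or a goal image")

    tokens: List[Token] = []

    # Task specification: one token per word (an assumption made for
    # simplicity; real systems use a learned text tokenizer and encoder).
    if instruction is not None:
        tokens.extend(Token("task", "language", w) for w in instruction.split())
    if goal_image is not None:
        tokens.append(Token("task", "goal_image", goal_image))

    # Observations: one token per camera image (real models typically split
    # each image into many patches), then one token for the joint readings.
    for image in camera_images:
        tokens.append(Token("observation", "camera", image))
    if joint_readings is not None:
        tokens.append(Token("observation", "joints", list(joint_readings)))

    return tokens


sequence = build_token_sequence(
    camera_images=["wrist_camera_frame", "overhead_camera_frame"],
    joint_readings=[0.1, -0.4, 1.2],
    instruction="pick up the spoon",
)
print([token.source for token in sequence])
```

Because every input becomes a token in the same sequence, a robot with fewer cameras or no joint sensors simply contributes fewer tokens, which is one plausible way a single model can serve different robots.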

Reference: Dibya Ghosh et al, Octo: An Open-Source Generalist Robot Policy, arXiv (2024). DOI: 10.48550/arxiv.2405.12213

Nidhi Agarwal
Nidhi Agarwal is a journalist at EFY. She is an Electronics and Communication Engineer with over five years of academic experience. Her expertise lies in working with development boards and IoT cloud. She enjoys writing as it enables her to share her knowledge and insights related to electronics, with like-minded techies.
