And researchers at the Toyota Research Institute, Columbia University and MIT have been able to quickly teach robots to do many new tasks with the help of an AI learning technique called imitation learning, plus generative AI. They believe they have found a way to extend the technology propelling generative AI from the realm of text, images, and videos into the domain of robot movements.
Many others have taken advantage of generative AI as well. Covariant, a robotics startup that spun off from OpenAI’s now-shuttered robotics research unit, has built a multimodal model called RFM-1. It can accept prompts in the form of text, image, video, robot instructions, or measurements. Generative AI allows the robot to both understand instructions and generate images or videos relating to those tasks.
3. More data allows robots to learn more skills
The power of large AI models such as GPT-4 lies in the reams and reams of data hoovered from the internet. But that doesn’t really work for robots, which need data that has been specifically collected for robots. They need physical demonstrations of how washing machines and fridges are opened, dishes picked up, or laundry folded. Right now that data is very scarce, and it takes a long time for humans to collect.
A new initiative kick-started by Google DeepMind, called the Open X-Embodiment Collaboration, aims to change that. Last year, the company partnered with 34 research labs and about 150 researchers to collect data from 22 different robots, including Hello Robot’s Stretch. The resulting data set, which was published in October 2023, consists of robots demonstrating 527 skills, such as picking, pushing, and moving.
Early signs show that more data is leading to smarter robots. The researchers built two versions of a model for robots, called RT-X, that could be either run locally on individual labs’ computers or accessed via the web. The larger, web-accessible model was pretrained with internet data to develop a “visual common sense,” or a baseline understanding of the world, from the large language and image models. When the researchers ran the RT-X model on many different robots, they discovered that the robots were able to learn skills 50% more successfully than with the systems each individual lab was developing.
Deeper Learning
Generative AI can turn your most precious memories into photos that never existed
Maria grew up in Barcelona, Spain, in the 1940s. Her first memories of her father are vivid. As a six-year-old, Maria would visit a neighbor’s apartment in her building when she wanted to see him. From there, she could peer through the railings of a balcony into the prison below and try to catch a glimpse of him through the small window of his cell, where he was locked up for opposing the dictatorship of Francisco Franco. There is no photo of Maria on that balcony. But she can now hold something like it: a fake photo—or memory-based reconstruction.