How Google’s AI is Enabling Robots to Follow Simple Commands
The Day When A Robot Cleans Your House Is Getting Closer
Imagine having the ability to simply tell a robot to “clean my room” and watch as it springs into action, seamlessly comprehending the tasks required. This vision of the future is closer than ever thanks to groundbreaking artificial intelligence called RT2.
RT2 is Google’s latest AI system and is short for Robotics Transformer 2. It represents an enormous leap forward in allowing robots to understand natural human language and convert it into physical actions.
Developed by Google, RT2 is an example of what is known as a Vision Language Action (VLA) model. VLA models utilise a combination of text, images, and robot action data to enable robots to carry out commands expressed in ordinary language, even if the specific task is entirely new to the robot. This differs drastically from previous robots that required highly detailed programming for individual tasks and could not adapt well to novel situations or commands. With RT2’s more flexible approach, you can give a robot an high-level instruction like “throw away the trash in the kitchen,” and it will simply figure out how to complete the task successfully.
RT2 is composed of two core components working in tandem: a Vision Language Model (VLM) and the VLA model itself. The VLM is first trained on massive amounts of text and images from the internet to build a deep understanding of objects, environments, and how they relate. This mirrors the way humans accumulate knowledge from diverse sources like books, videos, and life experiences. The VLA model then builds upon the capabilities of the VLM by adding in data from real-world robots, including camera images, commands, and robot actions. This enables the VLA model to make the leap from passive understanding to directing robotic control.
At the heart of the RT2 system is a technique called VLM transformation. This is the process that converts the knowledge gained from the internet into concrete robot actions and directives. Even without explicit training on certain tasks, this allows RT2 to perform activities like room cleaning that it understands conceptually from text and images online.
In demonstrations, Google has shown RT2 capable of performing several multi-step household activities.
- It can reliably sort various types of trash like food wrappers, cups, and bottles into the proper bins.
- It can differentiate between objects based on language descriptions, choosing the “extinct animal” dinosaur when asked versus a mythical creature dragon.
- RT2 can also complete tasks that involve reasoning.
Importantly, RT2 demonstrates an adaptability to novel situations that wasn’t in its predecessor model, RT1, such as avoiding obstacles when commands don’t take account of the environment or terrain. In fact, RT2 represents major progress over RT1, which was restricted to only performing physical tasks it had been explicitly trained on before. In comparative testing, RT2 achieved far higher scores than earlier models.
The immense promise of RT2 arrives at an opportune moment, as the industrial robotics industry continues its rapid expansion. Valued at $44.6 billion globally in 2020, this market is forecast to grow at a rate of 9.4% annually through 2028. RT2 paves the way for accelerated automation across sectors like manufacturing, logistics, and healthcare.
However, integrating RT2-enabled robots more fully into our lives also surfaces important concerns regarding human-robot trust and safety. How can we guarantee robots consistently follow established rules and pose no threat to humans even in unpredictable circumstances? Heavy responsibility lies with RT2’s developers to ensure it aligns with ethical and societal norms. Addressing these challenges will be key as AI like RT2 ushers in a new era of intelligent, responsive robotics. But if done thoughtfully, this technology has the potential to greatly enhance human productivity and quality of life.
Further Reading
Structuring Conversations for Optimal ChatGPT Output
Here’s an excerpt from chapter 3 of my beginner’s guide to ChatGPT:
When interacting with ChatGPT, the way you structure your conversation plays a crucial role in obtaining the desired output. To maximise the effectiveness of your interactions, consider the following guidelines:
1. Clear and Concise Prompts: Start your conversation with a clear and concise prompt that conveys your intent or question. Providing specific details or context can help guide the model’s responses and ensure relevant output.
2. Break Down Complex Questions: If you have a complex question or request, consider breaking it down into smaller parts. Presenting the AI with a series of simpler prompts often yields better results, as it allows the model to tackle one aspect at a time.
3. Use System Messages: System messages are instructions or guidelines you can include in your conversation to guide the model’s behaviour. By utilising system messages, you can nudge the model to adopt a specific role or writing style, providing additional context for generating responses.
The Beginner’s Guide to ChatGPT is written in an easy to understand and non-technical (as much as I could make it) format. It’s intended for anyone who wants to use ChatGPT, is not super technical, and wants to know the basics of how to go about it.
The book is free to all WiserPLUS! members.
Originally published at https://rickhuckstep.com on August 3, 2023.