Categories: Technology

Forget Sora, this is the AI video that will blow your mind – and maybe scare you

Humanoid robot development has moved at a snail’s pace for the better part of two decades, but a rapid acceleration is now underway thanks to a collaboration between Figure AI and OpenAI. The result is the most stunning piece of real humanoid robot video I’ve ever seen.

On Wednesday, startup robotics firm Figure AI released a video update (see below) of its Figure 01 robot running a new vision-language model (VLM) that has somehow transformed the bot from a rather uninteresting automaton into a full-fledged sci-fi bot with capabilities approaching C-3PO’s.

In the video, Figure 01 stands behind a table set with a plate, an apple, and a cup. To the left is a drainer. A human stands in front of the robot and asks the robot, “Figure 01, what do you see right now?”

After a few seconds, Figure 01 responds in a remarkably human-sounding voice (there is no face, just an animated light that moves in sync with the voice), describing everything on the table as well as the man standing before it.

“That’s cool,” I thought.

Then the man asks, “Hey, can I have something to eat?”

Figure 01 responds, “Sure thing,” and then, with a dexterous flourish of fluid movement, picks up the apple and hands it to the guy.

“Woah,” I thought.

Next, the man empties some crumpled debris from a bin in front of Figure 01 while asking, “Can you explain why you did what you just did while you pick up this trash?”

Figure 01 wastes no time explaining its reasoning while placing the paper back into the bin. “So, I gave you the apple because it’s the only edible item I could provide you with from the table.”

I thought, “This can’t be real.”

It is, though, at least according to Figure AI.

Speech-to-speech

The company explained in a release that Figure 01 engages in “speech-to-speech” reasoning, using a pre-trained multimodal model from OpenAI to understand both images and text, and drawing on the entire voice conversation to craft its responses. This differs from, say, OpenAI’s GPT-4, which focuses on written prompts.

It’s also using what the company calls “learned low-level bimanual manipulation.” The system matches precise image calibrations (down to the pixel level) with its neural networks to control movement. “These networks take in onboard images at 10 Hz, and generate 24-DOF actions (wrist poses and finger joint angles) at 200 Hz,” Figure AI wrote in the release.
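Those numbers describe a classic two-rate control architecture: a slow perception loop feeding a fast action loop. A minimal sketch of the idea (the stand-in “policy,” all names, and the smoothing scheme here are my own illustrative assumptions, not Figure AI’s code) could look like this:

```python
# Toy sketch of a two-rate control scheme: a policy emits 24-DOF action
# targets at 10 Hz, while a 200 Hz inner loop smooths the robot's state
# toward the latest target. Everything here is illustrative, not Figure AI's code.

POLICY_HZ = 10    # image/action-target rate claimed in the release
CONTROL_HZ = 200  # low-level action rate claimed in the release
DOF = 24          # wrist poses + finger joint angles

def stand_in_policy(image_frame):
    """Placeholder for the neural network: image -> 24 action targets."""
    return [0.1 * (i % 3) for i in range(DOF)]

def step_toward(state, target, alpha):
    """One 200 Hz control step: exponential smoothing toward the target."""
    return [s + alpha * (t - s) for s, t in zip(state, target)]

def run_control(n_policy_ticks=2):
    state = [0.0] * DOF
    steps_per_tick = CONTROL_HZ // POLICY_HZ  # 20 control steps per image
    for _ in range(n_policy_ticks):
        target = stand_in_policy(image_frame=None)  # fresh image every 100 ms
        for _ in range(steps_per_tick):
            state = step_toward(state, target, alpha=1 / steps_per_tick)
    return state

final = run_control()
print(len(final))  # one value per degree of freedom
```

The split matters because perception is slow and expensive while motors need fresh commands constantly; the fast loop keeps motion fluid between the 10 Hz updates.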

The company claims that every behavior in the video is based on system learning and is not teleoperated, meaning there’s no one behind the scenes puppeteering Figure 01.

Without seeing Figure 01 in person and asking my own questions, it’s hard to verify these claims. There’s also the possibility that this wasn’t the first time Figure 01 ran through this routine. It could’ve been the 100th, which might account for its speed and fluidity.

Or maybe this is 100% real and in that case, wow. Just wow.


Lance Ulanoff
