Speaker 1: Let’s go
Speaker 2: [00:00:30] Think
Speaker 3: The bot’s got some moves, too. [00:01:00] So this is essentially the same self-driving computer that runs in your Tesla cars. By the way,
Speaker 1: This is literally the first time the robot has operated without a tether, on stage tonight.
Speaker 2: [00:01:30] So,
Speaker 1: So the robot can actually do a lot more than we just showed [00:02:00] you. We just didn’t want it to fall on its face.
Speaker 3: Yeah, we wanted to show a little bit more of what we’ve done over the past few months with the bot, beyond just walking around and dancing on stage. Just humble beginnings, but you can see the Autopilot neural networks running as is, just retrained for the bot, directly on that new platform. That’s my watering can.
Speaker 1: Yeah, when you see a rendered view, that’s the world the robot sees. So it’s very clearly [00:02:30] identifying objects, like this is the object it should pick up. Picking it up.
Speaker 3: We use the same process as we did for Autopilot to collect data and train neural networks that we then deploy on the robot. That’s an example that illustrates the upper body a little bit more.
Speaker 1: And we actually have an Optimus bot with fully Tesla-designed and built actuators, [00:03:00] battery pack, control system, everything. It wasn’t quite ready to walk, but I think it will walk in a few weeks. But we wanted to show you the robot, something that’s actually fairly close to what will go into production, and show you all the things it can do. So let’s bring it up. [00:03:30] So here you’re seeing Optimus with...
Speaker 1: With the degrees of freedom that we expect to have in Optimus production unit one, which is the ability to move all the fingers independently, and to have the thumb have two degrees of [00:04:00] freedom. So it has opposable thumbs on both the left and right hands, so it’s able to operate tools and do useful things. Our goal is to make a useful humanoid robot as quickly as possible. And we’ve also designed it using the same discipline that we use in designing the car, which is to say, to design it for manufacturing, such that it’s possible to make the robot in high volume, at low cost, with high reliability. [00:04:30] And it is expected to cost much less than a car. Just bring it quickly to the right here. I would say probably less than $20,000 would be my guess. The potential for Optimus is, I think, appreciated by very few people.
Speaker 4: Hey,
Speaker 1: As usual, Tesla demos are coming in hot, [00:05:00] so,
Speaker 5: Okay, that’s good. That’s good.
Speaker 1: Yeah. The team has put in an incredible amount of work, working seven days a week, burning the 3:00 AM oil to get to the demonstration today. I’m super proud of what they’ve done; they’ve really done a great job. I’d just like to give a hand to the whole Optimus team.
Speaker 4: [00:05:30] We’re gonna show you how we deterministically solve interventions via data, and walk you through the life of this particular clip. In this scenario, Autopilot is approaching a turn and incorrectly predicts that crossing vehicle as stopped for traffic, and thus a vehicle that we would slow down for. In reality, there’s nobody in the car; it’s just awkwardly parked. We’ve built tooling to identify the mispredictions, correct the label, and categorize this clip into an evaluation [00:06:00] set. This particular clip happens to be one of 126 that we’ve diagnosed as challenging parked cars at turns. Because of this infra, we can curate this evaluation set without any engineering resources custom to this particular challenge case. To actually solve that challenge case requires mining thousands of examples like it, and it’s something Tesla can trivially do. We simply use our data sourcing infra to request data and [00:06:30] use the tooling shown previously to correct the labels, surgically targeting the mispredictions of the current model.
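The curation step described above (flag a misprediction, correct its label, file the clip under a diagnosed challenge case) can be sketched in Python. This is a minimal illustration under stated assumptions, not Tesla’s tooling: the `Clip` record, its field names, the label strings, and `build_eval_sets` are all hypothetical names chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Clip:
    # Hypothetical record for one fleet clip; field names are illustrative.
    clip_id: str
    predicted: str       # current model's label for the crossing vehicle
    corrected: str       # label after review with the correction tooling
    challenge_case: str  # diagnosed category, e.g. "parked_cars_at_turns"


def build_eval_sets(clips: List[Clip]) -> Dict[str, List[str]]:
    """Group mispredicted clips into one evaluation set per challenge case."""
    eval_sets: Dict[str, List[str]] = defaultdict(list)
    for clip in clips:
        if clip.predicted != clip.corrected:  # keep only mispredictions
            eval_sets[clip.challenge_case].append(clip.clip_id)
    return dict(eval_sets)


clips = [
    Clip("a", "stopped", "parked", "parked_cars_at_turns"),
    Clip("b", "moving", "moving", "curvy_roads"),  # predicted correctly, not filed
    Clip("c", "stopped", "parked", "parked_cars_at_turns"),
]
print(build_eval_sets(clips))  # {'parked_cars_at_turns': ['a', 'c']}
```

Grouping by challenge case is what lets each case get its own evaluation set without case-specific code: a new diagnosis is just a new key.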
Speaker 4: We’re only adding the most valuable examples to our training set. We surgically fix 13,900 clips, and because those were examples where the current model struggles, we don’t even need to change the model architecture. A simple weight update with this new valuable data is enough to solve the challenge case. So you see, we no longer predict [00:07:00] that crossing vehicle as stopped, as shown in orange, but parked, as shown in red. In academia, we often see that people keep data constant, but at Tesla it’s very much the opposite. We see time and time again that data is one of the best, if not the most, deterministic levers for solving these interventions. We just showed you the data engine loop for one challenge case, namely these parked cars at turns. But there are many challenge cases even for one signal of vehicle [00:07:30] movement. We apply this data engine loop to every single challenge case we’ve diagnosed, whether it’s buses, curvy roads, stopped vehicles, or parking lots, and we don’t just add data once; we do this again and again to perfect the semantic. In fact, this year we updated our vehicle movement signal five times, and with every weight update trained on the new data, we push our vehicle movement accuracy up and up. This data engine framework applies to all our [00:08:00] signals, whether they’re 3D or multicam video, whether the data is human labeled, auto labeled, or simulated, whether it’s an offline model or an online model. And Tesla’s able to do this at scale because of the fleet advantage, the infra that our engineering team has built, and the labeling resources that feed our networks.
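The loop described above (add only the clips the current model gets wrong, then retrain the same model on the enlarged set, and repeat) can be sketched with a toy stand-in. Everything here is an assumption for illustration: the "model" is a hypothetical majority-label lookup per scene context so the example runs without a neural network, and `train`, `mine_hard_examples`, and `data_engine` are illustrative names, not Tesla APIs.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical clip representation: (scene_context, corrected_label).
Clip = Tuple[str, str]
Model = Callable[[str], str]


def train(train_set: List[Clip]) -> Model:
    """Toy 'weight update': same model every time, majority label per context."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for context, label in train_set:
        votes[context][label] += 1
    table = {ctx: c.most_common(1)[0][0] for ctx, c in votes.items()}
    return lambda context: table.get(context, "stopped")  # default for unseen contexts


def accuracy(model: Model, eval_set: List[Clip]) -> float:
    return sum(model(ctx) == label for ctx, label in eval_set) / len(eval_set)


def mine_hard_examples(model: Model, pool: List[Clip]) -> List[Clip]:
    """Keep only clips the current model mispredicts (the 'most valuable' ones)."""
    return [(ctx, label) for ctx, label in pool if model(ctx) != label]


def data_engine(train_set: List[Clip], pool: List[Clip],
                eval_set: List[Clip], rounds: int):
    model = train(train_set)
    history = [accuracy(model, eval_set)]
    for _ in range(rounds):
        hard = mine_hard_examples(model, pool)
        if not hard:
            break  # nothing left that the current model gets wrong
        train_set = train_set + hard
        model = train(train_set)  # architecture unchanged; only the data grew
        history.append(accuracy(model, eval_set))
    return model, history


train_set = [("turn", "stopped"), ("straight", "moving")]
pool = [("turn", "parked")] * 3 + [("straight", "moving")] * 2
eval_set = [("turn", "parked"), ("straight", "moving")]
model, history = data_engine(train_set, pool, eval_set, rounds=5)
print(history)  # [0.5, 1.0]
```

In this toy run the first pass mines only the three turn clips corrected to "parked" (the straight-road clips are already right), the retrained lookup then predicts "parked" at turns, and the second pass finds nothing left to mine and stops.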
Speaker 6: Now, last year we introduced only a couple of components of our system, the custom D1 die and the training tile, but we teased the ExaPOD as our end goal. [00:08:30] We’ll walk through the remaining parts of our system that are required to build out this ExaPOD. Now, the system tray is a key part of realizing our vision of a single accelerator. It enables us to seamlessly connect tiles together, not only within the cabinet but between cabinets. We can connect these tiles at very tight spacing across the entire accelerator, and this is how we achieve our uniform communication. [00:09:00] Next, we need to feed data to the training tiles. This is where we’ve developed the Dojo Interface Processor. It provides our system with high-bandwidth DRAM to stage our training data, and it provides full memory bandwidth to our training tiles using TTP, our custom protocol that we use to communicate across our entire accelerator. It also has high-speed Ethernet that helps us extend this custom protocol over standard Ethernet, [00:09:30] and we provide native hardware support for this with little to no software overhead. And lastly, we can connect to it through a standard Gen4 PCIe interface.
Speaker 6: Now we pair 20 of these cards per tray, and that gives us 640 gigabytes of high-bandwidth DRAM. This provides our disaggregated memory layer for our training tiles. Now, we actually integrate the host directly underneath [00:10:00] our system tray. These hosts provide our ingest processing and connect to our interface processors through PCIe. These hosts can provide hardware video decoder support for video-based training, and our user applications land on these hosts, so we can provide them with a standard x86 Linux environment. Now, we can put two of these assemblies into [00:10:30] one cabinet and pair them with redundant power supplies that do direct conversion of three-phase 480 V AC power to 52 V DC power. By focusing on density at every level, we can realize the vision of a single accelerator. Starting with the uniform nodes on our custom D1 die, we connect them together in our fully integrated training tile, [00:11:00] and then finally connect the tiles seamlessly across cabinet boundaries to form our Dojo accelerator. Together, we can house two full accelerators in our ExaPOD for a combined one exaflop of ML compute.
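The memory figures above imply a per-card and a per-cabinet capacity. The quick Python check below derives them, assuming each "assembly" is one system tray with its host underneath (the speaker does not spell out that mapping). The results, 32 GB per interface processor card and 1,280 GB per cabinet, are derived arithmetic, not quoted specifications; the constant and function names are illustrative.

```python
# Figures stated in the talk.
DIP_CARDS_PER_TRAY = 20
TRAY_DRAM_GB = 640
# Assumption: one tray per "assembly", two assemblies per cabinet.
ASSEMBLIES_PER_CABINET = 2


def dram_per_card_gb(tray_gb: int = TRAY_DRAM_GB,
                     cards: int = DIP_CARDS_PER_TRAY) -> float:
    """Derived: high-bandwidth DRAM per interface processor card."""
    return tray_gb / cards


def dram_per_cabinet_gb(tray_gb: int = TRAY_DRAM_GB,
                        assemblies: int = ASSEMBLIES_PER_CABINET) -> int:
    """Derived: disaggregated DRAM per cabinet under the one-tray-per-assembly assumption."""
    return tray_gb * assemblies


print(dram_per_card_gb())     # 32.0
print(dram_per_cabinet_gb())  # 1280
```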
Speaker 7: The first ExaPOD is part of a total of seven ExaPODs that we plan to build in Palo Alto, right here across the wall. And we have a display cabinet from one of these ExaPODs [00:11:30] for everyone to look at: six tiles densely packed on a tray, 54 petaflops of compute, 640 gigabytes of high-bandwidth memory, with the power and host to feed it.
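The display-tray numbers can be cross-checked against the one-exaflop ExaPOD figure with a few lines of Python arithmetic. Dividing 54 PF over six tiles gives 9 PF per tile. Two trays per cabinet is an assumption carried over from the earlier "two of these assemblies" remark. The tray count is only a lower bound on how many 54 PF trays it takes to reach 1,000 PF, not the ExaPOD's actual configuration, which the talk does not state here.

```python
import math

# Figures stated for the display tray and the ExaPOD.
TILES_PER_TRAY = 6
TRAY_PFLOPS = 54
EXAPOD_PFLOPS = 1000  # "one exaflop" = 1,000 petaflops

# Assumption: one tray per "assembly", two assemblies per cabinet.
TRAYS_PER_CABINET = 2

pflops_per_tile = TRAY_PFLOPS / TILES_PER_TRAY        # derived
pflops_per_cabinet = TRAY_PFLOPS * TRAYS_PER_CABINET  # derived
# Lower bound only: fewest 54 PF trays whose total reaches 1,000 PF.
min_trays = math.ceil(EXAPOD_PFLOPS / TRAY_PFLOPS)

print(pflops_per_tile, pflops_per_cabinet, min_trays)  # 9.0 108 19
```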
Speaker 1: I think we want to have really fun versions of Optimus, so that Optimus can both be utilitarian and do tasks, but can also be kind of like a friend and a buddy [00:12:00] and hang out with you. And I’m sure people will think of all sorts of creative uses for this robot. I think the mission effectively does somewhat broaden with the advent of Optimus to, I don’t know, making the future awesome. So, you know, you look at Optimus and, I don’t know about you, but I’m excited to see what Optimus will become. And you know, if you could, I [00:12:30] mean, you can ask of any given technology: do you want to see what it’s like in a year, two years, three years, four years, five years, 10?
Speaker 1: I’d say for sure, you definitely want to see what’s happened with Optimus. Whereas, you know, a bunch of other technologies have sort of plateaued. I won’t name names here, but...
Speaker 1: So, I don’t know, I’d say probably within three years, not more than five years; within three to five years, you could probably receive an Optimus. I mean, think about it: what drives any vehicle? It’s a biological neural net with eyes, with cameras essentially. So [00:14:00] really, your primary sensors are two cameras on a slow gimbal, a very slow gimbal. That’s your head. So if a biological neural net with two cameras on a slow gimbal can drive a semi truck, then if you’ve got eight cameras with continuous 360-degree vision, operating at a higher frame rate and a much higher reaction rate, then I think [00:14:30] it is obvious that you should be able to drive a semi, or any vehicle, much better than a human.