Elon Musk Live-Streams Test Drive of Tesla FSD V12, World’s First End-to-End AI Autonomous Driving, Trained with 10,000 H100s
On August 26th, local time, Elon Musk personally went live for a test drive of Tesla’s FSD Beta V12, which attracted millions of viewers.
Reportedly, FSD Beta V12 is the world’s first Full AI End-to-End autonomous driving system, marking a significant upgrade for Tesla.
In a 45-minute live stream, the FSD Beta V12 system drove smoothly for the entire trip, effortlessly maneuvering around obstacles and recognizing various road signs.
Elon Musk expressed his excitement:
“The V12 system is entirely AI-driven from start to finish. We didn’t program it, no programmers wrote a single line of code to recognize roads, pedestrians, etc. It’s all neural network.”
Specifically, the C++ control code has been slashed in V12, from the more than 300,000 lines of V11 down to roughly 2,000 lines.
What’s unique is that 99% of the system’s decisions are made by the neural network: visual input in, control output out, much like the human brain.
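Tesla has not published V12’s internal design, but an end-to-end policy of this kind is easy to describe in outline: camera frames in, control commands out, with nothing hand-coded in between. The PyTorch sketch below is purely illustrative; every layer, size, and name is an assumption, not Tesla’s architecture.

```python
# A minimal, illustrative sketch of an end-to-end driving policy:
# camera frames in, steering/acceleration out. Layer sizes, names,
# and structure are assumptions, not Tesla's actual design.
import torch
import torch.nn as nn

class EndToEndPolicy(nn.Module):
    def __init__(self, num_cameras: int = 8):
        super().__init__()
        # Shared vision backbone applied to each camera frame.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Fuse per-camera features, then regress control commands.
        self.head = nn.Sequential(
            nn.Linear(64 * num_cameras, 256), nn.ReLU(),
            nn.Linear(256, 2),  # [steering angle, acceleration]
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, num_cameras, 3, H, W)
        b, n, c, h, w = frames.shape
        feats = self.backbone(frames.reshape(b * n, c, h, w))
        return self.head(feats.reshape(b, -1))
```

The point of the sketch is the shape of the system: the only “logic” anywhere in it is learned weights.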
Furthermore, its remarkable capabilities were achieved through an enormous amount of “video data” and the support of 10,000 H100s.
However, there was one glitch during the live stream: V12 attempted to run a red light, prompting human intervention, and Musk reacted swiftly with the brake.
Musk stated that the FSD Beta V12 is still in the debugging phase, and therefore, the official release date has not been confirmed.
Musk: “We didn’t program it.”
Musk’s live stream began at Tesla headquarters.
With one hand on the wheel of a Tesla Model S, Musk used his phone to record the entire 45-minute V12 drive.
Musk randomly selected a destination on the map – Stanford University. Let’s take a look at how V12 guided him to the first destination.
During the drive, Musk mentioned that the buildings and road signs the car encountered were unfamiliar; even this close to headquarters, these were new sights for the system.
As they reached an intersection, the Tesla autonomously came to a stop and patiently waited for the traffic light to change.
Musk chuckled and quipped, “It’s stopping!”
When the light turned green, the Tesla smoothly executed a left turn.
Musk explained that this entire process was achieved through artificial intelligence and cameras, much like the way our brains function with neural networks and our eyes.
Upon encountering a speed bump, V12 slowed the car down.
With excitement, Musk emphasized that there was no line of code instructing the Tesla to stop at a stop sign or wait for another vehicle, no “wait x seconds” kind of code. It’s all neural network—there’s nothing else.
As the conversation went on, they arrived at the first destination, Stanford. Still nursing his playful wish for a bout with Zuckerberg, Musk set the second destination to none other than Zuck’s house.
He proceeded to enter Zuckerberg’s home address, letting the V12-powered Tesla lead the way.
On the road, as the Tesla navigated a roundabout, the V12 system once again demonstrated its top-tier capability.
After waiting for the first two cars to go, the Tesla made the turn into the roundabout.
Here, Musk reiterated that the team never programmed the concept of a roundabout. They simply showed the system a bunch of videos about roundabouts.
In essence, the V12 achieved this through extensive video training data.
The FSD AI is now fed massive amounts of video to understand what needs to be done in various situations, rather than individually coding for each road element or scenario.
This allows Tesla to skip hundreds of thousands of lines of code in FSD V12, making it lighter and more flexible, while still functioning in unfamiliar terrain without requiring a data connection.
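The article doesn’t describe the training procedure itself, but “showing the system videos” of skilled human driving is, in essence, imitation learning: the network is optimized to reproduce the human’s control outputs frame by frame. Here is a minimal behavior-cloning sketch; the stand-in model, tensor shapes, and hyperparameters are all assumptions.

```python
# Illustrative behavior-cloning loop: learn to imitate the control
# commands of skilled human drivers from recorded clips. Everything
# here (model, shapes, hyperparameters) is a hypothetical sketch.
import torch
import torch.nn as nn

policy = nn.Sequential(  # tiny stand-in for a real vision network
    nn.Flatten(), nn.Linear(3 * 64 * 64, 128), nn.ReLU(), nn.Linear(128, 2)
)
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def training_step(frames: torch.Tensor, human_controls: torch.Tensor) -> float:
    """frames: (batch, 3, 64, 64); human_controls: (batch, 2) steering/accel."""
    predicted = policy(frames)
    loss = loss_fn(predicted, human_controls)  # imitate the human driver
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```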
Musk explained that all of this runs on Tesla’s HW3, with inference drawing about 100 watts. All inference is done locally, with no network connection needed. That is crucial for safety: you can’t rely on an internet connection to drive safely.
After parking, Musk also discussed Tesla’s frame rate.
“We are running at full frame rate. All eight cameras are shooting at 36 frames per second. The pure AI version is better and faster than the ‘normal software and AI mixed’ version.”
In fact, the system could process more than 36 frames per second, but the cameras are capped at 36 fps. Musk estimated inference could run at around 50 frames per second, while real-world conditions only require about 24 frames per second for FSD V12 to function properly.
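Those figures imply a tight real-time budget. A quick back-of-the-envelope check, combining the 8-camera, 36 fps, and roughly 100-watt numbers quoted above (the per-frame energy figure is my own division, not something Tesla stated):

```python
# Back-of-the-envelope timing and power math from the quoted figures.
cameras = 8
fps = 36                      # per-camera frame rate quoted by Musk
power_watts = 100             # approximate HW3 inference power quoted earlier

frames_per_second = cameras * fps          # 288 total frames each second
budget_ms = 1000 / fps                     # ~27.8 ms to act on each synced set
joules_per_frame = power_watts / frames_per_second  # ~0.35 J per camera frame

print(frames_per_second, round(budget_ms, 1), round(joules_per_frame, 2))
# -> 288 27.8 0.35
```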
Upon reaching Zuckerberg’s home, Musk didn’t exit the car. Instead, he turned around and decided to head to the next destination in Palo Alto.
During the livestream, Musk also demonstrated how FSD V12 autonomously parked the car in the appropriate spot upon arriving at the destination.
Training the AI Directly on “Video Data”
Musk has previously stated that FSD V12 will be a level 4 autonomous driving system.
At the end of June this year, Musk announced that Tesla’s autonomous driving FSD V12 version would no longer be a beta.
During the live demonstration, Musk also discussed the challenges of achieving autonomous driving by training AI on video data.
“Just because there are no lines of code doesn’t mean it’s uncontrollable. Having data alone means you can still have control.”
Firstly, the quality of the training videos is crucial, so only footage from skilled drivers is used for training, never footage from less skilled ones.
Tesla now has a considerable amount of software dedicated to deciding which data to select for training.
So it isn’t the sheer quantity of data that matters most; data quality is what’s crucial for delivering safe autonomous driving.
The software that runs in the car is quite small, but the backend training software is far larger and more complex.
As a result, the V12 pipeline uses conventional Python software to decide which clips to pull from the queue and to grade them: which data is high quality, and which is merely decent.
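Tesla hasn’t published that tooling, but conventional Python code grading a queue of clips could look something like the following sketch; the scoring heuristics, fields, and threshold are entirely hypothetical.

```python
# Hypothetical sketch of a data-curation pass: pull clips off a queue,
# score them, and keep only the best for training. The scoring
# heuristics are invented for illustration, not Tesla's actual criteria.
from dataclasses import dataclass

@dataclass
class Clip:
    driver_rating: float    # 0..1, how skilled the source driver is
    had_intervention: bool  # did a human have to take over?
    novelty: float          # 0..1, how rare this scenario is

def quality_score(clip: Clip) -> float:
    score = 0.5 * clip.driver_rating + 0.5 * clip.novelty
    if clip.had_intervention:
        score += 0.3  # disagreements with humans are especially informative
    return score

def select_for_training(queue: list[Clip], threshold: float = 0.6) -> list[Clip]:
    return [c for c in queue if quality_score(c) >= threshold]
```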
Once you have an AI model, you can also run it in “shadow mode”: every time it does something that doesn’t match what the driver actually did, Tesla collects that data, which is far more valuable than randomly gathered data.
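A similarly hedged sketch of the shadow-mode idea: run the candidate model passively alongside the driver, compare its output to what the human actually did, and log only the disagreements. The threshold and record format are invented for illustration.

```python
# Illustrative "shadow mode": the candidate model only simulates driving;
# whenever its output diverges from the human's, the moment is logged as
# valuable training data. Threshold and fields are made-up placeholders.
def shadow_step(model_controls, human_controls, log, threshold=0.2):
    """Both control inputs: (steering, acceleration) tuples for one time step."""
    divergence = max(abs(m - h) for m, h in zip(model_controls, human_controls))
    if divergence > threshold:
        # A mismatch: keep this clip, since it is worth more than random data.
        log.append({"model": model_controls, "human": human_controls})

log = []
shadow_step((0.10, 0.5), (0.45, 0.5), log)  # model under-steers vs. the human
print(len(log))  # -> 1
```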
From V12 onward, FSD can recognize people. Just send the car a picture and tell it you’re waiting at a Starbucks; it will drive to the Starbucks on its own and find you by matching the photo.
Musk said the team is especially excited about how rapid the system’s virtuous feedback cycle is.
When a human intervenes, that intervention is automatically uploaded and folded into training, which simply updates the network’s weights.
In programming V12, the Tesla team never explicitly encoded the concept of a traffic light, which did exist in V11’s conventional stack.
The car’s smooth driving is primarily based on video training.
“We had over 300,000 lines of C++ in V11’s explicit control stack, and in V12 there’s basically none of that.”
Another interesting challenge is the issue of stopping at stop signs. Humans don’t really come to a complete stop at stop signs; they slow down significantly. Only about 1% of drivers will actually come to a complete stop.
However, regulatory agencies require a full stop, so FSD must be specifically trained to fully stop in these situations.
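The article doesn’t say how Tesla handles this in training, but a standard way to push a learned policy toward behavior that is rare in human data is to oversample the compliant examples. A hypothetical sketch:

```python
# Hypothetical fix for the stop-sign problem: since only ~1% of human
# drivers fully stop, oversample the clips that do, so the learned
# policy sees far more compliant behavior than the raw data contains.
import random

def resample(clips, weight_full_stop: int = 50):
    """clips: list of dicts with a 'full_stop' flag. The weight is invented."""
    weighted = []
    for clip in clips:
        copies = weight_full_stop if clip["full_stop"] else 1
        weighted.extend([clip] * copies)
    random.shuffle(weighted)
    return weighted

data = [{"full_stop": False}] * 99 + [{"full_stop": True}]
balanced = resample(data)
print(sum(c["full_stop"] for c in balanced) / len(balanced))  # ~0.34 vs 0.01
```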
The only intervention during the live demonstration came because the V12-equipped Model S failed to stop at a red light.
Musk said on the spot that feeding the system more videos full of traffic lights would solve the problem.
In the next two weeks, Tesla will be releasing a “Shadow Mode” that runs in the background to monitor driving conditions.
Additionally, Musk mentioned a challenge with the system: how to handle driving conditions with low visibility.
Because Tesla’s California headquarters hardly ever sees rain, the team needs driving videos from all kinds of weather around the world for training.
Currently, there are 12 beta testers for FSD V12 globally, located in places like New Zealand, Thailand, Norway, and Japan.
Before the test-drive livestream, Musk also spent more than ten minutes warming up in WholeMars’ Spaces chat.
When users noticed him there, an impromptu interview began, and Musk discussed the upcoming livestream and related topics with them.
Firstly, he mentioned that Tesla is about to deploy a GPU cluster consisting of 10,000 H100 chips to train the new version of the FSD system.
Currently, the training process relies heavily on NVIDIA GPUs, with only a small portion utilizing Tesla’s own supercomputer, Dojo.
One of their biggest technical challenges right now is the need for high-speed interconnects like InfiniBand to parallelize larger training jobs.
While GPU shortages are a problem, there is at least hope for improvement; the shortage of InfiniBand hardware, which is even scarcer than GPUs, is the bigger issue.
Exchanging data between large-scale computing clusters can be quite difficult.
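To see why the interconnect matters: in data-parallel training, every GPU must exchange its gradients with the rest of the cluster on every step, and that all-reduce traffic is exactly what InfiniBand carries. The arithmetic below uses invented figures for illustration, not Tesla’s numbers.

```python
# Rough arithmetic for why cluster interconnect becomes the bottleneck.
# All figures below are illustrative assumptions, not Tesla's numbers.
params = 1e9                 # a hypothetical 1B-parameter model
bytes_per_param = 2          # fp16 gradients
link_gbps = 400              # e.g. one NDR InfiniBand port

# Ring all-reduce moves roughly 2x the gradient size per GPU per step.
grad_bytes = params * bytes_per_param               # ~2 GB of gradients
traffic_per_gpu = 2 * grad_bytes                    # ~4 GB on the wire
seconds = traffic_per_gpu * 8 / (link_gbps * 1e9)   # ~0.08 s just moving grads

print(f"{traffic_per_gpu / 1e9:.1f} GB per GPU per step, {seconds:.2f} s on the wire")
```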
Musk mentioned that their dependence on NVIDIA will continue for quite some time.
As the conversation continued, Musk naturally shifted into his role as Earth’s spokesperson, stating:
Looking ahead, humans will become a civilization heavily reliant on powerful computing, with 80%-90% of energy consumption devoted to computing.
Thus, improving the energy efficiency of existing computing infrastructure is crucial.
Transformers aren’t very efficient: their performance isn’t optimal and user latency is too long, so further optimization is needed.
GPU energy efficiency isn’t great either, and GPUs like the H100 no longer output images, so calling them GPUs might not be fitting anymore.
Musk also tweeted that the energy efficiency of autoregressive Transformer-based LLMs (large language models) is extremely poor, in both training and inference; he believes they are off by several orders of magnitude.
Next, Musk began building excitement for the livestream, previewing his upcoming on-road demonstration of FSD V12.
He repeatedly emphasized that with the new version of FSD, Tesla’s vehicles will provide an incredibly smooth riding experience. The system is designed to simulate skilled drivers, avoiding the occasional novice-like behavior that was present in previous versions.
As the conversation reached its conclusion, he left with a parting remark, urging everyone to quickly tune in to his imminent livestream demonstration of autonomous driving.