Smart Driving Entering the City, No Silver Bullet
Code and algorithms are limited, while real-world scenarios are limitless.
The spotlight continues to shine on the smart electric vehicle industry, but there are inevitably “blind spots.” Some arrive quickly and unnoticed; others persist for a long time, quietly governing the rules. We call this the “PowerOn Unseen” section, dedicated to revealing the critical “blind spots” in the automotive industry.
“This thing can’t be called NAD (NIO Autonomous Driving).”
In July of this year, NIO CEO William Li experienced his company’s latest smart driving feature on urban roads in Shanghai and was somewhat disappointed. NIO’s NAD is similar to Tesla’s FSD (Full Self Driving), aiming to achieve point-to-point assisted driving on city and highway roads.
At the beginning of the year, NIO’s smart driving team assembled a group with the goal of rolling out Navigate on Autopilot (NoA) on thousands of kilometers of urban roads in Shanghai in the first half of the year. However, development did not meet expectations. The team has since issued a “declaration of war,” committing to “release a new version every two weeks.”
NIO’s anxiety is not without reason. A battle for resources, manpower, speed, and scale in the city NoA (Navigate on Autopilot) realm has erupted.
First comes the numbers race. Huawei and Xiaopeng Motors have set their goals for opening NoA features in 45 and 50 cities respectively this year, while Ideal Auto has directly set a bold target of 100 cities.
The city-opening strategies are also confrontational. Xiaopeng Motors has successively promoted NoA deployment in Beijing, where Ideal’s headquarters is located. According to PowerOn’s information, Ideal plans to set its first city for NoA deployment as Guangzhou – Xiaopeng Motors’ stronghold.
The domestic competition in city NoA deployment has even added pressure on Tesla, which is actively preparing to bring its Full Self Driving (FSD) feature into China. An insider revealed to PowerOn that this effort was initiated by Tesla’s China sales team, “hoping for swift support from the headquarters in the United States.”
The automotive industry is engaging in comprehensive competition in terms of product space, pricing, and supply chains. However, intelligence, especially in intelligent driving, remains the major point of differentiation for new car companies.
This has made City NoA one of the few projects in the industry led personally by company heads. According to PowerOn’s information, the goal of deploying NoA features in 100 cities was personally set by Li Xiang, the founder of Ideal Auto. After Wu Xinzhou, the key figure behind Xiaopeng Motors’ intelligent driving, left the company, He Xiaopeng himself took charge of the smart driving team’s efforts.
However, the aggressive publicity goals cannot hide the difficulties of deploying City NoA. As of now, Xiaopeng Motors’ actual NoA deployment has only progressed to five cities: Beijing, Shanghai, Guangzhou, Shenzhen, and Foshan. Huawei’s pace is similar, and both still rely on high-precision maps. As for Ideal Auto and NIO, their City NoA features have not yet been officially delivered.
From highways to city roads, deploying City NoA still faces significant challenges. In particular, once high-precision maps are removed, City NoA confronts a perception crisis: how to re-understand the world in front of the vehicle.
Removing High-Precision Maps: The Perception Crisis of Intelligent Driving
With high-precision maps as a “crutch,” smart vehicles have little trouble navigating through intersections, as the maps provide all the static road information needed, including which lane to take, the location of lane markings, whether to turn or go straight, the position of traffic lights, waiting areas, speed limits, and more.
However, the disadvantages of high-precision maps are also clear: the high costs, insufficient freshness, and policy uncertainties. To quickly promote City NoA features from one city to a hundred cities, automakers must abandon this “crutch.”
Without this precise digital tool, vehicles will enter a sparse wilderness and must rely on their own perception capabilities to reconstruct an environmental model.
Tesla remains a leader in this field, having started rebuilding its perception system in 2021. It has successively introduced technologies like Bird’s Eye View (BEV) based on the Transformer model and the Occupancy network.
Similar to how humans perceive the world through their eyes, BEV can convert 2D images into a 3D spatial representation. In addition, BEV+Transformer can recognize static lane markings, provide lane boundary information, differentiate between real and virtual lane markings, and more. On the dynamic level, the Occupancy network is used to identify irregular obstacles, such as cones in construction scenes.
“The BEV solution has a very high upper limit and can allow vehicles to see the world like humans do. It’s essentially the ultimate solution for autonomous driving,” an engineer from a leading automaker told PowerOn.
In other words, after eliminating high-precision maps, the combination of BEV+Transformer+Occupancy generates a real-time map for the intelligent driving system.
BEV+Transformer’s Perception of the World, Image Source: Ideal Auto
PowerOn has learned from sources close to Tesla’s engineering team that, based on the new perception solution, Tesla’s Full Self Driving (FSD) feature has been deployed to over a million vehicles. Without high-precision maps, the disengagement rate of Tesla’s FSD has dropped to as low as 1.6-1.7 interventions per hundred kilometers. The figure is measured against a comfort-oriented standard.
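To put the quoted rate in more familiar terms, it can be converted into an average distance between interventions. The short calculation below is only an arithmetic illustration of the figures cited above; the function name is made up for this example and is not a Tesla metric.

```python
def km_per_intervention(interventions_per_100km: float) -> float:
    """Average distance driven between interventions, in kilometers."""
    if interventions_per_100km <= 0:
        raise ValueError("rate must be positive")
    return 100.0 / interventions_per_100km


# The two ends of the range quoted in the article.
for rate in (1.6, 1.7):
    distance = km_per_intervention(rate)
    print(f"{rate} per 100 km -> one intervention every {distance:.1f} km")
```

In other words, the quoted range works out to roughly one intervention every 59 to 62.5 kilometers.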
As such, Tesla has already validated the feasibility of the “remove map trio” – BEV+Transformer+Occupancy – and domestic companies such as Xiaopeng Motors, Huawei, NIO, and Ideal Auto have all followed suit.
However, while this perception solution has a high ceiling, it is not easy to control, especially for domestic companies just entering this path. For example, consider simple lane information. Originally, this information could be easily obtained using high-precision maps. However, real-time perception using BEV+Transformer is an entirely different situation.
For instance, when queuing at an intersection, lane markings are often obstructed by vehicles ahead. If the smart driving system follows the leading vehicle but then realizes after a while that there are no lane markings, the vehicle might start swaying.
“This can lead to the vehicle moving left and right erratically, providing a poor experience,” an engineer told PowerOn. The most direct impact on the smart driving experience isn’t perception, but rather planning and control. “At the very least, the experience of being in the car won’t be jarring.”
Similarly, strong lighting conditions can lead to insufficient clarity in the camera’s perception of lane markings, or lane markings might be obscured due to road construction, and so on. All of these issues can cause intelligent vehicles to become confused on city roads.
Furthermore, recognizing traffic lights at intersections is even more challenging than recognizing obscured lane markings.
Previously, whether it was Tesla or domestic companies like Ideal Auto, NIO, and Xiaopeng Motors, advanced intelligent driving systems mostly focused on highways, often overlooking traffic lights at intersections. However, during the rush to deliver City NoA features, these automakers discovered that navigating traffic lights is a dilemma.
Lang Xianpeng, Vice President of Intelligent Driving at Ideal Auto, once explained: If using high-precision maps to navigate through traffic lights, the vehicle’s recognition and detection of traffic lights need to match the high-precision maps and high-precision positioning. If the recognized traffic lights do not match the high-precision maps, the perception will fail.
Without high-precision maps, the varied types and placements of traffic lights create a challenge in accurately recognizing them solely through cameras and aligning them with the appropriate lanes.
Complex Traffic Light Scenario, Image Source: Xiaopeng Motors
An engineer from Xiaopeng Motors openly admitted to PowerOn that traffic lights are indeed challenging to handle, stating, “Last year, the traffic light team at the company was heavily criticized.”
Recognizing obscured lane markings, navigating through complex intersections, and identifying various types of traffic lights are all significant challenges that must be overcome for urban intelligent driving. For companies eager to accelerate their progress, finding a fast lane seems to be the goal.
New Technology Leads to Verbal Sparring
In the past, Ideal Auto’s progress in intelligent driving was often regarded as “lagging behind.” Although its highway NoA feature was not slow to deploy, Ideal’s investment in intelligent driving was conservative compared with NIO and Xiaopeng Motors. In 2022, the intelligent driving teams at both Xiaopeng Motors and NIO exceeded 800 members, while Ideal Auto maintained a team of around 500.
However, starting this year, Ideal Auto suddenly gained momentum to catch up. Their team rapidly expanded to around 800 members, setting the ambitious goal of deploying a “No-Map” version of City NoA in 100 cities within the year. This bold shift caught the attention of competitors and led to remote questioning from Huang Xin, the head of NIO’s intelligent driving product.
What gave Ideal Auto the confidence to move from the back of the pack to the front?
In May of this year, at the Family Tech Day event, Ideal unveiled its City NoA solution. In addition to the industry-standard “No-Map trio” – BEV+Transformer+Occupancy – Ideal introduced two proprietary neural networks: NPN (Neural Prior Network), tailored for scenarios like lane markings being obscured and complex intersections, and TIN (Traffic Light Intent Network), tailored for traffic lights at intersections.
It’s evident that these two neural networks are designed to address the challenges faced by urban intelligent driving. For example, NPN can extract and store features from perception data at complex intersections, generating road features. When the vehicle encounters the same intersection again, the road features provided by NPN can be fused with the on-board BEV perception, resulting in more accurate perception results.
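The store-then-fuse idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not Ideal Auto’s implementation: the `PriorStore` class, the intersection IDs, the plain lists standing in for learned feature vectors, and the simple weighted average are all hypothetical.

```python
from dataclasses import dataclass, field

Feature = list[float]  # stand-in for a learned road-feature vector


@dataclass
class PriorStore:
    """Hypothetical store of road features, keyed by intersection ID."""
    priors: dict[str, Feature] = field(default_factory=dict)

    def update(self, intersection_id: str, observed: Feature,
               momentum: float = 0.8) -> None:
        """Blend a new observation into the stored prior (moving average)."""
        old = self.priors.get(intersection_id)
        if old is None:
            self.priors[intersection_id] = list(observed)
        else:
            self.priors[intersection_id] = [
                momentum * o + (1 - momentum) * n
                for o, n in zip(old, observed)
            ]

    def fuse(self, intersection_id: str, live: Feature,
             prior_weight: float = 0.5) -> Feature:
        """Fuse live BEV features with the stored prior, if one exists."""
        prior = self.priors.get(intersection_id)
        if prior is None:
            return list(live)  # never visited: live perception only
        return [
            prior_weight * p + (1 - prior_weight) * l
            for p, l in zip(prior, live)
        ]


store = PriorStore()
store.update("intersection_A", [1.0, 0.0])           # first pass: store features
fused = store.fuse("intersection_A", [0.0, 1.0])     # second pass: prior + live
unseen = store.fuse("intersection_B", [0.2, 0.4])    # no prior available
print(fused, unseen)
```

Note that an intersection with no stored prior falls back to live perception alone, which is the situation a vehicle faces on a route it has never driven.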
In terms of functionality, NPN’s role is similar to that of high-precision maps (providing lane, road edge, and traffic sign information). However, its ingenuity lies in transforming these road features into parameter values for the neural network, providing BEV algorithms with encrypted road feature information, thereby avoiding regulatory risks associated with map surveying.
A map industry professional informed PowerOn that, technically speaking, NPN’s confidentiality far exceeds map surveying requirements. In terms of commercial aspects, NPN is closely integrated with Ideal Auto’s specific BEV algorithms. Even if another automaker obtains the information, they wouldn’t know how to use it. “The NPN information provided to Xiaopeng’s BEV algorithm is definitely incomprehensible. But with high-precision maps, Xiaopeng could understand.”
This is akin to installing an encrypted “road information plug-in”. When encountering challenging situations like complex intersections, inclement weather, or obscured lane markings, Ideal vehicles can access and utilize this “plug-in” to handle situations with more ease.
However, from a technological logic perspective, NPN’s shortcomings cannot be overlooked. As it mainly extracts road features from a vehicle’s historical driving data, it naturally faces the challenge of data “freshness”. If a vehicle has not traveled a certain route before, NPN cannot extract road features for that route. If the route frequently changes, NPN’s feature extraction may lag behind.
A senior engineer from another car company straightforwardly told PowerOn, “The NPN network essentially solves the regulatory compliance issue of high-precision map information, but in terms of the implementation of perception technology itself, its assistance is limited.”
The Ideal Auto engineering team seems to have a plan in mind. According to insiders, in Ideal Auto’s plans up to 2025, the application scope of the NPN network will gradually narrow down, “and eventually be used only in very specific scenarios.”
To address another challenge of city driving, passing through intersections with traffic lights, Ideal has also proposed the TIN (Traffic Light Intent Network) solution. Unlike traditional methods that detect the specific status of each traffic light, TIN learns from historical data, pairing intersection images with the vehicle’s throttle and brake signals, to infer how the intent of the traffic lights corresponds to vehicle movement. The TIN network can then directly output probabilities for four intents: left turn, right turn, straight ahead, and stop. For example, if going straight has the highest probability, say 75%, the vehicle will choose to go straight.
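The final decision step described here can be sketched as a softmax over four outputs followed by picking the most probable one. This is a minimal sketch, assuming the network emits one raw score per intent; the scores below are invented for illustration, and how TIN actually produces and consumes its outputs is not described in the source.

```python
import math

ACTIONS = ("left turn", "right turn", "straight ahead", "stop")


def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(logits: list[float]) -> tuple[str, float]:
    """Return the most probable intent and its probability."""
    probs = softmax(logits)
    best = max(range(len(ACTIONS)), key=lambda i: probs[i])
    return ACTIONS[best], probs[best]


# Hypothetical raw scores for one intersection frame.
action, prob = choose_action([0.1, 0.1, 2.0, 0.3])
print(f"{action} ({prob:.0%})")
```

A production system would presumably add a safety fallback when no intent is clearly dominant; that logic is outside what the article describes.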
Another engineer who has worked extensively on traffic light intersections expressed skepticism, noting that some types of traffic lights may appear only once or twice in an entire city, or fewer than ten times nationwide. Teaching a neural network to recognize them from just a dozen cases is nearly impossible with current technology.
Although Ideal’s two “plug-in” neural networks have sparked controversy, there are no other effective solutions in the industry for addressing obscured lane markings, complex intersections, and traffic lights. Most methods require substantial effort.
A Xiaopeng insider informed PowerOn that, when encountering obscured lane markings, their internal approach involves supplementing the lane markings with labeled data after image feedback, allowing the perception model to learn. Once the training data becomes abundant, the perception system can also “fill in the gaps” for lane markings.
However, errors in the system’s “filling in” can also occur. For instance, the model might predict a non-existent lane marking. Concerning traffic light detection, if a traffic light on the main road is obstructed, Xiaopeng’s intelligent driving might look at the traffic light on the side road.
It’s undeniable that Ideal’s solution has sparked industry contemplation. Insiders revealed to PowerOn that Xiaopeng is also developing a similar solution to Ideal’s NPN. Through perception capabilities and some lightweight elements, they are working on structured information for complex intersections, aiming for a “more lightweight” approach.
Taming a new perception system is a lengthy process. Internally, Xiaopeng’s intelligent driving team even uses the metaphor of “alchemy” to describe training the perception module. The team often jokes that, “If this is a perception problem, we need to pick up the cauldron and start alchemy.”
City Deployment: No Silver Bullet
Even for Tesla, which has provided a “No-Map” solution, deploying a technology framework doesn’t guarantee smooth sailing. Constructing the algorithm model is just the foundation of the intelligent driving edifice. What follows is continuous testing and optimization.
Moreover, the more you rely on end-to-end models like BEV+Transformer, the more parameter tuning and optimization are needed. An individual close to Tesla’s engineering team told PowerOn that even Tesla “needs to add certain restrictions (some rule logic code) for assistance.”
In addition, every round of algorithm training and optimization is a time-consuming and resource-intensive endeavor.
An individual from Xiaopeng Motors explained to PowerOn that, to ensure the model’s effectiveness, the development team typically makes the model heavy in the early stages, pushing perception accuracy to high levels. Algorithm optimization involves pruning useless parameters from the model while maintaining perception accuracy. The goal is to fit the algorithm into computation-limited platforms, minimizing computational consumption. “Pruning, quantization, software deployment – each step requires a lot of effort.”
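Two of the steps named in that quote, pruning and quantization, can be illustrated on a toy list of weights. The sketch below uses magnitude pruning and symmetric int8 quantization, which are common textbook variants chosen here for illustration; the source does not say which methods Xiaopeng uses, and the weights are invented.

```python
def prune_by_magnitude(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the smallest-magnitude fraction of the weights."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # number of weights to drop
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, dropped = [], 0
    for w in weights:
        if abs(w) <= threshold and dropped < k:
            pruned.append(0.0)
            dropped += 1
        else:
            pruned.append(w)
    return pruned


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric linear quantization to the int8 range [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(values: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from their int8 form."""
    return [v * scale for v in values]


weights = [0.02, -1.27, 0.5, -0.01, 0.9, 0.03]   # hypothetical layer weights
pruned = prune_by_magnitude(weights, sparsity=0.5)
quantized, scale = quantize_int8(pruned)
print(pruned)
print(quantized)
```

Half of the weights become zero and the rest fit in one byte each, which hints at why these steps help a model fit on a computation-limited platform. Preserving accuracy through them is the hard part the engineer describes.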
Because of the immense volume of data, each round of algorithm training incurs significant costs. From feeding data to producing a usable model, “each training process takes several days and costs several million yuan.”
In addition to optimizing algorithms repeatedly, countless rounds of software testing are required on actual roads – the most laborious “dirty work” of the intelligent driving deployment process.
An industry insider explained to PowerOn the general process of intelligent driving deployment: for a target city, testing engineers usually take the overall solution out for road testing. They spend ten or more hours a day in the car, and at least a dozen vehicles test for a minimum of half a month to accumulate data on special road conditions.
Subsequently, they iterate models through algorithm adjustments to address specific issues. However, new models may not necessarily solve problems smoothly. “Perhaps out of ten problems, only five are resolved. The only option is continuous testing. After numerous iterations, resolving a city’s problems may take three months.”
As the scale of deployment grows larger, car companies need to establish a more extensive closed-loop data system, encompassing data mining, active learning, automatic labeling, model debugging, testing validation, and model deployment. Most importantly, this closed-loop system must withstand the deluge of data sent back by hundreds of thousands or even millions of vehicles, truly forming a data-driven approach that empowers vehicles to evolve autonomously.
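One stage of such a loop, choosing which returned data is worth labeling, can be sketched as a simple uncertainty filter. This is a minimal sketch of one common active-learning heuristic, assuming each frame comes back with a model confidence score; the `Frame` type, the threshold, and the budget are hypothetical and not drawn from any automaker’s pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    frame_id: str
    confidence: float  # model's confidence in its own prediction, 0 to 1


def select_for_labeling(frames: list[Frame], budget: int,
                        threshold: float = 0.6) -> list[Frame]:
    """Pick up to `budget` of the least-confident frames below `threshold`."""
    uncertain = [f for f in frames if f.confidence < threshold]
    uncertain.sort(key=lambda f: f.confidence)  # least confident first
    return uncertain[:budget]


# Hypothetical frames sent back by the fleet.
fleet_frames = [
    Frame("a", 0.95),
    Frame("b", 0.40),
    Frame("c", 0.55),
    Frame("d", 0.10),
    Frame("e", 0.70),
]
picked = select_for_labeling(fleet_frames, budget=2)
print([f.frame_id for f in picked])  # ['d', 'b']
```

Filtering like this is one way to keep the labeling workload bounded even when millions of vehicles are returning data.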
Thus, even with Xiaopeng’s relatively extensive experience in deployment, the speed of city deployment is not as fast as initially anticipated. “There is currently no mature assembly-line city deployment model. We can only tackle one challenge at a time,” a Xiaopeng intelligent driving professional explained.
According to insiders, the greatest pressure internally now is on planning and control. For instance, most of Xiaopeng’s data was from Guangdong Province. After expanding to other cities, they encountered many Beijing-specific scenarios.
To ensure a consistent user experience, they can only layer various control rules to modify driving strategies. Even Wu Xinzhou, a key figure in Xiaopeng’s intelligent driving team, was still dealing with NGP deployment in Beijing on the eve of his departure, “running between Beijing and Guangzhou.”
Clearly, even with a viable perception technology path, there is no “silver bullet” for the rapid deployment of urban intelligent driving. Model optimization and rule adjustment are both time-consuming, complex engineering endeavors.
During the Chengdu Auto Show on August 25th, Ideal Auto’s Vice President Liu Jie quietly adjusted the description of “City NoA in 100 cities”. In the detailed city deployment plan released, “City NoA” was changed to “Commuting NoA”.
Compared to City NoA, which covers the entire urban area, the Commuting NoA mode is much lighter. Users can set 1 to 2 commuting routes themselves, and vehicles can autonomously learn road NPN features, activating NoA functionality on these routes once training is complete. “For relatively simple routes, activation can be completed within a week, while for more complex routes, training can be finished in 2-3 weeks,” according to Ideal.
Xiaopeng has also introduced a similar “AI Valet” mode to achieve its goal of deploying in 50 cities this year. “It’s low cost, high marginal effect, and all the routes are frequently used by users,” a professional in the intelligent driving industry stated.
BEV+Transformer comes close to providing an end-to-end perception solution that removes the reliance on high-precision maps. This path leads intelligent driving from highways into cities and ultimately toward full autonomy. However, it must be acknowledged that code and algorithms have limits, while real-world scenarios are boundless. This is why intelligent driving is an engineering process built brick by brick.