XPENG's Dr. Xianming Liu says it isn't just a car company, but a physical AI company.
By Amanda Yeo, Assistant Editor
Amanda Yeo is an Assistant Editor at Mashable, covering entertainment, culture, tech, science, and social good. Based in Australia, she writes about everything from video games and K-pop to movies and gadgets.
The XPENG P7 is capable of autonomous driving using VLA 2.0. Credit: XPENG
EV manufacturer XPENG is targeting a 2027 global rollout for its next-generation VLA 2.0 autonomous driving system. Announcing its launch plans in March, XPENG stated that VLA 2.0 is the first AI driving model with L4 potential in China, marking a significant step toward the dream of the self-driving car.
XPENG's cars aren't completely driverless just yet. But speaking to Mashable, XPENG's General Intelligence Center head Dr. Xianming Liu explained that VLA 2.0 may be key to finally achieving that goal.
What is L4 autonomous driving?
Autonomous driving systems are commonly categorised into one of six levels, as defined by global automotive standards organisation SAE International. These range from no driving automation at Level 0 (L0) to full driving automation at Level 5 (L5).
Most of the currently available cars with such systems operate at L2, offering partial driving automation. Despite its name, Tesla's Full Self-Driving (FSD) system is an L2 system. The company changed the system's branding earlier this year after the term "Full Self-Driving" was found to be misleading, adding the suffix "(Supervised)" as well as axing the name "Autopilot." (The feature was rebranded again to "Tesla Assisted Driving" to comply with regulations for its Chinese launch in May.)
Some other automakers have reached L3, with Mercedes-Benz becoming the first automaker to offer U.S. customers conditional driving automation in 2024. However, this L3 autonomous driving system only works in very narrow circumstances, such as on specific freeways during clear daytime weather, which limits its practicality.
L4 marks the next high-water mark of driving automation, just one step away from not needing a driver at all. Now XPENG claims its AI-powered VLA 2.0 system (short for Vision-Language-Action) is all but there.
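For readers who want the whole scale at a glance, the six SAE levels can be written out as a simple lookup. This is only an illustrative sketch: the level names follow SAE's published terminology, while the `SAELevel` class and the `requires_constant_supervision` helper are hypothetical names invented for this example.

```python
from enum import IntEnum


class SAELevel(IntEnum):
    """The six SAE driving automation levels, L0 through L5."""
    NO_AUTOMATION = 0           # human does all the driving
    DRIVER_ASSISTANCE = 1       # steering OR speed support
    PARTIAL_AUTOMATION = 2      # steering AND speed support, human supervises
    CONDITIONAL_AUTOMATION = 3  # system drives in limited conditions, human takes over on request
    HIGH_AUTOMATION = 4         # system drives in limited conditions, no takeover expected
    FULL_AUTOMATION = 5         # system drives everywhere, in all conditions


def requires_constant_supervision(level: SAELevel) -> bool:
    # At L0 to L2 the human is still considered to be driving
    # and must watch the road at all times.
    return level <= SAELevel.PARTIAL_AUTOMATION


print(requires_constant_supervision(SAELevel.PARTIAL_AUTOMATION))  # an L2 system
print(requires_constant_supervision(SAELevel.HIGH_AUTOMATION))     # an L4 system
```

The dividing line in the helper is the one that matters for drivers: up to L2 a human must supervise constantly, while from L3 upward the system itself is driving whenever it is engaged.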
Mashable took a test ride in an XPENG P7 with VLA 2.0 enabled, and found it a difficult claim to dispute. The car was able to smoothly navigate Beijing's large city roads, rough rural streets, and busy pedestrian and scooter traffic with minimal driver intervention. It could even park itself after everyone had exited the vehicle.
The driver did need to take over at a couple of points, demonstrating that VLA 2.0 isn't a completely autonomous system just yet. Self-driving robotaxi companies such as Tesla and Waymo have suffered concerning safety incidents after removing humans from behind the wheel, and XPENG doesn't claim it's ready to take that leap. But overall, VLA 2.0 felt safe, efficient, and even luxurious.
From autonomous driving to physical AI
According to Liu, VLA 2.0 is a fundamental change when compared to XPENG's previous L2 Navigation Guided Pilot (NGP) system. While NGP focused on developing autonomous driving, VLA 2.0 is focused on solving physical AI problems.
"Once you work on the AI problem, everything changes," said Liu.
Autonomous driving systems such as NGP typically operate on a framework of perception, prediction, planning, and control. In such a system, the vehicle uses onboard sensors to detect its environment and abstract it into data, such as using boxes to represent other cars. It then predicts what these obstacles will do, plans a course of action, and controls the car to execute that plan.
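The four-stage framework described above can be sketched as a toy program. This is not XPENG's NGP code, and every name, number, and rule in it is a hypothetical chosen for illustration. It assumes a single lane, obstacles that are all ahead of the car, and distances measured from the car's current position.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """An obstacle abstracted to a position (metres ahead) and a speed (m/s)."""
    position_m: float
    speed_mps: float


def perceive(raw_detections):
    # Perception: abstract raw sensor readings into simple boxes.
    return [Box(position_m=p, speed_mps=v) for p, v in raw_detections]


def predict(boxes, horizon_s):
    # Prediction: assume each obstacle keeps its current speed.
    return [Box(b.position_m + b.speed_mps * horizon_s, b.speed_mps) for b in boxes]


def plan(ego_speed_mps, predicted_boxes, horizon_s, safe_gap_m=10.0):
    # Planning: a hand-written rule. Brake if the car would end up
    # closer than safe_gap_m to any obstacle at the end of the horizon.
    ego_future_m = ego_speed_mps * horizon_s
    for box in predicted_boxes:
        if box.position_m - ego_future_m < safe_gap_m:
            return "brake"
    return "cruise"


def control(command):
    # Control: turn the plan into an acceleration command (m/s^2).
    return {"brake": -3.0, "cruise": 0.0}[command]


def drive_step(raw_detections, ego_speed_mps, horizon_s=2.0):
    # Each stage runs strictly after the previous one has finished.
    boxes = perceive(raw_detections)
    predicted = predict(boxes, horizon_s)
    command = plan(ego_speed_mps, predicted, horizon_s)
    return control(command)


# A stopped car 30 m ahead while travelling at 15 m/s.
print(drive_step([(30.0, 0.0)], ego_speed_mps=15.0))
```

Even in a sketch this small, the rule in `plan` only covers the one situation its author thought of, and anything the boxes fail to capture never reaches the planner at all.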
"The NGP is a traditional autonomous driving system where we do the perception first and then do the planning secondly. This is a very old paradigm of autonomous driving, or even currently in robotics," said Liu. "There's a lot of limitations in these kinds of algorithms. Once you work on autonomous driving or AI for more than 10 years, you'll see the limitation. You can never scale up or generalise enough of the entire system to different kinds of scenarios."
To create an L4 autonomous driving system, and eventually a completely self-driving robotaxi, the car must be able to identify and respond to unexpected situations that haven't been specifically accounted for in its programming. The problem, Liu explained, is that developers don't necessarily know what these problems might be.
"We call it unknown unknown," said Liu. "There's so much unknown unknown problems. You can never solve them one by one. So you need to change the paradigm, and try to change the system to be generalised enough and scalable enough to solve all the problems."
XPENG's Dr. Xianming Liu has worked in AI and autonomous driving for almost a decade. Credit: XPENG
For XPENG, the solution was to shift its entire approach to autonomous driving, switching to instead focus on physical AI — the integration of AI software with tangible hardware such as cars or robotics. Unlike digital AI such as chatbots ChatGPT, DeepSeek, and Claude, physical AI is capable of directly interacting with the physical world. It's also able to ingest and adjust to a constant stream of information, breaking away from the sequential structure of previous autonomous driving systems.
"Physical AI is totally different from digital because the signal input is not structured, it's continuous," Liu said. "The information load is much higher than structured data like text or voice. And also the control signal requires high latency and high efficiency. That means your latency needs to be very small."
Turning to physical AI enabled XPENG to scale up, expanding its model's parameters and feeding the model large amounts of data to learn from.
"We just take all the camera sensor input and directly train the model. We enlarge the model capacity, make it into billion parameters, and train the model using an even much larger data scale compared to large language models, and [then] ask the model to make a decision," said Liu.
"We changed the paradigm of auto driving, and luckily we see the result. The model is generalised enough to be ready for L4 autonomous driving."
How autonomous cars are linked to humanoid robots
Rather than focusing solely on autonomous driving, XPENG is developing the foundation AI model behind VLA 2.0 to be applied across a variety of use cases — including robotics. The company went viral when it debuted its uncannily lifelike humanoid IRON robot last November, even cutting it open to dispel speculation that it was a real person in a suit.
Though the connection between the two projects may not be immediately apparent, Liu told Mashable that many of the challenges facing autonomous car and humanoid robot development are very similar. As such, innovations are transferable.
"A lot of our R&D budget is spent on the AI or the training infrastructure, the data, the modeling itself," said Liu, noting that XPENG views itself as both an EV and physical AI company.
A significant focus of XPENG's ongoing R&D is the AI model's ability to recognise and respond to increasingly complex verbal instructions. This is an important function for both humanoid robots and autonomous cars.
"Robots not only need to understand the environment, which is the world, [but] need to reconstruct the world," said Liu. "But also sometimes [they] need to understand how to communicate with humans or even with other agents in the world."
The XPENG X9 EV can seat up to seven people. Credit: XPENG
While VLA 2.0 navigates using vision input from the car's camera sensors, it's able to take verbal instructions as well. This functionality is currently limited to executing straightforward, immediate instructions, such as telling the car to turn left in 300 metres or change to the right lane. Eventually, XPENG aims for passengers to be able to simply climb in the car, verbally tell it where to go, and relax as they're ferried to their destination.
"You ask the car, 'hey, just pull over in front. I want to buy a coffee so you need to pull over in front of the Starbucks.' The car needs to understand your instruction, needs to translate your instruction into some actions," said Liu. "We want to make sure the model can understand not only the world, which is the sensor [data from its cameras], but also the instruction and human intent."
XPENG and Tesla are driving toward the same destination
XPENG's work with autonomous EVs and humanoid robots has prompted frequent comparisons to Elon Musk's company Tesla. Liu acknowledged such parallels, noting that Tesla is also building a similar AI model aimed at achieving L4 driving.
"I think there is only one way to solve the problem entirely: you need to rethink the problem from the very beginning," Liu told Mashable. "[XPENG and Tesla are] doing something on the same trajectory. We want to solve the problem following the first principles [i.e. breaking it down to its most basic elements]. Directly go to L4, try to solve the problem not using rules, only using AI models. I think this is a similarity."
Humorously, Liu noted that where XPENG distinguishes itself from Tesla is in the sheer volume of data it has on bad driving. Using this data, XPENG has been able to develop its model to respond to such scenarios, ensuring it's better prepared for any unexpected events that might occur on the road.
"For XPENG, we have a lot of data in China which is terrible driving. So you will meet a lot of corner cases, [i.e. rare, unexpected situations outside the norm]," said Liu. "So every day, the problem we are facing is not that we don't have enough data to solve the corner cases, but we have too many corner cases. So we need to solve it. And that's our advantage, and also difference, compared to Tesla."
Ditching the roadmap
Rather than ingesting and relying on road map data, VLA 2.0 has been trained on human driving behaviours. This is to ensure it's capable of appropriately responding to a limitless, non-prescriptive variety of situations. For example, it can look at a live scenario and determine a typical, safe speed for that type of environment and conditions.
"Sometimes, even though the road is limited, it's like 80 [speed limit], but it's pretty crowded, you need to slow down and pay attention. Or during inclement weather, for example, raining or foggy, people will slow down because of the situation, because of the environment," said Liu.
"So in these cases, you cannot ask a car to follow instructions from the speed limit from the map, or from all the signs. You need to make sure the model is aware of the risk and understands how to drive safely and how to control the speed."
Importantly, drivers are able to manually adjust the car's maximum speed as well, so it won't travel at a pace that makes them uncomfortable.
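How the two speed inputs might combine can be shown in a few lines. This is a minimal sketch under a stated assumption, namely that the driver's setting acts as a hard ceiling on whatever speed the model judges typical for the conditions. The function and parameter names are hypothetical and do not come from XPENG.

```python
from typing import Optional


def target_speed(model_typical_kmh: float,
                 driver_max_kmh: Optional[float] = None) -> float:
    """Pick the speed to aim for, in km/h.

    model_typical_kmh stands in for the speed a learned model judges
    typical and safe for the current scene (a hypothetical input).
    driver_max_kmh is the optional ceiling set by the driver.
    """
    if model_typical_kmh < 0:
        raise ValueError("speed cannot be negative")
    if driver_max_kmh is None:
        return model_typical_kmh
    # Assumption: the driver's ceiling always wins over the model.
    return min(model_typical_kmh, driver_max_kmh)


# The model suggests 62 km/h, but the driver has capped the car at 50 km/h.
print(target_speed(62.0, driver_max_kmh=50.0))
```

In this arrangement the cap never pushes the car faster. A crowded or foggy road that leads the model to suggest a lower speed still results in that lower speed.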
"For safety and comfort, the key is to control speed," said Liu. "People can control the wheel, control the scroll to set the speed limit. But the model tries to learn what kind of typical speed people will drive in this kind of situation, because we need to make sure the car is safe enough and also not too slow."
XPENG has numerous car models available in China, but none have entered the U.S. Credit: XPENG
Though VLA 2.0 is trained on a large amount of general data, XPENG hopes to eventually offer a more customised experience. Liu confirmed that the company is developing the ability for individual cars to learn from their specific owners and adapt to their personal driving habits. (Significantly, VLA 2.0 does not transfer data to the cloud, with all necessary processing done locally on the car.)
"We're working on that," said Liu. "Definitely customised driving behaviours is one of the things we're working on, so hopefully sometime later you will see it."
Exactly what sort of timeline that feature might be on isn't clear. What is clear is that XPENG has grand ambitions, and maybe even the technology to back them up. Liu acknowledged that VLA 2.0 isn't yet perfect, still requiring driver intervention at times. Even so, there's no denying that it's an important advancement toward the ultimate goal of creating safe, fully autonomous vehicles.
This interview has been lightly edited for grammar and clarity.
Disclosure: Mashable travelled to China as a guest of XPENG.
