During an advanced driver-assistance system (ADAS) or automated driving (AD) simulation of an unprotected left turn maneuver, the ego vehicle drives into the path of an oncoming vehicle. This behavior is clearly unsafe, but why did it happen? The ADAS or AD development team runs the simulation again and again, but every time the ego navigates the turn successfully.
How can ADAS and AD development teams reliably reproduce issues like the one described above? How can they root cause and fix issues that defy repetition? In short, what should teams do when they cannot trust their own ADAS or AD stack?
This is where determinism comes in.
In software development, determinism means that, given the same input, a program always generates the same output while passing through the same intermediate states, no matter how many times we run the program.
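As a minimal illustration of this definition in Python (both functions are hypothetical), a function whose output depends only on its arguments is deterministic, while one that also reads the wall clock is not:

```python
import time

def add(a, b):
    # Deterministic: the output depends only on the inputs, so every
    # run with the same arguments produces the same result.
    return a + b

def add_with_jitter(a, b):
    # Nondeterministic across runs: the output also depends on the
    # wall clock, an input we do not control.
    return a + b + (time.time_ns() % 2) * 1e-9
```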
This blog post will explore what issues might arise when an ADAS or AD stack is nondeterministic. We will look at a specific example of a hypothetical AD stack, explore the symptoms of one of its nondeterministic behaviors, cover possible approaches to find the root cause, and discuss potential fixes.
We are developing an AD stack that uses the system clock for its time information (i.e., “wall time”). We have a budget of 10 milliseconds per update. If our controls module does not generate a command within the 10-millisecond budget, the AD stack will use the most recent command. We have tested our stack thousands of times and always received the same output when simulating a given scenario. This leads us to believe that our AD stack is deterministic.
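The stale-command fallback described above can be sketched in Python. The names and the queue-based hand-off are hypothetical; a real stack's mechanism would differ, but the timing dependence is the same:

```python
import queue

BUDGET_S = 0.010  # the 10-millisecond update budget


def get_command(command_queue, last_command):
    """Fetch the next control command for this update.

    Hypothetical sketch of the fallback: if the controls module has not
    produced a new command within the wall-clock budget, reuse the most
    recent one.
    """
    try:
        return command_queue.get(timeout=BUDGET_S)
    except queue.Empty:
        # Budget missed (e.g., under high CPU load): proceed with the
        # stale command. The ego's path now depends on timing, not
        # just on the scenario inputs.
        return last_command
```

Note that the output of `get_command` depends on how quickly the producer runs, which is exactly what makes the stack's behavior vary from run to run.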
After integrating a new perception module, we run an unprotected left turn scenario in simulation. The ego drives into the path of an oncoming vehicle while performing the unprotected left turn. We run the same scenario again, and this time, the ego does not drive into the oncoming vehicle’s path. What happened? How can we identify and address the source of this nondeterministic simulation result?
We run the same scenario 1,000 more times. Looking at the data, we recognize that the results are not consistent: in most runs, the ego completes the turn safely, but in a small fraction of runs, it drives into the oncoming vehicle’s path.
Next, we run the same scenario 1,000 more times using the old perception module. This time, we do not see any difference in the results.
Based on the symptoms above, we suspect that our new perception module is causing the nondeterminism. Unfortunately, we do not have any other pointers to root cause the issue. Luckily, there are a few checks that we can perform to rule out different sources of nondeterminism in our AD stack.
There are a few common sources of nondeterminism that we can inspect our code to find:

- Random number generators, in our own code or in third-party libraries
- Uninitialized variables and other forms of undefined behavior
- Concurrent operations, such as race conditions between threads or between modules
Upon inspecting our perception module code and its third-party libraries, we quickly rule out random number generators as possible sources of nondeterminism. We also know that we always correctly initialize variables, so we rule out undefined behavior as well.
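A quick way to perform the RNG check is to inject a seeded generator and compare two runs. The noise helper below is hypothetical, standing in for any randomized step in a perception module:

```python
import random

def jittered_range(rng, center, spread=0.1, n=5):
    # Hypothetical perception helper that perturbs a range measurement
    # with Gaussian noise drawn from the supplied generator.
    return [center + rng.gauss(0.0, spread) for _ in range(n)]

# With a fixed seed, two runs draw identical noise sequences, so a
# seeded RNG is not a source of run-to-run nondeterminism.
run_a = jittered_range(random.Random(42), 10.0)
run_b = jittered_range(random.Random(42), 10.0)
```

If two seeded runs still differ, the RNG is not the culprit, and we can move on to the next candidate source.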
When looking for concurrent operations, we notice a potential issue with the way we update the ego state. The controls module is supposed to update once per cycle. However, if the controls module has not produced a new command within the 10-millisecond budget, the physics model just uses the most recent control command. Could that be the source of the nondeterminism?
To understand the order of operations within our AD stack, we use a tracing tool that reveals how much time we spend in each module and in which order those computations occur. Looking at the trace output, we notice that some AD stack updates require more wall time than allocated in the 10-millisecond budget. We also notice that these instances are correlated perfectly with the instances of nondeterministic behavior.
The core issue is that our physics model uses the elapsed wall time to decide when to compute the next vehicle state. When it computes this update, it simply uses whatever control command the controls module most recently provided. Consequently, in times of high CPU load, the controls module does not finish computing the new control command within the 10-millisecond budget, and the physics model proceeds with the command it already has. In those cases, we execute a different set of control commands and end up with a different ego path.
Knowing the root cause of the nondeterminism, we reevaluate our simulation results. It turns out that there are far more instances of different behavior than we initially thought. Nearly half of our simulation runs have an ego path that varies by a small, almost unnoticeable amount. A drastic change in vehicle behavior occurs only when we miss the budget at a critical moment in the simulation. The decision point during the unprotected left turn is one of those critical moments. The ego is effectively frozen in a “moderate throttle” state for a fraction of a second—a relatively short amount of time but long enough to cause a major safety violation.
After completing the root cause analysis of our nondeterministic simulation results, how can we fix the issue?
We have several options to address the timing-related issues in our AD stack.
As the issue we are facing is a performance issue, a more powerful computer might remedy it. This approach can be fast and is sometimes the best way to unblock engineers. Of course, it is not an actual solution to the problem but rather an expensive way of delaying its symptoms. Still, hardware upgrades can help relieve time pressure while we invest in a proper solution, and in some cases they are a reasonable stopgap.
Alternatively, we can optimize our code to execute in less time. If we have never done this before, we will likely find some easy wins to implement quickly. As with hardware upgrades, these optimizations might resolve the symptoms, but they will not resolve the root cause of our problem. We should expect to see nondeterministic behavior again if we do not pair this approach with a longer-term solution.
The only real solution to our problem is to allow our AD stack to accept time from an external clock source and ensure that our modules wait for their respective inputs before proceeding. In this case, the AD stack should wait for a new control command instead of using whatever command was computed most recently.
During simulation runs, if we want our AD stack to wait for a control command, it needs to rely on an external time source. An external time source is required to achieve determinism because it gives us control over when the simulation advances. If we rely on the system clock, we cannot guarantee that our code finishes executing in time. In our example of the new perception module, our AD stack had always been nondeterministic; it was only a matter of time before the nondeterminism became evident.
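A minimal sketch of this idea, with hypothetical module interfaces: the simulator owns a simulated clock, each step blocks on the controls computation, and time advances only after the step completes, so slow execution cannot change the result:

```python
class SimClock:
    """Simulated clock owned by the simulator, not the operating system.

    Time advances only when the simulator explicitly steps it, so how
    long any module takes in wall time has no effect on simulated time.
    """

    def __init__(self):
        self.now_s = 0.0

    def advance(self, dt_s):
        self.now_s += dt_s


def step(clock, controls, physics, state, dt_s=0.010):
    # The synchronous call blocks until a fresh command exists; there
    # is no wall-clock deadline and hence no stale-command fallback.
    command = controls(state, clock.now_s)
    state = physics(state, command, dt_s)
    clock.advance(dt_s)
    return state
```

With this structure, two runs of the same scenario execute the same sequence of commands regardless of CPU load, which is what we need for reproducible simulation results.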
This solution is the most complex and time-consuming of the three mentioned options. It can be difficult to identify all the places where an AD stack uses the system clock. It can also be challenging to test the change to an external clock source. Still, this solution is the only way to guarantee deterministic simulation results. The best way to implement this solution depends on our existing AD stack structure and will require clock synchronization, which is outside the scope of this blog post.
It is critical to note that blocking on execution should be a mode of operation only for offline use cases, such as simulation. When testing and operating our AD stack in the real world, we cannot make a strong safety case if certain aspects are blocked on execution. We can design our stack with a “deterministic” mode of execution and a “hard real-time” mode where calls are non-blocking. This allows us to gain the benefits of determinism within simulations without sacrificing utility in the real world.
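The two modes can be sketched as a single fetch function with a hypothetical flag: blocking in deterministic (offline) mode, budget-bounded with a stale-command fallback in hard real-time mode:

```python
import queue

BUDGET_S = 0.010  # the 10-millisecond update budget


def next_command(command_queue, last_command, deterministic):
    """Fetch the next control command in one of two hypothetical modes.

    deterministic=True  -> offline/simulation: block until the new
                           command arrives, however long it takes.
    deterministic=False -> hard real-time: never wait past the budget;
                           fall back to the most recent command so the
                           vehicle loop keeps running.
    """
    if deterministic:
        return command_queue.get()
    try:
        return command_queue.get(timeout=BUDGET_S)
    except queue.Empty:
        return last_command
```

The flag would typically be fixed at startup rather than toggled at runtime, so a given process is either fully deterministic or fully real-time.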
As the real world is not deterministic, why should we put in the effort to achieve determinism for simulations in the first place? Why does determinism matter?
The example explored in this blog post was relatively easy to debug and root cause. However, the issue at hand was only easy to notice and fix because we dramatically simplified the interconnectedness of our example AD stack and made several assumptions about our source code quality. In real-world ADAS and AD development, myriad other issues and conflating factors might complicate things and obfuscate our view of potential solutions.
For example, what if our AD stack never behaved deterministically in the first place? Our engineers would get used to ignoring small differences in simulation results, so we would likely not recognize the performance issue right away. We might miss the unprotected left turn issue entirely since it occurs in less than 1% of simulations. What if we did not run this specific scenario enough times to notice the issue? Our AD stack would contain a serious bug that can lead to collisions, but we might not uncover these safety issues during virtual testing. Any issues we do not catch in simulation will surface on real hardware, which is much more expensive to test and comes with serious safety implications.
Without determinism, we lose substantial confidence in our approach. In fact, simulation results could be completely wrong without us noticing. If our controls module runs too slowly during simulation without appropriately blocking on execution, then it might not converge at all, or it might end up with too few samples to track perception properly.
Assuming that we cannot ensure determinism, we can only take a statistical approach of running thousands of simulations and analyzing the results. If we find a problem, we can guess its solution, implement that guess, and then run the same scenario thousands more times to see if the fix worked. For long-tail scenarios, this approach is often not viable, as millions of simulations are required to have confidence in just a few edge cases. On the other hand, if our AD stack is deterministic, then a single simulation proves success for a given scenario. We can, and likely will, still run millions of simulations, but each simulation will cover just one scenario.
Without determinism, will we know whether our AD stack can handle an unprotected left turn? What if we vary the scene in terms of other traffic, the size of the road, the speed at which each agent is moving, or the weather? With a deterministic stack, each of these scenario variations is a single simulation. Without determinism, we might need to run each variation hundreds or thousands of times to achieve an approximation of confidence.
If we do not receive the same result every time we run a simulation, software development becomes much more difficult. Suppose we fix a bug and run a simulation before and after making the fix. If the results of the two simulation runs differ, is that difference caused by the fix we performed or by something else? If our stack is not deterministic, there is no way to know for sure, making it harder for software developers to move quickly and confidently.
This difficulty in ADAS and AD development gets amplified if we run simulations in the cloud since it is generally more time-consuming to develop software in the cloud. Our AD stack exhibits the issue explored in this blog post when it encounters performance issues, and it is common for cloud machines to be less performant than local desktops. What if this issue is only exhibited in the cloud? Software developers will need more time to resolve the issue since it is harder to root cause issues and takes longer to deploy fixes to the cloud compared to local environments.
If any aspect of a system is nondeterministic, then the entire system is nondeterministic. A single nondeterministic piece of data will cause nondeterministic results in all downstream computations. In an ADAS or AD stack, computations form a loop where the new state of the vehicle is the input to the next iteration. This means that every module must be deterministic, or the entire stack cannot be deterministic. Even the simulation tools we use to test ADAS and AD stack performance must be deterministic since they are part of the computation loop.
The Applied Development Platform’s (ADP’s) simulator is deterministic by design, following the principles described above: it controls the clock that the stack under test consumes and ensures that each module waits for its inputs before proceeding.
There are some specific cases within ADP that cannot be perfectly deterministic, such as using GPU acceleration. There are very few such cases, all of which are listed in Applied Intuition’s documentation and do not impact macro-level determinism.
Determinism is a powerful tool. Without it, ADAS and AD development teams operate with less confidence. With it, teams can execute quickly and confidently. Contact our engineering team if you are interested in learning more about determinism or want to implement a deterministic mode of operation within your ADAS or AD stack.