Accurate and robust perception models are the backbone of any advanced driver-assistance system (ADAS) and automated driving (AD) system. Planning and controls modules can make informed decisions about how to navigate safely in the world only when they can see and understand the world around an agent. Deep learning-based perception systems require large amounts of diverse training data during development and extensive testing during validation. 3D worlds provide simulated environments that allow developers to create a wide range of driving scenarios, including complex, rare, and dangerous situations, without the risks associated with real-world testing.
Sensor simulation helps perception engineers by providing a platform that emulates what sensors would see in the real world. In this blog post, we focus on the 3D worlds that Applied Intuition’s sensor models interact with to create synthetic data.
3D worlds have multiple components, including the following elements that build upon a base map:
While these are necessary components of a 3D world, there is a high variance in the quality of 3D worlds. In the next section, we explore what differentiates a good 3D world from a bad one.
The quality of synthetic environments comes down to one thing: Realism. But how can we measure realism?
Synthetic environments need accurate and realistic road properties to effectively simulate planning and controls modules. Without these realistic details, applying high-fidelity vehicle dynamics models to these roads does not result in an accurate reflection of how a real vehicle would behave on a real road.
For example, roads in the real world often have camber to help with precipitation runoff and improve rider comfort. Most map data representations, however, do not account for this effect, and instead treat roads as planar objects. Real roads are not perfectly flat surfaces; they have potholes, speed bumps, cracks, drains, and other artifacts that affect how a vehicle traverses them. Realistic 3D worlds model these effects, which are critical for accurate and useful vehicle dynamics simulations. Similarly, at shallow angles, these perturbations can drastically affect the returned range from a lidar system.
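To see why small surface perturbations matter so much at shallow angles, consider a rough geometric sketch. It assumes a lidar mounted at a fixed height above a locally flat road and treats a bump as a small raised patch at the hit point; the function names, the 1.8 m mounting height, and the 5 cm bump are illustrative assumptions, not values from any particular sensor or simulator.

```python
import math


def flat_ground_range(sensor_height_m: float, depression_deg: float) -> float:
    """Range at which a beam pointing below the horizon hits a perfectly flat road."""
    return sensor_height_m / math.sin(math.radians(depression_deg))


def range_shift_from_bump(bump_height_m: float, depression_deg: float) -> float:
    """How much earlier the beam returns if the road is raised by bump_height_m.

    Simplification: the bump is modeled as a raised flat patch, so the beam
    only needs to drop (sensor_height - bump_height) before it hits.
    """
    return bump_height_m / math.sin(math.radians(depression_deg))


if __name__ == "__main__":
    sensor_height_m = 1.8  # assumed mounting height
    bump_height_m = 0.05   # assumed 5 cm speed bump or patch
    for angle_deg in (30.0, 5.0, 1.0):
        flat = flat_ground_range(sensor_height_m, angle_deg)
        shift = range_shift_from_bump(bump_height_m, angle_deg)
        print(f"{angle_deg:4.1f} deg: flat range {flat:7.2f} m, shift {shift:5.2f} m")
```

Under these assumptions, the same 5 cm bump shifts the return by about 10 cm for a beam 30 degrees below the horizon but by almost 3 m for a beam 1 degree below it, which is why a planar road model can misrepresent long-range lidar returns.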
3D assets are the objects that populate the 3D world, such as trees, vehicles, and buildings. Each asset consists of a mesh, which defines its shape, and a material, which defines how light interacts with its surface, i.e., how it looks. For physically based rendering (PBR) systems that attempt to simulate exactly how electromagnetic waves interact with different objects, the quality of object material properties is thus vital for high-fidelity simulation. Material properties define how light scatters off a surface as a function of wavelength, often in the form of a bidirectional scattering distribution function (BSDF).
While accurately characterizing these functions can be a significant investment, it is what makes the simulated camera, lidar, and radar outputs useful.
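As a minimal illustration of a wavelength-dependent material, the sketch below uses the simplest reflective model, an ideal Lambertian (perfectly diffuse) surface, whose scattering function is a constant `albedo / pi` for every outgoing direction. Real materials need far richer models, and the per-band albedo table here is a made-up placeholder, not measured data.

```python
import math

# Placeholder albedos per sensor band for one hypothetical material.
# These numbers are illustrative only and are NOT measurements.
EXAMPLE_MATERIAL_ALBEDO = {
    "camera_550nm": 0.10,
    "lidar_905nm": 0.15,
}


def lambertian_brdf(albedo: float) -> float:
    """Ideal diffuse surface: the same value for every outgoing direction."""
    return albedo / math.pi


def reflected_radiance(albedo: float, beam_irradiance: float, incidence_deg: float) -> float:
    """Outgoing radiance for a collimated beam hitting a Lambertian surface.

    beam_irradiance is measured perpendicular to the beam; the cosine term
    accounts for the beam spreading over more surface at oblique incidence.
    """
    cos_theta = max(0.0, math.cos(math.radians(incidence_deg)))
    return lambertian_brdf(albedo) * beam_irradiance * cos_theta


if __name__ == "__main__":
    for band, albedo in EXAMPLE_MATERIAL_ALBEDO.items():
        print(band, reflected_radiance(albedo, beam_irradiance=1.0, incidence_deg=60.0))
```

Even in this toy model, the same surface returns a different signal to a camera and to a lidar because the albedo differs per band, which is the dependence that a measured BSDF captures in much more detail.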
Assuming that one has high-quality assets, the realism of a scene will then follow from the accuracy of the placement of such assets. When creating a large variety of large-scale 3D worlds, it is impractical to place individual assets by hand. This challenge necessitates some form of proceduralism—a method of creating content based on a set of rules or algorithms, which we will discuss further below. When making procedural edits, asset placement realism is crucial to creating data that is useful for planning modules.
For example, placing a bench, a trash bin, or some other random object in the middle of a road does not reflect what roads in the real world are like. This will degrade a simulation’s quality, and the resulting data is often useless.
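A placement rule of this kind can be sketched as a simple geometric filter. The example below assumes roads are available as 2D polygons in map coordinates and rejects any candidate position that falls inside one; the function names and the rectangular road are hypothetical, and a production system would likely use a geometry library and richer rules such as clearances and sidewalk zones.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a ray to the right crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_placements(candidates: List[Point], road_polygons: List[List[Point]]) -> List[Point]:
    """Keep only candidate positions that do not land on any road surface."""
    return [
        c for c in candidates
        if not any(point_in_polygon(c, road) for road in road_polygons)
    ]


if __name__ == "__main__":
    road = [(0.0, 0.0), (100.0, 0.0), (100.0, 7.0), (0.0, 7.0)]  # 7 m wide straight road
    benches = [(50.0, 3.5), (50.0, 9.0)]  # one mid-lane, one on the verge
    print(filter_placements(benches, [road]))
```

Here the bench proposed in the middle of the lane is dropped and the one beside the road is kept.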
There are multiple ways to create 3D worlds, each with its advantages and disadvantages.
Traditionally, 3D worlds have required technical artists to put in significant time painstakingly recreating a world by hand, then filling in that world with premade assets.
Digital twins are built from various inputs, such as reference images and lidar scans. Of all the techniques covered in this blog post, digital twins produce the most faithful digital recreations of the world. They are customizable since they do not have any baked-in vehicles, pedestrians, or other classes, and they include ground-truth data on each object.
Unfortunately, building digital twins from scratch is a costly and lengthy way to create 3D worlds, requiring months or even years of work. They are also hard to keep up to date, as changing a virtual world to mirror the real world can take weeks. Updating digital twins is resource intensive and requires teams of artists, as well as extensive and expensive onsite access, which is not always possible.
Procedural techniques reduce the time it takes to build a 3D world compared to handcrafting it, while still achieving a realistic result.
A team of artists does not need to place countless lights, trees, or other objects by hand throughout the 3D world. Procedural editing is precise and can use a variety of inputs, including base maps, lidar point clouds, photogrammetry, OpenStreetMap data, and USGS digital elevation data.
Proceduralism enables teams to generate multiple versions of a 3D world in a fraction of the time traditional workflows take to create a single one. Each of these versions can help train and test an ADAS or AD stack. This technique allows teams to expand to new regions and operational design domains (ODDs) where map data is unavailable. The resulting 3D worlds have the correct ground-truth data for each object and are free of baked-in vehicles or pedestrians.
As a tradeoff, the procedurally generated 3D worlds sacrifice a level of specificity. Procedural techniques generate a geotypical place but not an exact reproduction of a specific place in the real world. They compromise on the hyperrealism of a digital twin to deliver many different versions of the same world, giving confidence and flexibility when testing.
Applied Intuition leverages a mixture of procedural tools and limited handcrafted assets to ensure fidelity, correct annotations, accuracy, and precision. Our simulation tools fill in the 3D world by default while allowing teams to modify automatically generated parts that do not satisfy their use case. They also provide speed and control.
Photogrammetry is another relatively quick way to build a 3D world. It involves taking multiple images of a scene from different angles and using them to reconstruct an accurate 3D representation of that scene.
However, photogrammetry has some limitations. It lacks a clear separation between objects in the scene. Assets like trees and bushes meld together into a single blob. The storage requirements and computational complexity of photogrammetry can also make it a costly process. Incomplete scans may cause generated worlds to have holes and gaps. Assets may be baked in without the benefits of proper labels.
Neural radiance fields (NeRFs) are a more advanced form of scanning. They use machine learning (ML) techniques to learn a scene from camera images and can then render that scene from various angles. As a newer technology, this approach has valuable use cases in specific domains such as drive data replay, but it is not well suited to reproducing 3D worlds with good runtime performance.
The main advantage of NeRFs is an exact recreation of the desired world using camera inputs without producing holes or “bad data” common to photogrammetry scans.
On the other hand, NeRFs require significant amounts of data, compute, and storage for both generation and playback. They struggle to represent out-of-distribution features, such as snow in the city of San Francisco.
In general, the 3D worlds created by NeRFs are inflexible and do not generalize to other sensor modalities such as radar, lidar, and ultrasonic sensors. In certain cases, generated images can appear fuzzy or blurry, and objects can seem merged together in a blob. Any assets in the world are baked in but usually not labeled automatically.
Despite their limitations, NeRFs are an interesting technology that our team at Applied Intuition is exploring for potential integration into its world generation systems. We believe a hybrid approach that combines the strengths of NeRFs with our precise, handcrafted assets would allow us to better capture and recreate unique assets such as distinctive buildings.
At Applied Intuition, we have developed and commercialized a user-friendly procedural world generation system. ADAS and AD development teams can generate a fully featured map that supports all weather conditions. Teams can use Applied Intuition’s environment editing tool, Meridian, to customize 3D worlds with procedural zones, randomized assets by run, and large-scale domain randomization tools. Using our procedural world generation system benefits ADAS and AD development teams by reducing the time, and therefore cost, of creating environments for sensor simulation and synthetic data generation.
The randomization tools also allow developers to test many versions of the same 3D world. In particular, tests can not only run on a particular version of a map that matches a particular date but also investigate questions such as “What if that tree were not there?” or “What if there were an obstruction on the corner of the intersection instead?”
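The idea of testing many reproducible versions of one world can be sketched with seeded sampling. The snippet below is a generic illustration, not Meridian's API: `AssetSlot`, `sample_world_variant`, and the slot and variant names are all hypothetical. Each slot lists the asset variants allowed at one location, with `None` meaning the slot is left empty, and a per-run seed makes every variant repeatable.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AssetSlot:
    """One editable location in the world and the variants allowed there (hypothetical)."""
    name: str
    variants: Tuple[Optional[str], ...]  # None means "nothing placed here"


def sample_world_variant(slots: List[AssetSlot], seed: int) -> Dict[str, Optional[str]]:
    """Pick one variant per slot. The same seed always yields the same world."""
    rng = random.Random(seed)  # private generator, so other code cannot disturb the sequence
    return {slot.name: rng.choice(slot.variants) for slot in slots}


if __name__ == "__main__":
    slots = [
        AssetSlot("corner_tree", ("oak", "maple", None)),           # "what if that tree were not there?"
        AssetSlot("intersection_corner", (None, "parked_van", "hedge")),  # possible obstruction
    ]
    for run_seed in range(3):
        print(run_seed, sample_world_variant(slots, run_seed))
```

Logging the seed with each test run is one simple way to make a failure reproducible on exactly the world variant that triggered it.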
Applied Intuition’s technical artistry team solves the monotony and repetition problems of 3D world generation by creating more realistic rule sets and automatically varying assets with dust, rust, weathering, weather effects, and other variations.
For example, a simple rule could place a streetlight every 30 meters. On the surface, this technique seems to work well, but it causes streetlights to be too evenly distributed to be realistic and can perform poorly on wide roads. A more sophisticated rule might instead ensure that the road area is evenly lit, placing circles of light along the road edge and filling in the remaining space. This generates a more realistic distribution of lights and automatically selects the right light type for the context, such as a tall highway light or a smaller suburban one.
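One way to read the coverage-based rule is sketched below. It is a simplified model and not Applied Intuition's actual rule set: each light sits on one road edge and lights a disc on the ground, the light types and their radii are hypothetical, and the spacing is the largest one for which neighboring discs still cover the far edge of the road.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical light types: name -> radius (m) of the ground disc each lamp lights.
LIGHT_TYPES: Dict[str, float] = {"suburban": 12.0, "highway": 25.0}


def choose_light(road_width_m: float, min_spacing_m: float = 10.0) -> Tuple[str, float]:
    """Pick the smallest light that spans the road and return (name, max spacing)."""
    for name, radius in sorted(LIGHT_TYPES.items(), key=lambda kv: kv[1]):
        if radius <= road_width_m:
            continue  # disc cannot reach the far edge from this side
        # A disc centered on the near edge covers a chord of this length on the far edge.
        spacing = 2.0 * math.sqrt(radius**2 - road_width_m**2)
        if spacing >= min_spacing_m:
            return name, spacing
    raise ValueError("no light type covers a road this wide from one side")


def place_lights(road_length_m: float, road_width_m: float) -> Tuple[str, List[float]]:
    """Return the chosen light type and light positions along the road edge."""
    name, max_spacing = choose_light(road_width_m)
    count = math.ceil(road_length_m / max_spacing)
    step = road_length_m / count  # spread evenly, never exceeding max_spacing
    return name, [(i + 0.5) * step for i in range(count)]


if __name__ == "__main__":
    print(place_lights(100.0, 7.0))   # narrow suburban street
    print(place_lights(100.0, 20.0))  # wide multi-lane road
```

With these assumed radii, the narrow street gets the smaller light at a tighter spacing, while the wide road switches to the taller light, so the choice of fixture follows from the road geometry rather than from a fixed 30 meter interval. Adding a little jitter to the positions would be a natural next step toward a less uniform look.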
Good rules mean good worlds. Applied Intuition’s tools allow teams to customize a 3D world’s rules and settings but also leave room for local and specific overrides. This flexibility gives teams complete control over their 3D environments.
The Applied Intuition team chose to derive its sensor rendering engine from Unreal Engine 5. With full access to the source code, we can enhance the base engine, for example, by adding different lighting spectra and integrating custom Vulkan-based ray tracing on Linux for lidar, radar, and camera sensors. We also extended asset material properties wherever needed, for example, to support retroreflective surfaces and infrared (IR) retroreflective properties. Unreal Engine 5 leads the class for lighting fidelity and gives us the best support for vast scenes and highly detailed worlds.
Contact us to learn more about building best-in-class 3D environments for ADAS and AD simulation.