Develop ML models with high-fidelity synthetic data

Synthetic Datasets empower users to improve the robustness of their machine learning (ML) models with camera, lidar, radar, and other sensor data generated in high-fidelity 3D worlds. Users can generate millions of labeled samples with diverse actors, behaviors, and environmental conditions.

Key features

Library of validated sensor models tuned to represent your sensor hardware; support for camera, lidar, radar, and more
Error-free ground truth labels generated programmatically; flexible annotation format to enable integration with your existing pipeline
Assets and procedurally generated 3D worlds with support for domain randomization or customization to your task domain
High-level dataset definition language and visual editor to easily define the data you need
Dataset management tooling to view statistics, filter, and export your data
Cloud-first infrastructure enabling rapid dataset generation, elastic scalability, and easy collaboration

Accelerate your ML development loop

Integrate model results and automatically generate data to address failure cases. Rapidly target data sparsity issues, class imbalances, and other biases with scalable synthetic data generation.

Synthetic camera data with raytraced reflections and difficult outdoor lighting conditions
Data generation beyond the automotive domain; support for warehouses, farms, and more
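
As a rough illustration of targeting class imbalance, the sketch below counts the labels in an existing real dataset and works out how many synthetic samples per class would bring every class up to the size of the largest one. The function name `synthetic_samples_needed` and the example class counts are hypothetical and are not part of any Applied API; this is only one simple way to size a generation request.

```python
from collections import Counter


def synthetic_samples_needed(labels, target_per_class=None):
    """Return how many extra samples each class needs to reach the target.

    labels: iterable of class names, one per real sample.
    target_per_class: desired count per class; defaults to the size of
    the largest class (an assumption made for this sketch).
    """
    counts = Counter(labels)
    if not counts:
        return {}
    if target_per_class is None:
        target_per_class = max(counts.values())
    # A class already at or above the target needs no synthetic data.
    return {cls: max(0, target_per_class - n) for cls, n in counts.items()}


# Hypothetical real dataset dominated by one class.
real_labels = ["car"] * 900 + ["pedestrian"] * 80 + ["cyclist"] * 20
request = synthetic_samples_needed(real_labels)
print(request)  # {'car': 0, 'pedestrian': 820, 'cyclist': 880}
```

The resulting per-class counts could then be used to decide how much synthetic data to request for each underrepresented class.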

Train with dense labels

Programmatically generate and train with dense labels like semantic segmentation, depth, and optical flow, which are expensive or impossible to obtain for real data.

Per-pixel semantic segmentation
Per-pixel depth
Per-pixel optical flow
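
To see why dense labels come almost for free in simulation, consider a simplified case: a static scene viewed by an ideal pinhole camera that translates sideways between two frames. Because a simulator knows the exact depth of every pixel and the exact camera motion, the horizontal optical flow follows directly from the projection equation u = f * X / Z + cx. The sketch below assumes this simplified setup; the function name and the numbers are illustrative only and do not describe how any particular product computes its labels.

```python
def flow_from_lateral_motion(depth_map, focal_px, tx_m):
    """Horizontal optical flow (in pixels) for a static scene.

    Assumes an ideal pinhole camera that moves tx_m meters along its
    +x axis between frames, with no rotation. A point at depth Z then
    shifts by -focal_px * tx_m / Z pixels in the image.
    """
    return [[-focal_px * tx_m / z for z in row] for row in depth_map]


# Hypothetical 2x2 depth map in meters, as a simulator might provide.
depth = [[10.0, 20.0],
         [5.0, 40.0]]

flow_u = flow_from_lateral_motion(depth, focal_px=1000.0, tx_m=0.5)
for row in flow_u:
    print(row)
# [-50.0, -25.0]
# [-100.0, -12.5]
```

Nearby pixels move more than distant ones, which is the kind of per-pixel ground truth that is hard to measure for real footage.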

Broaden your task domain

Rapidly expand to new regions, classes, or sensor hardware by using synthetic data for transfer learning. Use region-specific assets such as traffic signs to safely expand operations with less reliance on real data.

Use synthetic data to bootstrap labels

Improve the efficiency and accuracy of manual data labeling by training auto labelers with synthetic data. Use labeled synthetic data to kickstart semi-supervised learning on real data.

Programmatically generated 2D bounding boxes (yellow), 3D bounding volumes (purple), and lane markings (green)
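
One common way to kickstart semi-supervised learning is pseudo-labeling: a model trained on labeled synthetic data predicts on unlabeled real data, and only the high-confidence predictions are kept as provisional labels. The sketch below shows just the selection step. The function name, the 0.9 threshold, and the example probabilities are assumptions made for illustration, not a description of a specific auto labeler.

```python
def select_pseudo_labels(predictions, min_confidence=0.9):
    """Keep only confident predictions as provisional labels.

    predictions: dict mapping a sample id to a dict of
    {class_name: probability}, as produced by some model trained
    on synthetic data (the model itself is outside this sketch).
    """
    selected = {}
    for sample_id, class_probs in predictions.items():
        # Pick the most likely class and its probability.
        label, confidence = max(class_probs.items(), key=lambda kv: kv[1])
        if confidence >= min_confidence:
            selected[sample_id] = label
    return selected


# Hypothetical model outputs on three unlabeled real frames.
predictions = {
    "frame_001": {"car": 0.97, "truck": 0.03},
    "frame_002": {"car": 0.55, "truck": 0.45},
    "frame_003": {"pedestrian": 0.92, "cyclist": 0.08},
}
pseudo_labels = select_pseudo_labels(predictions)
print(pseudo_labels)  # {'frame_001': 'car', 'frame_003': 'pedestrian'}
```

Low-confidence samples such as `frame_002` would typically be routed to manual labeling instead.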

Increase your ML team’s efficiency

Applied’s cloud-first tools help users define, generate, and manage data easily across their ML team.

Generate datasets from a high-level language, from logs, or from scenarios

Scale ML model training and shorten time to market

Complement real data

Use synthetic data as a complement when real data is sparse or difficult to collect.

Decrease labeling costs

Use large, diverse synthetic datasets to minimize the time and resources spent labeling real data.

Speed up data collection

Accelerate your ML team’s time to market with synthetic data generated in days instead of real data collected over months.