ICRA 2026

HetroD: A High-Fidelity Drone Dataset and Benchmark for Autonomous Driving in Heterogeneous Traffic

Yu-Hsiang Chen¹,², Wei-Jer Chang², Christian Kotulla³, Thomas Keutgens³, Steffen Runde³, Tobias Moers³, Christoph Klas³, Wei Zhan², Masayoshi Tomizuka², Yi-Ting Chen¹

¹ National Yang Ming Chiao Tung University ² UC Berkeley ³ fka GmbH


Note: We will release the first version of the sample dataset and code soon.
Our dataset now also supports OpenDRIVE maps.

HetroD is a high-fidelity drone dataset that captures unstructured maneuvers such as hook turns, aggressive overtakes, queue cutting, and congested crossings among vehicles, scooters, and pedestrians in heterogeneous traffic environments. These maneuvers are critical for testing autonomous driving systems yet remain underexplored in the community. To address this, we construct a benchmark that evaluates existing methods in motion planning, motion prediction, and traffic simulation, and we conduct a thorough investigation of their generalization across datasets.

Dataset Processing and Usage

Data processing partner

Commercial Use

The HetroD dataset is free for non-commercial use only. For commercial use, please visit levelxdata.com.

Acknowledgment

The HetroD dataset was processed by leveLXData's high-quality, real-world trajectory and scenario data processing pipeline.

Note on scale: We give the dataset size as an approximate figure (about 65.4k trajectories) because counts may change slightly during ongoing quality control.

Abstract

We present HetroD, a dataset and benchmark for developing autonomous driving systems in heterogeneous environments. HetroD targets the critical challenge of navigating real-world heterogeneous traffic dominated by vulnerable road users (VRUs), such as pedestrians, cyclists, and motorcyclists, who share the road with vehicles. These mixed agent types exhibit complex behaviors such as hook turns, lane splitting, and informal right-of-way negotiation. Such behaviors pose significant challenges for autonomous vehicles but remain underrepresented in existing datasets focused on structured, lane-disciplined traffic. To bridge the gap, we collect a large-scale drone-based dataset that provides holistic observations of traffic scenes with centimeter-accurate annotations, HD maps, and traffic signal states. We further develop a modular toolkit for extracting per-agent scenarios to support downstream task development. In total, the dataset comprises over 65.4k high-fidelity agent trajectories, roughly 70% of which are from VRUs. HetroD supports modeling of VRU behaviors in dense, heterogeneous traffic and provides standardized benchmarks for forecasting, planning, and simulation tasks. Evaluation results reveal that state-of-the-art prediction and planning models struggle with the challenges presented by our dataset: they fail to predict lateral VRU movements, cannot handle unstructured maneuvers, and exhibit limited performance in dense and multi-agent scenarios, highlighting the need for more robust approaches to heterogeneous traffic.

Location Samples

Location 1

Location 2

Location 3

Location 4

Location 5

Location 6

Complex Behaviors

Congested Crossings

Hook Turns

Illegal Right Turns

Navigate Around Illegal Parking

Reverse Lane Usage

Unauthorized U-turns

Statistics

TABLE I: Comparison of Datasets on Interaction, Density & Diversity Metrics

Dataset	Platform	Tracks	Duration	Interaction Scale¹	Heterogeneous Interaction Scale²	Geographical Density³	VRUs (%)⁴
NuScenes [1]	On-board	∼90k†	320h	0.675	0.549	—	20.1%
Waymo [2]	On-board	7.6M	574h	1.000	1.000	—	11.5%
Argoverse2 [3]	On-board	13.9M	763h	0.632	0.318	—	10.0%
NuPlan [4]	On-board	∼5M†	1282h	0.274	0.213	—	46.3%
INTERACTION [5]	Drone	40k	16.5h	0.132	—	0.011	—
inD [6]	Drone	13.5k	10h	0.071	0.185	0.023	39.4%
SinD [7]	Drone	13.2k	7.02h	0.099	0.324	0.016	62.1%
HetroD	Drone	∼65.4k	17.5h	0.223	0.889	0.026	69.9%

Notes:

† Estimated values based on official statistics.

— Metric not available.

¹ S_inter = Σ_scenarios D_inter.

² S_het = Σ_scenarios Σ_i,j 1(TTC_i,j < 2 s ∧ type_i ≠ type_j).

³ D_geo = N / A (agents per unit area), where N is the number of agents within an 8 s window and A is the corresponding area.

⁴ VRUs = 100 × N_VRU/(N_VRU + N_Veh) (VRU: pedestrians, bicycles/cyclists, motorcycles, tricycles; Vehicles: cars, trucks, buses, vans)
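The footnote definitions above can be sketched in Python as follows. This is a minimal illustration, not the authors' toolkit: the label strings, the dictionary-based data layout, and the function names are all assumptions. Pairwise TTC values are taken as precomputed inputs because this page does not specify how TTC is computed, and each unordered agent pair is assumed to be counted once.

```python
from typing import Dict, Iterable, Tuple

# Class groupings follow footnote 4; the exact label strings are assumed.
VRU_TYPES = {"pedestrian", "bicycle", "motorcycle", "tricycle"}
VEHICLE_TYPES = {"car", "truck", "bus", "van"}


def vru_percentage(agent_types: Iterable[str]) -> float:
    """Footnote 4: 100 * N_VRU / (N_VRU + N_Veh)."""
    types = list(agent_types)
    n_vru = sum(t in VRU_TYPES for t in types)
    n_veh = sum(t in VEHICLE_TYPES for t in types)
    total = n_vru + n_veh
    return 100.0 * n_vru / total if total else 0.0


def geographical_density(num_agents: int, area: float) -> float:
    """Footnote 3: D_geo = N / A for one 8 s window (area unit is assumed)."""
    if area <= 0:
        raise ValueError("area must be positive")
    return num_agents / area


def heterogeneous_interactions(
    types: Dict[str, str],
    pair_ttc: Dict[Tuple[str, str], float],
    ttc_threshold: float = 2.0,
) -> int:
    """Footnote 2, inner sum for one scenario.

    Counts agent pairs whose TTC is below the threshold and whose types differ.
    pair_ttc maps an unordered agent-id pair to a precomputed TTC in seconds
    (assumption: each pair appears once).
    """
    return sum(
        1
        for (i, j), ttc in pair_ttc.items()
        if ttc < ttc_threshold and types[i] != types[j]
    )
```

Summing the per-scenario counts over all scenarios gives S_het as written in footnote 2. Any normalization applied to the values reported in Table I is not specified on this page and is not modeled here.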

Evaluation Results

Prediction Results

Motion prediction models struggle due to heterogeneous traffic complexity.

TABLE II: Cross-dataset evaluation (Brier-FDE ↓)

MTR

Train \ Test	NuScenes	Waymo*	SinD	HetroD
NuScenes	2.95	10.43	5.14	6.76
Waymo	4.01	2.28	4.26	6.71
SinD	16.07	26.34	2.06	3.30
HetroD	21.39	26.49	3.71	0.44

Wayformer

Train \ Test	NuScenes	Waymo*	SinD	HetroD
NuScenes	2.99	8.79	5.23	9.37
Waymo	2.67	2.20	3.53	10.75
SinD	8.23	13.40	1.96	9.23
HetroD	19.57	25.28	8.06	0.75

*Waymo uses 30% of its original training data due to resource constraints.
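For readers unfamiliar with the metric in Table II, the sketch below assumes that Brier-FDE follows the common brier-minFDE definition used in the Argoverse 2 motion forecasting benchmark: the final displacement error of the closest of K predicted modes, plus a penalty (1 − p)², where p is the probability assigned to that mode. This page does not spell out the exact variant used, so the function name and input layout are illustrative only.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def brier_fde(endpoints: Sequence[Point], probs: Sequence[float], gt: Point) -> float:
    """Brier-weighted final displacement error for one agent (assumed definition).

    endpoints: final (x, y) position of each of the K predicted modes
    probs:     probability assigned to each mode
    gt:        ground-truth final (x, y) position
    """
    if not endpoints or len(endpoints) != len(probs):
        raise ValueError("need K endpoints and K matching probabilities")
    # Final displacement error of every mode.
    fdes = [math.hypot(x - gt[0], y - gt[1]) for x, y in endpoints]
    # Index of the mode that ends closest to the ground truth.
    best = min(range(len(fdes)), key=fdes.__getitem__)
    # Penalize low confidence in the best mode.
    return fdes[best] + (1.0 - probs[best]) ** 2
```

Under this definition a model is rewarded both for placing one mode near the true endpoint and for assigning that mode a high probability.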

TABLE III: Ablation on HetroD splits

Setting	MTR	Wayformer
Same-map	0.44	0.75
Diff-map	1.17 (+166%)	1.53 (+104%)
Diff-time	0.42 (−5%)	0.76 (+1%)
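The percentages in parentheses in Table III are consistent with the relative change against the Same-map row. A short check, with a helper name chosen here for illustration:

```python
def relative_change_pct(baseline: float, value: float) -> float:
    """Relative change of value with respect to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


# Brier-FDE values copied from Table III.
same_map = {"MTR": 0.44, "Wayformer": 0.75}
other_splits = {
    "Diff-map": {"MTR": 1.17, "Wayformer": 1.53},
    "Diff-time": {"MTR": 0.42, "Wayformer": 0.76},
}

for split, scores in other_splits.items():
    for model, score in scores.items():
        change = relative_change_pct(same_map[model], score)
        print(f"{split:9s} {model:9s} {change:+.0f}%")
```

The large Diff-map increase, compared with the near-zero Diff-time change, suggests that unseen map layouts affect these models much more than unseen recording times.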

TABLE IV: Scenario-conditioned Brier-FDE (↓) on HetroD

Scenario	MTR-Waymo*	MTR-SinD	MTR-HetroD
Agent type
Vehicle	3.64	2.55	0.83
Two-wheeler	8.69	4.63	1.16
Pedestrian	2.85	1.17	0.26
Cross-type TTC risk
High risk (TTC < 2 s)	8.51	4.76	1.25
Moderate (2 ≤ min TTC < 4 s)	8.48	4.67	1.19
Low (TTC > 4 s)	7.87	4.00	0.90
Scene-level heterogeneity
Low heterogeneity density	5.01	3.08	0.90
High heterogeneity density	8.04	4.25	1.06

Distribution: Vehicles 39.4%, Two-wheelers 51.5%, Pedestrians 9.1% | TTC: High 26.5%, Moderate 8.7%, Low 64.8% | Heterogeneity: Low 66.3%, High 33.7%
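The cross-type TTC risk groups in Table IV can be expressed as a simple bucketing rule on an agent's minimum cross-type TTC. The sketch below uses the thresholds from the table. The function name is hypothetical, and two boundary choices are assumptions: a value of exactly 4 s is placed in the low-risk group, and an agent with no cross-type conflict (infinite TTC) is also treated as low risk.

```python
def ttc_risk_bucket(min_ttc: float) -> str:
    """Map a minimum cross-type TTC (seconds) to a Table IV risk group."""
    if min_ttc < 2.0:
        return "high"      # TTC < 2 s
    if min_ttc < 4.0:
        return "moderate"  # 2 s <= min TTC < 4 s
    return "low"           # assumed to include exactly 4 s and infinite TTC
```

For example, `ttc_risk_bucket(1.2)` falls in the high-risk group and `ttc_risk_bucket(3.0)` in the moderate group.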

Planning Results

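State-of-the-art rule-based planners show higher collision rates in heterogeneous traffic: from NuPlan to HetroD, the collision rate rises from 0.016 to 0.074 for IDM and from 0.006 to 0.040 for PDM-Closed.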

TABLE V: Closed-loop Planning Results

Dataset	Planner	NuPlan Score ↑	TTC Bound ↑	Progress ↑	Lane Score ↑	Comfort ↑	Collisions ↓
NuPlan	IDM	0.85	0.94	0.92	0.99	0.48	0.016
NuPlan	PDM-Closed	0.83	0.97	0.91	0.99	0.31	0.006
HetroD	IDM	0.68	0.91	0.81	0.89	0.37	0.074
HetroD	PDM-Closed	0.70	0.95	0.78	0.97	0.21	0.040

TABLE VI: VRU collision breakdown on HetroD

Planner	At-Fault Collision Rate	VRU Front Collision Rate	VRU Lateral Collision Rate
IDM	0.074	0.008	0.031
PDM-Closed	0.040	0.004	0.022

Qualitative Results

Planning Failure Examples

Dense Traffic Scenario

VRU Interaction Failure

Unstructured Maneuver

Multi-Agent Collision

Prediction Qualitative Results

Latent scenario embeddings from Wayformer trained on Waymo, revealing distinct clustering and prediction challenges across datasets

Latent scenario embeddings and prediction error analysis. Waymo and NuScenes share similar latent structure with substantial overlap, whereas HetroD scenarios occupy distinct regions, reflecting complex behaviors and marked differences in heterogeneity. HetroD scenarios are particularly challenging: the model frequently over-predicts in dense VRU interactions, failing to capture rich frontal interactions and nuanced intent among crowded agents.

BibTeX

@inproceedings{hetrod,
title={HetroD: A High-Fidelity Drone Dataset and Benchmark for Autonomous Driving in Heterogeneous Traffic},
author={Yu-Hsiang Chen and Wei-Jer Chang and Christian Kotulla and Thomas Keutgens and Steffen Runde and Tobias Moers and Christoph Klas and Wei Zhan and Masayoshi Tomizuka and Yi-Ting Chen},
booktitle={Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
year={2026}
}