HetroD: A High-Fidelity Drone Dataset and Benchmark for Autonomous Driving in Heterogeneous Traffic

Yu-Hsiang Chen1,2, Wei-Jer Chang2, Christian Kotulla3, Thomas Keutgens3, Steffen Runde3, Tobias Moers3, Christoph Klas3, Wei Zhan2, Masayoshi Tomizuka2, Yi-Ting Chen1
1 National Yang Ming Chiao Tung University 2 UC Berkeley 3 fka GmbH

Note: We will release the first version of the sample dataset and code soon.
Our dataset now also supports OpenDRIVE maps.

HetroD Teaser Image

HetroD is a high-fidelity drone dataset that captures unstructured maneuvers such as hook turns, aggressive overtakes, queue cutting, and congested crossings among vehicles, scooters, and pedestrians in heterogeneous traffic environments. These maneuvers are critical for testing autonomous driving systems yet remain underexplored in the community. To address this, we construct a benchmark that evaluates existing methods for motion planning, motion prediction, and traffic simulation, and we conduct a thorough investigation of their generalization across datasets.

Abstract

We present HetroD, a dataset and benchmark for developing autonomous driving systems in heterogeneous environments. HetroD targets the critical challenge of navigating real-world heterogeneous traffic dominated by vulnerable road users (VRUs), including pedestrians, cyclists, and motorcyclists, alongside vehicles. These mixed agent types exhibit complex behaviors such as hook turns, lane splitting, and informal right-of-way negotiation. Such behaviors pose significant challenges for autonomous vehicles but remain underrepresented in existing datasets, which focus on structured, lane-disciplined traffic. To bridge this gap, we collect a large-scale drone-based dataset that provides holistic observations of traffic scenes with centimeter-accurate annotations, HD maps, and traffic signal states. We further develop a modular toolkit for extracting per-agent scenarios to support downstream task development. In total, the dataset comprises over 65.4k high-fidelity agent trajectories, 70% of which are from VRUs. HetroD supports modeling of VRU behaviors in dense, heterogeneous traffic and provides standardized benchmarks for forecasting, planning, and simulation tasks. Evaluation results reveal that state-of-the-art prediction and planning models struggle with the challenges presented by our dataset: they fail to predict lateral VRU movements, cannot handle unstructured maneuvers, and exhibit limited performance in dense, multi-agent scenarios, highlighting the need for more robust approaches to heterogeneous traffic.

Location Samples

Location 1

Location 2

Location 3

Location 4

Location 5

Location 6

Complex Behaviors

Congested Crossings

Congested Crossing 1
Congested Crossing 2
Congested Crossing 3

Hook Turns

Hook Turn 1
Hook Turn 2
Hook Turn 3

Illegal Right Turns

Illegal Right Turn 1
Illegal Right Turn 2
Illegal Right Turn 3

Navigate Around Illegal Parking

Navigate Illegal Park 1
Navigate Illegal Park 2
Navigate Illegal Park 3

Reverse Lane Usage

Reverse Lane Usage 1
Reverse Lane Usage 2
Reverse Lane Usage 3

Unauthorized U-turns

Unauthorized U-turn 1
Unauthorized U-turn 2
Unauthorized U-turn 3

Statistics

Agent Type Distribution
Risk Assessment Analysis

TABLE I: Comparison of Datasets on Interaction, Density & Diversity Metrics

Dataset Platform Tracks Duration Interaction Scale¹ Heterogeneous Interaction Scale² Geographical Density³ VRUs (%)⁴
NuScenes [1] On-board ∼90k† 320h 0.675 0.549 20.1%
Waymo [2] On-board 7.6M 574h 1.000 1.000 11.5%
Argoverse2 [3] On-board 13.9M 763h 0.632 0.318 10.0%
NuPlan [4] On-board ∼5M† 1282h 0.274 0.213 46.3%
INTERACTION [5] Drone 40k 16.5h 0.132 0.011
inD [6] Drone 13.5k 10h 0.071 0.185 0.023 39.4%
SinD [7] Drone 13.2k 7.02h 0.099 0.324 0.016 62.1%
HetroD Drone 65.4k 17.5h 0.223 0.889 0.026 69.9%

Notes:

† Estimated values based on official statistics.

— Metric not available.

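¹ $S_{\text{inter}} = \sum_{\text{scenarios}} D_{\text{inter}}$.

² $S_{\text{het}} = \sum_{\text{scenarios}} \sum_{i,j} \mathbb{1}\left(\mathrm{TTC}_{i,j} < 2\,\text{s} \land \text{type}_i \neq \text{type}_j\right)$.

³ $D_{\text{geo}} = N / A$, where $N$ is the number of agents within an 8 s window and $A$ is the corresponding area.

⁴ $\text{VRUs} = 100 \times N_{\text{VRU}} / (N_{\text{VRU}} + N_{\text{Veh}})$ (VRU: pedestrians, bicycles/cyclists, motorcycles, tricycles; Vehicles: cars, trucks, buses, vans)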
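The VRU share and the per-scenario term of the heterogeneous interaction scale can be sketched as follows. This is a minimal illustration under assumptions, not the HetroD toolkit: the lowercase type labels, the function names, and the `pair_ttc` dictionary are hypothetical, and the sketch counts each unordered agent pair once. Summing over scenarios and any normalization applied in Table I are not shown.

```python
from itertools import combinations

# Type groupings follow the table notes; the lowercase labels are an assumption.
VRU_TYPES = {"pedestrian", "bicycle", "motorcycle", "tricycle"}
VEHICLE_TYPES = {"car", "truck", "bus", "van"}


def vru_share(agent_types):
    """Return 100 * N_VRU / (N_VRU + N_Veh) for a list of agent type labels."""
    n_vru = sum(t in VRU_TYPES for t in agent_types)
    n_veh = sum(t in VEHICLE_TYPES for t in agent_types)
    total = n_vru + n_veh
    return 100.0 * n_vru / total if total else 0.0


def heterogeneous_interactions(agent_types, pair_ttc, threshold_s=2.0):
    """Count agent pairs with TTC below the threshold and different types.

    pair_ttc (hypothetical input) maps an index pair (i, j) with i < j to the
    pair's TTC in seconds; pairs that never conflict are simply absent.
    """
    count = 0
    for i, j in combinations(range(len(agent_types)), 2):
        ttc = pair_ttc.get((i, j))
        if ttc is not None and ttc < threshold_s and agent_types[i] != agent_types[j]:
            count += 1
    return count
```

For example, a scenario with two cars, one motorcycle, and one pedestrian has a VRU share of 50%; only the car-motorcycle and pedestrian-car pairs whose TTC falls below 2 s contribute to the count, while a car-car pair does not.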

Evaluation Results

Prediction Results

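Motion prediction models struggle with the complexity of heterogeneous traffic: models trained on NuScenes or Waymo reach a Brier-FDE between 6.71 and 10.75 on HetroD, compared with 0.44 (MTR) and 0.75 (Wayformer) when trained on HetroD itself.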

TABLE II: Cross-dataset evaluation (Brier-FDE ↓)
MTR
Train \ Test   NuScenes   Waymo*   SinD   HetroD
NuScenes       2.95       10.43    5.14   6.76
Waymo          4.01       2.28     4.26   6.71
SinD           16.07      26.34    2.06   3.30
HetroD         21.39      26.49    3.71   0.44

Wayformer
Train \ Test   NuScenes   Waymo*   SinD   HetroD
NuScenes       2.99       8.79     5.23   9.37
Waymo          2.67       2.20     3.53   10.75
SinD           8.23       13.40    1.96   9.23
HetroD         19.57      25.28    8.06   0.75

*Waymo uses 30% of its original training data due to resource constraints.
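This page does not define Brier-FDE. The sketch below assumes the definition commonly used in multi-modal forecasting benchmarks such as Argoverse 2: the final displacement error of the predicted mode closest to the ground truth, plus (1 − p)² for that mode's predicted probability p. The function and argument names are illustrative and may differ from the evaluation code behind Table II.

```python
import math


def brier_fde(pred_endpoints, probs, gt_endpoint):
    """Brier-FDE for one agent, under the assumed definition.

    pred_endpoints: list of (x, y) final positions, one per predicted mode.
    probs: predicted probability of each mode, in the same order.
    gt_endpoint: (x, y) ground-truth final position.
    """
    # Final displacement error of every mode.
    dists = [math.dist(p, gt_endpoint) for p in pred_endpoints]
    # Pick the mode whose endpoint lands closest to the ground truth.
    best = min(range(len(dists)), key=dists.__getitem__)
    # Penalize low confidence in that best mode.
    return dists[best] + (1.0 - probs[best]) ** 2
```

Under this definition, a model is penalized both for missing the endpoint and for assigning low probability to its best hypothesis, which is one reason lower values indicate better-calibrated multi-modal predictions.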

TABLE III: Ablation on HetroD splits
Setting     MTR            Wayformer
Same-map    0.44           0.75
Diff-map    1.17 (+166%)   1.53 (+104%)
Diff-time   0.42 (−5%)     0.76 (+1%)
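The percentages in parentheses are consistent with the relative change of each split against the same-map baseline. A small sketch that reproduces them from the table values (the helper name is illustrative):

```python
def relative_change_pct(value, baseline):
    """Relative change of value against baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


# Brier-FDE values copied from Table III: (same-map, diff-map, diff-time).
table_iii = {
    "MTR": (0.44, 1.17, 0.42),
    "Wayformer": (0.75, 1.53, 0.76),
}

for model, (same_map, diff_map, diff_time) in table_iii.items():
    print(
        model,
        f"diff-map {relative_change_pct(diff_map, same_map):+.0f}%",
        f"diff-time {relative_change_pct(diff_time, same_map):+.0f}%",
    )
```

The gap is large when the test map is unseen and negligible when only the recording time differs.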
TABLE IV: Scenario-conditioned Brier-FDE (↓) on HetroD
Scenario                          MTR-Waymo*   MTR-SinD   MTR-HetroD
Agent type
  Vehicle                         3.64         2.55       0.83
  Two-wheeler                     8.69         4.63       1.16
  Pedestrian                      2.85         1.17       0.26
Cross-type TTC risk
  High risk (TTC < 2 s)           8.51         4.76       1.25
  Moderate (2 ≤ min TTC < 4 s)    8.48         4.67       1.19
  Low (TTC > 4 s)                 7.87         4.00       0.90
Scene-level heterogeneity
  Low heterogeneity density       5.01         3.08       0.90
  High heterogeneity density      8.04         4.25       1.06

Distribution: Vehicles 39.4%, Two-wheelers 51.5%, Pedestrians 9.1% | TTC: High 26.5%, Moderate 8.7%, Low 64.8% | Heterogeneity: Low 66.3%, High 33.7%
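The TTC risk buckets in Table IV can be expressed as a simple binning function. This is a sketch under assumptions: the table does not say which bucket a minimum TTC of exactly 4 s belongs to, so it is placed in the low-risk bucket here, and the function names are hypothetical.

```python
from collections import Counter


def ttc_risk_bin(min_ttc_s):
    """Map a scenario's minimum cross-type TTC (seconds) to a risk bucket."""
    if min_ttc_s < 2.0:
        return "high"
    if min_ttc_s < 4.0:
        return "moderate"
    # Assumption: exactly 4 s is treated as low risk.
    return "low"


def bin_shares(min_ttcs):
    """Percentage of scenarios falling in each risk bucket."""
    counts = Counter(ttc_risk_bin(t) for t in min_ttcs)
    total = len(min_ttcs)
    return {name: 100.0 * counts[name] / total for name in ("high", "moderate", "low")}
```

Applying `bin_shares` to per-scenario minimum TTC values yields a breakdown of the same form as the TTC distribution reported above.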

Planning Results

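State-of-the-art rule-based planners show higher collision rates in heterogeneous traffic: on HetroD, IDM collides at a rate of 0.074 versus 0.016 on NuPlan, and PDM-Closed at 0.040 versus 0.006.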

TABLE V: Closed-loop Planning Results
Dataset   Planner      NuPlan Score ↑   TTC Bound ↑   Progress ↑   Lane Score ↑   Comfort ↑   Collisions ↓
NuPlan    IDM          0.85             0.94          0.92         0.99           0.48        0.016
NuPlan    PDM-Closed   0.83             0.97          0.91         0.99           0.31        0.006
HetroD    IDM          0.68             0.91          0.81         0.89           0.37        0.074
HetroD    PDM-Closed   0.70             0.95          0.78         0.97           0.21        0.040
TABLE VI: VRU collision breakdown on HetroD
Planner      At-Fault Collision Rate   VRU Front Collision Rate   VRU Lateral Collision Rate
IDM          0.074                     0.008                      0.031
PDM-Closed   0.040                     0.004                      0.022

Qualitative Results

Planning Failure Examples

Dense Traffic Scenario

VRU Interaction Failure

Unstructured Maneuver

Multi-Agent Collision

Prediction Qualitative Results

Latent scenario embeddings from Wayformer trained on Waymo, revealing distinct clustering and prediction challenges across datasets

Latent scenario embeddings and prediction error analysis. Waymo and NuScenes share similar latent structure with substantial overlap, whereas HetroD scenarios occupy distinct regions, reflecting complex behaviors and marked differences in heterogeneity. HetroD scenarios are particularly challenging: the model frequently over-predicts in dense VRU interactions, failing to capture rich frontal interactions and nuanced intent among crowded agents.
