We present HetroD, a dataset and benchmark for developing autonomous driving systems in heterogeneous environments. HetroD targets the critical challenge of navigating real-world heterogeneous traffic dominated by vulnerable road users (VRUs), such as pedestrians, cyclists, and motorcyclists, who share the road with vehicles. These mixed agent types exhibit complex behaviors such as hook turns, lane splitting, and informal right-of-way negotiation. Such behaviors pose significant challenges for autonomous vehicles but remain underrepresented in existing datasets, which focus on structured, lane-disciplined traffic. To bridge this gap, we collect a large-scale drone-based dataset that provides holistic observations of traffic scenes with centimeter-accurate annotations, HD maps, and traffic signal states. We further develop a modular toolkit for extracting per-agent scenarios to support downstream task development. In total, the dataset comprises over 65.4k high-fidelity agent trajectories, about 70% of which are from VRUs. HetroD supports modeling of VRU behaviors in dense, heterogeneous traffic and provides standardized benchmarks for forecasting, planning, and simulation tasks. Evaluation results reveal that state-of-the-art prediction and planning models struggle with the challenges presented by our dataset: they fail to predict lateral VRU movements, cannot handle unstructured maneuvers, and exhibit limited performance in dense, multi-agent scenarios, highlighting the need for more robust approaches to heterogeneous traffic.
| Dataset | Platform | Tracks | Duration | Interaction Scale¹ | Heterogeneous Interaction Scale² | Geographical Density³ | VRUs (%)⁴ |
|---|---|---|---|---|---|---|---|
| NuScenes [1] | On-board | ∼90k† | 320h | 0.675 | 0.549 | — | 20.1% |
| Waymo [2] | On-board | 7.6M | 574h | 1.000 | 1.000 | — | 11.5% |
| Argoverse2 [3] | On-board | 13.9M | 763h | 0.632 | 0.318 | — | 10.0% |
| NuPlan [4] | On-board | ∼5M† | 1282h | 0.274 | 0.213 | — | 46.3% |
| INTERACTION [5] | Drone | 40k | 16.5h | 0.132 | — | 0.011 | — |
| inD [6] | Drone | 13.5k | 10h | 0.071 | 0.185 | 0.023 | 39.4% |
| SinD [7] | Drone | 13.2k | 7.02h | 0.099 | 0.324 | 0.016 | 62.1% |
| HetroD | Drone | 65.4k | 17.5h | 0.223 | 0.889 | 0.026 | 69.9% |
Notes:
† Estimated values based on official statistics.
— Metric not available.
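¹ $S_{\text{inter}} = \sum_{\text{scenarios}} D_{\text{inter}}$.
² $S_{\text{het}} = \sum_{\text{scenarios}} \sum_{i,j} \mathbb{1}\left(\mathrm{TTC}_{i,j} < 2\,\text{s} \land \text{type}_i \neq \text{type}_j\right)$.
³ $D_{\text{geo}} = N / A$, where $N$ is the number of agents within an 8 s window and $A$ is the corresponding area.
⁴ $\text{VRUs} = 100 \times N_{\text{VRU}} / (N_{\text{VRU}} + N_{\text{Veh}})$ (VRU: pedestrians, bicycles/cyclists, motorcycles, tricycles; vehicles: cars, trucks, buses, vans).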
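The metric definitions in the notes above can be illustrated with a minimal Python sketch. It is not the authors' toolkit: the function names, the lowercase type labels, and the choice to count each unordered agent pair once are assumptions made for illustration. The pairwise TTC matrix is taken as an input because the notes do not specify how TTC is computed, and the unit of the area $A$ is likewise left unspecified.

```python
from itertools import combinations

# Type groupings follow note 4; the exact label strings are assumed.
VRU_TYPES = {"pedestrian", "bicycle", "motorcycle", "tricycle"}
VEHICLE_TYPES = {"car", "truck", "bus", "van"}


def heterogeneous_interactions(ttc, types, threshold_s=2.0):
    """Count agent pairs with TTC below the threshold and differing types.

    `ttc[i][j]` is the time-to-collision between agents i and j in seconds.
    Assumption: each unordered pair (i < j) is counted once.
    """
    return sum(
        1
        for i, j in combinations(range(len(types)), 2)
        if ttc[i][j] < threshold_s and types[i] != types[j]
    )


def vru_percentage(counts):
    """VRU share in percent: 100 * N_VRU / (N_VRU + N_Veh)."""
    n_vru = sum(n for t, n in counts.items() if t in VRU_TYPES)
    n_veh = sum(n for t, n in counts.items() if t in VEHICLE_TYPES)
    return 100.0 * n_vru / (n_vru + n_veh)


def geographical_density(n_agents, area):
    """D_geo = N / A for one 8 s window."""
    return n_agents / area


# Toy scenario with three agents (made-up numbers, not from the dataset).
inf = float("inf")
types = ["car", "motorcycle", "pedestrian"]
ttc = [
    [inf, 1.5, 3.0],
    [1.5, inf, 0.8],
    [3.0, 0.8, inf],
]
pairs = heterogeneous_interactions(ttc, types)
share = vru_percentage({"car": 3, "motorcycle": 5, "pedestrian": 2})
```

Summing `heterogeneous_interactions` over all scenarios gives an unnormalized count; how the table values are scaled across datasets is not stated in the notes, so no normalization is applied here.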
Motion prediction models struggle due to heterogeneous traffic complexity.
Model: MTR
| Train \ Test | NuScenes | Waymo* | SinD | HetroD |
|---|---|---|---|---|
| NuScenes | 2.95 | 10.43 | 5.14 | 6.76 |
| Waymo | 4.01 | 2.28 | 4.26 | 6.71 |
| SinD | 16.07 | 26.34 | 2.06 | 3.30 |
| HetroD | 21.39 | 26.49 | 3.71 | 0.44 |
Model: Wayformer
| Train \ Test | NuScenes | Waymo* | SinD | HetroD |
|---|---|---|---|---|
| NuScenes | 2.99 | 8.79 | 5.23 | 9.37 |
| Waymo | 2.67 | 2.20 | 3.53 | 10.75 |
| SinD | 8.23 | 13.40 | 1.96 | 9.23 |
| HetroD | 19.57 | 25.28 | 8.06 | 0.75 |
*Waymo uses 30% of its original training data due to resource constraints.
| Setting | MTR | Wayformer |
|---|---|---|
| Same-map | 0.44 | 0.75 |
| Diff-map | 1.17 (+166%) | 1.53 (+104%) |
| Diff-time | 0.42 (−5%) | 0.76 (+1%) |
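The percentages in parentheses appear to be relative changes against the Same-map row of the same model. A short Python sketch under that assumption reproduces them from the table values; the helper name `relative_change_pct` is hypothetical.

```python
def relative_change_pct(baseline, value):
    """Relative change of `value` with respect to `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline


# Error values copied from the table above: (same-map, diff-map, diff-time).
results = {
    "MTR": (0.44, 1.17, 0.42),
    "Wayformer": (0.75, 1.53, 0.76),
}

changes = {
    model: (
        round(relative_change_pct(same, diff_map)),
        round(relative_change_pct(same, diff_time)),
    )
    for model, (same, diff_map, diff_time) in results.items()
}
print(changes)
```

Read this way, moving to an unseen map roughly doubles or more the error of both models, while a different recording time on the same map leaves it nearly unchanged.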
| Scenario | MTR-Waymo* | MTR-SinD | MTR-HetroD |
|---|---|---|---|
| Agent type | | | |
| Vehicle | 3.64 | 2.55 | 0.83 |
| Two-wheeler | 8.69 | 4.63 | 1.16 |
| Pedestrian | 2.85 | 1.17 | 0.26 |
| Cross-type TTC risk | | | |
| High risk (min TTC < 2 s) | 8.51 | 4.76 | 1.25 |
| Moderate (2 s ≤ min TTC < 4 s) | 8.48 | 4.67 | 1.19 |
| Low (min TTC ≥ 4 s) | 7.87 | 4.00 | 0.90 |
| Scene-level heterogeneity | | | |
| Low heterogeneity density | 5.01 | 3.08 | 0.90 |
| High heterogeneity density | 8.04 | 4.25 | 1.06 |
Distribution: Vehicles 39.4%, Two-wheelers 51.5%, Pedestrians 9.1% | TTC: High 26.5%, Moderate 8.7%, Low 64.8% | Heterogeneity: Low 66.3%, High 33.7%
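The TTC risk split used in the table can be sketched as a simple bucketing step. This is an illustrative Python sketch, not the authors' evaluation code: the function names are hypothetical, the input is assumed to be one minimum cross-type TTC value per scenario, and assigning a value of exactly 4 s to the low-risk bucket is an assumption.

```python
from collections import Counter


def ttc_risk_bucket(min_ttc_s):
    """Map a minimum cross-type TTC (seconds) to a risk bucket."""
    if min_ttc_s < 2.0:
        return "high"
    if min_ttc_s < 4.0:
        return "moderate"
    return "low"  # assumption: the boundary value 4 s counts as low risk


def bucket_shares(min_ttcs):
    """Percentage of scenarios that fall in each risk bucket."""
    if not min_ttcs:
        raise ValueError("need at least one scenario")
    counts = Counter(ttc_risk_bucket(t) for t in min_ttcs)
    total = len(min_ttcs)
    return {
        name: 100.0 * counts[name] / total
        for name in ("high", "moderate", "low")
    }


# Made-up minimum TTC values for four scenarios.
shares = bucket_shares([0.5, 1.9, 2.0, 5.0])
```

Reporting the prediction error per bucket, next to these shares, shows whether a model degrades specifically in the close-interaction scenarios rather than on average.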
State-of-the-art rule-based planners show increased collision rates in heterogeneous traffic scenarios.
| Dataset | Planner | NuPlan Score ↑ | TTC Bound ↑ | Progress ↑ | Lane Score ↑ | Comfort ↑ | Collisions ↓ |
|---|---|---|---|---|---|---|---|
| NuPlan | IDM | 0.85 | 0.94 | 0.92 | 0.99 | 0.48 | 0.016 |
| NuPlan | PDM-Closed | 0.83 | 0.97 | 0.91 | 0.99 | 0.31 | 0.006 |
| HetroD | IDM | 0.68 | 0.91 | 0.81 | 0.89 | 0.37 | 0.074 |
| HetroD | PDM-Closed | 0.70 | 0.95 | 0.78 | 0.97 | 0.21 | 0.040 |
| Planner | At-Fault Collision Rate | VRU Front Collision Rate | VRU Lateral Collision Rate |
|---|---|---|---|
| IDM | 0.074 | 0.008 | 0.031 |
| PDM-Closed | 0.040 | 0.004 | 0.022 |
Latent scenario embeddings from Wayformer trained on Waymo, revealing distinct clustering and prediction challenges across datasets.