HetroD Challenge v1.0 Evaluation

Challenge Overview

A heterogeneous traffic simulation benchmark.

HetroD Challenge v1.0 focuses on generating realistic closed-loop simulations of heterogeneous traffic in dense urban scenes.

The benchmark emphasizes mixed-agent interaction, vulnerable road-user safety, and type-balanced evaluation instead of vehicle-only performance.

Closed-loop rollout generation for vehicles, two-wheelers, and pedestrians.
Hidden test evaluation with official HetroD metrics.
Submission deadline: September 5, 2026 AoE.

Prize

Awards and certificates for the top three teams.

1st Place USD 500

Certificate included.

2nd Place USD 300

Certificate included.

3rd Place USD 200

Certificate included.

Public Package

Train and validation GT are public; test GT is hidden.

Train: 5087 scenarios with ScenarioNet inputs.
Validation: 955 scenarios with ScenarioNet inputs.
Test: 955 ScenarioNet-compatible inputs with future targets masked.

Submission

Submit one zipped folder of test rollouts.

Each submitted pickle must contain 32 rollouts covering all required agents from metadata.required_agent_ids, with 80 future steps per agent in global (x, y, z, yaw) coordinates.

[Teaser figure] Heterogeneous urban traffic scenes combine vehicles, scooters, cyclists, and pedestrians in close-range interactions.

Challenge Timeline

Submit before the final deadline.

1. Develop (train + validation): Use the public split and the GitHub evaluator for local validation.
2. Submit (September 5, 2026 AoE): Zip the test rollout folder and fill out the Google submission form.
3. Score (hidden test evaluation): The organizers compute the final score with the hidden test GT.

Files

Public data and evaluator layout.

The public package contains ScenarioNet inputs and GT pickles for train and validation; the test package contains inputs only.

Please use this link to download the dataset: https://levelxdata.com/hetrod-dataset/.

HetroD-Challenge-v1.0-public/
  train/{gt,scenarionet}/
  valid/{gt,scenarionet}/
  test/input/
  manifests/
  split_summary.json

Evaluation Metrics

Type-balanced metrics for heterogeneous interaction.

The official score combines kinematic realism, safety, cross-type interaction, and a quality-gated coverage bonus.

0.30 Kinematic

0.35 Safety

0.25 Cross-type

0.10 Coverage bonus

Base = 0.30 * Kinematic + 0.35 * Safety + 0.25 * Cross-type
Coverage Bonus = 0.10 * Coverage * Kinematic * Safety
Overall = Base + Coverage Bonus
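The combination above can be checked with a few lines of Python. This is a minimal sketch that mirrors the stated formula, assuming each group score is already a number in [0, 1]; the function name `overall_score` is hypothetical and not part of the official evaluator.

```python
def overall_score(kinematic: float, safety: float, cross_type: float, coverage: float) -> float:
    """Combine the four metric groups as in the formula above (hypothetical helper)."""
    base = 0.30 * kinematic + 0.35 * safety + 0.25 * cross_type
    # Quality gate: the coverage bonus is scaled by kinematic realism and safety,
    # so diverse but unrealistic or unsafe rollouts earn little or no bonus.
    coverage_bonus = 0.10 * coverage * kinematic * safety
    return base + coverage_bonus


if __name__ == "__main__":
    # Oracle (GT reference) row of the leaderboard: 1.000, 0.992, 1.000, 0.000.
    print(round(overall_score(1.000, 0.992, 1.000, 0.000), 3))  # 0.897
```

Plugging in the Oracle row reproduces its overall score of 0.897, and a perfect submission (all four groups at 1.0) reaches exactly 1.0, since the weights 0.30 + 0.35 + 0.25 + 0.10 sum to one.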

1. Kinematic

Realism by type.

Linear Speed
Linear Acceleration
Angular Speed
Angular Acceleration

Each metric is computed separately for vehicles, two-wheelers, and pedestrians, and the per-type results are then macro-averaged so that each type counts equally.
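The four kinematic quantities can be derived from a rollout by finite differences, and the per-type results combined by a macro-average. This is a sketch under stated assumptions: the sampling interval `DT = 0.1` s (10 Hz) is assumed rather than taken from the dataset documentation, and the official per-metric comparison against GT is not reproduced, so `macro_average` simply takes per-type, per-metric scores as input. All names are hypothetical.

```python
import numpy as np

# Assumption: 10 Hz sampling. Replace with the dataset's actual step interval.
DT = 0.1


def wrap_angle(a: np.ndarray) -> np.ndarray:
    """Wrap angles to [-pi, pi] so yaw differences do not jump at the +/-pi seam."""
    return np.arctan2(np.sin(a), np.cos(a))


def kinematic_features(traj: np.ndarray, dt: float = DT) -> dict:
    """traj: [num_steps, 4] array of (x, y, z, yaw). Returns per-step kinematic features."""
    xy = traj[:, :2]
    yaw = traj[:, 3]
    linear_speed = np.linalg.norm(np.diff(xy, axis=0), axis=-1) / dt
    linear_accel = np.diff(linear_speed) / dt
    angular_speed = wrap_angle(np.diff(yaw)) / dt
    angular_accel = np.diff(angular_speed) / dt
    return {
        "linear_speed": linear_speed,
        "linear_acceleration": linear_accel,
        "angular_speed": angular_speed,
        "angular_acceleration": angular_accel,
    }


def macro_average(scores: dict) -> float:
    """scores[agent_type][metric] -> score. Average metrics within a type, then weight types equally."""
    per_type = [np.mean(list(metric_scores.values())) for metric_scores in scores.values()]
    return float(np.mean(per_type))
```

Because every type contributes one term to the final mean, a scene with many vehicles and few pedestrians does not let vehicle realism dominate the kinematic score.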

2. Safety

Collision and valid region behavior.

Collision with 0.1 m annotation tolerance
Type-aware valid region margin

Safety = 0.5 Collision + 0.5 Valid Region
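One way to apply a 0.1 m annotation tolerance is to model each agent as an oriented box and count a collision only when the boxes interpenetrate by more than the tolerance. The sketch below does this with a separating-axis test; treating the tolerance as a minimum penetration depth, and the box tuple layout `(x, y, length, width, yaw)`, are assumptions, not the official evaluator's definition. The valid-region check is not reproduced, and `safety_score` only restates the 0.5/0.5 combination above, assuming both sub-scores lie in [0, 1] with higher meaning safer.

```python
import numpy as np

# Assumption: the tolerance is read as the minimum penetration depth that counts as a collision.
COLLISION_TOLERANCE_M = 0.1


def box_corners(x, y, length, width, yaw):
    """Return the 4 corners, shape [4, 2], of an oriented box centered at (x, y)."""
    c, s = np.cos(yaw), np.sin(yaw)
    half = np.array([[length / 2, width / 2],
                     [length / 2, -width / 2],
                     [-length / 2, -width / 2],
                     [-length / 2, width / 2]])
    rot = np.array([[c, -s], [s, c]])
    return half @ rot.T + np.array([x, y])


def penetration_depth(corners_a, corners_b):
    """Separating-axis test for two rectangles; the minimum overlap (<= 0 means separated)."""
    depth = np.inf
    for corners in (corners_a, corners_b):
        for i in range(2):  # a rectangle has two unique edge normals
            edge = corners[i + 1] - corners[i]
            axis = np.array([-edge[1], edge[0]]) / np.linalg.norm(edge)
            proj_a, proj_b = corners_a @ axis, corners_b @ axis
            overlap = min(proj_a.max(), proj_b.max()) - max(proj_a.min(), proj_b.min())
            depth = min(depth, overlap)
    return depth


def is_collision(box_a, box_b, tol=COLLISION_TOLERANCE_M):
    """box = (x, y, length, width, yaw). True only if the boxes overlap by more than tol."""
    return penetration_depth(box_corners(*box_a), box_corners(*box_b)) > tol


def safety_score(collision_score: float, valid_region_score: float) -> float:
    """Safety = 0.5 Collision + 0.5 Valid Region."""
    return 0.5 * collision_score + 0.5 * valid_region_score
```

With this reading, two 4 m x 2 m boxes whose ends overlap by 0.05 m are not counted as colliding, while a 0.5 m overlap is.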

3. Cross-type Interaction

Close-range heterogeneous behavior.

Cross-type distance: proximity to GT
Cross-type time-to-proximity: proximity to GT

Only unique cross-type pairs are used: vehicle-pedestrian, vehicle-two-wheeler, and pedestrian-two-wheeler. Same-type pairs are excluded.
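Selecting the pairs described above amounts to enumerating each unordered agent pair once and keeping only the three cross-type combinations. The sketch below does that and computes per-step center distances for the selected pairs; the type strings `"vehicle"`, `"two_wheeler"`, and `"pedestrian"` are hypothetical labels, and how the official metric turns these distances (and time-to-proximity) into a score is not reproduced here.

```python
from itertools import combinations

import numpy as np

# Hypothetical type labels; the dataset may use different strings.
CROSS_TYPE_PAIRS = {
    frozenset({"vehicle", "pedestrian"}),
    frozenset({"vehicle", "two_wheeler"}),
    frozenset({"pedestrian", "two_wheeler"}),
}


def cross_type_pairs(agent_types):
    """Unique (i, j) index pairs with i < j whose types form one of the three cross-type combinations."""
    return [
        (i, j)
        for i, j in combinations(range(len(agent_types)), 2)
        # A same-type pair collapses to a one-element set, which never matches.
        if frozenset({agent_types[i], agent_types[j]}) in CROSS_TYPE_PAIRS
    ]


def pair_distances(positions, pairs):
    """positions: [num_agents, num_steps, 2] xy. Returns [num_pairs, num_steps] center distances."""
    if not pairs:
        return np.zeros((0, positions.shape[1]))
    return np.stack([np.linalg.norm(positions[i] - positions[j], axis=-1) for i, j in pairs])
```

For agent types `["vehicle", "pedestrian", "vehicle", "two_wheeler"]`, the vehicle-vehicle pair (0, 2) is skipped and the remaining five pairs are kept.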

4. Coverage

Multi-rollout occupancy diversity.

Rasterized BEV occupancy
Type-wise average
Quality-gated bonus

Coverage rewards diverse valid rollouts without rewarding unsafe or unrealistic spread.
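A rasterized occupancy measure can be sketched by binning every rollout position of an agent into BEV grid cells and counting the distinct cells visited, then averaging per type and across types. This is an illustration only: the cell size `CELL_SIZE_M = 1.0` is an assumption, and the sketch returns a raw cell count rather than the official normalized [0, 1] Coverage value, whose extent and normalization are not specified here. The quality gate itself is the multiplication by Kinematic and Safety in the overall formula.

```python
import numpy as np

# Assumption: 1 m BEV cells; the official grid resolution is not given here.
CELL_SIZE_M = 1.0


def occupied_cells(xy: np.ndarray, cell_size: float = CELL_SIZE_M) -> int:
    """xy: [num_rollouts, num_steps, 2] positions of one agent. Distinct cells visited across all rollouts."""
    cells = np.floor(xy.reshape(-1, 2) / cell_size).astype(np.int64)
    return len(np.unique(cells, axis=0))


def type_wise_coverage(per_agent_xy, agent_types, cell_size=CELL_SIZE_M):
    """Average the per-agent cell count within each type, then average across types."""
    by_type = {}
    for xy, agent_type in zip(per_agent_xy, agent_types):
        by_type.setdefault(agent_type, []).append(occupied_cells(xy, cell_size))
    return float(np.mean([np.mean(counts) for counts in by_type.values()]))
```

An agent that stays in the same cell in all 32 rollouts contributes 1, while one whose rollouts fan out into 32 different cells contributes 32, which is the spread that coverage is meant to reward.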

Submission

Submit one folder of scenario-level PKL files.

The final zip should contain exactly one rollout pickle for every scenario listed in manifests/test.txt.

your_team_submission.zip
  your_team_submission/
    <scenario_id_0>.pkl
    <scenario_id_1>.pkl
    ...
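Before zipping, it helps to confirm that the folder holds exactly one pickle per scenario in manifests/test.txt. The sketch below assumes the manifest lists one scenario id per line, which is not stated above; the helper names are hypothetical. `shutil.make_archive` with `root_dir` set to the parent and `base_dir` set to the folder name produces a zip whose single top-level entry is the submission folder, matching the layout shown.

```python
import shutil
from pathlib import Path


def read_manifest(manifest_path):
    """Assumption: one scenario id per line; blank lines are ignored."""
    lines = Path(manifest_path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def check_submission_folder(folder, scenario_ids):
    """Return (missing, extra) scenario ids, comparing <scenario_id>.pkl files with the manifest."""
    present = {p.stem for p in Path(folder).glob("*.pkl")}
    expected = set(scenario_ids)
    return sorted(expected - present), sorted(present - expected)


def zip_submission(folder):
    """Create <folder>.zip next to the folder, with the folder as its top-level entry."""
    folder = Path(folder).resolve()
    return shutil.make_archive(str(folder), "zip", root_dir=folder.parent, base_dir=folder.name)
```

Both lists returned by `check_submission_folder` should be empty before the archive is uploaded.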

{
    "agent_id": ...,          # [num_agents]
    "simulated_states": ...,  # [32, num_agents, 80, 4]
}

scenario_id is the top-level input id.
agent_id must list the agents in metadata.required_agent_ids.
simulated_states must contain exactly 32 rollouts and store future steps 11..90 as (x, y, z, yaw).
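A small writer that checks the required shape before pickling can catch most format errors locally. This is a sketch, not the official tooling: `write_rollout_pickle` is a hypothetical helper, storing `agent_id` as a Python list and `simulated_states` as a float64 NumPy array are assumptions about what the evaluator accepts (float64 avoids losing precision on large global coordinates), and `agent_ids` would come from the scenario's metadata.required_agent_ids.

```python
import pickle
from pathlib import Path

import numpy as np

NUM_ROLLOUTS = 32
NUM_FUTURE_STEPS = 80  # future steps 11..90


def write_rollout_pickle(out_dir, scenario_id, agent_ids, simulated_states):
    """Validate [32, num_agents, 80, 4] (x, y, z, yaw) rollouts and write <scenario_id>.pkl."""
    # Assumption: float64 is accepted; it keeps sub-centimetre precision for global coordinates.
    states = np.asarray(simulated_states, dtype=np.float64)
    expected = (NUM_ROLLOUTS, len(agent_ids), NUM_FUTURE_STEPS, 4)
    if states.shape != expected:
        raise ValueError(f"simulated_states has shape {states.shape}, expected {expected}")
    if not np.isfinite(states).all():
        raise ValueError("simulated_states contains NaN or inf")
    out_path = Path(out_dir) / f"{scenario_id}.pkl"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "wb") as f:
        pickle.dump({"agent_id": list(agent_ids), "simulated_states": states}, f)
    return out_path
```

Writing all scenarios into one folder named after the team and then zipping that folder yields the layout described in this section.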

Please fill out the Google submission form before September 5, 2026 AoE: https://forms.gle/L9mcG5TQMa3mkhr7A

Leaderboard

Placeholder leaderboard for hidden test evaluation.

Official rankings will report only the overall score and four major metric groups.

Rank	Team	Overall	Kinematic	Safety	Cross-type	Coverage
Oracle	GT Reference	0.897	1.000	0.992	1.000	0.000
1	Team Placeholder	-	-	-	-	-
2	Team Placeholder	-	-	-	-	-
3	Team Placeholder	-	-	-	-	-