List of public large-scale datasets for autonomous driving research


Sep 2019, Audi released the AEV Autonomous Driving Dataset (A2D2), an open multi-sensor dataset for autonomous driving research.

Aug 2019, Waymo released its Open Datasets, which comprise high-resolution sensor data collected by Waymo self-driving cars in a wide variety of conditions.

Jul 2019, Lyft released the Level 5 Datasets, a subset of the autonomous driving data collected by Lyft’s Level 5 team, with high-quality data from camera and LiDAR sensors.

Mar 2019, Aptiv released the nuScenes Datasets, which comprise labelled data from a comprehensive autonomous vehicle multi-sensor suite.

May 2018, Berkeley released the DeepDrive datasets, which include instance segmentation, object detection, drivable areas, and lane markings, as part of the challenges at the CVPR 2018 Workshop on Autonomous Driving that it hosted.

Side note: Audi has also uploaded the A2D2 datasets to the Registry of Open Data on AWS, which simplifies the setup for working with them.

| Owner | Dataset | Features | License |
|---|---|---|---|
| Audi | A2D2 | 2D semantic segmentation; 3D point cloud labels; 3D bounding boxes; unlabelled sensor data | CC BY-ND 4.0 [1] |
| Waymo | Open | Labelled camera data; labels for LiDAR; 3D bounding boxes; sensor data | Non-Commercial Use |
| Lyft | Level 5 | Labelled camera data, LiDAR; 3D bounding boxes; drivable surface map; spatial semantic map | CC BY-NC-SA 4.0 [2] |
| Aptiv | nuScenes | Full sensor suite (LiDAR, RADAR, camera, IMU, GPS); 3D bounding boxes; detailed map information | CC BY-NC-SA 4.0 [2] |
| Berkeley | DeepDrive | Labelled camera data, IMU, GPS; 2D bounding boxes; drivable surface map; lane markings | BSD 3-Clause “New” or “Revised” [3] |

Summary table of autonomous driving datasets for urban areas

The common goal of these datasets is an in-depth understanding of perception for autonomous vehicles in complex environments such as urban areas.

While some datasets focus on imaging technologies, others also offer spatial maps, surface maps, lane markings, and more.

Figure: Close-up with RGB adaptation at AEV HQ, from the Autonomous Driving Dataset (A2D2) published by Audi Electronics Venture GmbH.

Another interesting aspect is licensing: Waymo, Lyft, and Aptiv (Waymo Dataset License Agreement for Non-Commercial Use; CC BY-NC-SA 4.0 [2]) explicitly state that their datasets are intended for research and require a separate license for any commercial use.

On the other hand, the licenses used by Berkeley and Audi (BSD 3-Clause [3] and CC BY-ND 4.0 [1]) permit commercialization of technology developed from their datasets, provided the copyright and attribution notices of the original dataset are left intact; CC BY-ND additionally forbids redistributing modified versions of the dataset itself.
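The license split described above can be captured in a few lines of Python, which may be handy when shortlisting datasets for a project. This is only a minimal sketch: the `Dataset` class, the `CATALOG` list, and the `commercial_use` flag are hypothetical names introduced for illustration, and the flag simply mirrors this post's reading of each license, not legal advice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    owner: str
    name: str
    license: str
    # Hypothetical flag: mirrors this post's reading of the license, not legal advice.
    commercial_use: bool


# Entries transcribed from the summary table in this post.
CATALOG = [
    Dataset("Audi", "A2D2", "CC BY-ND 4.0", True),
    Dataset("Waymo", "Open", "Waymo Dataset License Agreement for Non-Commercial Use", False),
    Dataset("Lyft", "Level 5", "CC BY-NC-SA 4.0", False),
    Dataset("Aptiv", "nuScenes", "CC BY-NC-SA 4.0", False),
    Dataset("Berkeley", "DeepDrive", "BSD 3-Clause", True),
]


def commercially_usable(catalog):
    """Return the names of datasets whose license allows commercial use."""
    return [d.name for d in catalog if d.commercial_use]


if __name__ == "__main__":
    print(commercially_usable(CATALOG))  # prints ['A2D2', 'DeepDrive']
```

Always read the full license text before relying on a dataset commercially; a boolean flag cannot express conditions such as attribution or the no-derivatives clause.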

Honorable mentions: Cityscapes, with its Semantic Urban Scene Dataset, and the Stanford AI Laboratory, with its 3D Object Representation for Car Dataset.

  1. Creative Commons Attribution-NoDerivatives 4.0 International Public License

  2. Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License

  3. Berkeley Software Distribution (BSD) 3-Clause “New” or “Revised” License