India Driving Dataset (IDD): Powering AI for the Real-World Roads of India

By Dr. Anbumani Subramanian
Adjunct Faculty, IIIT-Hyderabad
Published by IHub-Data, IIIT Hyderabad


Introduction

As autonomous driving and mobility technologies advance across the world, they are largely trained and tested on structured, rule-bound traffic environments—far removed from the chaotic, richly textured roads of India.

The India Driving Dataset (IDD), developed at IIIT-Hyderabad and supported by IHub-Data, steps into this gap. A large-scale, open-access dataset, IDD offers the AI and research community a window into the realities of unstructured driving conditions, such as those on Indian roads, and opens new possibilities for developing and improving algorithms in autonomous navigation, computer vision, and machine learning.

Over the past few years, IDD has grown into a globally recognized resource, used by thousands of researchers, students, and mobility startups to train, test, and validate AI models that can handle complexity and diversity at scale.


Origins: Local Problem, Global Relevance

In December 2017, a group of researchers, industry experts, and entrepreneurs came together with a shared vision: India’s roads—with their unpredictability, dense traffic, and cultural nuances—deserved dedicated research attention if autonomous technologies were to truly scale.

One of the biggest gaps identified? High-quality, curated data that reflected driving realities in unstructured conditions.

Led by Prof. C.V. Jawahar and supported by Intel, the team at IIIT-Hyderabad launched the India Driving Dataset, setting out to democratize access to data and push forward innovation in unstructured driving environments. The first version was released in 2018, and the momentum has not stopped since.


What’s in the Dataset?

IDD is no longer a single dataset; it is a family of datasets, each designed to address a different research question. Together, they cover detection, segmentation, temporal modeling, 3D perception, traffic sign recognition, and more.

IDD Detection

40,000+ images with bounding box annotations
Used for object detection tasks like pedestrian and vehicle identification.

IDD Segmentation

10,000 finely annotated images
For semantic scene understanding. Includes over 20K images for train/validation/test.

IDD Multimodal

Stereo images, LIDAR data, GPS & CAN bus info
Supports sensor fusion, localization, and 3D scene reconstruction.

IDD Lite

A compact 50MB version with 7 core classes
Ideal for low-resource environments or quick prototyping.

IDD Temporal

Adjacent frames (+/- 15) for each segmentation image
Enables motion prediction, tracking, and temporal consistency tasks.
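
As a rough illustration of what a +/- 15 frame window means in practice, the sketch below lists the neighbouring frame indices around one annotated frame. The function name `temporal_window` and the idea of addressing frames by integer index are assumptions made for this example; they do not describe the actual file layout of IDD Temporal.

```python
def temporal_window(center, radius=15, first=0, last=None):
    """Return the indices of frames within `radius` of `center`.

    The annotated frame itself (`center`) is excluded. Indices are clipped
    to the range [first, last]; `last=None` means no upper bound.
    All names here are hypothetical and not part of any IDD tooling.
    """
    lo = max(first, center - radius)
    hi = center + radius if last is None else min(last, center + radius)
    return [i for i in range(lo, hi + 1) if i != center]


# 15 frames before and 15 frames after the annotated frame.
neighbours = temporal_window(100)
print(len(neighbours))
```

Near the start or end of a sequence the window is simply shorter, which is one simple way to handle clips that do not have a full set of neighbours.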

IDD-3D

Annotated camera and LIDAR data
Useful for depth estimation and 3D object recognition.

IDD MTSVD (Missing Traffic Signs Video Dataset)

70+ categories of traffic signs
For sign detection in dynamic, real-world scenes.

IDD-FGVD (Fine-Grained Vehicle Dataset)

Vehicle types in a 3-level hierarchy
Enables fine-grained classification of Indian vehicles.
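
One way to work with a 3-level label hierarchy is to score a classifier separately at each level, counting a prediction as correct at a given level only if every coarser level is also correct. The sketch below assumes each label is a tuple of three strings ordered from coarse to fine; the function name and the example labels are invented placeholders, not the actual IDD-FGVD class names or evaluation protocol.

```python
from typing import List, Sequence, Tuple

# (coarse, middle, fine) -- a hypothetical encoding of a 3-level label.
Label = Tuple[str, str, str]


def per_level_accuracy(predicted: Sequence[Label],
                       actual: Sequence[Label],
                       levels: int = 3) -> List[float]:
    """Accuracy at each hierarchy level, coarse to fine.

    A prediction is correct at level k only if it matches the ground
    truth at level k and at every coarser level.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one label is required")
    correct = [0] * levels
    for pred, truth in zip(predicted, actual):
        for k in range(levels):
            if pred[k] != truth[k]:
                break  # finer levels cannot be correct either
            correct[k] += 1
    return [c / len(actual) for c in correct]


# Placeholder labels, for illustration only.
preds = [("car", "hatchback", "model_a"), ("car", "sedan", "model_b")]
truth = [("car", "hatchback", "model_a"), ("car", "hatchback", "model_a")]
print(per_level_accuracy(preds, truth))  # [1.0, 0.5, 0.5]
```

Reporting accuracy per level makes it visible when a model gets the broad vehicle type right but confuses fine-grained variants.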


IDD Is Picking Up Steam—Globally and Nationally

The traction IDD has gained is remarkable—and growing.

  • Over 10,000 registered users
  • 15,000+ downloads
  • Users from 88+ countries, across research labs, universities, startups, and global corporations

It is not just the numbers; it is the quality of use. IDD has been featured at top-tier global conferences, including:

  • CVPR, ICCV, ECCV – Flagship computer vision venues, where dedicated data challenges have been held in the AutoNUE workshop on autonomous navigation
  • AutoNUE
  • NCVPRIPG – India’s leading vision and pattern recognition conference

Across academia and industry, IDD is being used to benchmark new algorithms, validate performance under challenging conditions, and push forward the frontier of real-world AI.


How IDD Is Being Used

In Academia

  • Research papers are increasingly citing IDD in segmentation, detection, and scene understanding tasks.
  • PhD students and postgraduates are using IDD for theses and course projects.
  • Professors are adopting IDD Lite in teaching AI and computer vision fundamentals.

In Startups and Industry

  • Indian startups are prototyping intelligent transportation systems using IDD as a base.
  • Global companies exploring emerging markets are using IDD data in their research.
  • Autonomous tech players are testing edge-case behavior using IDD Temporal and IDD Multimodal.

In Open Challenges and Hackathons

  • IDD has powered public challenges where innovators try to outperform existing models.
  • These competitions are encouraging community contributions and model improvements.

Why This Matters

Many AI models are trained on datasets captured in ideal or sanitized conditions: structured lanes, regulated signs, predictable environments. But the real world isn’t always like that, especially in regions like India.

By offering open, well-annotated, and diverse data from Indian roads, IDD helps ensure that the next generation of AI systems is resilient, inclusive, and globally relevant.

It’s not just about India—it’s about ensuring that AI works everywhere for everyone.


What’s Next?

As interest grows in emerging market mobility, smart cities, and real-world AI deployment, IDD will continue to expand—with more data types, annotations, and tools for researchers.

  • New releases are in the pipeline
  • Improved documentation and usage examples are being added
  • The community of users and contributors continues to grow

With IDD, India is not just consuming AI innovation—it is actively contributing to it.


Explore the Dataset

Visit the official IDD homepage:
👉 https://idd.insaan.iiit.ac.in/ (or https://india-data.org/datasets-listing/smart-mobility)


About the Author

Dr. Anbumani Subramanian (https://sites.google.com/view/anbumani)
Adjunct Faculty, IIIT-Hyderabad
anbumani@iiit.ac.in

Publisher

Soumya Das Bhaumik, IHUB-Data
LinkedIn URL: https://www.linkedin.com/in/skdasbhaumik/
