Byte Sized – II (Data Science at Ola)

A few weeks ago, I attended a talk by P. Swaminathan, head of data science at Ola Cabs, who completed his M.Tech in CSE at IIT Madras. The talk was part of Tech Saloon, a new IIT Madras technology lecture series. It was an interesting talk covering the data available to Ola Cabs, and the economics and technology behind putting that data to optimal use.

The speaker described some of the data available to Ola, including real-time taxi positions, customer app session details, ride details, GPS readings and smartphone sensor readings. Ola’s main challenges include estimating the ETA (estimated time of arrival), route optimization and navigation mapping, and dynamic pricing. Prices must be adjusted in real time according to peak-hour demand, with the aim of optimizing drivers’ profit margins while still retaining customers.
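The talk did not describe how Ola actually computes fares, so the following is only a minimal, hypothetical sketch of what adjusting prices in real time on the basis of peak-hour use can look like: a fare multiplier that grows with the ratio of open ride requests to available cabs, capped so that customers are not priced out. The function name `surge_multiplier` and the cap of 2.0 are assumptions made for illustration.

```python
def surge_multiplier(open_requests: int, available_cabs: int,
                     cap: float = 2.0) -> float:
    """Hypothetical surge rule (not Ola's actual model).

    Returns 1.0 (no surge) while supply covers demand, grows linearly with
    the demand/supply ratio, and is clamped at `cap` to retain customers.
    """
    if available_cabs <= 0:
        return cap  # no cabs free at all: charge the maximum multiplier
    ratio = open_requests / available_cabs
    return min(cap, max(1.0, ratio))


print(surge_multiplier(30, 20))  # 1.5
```

The cap expresses the trade-off mentioned in the talk: a higher multiplier raises driver earnings, but an unbounded one would drive riders away.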

Ola uses graph models for ETA prediction, modelling traffic density and optimal routes. They estimate distances and road quality from data collected by their own fleet, through the smartphones provided to their drivers. This gives them a “base-line” ETA, which they then correct using real-time data such as traffic and road-blocks.
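A minimal sketch of the baseline-then-correct idea, assuming a route is a list of road segments with known lengths and historical average speeds, and that live data arrives as a per-segment delay factor. The function names and the delay-factor representation are hypothetical; the talk did not specify how Ola encodes its real-time corrections.

```python
from typing import Sequence


def baseline_eta_min(lengths_km: Sequence[float],
                     typical_kmph: Sequence[float]) -> float:
    """Baseline ETA in minutes: sum of per-segment times at historical speeds."""
    return sum(60.0 * d / v for d, v in zip(lengths_km, typical_kmph))


def corrected_eta_min(lengths_km: Sequence[float],
                      typical_kmph: Sequence[float],
                      live_delay_factor: Sequence[float]) -> float:
    """Correct the baseline ETA with (hypothetical) live delay factors.

    Each segment's baseline time is multiplied by its factor:
    1.0 = normal traffic, 2.0 = twice as slow, float("inf") = road-block.
    """
    return sum(60.0 * d / v * f
               for d, v, f in zip(lengths_km, typical_kmph, live_delay_factor))


route_km = [2.0, 1.0]
usual_kmph = [30.0, 20.0]
print(baseline_eta_min(route_km, usual_kmph))               # 7.0
print(corrected_eta_min(route_km, usual_kmph, [1.0, 2.0]))  # 10.0
```

Keeping the baseline separate from the correction means the expensive historical model can be computed offline, while the cheap multiplication is redone whenever fresh traffic data arrives.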

Thanks to the city-wide sensor data Ola collects from its own fleet, its ETA predictions are extremely accurate, rivaling Google’s, especially in Indian cities. This is primarily because Google’s traffic modelling relies on GPS readings from Android users who opt in to the “My Location” feature in Google Maps, supplemented by cell tower triangulation, whereas Ola gathers data directly from its own cabs.


In Windows, the estimated time remaining for a file transfer is based on the latest transfer rate, which can lead to some funny file dialogs.

From the smartphone sensor readings Ola receives, they can assess road quality and whether a driver drives well or rashly. Frequent customers are matched with well-established drivers who drive well. Ola has also developed different algorithms for different cities, to cover both high-traffic cities like Bangalore and Tier-2 cities like Jaipur, which naturally have different traffic densities.
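The talk did not say which sensor features Ola uses to spot rash driving, so here is a hypothetical sketch of one common approach: count harsh acceleration or braking events in a stream of longitudinal accelerometer readings (in m/s²) and normalise by distance driven. The 3.0 m/s² threshold and the 0.3 events-per-km cut-off are illustrative assumptions, not figures from the talk.

```python
from typing import Sequence

HARSH_THRESHOLD_MPS2 = 3.0  # hypothetical: |accel| above this counts as harsh
RASH_EVENTS_PER_KM = 0.3    # hypothetical cut-off between "good" and "rash"


def count_harsh_events(accel_mps2: Sequence[float],
                       threshold: float = HARSH_THRESHOLD_MPS2) -> int:
    """Count harsh events; a run of consecutive harsh readings is one event."""
    events = 0
    in_event = False
    for a in accel_mps2:
        harsh = abs(a) > threshold  # negative = braking, positive = accelerating
        if harsh and not in_event:
            events += 1  # a new run of harsh readings has started
        in_event = harsh
    return events


def classify_driver(accel_mps2: Sequence[float], distance_km: float) -> str:
    """Label a trip 'rash' if harsh events per km exceed the cut-off."""
    if distance_km <= 0:
        return "unknown"
    rate = count_harsh_events(accel_mps2) / distance_km
    return "rash" if rate > RASH_EVENTS_PER_KM else "good"


readings = [0.5, -3.5, -3.8, 1.0, 4.2, -0.2]  # one hard brake, one hard launch
print(classify_driver(readings, 4.0))   # rash (0.5 events/km)
print(classify_driver(readings, 10.0))  # good (0.2 events/km)
```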

The traffic data also lets Ola estimate future traffic trends, since their machine learning algorithms can learn relationships between separate locations and identify geolocations with similar traffic characteristics. For example, the k-means clustering algorithm can group locations using features such as average speed, number of cabs, or travel time, and the resulting clusters give a reasonable estimate of traffic density.
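As a rough illustration of the clustering step, here is a minimal k-means sketch in plain Python. Each location is described by the three features mentioned in the talk (average speed in km/h, number of cabs, travel time in minutes); the sample values are made up. Because these features have very different scales, they are standardised first. The farthest-first initialisation is a deterministic simplification of the usual random or k-means++ seeding, chosen here only to keep the example reproducible; it is not necessarily how Ola runs k-means.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def standardize(rows: Sequence[Vec]) -> List[Vec]:
    """Z-score each feature so speed, cab count and time weigh equally."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((x - m) / s for x, m, s in zip(r, mus, sds)) for r in rows]


def sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Vec, centroids: Sequence[Vec]) -> int:
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def kmeans(points: Sequence[Vec], k: int,
           max_iter: int = 100) -> Tuple[List[Vec], List[int]]:
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(max(points,
                             key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [nearest(p, centroids) for p in points]
    for _ in range(max_iter):
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(
                tuple(mean(col) for col in zip(*members)) if members else centroids[j]
            )
        centroids = new_centroids
        # Assignment step: reassign every point to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
    return centroids, labels


# (avg speed km/h, cabs in area, travel time min): made-up sample locations
locations = [(12, 40, 25), (10, 45, 28), (14, 38, 24),
             (40, 5, 8), (42, 6, 7), (38, 4, 9)]
_, labels = kmeans(standardize(locations), k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]: congested vs free-flowing locations
```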

Each trip is split into a series of “trip segments”: short, easily executed “mini-trips”, such as a U-turn or a left turn at a signal. These segments are highly predictable, with low variance in traversal times. To find the optimal route from point A to point B, Ola uses a variant of Dijkstra’s algorithm for path planning, along with route-snapping algorithms that correct noisy GPS readings.
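The exact variant of Dijkstra’s algorithm Ola uses was not described, so this is a sketch of the plain algorithm over a graph of trip segments: nodes are intersections, and each edge is one segment (a straight stretch, a U-turn, a left turn at a signal) weighted by its typical traversal time in seconds. The graph, node names and times are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# node -> list of (next node, typical traversal time in seconds)
Graph = Dict[str, List[Tuple[str, float]]]


def fastest_route(graph: Graph, source: str, target: str) -> Tuple[float, List[str]]:
    """Plain Dijkstra: returns (total seconds, node path), or (inf, []) if unreachable."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry; a shorter path was already settled
        visited.add(u)
        if u == target:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])  # walk predecessors back to the source
    return dist[target], path[::-1]


segments: Graph = {
    "A": [("B", 60.0), ("C", 30.0)],  # straight stretches
    "C": [("B", 20.0), ("D", 150.0)],  # C -> B is a U-turn
    "B": [("D", 90.0)],               # left turn at a signal
}
print(fastest_route(segments, "A", "D"))  # (140.0, ['A', 'C', 'B', 'D'])
```

Here the detour through C, including the U-turn, beats the direct route A → B → D because its segments are quicker in total (140 s versus 150 s).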

Ola is using the power of crowdsourcing, intelligently analyzing the data it collects and using it to build a better user experience. I hope you liked my summary of Mr. Swaminathan’s talk about data science at Ola Cabs. Bouquets or brickbats are welcome in the comments below!


  1. Article on Google Crowdsourcing
  2. The Google Blog
  3. XKCD
