Driving digital transformation with machine learning


2 December 2015
Kevin Lee of GrabTaxi

Just-in-time analysis is emerging as the cornerstone of digital business transformation. With things happening at Internet pace, customer and business priorities are changing quickly and continuously. Businesses are no longer thinking about applications the way they did in the past, said Rod Smith, IBM fellow and vice president of the IBM Emerging Internet Technologies organisation. Instead of specifying requirements and building applications to meet them, they are now taking a more open-ended approach and seeing where the projects lead them. The focus is now on the pace at which they need to change, and this is transformative for businesses today.

Speaking at the inaugural Strata + Hadoop World 2015 in Singapore, Smith said machine learning will play an important role in this transformation. Today, complexity is a barrier to digital transformation. Multiple lines of business access the same backend data. The great user experiences that sell apps require a lot of client-side interactivity. And there is a need to add speed into the mix: products now need to be created within days or weeks.

Machine learning can help address these issues and drive the transformation. But it has to be tailored to specific businesses and business needs, said Smith. It is not enough to have machine learning; the learnings have to be turned into a solution that is useful to a person on a mobile device at a particular point in time.

And this is what GrabTaxi is trying to do. GrabTaxi is a platform that matches drivers and passengers, explained Kevin Lee, vice president of Data and Growth at GrabTaxi. The platform knows where the passenger is and where the drivers in that area are, and it broadcasts the intended trip to these drivers. Whoever responds first gets the booking.

However, things do not always flow that smoothly. Besides the taxis that are available, the broadcast also goes out to taxis that are not. For example, they may be on call, hired, or changing shift. “We are not linked to the system that is on board these taxis. Passengers doing a street hail can see the status of the taxi, but we don’t actually know.”

The driver has to process the information that GrabTaxi sends to him through the app: for example, the pickup point, the drop-off point, whether there are any discounts, and whether the passenger is paying by cash or by credit card. All of this comes on top of deciding whether to respond to a street hail, whether to grab lunch, or whether to change shift.

So what happens is that drivers sometimes bid for a job without looking at the details, and then cancel if they can immediately pick up a street hail or if the job turns out to be far away. The result: unhappy customers. And drivers who do not get the booking may be inclined to make fast, frivolous bids the next time round.

To address this, GrabTaxi decided to build a predictive model of each driver's bidding probability using historical data. This bidding probability is combined with real-time information, such as the driver's distance to the passenger, and a numerical score is assigned to each driver. The scores are then ranked to assign jobs. This allows GrabTaxi to direct jobs to a much smaller pool of drivers who are likely to be available, instead of trying to introduce behavioural changes to address the issue.
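The article does not describe how the bidding probability and distance are combined, so the following is only a minimal sketch of the score-and-rank idea, assuming a hypothetical `Driver` record whose `bid_probability` already comes from some historical model, and an illustrative exponential distance decay (the 2 km `distance_scale_km` is an assumption, not GrabTaxi's value).

```python
import math
from dataclasses import dataclass


@dataclass
class Driver:
    driver_id: str
    bid_probability: float  # hypothetical output of a historical bidding model
    distance_km: float      # real-time distance to the passenger


def score(driver: Driver, distance_scale_km: float = 2.0) -> float:
    """Combine bidding probability with proximity into one number.

    Assumption: proximity decays exponentially with distance; this is an
    illustrative choice, not the formula used by GrabTaxi.
    """
    proximity = math.exp(-driver.distance_km / distance_scale_km)
    return driver.bid_probability * proximity


def rank_drivers(drivers: list[Driver]) -> list[Driver]:
    """Return drivers ordered from highest to lowest score."""
    return sorted(drivers, key=score, reverse=True)


if __name__ == "__main__":
    pool = [
        Driver("a", bid_probability=0.9, distance_km=4.0),
        Driver("b", bid_probability=0.6, distance_km=0.5),
        Driver("c", bid_probability=0.2, distance_km=0.2),
    ]
    for d in rank_drivers(pool):
        print(d.driver_id, round(score(d), 3))
```

In this toy pool, driver "a" is the most likely to bid but is 4 km away, so the nearby driver "b" ranks first; a multiplicative score lets either factor veto a poor candidate.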

The company started by looking at about 50 features covering parameters such as driver profile (single shift, split shift), vehicle (location, speed, type, age), performance (volume, frequency of jobs, type of jobs), bookings (pickup/drop-off, time of day), payment (cash/credit), weather, and more. This was narrowed down to 10 features using dimensionality reduction, a process for reducing the number of random variables under consideration, and finally from 10 to five using exhaustive search, in which every possible combination is tested. Conditional inference tree analysis was used to construct ROC (receiver operating characteristic) curves to compare the classification performance of seven machine learning models.
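The two selection steps described above can be illustrated with a short sketch: an exhaustive search over all k-feature subsets (10 choose 5 gives 252 candidates) and the area under the ROC curve as a comparison metric. The `evaluate` callback is hypothetical; in practice it would train a model on the given subset and return a validation metric. The AUC here uses the rank-sum (Mann-Whitney) formulation rather than plotting the curve, and nothing in it reflects GrabTaxi's actual pipeline.

```python
from itertools import combinations
from typing import Callable, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen
    positive example scores higher than a randomly chosen negative one,
    with ties counted as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def exhaustive_search(
    features: Sequence[str],
    k: int,
    evaluate: Callable[[tuple[str, ...]], float],
) -> tuple[tuple[str, ...], float]:
    """Try every k-sized subset of `features` and keep the best-scoring one.

    `evaluate` is a hypothetical callback that should fit a model on the
    subset and return a validation metric such as ROC AUC.
    """
    best_subset: tuple[str, ...] = ()
    best_score = float("-inf")
    for subset in combinations(features, k):
        s = evaluate(subset)
        if s > best_score:
            best_subset, best_score = subset, s
    return best_subset, best_score
```

Exhaustive search is feasible here only because the earlier dimensionality-reduction step shrinks the space: 252 subsets of 10 features is cheap, whereas every 5-feature subset of 50 features would be over two million.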

The resulting model enabled GrabTaxi to work out how many drivers it needed to broadcast a job to in order to have a high probability that the job would be fulfilled. This let it broadcast jobs to a smaller set of drivers and minimise spam. The result: a higher driver win rate, a lower cancellation rate, and happier customers.
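One simple way to turn per-driver bidding probabilities into such a guarantee is sketched below, assuming (as a simplification the article does not state) that drivers decide independently. If driver i bids with probability p_i, the chance that nobody bids is the product of (1 - p_i), so the job is fulfilled with probability one minus that product. The function names and the 0.95 default target are hypothetical.

```python
import math
from typing import Optional, Sequence


def fulfilment_probability(bid_probs: Sequence[float]) -> float:
    """P(at least one driver bids), assuming independent decisions."""
    none_bid = math.prod(1.0 - p for p in bid_probs)
    return 1.0 - none_bid


def drivers_to_broadcast(bid_probs: Sequence[float], target: float = 0.95) -> Optional[int]:
    """Smallest number of most-likely drivers whose combined fulfilment
    probability reaches `target`, or None if even the whole pool falls short."""
    ranked = sorted(bid_probs, reverse=True)
    none_bid = 1.0
    for k, p in enumerate(ranked, start=1):
        none_bid *= 1.0 - p
        if 1.0 - none_bid >= target:
            return k
    return None


if __name__ == "__main__":
    probs = [0.7, 0.6, 0.3, 0.2]
    print(drivers_to_broadcast(probs, target=0.9))
```

With these example probabilities, the top two drivers give 0.88 and the top three give about 0.916, so three drivers are enough for a 90% target. Real driver decisions are probably correlated (for example, during a rainstorm), so an independence-based estimate like this would tend to be optimistic.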

Predicting driver availability is just one of several areas that GrabTaxi is working on using machine learning and data science. Others include working out optimal prices, improving the management of drivers and passengers, and estimating arrival times.

“We strongly believe that machine learning on the unique data set that we have is what is going to drive a lot of competitive advantage in this space,” said Lee.