“A satisfied customer is the best business strategy of all” – Michael LeBoeuf
In this two-part series, we explore how we engineered the Delivery Partner ETA (DP-ETA) predictive model to navigate roadblocks like sparse road mapping and dense networks of inner roads in smaller towns and cities.
Part 2 – How we built and deployed a DP-ETA model
ETA estimation, at its core, is a regression modelling problem. Data exploration showed that the target follows a non-negative, non-normal probability distribution. For modelling structured data, gradient-boosted trees have been shown to perform as well as, if not better than, neural-network-based solutions. We examined popular gradient-boosting frameworks like XGBoost, LightGBM, and CatBoost, and chose LightGBM for its faster training, accuracy, and flexibility.
Never underestimate the power of statistics –
We leverage the power of statistics after applying carefully designed data-filtering procedures based on speed, location, errors, client interactions, and several other factors. When the data is split by city, we observe a non-negative, right-skewed distribution with a long tail. The curve is best fitted by a Gamma-Poisson distribution. We evaluated a Gamma distribution with L1 loss using basic temporal and geospatial features, which offered improved accuracy but increased the model size.
For a Tweedie variance power parameter between 1 and 2 (configurable and iterated during training), the Tweedie distribution behaves as a compound Poisson-Gamma distribution. By capturing long-tail events, the Tweedie objective with a log link and L1 as the evaluation metric gave a superior model fit with a smaller memory footprint.
Divide and conquer –
Another method we employed was to create a separate model for each city, to better capture that city’s key traits. Similarly, the forecast was split into two parts: one from pickup to the drop-zone geofence, and the other from drop time to handover.
Hyper-parameter tuning –
The increased number of models created a need for an auto-tuner. After rigorous testing, we settled on LightGBMTuner from the Optuna hyperparameter optimisation framework. It tunes key parameters such as the L1/L2 regularisation terms, bagging frequency, and the number of leaves, which control the complexity and depth of the trees and prevent overfitting.
Other tuning steps included the Tweedie variance power, boosting type, learning rate, minimum boosting rounds, early_stopping, and max_bin, which define the distribution characteristics, depth, number of trees, number of feature bins, etc. in the model.
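The parameters listed above can be grouped by what they control; the ranges in this sketch are hypothetical values, not the tuned production settings:

```python
# Illustrative search space (hypothetical ranges) for the parameters tuned,
# grouped by what each controls in the model.
search_space = {
    # distribution characteristics
    "tweedie_variance_power": (1.1, 1.9),
    # tree complexity and depth (regularisation against overfitting)
    "num_leaves": (16, 256),
    "lambda_l1": (1e-8, 10.0),
    "lambda_l2": (1e-8, 10.0),
    "bagging_freq": (1, 7),
    # number of trees and convergence
    "learning_rate": (0.01, 0.3),
    "num_boost_round": (100, 2000),
    "early_stopping_rounds": 50,
    # number of bins per feature
    "max_bin": (63, 511),
}
low, high = search_space["tweedie_variance_power"]
assert 1.0 < low < high < 2.0  # valid compound Poisson-Gamma range
```

LightGBMTuner steps through groups like these in a fixed order rather than searching the full joint space, which is what keeps the tuning tractable across many per-city models.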
Feature Engineering –
For the base model, we were prudent in choosing features and kept them to a minimum to ensure our results were meaningful. To preserve latency, the accuracy-improvement threshold for accepting a feature was kept high. Key features included geospatial features such as the shipping and drop-zone coordinates, time-domain features, and restaurant features. Derived features included a polar-coordinate representation with the origin at the restaurant, the aerial distance as the radius, and the bearing between the shipping and drop zones as the angle.
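The polar-coordinate transform can be sketched with standard geodesy formulas; the function name and the use of haversine distance for the "aerial distance" are assumptions for illustration:

```python
import math

def polar_features(rest_lat, rest_lon, drop_lat, drop_lon):
    """Polar coordinates of the drop point with the restaurant as origin:
    radius = aerial (haversine) distance, angle = initial bearing."""
    r = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(rest_lat), math.radians(drop_lat)
    dphi = math.radians(drop_lat - rest_lat)
    dlmb = math.radians(drop_lon - rest_lon)
    # haversine formula for great-circle (aerial) distance
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    radius_km = 2 * r * math.asin(math.sqrt(a))
    # initial bearing from restaurant to drop point, degrees clockwise from north
    x = math.sin(dlmb) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    bearing_deg = (math.degrees(math.atan2(x, y)) + 360.0) % 360.0
    return radius_km, bearing_deg

# Example: two points roughly 5 km apart in Bengaluru
radius, bearing = polar_features(12.9716, 77.5946, 12.9352, 77.6245)
```

Two scalar features (radius, bearing) replace two raw coordinate pairs, which keeps the feature count low while preserving the geometry of the delivery.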
To speed up the serving process, we ran a few experiments, the key ones being the following –
To improve model performance in real-time serving, we used the Microsoft Hummingbird framework, which converts the model into tensor computations that can then be served via ONNX, PyTorch, or Apache TVM. Although these optimisations reduced model size, the native LightGBM booster handled higher throughput per instance with comparable latencies in CPU-based deployments.
ML Serving Design
Data scientists develop ML models for different business use cases and schedule automated training for regular updates of the model. The model gets logged in the S3 artifact store. The prediction cluster loads models from the artifact store and serves the incoming requests.
Upgrading the ML framework –
For our ML platform, we switched from MLflow serving to a FastAPI-based serving platform, which cut our framework latency in half. Since Python is widely used across our machine learning and data science teams, FastAPI’s Python-based syntax aided our objective of democratising model deployment. It also enabled us to serve an ensemble of models, and the lower latency allowed us to grow the model’s depth to enhance predictions. We also tested FastAPI-based serving of TVM, PyTorch, and ONNX models, which showed similar improvements. The following graphs show the performance improvements with FastAPI for the same traffic and model.
With MLflow serving:
With FastAPI serving:
With this experiment, we were able to see a 3x improvement in throughput per pod and a 2x improvement in latency. From our observations, MLflow takes a significantly longer time (7 – 8 milliseconds in our use case) to serve any request as compared to FastAPI (2 – 3 milliseconds in our use case).
All this led to improvements in mean error, absolute error, R2 score, and standard deviation using a limited number of features and distribution analysis. The largest gain was in accuracy, without sacrificing much on compliance. As previously noted, optimising the model for compliance without compromising accuracy is more beneficial in our business case. We found tuneable scaling factors that translate into a significant increase in compliance metrics with only a marginal loss in accuracy.
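The accuracy-versus-compliance trade-off behind those scaling factors can be illustrated with toy numbers (the data, factor, and metric definitions here are hypothetical):

```python
# Hypothetical illustration: inflating raw predictions with a tuneable
# scaling factor trades a little accuracy (MAE) for higher compliance
# (fraction of orders delivered within the predicted ETA).
def mae(preds, actuals):
    return sum(abs(p - a) for p, a in zip(preds, actuals)) / len(preds)

def compliance(preds, actuals):
    return sum(a <= p for p, a in zip(preds, actuals)) / len(preds)

actuals = [20, 25, 30, 35, 40, 45]       # actual delivery times (minutes)
raw = [22, 24, 28, 34, 38, 44]           # near-unbiased model output
scaled = [p * 1.1 for p in raw]          # tuneable scaling factor of 1.1
```

On this toy data the raw predictions are more accurate but breach the ETA on most orders, while the scaled predictions cover every order at a small cost in MAE, which is exactly the direction the business case favours.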
The overall cost of serving millions of calls was less than $10 (approx. INR 754.57) per day for the base model with limited features. Both CPU and memory costs were reduced to half as compared to our initial stable deployments.
Improving Systems and Operations –
The investigation into the causes of DP-ETA delays also pointed to improvements in processes and systems such as the DP application, DP assignment, restaurant dispatch, and location databases.
We’ve been creating relevant nudges for other systems using process-mining concepts. SHAP was used to keep the model easy to understand and to verify that each attribute played a role in determining the forecast; it also aided in trimming features. Another use case for this level of explainability is determining the source of a delay, which may help us develop better nudges for our systems. It can also assist us in performing an inverse search to anticipate better restaurant locations.
We trained models with features from several domains, such as the DP domain and the Location domain, and then analysed them using model explainability with process mining principles to derive relevant inferences.
- Delivery Partner App – Enhancements to the UI/UX are critical in preventing many sources of confusion for DPs, particularly those who are new to the gig economy. On the app, the emphasis is on higher usage of navigation by providing seamless routing to it.
- Onboarding – We reinforced the usage of bike-mounted mobile holders by adding them to our onboarding kits. Initial training also included informative videos on how to use the DP app, as well as tips and tricks for the Google Maps navigation system.
- Delivery Partner Assignment System – As a byproduct of the feature engineering process, an equation-based ultra-fast ETA was created based on DP speed and medal features. This increased the compliance of the DP assignment system’s predictions by 4 percent and enhanced DP assignment and dispatch.
- Location – When the customer delivery location is snapped in the navigation system, it tends to be snapped on the nearest road rather than the road’s entrance point. This makes last-mile delivery navigation confusing for delivery partners.
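The equation-based ultra-fast ETA from the assignment-system bullet above can be sketched as follows; the tier names, speed values, and handover buffer are all assumptions for illustration:

```python
# Hypothetical sketch of an equation-based ultra-fast ETA:
# aerial distance divided by an assumed speed per DP medal tier,
# plus a fixed handover buffer.
MEDAL_SPEED_KMPH = {"bronze": 16.0, "silver": 18.0, "gold": 20.0}  # assumed tiers

def quick_eta_minutes(aerial_distance_km, medal, handover_buffer_min=4.0):
    speed = MEDAL_SPEED_KMPH.get(medal, 16.0)  # fall back to the slowest tier
    return 60.0 * aerial_distance_km / speed + handover_buffer_min

eta = quick_eta_minutes(3.0, "gold")  # 60 * 3 / 20 + 4 = 13.0 minutes
```

A closed-form estimate like this involves no model inference at all, which is what makes it cheap enough for the high-call-volume assignment and dispatch path.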
Future Scope –
The successful deployment of the base model in production gave us headroom to improve in all directions. Incremental gains will come from growing the tree depths and feeding a larger number of features into the model. These advancements also made it possible to use auxiliary models with specific goals, such as a monsoon model for rains or a quantile model for high-stress deliveries.
We’ll be working on upgrading the existing basic model into a larger ensemble with even higher accuracies in the coming weeks to capture these incremental gains.
Disclaimer – We don’t share our ETAs with our DPs or give them a countdown as it might create a sense of on-road urgency leading to rash driving. Additionally, in the above developments, we have increased our average predictions to adjust tolerance for our DPs to be within the predicted ETAs.
This blog was written in collaboration with Shubh Chaurasia and Siddhartha Agnihotri.
If you found this to be an exciting problem to solve and would like to be a part of our engineering team, please reach out to me on LinkedIn.