PredictBench successfully predicts product classifications for one of the world’s largest ecommerce and FMCG companies

by Guido Tapia

in software-engineering

June 2, 2015

As any large FMCG (CPG) company is aware, classifying products correctly is critical to having good analytics capabilities. Any global organisation also knows that this is surprisingly difficult to achieve. Most regions use different classifications, and combining them on a global scale is a non-trivial task. It is so difficult that many organisations simply ignore it and miss out on potential insights from a global view of product sales.

The Otto Group is a German ecommerce company that sells a tremendous volume of goods, and it recently released a dataset to address this exact issue. Given details of over 200,000 products, the data scientist’s job was to correctly distinguish between Otto’s main product categories.

We used PredictBench to tackle this job and achieved very high classification accuracy. The PredictBench team came within 0.021 points of the top-scoring solution, which was no mean feat: our final position in this challenge was 16th out of 3,514 teams from around the world.

On this project we teamed up with American data scientist Walter Reade, who brought invaluable experience and knowledge to the PredictBench team.

Working together with Walter, we were able to put together an ensemble of hundreds of models, including linear models, neural nets, deep convolutional nets, tree-based ensembles (random forests and gradient boosted trees) and many others. The sheer scale of the final solution shows how complex this problem was and the level of skill required to achieve these results.
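For readers unfamiliar with ensembling, the basic idea can be illustrated with a minimal sketch. This is not the PredictBench solution itself: it assumes the simplest possible blend, a weighted average of the per-category probabilities produced by each model, and the function names, weights and probability values below are all hypothetical.

```python
from typing import List, Sequence

# One row per product, one column per product category.
Probs = List[List[float]]


def blend_probabilities(model_outputs: Sequence[Probs],
                        weights: Sequence[float]) -> Probs:
    """Weighted average of per-category probabilities from several models."""
    if not model_outputs or len(model_outputs) != len(weights):
        raise ValueError("need at least one model and one weight per model")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_rows = len(model_outputs[0])
    n_classes = len(model_outputs[0][0])
    blended = [[0.0] * n_classes for _ in range(n_rows)]
    for probs, weight in zip(model_outputs, weights):
        for i, row in enumerate(probs):
            for j, p in enumerate(row):
                # Normalise weights so each blended row still sums to 1.
                blended[i][j] += (weight / total) * p
    return blended


def predict_labels(blended: Probs) -> List[int]:
    """Pick the index of the most probable category for each product."""
    return [max(range(len(row)), key=row.__getitem__) for row in blended]


# Hypothetical outputs of three models for two products and three categories.
linear_model = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
neural_net = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
boosted_trees = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]

# Hypothetical weights: the two stronger models count double.
blended = blend_probabilities(
    [linear_model, neural_net, boosted_trees], [1.0, 2.0, 2.0]
)
labels = predict_labels(blended)
print(labels)
```

In this toy example the linear model alone would place the second product in the middle category, while the blend follows the other two models. Averaging over many diverse models in this way tends to smooth out the mistakes of any single one, which is one reason large ensembles are common in this kind of challenge.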

Working with the Otto data and teaming up with Walter was a great boon to the PredictBench team, and we hope to replicate this in the near future.