Sales Forecasting Using Data Mining

Accurately forecasting sales volumes is vital for retail companies. Under normal conditions, sales forecasting is fairly straightforward. An important exception, however, is when sales are affected by marketing campaigns or promotions. Specific tasks addressed in this project include:

  • Combined effects of marketing and promotions. Promotional effects typically concern price elasticity, so the problem is to find the optimal price for maximizing revenue.
  • Increased sales due to marketing may be divided into accelerated sales, brand switching and incremental sales. The problem is to estimate these proportions correctly.
  • Cross selling, i.e. where the purchase of one product initiates other purchases, is of course quite beneficial. The problem here is to understand cross selling patterns. 
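
As an illustration of the pricing task in the first item, the sketch below assumes a linear demand curve, quantity = intercept - slope * price. This functional form, the function names and the numbers are assumptions made for the example and are not taken from the project. Under this assumption, revenue is maximized at price = intercept / (2 * slope), which is the point where the price elasticity of demand equals -1.

```python
def revenue(price, intercept, slope):
    """Revenue under an assumed linear demand curve q = intercept - slope * price."""
    quantity = max(intercept - slope * price, 0.0)  # demand cannot be negative
    return price * quantity


def revenue_maximizing_price(intercept, slope):
    """Price that maximizes price * (intercept - slope * price)."""
    if slope <= 0:
        raise ValueError("slope must be positive for a downward-sloping demand curve")
    return intercept / (2.0 * slope)


def point_elasticity(price, intercept, slope):
    """Price elasticity of demand (dq/dp) * (p/q) for the linear curve."""
    quantity = intercept - slope * price
    if quantity <= 0:
        raise ValueError("elasticity is undefined when demand is zero or negative")
    return -slope * price / quantity


# Hypothetical demand parameters, chosen only for illustration.
best_price = revenue_maximizing_price(intercept=100.0, slope=2.0)
best_revenue = revenue(best_price, intercept=100.0, slope=2.0)
elasticity_at_best = point_elasticity(best_price, intercept=100.0, slope=2.0)
```

In practice the demand curve would have to be estimated from sales data, and a promotion may also shift demand for other products, which this single-product sketch ignores.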

We will use generic data mining techniques to forecast actual sales of specific products during specific time frames. Technically, this is referred to as predictive regression or, if the problem has a temporal aspect, time series forecasting.
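
To show how a time series problem can be recast as predictive regression, the following minimal sketch builds lagged inputs from a sales series and fits a one-lag linear model by ordinary least squares. The helper names (make_lagged, fit_one_lag, forecast_next) and the toy series are hypothetical; the project does not specify which models or features are used.

```python
def make_lagged(series, n_lags):
    """Turn a series into (inputs, targets): each input holds the n_lags previous values."""
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        rows.append(series[t - n_lags:t])
        targets.append(series[t])
    return rows, targets


def fit_one_lag(series):
    """Least-squares fit of y[t] = intercept + slope * y[t-1]."""
    rows, targets = make_lagged(series, 1)
    xs = [row[0] for row in rows]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(targets) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, targets))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast_next(series, intercept, slope):
    """One-step-ahead forecast from the last observed value."""
    return intercept + slope * series[-1]


# Toy series that follows y[t] = 2 + 0.5 * y[t-1] exactly (illustrative only).
sales = [10.0]
for _ in range(6):
    sales.append(2.0 + 0.5 * sales[-1])

intercept, slope = fit_one_lag(sales)
next_value = forecast_next(sales, intercept, slope)
```

Campaign or promotion indicators could be added as further input columns next to the lagged values, which is where the regression view becomes useful for the exception case described above.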


The main purpose of this project is to evaluate the use of data mining techniques for sales forecasting in the retail domain. The overall goal is to produce reliable models while enabling market analysts to understand the reasons behind model predictions; that is, models must be both accurate and comprehensible. The project will also investigate techniques for estimating uncertainties in sales forecasting, drawing inspiration from the weather forecasting domain.

Research questions

This project will focus on the following four tasks: 

  • Optimizing ensemble accuracy: We will investigate how diversity measures can be utilized to improve predictive performance. In addition, we will look into the importance of the score function (i.e. the optimization criterion used for the modeling) to analyse how this choice affects the predictive performance.
  • Development of robust rule extraction methods: We will investigate methods that are more informative than simple approaches (e.g. sensitivity analysis or variable importance) but also more robust than approximating the opaque model with a single decision tree or rule set.
  • Development of algorithms for estimating ensemble forecasting uncertainties: We will investigate new methods for estimating ensemble uncertainty. Suggested methods should be evaluated on how well they rank instances, typically using the AUC metric.
  • Prototyping software: Suggested algorithms should be evaluated by analysts and managers at ICA using prototype software. These prototypes are therefore central to the entire project.
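
The ensemble-related tasks above can be illustrated with a small sketch. It makes two assumptions that are not stated in the project description: the spread of the ensemble members' predictions is used as the uncertainty score, and an instance counts as a "high error" case when its absolute error exceeds a chosen threshold, so that AUC can measure how well the score ranks instances. The sketch also checks the standard ambiguity decomposition for squared error, in which the ensemble error equals the average member error minus the members' variance around the ensemble mean, as one example of a diversity measure. All names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def ensemble_summary(member_predictions):
    """Ensemble mean and member spread (population standard deviation) for one instance."""
    return mean(member_predictions), pstdev(member_predictions)


def ambiguity_decomposition(member_predictions, target):
    """Return (ensemble squared error, average member squared error, ambiguity)."""
    ens = mean(member_predictions)
    avg_member_error = mean([(p - target) ** 2 for p in member_predictions])
    ambiguity = mean([(p - ens) ** 2 for p in member_predictions])
    return (ens - target) ** 2, avg_member_error, ambiguity


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Three instances, each predicted by three ensemble members (made-up numbers).
predictions = [[10.0, 12.0, 14.0], [20.0, 20.0, 20.0], [5.0, 15.0, 25.0]]
targets = [15.0, 20.0, 10.0]
error_threshold = 2.0  # assumed cut-off for a "high error" instance

summaries = [ensemble_summary(p) for p in predictions]
spreads = [spread for _, spread in summaries]
high_error = [abs(m - t) > error_threshold for (m, _), t in zip(summaries, targets)]
ranking_quality = auc(spreads, high_error)
```

An AUC close to 1 would mean that instances with a large member spread tend to be the ones with large forecast errors; a value near 0.5 would mean the spread carries little information about the error.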