Overview

As more companies rely on online sales, demand forecasting has become more critical than ever. Often, demand forecasting is done using prior human knowledge and intuition, or at best in Excel spreadsheets, where past sales are analyzed and predictions for future sales are made. This process can be improved by combining multiple data sources (e.g. weather data, Twitter data, holidays, sporting events, etc.) into a data lake and using machine learning to create forecasting models on top of all of that data. This blog article discusses how Azure, with its data lake and automated machine learning capabilities, can achieve just that.

Details

Step 0 – Data Acquisition

Data can be acquired from any on-premises or cloud data source and moved into Azure Data Lake Storage Gen2, where tools like Spark and pandas can be used to combine, analyze, and prepare the data for machine learning training.
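As a minimal sketch of that combination step, the pandas snippet below joins a hypothetical weekly sales extract with a hypothetical weather extract on their shared date key. The column names and values are illustrative only, not from the original dataset:

```python
import pandas as pd

# Hypothetical sales and weather extracts already landed in the data lake.
sales = pd.DataFrame({
    'WeekStarting': ['2020-06-01', '2020-06-08', '2020-06-15'],
    'Store': [1001, 1001, 1001],
    'Quantity': [120, 95, 143],
})
weather = pd.DataFrame({
    'WeekStarting': ['2020-06-01', '2020-06-08', '2020-06-15'],
    'AvgTempF': [68.2, 71.5, 74.0],
})

# Join the sources on the shared date key so each sales row
# carries its weather features into training.
combined = sales.merge(weather, on='WeekStarting', how='left')
print(combined.shape)  # (3, 4)
```

The same join could be expressed in Spark for larger datasets; the point is that disparate sources end up in one training table.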

Step 1 – Machine Learning Model Training

Once the data is ready for machine learning, Azure Machine Learning service provides AutoML, which automatically builds and tunes the forecasting models. There are many configurations the end user can set, such as featurizing the data further, selecting which algorithms AutoML can try, and turning on model explainability to understand why a specific data point was scored a certain way. Below is an example of an AutoML configuration:

from azureml.core.workspace import Workspace
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig
from azureml.automl.core.forecasting_parameters import ForecastingParameters

import logging

target_column_name = 'Quantity'
dateColumn = 'WeekStarting'

# data_train_ds is assumed to be the training dataset prepared in Step 0;
# compute_target is an existing Azure ML compute cluster.
training_data = data_train_ds

# Forecast 24 periods ahead, with one time series per Store/Brand pair.
forecastingParams = ForecastingParameters(
    time_column_name=dateColumn,
    forecast_horizon=24,
    time_series_id_column_names=['Store', 'Brand'])

automl_config = AutoMLConfig(task='forecasting',
                             path='./project',
                             debug_log='automl_debuglog.log',
                             primary_metric='r2_score',
                             iteration_timeout_minutes=20,
                             experiment_timeout_hours=1,
                             featurization='auto',
                             max_concurrent_iterations=15,
                             max_cores_per_iteration=-1,
                             enable_dnn=False,
                             enable_early_stopping=True,
                             n_cross_validations=3,
                             verbosity=logging.INFO,
                             compute_target=compute_target,
                             training_data=training_data,
                             label_column_name=target_column_name,
                             forecasting_parameters=forecastingParams,
                             model_explainability=True)

# Submit the configuration as an experiment run in the workspace
# (the experiment name here is arbitrary).
ws = Workspace.from_config()
experiment = Experiment(ws, 'oj-forecasting')
run = experiment.submit(automl_config, show_output=True)

Step 2 – Model Registration

Once AutoML training completes, a data scientist can analyze all of the generated models along with their metrics to determine which model is the best candidate to deploy to production. Below is an example of the list of models AutoML generated in one experiment run.

To help the data scientist explain the model, "view explanation" can be used.
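Conceptually, picking the best model is just ranking the experiment's child runs by the primary metric. A minimal pure-Python sketch, using hypothetical (algorithm, r2_score) pairs in place of the real run metrics:

```python
# Hypothetical (algorithm, r2_score) pairs as AutoML might report them;
# the real values come from the experiment's child runs.
candidates = [
    ('VotingEnsemble', 0.92),
    ('LightGBM', 0.89),
    ('ElasticNet', 0.81),
    ('Arimax', 0.85),
]

# r2_score was configured as the primary metric, so higher is better.
best_algo, best_r2 = max(candidates, key=lambda pair: pair[1])
print(best_algo)  # VotingEnsemble
```

In practice the data scientist may weigh other factors too, such as training cost, inference latency, or explainability, rather than the primary metric alone.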

Step 3 – Model Handoff/Deployment

At this point, the data scientist has determined the best model and is ready to deploy it. Deployment can target either real-time inferencing via Azure Kubernetes Service or batch inferencing, either by using Azure Machine Learning pipelines or by embedding the model's .pkl file into an external application.
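For the embedded-application path, the external application simply unpickles the model file and calls its predict method. The sketch below uses a hypothetical stand-in class rather than a real AutoML artifact, just to show the mechanics:

```python
import pickle

# Stand-in for a trained forecasting model; the real artifact would be
# the .pkl file downloaded from the Azure ML model registry.
class NaiveForecaster:
    def __init__(self, last_value):
        self.last_value = last_value

    def predict(self, horizon):
        # Naive forecast: repeat the last observed value.
        return [self.last_value] * horizon

with open('model.pkl', 'wb') as f:
    pickle.dump(NaiveForecaster(last_value=120), f)

# Inside the external application: load the pickled model and score a batch.
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

print(model.predict(4))  # [120, 120, 120, 120]
```

A real AutoML model carries its own featurization pipeline inside the pickle, so the application environment must have matching library versions installed.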

Step 4 – Perform Predictions for Future Data

As forecasts are needed (e.g. weekly or monthly), the model is invoked and the forecast results are saved to blob storage, from where they can be copied into any system that needs them.

Step 5 – Visualize Results

A visualization tool such as Power BI can then consume the results to show a business analyst the current demand, so the supply chain can be adjusted as needed. Example trend lines showing forecasts for three orange juice brands are shown below.

GitHub Code Repository

https://github.com/mlonazure/AzureMachineLearning/tree/master/AutoML%20Forecasting

Summary

In conclusion, Azure Data Lake Storage Gen2 allows a company to store disparate datasets and combine them for training forecasting machine learning models. Training these models is simplified and made more robust through Azure Machine Learning's AutoML. Once a model is trained, it can be deployed and used to make future demand predictions.
