Description

Machine Learning (ML) Pipelines are used to automate the ML training process (Feature Engineering, Train Model, Register Model, Deploy Model) and to perform batch inferencing. Note that real-time inferencing is done through an AKS endpoint and Azure Functions; see How and Where to Deploy.

In the Azure ML SDK, the Pipeline Class is used to create pipelines, and the ParallelRunStep Class is used for batch inference. A full list of pipeline steps is available in the Steps Package; PythonScriptStep and ParallelRunStep, both discussed on this page, are among the most common.
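For the batch inference case, a ParallelRunStep calls an entry script that exposes two functions: init(), run once per worker process, and run(mini_batch), run once per mini-batch. The sketch below shows that contract only. The "model" is a stand-in that scores a file by the length of its name, and the input is assumed to be a FileDataset, so each mini-batch arrives as a list of file paths.

```python
import os

# Populated once per worker process by init().
model = None


def _placeholder_model(file_path):
    """Hypothetical stand-in for a real model: scores a file by its name length."""
    return len(os.path.basename(file_path))


def init():
    """Called once per worker process before any mini-batch is processed."""
    global model
    # A real entry script would load a registered model here instead.
    model = _placeholder_model


def run(mini_batch):
    """Called once per mini-batch.

    Assumes a FileDataset input, so mini_batch is a list of file paths.
    Returns one result row per input file.
    """
    results = []
    for file_path in mini_batch:
        score = model(file_path)
        results.append(f"{os.path.basename(file_path)},{score}")
    return results
```

Returning one element per input item lets the step account for every file it processed; a real script would replace the placeholder with model loading and scoring.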

Note: Each Step in a Pipeline runs on its own Compute Target, which gives you the flexibility of using multiple training clusters, each configured for the Step being performed. For example, a PythonScriptStep that runs a TensorFlow script and requires GPUs can use a GPU cluster, while a second PythonScriptStep that runs a much simpler Python script can use a smaller cluster, perhaps even a single node.
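As a minimal sketch of the per-step compute idea, assuming the Azure ML SDK v1 (the azureml-pipeline package) and two compute targets that already exist in the workspace: the script names train.py and register.py, the step names, and the ./scripts folder are hypothetical placeholders.

```python
def build_training_steps(gpu_target, cpu_target, source_dir="./scripts"):
    """Return two PythonScriptSteps, each bound to a different compute target.

    gpu_target / cpu_target are existing compute targets (or their names).
    Script and step names below are hypothetical.
    """
    # Imported here so the module can be loaded without the SDK installed.
    from azureml.pipeline.steps import PythonScriptStep

    # Heavy step: TensorFlow training on the GPU cluster.
    train_step = PythonScriptStep(
        name="train_model",
        script_name="train.py",
        source_directory=source_dir,
        compute_target=gpu_target,
        allow_reuse=False,
    )

    # Light step: a simple script on a small CPU cluster.
    register_step = PythonScriptStep(
        name="register_model",
        script_name="register.py",
        source_directory=source_dir,
        compute_target=cpu_target,
    )

    # No data dependency links the two steps, so order them explicitly.
    register_step.run_after(train_step)
    return [train_step, register_step]
```

The returned list can be passed as the steps argument when constructing a Pipeline.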

Major Steps for Creating a Training Pipeline

  1. Construct the Pipeline by defining multiple steps, each with an appropriate compute target to run on.
  2. Test the Pipeline through Experiment.submit(your_pipeline).
  3. Publish the Pipeline. (Note: a new Pipeline Id is created every time the Pipeline is published, which is why Step 4 creates an Endpoint that stays constant.)
  4. Publish a PipelineEndpoint.
  5. Automate the Pipeline run through a Pipeline Schedule, Azure DevOps, or Azure Data Factory.
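The five steps above can be sketched roughly as follows, assuming the Azure ML SDK v1 (azureml-core and azureml-pipeline), an existing Workspace object, and a list of already-built steps. The pipeline, endpoint, and schedule names, the descriptions, and the daily recurrence are illustrative assumptions, and only the Pipeline Schedule option from Step 5 is shown.

```python
def run_publish_and_schedule(workspace, steps, name="training-pipeline"):
    """Walk through steps 1-5 for a list of already-built pipeline steps."""
    # Imported here so the module can be loaded without the SDK installed.
    from azureml.core import Experiment
    from azureml.pipeline.core import (
        Pipeline, PipelineEndpoint, Schedule, ScheduleRecurrence,
    )

    # 1. Construct the pipeline from its steps.
    pipeline = Pipeline(workspace=workspace, steps=steps)

    # 2. Test it by submitting it as an experiment run.
    run = Experiment(workspace=workspace, name=name).submit(pipeline)
    run.wait_for_completion(show_output=True)

    # 3. Publish it; every publish yields a new pipeline id.
    published = pipeline.publish(name=name, description="Training pipeline")

    # 4. Publish an endpoint whose name stays constant across versions.
    endpoint = PipelineEndpoint.publish(
        workspace=workspace,
        name=name + "-endpoint",
        pipeline=published,
        description="Stable endpoint for the training pipeline",
    )

    # 5. Automate runs with a schedule (here: once a day, an assumption).
    recurrence = ScheduleRecurrence(frequency="Day", interval=1)
    schedule = Schedule.create(
        workspace=workspace,
        name=name + "-daily",
        pipeline_id=published.id,
        experiment_name=name,
        recurrence=recurrence,
    )
    return published, endpoint, schedule


def publish_new_version(workspace, pipeline, endpoint_name):
    """Re-publish a changed pipeline and make it the endpoint's default."""
    from azureml.pipeline.core import PipelineEndpoint

    published = pipeline.publish(name=endpoint_name + "-version")
    endpoint = PipelineEndpoint.get(workspace=workspace, name=endpoint_name)
    endpoint.add_default(published)
    return published
```

Callers that address the endpoint by name keep working after publish_new_version, even though the underlying pipeline id has changed. The schedule in this sketch is tied to a specific published pipeline id, so it would need to be recreated after re-publishing.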

Walkthrough and Code Samples

Pipelines in Studio
