Summary

When creating reproducible Machine Learning pipelines (see Blog Article: AML Pipelines), the need often arises to transfer data between various data stores. This article shows the architecture for performing such a data transfer and links to a GitHub repository with a code sample that walks through the architecture.

Pipeline Architecture for Transferring Data

[Figure: AML Pipeline View]

Code Walkthrough

Code for the above architecture: AML Data Transfer GitHub

Step 1 (01. Transfer Data Configuration.ipynb): configure the components needed to perform a data transfer in the next notebook (a sketch of this configuration follows below).

Step 2 (02. Transfer Data.ipynb): transfer data from Blob Storage to Azure SQL Database using an existing Azure Data Factory.
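
The configuration notebook attaches (or provisions) the Azure Data Factory compute target that the pipeline later uses to move the data. The snippet below is a minimal sketch of that step; the workspace object ws and the factory name 'adf-compute' are placeholder assumptions for illustration, not names taken from the sample repository.

from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, DataFactoryCompute
from azureml.exceptions import ComputeTargetException

ws = Workspace.from_config()  # load the AML workspace from a local config.json

def get_or_create_data_factory(workspace, factory_name):
    # Reuse the Data Factory compute if it is already attached to the workspace,
    # otherwise provision a new one and wait for it to become available.
    try:
        return DataFactoryCompute(workspace, factory_name)
    except ComputeTargetException:
        provisioning_config = DataFactoryCompute.provisioning_configuration()
        data_factory = ComputeTarget.create(workspace, factory_name, provisioning_config)
        data_factory.wait_for_completion()
        return data_factory

adf_compute = get_or_create_data_factory(ws, 'adf-compute')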

DataTransferStep

A DataTransferStep is used within an Azure ML Pipeline to transfer data between locations using an Azure Data Factory. Currently supported sources and destinations include:

Data Store                      | Source | Destination
--------------------------------|--------|------------
Azure Blob Storage              | Yes    | Yes
Azure Data Lake Storage Gen 1   | Yes    | Yes
Azure Data Lake Storage Gen 2   | Yes    | Yes
Azure SQL Database              | Yes    | Yes
Azure Database for PostgreSQL   | Yes    | Yes
Azure Database for MySQL        | Yes    | Yes
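
The step references its source and destination through data reference objects built on datastores registered with the workspace. The snippet below is a hedged sketch of how the blob_data_ref and sql_query_data_ref used in the next example might be defined; the datastore names, file path, and table name are placeholders, ws is the workspace from the configuration sketch above, and the SQL destination uses the SDK's SqlDataReference class.

from azureml.core import Datastore
from azureml.data.data_reference import DataReference
from azureml.data.sql_data_reference import SqlDataReference

# Datastores are assumed to have been registered against the workspace beforehand
blob_datastore = Datastore.get(ws, 'source_blob_datastore')
sql_datastore = Datastore.get(ws, 'destination_sql_datastore')

# Source: a single file sitting in Blob Storage
blob_data_ref = DataReference(
    datastore=blob_datastore,
    data_reference_name='blob_data_ref',
    path_on_datastore='input/data.csv')

# Destination: a table in the Azure SQL Database
sql_query_data_ref = SqlDataReference(
    datastore=sql_datastore,
    data_reference_name='sql_query_data_ref',
    sql_table='TransferredData')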
from azureml.pipeline.steps import DataTransferStep

datatransferstep_name = 'transfer_blob_to_sql'

data_transfer_step = DataTransferStep(
    name=datatransferstep_name,
    source_data_reference=blob_data_ref,            # reference to the data in Blob Storage
    destination_data_reference=sql_query_data_ref,  # reference to the Azure SQL Database table
    compute_target=adf_compute,                     # Azure Data Factory compute that moves the data
    source_reference_type='file',                   # 'file' or 'directory'; here the source is a single file
    # destination_reference_type is not needed for a SQL destination
    allow_reuse=False)                              # always re-run the transfer instead of reusing prior output

print("Data transfer step created")

Running a DataTransferStep

A DataTransferStep connects a source DataStore and a destination DataStore. For a walkthrough of DataStores, see Dealing with Data in AML. The step is then added to a Pipeline and executed through an Experiment, as sketched below.
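
As an illustration, the sketch below builds a single-step Pipeline around the data_transfer_step created above and submits it as an Experiment run; the experiment name is an assumption for this example.

from azureml.core import Experiment
from azureml.pipeline.core import Pipeline

# Build a pipeline containing just the data transfer step and submit it
pipeline = Pipeline(workspace=ws, steps=[data_transfer_step])
pipeline_run = Experiment(ws, 'transfer-data-demo').submit(pipeline)

# Block until the Data Factory copy activity finishes and stream the logs
pipeline_run.wait_for_completion(show_output=True)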

Conclusion

The Azure Machine Learning service provides Pipelines as a mechanism to automate Machine Learning processes. Within those Pipelines, as this article demonstrated, a DataTransferStep can be used to transfer data between two data stores using an Azure Data Factory.
