This example uses sample data that ships with Iris, a Python library for processing and analysing gridded Earth science data.
Here, we demonstrate a scenario where that data is not obtained via the library, but instead already exists in Azure Storage.
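For reference, this is how the same data would normally be fetched through the library itself; a minimal sketch, assuming the `iris-sample-data` package is installed alongside Iris (`air_temp.pp` is one of its standard sample files):

```python
import iris

# Resolve a file from the installed iris-sample-data package and load it.
path = iris.sample_data_path("air_temp.pp")
cube = iris.load_cube(path)
print(cube.summary(shorten=True))
```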
To run this example, obtain the sample data here and upload the relevant files manually to a previously configured Azure ML linked datastore. In this example, the linked datastore is called `azuregigdatalake_bronze`, referring to the bronze container of an Azure Data Lake Storage Gen2 account.
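One way to do the manual upload is with `azcopy`; a sketch in which the storage account name, local source directory, target path, and SAS token are all placeholders to substitute for your own environment:

```bash
# Copy the downloaded sample files into the bronze container of the data lake.
azcopy copy "./iris-sample-data/*" \
  "https://<storage-account>.dfs.core.windows.net/bronze/iris-sample-data/?<sas-token>" \
  --recursive
```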
👉 Sample directory here; start by reviewing pipeline 2 and the components it references.
▶️ Quickstart: To run the pipeline with the Azure ML CLI v2, update the data locations in the pipeline YAML to point to your datastore locations, and ensure a compute target named `cpu-cluster`
exists in your workspace. Then run `az ml job create --file pipelines/2-pipeline-two-step.yml`.
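If the compute target does not exist yet, it can be created from the same CLI; a sketch, where the VM size and instance counts are illustrative choices rather than values from this repo:

```bash
az ml compute create --name cpu-cluster --type AmlCompute \
  --size Standard_DS3_v2 --min-instances 0 --max-instances 2
```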
The pipeline components read the data from a mounted storage location and write the results back out to mounted storage.
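In Azure ML v2 pipeline YAML, this mounting is expressed through the input and output modes; a minimal sketch, where the step name, component path, and datastore path are hypothetical:

```yaml
jobs:
  regrid_step:
    type: command
    component: ../components/regrid.yml
    compute: azureml:cpu-cluster
    inputs:
      raw_data:
        type: uri_folder
        path: azureml://datastores/azuregigdatalake_bronze/paths/iris-sample-data/
        mode: ro_mount   # input is exposed as a read-only mount
    outputs:
      regridded_data:
        type: uri_folder
        mode: rw_mount   # results are written back through a read-write mount
```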
The image below illustrates mounting data locations on job submission:
Output of pipeline 1 in the Azure ML job portal:
Logging diagnostic images to the pipeline run with MLflow:
Verifying lazy loading of the Iris cube data mounted into the compute, using MLflow metrics logging:
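A sketch of what such logging could look like inside a component script; the input path handling and file name are hypothetical, but `has_lazy_data`, `log_metric`, and `log_figure` are the real Iris and MLflow calls:

```python
import sys

import matplotlib
matplotlib.use("Agg")  # headless backend for the compute cluster
import matplotlib.pyplot as plt

import iris
import iris.quickplot as qplt
import mlflow

data_dir = sys.argv[1]  # mounted input folder passed in by the pipeline

with mlflow.start_run():
    cube = iris.load_cube(f"{data_dir}/air_temp.pp")

    # Iris loads lazily, so the payload should not have been read from the
    # mount yet; log that as a metric before anything touches the data.
    mlflow.log_metric("cube_has_lazy_data", int(cube.has_lazy_data()))

    # Render a diagnostic plot (this realises the data) and attach it to
    # the run as an image artifact.
    qplt.contourf(cube)
    mlflow.log_figure(plt.gcf(), "diagnostics/air_temp.png")
```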
[todo]