ZenML pipelines can be executed natively as Airflow DAGs. This brings together the power of Airflow orchestration with the ML-specific benefits of ZenML pipelines. Each ZenML step runs as an Airflow PythonOperator and executes natively within Airflow. We will use a very simple pipeline that consists of only three steps.
The pipeline performs a very simple operation: subtracting a random integer from a given integer.
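For orientation, here is a minimal sketch of what such a three-step pipeline could look like using ZenML's @step and @pipeline decorators; the step names, the hard-coded input value, and the random range are illustrative assumptions rather than the exact code shipped with the example.
import random

from zenml.pipelines import pipeline
from zenml.steps import step


@step
def get_first_num() -> int:
    # The given integer we subtract from (illustrative value).
    return 10


@step
def get_random_int() -> int:
    # A random integer to subtract.
    return random.randint(0, 10)


@step
def subtract_numbers(first_num: int, random_num: int) -> int:
    # The actual operation performed by the pipeline.
    return first_num - random_num


@pipeline
def my_pipeline(get_first_num, get_random_int, subtract_numbers):
    # Wire the three steps together: the outputs of the first two
    # steps feed into the subtraction step.
    first_num = get_first_num()
    random_num = get_random_int()
    subtract_numbers(first_num, random_num)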
Note that this tutorial installs and deploys Airflow locally on your machine, but in a production setting you might already have a deployed Airflow instance.
If you're in a hurry and just want to see this example pipeline run, without fiddling around with the individual installation and configuration steps, just run the following:
zenml example run airflow_orchestration
In order to run this example, you need to install and initialize ZenML and Airflow.
# install CLI
pip install zenml
# install ZenML integrations
zenml integration install airflow
# pull example
zenml example pull airflow_orchestration
cd zenml_examples/airflow_orchestration
# Initialize ZenML repo
zenml init
# Start the ZenServer to enable dashboard access
zenml up
Next, register an Airflow orchestrator and a ZenML stack that uses it:
zenml orchestrator register airflow_orchestrator --flavor=airflow
zenml stack register airflow_stack \
-a default \
-o airflow_orchestrator \
--set
ZenML takes care of configuring Airflow; all we need to do is run:
zenml stack up
This will bootstrap Airflow, start up all the necessary components, and run them in the background. When the setup is finished, it will print the username and password for the Airflow webserver to the console.
WARNING: If you can't find the password on the console, you can navigate to the <APP_DIR>/zenml/airflow_root/<ORCHESTRATOR_UUID>/standalone_admin_password.txt file. The username will always be admin.
APP_DIR will depend on your OS. See which path corresponds to your OS here.
ORCHESTRATOR_UUID will be the unique id of the Airflow orchestrator. There will be only one folder in airflow_root, so you can just navigate to the one that is present.
You can now run the pipeline:
python run.py
Sometimes you don't want to run your pipeline only once, but rather schedule it to run with a predefined frequency. To schedule the DAG to run every 3 minutes for the next 9 minutes, simply open run.py and uncomment the lines at the end of the file.
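For reference, the scheduled run might look roughly like the following sketch. It assumes ZenML's Schedule class with start_time, end_time, and interval_second parameters, and reuses the illustrative pipeline from the sketch above; the exact API and parameter types can differ across ZenML versions, so check run.py for the actual code.
from datetime import datetime, timedelta

from zenml.pipelines import Schedule

# Assuming `my_pipeline` and its steps were defined as in the sketch above.
pipeline_instance = my_pipeline(
    get_first_num=get_first_num(),
    get_random_int=get_random_int(),
    subtract_numbers=subtract_numbers(),
)

# Run every 3 minutes for the next 9 minutes.
pipeline_instance.run(
    schedule=Schedule(
        start_time=datetime.now(),
        end_time=datetime.now() + timedelta(minutes=9),
        interval_second=timedelta(minutes=3),
    )
)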
After a few seconds, you should be able to see the executed DAG in the Airflow UI.
In order to clean up, tear down the Airflow stack and delete the remaining ZenML references:
zenml stack down --force
rm -rf zenml_examples
If you want to learn more about orchestrators in general, or about how to build your own orchestrators in ZenML, check out our docs.