In this tutorial we will guide you through deploying a CI/CD flow for Zeppelin notebooks from the ground up on AWS.
First you have to deploy our Pipeline Control Plane, which takes care of all the hard work.
Follow the steps below to host the Pipeline Control Plane on AWS.
On AWS we use a CloudFormation template to provision the Pipeline Control Plane. The control plane image (AMI) is currently published to a single region, eu-west-1 (Ireland). When launching the control plane, please pass the following ImageId: ami-ece5b095.
Prerequisites:
- AWS account
- AWS EC2 key pair
To create the control plane launcher from the command line, take a look at .env.example as a starting point to learn which environment variables the Makefile requires. Note that the Makefile uses the aws CLI, which needs to be installed first if it is not already available on the machine.
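As a quick illustration, such an environment file typically holds the AWS credentials and launch parameters the Makefile consumes. The variable names below are only assumptions for illustration; check the repository's .env.example for the actual ones.

```bash
# Hypothetical .env contents - verify the real variable names in .env.example.
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=eu-west-1
export IMAGE_ID=ami-ece5b095     # the published control plane AMI
export KEY_NAME=my-ec2-key-pair  # your existing EC2 key pair
```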
- deploy - make create-aws
- delete - make terminate-aws
- Select Specify an Amazon S3 template URL and add the URL to our template: https://s3-eu-west-1.amazonaws.com/cf-templates-grr4ysncvcdl-eu-west-1/2018079qfE-new.templateejo9oubl16
- Fill in the following fields on the form:
  - Stack name
  - AWS Credentials
    - Amazon access key id - specify your access key id
    - Amazon secret access key - specify your secret access key
  - Control Plane Instance Config
    - InstanceName - name of the EC2 instance that will host the Control Plane
    - ImageId - ami-ece5b095
    - KeyName - specify your AWS EC2 key pair
  - Pipeline Credentials
    - Github Client - GitHub OAuth Client Id
    - Github Secret - GitHub OAuth Client Secret
  - Banzai-Ci
    - Orgs - comma-separated list of GitHub organizations whose members are granted access to the Banzai Cloud Pipeline CI/CD workflow
  - Grafana Dashboard
    - Grafana Dashboard Password - specify the password for accessing the Grafana dashboard, which ships with application-specific default dashboards
  - Prometheus Dashboard
    - Prometheus Password - specify the password for accessing the Prometheus instance that collects cluster metrics
  - Advanced Pipeline Options
    - PipelineImageTag - specify 0.3.0 to use the current stable Pipeline release
  - Slack Credentials
    - this section is optional. Complete it to receive cluster-related alerts through a Slack push notification channel.
  - Alert SMTP Credentials
    - this section is optional. Fill it in to receive cluster-related alerts through email.
- Finish the wizard to create a Control Plane instance.
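If you would rather script the stack creation than click through the wizard, the AWS CLI can launch the same template. The parameter keys below are assumptions derived from the form fields above; check the template's Parameters section for the exact names before running it.

```bash
# Sketch only: parameter keys are assumed from the form fields above and may
# differ from the actual template.
aws cloudformation create-stack \
  --stack-name pipeline-control-plane \
  --region eu-west-1 \
  --template-url https://s3-eu-west-1.amazonaws.com/cf-templates-grr4ysncvcdl-eu-west-1/2018079qfE-new.templateejo9oubl16 \
  --parameters \
    ParameterKey=InstanceName,ParameterValue=pipeline-control-plane \
    ParameterKey=ImageId,ParameterValue=ami-ece5b095 \
    ParameterKey=KeyName,ParameterValue=my-ec2-key-pair \
    ParameterKey=PipelineImageTag,ParameterValue=0.3.0
```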
Check the Outputs section of the deployed CloudFormation stack for the endpoints where the deployed services can be reached:
- PublicIP - the IP of the host where Pipeline is running
- Pipeline - the endpoint for the Pipeline REST API
- Grafana - the endpoint for Grafana
- PrometheusServer - the endpoint for federated Prometheus server.
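If you prefer the command line, the same outputs can be listed with the AWS CLI; the stack name below is a placeholder for whatever you chose in the wizard.

```bash
# Lists the stack outputs (PublicIP, Pipeline, Grafana, PrometheusServer).
aws cloudformation describe-stacks \
  --stack-name pipeline-control-plane \
  --region eu-west-1 \
  --query 'Stacks[0].Outputs'
```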
Let's start with our example Zeppelin project.
Fork the repository into your GitHub account. You'll find a couple of Banzai Cloud Pipeline CI/CD flow descriptor templates for the released cloud providers (Amazon, Azure). Make a copy of the template corresponding to your chosen cloud provider and name it .pipeline.yml. This is the Banzai Cloud Pipeline CI/CD flow descriptor which is one of the spotguides associated with the project.
On the Drone UI running on the Banzai Cloud control plane, enable the build for your fork. In the project's build details section, add the required secrets (Pipeline endpoint, credentials). Check the descriptor for any placeholders and substitute them with your corresponding values.
Note: there is a video available for our Spark CI/CD example that walks through the CI/CD UI.
That's all, your project is now configured for the Banzai Cloud Pipeline CI/CD flow! The flow will be triggered whenever a new change is pushed to the repository (configurable on the UI).
The Banzai Cloud Pipeline CI/CD flow descriptor has to be named .pipeline.yml; it contains the steps of the flow, which are executed sequentially. (The Banzai Cloud Pipeline CI/CD flow uses Drone in the background; however, the CI/CD flow descriptor is an abstraction that is not directly tied to any particular product or implementation. It can also be wired to use CircleCI or Travis.)
The example descriptor has the following steps:
- create_cluster - creates or reuses a (managed) Kubernetes cluster supported by Pipeline, such as EC2, AKS or GKE
- install tooling for the cluster (using Helm charts):
  - install_monitoring - cluster monitoring (Prometheus)
  - install_spark_history_server - Spark History Server
  - install_zeppelin - Zeppelin
  Note: these infrastructure-related steps are only executed once and are reused after the first run, provided the cluster is not deleted as the last step.
- remote_checkout - checks out the code from the git repository
- run - runs the Notebook
You can name the steps as you wish; they are only used to delimit the phases of the flow. Steps are implemented as Docker containers that use the configuration items passed in the flow descriptor (step section).
Note that, compared to manually setting up the History Server and event logging, with the Banzai Cloud Pipeline CI/CD flow you only have to replace the S3 bucket / Blob container name in the install_spark_history_server.logDirectory and install_zeppelin.deployment_values.zeppelin.sparkSubmitOptions.eventLogDirectory properties of .pipeline.yml with your own.
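As a rough sketch, the relevant parts of such a descriptor might look like the excerpt below. Only the step names and the two property paths come from the text above; every other key and value is a placeholder, not the actual template content.

```yaml
# Hypothetical excerpt of .pipeline.yml - replace the bucket with your own.
pipeline:
  install_spark_history_server:
    # ... image and other step configuration ...
    logDirectory: "s3a://YOUR-BUCKET/spark-event-logs"

  install_zeppelin:
    # ... image and other step configuration ...
    deployment_values:
      zeppelin:
        sparkSubmitOptions:
          eventLogDirectory: "s3a://YOUR-BUCKET/spark-event-logs"
```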
As of version 0.2.0, Banzai Cloud Pipeline supports the deployment and use of the Spark History Server. The details of the Spark jobs can be checked as usual on the History Server UI. Information produced by the Spark jobs is saved to persistent storage (S3, WASB), from which the Spark History Server reads and displays it. This way the details of the execution are kept even after the Kubernetes cluster is destroyed.
If you don't destroy the infrastructure as part of the Banzai Cloud Pipeline CI/CD flow, you can query the available endpoints (zeppelin, monitoring, spark history server) by issuing a GET request:
curl --request GET --url 'http://[control-plane]/pipeline/api/v1/clusters/{{cluster_id}}/endpoints'
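For example, assuming the control plane host from the PublicIP stack output and a hypothetical cluster id of 1, the response can be pretty-printed with jq:

```bash
# Placeholder host and cluster id - substitute your own values.
curl -s 'http://52.0.0.1/pipeline/api/v1/clusters/1/endpoints' | jq .
```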
Warning! Be aware that clusters created with the flow on the cloud provider cost you money. It's advised to destroy your environment when development is finished (or at the end of the day). If you are running on AWS, you might consider using spot instances and safely running spot clusters in production with our watchguard, Hollowtrees.