Azure batch
- Services:
  - azure batch account
  - azure storage account
  - azure container registry: hosting docker images
  - azure service principal: allows tasks to pull from the azure container registry
  - data factory (maybe): could be useful for parameterised running, but we probably just need to upload a script with its configuration
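A minimal sketch of connecting to the batch account from Python (the account name, key and URL are placeholders, and shared-key auth is just one option); `batch_client` is reused in the sketches further down:

    from azure.batch import BatchServiceClient
    from azure.batch.batch_auth import SharedKeyCredentials

    # Placeholder credentials - in practice these come from configuration/secrets.
    credentials = SharedKeyCredentials('mybatchaccount', '<batch-account-key>')
    batch_client = BatchServiceClient(
        credentials,
        batch_url='https://mybatchaccount.uksouth.batch.azure.com')  # older SDK versions call this base_url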
Structure of running jobs:
- Pools
  - Define the VM configuration for a job
  - Best practice
    - Pools should have more than one compute node, for redundancy on failure
    - Have jobs use pools dynamically: when moving jobs, move them to a new pool and delete the old pool once complete
    - Resize pools to zero every few months (see the sketch below)
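A sketch of the resize-to-zero housekeeping, assuming the `batch_client` from the sketch above and an illustrative pool id:

    import azure.batch as batch
    import azure.batch.models  # so that batch.models.<...> resolves

    # Shrink an idle pool to zero nodes so it stops accruing compute cost;
    # the pool definition is kept and can be resized back up later.
    batch_client.pool.resize(
        pool_id='tlo-pool',  # illustrative pool id
        pool_resize_parameter=batch.models.PoolResizeParameter(
            target_dedicated_nodes=0,
            target_low_priority_nodes=0))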
- Applications
- Jobs
  - A set of tasks to be run
  - Best practice
    - 1000 tasks in one job is more efficient than 10 jobs with 100 tasks each
    - A job has to be explicitly terminated to be completed; the onAllTasksComplete property or maxWallClockTime does this (see the sketch below)
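A sketch of those job completion settings (onAllTasksComplete and maxWallClockTime), again assuming `batch_client` and illustrative pool/job ids:

    import datetime

    import azure.batch as batch
    import azure.batch.models

    # Terminate (complete) the job automatically once all tasks finish,
    # and cap the total run time via the job constraints.
    job = batch.models.JobAddParameter(
        id='tlo-job',  # illustrative job id
        pool_info=batch.models.PoolInformation(pool_id='tlo-pool'),
        on_all_tasks_complete=batch.models.OnAllTasksComplete.terminate_job,
        constraints=batch.models.JobConstraints(
            max_wall_clock_time=datetime.timedelta(hours=12)))
    batch_client.job.add(job)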
- Tasks
  - Individual scripts/commands
  - Best practice
    - Task nodes are ephemeral, so any data will be lost unless uploaded to storage via OutputFiles
    - Setting a retention time is a good idea, for clarity and for cleaning up data
    - Bulk submit collections of up to 100 tasks at a time
    - Should build in some retry logic to withstand failures (see the sketch below)
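A sketch putting those task practices together (OutputFiles upload, retention time, retries, and bulk submission in chunks of 100); the job id, command line and container SAS URL are placeholders:

    import datetime

    import azure.batch as batch
    import azure.batch.models

    OUTPUT_CONTAINER_SAS = 'https://mystorageaccount.blob.core.windows.net/outputs?<sas-token>'  # placeholder

    tasks = []
    for i in range(250):
        tasks.append(batch.models.TaskAddParameter(
            id=f'run-{i}',
            command_line=f'/bin/bash -c "python run_model.py --draw {i}"',  # illustrative command
            # Upload anything written to ./outputs before the node is reclaimed.
            output_files=[batch.models.OutputFile(
                file_pattern='outputs/*',
                destination=batch.models.OutputFileDestination(
                    container=batch.models.OutputFileBlobContainerDestination(
                        container_url=OUTPUT_CONTAINER_SAS)),
                upload_options=batch.models.OutputFileUploadOptions(
                    upload_condition=batch.models.OutputFileUploadCondition.task_completion))],
            # Retry a couple of times and clean up the task directory after a day.
            constraints=batch.models.TaskConstraints(
                max_task_retry_count=2,
                retention_time=datetime.timedelta(days=1))))

    # add_collection accepts at most 100 tasks per call, so submit in chunks.
    for start in range(0, len(tasks), 100):
        batch_client.task.add_collection('tlo-job', tasks[start:start + 100])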
- Images
  - Custom images with an OS
    - the storage blob containing the VM?
    - conda from the linux data science VM
      - the windows version has python 3.7
      - the linux version has python 3.5, but a newer python (with f-strings) could be installed
All of these are defined at the pool level.
- Define a start task (see the sketch below)
  - Each compute node runs this command as it joins the pool
  - Seems slow and wasteful to run this for each node
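A minimal start task sketch (the install command is illustrative); it would be passed as start_task when building the PoolAddParameter:

    import azure.batch as batch
    import azure.batch.models

    # Runs once on every compute node as it joins the pool - hence the concern
    # above about repeating a slow install per node.
    start_task = batch.models.StartTask(
        command_line='/bin/bash -c "pip install -r requirements.txt"',  # illustrative
        wait_for_success=True,  # do not schedule tasks on a node until this succeeds
        user_identity=batch.models.UserIdentity(
            auto_user=batch.models.AutoUserSpecification(
                scope=batch.models.AutoUserScope.pool,
                elevation_level=batch.models.ElevationLevel.admin)))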
- Create an application package
  - zip file with all dependencies
  - can version these and define which version you want to run
  - issue with the default version of Python on azure batch linux
  - seems like a pain to do and redo when updating requirements or applications
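For completeness, a sketch of referencing a versioned application package on a pool (the package id and version are illustrative):

    import azure.batch as batch
    import azure.batch.models

    # A package uploaded to the batch account, pinned to a version and
    # deployed to every node in the pool.
    app_ref = batch.models.ApplicationPackageReference(
        application_id='tlo_model',  # illustrative package id
        version='0.1')
    # ...then pass application_package_references=[app_ref] in the PoolAddParameter.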
- Use a custom image
  - limit of 2500 dedicated compute nodes or 1000 low-priority nodes in a pool
  - can create a VHD and then import it for batch service mode
  - linux image builder, or Packer directly, can be used to build a linux image for user subscription mode
  - seems like a reasonable option if the framework is stable
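If we went this route, the pool's image reference would point at the image resource rather than a marketplace publisher/offer/sku; a sketch with a placeholder resource id:

    import azure.batch as batch
    import azure.batch.models

    # Reference a custom managed image by its ARM resource id (placeholder below).
    custom_image_ref = batch.models.ImageReference(
        virtual_machine_image_id=(
            '/subscriptions/<subscription-id>/resourceGroups/<resource-group>'
            '/providers/Microsoft.Compute/images/<image-name>'))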
- Use containers
  - can prefetch container images to save on download time
  - they suggest storing and tagging the image in azure container registry
    - a higher cost tier allows for a private azure registry
    - can also pull docker images from other registries
  - most flexible option without too much time spent on node setup
  - can use docker images or any OCI images
    - is there a benefit to singularity here?
  - VM without RDMA
    - Publisher: microsoft-azure-batch
    - Offer: centos-container
    - Offer: ubuntu-server-container
  - need to configure the batch pool to run container workloads via the ContainerConfiguration settings in the pool's VirtualMachineConfiguration
  - prefetch containers: use an azure container registry in the same region as the pool
    import azure.batch as batch
    import azure.batch.models  # so that batch.models.<...> resolves

    # Marketplace image that ships with a container runtime pre-installed.
    image_ref_to_use = batch.models.ImageReference(
        publisher='microsoft-azure-batch',
        offer='ubuntu-server-container',
        sku='16-04-lts',
        version='latest')

    # Container configuration listing the image(s) to prefetch onto each node
    # ('custom_image' stands for our image name/tag in the registry).
    container_conf = batch.models.ContainerConfiguration(
        container_image_names=['custom_image'])

    new_pool = batch.models.PoolAddParameter(
        id=pool_id,  # pool_id defined elsewhere
        virtual_machine_configuration=batch.models.VirtualMachineConfiguration(
            image_reference=image_ref_to_use,
            container_configuration=container_conf,
            node_agent_sku_id='batch.node.ubuntu 16.04'),
        vm_size='STANDARD_D1_V2',
        target_dedicated_nodes=1)
...
- maybe try Batch Shipyard, which exists for deploying HPC workloads
  - nice monitoring, and a task factory based on parameter sweeps, random or custom python generators
  - might be a bit more than we need
- python batch examples
  - ran the first few examples, straightforward
- Running a python script in azure
  - using the batch explorer tool, can find the data science desktop
  - select a VM with a start task for installing requirements
  - use input and output storage blobs for the input and output data
  - create an azure data factory pipeline to run the python script on the inputs and upload the outputs
- Have deployed a simple docker project: https://github.com/stefpiatek/azure_batch-with_docker (sketched at the end of this page)
  - uses azure container registry for hosting docker images
  - uploads multiple scripts and has each node run one script
  - a post-processing task then runs on one node (this would be the aggregation of runs)
- azure pipelines guide
  - need an azure DevOps organisation and to be an admin of the Azure DevOps project
  - create a pipeline using azure-pipelines.yml (DevOps generates one for you)
  - can automatically generate the image tag from the commit id
  - or only build when you've explicitly tagged in git
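A rough sketch of the pattern in that docker project (one container task per script, then a post-processing task that depends on them); the image name, job/pool ids and commands are placeholders, assuming the `batch_client` and container-enabled pool from the sketches above:

    import azure.batch as batch
    import azure.batch.models

    ACR_IMAGE = 'myregistry.azurecr.io/tlo-model:latest'  # placeholder image
    # (for a private registry, the pool's ContainerConfiguration also needs
    # container_registries / service principal credentials so nodes can pull)

    # The job must opt in to task dependencies for the aggregation step.
    batch_client.job.add(batch.models.JobAddParameter(
        id='docker-job',
        pool_info=batch.models.PoolInformation(pool_id='tlo-pool'),
        uses_task_dependencies=True))

    # One container task per uploaded script.
    run_tasks = [
        batch.models.TaskAddParameter(
            id=f'script-{i}',
            command_line=f'python /scripts/script_{i}.py',  # illustrative
            container_settings=batch.models.TaskContainerSettings(image_name=ACR_IMAGE))
        for i in range(3)]
    batch_client.task.add_collection('docker-job', run_tasks)

    # Post-processing/aggregation runs on a single node once all runs finish.
    batch_client.task.add(
        'docker-job',
        batch.models.TaskAddParameter(
            id='aggregate',
            command_line='python /scripts/aggregate.py',  # illustrative
            container_settings=batch.models.TaskContainerSettings(image_name=ACR_IMAGE),
            depends_on=batch.models.TaskDependencies(
                task_ids=[t.id for t in run_tasks])))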