Put changes on correct platform version

adamrtalbot committed Aug 15, 2024
1 parent 2fcda24 commit dcc64cc

Showing 2 changed files with 133 additions and 151 deletions.
127 changes: 53 additions & 74 deletions platform_versioned_docs/version-23.4/compute-envs/azure-batch.mdx
Ensure you have sufficient permissions to create resource groups, an Azure Storage account, and an Azure Batch account.

## Concepts


### Accounts

Seqera Platform relies on an existing Azure Storage account and Azure Batch account. You need at least one valid Azure Storage account and one Azure Batch account in your subscription.

Azure uses 'accounts' for each service. For example, an [Azure Storage account][az-learn-storage] will house a collection of blob containers, file shares, queues, and tables. While you can have multiple Azure Storage and Azure Batch accounts in an Azure subscription, each compute environment on the platform can only use one of each (one storage and one Batch account). You can set up multiple compute environments on the platform with different credentials, storage accounts, and Batch accounts.


### Regions

Azure resources can operate across regions, but this incurs additional costs and security requirements. It is recommended to place all resources in the same region. See the [Azure product page on data residency][az-data-residency] for more information.

## Resource group

A resource group is a logical container for related Azure resources. As a rule of thumb, resources with a similar lifecycle belong in the same resource group, since a resource group and all of its resources can be deleted together. We recommend placing all Platform compute resources in the same resource group, but this is not required.

### Create a resource group

1. Log in to your Azure account, go to the [Create Resource group][az-create-rg] page, and select **Create new resource group**.
2. Enter a name for the resource group, e.g., _towerrg_.
3. Choose the preferred region.
4. Select **Review and Create** to proceed.
5. Select **Create**.
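The portal steps above can also be sketched with the Azure CLI. This is an illustrative example only, assuming `az` is installed and you have already authenticated with `az login`; the name `towerrg` and region `eastus` are the example values used above.

```shell
# Create a resource group named "towerrg" in the eastus region
# (assumes prior authentication with: az login)
az group create --name towerrg --location eastus

# Confirm the resource group was created successfully
az group show --name towerrg --query "properties.provisioningState"
```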

## Storage account

After creating a resource group, set up an [Azure storage account][az-learn-storage].
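As a sketch of the equivalent Azure CLI commands (assuming `az` is installed and authenticated; `towerrgstorage` and the `work` container are placeholder names reusing the `towerrg`/`eastus` examples above):

```shell
# Create a storage account in the same resource group and region
az storage account create \
  --name towerrgstorage \
  --resource-group towerrg \
  --location eastus \
  --sku Standard_LRS

# Create the blob container that will hold the pipeline work directory
az storage container create \
  --name work \
  --account-name towerrgstorage \
  --auth-mode login

# Retrieve the access keys needed later for Seqera credentials
az storage account keys list \
  --account-name towerrgstorage \
  --resource-group towerrg
```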

Blob container storage credentials are associated with the Batch pool configuration. Avoid changing these credentials in your Seqera instance after you have created the compute environment.
:::

## Batch account

After you have created a resource group and storage account, create a [Batch account][az-learn-batch].
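An equivalent Azure CLI sketch, assuming `az` is installed and authenticated and reusing the placeholder names from the examples above (`towerrgbatch` is a hypothetical Batch account name):

```shell
# Create a Batch account linked to the storage account created earlier
az batch account create \
  --name towerrgbatch \
  --resource-group towerrg \
  --location eastus \
  --storage-account towerrgstorage

# Retrieve the Batch account keys for use in Seqera credentials
az batch account keys list \
  --name towerrgbatch \
  --resource-group towerrg
```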

- **Spot/low-priority vCPUs**: Platform does not support spot or low-priority machines when using Forge, so this number can be zero for Forge deployments. When manually setting up a pool, request an appropriate number of concurrent vCPUs here.
- **Total Dedicated vCPUs per VM series**: See the Azure documentation for [virtual machine sizes][az-vm-sizes] to help determine the machine size you need. We recommend the latest version of the ED series available in your region as a cost-effective and appropriately sized machine for running Nextflow. However, you may need to select an alternative machine series for workloads with additional requirements, such as GPUs or faster storage. Increase the quota by the number of concurrent vCPUs you require. In Azure, machines are charged per CPU minute, so a higher quota incurs no additional cost.
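To check the quotas currently granted to a Batch account before requesting an increase, a hedged CLI sketch (using the hypothetical `towerrgbatch` account from the earlier examples):

```shell
# Show the dedicated and low-priority core quotas for a Batch account;
# dedicatedCoreQuota must cover the concurrent vCPUs your pipelines need
az batch account show \
  --name towerrgbatch \
  --resource-group towerrg \
  --query "{dedicated: dedicatedCoreQuota, lowPriority: lowPriorityCoreQuota}"
```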


## Compute environment

There are two ways to create an Azure Batch compute environment in Seqera Platform:

## Batch Forge

Create a Batch Forge Azure Batch compute environment:
1. In a workspace, select **Compute Environments > New Environment**.
2. Enter a descriptive name, e.g., _Azure Batch (east-us)_.
3. Select **Azure Batch** as the target platform.
4. Choose existing Azure credentials or add a new credential. If you are using existing credentials, skip to step 7.

:::tip
You can create multiple credentials in your Seqera environment.
:::

5. Enter a name for the credentials, e.g., _Azure Credentials_.
6. Add the **Batch account** and **Blob Storage** account names and access keys.
7. Select a **Region**, e.g., _eastus_.
8. In the **Pipeline work directory** field, enter the Azure blob container created previously, e.g., `az://towerrgstorage-container/work`.

:::note
When you specify a Blob Storage bucket as your work directory, this bucket is used for the Nextflow [cloud cache](https://www.nextflow.io/docs/latest/cache-and-resume.html#cache-stores) by default. You can specify an alternative cache location with the **Nextflow config file** field on the pipeline [launch](../launch/launchpad.mdx#launch-form) form.
:::

9. Select **Enable Wave containers** to facilitate access to private container repositories and provision containers in your pipelines using the Wave containers service. See [Wave containers][wave-docs] for more information.
10. Select **Enable Fusion v2** to allow access to your Azure Blob Storage data via the [Fusion v2][nf-fusion-docs] virtual distributed file system. This speeds up most data operations. The Fusion v2 file system requires Wave containers to be enabled. See [Fusion file system](../supported_software/fusion/fusion.mdx) for configuration details.
11. Set the **Config mode** to **Batch Forge**.
12. Enter the default **VMs type**, depending on the quota limits set previously. The default is `Standard_D4_v3`.
13. Enter the **VMs count**. If autoscaling is enabled (default), this is the maximum number of VMs you wish the pool to scale up to. If autoscaling is disabled, this is the fixed number of virtual machines in the pool.
14. Enable **Autoscale** to scale up and down automatically, based on the number of pipeline tasks. The number of VMs will vary from **0** to **VMs count**.
15. Enable **Dispose resources** for Seqera to automatically delete the Batch pool if the compute environment is deleted on the platform.
16. Select or create [**Container registry credentials**](../credentials/azure_registry_credentials.mdx) to authenticate a registry (used by the [Wave containers](https://www.nextflow.io/docs/latest/wave.html) service). It is recommended to use an [Azure Container registry](https://azure.microsoft.com/en-gb/products/container-registry) within the same region for maximum performance.
17. Apply [**Resource labels**](../resource-labels/overview.mdx). This will populate the **Metadata** fields of the Azure Batch pool.
18. Expand **Staging options** to include optional [pre- or post-run Bash scripts](../launch/advanced.mdx#pre-and-post-run-scripts) that execute before or after the Nextflow pipeline execution in your environment.
19. Specify custom **Environment variables** for the **Head job** and/or **Compute jobs**.
20. Configure any advanced options you need:

- Use **Jobs cleanup policy** to control how Nextflow process jobs are deleted on completion. Active jobs consume the quota of the Azure Batch account. By default, jobs are terminated by Nextflow and removed from the quota when all tasks successfully complete. If set to _Always_, all jobs are deleted by Nextflow after pipeline completion. If set to _Never_, jobs are never deleted. If set to _On success_, only successful jobs are deleted; failed jobs are retained for debugging purposes.
- Use **Token duration** to control the duration of the SAS token generated by Nextflow. This must be at least as long as the longest expected duration of your pipeline runs.

21. Select **Add** to finalize the compute environment setup. It will take a few seconds for all the resources to be created before the compute environment is ready to launch pipelines.

**See [Launch pipelines](../launch/launchpad.mdx) to start executing workflows in your Azure Batch compute environment.**

## Manual

This section is for users with a pre-configured Azure Batch pool. It requires an existing Azure Batch account with an existing pool.

Your Seqera compute environment uses resources that you may be charged for in your Azure account.
1. In a workspace, select **Compute Environments > New Environment**.
2. Enter a descriptive name for this environment, e.g., _Azure Batch (east-us)_.
3. Select **Azure Batch** as the target platform.
4. Select your existing Azure credentials or select **+** to add new credentials. If you choose to use existing credentials, skip to step 7.

:::tip
You can create multiple credentials in your Seqera environment.
:::

5. Enter a name, e.g., _Azure Credentials_.
6. Add the **Batch account** and **Blob Storage** credentials you created previously.
7. Select a **Region**, e.g., _eastus (East US)_.
8. In the **Pipeline work directory** field, add the Azure blob container created previously, e.g., `az://towerrgstorage-container/work`.

:::note
When you specify a Blob Storage bucket as your work directory, this bucket is used for the Nextflow [cloud cache](https://www.nextflow.io/docs/latest/cache-and-resume.html#cache-stores) by default. You can specify an alternative cache location with the **Nextflow config file** field on the pipeline [launch](../launch/launchpad.mdx#launch-form) form.
:::

9. Set the **Config mode** to **Manual**.
10. Enter the **Compute Pool name**. This is the name of the Azure Batch pool you created previously in the Azure Batch account.

:::note
The default Azure Batch implementation uses a single pool for head and compute nodes. To use separate pools for head and compute nodes (e.g., to use low-priority VMs for compute jobs), see [this FAQ entry](../faqs.mdx#azure).
:::

11. Specify custom **Environment variables** for the **Head job** and/or **Compute jobs**.
12. Configure any advanced options you need:

- Use **Jobs cleanup policy** to control how Nextflow process jobs are deleted on completion. Active jobs consume the quota of the Azure Batch account. By default, jobs are terminated by Nextflow and removed from the quota when all tasks successfully complete. If set to _Always_, all jobs are deleted by Nextflow after pipeline completion. If set to _Never_, jobs are never deleted. If set to _On success_, only successful jobs are deleted; failed jobs are retained for debugging purposes.
- Use **Token duration** to control the duration of the SAS token generated by Nextflow. This must be at least as long as the longest expected duration of your pipeline runs.

13. Select **Add** to finalize the compute environment setup. It will take a few seconds for all the resources to be created before you are ready to launch pipelines.

**See [Launch pipelines](../launch/launchpad.mdx) to start executing workflows in your Azure Batch compute environment.**
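To confirm the name of the pre-existing pool referenced in the **Compute Pool name** field, you can list the pools in your Batch account. This is an illustrative sketch reusing the hypothetical `towerrgbatch` account name from the earlier examples:

```shell
# Authenticate the Batch data-plane CLI with shared-key auth
az batch account login \
  --name towerrgbatch \
  --resource-group towerrg \
  --shared-key-auth

# List pool names to confirm the Compute Pool name used above
az batch pool list --query "[].id"
```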

[az-learn-jobs]: https://learn.microsoft.com/en-us/azure/batch/jobs-and-tasks
[az-create-rg]: https://portal.azure.com/#create/Microsoft.ResourceGroup
[az-create-storage]: https://portal.azure.com/#create/Microsoft.StorageAccount-ARM
[az-create-sp]: https://learn.microsoft.com/en-us/entra/identity-platform/howto-create-service-principal-portal

[wave-docs]: https://docs.seqera.io/wave
[nf-fusion-docs]: https://www.nextflow.io/docs/latest/fusion.html