diff --git a/platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx b/platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
index d2efd23ad..e52e967ac 100644
--- a/platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
+++ b/platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
@@ -8,116 +8,172 @@ tags: [azure, batch, compute environment]
 :::note
 This guide assumes you already have an Azure account with a valid Azure Subscription. For details, visit [Azure Free Account][az-create-account].
-Ensure you have sufficient permissions to create resource groups, an Azure Storage account, and a Batch account.
+Ensure you have sufficient permissions to create resource groups, an Azure Storage account, and an Azure Batch account.
 :::
-## Concepts
+## Azure concepts
-### Accounts
+#### Regions
-Seqera Platform relies on an existing Azure Storage and Azure Batch account. You need at least 1 valid Azure Storage account and Azure Batch account within your subscription.
+Azure regions are specific geographic locations around the world where Microsoft has established data centers to host its cloud services. Each Azure region is a collection of data centers that provide users with high availability, fault tolerance, and low latency for cloud services. Each region offers a wide range of Azure services that can be chosen to optimize performance, ensure data residency compliance, and meet regulatory requirements. Azure regions also enable redundancy and disaster recovery options by allowing resources to be replicated across different regions, enhancing the resilience of applications and data.
-Azure uses 'accounts' for each service. For example, an [Azure Storage account][az-learn-storage] will house a collection of blob containers, file shares, queues, and tables. While you can have multiple Azure Storage and Azure Batch accounts in an Azure subscription, each compute environment on the platform can only use one of each (one storage and one Batch account). You can set up multiple compute environments on the platform with different credentials, storage accounts, and Batch accounts.
+#### Resource groups
-### Resource group
+An Azure resource group is a logical container that holds related Azure resources such as virtual machines, storage accounts, databases, and more. A resource group serves as a management boundary to organize, deploy, monitor, and manage the resources within it as a single entity. Resources in a resource group share the same lifecycle, meaning they can be deployed, updated, and deleted together. This also enables easier access control, monitoring, and cost management, making resource groups a foundational element in organizing and managing cloud infrastructure in Azure.
-To create Azure Batch and Azure Storage accounts, first create a [resource group][az-learn-rg] in your preferred region.
+#### Accounts
-:::note
-A resource group can be created while creating an Azure Storage Account or Azure Batch account.
-:::
+Azure uses accounts for each service. For example, an [Azure Storage account][az-learn-storage] will house a collection of blob containers, file shares, queues, and tables. An Azure subscription can have multiple Azure Storage and Azure Batch accounts - however, a Platform compute environment can only use one of each. Multiple Platform compute environments can be created to use separate credentials, Azure Storage accounts, and Azure Batch accounts.
-### Regions
+#### Service principals
-Azure resources can operate across regions, but this incurs additional costs and security requirements. It is recommended to place all resources in the same region. See the [Azure product page on data residency][az-data-residency] for more information.
+An Azure service principal is an identity created specifically for applications, hosted services, or automated tools to access Azure resources. It acts like a user identity with a defined set of permissions, enabling resources authenticated through the service principal to perform actions within the Azure account. The platform can utilize an Azure service principal to authenticate and access Azure Batch for job execution and Azure Storage for data management.
-## Resource group
+## Create Azure resources
-A resource group in Azure is a unit of related resources in Azure. As a rule of thumb, resources that have a similar lifecycle should be within the same resource group. You can delete a resource group and all associated components together. We recommend placing all platform compute resources in the same resource group, but this is not necessary.
+### Resource group
-### Create a resource group
+Create a resource group to link your Azure Batch and Azure Storage account:
-1. Log in to your Azure account, go to the [Create Resource group][az-create-rg] page, and select **Create new resource group**.
-2. Enter a name for the resource group, e.g., _towerrg_.
-3. Choose the preferred region.
-4. Select **Review and Create** to proceed.
-5. Select **Create**.
+:::note
+A resource group can be created while creating an Azure Storage account or Azure Batch account.
+:::
-## Storage account
+1. Log in to your Azure account, go to the [Create Resource group][az-create-rg] page, and select **Create new resource group**.
+1. Enter a name for the resource group, such as _seqeracompute_.
+1. Choose the preferred region.
+1. Select **Review and Create** to proceed.
+1. Select **Create**.
-After creating a resource group, set up an [Azure storage account][az-learn-storage].
+### Storage account
-### Create a storage account
+After creating a resource group, set up an [Azure Storage account][az-learn-storage]:
 1. Log in to your Azure account, go to the [Create storage account][az-create-storage] page, and select **Create a storage account**.
-
     :::note
     If you haven't created a resource group, you can do so now.
     :::
-
-2. Enter a name for the storage account (e.g., _towerrgstorage_).
-3. Choose the preferred region (same as the Batch account).
-4. The platform supports any performance or redundancy settings — select the most appropriate settings for your use case.
-5. Select **Next: Advanced**.
-6. Enable _storage account key access_.
-7. Select **Next: Networking**.
+1. Enter a name for the storage account, such as _seqeracomputestorage_.
+1. Choose the preferred region. This must be the same region as the Batch account.
+1. Platform supports all performance or redundancy settings — select the most appropriate settings for your use case.
+1. Select **Next: Advanced**.
+1. Enable _storage account key access_.
+1. Select **Next: Networking**.
     - Enable public access from all networks. You can enable public access from selected virtual networks and IP addresses, but you will be unable to use Forge to create compute resources. Disabling public access is not supported.
-8. Select **Data protection**.
+1. Select **Data protection**.
     - Configure appropriate settings. All settings are supported by the platform.
-9. Select **Encryption**.
+1. Select **Encryption**.
     - Only Microsoft-managed keys (MMK) are supported.
-10. In **tags**, add any required tags for the storage account.
-11. Select **Review and Create**.
-12. Select **Create** to create the Azure Storage account.
-13. You will need at least one blob storage container to act as a working directory for Nextflow.
-14. Go to your new storage account and select **+ Container** to create a new Blob storage container. A new container dialogue will open. Enter a suitable name, e.g., _towerrgstorage-container_.
-15. Go to the **Access Keys** section of your new storage account (_towerrgstorage_ in this example).
-16. Store the access keys for your Azure Storage account, to be used when you create a Seqera compute environment.
+1. In **tags**, add any required tags for the storage account.
+1. Select **Review and Create**.
+1. Select **Create** to create the Azure Storage account.
+    - You will need at least one Blob Storage container to act as a working directory for Nextflow.
+1. Go to your new storage account and select **+ Container** to create a new Blob Storage container. A new container dialogue will open. Enter a suitable name, such as _seqeracomputestorage-container_.
+1. Go to the **Access Keys** section of your new storage account (_seqeracomputestorage_ in this example).
+1. Store the access keys for your Azure Storage account, to be used when you create a Seqera compute environment.
 :::caution
 Blob container storage credentials are associated with the Batch pool configuration. Avoid changing these credentials in your Seqera instance after you have created the compute environment.
 :::
-## Batch account
-
-After you have created a resource group and storage account, create a [Batch account][az-learn-batch].
+### Batch account
-### Create a Batch account
+After you have created a resource group and Storage account, create a [Batch account][az-learn-batch]:
 1. Log in to your Azure account and select **Create a batch account** on [this page][az-create-batch].
-2. Select the existing resource group or create a new one.
-3. Enter a name for the Batch account, e.g., _towerrgbatch_.
-4. Choose the preferred region (same as the storage account).
-5. Select **Advanced**.
-6. For _Pool allocation mode_, select Batch service.
-7. For _Authentication mode_, ensure _Shared Key_ is selected.
-8. Select **Networking**. Ensure networking access is sufficient for the platform and any additional required resources.
-9. In **tags**, add any required tags for the Batch account.
-10. Select **Review and Create**.
-11. Select **Create**.
-12. Go to your new Batch account, then select **Access Keys**.
-13. Store the access keys for your Azure Batch account, to be used when you create a Seqera compute environment.
-
+1. Select the existing resource group or create a new one.
+1. Enter a name for the Batch account, such as _seqeracomputebatch_.
+1. Choose the preferred region. This must be the same region as the Storage account.
+1. Select **Advanced**.
+1. For **Pool allocation mode**, select **Batch service**.
+1. For **Authentication mode**, select _Shared Key_.
+1. Select **Networking**. Ensure networking access is sufficient for Platform and any additional required resources.
+1. Add any **Tags** to the Batch account, if needed.
+1. Select **Review and Create**.
+1. Select **Create**.
+1. Go to your new Batch account, then select **Access Keys**.
+1. Store the access keys for your Azure Batch account, to be used when you create a Seqera compute environment.
     :::caution
     A newly-created Azure Batch account may not be entitled to create virtual machines without making a service request to Azure. See [Azure Batch service quotas and limits][az-batch-quotas] for more information.
     :::
-
-14. Select the **+ Quotas** tab of the Azure Batch account to check and increase existing quotas if necessary.
-15. Select **+ Request quota increase** and add the quantity of resources you require. Here is a brief guideline:
-
+1. Select the **+ Quotas** tab of the Azure Batch account to check and increase existing quotas if necessary.
+1. Select **+ Request quota increase** and add the quantity of resources you require. Here is a brief guideline:
     - **Active jobs and schedules**: Each Nextflow process will require an active Azure Batch job per pipeline while running, so increase this number to a high level. See [here][az-learn-jobs] to learn more about jobs in Azure Batch.
     - **Pools**: Each platform compute environment requires one Azure Batch pool. Each pool is composed of multiple machines of one virtual machine size.
-
         :::note
         To use separate pools for head and compute nodes, see [this FAQ entry](../faqs.mdx#azure).
         :::
-
     - **Batch accounts per region per subscription**: Set this to the number of Azure Batch accounts per region per subscription. Only one is required.
     - **Spot/low-priority vCPUs**: Platform does not support spot or low-priority machines when using Forge, so when using Forge this number can be zero. When manually setting up a pool, select an appropriate number of concurrent vCPUs here.
     - **Total Dedicated vCPUs per VM series**: See the Azure documentation for [virtual machine sizes][az-vm-sizes] to help determine the machine size you need. We recommend the latest version of the ED series available in your region as a cost-effective and appropriately-sized machine for running Nextflow. However, you will need to select alternative machine series that have additional requirements, such as those with additional GPUs or faster storage. Increase the quota by the number of required concurrent CPUs. In Azure, machines are charged per cpu minute so there is no additional cost for a higher number.
-### Compute environment
+### Credentials
+
+There are two types of Azure credentials available: access keys and Entra service principals.
+
+Access keys are simple to use but have several limitations:
+- Access keys are long-lived.
+- Access keys provide full access to the Azure Storage and Azure Batch accounts.
+- Azure allows only two access keys per account, making them a single point of failure.
+
+Entra service principals are accounts which can be granted access to Azure Batch and Azure Storage resources:
+- Service principals enable role-based access control with more precise permissions.
+- Service principals map to a many-to-many relationship with Azure Batch and Azure Storage accounts.
+- Some Azure Batch features are only available when using a service principal.
+
+:::note
+The two Azure credential types use different authentication methods. You can add more than one credential to a workspace, but Platform compute environments use only one credential at any given time. While separate credentials can be used by separate compute environments concurrently, they are not cross-compatible — access granted by one credential will not be shared with the other.
+:::
+
+#### Access keys
+
+:::info
+Batch Forge compute environments must use access keys for authentication. Service principals are only supported in manual compute environments.
+:::
+
+To create an access key:
+
+1. Navigate to the Azure Portal and sign in.
+1. Locate the Azure Batch account and select **Keys** under **Account management**. The Primary and Secondary keys are listed here. Copy one of the keys and save it in a secure location for later use.
+1. Locate the Azure Storage account and, under the **Security and Networking** section, select **Access keys**. Key1 and Key2 options are listed here. Copy one of them and save it in a secure location for later use.
+1. In your Platform workspace **Credentials** tab, select the **Add credentials** button and complete the following fields:
+    - Enter a **Name** for the credentials
+    - **Provider**: Azure
+    - Select the **Shared key** tab
+    - Add the **Batch account** and **Blob Storage account** names and access keys to the relevant fields.
+1. Delete the copied keys from their temporary location after they have been added to a credential in Platform.
+
+#### Entra service principal
+
+:::info
+Batch Forge compute environments must use access keys for authentication. Service principals are only supported in manual compute environments.
+
+The use of Entra service principals in manual compute environments requires the use of a [managed identity](#managed-identity).
+:::
+
+See [Create a service principal][az-create-sp] for more details.
+
+To create an Entra service principal:
+
+1. In the Azure Portal, navigate to **Microsoft Entra ID**. Under **App registrations**, select **New registration**.
+1. Provide a name for the application. The application will automatically have a service principal associated with it.
+1. Assign roles to the service principal:
+    1. Go to the Azure Storage account. Under **Access Control (IAM)**, select **Add role assignment**.
+    1. Select the **Storage Blob Data Reader** and **Storage Blob Data Contributor** roles.
+    1. Select **Members**, then **Select Members**. Search for your newly created service principal and assign the role.
+    1. Repeat the same process for the Azure Batch account, using the **Azure Batch Contributor** role.
+1. Platform will need credentials to authenticate as the service principal:
+    1. Navigate back to the app registration. On the **Overview** page, save the **Application (client) ID** value for use in Platform.
+    1. Select **Certificates & secrets**, then **New client secret**. A new secret is created containing a value and secret ID. Save both values securely for use in Platform.
+1. In your Platform workspace **Credentials** tab, select the **Add credentials** button and complete the following fields:
+    - Enter a **Name** for the credentials
+    - **Provider**: Azure
+    - Select the **Entra** tab
+    - Complete the remaining fields: **Batch account name**, **Blob Storage account name**, **Tenant ID** (Application (client) ID in Azure), **Client ID** (Client secret ID in Azure), **Client secret** (Client secret value in Azure).
+1. Delete the ID and secret values from their temporary location after they have been added to a credential in Platform.
+
+## Platform compute environment
 There are two ways to create an Azure Batch compute environment in Seqera Platform:
@@ -133,16 +189,15 @@ Batch Forge automatically creates resources that you may be charged for in your
 Create a Batch Forge Azure Batch compute environment:
 1. In a workspace, select **Compute Environments > New Environment**.
-1. Enter a descriptive name, e.g., _Azure Batch (east-us)_.
+1. Enter a descriptive name, such as _Azure Batch (east-us)_.
 1. Select **Azure Batch** as the target platform.
-1. Choose existing Azure credentials or add a new credential. If you are using existing credentials, skip to step 7.
-    :::tip
-    You can create multiple credentials in your Seqera environment.
+1. Choose existing Azure credentials or add a new credential.
+    :::info
+    Batch Forge compute environments must use access keys for authentication. Entra service principals are only supported in manual compute environments.
     :::
-1. Enter a name for the credentials, e.g., _Azure Credentials_.
 1. Add the **Batch account** and **Blob Storage** account names and access keys.
-1. Select a **Region**, e.g., _eastus_.
-1. In the **Pipeline work directory** field, enter the Azure blob container created previously, e.g., `az://towerrgstorage-container/work`.
+1. Select a **Region**, such as _eastus_.
+1. In the **Pipeline work directory** field, enter the Azure blob container created previously. For example, `az://seqeracomputestorage-container/work`.
     :::note
     When you specify a Blob Storage bucket as your work directory, this bucket is used for the Nextflow [cloud cache](https://www.nextflow.io/docs/latest/cache-and-resume.html#cache-stores) by default. You can specify an alternative cache location with the **Nextflow config file** field on the pipeline [launch](../launch/launchpad.mdx#launch-form) form.
     :::
@@ -198,19 +253,17 @@ This section is for users with a pre-configured Azure Batch pool. This requires
 Your Seqera compute environment uses resources that you may be charged for in your Azure account. See [Cloud costs](../monitoring/cloud-costs.mdx) for guidelines to manage cloud resources effectively and prevent unexpected costs.
 :::
-**Create a manual Seqera Azure Batch compute environment**
+Create a manual Seqera Azure Batch compute environment:
 1. In a workspace, select **Compute Environments > New Environment**.
-1. Enter a descriptive name for this environment, e.g., _Azure Batch (east-us)_.
-1. Select **Azure Batch** as the target platform.
-1. Select your existing Azure credentials or select **+** to add new credentials. If you choose to use existing credentials, skip to step 7.
-    :::tip
-    You can create multiple credentials in your Seqera environment.
+1. Enter a descriptive name for this environment, such as _Azure Batch (east-us)_.
+1. For **Provider**, select **Azure Batch**.
+1. Select your existing Azure credentials (access keys or Entra service principal) or select **+** to add new credentials.
+    :::note
+    To authenticate using an Entra service principal, you must include a user-assigned managed identity. See [Managed identity](#managed-identity) below.
     :::
-1. Enter a name, e.g., _Azure Credentials_.
-1. Add the **Batch account** and **Blob Storage** credentials you created previously.
-1. Select a **Region**, e.g., _eastus (East US)_.
-1. In the **Pipeline work directory** field, add the Azure blob container created previously, e.g., `az://towerrgstorage-container/work`.
+1. Select a **Region**, such as _eastus (East US)_.
+1. In the **Pipeline work directory** field, add the Azure blob container created previously. For example, `az://seqeracomputestorage-container/work`.
     :::note
     When you specify a Blob Storage bucket as your work directory, this bucket is used for the Nextflow [cloud cache](https://www.nextflow.io/docs/latest/cache-and-resume.html#cache-stores) by default. You can specify an alternative cache location with the **Nextflow config file** field on the pipeline [launch](../launch/launchpad.mdx#launch-form) form.
     :::
@@ -242,13 +295,16 @@ Your Seqera compute environment uses resources that you may be charged for in yo
 1. Set the **Config mode** to **Manual**.
 1. Enter the **Compute Pool name**. This is the name of the Azure Batch pool you created previously in the Azure Batch account.
     :::note
-    The default Azure Batch implementation uses a single pool for head and compute nodes. To use separate pools for head and compute nodes (e.g., to use low-priority VMs for compute jobs), see [this FAQ entry](../faqs.mdx#azure).
+    The default Azure Batch implementation uses a single pool for head and compute nodes. To use separate pools for head and compute nodes (for example, to use low-priority VMs for compute jobs), see [this FAQ entry](../faqs.mdx#azure).
     :::
 1. Enter a user-assigned **Managed identity client ID**, if one is attached to your Azure Batch pool. See [Managed Identity](#managed-identity) below.
 1. Apply [**Resource labels**](../resource-labels/overview.mdx). This will populate the **Metadata** fields of the Azure Batch pool.
 1. Expand **Staging options** to include:
     - Optional [pre- or post-run Bash scripts](../launch/advanced.mdx#pre--post-run-scripts) that execute before or after the Nextflow pipeline execution in your environment.
     - Global Nextflow configuration settings for all pipeline runs launched with this compute environment. Configuration settings in this field override the same values in the pipeline Nextflow config file.
+    :::info
+    To use managed identities, Platform requires Nextflow version 24.06.0-edge or later. Add `export NXF_VER=24.06.0-edge` to the **Global Nextflow config** field for your compute environment to use this Nextflow version by default.
+    :::
 1. Define custom **Environment Variables** for the **Head Job** and/or **Compute Jobs**.
 1. Configure any necessary advanced options:
     - Use **Jobs cleanup policy** to control how Nextflow process jobs are deleted on completion. Active jobs consume the quota of the Azure Batch account. By default, jobs are terminated by Nextflow and removed from the quota when all tasks successfully complete. If set to _Always_, all jobs are deleted by Nextflow after pipeline completion. If set to _Never_, jobs are never deleted. If set to _On success_, successful tasks are removed but failed tasks will be left for debugging purposes.
@@ -261,13 +317,17 @@ See [Launch pipelines](../launch/launchpad.mdx) to start executing workflows in
 ### Managed identity
+:::info
+To use managed identities, Platform requires Nextflow version 24.06.0-edge or later. Add `export NXF_VER=24.06.0-edge` to the **Global Nextflow config** field in advanced options for your compute environment to use this Nextflow version by default (see manual instructions above).
+:::
+
 Nextflow can authenticate to Azure services using a managed identity. This method offers enhanced security compared to access keys, but must run on Azure infrastructure.
-When you use a manually configured compute environment with a managed identity attached to the Azure Batch Pool, Nextflow can use this managed identity for authentication. However, Platform still needs to use access keys to submit the initial task to Azure Batch to run Nextflow, which will then proceed with the managed identity for subsequent authentication.
+When you use a manually configured compute environment with a managed identity attached to the Azure Batch Pool, Nextflow can use this managed identity for authentication. However, Platform still needs to use access keys or an Entra service principal to submit the initial task to Azure Batch to run Nextflow, which will then proceed with the managed identity for subsequent authentication.
 1. In Azure, create a user-assigned managed identity. See [Manage user-assigned managed identities](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/how-manage-user-assigned-managed-identities) for detailed steps. After creation, record the Client ID of the managed identity.
 1. The user-assigned managed identity must have the necessary access roles for Nextflow. See [Required role assignments](https://www.nextflow.io/docs/latest/azure.html#required-role-assignments) for more information.
-1. Associate the user-assigned managed identity with the Azure Batch Pool. See [Set up managed identity in your batch pool](https://learn.microsoft.com/en-us/troubleshoot/azure/hpc/batch/use-managed-identities-azure-batch-account-pool#set-up-managed-identity-in-your-batch-pool) for more information.
+1. Associate the user-assigned managed identity with the Azure Batch Pool. See [Set up managed identity in your Batch pool](https://learn.microsoft.com/en-us/troubleshoot/azure/hpc/batch/use-managed-identities-azure-batch-account-pool#set-up-managed-identity-in-your-batch-pool) for more information.
 1. When you set up the Platform compute environment, select the Azure Batch pool by name and enter the managed identity client ID in the specified field as instructed above.
 When you submit a pipeline to this compute environment, Nextflow will authenticate using the managed identity associated with the Azure Batch node it runs on, rather than relying on access keys.
@@ -283,6 +343,7 @@ When you submit a pipeline to this compute environment, Nextflow will authentica
 [az-learn-jobs]: https://learn.microsoft.com/en-us/azure/batch/jobs-and-tasks
 [az-create-rg]: https://portal.azure.com/#create/Microsoft.ResourceGroup
 [az-create-storage]: https://portal.azure.com/#create/Microsoft.StorageAccount-ARM
+[az-create-sp]: https://learn.microsoft.com/en-us/entra/identity-platform/howto-create-service-principal-portal
 [wave-docs]: https://docs.seqera.io/wave
 [fusion-docs]: https://docs.seqera.io/fusion
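
Reviewer note, not part of the patch: the renamed examples in this change (_seqeracomputestorage_, _seqeracomputebatch_, `az://seqeracomputestorage-container/work`) can be sanity-checked against Azure's naming rules before they are published. The sketch below is only an illustration. The helper names `split_work_dir` and `is_valid_account_name` are hypothetical and do not come from Seqera Platform or Nextflow, and the patterns assume Azure's documented limits: 3 to 24 lowercase letters and digits for Storage and Batch account names, and 3 to 63 lowercase letters, digits, and single hyphens for blob container names.

```python
import re
from typing import Tuple

# Assumed rule: Storage and Batch account names are 3-24 lowercase letters/digits.
ACCOUNT_NAME = re.compile(r"^[a-z0-9]{3,24}$")
# Assumed rule: container names are 3-63 chars of lowercase letters, digits, and
# single hyphens, and must start and end with a letter or digit.
CONTAINER_NAME = re.compile(r"^(?=.{3,63}$)[a-z0-9]+(?:-[a-z0-9]+)*$")


def is_valid_account_name(name: str) -> bool:
    """Return True if `name` fits the assumed account naming rule."""
    return ACCOUNT_NAME.match(name) is not None


def split_work_dir(work_dir: str) -> Tuple[str, str]:
    """Split an az://<container>/<path> work directory into (container, path).

    Hypothetical helper for illustration only; raises ValueError on a wrong
    scheme or a container name that breaks the assumed naming rule.
    """
    prefix = "az://"
    if not work_dir.startswith(prefix):
        raise ValueError(f"work directory must start with {prefix!r}: {work_dir!r}")
    container, _, path = work_dir[len(prefix):].partition("/")
    if not CONTAINER_NAME.match(container):
        raise ValueError(f"invalid blob container name: {container!r}")
    return container, path


if __name__ == "__main__":
    print(split_work_dir("az://seqeracomputestorage-container/work"))
    print(is_valid_account_name("seqeracomputestorage"))
    print(is_valid_account_name("seqeracomputebatch"))
```

Under those assumptions, all three example names in the patch pass, while a name containing uppercase letters or hyphens would be rejected as an account name.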