Azure Service Principal documentation #171

Merged
merged 51 commits into from
Oct 11, 2024
Changes from 8 commits
89d54b2
Update Azure Documentation to include Service Principals
adamrtalbot Aug 15, 2024
b4746c7
Language tidy
adamrtalbot Aug 15, 2024
2db492c
Remove previous reference to credentials that are no longer relevant …
adamrtalbot Aug 15, 2024
74eeaae
Section headers
adamrtalbot Aug 15, 2024
2fcda24
Merge branch 'master' into PLAT-330_Azure_docs
adamrtalbot Aug 15, 2024
dcc64cc
Put changes on correct platform version
adamrtalbot Aug 15, 2024
9215c0f
Add back the MI stuff
adamrtalbot Aug 15, 2024
64eac54
De-number everything
adamrtalbot Aug 15, 2024
63346e7
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Aug 20, 2024
6d2ccfd
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Aug 20, 2024
6238d2f
Formatting and language review
llewellyn-sl Aug 20, 2024
9a96c32
Align credential steps with UI copy
llewellyn-sl Aug 21, 2024
9dd8764
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Aug 26, 2024
763fbb1
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Aug 26, 2024
1791337
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Aug 26, 2024
603c333
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Aug 26, 2024
f63d1e9
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Aug 26, 2024
1d588cc
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Aug 27, 2024
ce99f54
Update azure-batch.mdx
llewellyn-sl Aug 28, 2024
02fe628
Replace all towerrg instances with seqeraplatform
adamrtalbot Aug 28, 2024
17e12a0
Add Entra, clarify service principal requiring managed identity
llewellyn-sl Aug 28, 2024
f2cd012
Merge branch 'PLAT-330_Azure_docs' of https://github.com/seqeralabs/d…
llewellyn-sl Aug 28, 2024
7d01e02
Update azure-batch.mdx
llewellyn-sl Aug 28, 2024
aa21ad3
Update azure-batch.mdx
llewellyn-sl Aug 28, 2024
e5144af
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Aug 28, 2024
47a4c02
Merge branch 'master' into PLAT-330_Azure_docs
llewellyn-sl Aug 28, 2024
70ac0a5
Update azure-batch.mdx
llewellyn-sl Aug 28, 2024
bd069d3
Merge branch 'PLAT-330_Azure_docs' of https://github.com/seqeralabs/d…
llewellyn-sl Aug 28, 2024
7848704
Fix link
llewellyn-sl Aug 28, 2024
546d170
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
mbosio85 Aug 29, 2024
aa75600
Merge branch 'master' into PLAT-330_Azure_docs
llewellyn-sl Aug 29, 2024
800e5e9
Merge branch 'master' into PLAT-330_Azure_docs
adamrtalbot Oct 7, 2024
09f0d16
Merge branch 'master' into PLAT-330_Azure_docs
justinegeffen Oct 8, 2024
92ed7ef
Merge branch 'master' into PLAT-330_Azure_docs
justinegeffen Oct 10, 2024
44526e6
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Oct 11, 2024
919d58c
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Oct 11, 2024
88c2dbf
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Oct 11, 2024
6fe50eb
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Oct 11, 2024
105939e
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Oct 11, 2024
a9b8d32
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Oct 11, 2024
1009489
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Oct 11, 2024
cbe5950
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Oct 11, 2024
e09bd1f
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Oct 11, 2024
7ea37db
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Oct 11, 2024
7b2dc30
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Oct 11, 2024
390e588
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Oct 11, 2024
51c8266
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Oct 11, 2024
d946b67
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Oct 11, 2024
9bf74c0
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Oct 11, 2024
3de8eea
Update platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
llewellyn-sl Oct 11, 2024
56c9b3b
Merge branch 'master' into PLAT-330_Azure_docs
llewellyn-sl Oct 11, 2024
151 changes: 88 additions & 63 deletions platform_versioned_docs/version-24.1/compute-envs/azure-batch.mdx
@@ -13,37 +13,41 @@ Ensure you have sufficient permissions to create resource groups, an Azure Stora

## Concepts

### Regions

Azure regions are distinct geographic areas that contain multiple data centers, strategically located around the world, to provide high availability, fault tolerance, and low latency for cloud services. Each region offers a wide range of Azure services, and by choosing a specific region, users can optimize performance, ensure data residency compliance, and meet regulatory requirements. Azure regions also enable redundancy and disaster recovery options by allowing resources to be replicated across different regions, enhancing the resilience of applications and data.

### Resource group

An Azure Resource Group is a logical container that holds related Azure resources such as virtual machines, storage accounts, databases, and more. It serves as a management boundary, allowing you to organize, deploy, monitor, and manage all the resources within it as a single entity. Resources in a Resource Group share the same lifecycle, meaning they can be deployed, updated, and deleted together. This grouping also enables easier access control, monitoring, and cost management, making it a foundational element in organizing and managing cloud infrastructure in Azure.

### Accounts

Seqera Platform relies on existing Azure Storage and Azure Batch accounts. You need at least one valid Azure Storage account and one Azure Batch account within your subscription.

Azure uses 'accounts' for each service. For example, an [Azure Storage account][az-learn-storage] will house a collection of blob containers, file shares, queues, and tables. While you can have multiple Azure Storage and Azure Batch accounts in an Azure subscription, each compute environment on the platform can only use one of each (one storage and one Batch account). You can set up multiple compute environments on the platform with different credentials, storage accounts, and Batch accounts.

### Service Principal

An Azure Service Principal is an identity created for use with applications, hosted services, or automated tools to access Azure resources. It acts like a "user identity" with a specific set of permissions assigned to it. Seqera Platform can use an Azure Service Principal to authenticate and authorize access to Azure Batch for job execution and Azure Storage for data management. By assigning the necessary roles to the Service Principal, Seqera can securely interact with these Azure services, ensuring that only authorized operations are performed during pipeline execution.

## Azure Resources

### Resource group

To create Azure Batch and Azure Storage accounts, first create a [resource group][az-learn-rg] in your preferred region.
An Azure Batch and Azure Storage account need to be linked to a resource group in Azure, so it is necessary to create one.

:::note
A resource group can be created while creating an Azure Storage Account or Azure Batch account.
:::

### Regions

Azure resources can operate across regions, but this incurs additional costs and security requirements. It is recommended to place all resources in the same region. See the [Azure product page on data residency][az-data-residency] for more information.

## Resource group

A resource group in Azure is a unit of related resources in Azure. As a rule of thumb, resources that have a similar lifecycle should be within the same resource group. You can delete a resource group and all associated components together. We recommend placing all platform compute resources in the same resource group, but this is not necessary.

### Create a resource group

1. Log in to your Azure account, go to the [Create Resource group][az-create-rg] page, and select **Create new resource group**.
2. Enter a name for the resource group, e.g., _towerrg_.
3. Choose the preferred region.
4. Select **Review and Create** to proceed.
5. Select **Create**.
1. Enter a name for the resource group, e.g., _towerrg_.
1. Choose the preferred region.
1. Select **Review and Create** to proceed.
1. Select **Create**.

## Storage account
### Storage account

After creating a resource group, set up an [Azure storage account][az-learn-storage].

@@ -55,56 +59,56 @@ After creating a resource group, set up an [Azure storage account][az-learn-stor
If you haven't created a resource group, you can do so now.
:::

2. Enter a name for the storage account (e.g., _towerrgstorage_).
3. Choose the preferred region (same as the Batch account).
4. The platform supports any performance or redundancy settings — select the most appropriate settings for your use case.
5. Select **Next: Advanced**.
6. Enable _storage account key access_.
7. Select **Next: Networking**.
1. Enter a name for the storage account (e.g., _towerrgstorage_).
1. Choose the preferred region (same as the Batch account).
1. The platform supports any performance or redundancy settings — select the most appropriate settings for your use case.
1. Select **Next: Advanced**.
1. Enable _storage account key access_.
1. Select **Next: Networking**.
- Enable public access from all networks. You can enable public access from selected virtual networks and IP addresses, but you will be unable to use Forge to create compute resources. Disabling public access is not supported.
8. Select **Data protection**.
1. Select **Data protection**.
- Configure appropriate settings. All settings are supported by the platform.
9. Select **Encryption**.
1. Select **Encryption**.
- Only Microsoft-managed keys (MMK) are supported.
10. In **tags**, add any required tags for the storage account.
11. Select **Review and Create**.
12. Select **Create** to create the Azure Storage account.
13. You will need at least one blob storage container to act as a working directory for Nextflow.
14. Go to your new storage account and select **+ Container** to create a new Blob storage container. A new container dialogue will open. Enter a suitable name, e.g., _towerrgstorage-container_.
15. Go to the **Access Keys** section of your new storage account (_towerrgstorage_ in this example).
16. Store the access keys for your Azure Storage account, to be used when you create a Seqera compute environment.
1. In **tags**, add any required tags for the storage account.
1. Select **Review and Create**.
1. Select **Create** to create the Azure Storage account.
1. You will need at least one blob storage container to act as a working directory for Nextflow.
1. Go to your new storage account and select **+ Container** to create a new Blob storage container. A new container dialogue will open. Enter a suitable name, e.g., _towerrgstorage-container_.
1. Go to the **Access Keys** section of your new storage account (_towerrgstorage_ in this example).
1. Store the access keys for your Azure Storage account, to be used when you create a Seqera compute environment.

:::caution
Blob container storage credentials are associated with the Batch pool configuration. Avoid changing these credentials in your Seqera instance after you have created the compute environment.
:::
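The names chosen in the steps above have to satisfy Azure's naming rules, and the container name is reused later as the Nextflow work directory. The following is a minimal Python sketch, assuming Azure's published rules: a storage account name is 3 to 24 lowercase letters and digits, and a container name is 3 to 63 lowercase letters, digits, and single hyphens, starting and ending with a letter or digit. The helper names are hypothetical and are not part of Seqera Platform or any Azure SDK. The `az://<container>/<path>` form follows the work directory example used in this guide.

```python
import re

# Assumed from Azure's naming rules: 3-24 lowercase letters and digits.
STORAGE_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
# Letters/digits separated by single hyphens; no leading, trailing,
# or consecutive hyphens. Length (3-63) is checked separately.
CONTAINER_RE = re.compile(r"[a-z0-9](?:-?[a-z0-9])*")


def is_valid_storage_account(name: str) -> bool:
    """Return True if `name` is an acceptable storage account name."""
    return STORAGE_ACCOUNT_RE.fullmatch(name) is not None


def is_valid_container(name: str) -> bool:
    """Return True if `name` is an acceptable blob container name."""
    return 3 <= len(name) <= 63 and CONTAINER_RE.fullmatch(name) is not None


def work_dir(container: str, path: str = "work") -> str:
    """Build the value for the Pipeline work directory field (hypothetical helper)."""
    if not is_valid_container(container):
        raise ValueError(f"invalid container name: {container!r}")
    return f"az://{container}/{path.strip('/')}"


if __name__ == "__main__":
    print(is_valid_storage_account("towerrgstorage"))
    print(work_dir("towerrgstorage-container"))
```

Checking names before opening the portal avoids a failed validation step halfway through the creation wizard.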

## Batch account
### Batch account

After you have created a resource group and storage account, create a [Batch account][az-learn-batch].

### Create a Batch account

1. Log in to your Azure account and select **Create a batch account** on [this page][az-create-batch].
2. Select the existing resource group or create a new one.
3. Enter a name for the Batch account, e.g., _towerrgbatch_.
4. Choose the preferred region (same as the storage account).
5. Select **Advanced**.
6. For _Pool allocation mode_, select Batch service.
7. For _Authentication mode_, ensure _Shared Key_ is selected.
8. Select **Networking**. Ensure networking access is sufficient for the platform and any additional required resources.
9. In **tags**, add any required tags for the Batch account.
10. Select **Review and Create**.
11. Select **Create**.
12. Go to your new Batch account, then select **Access Keys**.
13. Store the access keys for your Azure Batch account, to be used when you create a Seqera compute environment.
1. Select the existing resource group or create a new one.
1. Enter a name for the Batch account, e.g., _towerrgbatch_.
1. Choose the preferred region (same as the storage account).
1. Select **Advanced**.
1. For _Pool allocation mode_, select Batch service.
1. For _Authentication mode_, ensure _Shared Key_ is selected.
1. Select **Networking**. Ensure networking access is sufficient for the platform and any additional required resources.
1. In **tags**, add any required tags for the Batch account.
1. Select **Review and Create**.
1. Select **Create**.
1. Go to your new Batch account, then select **Access Keys**.
1. Store the access keys for your Azure Batch account, to be used when you create a Seqera compute environment.

:::caution
A newly-created Azure Batch account may not be entitled to create virtual machines without making a service request to Azure.
See [Azure Batch service quotas and limits][az-batch-quotas] for more information.
:::

14. Select the **+ Quotas** tab of the Azure Batch account to check and increase existing quotas if necessary.
15. Select **+ Request quota increase** and add the quantity of resources you require. Here is a brief guideline:
1. Select the **+ Quotas** tab of the Azure Batch account to check and increase existing quotas if necessary.
1. Select **+ Request quota increase** and add the quantity of resources you require. Here is a brief guideline:

- **Active jobs and schedules**: Each Nextflow process will require an active Azure Batch job per pipeline while running, so increase this number to a high level. See [here][az-learn-jobs] to learn more about jobs in Azure Batch.
- **Pools**: Each platform compute environment requires one Azure Batch pool. Each pool is composed of multiple machines of one virtual machine size.
@@ -117,7 +121,37 @@ After you have created a resource group and storage account, create a [Batch acc
- **Spot/low-priority vCPUs**: Platform does not support spot or low-priority machines when using Forge, so when using Forge this number can be zero. When manually setting up a pool, select an appropriate number of concurrent vCPUs here.
- **Total Dedicated vCPUs per VM series**: See the Azure documentation for [virtual machine sizes][az-vm-sizes] to help determine the machine size you need. We recommend the latest version of the ED series available in your region as a cost-effective and appropriately sized machine for running Nextflow. However, if you have additional requirements, such as GPUs or faster storage, you will need to select an alternative machine series. Increase the quota by the number of required concurrent CPUs. In Azure, machines are charged per CPU minute, so a higher quota incurs no additional cost.
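The quota guidelines above reduce to simple arithmetic. The following Python sketch is one way to estimate the numbers to request, assuming one pool per compute environment and one active job per Nextflow process per running pipeline, as described above. The `PoolPlan` class, the `required_quota` function, and the series labels are hypothetical. Look up the real vCPU count for your chosen VM size in the Azure documentation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolPlan:
    """One planned Azure Batch pool (one per compute environment)."""
    series: str          # label for the VM series, e.g. "ed" (illustrative)
    vcpus_per_node: int  # vCPUs of the chosen VM size
    max_nodes: int       # largest node count the pool may scale to


def required_quota(plans, concurrent_pipelines, processes_per_pipeline):
    """Estimate the pool, active job, and dedicated vCPU quotas to request."""
    vcpus_by_series = defaultdict(int)
    for plan in plans:
        vcpus_by_series[plan.series] += plan.vcpus_per_node * plan.max_nodes
    return {
        "pools": len(plans),
        "active_jobs": concurrent_pipelines * processes_per_pipeline,
        "dedicated_vcpus_by_series": dict(vcpus_by_series),
        "total_dedicated_vcpus": sum(vcpus_by_series.values()),
    }


if __name__ == "__main__":
    plans = [PoolPlan("ed", 8, 10), PoolPlan("gpu", 6, 2)]
    print(required_quota(plans, concurrent_pipelines=2, processes_per_pipeline=20))
```

These figures are a lower bound. Because a higher quota incurs no extra cost, it is reasonable to request some headroom above the estimate.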

### Compute environment
### Credentials

There are two Azure credential options available: primary Access Keys and a Service Principal. Primary Access Keys are simple to use but provide full access to the Storage and Batch accounts. Additionally, each account can have only two keys, which makes them a single point of failure. A Service Principal, on the other hand, is an identity that can be granted access to Azure Batch and Storage resources through role-based access control, which allows more precise permissions. Some Azure Batch features are also only available when using a Service Principal instead of primary Access Keys.

:::note
The two Azure credential types use entirely different authentication methods. You can add more than one credential to a workspace, but each compute environment uses only one credential at a time. Separate compute environments can use different credentials concurrently, but the credentials are not cross-compatible: access granted by one is not shared with the other.
:::

#### Access Keys

1. Navigate to the Azure Portal and sign in.
1. Locate the Azure Batch account and select "Keys" under "Account management." Here, you will see the Primary and Secondary keys. Copy one of the keys and save it in a secure location for later use.
1. Locate the Azure Storage account and, under the "Security and Networking" section, select "Access keys". Here, you will see Key1 and Key2 options. Copy one of them and save it in a secure location for later use. Delete your local copies of the keys after saving them in Seqera Platform.
1. In Seqera Platform, go to your workspace, select "Add a new credential," and choose the "Azure Credentials" type.
1. Enter a name for the credentials, such as _Azure Credentials_.
1. Add the **Batch account** and **Blob Storage** account names and access keys.

#### Service Principal

1. In the Azure Portal, navigate to "Microsoft Entra ID," and under "App registrations," click "New registration." See the [Azure documentation][az-create-sp] for more details.
1. Provide a name for the application. The application will automatically have a Service Principal associated with it.
1. Assign roles to the Service Principal. Go to the Azure Storage account, and under "Access Control (IAM)," click "Add role assignment."
1. Choose the roles "Storage Blob Data Reader" and "Storage Blob Data Contributor", then select "Members", click "Select Members", search for your newly created Service Principal, and assign the role.
1. Repeat the same process for the Azure Batch account, but use the "Azure Batch Contributor" role.
1. Seqera Platform will need credentials to authenticate as the Service Principal. Navigate back to the app registration, and on the "Overview" page, save the "Application (client) ID" value for use in Seqera Platform.
1. Then, click "Certificates & secrets" and select "New client secret." A new secret will be created containing a value and secret ID. Save both of these securely for use in Seqera Platform. Delete your local copies after saving them in Seqera Platform.
1. In Seqera Platform, go to your workspace, select "Add a new credential," choose the "Azure Credentials" type, then select the "Service Principal" tab.
1. Enter a name for the credentials, such as _Azure Credentials_.
1. Add the Application ID, Secret ID, Secret, **Batch account**, and **Blob Storage** account names to the relevant fields.
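The Service Principal steps above produce a set of role assignments and a handful of values to paste into Seqera Platform. The following Python sketch records both as a checklist. The role names and field names are taken from the steps above, while the class, its `problems` method, and the GUID check are illustrative assumptions and not a Seqera or Azure API.

```python
import uuid
from dataclasses import dataclass

# Role assignments named in the steps above, keyed by the resource they apply to.
ROLE_ASSIGNMENTS = {
    "storage_account": ["Storage Blob Data Reader", "Storage Blob Data Contributor"],
    "batch_account": ["Azure Batch Contributor"],
}


@dataclass(frozen=True)
class ServicePrincipalCredential:
    """Values collected for the "Service Principal" credential tab."""
    name: str
    application_id: str  # "Application (client) ID" from the app registration
    secret_id: str
    secret: str
    batch_account: str
    blob_storage_account: str

    def problems(self) -> list[str]:
        """Return a list of obvious mistakes; an empty list means none were found."""
        issues = [f"{field} is empty"
                  for field, value in vars(self).items() if not value.strip()]
        if self.application_id.strip():
            try:
                # Assumption: the application ID is formatted as a GUID.
                uuid.UUID(self.application_id)
            except ValueError:
                issues.append("application_id is not a GUID")
        return issues


if __name__ == "__main__":
    for resource, roles in ROLE_ASSIGNMENTS.items():
        for role in roles:
            print(f"assign {role!r} on {resource}")
```

An empty result from `problems()` only means the fields look well-formed. It does not confirm that the roles were actually assigned in Azure.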

## Seqera Platform

There are two ways to create an Azure Batch compute environment in Seqera Platform:

@@ -135,14 +169,7 @@ Create a Batch Forge Azure Batch compute environment:
1. In a workspace, select **Compute Environments > New Environment**.
1. Enter a descriptive name, e.g., _Azure Batch (east-us)_.
1. Select **Azure Batch** as the target platform.
1. Choose existing Azure credentials or add a new credential. If you are using existing credentials, skip to step 7.

:::tip
You can create multiple credentials in your Seqera environment.
:::

1. Enter a name for the credentials, e.g., _Azure Credentials_.
1. Add the **Batch account** and **Blob Storage** account names and access keys.
1. Choose existing Azure credentials or add a new credential.
1. Select a **Region**, e.g., _eastus_.
1. In the **Pipeline work directory** field, enter the Azure blob container created previously, e.g., `az://towerrgstorage-container/work`.

@@ -159,9 +186,7 @@ Create a Batch Forge Azure Batch compute environment:
1. Enable **Dispose resources** for Seqera to automatically delete the Batch pool if the compute environment is deleted on the platform.
1. Select or create [**Container registry credentials**](../credentials/azure_registry_credentials.mdx) to authenticate a registry (used by the [Wave containers](https://www.nextflow.io/docs/latest/wave.html) service). It is recommended to use an [Azure Container registry](https://azure.microsoft.com/en-gb/products/container-registry) within the same region for maximum performance.
1. Apply [**Resource labels**](../resource-labels/overview.mdx). This will populate the **Metadata** fields of the Azure Batch pool.
1. Expand **Staging options** to include:
- Optional [pre- or post-run Bash scripts](../launch/advanced.mdx#pre--post-run-scripts) that execute before or after the Nextflow pipeline execution in your environment.
- Global Nextflow configuration settings for all pipeline runs launched with this compute environment. Configuration settings in this field override the same values in the pipeline Nextflow config file.
1. Expand **Staging options** to include optional [pre- or post-run Bash scripts](../launch/advanced.mdx#pre-and-post-run-scripts) that execute before or after the Nextflow pipeline execution in your environment.
1. Specify custom **Environment variables** for the **Head job** and/or **Compute jobs**.
1. Configure any advanced options you need:

@@ -179,7 +204,6 @@ This section is for users with a pre-configured Azure Batch pool. This requires
:::caution
Your Seqera compute environment uses resources that you may be charged for in your Azure account. See [Cloud costs](../monitoring/cloud-costs.mdx) for guidelines to manage cloud resources effectively and prevent unexpected costs.
:::

**Create a manual Seqera Azure Batch compute environment**

1. In a workspace, select **Compute Environments > New Environment**.
@@ -225,9 +249,9 @@ Nextflow can authenticate to Azure services using a managed identity. This metho
When you use a manually configured compute environment with a managed identity attached to the Azure Batch Pool, Nextflow can use this managed identity for authentication. However, Platform still needs to use access keys to submit the initial task to Azure Batch to run Nextflow, which will then proceed with the managed identity for subsequent authentication.

1. In Azure, create a user-assigned managed identity. See [Manage user-assigned managed identities](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/how-manage-user-assigned-managed-identities) for detailed steps. After creation, record the Client ID of the managed identity.
2. The user-assigned managed identity must have the necessary access roles for Nextflow. See [Required role assignments](https://www.nextflow.io/docs/latest/azure.html#required-role-assignments) for more information.
3. Associate the user-assigned managed identity with the Azure Batch Pool. See [Set up managed identity in your batch pool](https://learn.microsoft.com/en-us/troubleshoot/azure/hpc/batch/use-managed-identities-azure-batch-account-pool#set-up-managed-identity-in-your-batch-pool) for more information.
4. When you set up the Platform compute environment, select the Azure Batch pool by name and enter the managed identity client ID in the specified field as instructed above.
1. The user-assigned managed identity must have the necessary access roles for Nextflow. See [Required role assignments](https://www.nextflow.io/docs/latest/azure.html#required-role-assignments) for more information.
1. Associate the user-assigned managed identity with the Azure Batch Pool. See [Set up managed identity in your batch pool](https://learn.microsoft.com/en-us/troubleshoot/azure/hpc/batch/use-managed-identities-azure-batch-account-pool#set-up-managed-identity-in-your-batch-pool) for more information.
1. When you set up the Platform compute environment, select the Azure Batch pool by name and enter the managed identity client ID in the specified field as instructed above.

When you submit a pipeline to this compute environment, Nextflow will authenticate using the managed identity associated with the Azure Batch node it runs on, rather than relying on access keys.

@@ -244,6 +268,7 @@ When you submit a pipeline to this compute environment, Nextflow will authentica
[az-learn-jobs]: https://learn.microsoft.com/en-us/azure/batch/jobs-and-tasks
[az-create-rg]: https://portal.azure.com/#create/Microsoft.ResourceGroup
[az-create-storage]: https://portal.azure.com/#create/Microsoft.StorageAccount-ARM
[az-create-sp]: https://learn.microsoft.com/en-us/entra/identity-platform/howto-create-service-principal-portal

[wave-docs]: https://docs.seqera.io/wave
[nf-fusion-docs]: https://www.nextflow.io/docs/latest/fusion.html