Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FSN-35] Propagate Azure credentials to Fusion #5522

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -49,17 +49,26 @@ class AzFusionEnv implements FusionEnv {
throw new IllegalArgumentException("Missing Azure Storage account name")
}

if (cfg.storage().accountKey && cfg.storage().sasToken) {
throw new IllegalArgumentException("Azure Storage Access key and SAS token detected. Only one is allowed")
result.AZURE_STORAGE_ACCOUNT = cfg.storage().accountName

if (cfg.activeDirectory().isConfigured()) {
result.AZURE_TENANT_ID = cfg.activeDirectory().tenantId
result.AZURE_CLIENT_ID = cfg.activeDirectory().servicePrincipalId
result.AZURE_CLIENT_SECRET = cfg.activeDirectory().servicePrincipalSecret
return result
}

result.AZURE_STORAGE_ACCOUNT = cfg.storage().accountName
// In theory, generating an impromptu SAS token for authentication methods other than
// `azure.storage.sasToken` should not be necessary, because those methods should already allow sufficient
// access for normal operation. Nevertheless, #5287 heavily implies that failing to do so causes the Azure
// Storage plugin or Fusion to fail. In any case, it may be possible to remove this in the future.
result.AZURE_STORAGE_SAS_TOKEN = getOrCreateSasToken()
if (cfg.managedIdentity().isConfigured()) {
// User-assigned Managed Identity
if (cfg.managedIdentity().clientId) {
result.AZURE_CLIENT_ID = cfg.managedIdentity().clientId
}
// System Managed Identity
return result
Comment on lines +61 to +67
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we can guarantee the Azure worker nodes will have the same managed identity as the Nextflow head node. In AWS we use the configuration item aws.batch.jobRole = 'JOB ROLE ARN' to specify this variable, perhaps we should do the same here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here for an explanation on how to do this with azcopy (non Fusion): #3314 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here for an explanation on how to do this with azcopy (non Fusion): #3314 (comment)

This approach would work for a system-wide managed identity, but AFAIK azcopy requires the AZURE_CLIENT_ID environment variable for a user-assigned managed identity (or the AZURE_CLIENT_ID, AZURE_SECRET_ID, and AZURE_TENANT_ID for a service principal).

I'm sure I'm missing something, but since Fusion support for this (wip) uses the same underlying mechanism (DefaultAzureCredential), it requires the same environment variables, so I'm not sure what we gain here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we can guarantee the Azure worker nodes will have the same managed identity as the Nextflow head node. In AWS we use the configuration item aws.batch.jobRole = 'JOB ROLE ARN' to specify this variable, perhaps we should do the same here?

No idea about this, sorry.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the issue: #5232

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the issue: #5232

Ok, I understand what you mean now regarding the head vs worker nodes MIs. I think it makes sense, but supporting this would require changes to the configuration file format, which is probably out of the scope of this PR? Or we could discuss whether we want to do this here, but it was not the original purpose of the PR 🤔

}

// Shared Key authentication or Account SAS token
result.AZURE_STORAGE_SAS_TOKEN = getOrCreateSasToken()
return result
}

Expand All @@ -76,12 +85,6 @@ class AzFusionEnv implements FusionEnv {
return cfg.storage().sasToken
}

// For Active Directory and Managed Identity, we cannot generate an *account* SAS token, but we can generate
// a *container* SAS token for the work directory.
if (cfg.activeDirectory().isConfigured() || cfg.managedIdentity().isConfigured()) {
return AzHelper.generateContainerSasWithActiveDirectory(Global.session.workDir, cfg.storage().tokenDuration)
}

// Shared Key authentication can use an account SAS token
return AzHelper.generateAccountSasWithAccountKey(Global.session.workDir, cfg.storage().tokenDuration)
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ class AzFusionEnvTest extends Specification {
env == Collections.emptyMap()
}

def 'should return env environment with SAS token config when accountKey is provided'() {
def 'should return env environment with sasToken and accountName config when accountKey is provided'() {
given:
def NAME = 'myaccount'
def KEY = 'myaccountkey'
Expand All @@ -67,7 +67,7 @@ class AzFusionEnvTest extends Specification {
env.size() == 2
}

def 'should return env environment with SAS token config when a Service Principal is provided'() {
def 'should return env environment with tenantID, servicePrincipalId, servicePrincipalSecret, and accountName config when a Service Principal is provided'() {
given:
def NAME = 'myaccount'
def CLIENT_ID = 'myclientid'
Expand All @@ -90,17 +90,18 @@ class AzFusionEnvTest extends Specification {

when:
def config = Mock(FusionConfig)
def fusionEnv = Spy(AzFusionEnv)
1 * fusionEnv.getOrCreateSasToken() >> 'generatedSasToken'
def env = fusionEnv.getEnvironment('az', config)
def env = new AzFusionEnv().getEnvironment('az', config)

then:
env.AZURE_STORAGE_ACCOUNT == NAME
env.AZURE_STORAGE_SAS_TOKEN == 'generatedSasToken'
env.size() == 2
env.AZURE_TENANT_ID == TENANT_ID
env.AZURE_CLIENT_ID == CLIENT_ID
env.AZURE_CLIENT_SECRET == CLIENT_SECRET
env.AZURE_STORAGE_SAS_TOKEN == null
Comment on lines +97 to +100
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't passing the creds to the worker nodes something we should avoid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a valid concern, but I don't know enough about Nextflow to answer it (cc @pditommaso @jordeu). If this is indeed the case, the only way forward is to generate multiple SAS tokens (one for each container accessed).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's the benefits of using SAS token over secrets, and I would add Azure security is totally crap.

What do you think @arnaualcazar, it's safe exposing the client secret as an env variable?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposing secrets through environment variables should be avoided whenever possible. But what kind of credentials are we passing? If these credentials come from Managed Identities, then it is less of a problem since these are only valid for some time.

Instead of a SAS token, can we use a Managed Identity assigned to the container where Fusion is?

Copy link
Collaborator

@adamrtalbot adamrtalbot Nov 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use a managed identity but we would need to implement #5232 which is a much larger job.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially you are saying that if the batch node pool is aware of the managed identity there's no need for propagate the secret. Think that's the way to go

Copy link
Collaborator

@arnaualcazar arnaualcazar Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, exactly. This is the same we do on AWS afaik.

Copy link
Collaborator

@adamrtalbot adamrtalbot Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue remains - how to attach a managed identity to the worker machines. Currently, we don't implement this at all but we need to do it to make this work. I think we can do it with this: https://learn.microsoft.com/en-en/java/api/com.microsoft.azure.batch.protocol.models.userassignedidentity

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we rely on the user to attach a managed identity to each Azure Batch node pool.

env.size() == 4
}

def 'should return env environment with SAS token config when a user-assigned Managed Identity is provided'() {
def 'should return env environment with the clientId and accountName config when a user-assigned Managed Identity is provided'() {
given:
def NAME = 'myaccount'
def CLIENT_ID = 'myclientid'
Expand All @@ -119,17 +120,16 @@ class AzFusionEnvTest extends Specification {

when:
def config = Mock(FusionConfig)
def fusionEnv = Spy(AzFusionEnv)
1 * fusionEnv.getOrCreateSasToken() >> 'generatedSasToken'
def env = fusionEnv.getEnvironment('az', config)
def env = new AzFusionEnv().getEnvironment('az', config)

then:
env.AZURE_STORAGE_ACCOUNT == NAME
env.AZURE_STORAGE_SAS_TOKEN == 'generatedSasToken'
env.AZURE_CLIENT_ID == CLIENT_ID
env.AZURE_STORAGE_SAS_TOKEN == null
env.size() == 2
}

def 'should return env environment with SAS token config when a system-assigned Managed Identity is provided'() {
def 'should return env environment with only the accountName config when a system-assigned Managed Identity is provided'() {
given:
def NAME = 'myaccount'
Global.session = Mock(Session) {
Expand All @@ -147,17 +147,15 @@ class AzFusionEnvTest extends Specification {

when:
def config = Mock(FusionConfig)
def fusionEnv = Spy(AzFusionEnv)
1 * fusionEnv.getOrCreateSasToken() >> 'generatedSasToken'
def env = fusionEnv.getEnvironment('az', config)
def env = new AzFusionEnv().getEnvironment('az', config)

then:
env.AZURE_STORAGE_ACCOUNT == NAME
env.AZURE_STORAGE_SAS_TOKEN == 'generatedSasToken'
env.size() == 2
env.AZURE_STORAGE_SAS_TOKEN == null
env.size() == 1
}

def 'should return env environment with SAS token config when a sasToken is provided'() {
def 'should return env environment sasToken and accountName config when a sasToken is provided'() {
given:
Global.session = Mock(Session) {
getConfig() >> [azure: [storage: [accountName: 'x1', sasToken: 'y1']]]
Expand All @@ -184,16 +182,4 @@ class AzFusionEnvTest extends Specification {
thrown(IllegalArgumentException)
}

def 'should throw an exception when both account key and SAS token are present'() {
given:
Global.session = Mock(Session) {
getConfig() >> [azure: [storage: [accountName: 'x1', accountKey: 'y1', sasToken: 'z1']]]
}
when:
def config = Mock(FusionConfig)
def env = new AzFusionEnv().getEnvironment('az', Mock(FusionConfig))
then:
thrown(IllegalArgumentException)
}

}