fix (5794): Handle Azure error when using Fusion #5806
base: master
The error strategy was ignored when using Fusion on Azure. This PR fixes it by handling the failure with a new method.
Signed-off-by: adamrtalbot <[email protected]>
I wonder why it was using Also, why would this issue only happen with Fusion? |
Here's how AWS Batch handles it: nextflow/plugins/nf-amazon/src/main/nextflow/cloud/aws/batch/AwsBatchTaskHandler.groovy Lines 257 to 285 in c3f67db
|
Looks like an oversight on this PR: #2099, not a big deal. We could refactor it to be closer to the AWS one? |
It doesn't look like an oversight; it was done on purpose in #2099 |
Either way - do we want to refactor at all or is this adequate? |
This will also use the native Azure Batch SDK first then use the exitcode:

```groovy
@Override
boolean checkIfCompleted() {
    assert taskKey
    if( !isRunning() )
        return false
    final done = taskState0(taskKey)==BatchTaskState.COMPLETED
    if( done ) {
        // finalize the task
        final info = batchService.getTask(taskKey).executionInfo
        task.exitStatus = info?.exitCode ?: readExitFile()
        task.stdout = outputFile
        task.stderr = errorFile
        status = TaskStatus.COMPLETED
        if (info.result == BatchTaskExecutionResult.FAILURE) {
            if (task.exitStatus != 0) {
                // If the exit status is not 0, throw a process failed exception and Nextflow will handle it with errorStrategy
                task.error = new ProcessFailedException("Task failed with exit code ${task.exitStatus}:\n${info.failureInfo.message}".toString())
            } else {
                // Else use the existing error handling
                task.error = new ProcessUnrecoverableException(info.failureInfo.message)
            }
        }
        deleteTask(taskKey, task)
        return true
    }
    return false
}
```
|
I looked a bit into this to ensure nextflow/plugins/nf-azure/src/main/nextflow/cloud/azure/batch/AzBatchTaskHandler.groovy Lines 122 to 124 in 9eefd20
I added some extra logging statements to double check

```console
user@host:~ $ cat proc.nf
workflow {
    main:
    EXITCODE_TEST()
}

process EXITCODE_TEST {
    """
    exit 42
    """
}

# nextflow-dev is just nextflow's `master` with some extra logs
user@host:~ $ nextflow-dev run -w az://fusion-develop/scratch/amiranda/bugs/exitcode
[...]
# logs without fusion
Feb-24 17:58:24.028 [Task monitor] DEBUG n.c.azure.batch.AzBatchTaskHandler - [AZURE BATCH] Task EXITCODE_TEST completed with exit status: 42 -- result=success
[...]
# logs with fusion
Feb-24 17:55:51.122 [Task monitor] DEBUG n.c.azure.batch.AzBatchTaskHandler - [AZURE BATCH] Task EXITCODE_TEST completed with exit status: 42 -- result=failure
```

This causes two differences (at least) between the
@adamrtalbot's fix circumvents this divergence by making the error """not-unrecoverable""", but IMHO we should:
|
@bentsherman I refactored this to be closer to the AWS version, but didn't push because I'm not sure it's necessary:

```groovy
@Override
boolean checkIfCompleted() {
    assert taskKey
    if( !isRunning() )
        return false
    final done = taskState0(taskKey)==BatchTaskState.COMPLETED
    if( done ) {
        // finalize the task
        final info = batchService.getTask(taskKey).executionInfo
        task.exitStatus = info?.exitCode ?: readExitFile()
        task.stdout = outputFile
        task.stderr = errorFile
        status = TaskStatus.COMPLETED
        if (info.result == BatchTaskExecutionResult.FAILURE || task.exitStatus==Integer.MAX_VALUE) {
            final String reason = info?.failureInfo?.message ?: "Unknown failure"
            if ( task.exitStatus && task.exitStatus != 0 ){
                // If the exit status is not 0, throw a process failed exception and Nextflow will handle it with errorStrategy
                task.error = new ProcessFailedException("Task failed with exit code ${task.exitStatus}:\n${reason}")
            } else {
                // Else use the existing error handling
                task.error = new ProcessUnrecoverableException(reason)
            }
        }
        deleteTask(taskKey, task)
        return true
    }
    return false
}
```

What do you think? |
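For readers following along, the branching in the refactored handler can be summarized as a small decision function. This is only a shell sketch of the logic, not Nextflow code: `classify_task`, its arguments, and the printed labels are hypothetical, and it assumes that `Integer.MAX_VALUE` (2147483647) stands for an unknown exit status, as the snippet above suggests.

```shell
# Hypothetical model of the handler's branching; not part of Nextflow.
# $1: Azure Batch execution result ("success" or "failure")
# $2: task exit status (integer)
MAX_INT=2147483647   # assumed sentinel for "exit status unknown"

classify_task() {
    if [ "$1" = "failure" ] || [ "$2" -eq "$MAX_INT" ]; then
        if [ "$2" -ne 0 ]; then
            # Non-zero exit status: reported as a process failure,
            # so the configured errorStrategy can act on it.
            echo "ProcessFailedException"
        else
            # Batch reported a failure but the exit status is 0:
            # falls back to the existing unrecoverable error.
            echo "ProcessUnrecoverableException"
        fi
    else
        # This branch sets no error; a non-zero exit status is
        # left to Nextflow's usual handling.
        echo "none"
    fi
}

classify_task failure 42
classify_task failure 0
classify_task success 42
```

Under these assumptions the three calls print `ProcessFailedException`, `ProcessUnrecoverableException`, and `none`, in that order.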
Following what @alberto-miranda suggested, the reason Azure Batch tasks behave differently when Fusion is enabled is the cmd command set in this part of the code
The
So, without Fusion the Azure Batch task exit code is always 0, and the failure is only detected later, when the code checks the Nextflow task exit code. To make both executions equivalent, I think we should make the non-Fusion cmd return the real exit code, with something like:
I think @adamrtalbot's solutions are fine for making exit-code failures recoverable, but what is not clear to me is why the rest of the failures are unrecoverable by default. |
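To illustrate how a wrapper command can hide the real exit code from the scheduler, here is a hypothetical shell sketch. It is not the command Nextflow actually submits to Azure Batch: `run_masked`, `run_propagated`, and the temporary directory are assumptions made for illustration, and the `.exitcode` file only mimics the kind of exit file that `readExitFile()` reads.

```shell
# Hypothetical wrappers; they only mimic the two behaviours discussed above.
workdir="$(mktemp -d)"

# Records the task's exit status in a file but always returns 0, so a
# scheduler that looks only at the wrapper's status sees a success.
run_masked() {
    sh -c "$1"
    echo "$?" > "$workdir/.exitcode"
    return 0
}

# Records the exit status in the file and also returns it, so the
# scheduler sees the same exit code as the task itself.
run_propagated() {
    sh -c "$1"
    status=$?
    echo "$status" > "$workdir/.exitcode"
    return "$status"
}

masked_rc=0
run_masked 'exit 42' || masked_rc=$?
echo "masked: wrapper=$masked_rc file=$(cat "$workdir/.exitcode")"

propagated_rc=0
run_propagated 'exit 42' || propagated_rc=$?
echo "propagated: wrapper=$propagated_rc file=$(cat "$workdir/.exitcode")"
```

The first call prints `masked: wrapper=0 file=42` and the second prints `propagated: wrapper=42 file=42`, which matches the `result=success` versus `result=failure` difference seen in the logs earlier in the thread.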
The error strategy was ignored when using Fusion on Azure. This PR fixes it by handling the error after reading the exitCode.
I don't know if this is the right way to do it...but it works?