LSF job status unknown should probably be treated as HOLD not EXIT #5666
Comments
To chime in on @mp15's comment, I got the exact same problem. Excerpt from the LSF log:
LSF had a hiccup but the job completed. All the output files are there, including
However, the "Unknown; unable to reach the execution host" bit made the job status turn to
If
Best,
@mp15 @muffato I have created a PR based on your suggestion: #5756. Can either (or both) of you test this fix on your cluster? Instructions for building/testing locally are here. Actually, my main concern is not whether the PR works -- it's pretty straightforward -- but whether your suggestion is always appropriate. Can we be confident that the
Here is what I could find in the LSF documentation about the status itself
and then in the section about the time summary of a job
I think LSF will change the job status to
The UNKNOWN status in LSF does not mean the job has actually died, just that the LSF daemons have lost touch with each other. A job may continue running in an unknown state for a long period of time and write output via shared disks. It may also recover and terminate with exit code 0. I would suggest treating it as QueueStatus.HOLD, not QueueStatus.ERROR.
nextflow/modules/nextflow/src/main/groovy/nextflow/executor/LsfExecutor.groovy
Line 215 in 9386082
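To make the proposal concrete, here is a minimal sketch of the kind of status lookup being discussed. It is written in Python for illustration only: the `QueueStatus` enum below is a hypothetical stand-in for the executor's own type, the `LSF_STATUS` table and `decode_status` helper are assumptions and are not copied from LsfExecutor.groovy. Only the LSF state names (PEND, RUN, PSUSP, USUSP, SSUSP, DONE, EXIT, UNKWN, ZOMBI) are the ones `bjobs` reports.

```python
from enum import Enum


class QueueStatus(Enum):
    """Hypothetical stand-in for the executor's queue-status enum."""
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    HOLD = "HOLD"
    ERROR = "ERROR"
    DONE = "DONE"


# LSF job states as reported by bjobs, mapped to the stand-in enum.
# This table is an assumption for illustration, not the real executor code.
LSF_STATUS = {
    "PEND": QueueStatus.PENDING,
    "RUN": QueueStatus.RUNNING,
    "PSUSP": QueueStatus.HOLD,
    "USUSP": QueueStatus.HOLD,
    "SSUSP": QueueStatus.HOLD,
    "DONE": QueueStatus.DONE,
    "EXIT": QueueStatus.ERROR,
    # Proposed change: the job may still be running or may recover,
    # so keep polling instead of reporting a failure.
    "UNKWN": QueueStatus.HOLD,
    "ZOMBI": QueueStatus.ERROR,
}


def decode_status(lsf_state: str) -> QueueStatus:
    """Translate an LSF state string; unrecognised states raise KeyError."""
    return LSF_STATUS[lsf_state.strip().upper()]
```

With a table like this, a job reported as UNKWN is held and polled again, while a job that LSF later reports as EXIT is still treated as an error.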