IOErrors when running multiple processes at once. #108

@alexanderwhatley

I'm running ~200 jobs that use NetMHCpan at once on a supercomputer cluster. Some of these jobs are throwing this error message:

sh: fork: retry: Resource temporarily unavailable
sh: fork: retry: Resource temporarily unavailable
sh: fork: retry: Resource temporarily unavailable
Traceback (most recent call last):
  File "NetMHCpan_trials_all.py", line 120, in <module>
    epitope_predictions = get_epitope_predictions(HLA_alleles, vcf_file)
  File "NetMHCpan_trials_all.py", line 39, in get_epitope_predictions
    original_epitope_predictions = predictor.predict_subsequences(original_sequences).to_dataframe()
  File "/n/home05/aewhatley/anaconda3/lib/python3.6/site-packages/mhctools/base_predictor.py", line 128, in predict_subsequences
    binding_predictions = self.predict_peptides(sorted(peptide_set))
  File "/n/home05/aewhatley/anaconda3/lib/python3.6/site-packages/mhctools/base_commandline_predictor.py", line 309, in predict_peptides
    temp_dir_list=dirs)
  File "/n/home05/aewhatley/anaconda3/lib/python3.6/site-packages/mhctools/base_commandline_predictor.py", line 256, in _run_commands_and_collect_predictions
    process_limit=self.process_limit)
  File "/n/home05/aewhatley/anaconda3/lib/python3.6/site-packages/mhctools/process_helpers.py", line 141, in run_multiple_commands_redirect_stdout
    add_to_queue(p)
  File "/n/home05/aewhatley/anaconda3/lib/python3.6/site-packages/mhctools/process_helpers.py", line 126, in add_to_queue
    process.start()
  File "/n/home05/aewhatley/anaconda3/lib/python3.6/site-packages/mhctools/process_helpers.py", line 51, in start
    self.process = Popen(self.args, stdout=stdout, stderr=stderr)
  File "/n/home05/aewhatley/anaconda3/lib/python3.6/subprocess.py", line 707, in __init__
    restore_signals, start_new_session)
  File "/n/home05/aewhatley/anaconda3/lib/python3.6/subprocess.py", line 1260, in _execute_child
    restore_signals, start_new_session, preexec_fn)
BlockingIOError: [Errno 11] Resource temporarily unavailable
sh: fork: retry: Resource temporarily unavailable

Do you have any advice on how to deal with this situation? Would placing a mutex on the file be the right thing to do, or should I perhaps retry the prediction line if it fails due to the resource being temporarily unavailable? Thanks for your help.
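
For reference, the retry option mentioned above could look roughly like the sketch below. It is only an illustration, not part of mhctools: `retry_on_eagain` and its parameters (`max_attempts`, `base_delay`, `sleep`) are hypothetical names, and it assumes the failure is a transient `BlockingIOError` with errno 11 (EAGAIN), as in the traceback. The delay grows exponentially with some random jitter so that many jobs do not all retry at the same moment.

```python
import errno
import random
import time


def retry_on_eagain(func, *args, max_attempts=5, base_delay=1.0,
                    sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying if it fails with EAGAIN.

    Hypothetical helper, not part of mhctools. `sleep` is injectable
    so the backoff can be tested without actually waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except BlockingIOError as exc:
            # Only errno 11 ("Resource temporarily unavailable") is
            # treated as transient; anything else is re-raised, and so
            # is the last failure once the attempts are used up.
            if exc.errno != errno.EAGAIN or attempt == max_attempts:
                raise
            # Exponential backoff with jitter: between 1x and 2x of
            # base_delay * 2^(attempt - 1) seconds.
            delay = base_delay * 2 ** (attempt - 1) * (1 + random.random())
            sleep(delay)
```

In the script from the traceback, the call on line 39 would then become something like `retry_on_eagain(predictor.predict_subsequences, original_sequences).to_dataframe()`. Note that a retry reruns the whole prediction call, and it only helps if the resource shortage really is temporary; if the jobs together exceed a per-user process limit on the node, reducing the number of processes started per job may be needed as well.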
