
fix: make default Slurm CPU headers portable #596

Merged
njzjz merged 1 commit into deepmodeling:master from SchrodingersCattt:fix/slurm-cpu-header
May 13, 2026

Conversation

@SchrodingersCattt

@SchrodingersCattt SchrodingersCattt commented May 12, 2026

Contributor

Avoid login-shell and GPU directives for CPU Slurm jobs so generated scripts work on clusters with stricter compute-node environments.

Summary by CodeRabbit

  • Bug Fixes

    • Removed login-shell (-l) from script shebang and stopped embedding the parsable flag in headers for more predictable submissions.
    • GPU request logic now omits GPU directives for CPU jobs and respects explicit empty overrides when provided.
  • Documentation

    • Clarified guidance about login-shell effects and that GPU directives are omitted by default for CPU jobs.
  • Tests

    • Improved Slurm header tests for clearer, more targeted coverage.


@dosubot dosubot Bot added the size:XS (This PR changes 0-9 lines, ignoring generated files) and bug (Something isn't working) labels May 12, 2026
@coderabbitai

coderabbitai Bot commented May 12, 2026

Contributor


📥 Commits

Reviewing files that changed from the base of the PR and between 252b175 and 593e67f.

📒 Files selected for processing (5)
  • doc/examples/expanse.md
  • dpdispatcher/machines/slurm.py
  • examples/resources/expanse_cpu.json
  • examples/resources/template.slurm
  • tests/test_slurm_script_generation.py
📝 Walkthrough


The PR removes the login-shell invocation (-l) and embedded --parsable directive from Slurm script headers, allowing sbatch --parsable to be supplied at submission time. GPU directive generation now uses an explicit custom_gpu_line when provided, otherwise defaults to --gres=gpu:<count> for positive GPU requests or an empty directive for zero GPUs. Tests are refactored with a helper to simplify validation.
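The GPU directive selection described in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the code in dpdispatcher/machines/slurm.py: the function name `select_gpu_line` is hypothetical, and the `#SBATCH ` prefix on the default line is an assumption based on the form of the `custom_gpu_line` override quoted elsewhere in this thread.

```python
from typing import Optional


def select_gpu_line(gpu_per_node: int, custom_gpu_line: Optional[str] = None) -> str:
    """Return the GPU directive line for a Slurm header (hypothetical helper)."""
    # An explicit override wins, including an explicit empty string.
    if custom_gpu_line is not None:
        return custom_gpu_line
    # Default for GPU jobs: request GPUs through --gres.
    if gpu_per_node > 0:
        return f"#SBATCH --gres=gpu:{gpu_per_node}"
    # CPU job with no override: emit no GPU directive at all.
    return ""


if __name__ == "__main__":
    print(repr(select_gpu_line(0)))                      # CPU job, no override
    print(repr(select_gpu_line(4)))                      # GPU job, default directive
    print(repr(select_gpu_line(0, "#SBATCH --gpus=0")))  # site-specific override
```

The `is not None` check (rather than a truthiness check) is what lets an explicit empty override suppress the directive even for a positive GPU count.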

Changes

Slurm Header and GPU Directive Refactoring

| Layer / File(s) | Summary |
| --- | --- |
| Slurm header template and GPU directive generation (dpdispatcher/machines/slurm.py, examples/resources/template.slurm, examples/resources/expanse_cpu.json, doc/examples/expanse.md, doc/context.md) | Shebang changed from #!/bin/bash -l to #!/bin/bash and #SBATCH --parsable removed from templates. Slurm.gen_script_header now has a -> str annotation and selects GPU directive: use custom_gpu_line only when it is non-None; otherwise emit --gres=gpu:<gpu_per_node> when GPU count > 0 or no GPU directive when count == 0. Documentation and example resource JSON updated to match the new default behavior. |
| Test infrastructure and header validation (tests/test_slurm_script_generation.py) | Added _make_header() helper to generate script headers from a mutated machine JSON and Resources without constructing full Task/Submission objects. Tests updated to expect #!/bin/bash (no -l), no embedded #SBATCH --parsable, and to validate GPU --gres presence/absence for CPU/GPU cases. |

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels: size:L, lgtm

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: make default Slurm CPU headers portable' directly and clearly summarizes the main change: modifying Slurm CPU script headers to be portable by removing login-shell and GPU directives. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dpdispatcher/machines/slurm.py (1)

38-38: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add return type hint to gen_script_header.

The method should include a return type annotation -> str to comply with the project's type-hint requirements. As per coding guidelines, "Always add type hints - Include proper type annotations in all Python code for better maintainability" for files matching dpdispatcher/**/*.py.

📝 Proposed fix
-    def gen_script_header(self, job):
+    def gen_script_header(self, job) -> str:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dpdispatcher/machines/slurm.py` at line 38, Add a return type annotation to
the method signature of gen_script_header so it declares it returns a string
(i.e., change def gen_script_header(self, job) to include -> str). Locate the
gen_script_header method in the Slurm-related class (method name:
gen_script_header) and update its signature to include the return type; no other
behavioral changes are required. Ensure the change compiles with the existing
codebase and update any stubs/tests if they assert signature compatibility.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_slurm_script_generation.py`:
- Around line 25-37: Add type annotations to the test helper _make_header:
change its signature to accept resource_updates: Optional[Dict[str, Any]] = None
and remove_resource_keys: Optional[List[str]] = None and declare the return type
as str (the header returned by Machine.gen_script_header). Also ensure the
necessary typing imports (Optional, Dict, Any, List) are present at the top of
the test file; leave the function body unchanged and keep using Machine,
Resources and SimpleNamespace as before.


📥 Commits

Reviewing files that changed from the base of the PR and between b23161c and d2e9d18.

📒 Files selected for processing (2)
  • dpdispatcher/machines/slurm.py
  • tests/test_slurm_script_generation.py

Comment thread tests/test_slurm_script_generation.py Outdated
@dosubot dosubot Bot added the size:S (This PR changes 10-29 lines, ignoring generated files) label and removed the size:XS (This PR changes 0-9 lines, ignoring generated files) label May 12, 2026
@SchrodingersCattt SchrodingersCattt changed the title from "fix: make slurm CPU headers portable" to "fix: make default Slurm CPU headers portable" May 12, 2026
@codecov

codecov Bot commented May 12, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 48.40%. Comparing base (b23161c) to head (593e67f).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #596      +/-   ##
==========================================
+ Coverage   48.33%   48.40%   +0.07%     
==========================================
  Files          40       40              
  Lines        3958     3960       +2     
==========================================
+ Hits         1913     1917       +4     
+ Misses       2045     2043       -2     


Copilot AI left a comment

Contributor

Pull request overview

This PR improves the portability of generated Slurm job scripts by removing login-shell usage in the default Slurm header and ensuring GPU directives are only emitted when explicitly requested or configured, which helps CPU-only jobs run on stricter clusters.

Changes:

  • Removed -l from the Slurm shebang and dropped #SBATCH --parsable from the default header template.
  • Updated Slurm GPU header generation to omit GPU requests when gpu_per_node == 0, unless an explicit custom_gpu_line is provided.
  • Refactored Slurm script-header tests to focus on gen_script_header directly and added CPU/GPU-specific assertions.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| dpdispatcher/machines/slurm.py | Makes the default Slurm header more portable and fixes GPU directive emission logic for CPU jobs. |
| tests/test_slurm_script_generation.py | Simplifies header tests and adds coverage for CPU headers omitting GPU directives and for default GPU --gres behavior. |


@njzjz-bot njzjz-bot left a comment

Contributor

Thanks for the portability fix. The code direction and the added Slurm header tests look reasonable, and CI is green, but I think this PR still needs documentation/example updates before merging.

Blocking documentation gaps:

  • doc/context.md still says DPDispatcher submission scripts use bash -l and therefore execute login-shell startup files. This PR changes the default Slurm header to #!/bin/bash, so that statement is no longer universally true for Slurm users and directly contradicts the new behavior.
  • doc/examples/expanse.md still says Expanse needs custom_gpu_line because the default would emit --gres=gpu:0. With this PR, CPU Slurm jobs now omit the GPU directive by default, so the example text should be updated.
  • examples/resources/expanse_cpu.json still sets kwargs.custom_gpu_line to #SBATCH --gpus=0. If the purpose of this PR is to make CPU Slurm headers portable by default, this example should demonstrate the new default behavior by removing that override, unless Expanse specifically requires --gpus=0—in which case the docs should explain that distinction.
  • examples/resources/template.slurm still includes #!/bin/bash -l and #SBATCH --parsable. Since this is the documented custom-header template example, please either update it to match the new recommended default or explicitly state that custom templates are user-controlled and may still opt into login shells / embedded --parsable.
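To make the expanse_cpu.json point concrete, the two resource variants under discussion could look roughly like this, written as Python dicts. Only `kwargs.custom_gpu_line`, its value `#SBATCH --gpus=0`, and `gpu_per_node` come from this thread; the remaining keys and all values are placeholder assumptions and are not copied from the example file.

```python
import json

# Hypothetical CPU resources relying on the new default:
# no custom_gpu_line, so no GPU directive is expected in the header.
cpu_resources = {
    "number_node": 1,       # placeholder value
    "cpu_per_node": 4,      # placeholder value
    "gpu_per_node": 0,      # CPU-only job
    "queue_name": "shared", # placeholder value
    "group_size": 1,        # placeholder value
}

# Site-specific variant: keep an explicit directive when a cluster
# requires one, as the review allows for.
explicit_resources = {
    **cpu_resources,
    "kwargs": {"custom_gpu_line": "#SBATCH --gpus=0"},
}

if __name__ == "__main__":
    print(json.dumps(cpu_resources, indent=2))
    print(json.dumps(explicit_resources, indent=2))
```

The first variant is the one an example would use to demonstrate the portable default; the second only makes sense alongside documentation explaining why the site needs it.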

The tests cover the Python behavior well enough for this small change (tests/test_slurm_script_generation.py passes locally; selected CLI/submission tests also pass in an editable venv), but the user-facing docs are now stale.

— OpenClaw 2026.4.22 (model: gpt-5.5)

@njzjz

njzjz commented May 12, 2026

Member

This PR introduces an inconsistency among different types of Machines. Shell, Slurm, PBS, LSF, and Bohrium all used bash -l, but this PR only changed Slurm, leaving others unchanged.

Avoid invalid Slurm directives for CPU jobs while preserving the login-shell shebang used by other machine backends.
@SchrodingersCattt

Contributor Author

> This PR introduces an inconsistency among different types of Machines. Shell, Slurm, PBS, LSF, and Bohrium all used bash -l, but this PR only changed Slurm, leaving others unchanged.

I've restored the bash -l shebang line, and I may open a separate PR to address the -l issue.

@dosubot dosubot Bot added the lgtm (This PR has been approved by a maintainer) label May 12, 2026

@njzjz-bot njzjz-bot left a comment

Contributor

Thanks for the updates. The documentation/examples now match the final scope of this PR:

  • #SBATCH --parsable is removed from the default Slurm header and custom template example, while sbatch --parsable is still used at submission time.
  • CPU Slurm jobs now omit the default GPU directive when gpu_per_node == 0.
  • The Expanse example no longer relies on custom_gpu_line for the default CPU case, and documents when an explicit site-specific directive such as --gpus=0 is still appropriate.
  • Keeping #!/bin/bash -l in this PR is acceptable because it avoids changing login-shell behavior for only Slurm while the other Machine backends still use bash -l; handling login-shell consistency across all backends should be a separate PR.
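Because `--parsable` is now passed only on the `sbatch` command line, the caller reads the job ID from standard output. A minimal sketch of that parsing step is below; `parse_parsable_output` is a hypothetical name and is not claimed to be how dpdispatcher handles it. It relies on the documented `sbatch --parsable` format, which prints the job ID, followed by a semicolon and the cluster name when one is present.

```python
from typing import Optional, Tuple


def parse_parsable_output(stdout: str) -> Tuple[str, Optional[str]]:
    """Split `sbatch --parsable` output into (job_id, cluster or None)."""
    lines = stdout.strip().splitlines()
    if not lines:
        raise ValueError("sbatch produced no output")
    # Format is "jobid" or "jobid;cluster"; partition handles both.
    job_id, _, cluster = lines[0].strip().partition(";")
    return job_id, (cluster or None)


if __name__ == "__main__":
    print(parse_parsable_output("12345\n"))
    print(parse_parsable_output("12345;mycluster\n"))
```

Keeping the flag at submission time means the script header stays free of options that only affect what `sbatch` prints.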

Local validation:

  • git diff --check origin/master...HEAD
  • pytest tests/test_slurm_script_generation.py tests/test_argcheck.py tests/test_submit.py tests/test_run_submission.py -q → 20 passed, 14 skipped
  • ruff check dpdispatcher/machines/slurm.py tests/test_slurm_script_generation.py

— OpenClaw 2026.4.22 (model: gpt-5.5)

@njzjz njzjz merged commit 581a0bd into deepmodeling:master May 13, 2026
29 checks passed

Labels

  • bug: Something isn't working
  • lgtm: This PR has been approved by a maintainer
  • size:S: This PR changes 10-29 lines, ignoring generated files.


4 participants