Test building on Snellius: Zen4/H100 #903

Open: wants to merge 26 commits into base 2023.06-software.eessi.io

casparvl
Collaborator

For now, I've set up a personal bot instance to gain some experience with bot deployment. This PR is purely to test that instance.

eessi-bot bot commented Jan 31, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphire_rapids, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

eessi-bot bot commented Jan 31, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

@eessi-bot-casparvl

Instance eessi-bot-casparvl is configured to build for:

  • architectures: x86_64/zen4
  • repositories: eessi.io-2023.06-software, eessi-hpc.org-2023.06-compat, eessi-hpc.org-2023.06-software, eessi.io-2023.06-compat

casparvl added the tests (Related to software testing) and accel:nvidia labels on Jan 31, 2025
EESSI deleted a comment from eessi-bot (bot) on Jan 31, 2025
EESSI deleted a comment from eessi-bot (bot) on Jan 31, 2025
EESSI deleted a comment from eessi-bot-casparvl (bot) on Jan 31, 2025
@casparvl
Collaborator Author

bot: build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software accel:nvidia/cc80

eessi-bot bot commented Jan 31, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software accel:nvidia/cc80 from casparvl

    • expanded format: build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot bot commented Jan 31, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software accel:nvidia/cc80 from casparvl

    • expanded format: build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot-casparvl bot commented Jan 31, 2025

Updates by the bot instance eessi-bot-casparvl (click for details)
  • received bot command build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software accel:nvidia/cc80 from casparvl

    • expanded format: build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
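The bot replies above echo each command in an "expanded format", with short filter keys replaced by long ones. A minimal sketch of that expansion follows, assuming a simple alias table. The table and the function name `expand_command` are hypothetical and are not taken from the bot's source; the `arch` entry matches a later command in this thread.

```python
# Hypothetical alias table: only the key pairs visible in this thread.
ALIASES = {
    "repo": "repository",
    "accel": "accelerator",
    "arch": "architecture",
}


def expand_command(command: str) -> str:
    """Expand short filter keys in a bot command to their long form."""
    action, *filters = command.split()
    expanded = [action]
    for item in filters:
        key, sep, value = item.partition(":")
        # Keys without an alias (such as "instance") are left untouched.
        expanded.append(f"{ALIASES.get(key, key)}{sep}{value}")
    return " ".join(expanded)


short = "build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software accel:nvidia/cc80"
print(expand_command(short))
# build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software accelerator:nvidia/cc80
```

The sketch assumes a non-empty command whose first word is the action; the real bot may validate its input differently.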

@casparvl
Collaborator Author

bot: show_config

eessi-bot bot commented Jan 31, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)

eessi-bot bot commented Jan 31, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)

eessi-bot-casparvl bot commented Jan 31, 2025

Updates by the bot instance eessi-bot-casparvl (click for details)

eessi-bot bot commented Jan 31, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphire_rapids, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

eessi-bot bot commented Jan 31, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

@eessi-bot-casparvl

Instance eessi-bot-casparvl is configured to build for:

  • architectures: x86_64/zen4
  • repositories: eessi.io-2023.06-software, eessi-hpc.org-2023.06-compat, eessi-hpc.org-2023.06-software, eessi.io-2023.06-compat

@casparvl
Collaborator Author

bot: show_config

eessi-bot bot commented Jan 31, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)

eessi-bot bot commented Jan 31, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)

eessi-bot-casparvl bot commented Jan 31, 2025

Updates by the bot instance eessi-bot-casparvl (click for details)

eessi-bot bot commented Jan 31, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphire_rapids, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

eessi-bot bot commented Jan 31, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

@eessi-bot-casparvl

Instance eessi-bot-casparvl is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-software, eessi.io-2023.06-software, eessi-hpc.org-2023.06-compat

@casparvl
Collaborator Author

bot: build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90

eessi-bot bot commented Jan 31, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90 from casparvl

    • expanded format: build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90
  • handling command build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90 resulted in:

    • no jobs were submitted

eessi-bot bot commented Jan 31, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90 from casparvl

    • expanded format: build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90
  • handling command build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90 resulted in:

    • no jobs were submitted
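In the replies above, instances other than the one named in the `instance:` filter report that no jobs were submitted. One way to picture this is that each instance compares the command's filters with its own configuration. The sketch below is an assumption about that logic, not the bot's actual code: `InstanceConfig` and `matching_architectures` are hypothetical names, and matching `arch:zen4` against a path component of `x86_64/amd/zen4` is a guess. It does not try to explain every "no jobs were submitted" reply in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceConfig:
    """Hypothetical view of what one bot instance is configured to build."""
    name: str
    architectures: list = field(default_factory=list)
    repositories: list = field(default_factory=list)


def matching_architectures(config: InstanceConfig, filters: dict) -> list:
    """Return the configured architectures that satisfy all given filters."""
    # A command addressed to another instance yields nothing to build.
    if filters.get("instance", config.name) != config.name:
        return []
    repo = filters.get("repository")
    if repo is not None and repo not in config.repositories:
        return []
    arch = filters.get("architecture")
    if arch is None:
        return list(config.architectures)
    # Assumption: a filter such as "zen4" matches one path component.
    return [a for a in config.architectures if arch in a.split("/")]


filters = {
    "instance": "eessi-bot-casparvl",
    "repository": "eessi.io-2023.06-software",
    "architecture": "zen4",
}
azure = InstanceConfig("eessi-bot-mc-azure", ["x86_64/amd/zen4"],
                       ["eessi.io-2023.06-software", "eessi.io-2023.06-compat"])
personal = InstanceConfig("eessi-bot-casparvl", ["x86_64/amd/zen4"],
                          ["eessi.io-2023.06-software", "eessi.io-2023.06-compat"])
print(matching_architectures(azure, filters))     # []
print(matching_architectures(personal, filters))  # ['x86_64/amd/zen4']
```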

casparvl added the bot:deploy label (Ask bot to deploy missing software installations to EESSI) on Feb 7, 2025
@casparvl
Collaborator Author

Testing EESSI/eessi-bot-software-layer@782a862 for Thomas...

@casparvl
Collaborator Author

bot: build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90

eessi-bot bot commented Feb 11, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90 from casparvl

    • expanded format: build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90
  • handling command build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90 resulted in:

    • no jobs were submitted

@riscv-eessi-io-bot

Updates by the bot instance eessi-bot-riscv (click for details)
  • account casparvl has NO permission to send commands to the bot

eessi-bot bot commented Feb 11, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90 from casparvl

    • expanded format: build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90
  • handling command build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90 resulted in:

    • no jobs were submitted

@eessi-bot-trz42

Updates by the bot instance trz42-GH200-jr (click for details)
  • account casparvl has NO permission to send commands to the bot

gpu-bot-ugent bot commented Feb 11, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90 from casparvl

    • expanded format: build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90
  • handling command build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90 resulted in:

    • no jobs were submitted

@casparvl
Collaborator Author

Better turn on the smee client first...

@casparvl
Collaborator Author

bot: build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90

eessi-bot bot commented Feb 11, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90 from casparvl

    • expanded format: build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90
  • handling command build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90 resulted in:

    • no jobs were submitted

eessi-bot bot commented Feb 11, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90 from casparvl

    • expanded format: build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90
  • handling command build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90 resulted in:

    • no jobs were submitted

@riscv-eessi-io-bot

Updates by the bot instance eessi-bot-riscv (click for details)
  • account casparvl has NO permission to send commands to the bot

eessi-bot-casparvl bot commented Feb 11, 2025

Updates by the bot instance eessi-bot-casparvl (click for details)
  • received bot command build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90 from casparvl

    • expanded format: build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90
  • handling command build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90 resulted in:

@eessi-bot-trz42

Updates by the bot instance trz42-GH200-jr (click for details)
  • account casparvl has NO permission to send commands to the bot

gpu-bot-ugent bot commented Feb 11, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90 from casparvl

    • expanded format: build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90
  • handling command build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90 resulted in:

    • no jobs were submitted

eessi-bot-casparvl bot commented Feb 11, 2025

New job on instance eessi-bot-casparvl for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /projects/casparl/eessi-bot-casparvl/jobs/2025.02/pr_903/9873473

date | job status | comment
Feb 11 20:00:31 UTC 2025 | submitted | job id 9873473 awaits release by job manager
Feb 11 20:00:45 UTC 2025 | received | job awaits launch by Slurm scheduler
Feb 11 20:01:47 UTC 2025 | running | job 9873473 is running
Feb 11 20:04:31 UTC 2025 | finished | 🤷 UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job9873473.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Feb 11 20:04:31 UTC 2025 | test result | 🤷 UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job9873473.test does not exist in job directory, or parsing it failed.
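The status timeline above can be turned into per-stage durations, which helps when checking how long a job waited for release versus how long it ran. This is a small sketch using only the timestamps reported for job 9873473; the helper names `parse_timestamp` and `stage_durations` are hypothetical, and `%b` assumes an English month abbreviation in the current locale.

```python
from datetime import datetime

# Timestamp layout used in the bot's status table, e.g. "Feb 11 20:00:31 UTC 2025".
TIMESTAMP_FORMAT = "%b %d %H:%M:%S UTC %Y"


def parse_timestamp(text: str) -> datetime:
    """Parse one timestamp from the status table (naive datetime, UTC implied)."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)


def stage_durations(events):
    """Return seconds elapsed between consecutive (timestamp, status) events."""
    parsed = [(parse_timestamp(ts), status) for ts, status in events]
    return {
        f"{prev_status} -> {next_status}": int((next_time - prev_time).total_seconds())
        for (prev_time, prev_status), (next_time, next_status) in zip(parsed, parsed[1:])
    }


events = [
    ("Feb 11 20:00:31 UTC 2025", "submitted"),
    ("Feb 11 20:00:45 UTC 2025", "received"),
    ("Feb 11 20:01:47 UTC 2025", "running"),
    ("Feb 11 20:04:31 UTC 2025", "finished"),
]
print(stage_durations(events))
# {'submitted -> received': 14, 'received -> running': 62, 'running -> finished': 164}
```

For this job, release by the job manager took 14 seconds and the whole run from submission to finish took 240 seconds.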

@casparvl
Collaborator Author

Killing it now, and then testing the hold_release protocol...

@casparvl
Collaborator Author

bot: build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90

eessi-bot bot commented Feb 11, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90 from casparvl

    • expanded format: build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90
  • handling command build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90 resulted in:

    • no jobs were submitted

eessi-bot bot commented Feb 11, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90 from casparvl

    • expanded format: build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90
  • handling command build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90 resulted in:

    • no jobs were submitted

@riscv-eessi-io-bot

Updates by the bot instance eessi-bot-riscv (click for details)
  • account casparvl has NO permission to send commands to the bot

@eessi-bot-trz42

Updates by the bot instance trz42-GH200-jr (click for details)
  • account casparvl has NO permission to send commands to the bot

gpu-bot-ugent bot commented Feb 11, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90 from casparvl

    • expanded format: build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90
  • handling command build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90 resulted in:

    • no jobs were submitted

eessi-bot-casparvl bot commented Feb 11, 2025

Updates by the bot instance eessi-bot-casparvl (click for details)
  • received bot command build instance:eessi-bot-casparvl repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90 from casparvl

    • expanded format: build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90
  • handling command build instance:eessi-bot-casparvl repository:eessi.io-2023.06-software architecture:zen4 accelerator:nvidia/cc90 resulted in:

eessi-bot-casparvl bot commented Feb 11, 2025

New job on instance eessi-bot-casparvl for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /projects/casparl/eessi-bot-casparvl/jobs/2025.02/pr_903/9873486

date | job status | comment
Feb 11 20:04:38 UTC 2025 | submitted | job id 9873486 awaits release by job manager
Feb 12 02:42:40 UTC 2025 | finished | 🤷 UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job9873486.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Feb 12 02:42:40 UTC 2025 | test result | 🤷 UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job9873486.test does not exist in job directory, or parsing it failed.

@casparvl
Collaborator Author

OK, this fails, as we knew it would on my system, but it shows the expected pattern. The job stays pending in a user-held state:

$ myjobs
     JOBID PARTITION                      NAME     USER    STATE       TIME TIME_LIMI  NODES   PRIORITY           START_TIME NODELIST(REASON)
   9873486  gpu_h100  eessi-bot-casparvl-build  casparl  PENDING       0:00   1:00:00      1          0  2025-02-11T21:04:57 (JobHeldUser)

and for the job manager:

RuntimeError: run_cmd(): Error running '/usr/bin/scontrol release 9873486' in 'None
           stdout ''
           stderr 'Access/permission denied for job 9873486
slurm_suspend error: Access/permission denied
'
           exit code 1
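The pattern seen here is that the job is submitted in a held state and the job manager later tries to release it with `scontrol release`, which is denied on this system. A minimal sketch of a release step that reports the failure instead of raising is shown below. The function `release_job` and its `run` parameter are hypothetical and not part of the bot; `fake_denied` only replays the stderr text quoted above so the example runs without Slurm.

```python
import subprocess
from types import SimpleNamespace


def release_job(job_id, run=subprocess.run):
    """Try to release a held Slurm job; return (ok, error_message)."""
    result = run(
        ["scontrol", "release", str(job_id)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Surface Slurm's own message so the caller can log or report it.
        return False, result.stderr.strip()
    return True, ""


def fake_denied(cmd, **kwargs):
    """Stand-in runner that mimics the permission error from this thread."""
    return SimpleNamespace(
        returncode=1,
        stdout="",
        stderr=(
            "Access/permission denied for job 9873486\n"
            "slurm_suspend error: Access/permission denied\n"
        ),
    )


ok, message = release_job(9873486, run=fake_denied)
print(ok, message.splitlines()[0])
# False Access/permission denied for job 9873486
```

Passing the runner as a parameter keeps the sketch testable on machines without Slurm; with the default `subprocess.run`, it would call the real `scontrol`.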

Labels
  • accel:nvidia
  • bot:deploy (Ask bot to deploy missing software installations to EESSI)
  • tests (Related to software testing)

1 participant