Skip to content

[BUG] Signal killed caused by Adam Offload  #164

@MayDomine

Description

@MayDomine

Is there an existing issue for this?

  • I have searched the existing issues

Description of the Bug

When I try to run the finetune script of cpm-live-10b , adam offload optmizer caused the signal killed.And i tried to set the cpu level of adam to 0 which system wont use avx anymore.But it doesn't work.

Environment Information

- GCC version:9.4.0
- Torch version:1.13.0
- Linux system version:Ubuntu 20.04
- CUDA version:11.6+
- Torch's CUDA version (as per `torch.cuda.version()`):1.13 + 11.6
- CPU core: 48

To Reproduce

bash CPM-Bee/src/script/finetune_cpm_bee_10b.sh

Expected Behavior

It works normal

Screenshots

No response

Additional Information

No response

Confirmation

  • I have reviewed and verified all the information provided in this report.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions