
Enable builds without direct torch.cuda availability and support sm89 / sm90. #5

Open · wants to merge 2 commits into main
Conversation


@wanderingai commented Nov 22, 2023

This PR allows the monarch_cuda kernel to be built for compute capabilities sm_80, sm_89, and sm_90, which covers the following GPUs:

  • A100 (sm_80)
  • H100 (sm_90)
  • L40 (sm_89)
  • RTX 6000 Ada (sm_89)

Additionally, setup.py is updated to enable builds based on nvcc availability alone, without requiring torch.cuda to be usable at build time, for more flexible builds.
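
For reference, a minimal sketch of how such a check can work, probing nvcc directly instead of calling torch.cuda.is_available() so the extension can be compiled in environments without a visible GPU (the helper name `nvcc_available`, the CUDA_HOME fallback, and the source path are illustrative, not the PR's actual code):

```python
import os
import subprocess

def nvcc_available():
    """Return True if nvcc can be invoked, without importing torch.cuda."""
    # Illustrative assumption: CUDA_HOME or the conventional default path.
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    nvcc = os.path.join(cuda_home, "bin", "nvcc")
    try:
        subprocess.run([nvcc, "--version"], check=True, capture_output=True)
        return True
    except (OSError, subprocess.CalledProcessError):
        return False

# In setup.py, register the extension when nvcc is usable, rather than
# gating on torch.cuda.is_available():
ext_modules = []
if nvcc_available():
    from torch.utils.cpp_extension import CUDAExtension
    ext_modules.append(
        CUDAExtension(
            name="monarch_cuda",
            sources=["monarch_cuda.cpp"],  # illustrative path only
        )
    )
```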

Update:

The compiler flags have been abstracted to support both PTX and SASS builds, while defaulting to the original Ampere-based PTX-only build, i.e. `-gencode=arch=compute_80,code=compute_80`.
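
For context on the flag semantics: in a `-gencode` clause, `code=compute_XX` embeds forward-compatible PTX that is JIT-compiled at load time on any GPU of that generation or newer, while `code=sm_XX` embeds ahead-of-time SASS for exactly that architecture. A sketch of how such an abstraction might look (the function name and defaults are illustrative, not the PR's actual code):

```python
def gencode_flags(archs=("80",), ptx_only=True):
    """Build nvcc -gencode flags for the given compute capabilities."""
    flags = []
    for arch in archs:
        if ptx_only:
            # PTX-only: JIT-compiled at load time, forward compatible
            flags.append(f"-gencode=arch=compute_{arch},code=compute_{arch}")
        else:
            # SASS: ahead-of-time binary for exactly this architecture
            flags.append(f"-gencode=arch=compute_{arch},code=sm_{arch}")
    return flags

# Default matches the original Ampere-based PTX-only build:
assert gencode_flags() == ["-gencode=arch=compute_80,code=compute_80"]

# SASS builds for all three targets discussed in this PR:
print(gencode_flags(archs=("80", "89", "90"), ptx_only=False))
```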

Successfully tested by building a docker image and running tests under tests/:


[screenshots of the passing test runs]

@DanFu09 (Contributor) commented Nov 22, 2023

It looks like this PR is introducing some race conditions - when I install using this branch, some tests fail:

```
pytest -s -q tests/test_flashfftconv.py
Running 1120 items in this shard
......................................................................................................................................................................................................................
F.....................................................................................................................................................................................................................
................................................................F.F...F...............................................................................................................................................
......................................................................................................................................................................................................................
......................................................................................................................................................................................................................
..................................................
```

@DanFu09 self-requested a review November 22, 2023 04:18
@michaelfeil commented

@wanderingai Love this PR, it's an improvement over the previous setup.py. Can this be merged, worst case with no sm_90 flags enabled by default?
