
add adam optimizer benchmark #1764

Open · khushi-411 wants to merge 11 commits into main
Conversation

khushi-411 commented Feb 13, 2025

Before submitting
  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes a part of #1213

Hi Team! This PR adds a benchmark for the Adam optimizer in Thunder, covering both training and inference.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Yes Indeed 🎉

Benchmarking Results

------------------------------------------------------------------------------- benchmark 'params=(128, 64) compute_type=ComputeType.INFERENCE': 3 tests ------------------------------------------------------------------------------
Name (time in us)                                               Min                   Max                Mean             StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_optim_functional_adam[128x64-inference-inductor]       30.9779 (1.0)        136.3768 (1.0)       37.6462 (1.0)       5.7252 (1.0)       35.7971 (1.0)       4.3170 (1.0)       126;109       26.5631 (1.0)        2012          13
test_optim_functional_adam[128x64-inference-eager]          54.1495 (1.75)       267.6921 (1.96)      70.0111 (1.86)     16.1090 (2.81)      67.4573 (1.88)      9.9463 (2.30)        78;72       14.2834 (0.54)       1823          10
test_optim_functional_adam[128x64-inference-thunderfx]     380.8025 (12.29)    1,902.4225 (13.95)    482.8529 (12.83)    81.4521 (14.23)    467.5560 (13.06)    28.1646 (6.52)       99;139        2.0710 (0.08)       1175           2
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------- benchmark 'params=(128, 64) compute_type=ComputeType.TRAINING_FORWARD': 3 tests -----------------------------------------------------------------------------
Name (time in us)                                             Min                     Max                Mean                StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_optim_functional_adam[128x64-forward-inductor]       31.3664 (1.0)          103.8185 (1.0)       41.3095 (1.0)          4.6313 (1.0)       40.8741 (1.0)       1.2549 (1.0)       356;450       24.2075 (1.0)        2482          13
test_optim_functional_adam[128x64-forward-eager]          53.1405 (1.69)         291.3104 (2.81)      69.8356 (1.69)        17.7803 (3.84)      70.3353 (1.72)     11.9764 (9.54)        73;68       14.3193 (0.59)       1516          10
test_optim_functional_adam[128x64-forward-thunderfx]     398.2800 (12.70)    205,214.4870 (>1000.0)  642.6926 (15.56)    4,237.8191 (915.03)   518.8325 (12.69)    42.8450 (34.14)       1;320        1.5560 (0.06)       2334           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------ benchmark 'params=(64, 64) compute_type=ComputeType.INFERENCE': 3 tests -------------------------------------------------------------------------------
Name (time in us)                                              Min                   Max                Mean             StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_optim_functional_adam[64x64-inference-inductor]       32.7157 (1.0)         68.0104 (1.0)       41.7094 (1.0)       4.0975 (1.0)       40.7993 (1.0)       1.0836 (1.0)       156;250       23.9754 (1.0)        2540          12
test_optim_functional_adam[64x64-inference-eager]          55.4851 (1.70)       233.0119 (3.43)      63.6217 (1.53)      8.3801 (2.05)      62.1244 (1.52)      8.6753 (8.01)       105;55       15.7179 (0.66)       1851          10
test_optim_functional_adam[64x64-inference-thunderfx]     357.6000 (10.93)    1,288.0165 (18.94)    435.3566 (10.44)    74.8103 (18.26)    420.2135 (10.30)    33.5440 (30.96)     105;115        2.2970 (0.10)       1448           2
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------- benchmark 'params=(64, 64) compute_type=ComputeType.TRAINING_FORWARD': 3 tests --------------------------------------------------------------------------
Name (time in us)                                            Min                   Max                Mean             StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_optim_functional_adam[64x64-forward-inductor]       32.6359 (1.0)        180.8978 (1.38)      43.1067 (1.0)      10.9614 (1.92)      41.7087 (1.0)       2.7176 (1.55)      159;475       23.1982 (1.0)        2098          12
test_optim_functional_adam[64x64-forward-eager]          59.8462 (1.83)       130.8139 (1.0)       67.3917 (1.56)      5.7040 (1.0)       66.1202 (1.59)      1.7519 (1.0)        91;122       14.8386 (0.64)       1648          10
test_optim_functional_adam[64x64-forward-thunderfx]     371.0345 (11.37)    1,599.0390 (12.22)    434.6706 (10.08)    82.7640 (14.51)    417.1060 (10.00)    44.9456 (25.66)       64;66        2.3006 (0.10)       1207           2
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
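  For example, the 128x64 inductor inference row has Mean = 37.6462 us, and 1 / 37.6462 us ≈ 26.56 Kops/s, matching the reported OPS of 26.5631.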

Command to run

pytest thunder/benchmarks/targets.py -k "test_optim_functional_adam" --benchmark-group-by='param:params,param:compute_type'
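The --benchmark-group-by='param:params,param:compute_type' option is what splits the output into the tables above, one group per combination of the params shape and the compute type.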

riccardofelluga (Collaborator) left a comment
Hi @khushi-411, thanks for contributing! Unfortunately your implementation does not benchmark the Adam optimizer through Thunder.

To trace the optimizer step, we need to provide it to Thunder in functional form. So I think what @crcrpar intended for issue #1213 was a benchmark of the following function: https://github.com/pytorch/pytorch/blob/b0042286d48e2d202019d3defd3b53086efb1e6e/torch/optim/adam.py#L866
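
To make the functional-form point concrete, a rough standalone sketch of benchmarking that functional Adam entry point with pytest-benchmark could look like the following. The helper names, shapes, and the eager/inductor parametrization are illustrative assumptions, not the code in this PR; the thunderfx executor from the tables above is omitted since wiring it up is exactly what the PR adds.

```python
# Hypothetical, standalone sketch; not the benchmark added by this PR.
# Requires: pytest, pytest-benchmark, torch.
import pytest
import torch
from torch.optim.adam import adam  # the functional Adam step linked above


def make_adam_state(shape, device):
    # One parameter tensor plus the buffers the functional API expects.
    param = torch.randn(shape, device=device)
    grad = torch.randn(shape, device=device)
    exp_avg = torch.zeros_like(param)
    exp_avg_sq = torch.zeros_like(param)
    # Singleton step tensor; kept on CPU so the non-capturable path is used.
    step = torch.zeros(())
    return [param], [grad], [exp_avg], [exp_avg_sq], [], [step]


def adam_step(params, grads, exp_avgs, exp_avg_sqs, max_exp_avg_sqs, state_steps):
    # Everything the update needs is passed in explicitly; there is no
    # optimizer object hiding state, which is what makes the step traceable.
    adam(
        params, grads, exp_avgs, exp_avg_sqs, max_exp_avg_sqs, state_steps,
        amsgrad=False, beta1=0.9, beta2=0.999,
        lr=1e-3, weight_decay=0.0, eps=1e-8, maximize=False,
    )


@pytest.mark.parametrize("executor", ["eager", "inductor"])
@pytest.mark.parametrize("shape", [(64, 64), (128, 64)])
def test_functional_adam_step(benchmark, executor, shape):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    args = make_adam_state(shape, device)
    fn = adam_step if executor == "eager" else torch.compile(adam_step)
    benchmark(fn, *args)
```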

(Inline review comments on thunder/benchmarks/__init__.py and thunder/benchmarks/targets.py — resolved.)
khushi-411 (Author) commented Feb 14, 2025

Hi @riccardofelluga! Thank you for reviewing the PR and for your suggestions. I've made the updates, please take another look whenever you have time. :-)

EDIT: I think I need to make some more corrections; will ping you as soon as I complete them. Thank you!
I've addressed the issues in the PR. Would love to hear back from you!

riccardofelluga (Collaborator) left a comment

So far so good! Big improvement from last time, though there are still a couple of things to address.

Side question: why is this named after litgpt? In the end, the function you are benchmarking comes from torch.

(Further inline review comments on thunder/benchmarks/targets.py and thunder/benchmarks/__init__.py — resolved.)
riccardofelluga (Collaborator) left a comment

Great work! We are getting into good shape now; just a couple of nits and the requires_grad situation to sort out before crossing the finish line.

Is it really useful to benchmark the backward function of the optimizer step?

(Inline review comments on thunder/benchmarks/__init__.py and thunder/benchmarks/targets.py — resolved.)
khushi-411 (Author) commented Feb 18, 2025

Thank you, @riccardofelluga for all your useful suggestions!

Is it really useful to benchmark the backward function of the optimizer step?

No, I don't think so; even if we calculated the gradient for the backward pass, it wouldn't be useful (at least in general cases like this).

That's one reason I thought to explicitly declare @parametrize_compute_type_without_backward; the other reason was that error.
Does this sound okay to you? Thank you

riccardofelluga (Collaborator) commented Feb 19, 2025

No, I don't think so; even if we calculated the gradient for the backward pass, it wouldn't be useful (at least in general cases like this).
That's one reason I thought to explicitly declare @parametrize_compute_type_without_backward; the other reason was that error.

Indeed! I think the best solution here would be to parametrize only for ComputeType.INFERENCE and set requires_grad manually where needed.

And the other reason was that error.

That error comes from using the decorator @torch.no_grad()
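
To make that suggestion concrete, a minimal hypothetical sketch (not the implementation in this PR) of setting requires_grad by hand and using torch.no_grad() as a context manager rather than a decorator might look like this; the update inside is only a stand-in for the functional Adam step.

```python
# Hypothetical sketch of the suggestion above; not the code in this PR.
import pytest
import torch


@pytest.mark.parametrize("requires_grad", [False, True])
def test_adam_step_inference_only(benchmark, requires_grad):
    # Build tensors with requires_grad set explicitly instead of relying on a
    # training/backward compute-type parametrization.
    param = torch.randn(128, 64, requires_grad=requires_grad)
    grad = torch.randn(128, 64)

    def step():
        # torch.no_grad() as a context manager, not a decorator on the test
        # function; this also permits the in-place update on a leaf tensor
        # that has requires_grad=True.
        with torch.no_grad():
            param.add_(grad, alpha=-1e-3)  # stand-in for the functional Adam update

    benchmark(step)
```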
