
add adam optimizer benchmark #1764

Open · khushi-411 wants to merge 11 commits into main
Conversation

khushi-411 commented Feb 13, 2025

Before submitting
  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes a part of #1213

Hi Team! This PR adds a benchmark for the Adam optimizer in Thunder, covering both training and inference.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Yes Indeed 🎉

Benchmarking Results

------------------------------------------------------------------------------- benchmark 'params=(128, 64) compute_type=ComputeType.INFERENCE': 3 tests ------------------------------------------------------------------------------
Name (time in us)                                               Min                   Max                Mean             StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_optim_functional_adam[128x64-inference-inductor]       30.9779 (1.0)        136.3768 (1.0)       37.6462 (1.0)       5.7252 (1.0)       35.7971 (1.0)       4.3170 (1.0)       126;109       26.5631 (1.0)        2012          13
test_optim_functional_adam[128x64-inference-eager]          54.1495 (1.75)       267.6921 (1.96)      70.0111 (1.86)     16.1090 (2.81)      67.4573 (1.88)      9.9463 (2.30)        78;72       14.2834 (0.54)       1823          10
test_optim_functional_adam[128x64-inference-thunderfx]     380.8025 (12.29)    1,902.4225 (13.95)    482.8529 (12.83)    81.4521 (14.23)    467.5560 (13.06)    28.1646 (6.52)       99;139        2.0710 (0.08)       1175           2
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------- benchmark 'params=(128, 64) compute_type=ComputeType.TRAINING_FORWARD': 3 tests -----------------------------------------------------------------------------
Name (time in us)                                             Min                     Max                Mean                StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_optim_functional_adam[128x64-forward-inductor]       31.3664 (1.0)          103.8185 (1.0)       41.3095 (1.0)          4.6313 (1.0)       40.8741 (1.0)       1.2549 (1.0)       356;450       24.2075 (1.0)        2482          13
test_optim_functional_adam[128x64-forward-eager]          53.1405 (1.69)         291.3104 (2.81)      69.8356 (1.69)        17.7803 (3.84)      70.3353 (1.72)     11.9764 (9.54)        73;68       14.3193 (0.59)       1516          10
test_optim_functional_adam[128x64-forward-thunderfx]     398.2800 (12.70)    205,214.4870 (>1000.0)  642.6926 (15.56)    4,237.8191 (915.03)   518.8325 (12.69)    42.8450 (34.14)       1;320        1.5560 (0.06)       2334           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------ benchmark 'params=(64, 64) compute_type=ComputeType.INFERENCE': 3 tests -------------------------------------------------------------------------------
Name (time in us)                                              Min                   Max                Mean             StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_optim_functional_adam[64x64-inference-inductor]       32.7157 (1.0)         68.0104 (1.0)       41.7094 (1.0)       4.0975 (1.0)       40.7993 (1.0)       1.0836 (1.0)       156;250       23.9754 (1.0)        2540          12
test_optim_functional_adam[64x64-inference-eager]          55.4851 (1.70)       233.0119 (3.43)      63.6217 (1.53)      8.3801 (2.05)      62.1244 (1.52)      8.6753 (8.01)       105;55       15.7179 (0.66)       1851          10
test_optim_functional_adam[64x64-inference-thunderfx]     357.6000 (10.93)    1,288.0165 (18.94)    435.3566 (10.44)    74.8103 (18.26)    420.2135 (10.30)    33.5440 (30.96)     105;115        2.2970 (0.10)       1448           2
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------- benchmark 'params=(64, 64) compute_type=ComputeType.TRAINING_FORWARD': 3 tests --------------------------------------------------------------------------
Name (time in us)                                            Min                   Max                Mean             StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_optim_functional_adam[64x64-forward-inductor]       32.6359 (1.0)        180.8978 (1.38)      43.1067 (1.0)      10.9614 (1.92)      41.7087 (1.0)       2.7176 (1.55)      159;475       23.1982 (1.0)        2098          12
test_optim_functional_adam[64x64-forward-eager]          59.8462 (1.83)       130.8139 (1.0)       67.3917 (1.56)      5.7040 (1.0)       66.1202 (1.59)      1.7519 (1.0)        91;122       14.8386 (0.64)       1648          10
test_optim_functional_adam[64x64-forward-thunderfx]     371.0345 (11.37)    1,599.0390 (12.22)    434.6706 (10.08)    82.7640 (14.51)    417.1060 (10.00)    44.9456 (25.66)       64;66        2.3006 (0.10)       1207           2
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
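  For example, the 128x64 inductor inference row has Mean = 37.6462 us, and 1 / 37.6462 us ≈ 26.56 Kops/s, matching the reported OPS of 26.5631.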

Command to run

pytest thunder/benchmarks/targets.py -k "test_optim_functional_adam" --benchmark-group-by='param:params,param:compute_type'
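The --benchmark-group-by='param:params,param:compute_type' option is what splits the output into the tables above, one group per combination of the params shape and the compute type.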

riccardofelluga (Collaborator) left a comment
Hi @khushi-411, thanks for contributing! Unfortunately your implementation does not benchmark the Adam optimizer through Thunder.

To trace the optimizer step, we need to provide it to Thunder in functional form. So I think what @crcrpar intended for issue #1213 was a benchmark of the following function: https://github.com/pytorch/pytorch/blob/b0042286d48e2d202019d3defd3b53086efb1e6e/torch/optim/adam.py#L866
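
To make the functional-form point concrete, a rough standalone sketch of benchmarking that functional Adam entry point with pytest-benchmark could look like the following. The helper names, shapes, and the eager/inductor parametrization are illustrative assumptions, not the code in this PR; the thunderfx executor from the tables above is omitted since wiring it up is exactly what the PR adds.

```python
# Hypothetical, standalone sketch; not the benchmark added by this PR.
# Requires: pytest, pytest-benchmark, torch.
import pytest
import torch
from torch.optim.adam import adam  # the functional Adam step linked above


def make_adam_state(shape, device):
    # One parameter tensor plus the buffers the functional API expects.
    param = torch.randn(shape, device=device)
    grad = torch.randn(shape, device=device)
    exp_avg = torch.zeros_like(param)
    exp_avg_sq = torch.zeros_like(param)
    # Singleton step tensor; kept on CPU so the non-capturable path is used.
    step = torch.zeros(())
    return [param], [grad], [exp_avg], [exp_avg_sq], [], [step]


def adam_step(params, grads, exp_avgs, exp_avg_sqs, max_exp_avg_sqs, state_steps):
    # Everything the update needs is passed in explicitly; there is no
    # optimizer object hiding state, which is what makes the step traceable.
    adam(
        params, grads, exp_avgs, exp_avg_sqs, max_exp_avg_sqs, state_steps,
        amsgrad=False, beta1=0.9, beta2=0.999,
        lr=1e-3, weight_decay=0.0, eps=1e-8, maximize=False,
    )


@pytest.mark.parametrize("executor", ["eager", "inductor"])
@pytest.mark.parametrize("shape", [(64, 64), (128, 64)])
def test_functional_adam_step(benchmark, executor, shape):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    args = make_adam_state(shape, device)
    fn = adam_step if executor == "eager" else torch.compile(adam_step)
    benchmark(fn, *args)
```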

(Inline review comments on thunder/benchmarks/__init__.py and thunder/benchmarks/targets.py — resolved.)
khushi-411 (Author) commented Feb 14, 2025

Hi @riccardofelluga! Thank you for reviewing the PR and for your suggestions. I've made the updates, please take another look whenever you have time. :-)

EDIT: I think I need to make some more corrections; will ping you as soon as I complete them. Thank you!
I've addressed the issues in the PR. Would love to hear back from you!

riccardofelluga (Collaborator) left a comment

So far so good! Big improvement from last time, though there are still a couple of things to address.

Side question: why is this named after litgpt? In the end, the function you are benchmarking comes from torch.

(Further inline review comments on thunder/benchmarks/targets.py and thunder/benchmarks/__init__.py — resolved.)
riccardofelluga (Collaborator) left a comment

Great work! We are getting into good shape now; just a couple of nits and the requires_grad situation to sort out before crossing the finish line.

Is it really useful to benchmark the backward function of the optimizer step?

(Inline review comments on thunder/benchmarks/__init__.py and thunder/benchmarks/targets.py — resolved.)
khushi-411 (Author) commented Feb 18, 2025

Thank you, @riccardofelluga for all your useful suggestions!

Is it really useful to benchmark the backward function of the optimizer step?

No, I don't think so; even if we calculated the gradient for the backward pass, it wouldn't be useful (at least in general cases like this).

That's one reason I thought to explicitly declare @parametrize_compute_type_without_backward; the other reason was that error.
Does this sound okay to you? Thank you

riccardofelluga (Collaborator) commented Feb 19, 2025

No, I don't think so; even if we calculated the gradient for the backward pass, it wouldn't be useful (at least in general cases like this).
That's one reason I thought to explicitly declare @parametrize_compute_type_without_backward; the other reason was that error.

Indeed! I think the best solution here would be to parametrize only for ComputeType.INFERENCE and set requires_grad manually where needed.

And the other reason was that error.

That error comes from using the decorator @torch.no_grad()
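
To make that suggestion concrete, a minimal hypothetical sketch (not the implementation in this PR) of setting requires_grad by hand and using torch.no_grad() as a context manager rather than a decorator might look like this; the update inside is only a stand-in for the functional Adam step.

```python
# Hypothetical sketch of the suggestion above; not the code in this PR.
import pytest
import torch


@pytest.mark.parametrize("requires_grad", [False, True])
def test_adam_step_inference_only(benchmark, requires_grad):
    # Build tensors with requires_grad set explicitly instead of relying on a
    # training/backward compute-type parametrization.
    param = torch.randn(128, 64, requires_grad=requires_grad)
    grad = torch.randn(128, 64)

    def step():
        # torch.no_grad() as a context manager, not a decorator on the test
        # function; this also permits the in-place update on a leaf tensor
        # that has requires_grad=True.
        with torch.no_grad():
            param.add_(grad, alpha=-1e-3)  # stand-in for the functional Adam update

    benchmark(step)
```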
