Added CPU offloading #3452
Conversation
Wasn't there supposed to be a bunch of logging?
@@ -684,12 +678,17 @@ def compile(
    )

    gm = exported_program.module()
    # Move the weights in the state_dict to CPU
This comment isn't relevant here.
@@ -833,6 +833,7 @@ def contains_metadata(gm: torch.fx.GraphModule) -> bool:
        str(name),
        str(submodule.graph),
    )
    submodule.to(torch.cuda.current_device())
Use to_torch_device(settings.device) here.
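A minimal sketch of that suggestion, assuming to_torch_device is importable from torch_tensorrt.dynamo.utils (the import path is an assumption, not confirmed by this diff):

from torch_tensorrt.dynamo.utils import to_torch_device

# Instead of hard-coding the current CUDA device:
#     submodule.to(torch.cuda.current_device())
# honor the device configured in the compilation settings:
submodule.to(to_torch_device(settings.device))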
"The model is offloaded to CPU during compilation. If you want to keep the model on GPU, set offload_module_to_cpu=False." | ||
) |
Consider this message: "The PyTorch model was moved to the CPU to allocate all GPU memory to TensorRT. To retain the model on the GPU, set offload_module_to_cpu=False."
Also, there's one more thing we discussed: throw a warning if we predict an OOM when offload_module_to_cpu=False. This could be achieved by measuring the size of the PyTorch module and the available GPU memory.
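A rough sketch of such a check; the helper name and structure are illustrative only and not part of the PR:

import logging

import torch

logger = logging.getLogger(__name__)


def warn_if_likely_oom(module: torch.nn.Module) -> None:
    # Approximate the module's GPU footprint from its parameters and buffers.
    module_bytes = sum(p.numel() * p.element_size() for p in module.parameters())
    module_bytes += sum(b.numel() * b.element_size() for b in module.buffers())

    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    if module_bytes > free_bytes:
        logger.warning(
            "offload_module_to_cpu=False, but the module (%.2f GiB) may exceed "
            "free GPU memory (%.2f GiB); compilation could run out of memory.",
            module_bytes / 2**30,
            free_bytes / 2**30,
        )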
We need to test this change across our entire test suite to ensure it is working as expected.
)
else:
    remaining_memory, total_memory = torch.cuda.mem_get_info()
    if remaining_memory < total_memory / 2:
total_memory // 2
@@ -49,6 +49,7 @@
TILING_OPTIMIZATION_LEVEL = "none"
L2_LIMIT_FOR_TILING = -1
USE_DISTRIBUTED_MODE_TRACE = False
OFFLOAD_MODULE_TO_CPU = False
Run the whole test suite with this set to True. I think we discussed making True the default here. Since that would be a breaking change, we should mention it in the release notes.
Description
Added CPU offloading. Compilation now requires no more than 1x the model's GPU memory: before engine compilation, the model and graph module are moved to the CPU.
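A usage sketch of the new option, assuming it is exposed as the offload_module_to_cpu keyword of torch_tensorrt.compile (the flag name comes from the review comments above; the model and inputs below are placeholders):

import torch
import torch_tensorrt

model = MyModel().eval().cuda()  # placeholder model
example_inputs = [torch.randn(1, 3, 224, 224, device="cuda")]

# With offload_module_to_cpu=True, the PyTorch weights are moved to the CPU
# before engine compilation so TensorRT can use the freed GPU memory.
trt_module = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=example_inputs,
    offload_module_to_cpu=True,  # flag introduced by this PR
)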
Fixes # (issue)
Type of change
Please delete options that are not relevant and/or add your own.
Checklist: