Allow passing in additional params to be ignored in the DDP wrapper (#2103)

yuhuishi-convect · facebook-github-bot · commit 63370723bc4d · 2024-06-14T11:24:30.000-07:00
Summary: Pull Request resolved: #2103

# What
Add an option that lets users pass in additional params to be ignored in DDP.

# Why
Currently the wrapper calls `DistributedDataParallel._set_params_and_buffers_to_ignore_for_model` to ignore all sharded params in the embedding modules. However, if users call `DistributedDataParallel._set_params_and_buffers_to_ignore_for_model` themselves before torchrec does, their params-to-be-ignored will be overwritten by torchrec's call.

Discussion: https://fb.workplace.com/groups/319878845696681/permalink/1199477041070186/

For the motivation behind users calling `_set_params_and_buffers_to_ignore_for_model`, please see the diff on top of this one.

# How
To mitigate this issue, we have to "batch" the call to `_set_params_and_buffers_to_ignore_for_model`. Therefore, we allow users to pass their params-to-be-ignored to the wrapper, which unions them with the torchrec sharded params before the call is made.

Reviewed By: dstaay-fb

Differential Revision: D58486022

fbshipit-source-id: 3896e02fec0cec7db528c265c7d0fbfdef1fea87
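
The overwrite problem and the batching fix described in the summary can be illustrated with a small standalone sketch. It does not import torch or torchrec: `FakeModule`, `set_params_to_ignore`, and `batched_ignore_set` are hypothetical stand-ins, and the parameter names are made up. The only behavior assumed from PyTorch is that the ignore list is replaced, not merged, on every call.

```python
from typing import Iterable, List, Optional, Set


class FakeModule:
    """Hypothetical stand-in for an nn.Module; it only holds the ignore list."""

    _ddp_params_and_buffers_to_ignore: List[str] = []


def set_params_to_ignore(module: FakeModule, names: Iterable[str]) -> None:
    # Assumption: mimics the overwrite semantics described in the summary.
    # Each call replaces the previous list, so the last caller wins.
    module._ddp_params_and_buffers_to_ignore = list(names)


def batched_ignore_set(
    sharded: Set[str], additional: Optional[List[str]] = None
) -> Set[str]:
    # Same shape as the change in wrap(): union the sharded parameter names
    # with the user-supplied ones so a single call carries both.
    return sharded.union(additional or [])


module = FakeModule()

# Without batching: the user's call is clobbered by the later (torchrec-style) call.
set_params_to_ignore(module, ["dense.frozen_bias"])   # user call (made-up name)
set_params_to_ignore(module, ["sparse.ebc.weight"])   # later call (made-up name)
overwritten = sorted(module._ddp_params_and_buffers_to_ignore)

# With batching: both sets of names survive in one call.
merged = batched_ignore_set({"sparse.ebc.weight"}, ["dense.frozen_bias"])
set_params_to_ignore(module, merged)
batched = sorted(module._ddp_params_and_buffers_to_ignore)

print(overwritten)
print(batched)
```

With this change, a caller would pass their own names through the new `params_to_ignore` argument of the wrapper's `__init__` instead of setting the ignore list on the module themselves.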
diff --git a/torchrec/distributed/model_parallel.py b/torchrec/distributed/model_parallel.py
@@ -77,11 +77,13 @@ def __init__(
         static_graph: bool = True,
         find_unused_parameters: bool = False,
         allreduce_comm_precision: Optional[str] = None,
+        params_to_ignore: Optional[List[str]] = None,
     ) -> None:
         self._bucket_cap_mb: int = bucket_cap_mb
         self._static_graph: bool = static_graph
         self._find_unused_parameters: bool = find_unused_parameters
         self._allreduce_comm_precision = allreduce_comm_precision
+        self._additional_params_to_ignore: Set[str] = set(params_to_ignore or [])
 
     def _ddp_wrap(
         self,
@@ -136,7 +138,10 @@ def wrap(
         sharded_parameter_names = set(
             DistributedModelParallel._sharded_parameter_names(dmp._dmp_wrapped_module)
         )
-        self._ddp_wrap(dmp, env, device, sharded_parameter_names)
+        params_to_ignore = sharded_parameter_names.union(
+            self._additional_params_to_ignore
+        )
+        self._ddp_wrap(dmp, env, device, params_to_ignore)
 
 
 def get_unwrapped_module(module: nn.Module) -> nn.Module: