fix(setup): Explicitly include 'mii' package in find_packages #567
Conversation
*Force-pushed from `2b973e3` to `5a7cb13`.*
…7378)

### Description

This PR fixes an `AttributeError: 'UnembedParameter' object has no attribute 'dtype'` that occurs in the Inference V2 engine. The issue is triggered when a high-level interface such as [DeepSpeed-MII](https://github.com/deepspeedai/DeepSpeed-MII) runs inference on models with tied input/output embeddings, such as Llama 2.

**Resolves: #7260**

### Root Cause Analysis

While the `ParameterBase` metaclass correctly creates property setters for parameter tensors, the setter function (`param_setter`) only assigns the tensor value itself; it does not propagate the tensor's `dtype` to the container instance. Downstream functions such as `flatten_inference_model` expect every parameter container to expose a `.dtype` attribute, so when they encounter a custom container like `UnembedParameter` that lacks it, an `AttributeError` is raised.

### The Fix

The solution is to modify the `param_setter` function within `make_param_setter` in `deepspeed/inference/v2/model_implementations/parameter_base.py`, adding the line `self.dtype = value.dtype` immediately after the parameter tensor is assigned. This ensures that any object inheriting from `ParameterBase` correctly exposes the `dtype` of the tensor it wraps, resolving the error.

### Verification

This fix was verified in a containerized GPU environment (RunPod with PyTorch 2.1). The verification process involved:

1. Cloning both the `deepspeed` and `DeepSpeed-MII` repositories from source.
2. Installing the modified `deepspeed` library from this branch.
3. Installing the `DeepSpeed-MII` library (with a packaging fix) to trigger the bug.
4. Running an end-to-end inference script with `mii.pipeline` and a standard language model.

The logs confirm that with this fix, the program executes past the original point of failure: the `AttributeError` is resolved and the DeepSpeed engine proceeds to the model loading phase. *(Note: a full end-to-end run in the test environment was blocked by a separate, pre-existing build issue in DeepSpeed's op builder, `ModuleNotFoundError: dskernels`, which is unrelated to this fix.)*

### Related Context

This bug is primarily triggered via the [DeepSpeed-MII](https://github.com/deepspeedai/DeepSpeed-MII) project. A companion PR, **deepspeedai/DeepSpeed-MII#567**, fixes a packaging issue in that repository that was a prerequisite for this verification.

Output:

<img width="1014" alt="Screenshot 2025-06-22 at 14 16 15" src="https://github.com/user-attachments/assets/1a658f98-a98b-4584-ae11-59e9edfd0b7e" /> <img width="1012" alt="Screenshot 2025-06-22 at 14 16 26" src="https://github.com/user-attachments/assets/3959d0e5-d6dc-4ed4-adbc-6919e00da172" /> <img width="1728" alt="Screenshot 2025-06-22 at 14 17 40" src="https://github.com/user-attachments/assets/537fd354-b840-4af2-98ab-d243c6902412" />

Signed-off-by: Vensenmu <[email protected]>
Co-authored-by: Masahiro Tanaka <[email protected]>
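The dtype-propagation pattern described in the commit message can be sketched without torch. This is a minimal, hypothetical stand-in for illustration only: `FakeTensor` and `ParameterMeta` are simplified substitutes, not the actual DeepSpeed implementation.

```python
# Minimal, dependency-free sketch of the fix described above.
# FakeTensor and ParameterMeta are simplified stand-ins, not DeepSpeed code.

class FakeTensor:
    """Stand-in for torch.Tensor carrying only a dtype tag."""
    def __init__(self, dtype):
        self.dtype = dtype

def make_param_setter(name):
    attr = "_" + name
    def param_setter(self, value):
        setattr(self, attr, value)   # original behavior: store the tensor
        self.dtype = value.dtype     # the fix: propagate the tensor's dtype
    return param_setter

class ParameterMeta(type):
    """Creates a property (getter/setter pair) for each declared parameter name."""
    def __new__(mcs, clsname, bases, ns):
        for pname in ns.get("PARAMS", ()):
            ns[pname] = property(
                lambda self, a="_" + pname: getattr(self, a),
                make_param_setter(pname),
            )
        return super().__new__(mcs, clsname, bases, ns)

class UnembedParameter(metaclass=ParameterMeta):
    PARAMS = ("params",)

p = UnembedParameter()
p.params = FakeTensor("float16")
print(p.dtype)  # the container now exposes the wrapped tensor's dtype: float16
```

Without the `self.dtype = value.dtype` line, `p.dtype` would raise the `AttributeError` that downstream consumers like `flatten_inference_model` hit.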
*Force-pushed from `5a7cb13` to `65826f0`.*
Hi @Flink-ddd,
*Force-pushed from `7259d87` to `93b9de3`.*
Hi @tohtana, thanks for the quick review! You got it: I've reverted the quote-style formatting change so the diff only shows the substantive change to `find_packages`. A quick heads-up: the Formatting CI check might fail because of this, since the project's yapf hook seems to enforce double quotes automatically; I've kept the single quotes as requested for readability. Please let me know if there's anything else I can do. Thanks again!
Hi @Flink-ddd, I ran the formatter, and it seems okay now. However, the test runners are not starting; let me check whether there is an issue with the CI.
…rror Signed-off-by: Vensenmu <[email protected]>
*Force-pushed from `befaa24` to `521404d`.*
Hi @tohtana, thanks a lot for fixing the formatting! I've squashed the formatting commit with my original one and re-applied the DCO sign-off to the combined commit, which should resolve the DCO check failure. Could you please approve the workflows when you have a moment? Thank you!
Hi @tohtana, thanks for approving the workflows! The nv-a6000-fastgen unit test passed successfully, which is great news. The other test, on the nv-v100-legacy runner, was cancelled after timing out in the queue. Just wanted to give you a heads-up on the CI status; please let me know if there's anything I can do to help. Thanks again for your time!
Hi @Flink-ddd, as we might need some more time to get the CI runner back, I forcibly merged this PR. Thank you for your contribution!
Hi @tohtana, that's fantastic news! Thank you so much for looking into the CI issue and for your trust in merging this. I'm confident this packaging change is safe since it's self-contained, but I'll be on standby to help if any unexpected issues arise. Thanks again for everything, and happy to contribute in the future!
Description
This PR fixes a packaging issue where `pip install -e .` can fail to discover the top-level `mii` package in certain environments. This leads to a `ModuleNotFoundError: No module named 'mii'` at runtime, making development and testing from a source checkout unreliable.

Root Cause Analysis
The `setup.py` file relied on `find_packages()` without an explicit `include` argument. While this usually works, it proved unreliable in some containerized environments, failing to identify the `mii` directory as an installable package despite the presence of an `__init__.py` file.

The Fix
To ensure a robust installation, this PR makes the `packages` argument in `setup.py` explicit:

Before: `packages=find_packages(exclude=("tests",))`
After: `packages=find_packages(include=['mii', 'mii.*'], exclude=("tests",))`

This explicitly instructs `setuptools` to find and include the `mii` package and all its sub-packages, resolving the installation issue.

Verification
This packaging fix was verified in a containerized GPU environment (RunPod with the PyTorch 2.1 template):

1. Cloned the `DeepSpeed-MII` repository.
2. Applied this change to `setup.py`.
3. Ran `pip install -e .`.
4. Verified the install with `python -c "import mii"`, which failed prior to this fix.

Related Pull Request
This fix is a companion to a bug fix in the core `deepspeed` repository; resolving this packaging issue was a necessary prerequisite to create a working test environment for the downstream bug.

The main fix for the inference logic bug is in deepspeedai/DeepSpeed#7378.