
Conversation

@Flink-ddd
Contributor

Description

This PR fixes a packaging issue where `pip install -e .` can fail to discover the top-level `mii` package in certain environments. This leads to a `ModuleNotFoundError: No module named 'mii'` at runtime, making development and testing from a source checkout unreliable.

Root Cause Analysis

The `setup.py` file relied on `find_packages()` without an explicit `include` argument. While this usually works, it proved unreliable in some containerized environments, failing to identify the `mii` directory as an installable package despite the presence of an `__init__.py` file.
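
To see what setuptools actually discovers, a quick diagnostic (a hypothetical check, not part of this PR; run it from the repository root) is to print the result of the old call directly:

```python
# Hypothetical diagnostic: print what find_packages() discovers from the repo
# root. If 'mii' is missing from the output, the editable install will not
# expose the package and `import mii` will fail.
from setuptools import find_packages

print(find_packages(exclude=("tests",)))
```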

The Fix

To ensure a robust installation, this PR modifies the `packages` argument in `setup.py` to be explicit:

  • From: `packages=find_packages(exclude=("tests",))`
  • To: `packages=find_packages(include=['mii', 'mii.*'], exclude=("tests",))`

This change explicitly instructs setuptools to find and include the `mii` package and all of its sub-packages, resolving the installation issue.
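
For reference, here is a minimal sketch of the relevant part of `setup.py` after this change (the `name` and other metadata are placeholders, not the project's actual values; only the `packages` argument reflects this PR):

```python
# setup.py (abridged sketch; metadata is illustrative)
from setuptools import find_packages, setup

setup(
    name="deepspeed-mii",  # placeholder; see the repository's setup.py for real metadata
    # Explicitly include the top-level package and all sub-packages so that
    # discovery no longer depends on the environment it runs in.
    packages=find_packages(include=['mii', 'mii.*'], exclude=("tests",)),
)
```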

Verification

This packaging fix was verified in a containerized GPU environment (RunPod with the PyTorch 2.1 template):

  1. Cloned the DeepSpeed-MII repository.
  2. Applied this change to `setup.py`.
  3. Ran `pip install -e .`.
  4. Successfully executed `python -c "import mii"`, which had failed prior to this fix (a programmatic check is sketched below).
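
Beyond the one-liner in step 4, a small standard-library check (an illustrative snippet, not part of this PR) also shows where the editable install resolves the package from:

```python
# Programmatic equivalent of `python -c "import mii"`, plus the resolved path,
# which is handy when debugging editable-install discovery issues.
import importlib.util

spec = importlib.util.find_spec("mii")
assert spec is not None, "mii is not discoverable on sys.path"
print(f"mii resolved from: {spec.origin}")
```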

Related Pull Request

This fix is a companion to a bug fix in the core deepspeed repository. Resolving this packaging issue was a necessary prerequisite for creating a working test environment for the downstream bug.

The main fix for the inference logic bug is in: deepspeedai/DeepSpeed#7378

@Flink-ddd force-pushed the fix/mii-packaging-issue branch 2 times, most recently from 2b973e3 to 5a7cb13 on June 22, 2025 at 11:41
tohtana added a commit to deepspeedai/DeepSpeed that referenced this pull request Jun 23, 2025
…7378)

### Description

This PR fixes an `AttributeError: 'UnembedParameter' object has no
attribute 'dtype'` that occurs in the Inference V2 engine. The issue is
triggered when using a high-level interface like
[DeepSpeed-MII](https://github.com/deepspeedai/DeepSpeed-MII) to run
inference on models with tied input/output embeddings, such as Llama 2.

**Resolves: #7260**

### Root Cause Analysis

The root cause is that while the `ParameterBase` metaclass correctly
creates property setters for parameter tensors, the setter function
(`param_setter`) only assigns the tensor value itself. It does not
propagate the tensor's `dtype` to the container instance.

Downstream functions, such as `flatten_inference_model`, expect every
parameter container to have a `.dtype` attribute. When they encounter a
custom container like `UnembedParameter` that lacks this attribute, an
`AttributeError` is raised.

### The Fix

The solution is to modify the `param_setter` function within
`make_param_setter` located in
`deepspeed/inference/v2/model_implementations/parameter_base.py`.

I have added the line `self.dtype = value.dtype` immediately after the
parameter tensor is assigned. This simple change ensures that any object
inheriting from `ParameterBase` will now correctly expose the `dtype` of
the tensor it wraps, resolving the error.
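
For illustration, here is a minimal sketch of the change (the class and
setter structure are assumed stand-ins, not the actual DeepSpeed source;
only the added `self.dtype = value.dtype` line reflects this PR):

```python
import torch

def make_param_setter(name: str):
    """Build a property setter for a named parameter tensor (sketch)."""
    def param_setter(self, value: torch.Tensor):
        setattr(self, f"_{name}", value)  # original behavior: store the tensor
        self.dtype = value.dtype          # the fix: expose the tensor's dtype
    return param_setter

class UnembedParameter:
    # Stand-in for a ParameterBase subclass; the metaclass would normally
    # attach this property automatically.
    params = property(fset=make_param_setter("params"))

p = UnembedParameter()
p.params = torch.zeros(4, dtype=torch.float16)
print(p.dtype)  # torch.float16 -- now visible to code like flatten_inference_model
```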

### Verification

This fix has been thoroughly verified in a containerized GPU environment
(RunPod with PyTorch 2.1). The verification process involved:
1. Cloning both the `deepspeed` and `DeepSpeed-MII` repositories from
source.
2. Installing the modified `deepspeed` library from this branch.
3. Installing the `DeepSpeed-MII` library (with a packaging fix) to
trigger the bug.
4. Running an end-to-end inference script with `mii.pipeline` and a
standard language model (a sketch follows below).
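
The script was a minimal call along these lines (an assumed sketch; the
model name is illustrative, and any model with tied input/output
embeddings, such as Llama 2, exercises the affected code path):

```python
import mii

# Build a DeepSpeed-MII non-persistent inference pipeline.
pipe = mii.pipeline("meta-llama/Llama-2-7b-hf")  # illustrative model name
response = pipe(["DeepSpeed is"], max_new_tokens=32)
print(response)
```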

The logs confirm that with this fix, the program successfully executes
past the original point of failure. The `AttributeError` is completely
resolved, and the DeepSpeed engine proceeds correctly to the model
loading phase.

*(Note: A full end-to-end run in the test environment was ultimately
blocked by a separate, pre-existing build issue in DeepSpeed's op
builder (`ModuleNotFoundError: dskernels`), which is unrelated to this
logic fix. The successful progression past the original error point
demonstrates that this fix resolves the reported error.)*

### Related Context

This bug is primarily triggered via the
[**DeepSpeed-MII**](https://github.com/deepspeedai/DeepSpeed-MII)
project. A companion PR,
**[deepspeedai/DeepSpeed-MII#567](https://github.com/deepspeedai/DeepSpeed-MII/pull/567)**,
has been submitted to fix a packaging issue in that repository that was
a prerequisite for this verification.

Output:

<img width="1014" alt="Screenshot 2025-06-22 at 14 16 15"
src="https://github.com/user-attachments/assets/1a658f98-a98b-4584-ae11-59e9edfd0b7e"
/>

<img width="1012" alt="Screenshot 2025-06-22 at 14 16 26"
src="https://github.com/user-attachments/assets/3959d0e5-d6dc-4ed4-adbc-6919e00da172"
/>

<img width="1728" alt="Screenshot 2025-06-22 at 14 17 40"
src="https://github.com/user-attachments/assets/537fd354-b840-4af2-98ab-d243c6902412"
/>

Signed-off-by: Vensenmu <[email protected]>
Co-authored-by: Masahiro Tanaka <[email protected]>
@Flink-ddd force-pushed the fix/mii-packaging-issue branch from 5a7cb13 to 65826f0 on June 24, 2025 at 00:43
@tohtana
Contributor

tohtana commented Jun 24, 2025

Hi @Flink-ddd,
The change for `find_packages` looks good to me. Since the quote-style change from `'` to `"` makes it hard to see the intention of this PR, can you revert it to `'`? Please make sure the formatting is good as well.

@Flink-ddd force-pushed the fix/mii-packaging-issue branch 2 times, most recently from 7259d87 to 93b9de3 on June 25, 2025 at 01:24
@Flink-ddd
Contributor Author

Hi @tohtana ,

Thanks for the quick review! You got it: I've reverted the quote-style formatting change so the diff stays clean and only shows the substantive change to `find_packages`.

Just a quick heads-up: the Formatting CI check might fail because of this, since the project's yapf hook seems to enforce double quotes automatically. I've reverted to single quotes as requested for better readability.

Please let me know if there's anything else I can do. Thanks again!

@tohtana
Contributor

tohtana commented Jun 26, 2025

Hi @Flink-ddd, I ran the formatter, and it seems okay now. However, the test runners are not running. Let me check if there is any issue with the CI.

@Flink-ddd force-pushed the fix/mii-packaging-issue branch from befaa24 to 521404d on June 26, 2025 at 04:30
@Flink-ddd
Contributor Author

Hi @tohtana ,

Thanks a lot for fixing the formatting! I've squashed the formatting commit with my original one and re-applied the DCO sign-off to the combined commit. This should resolve the DCO check failure.

Could you please help approve the workflows when you have a moment? Thank you!

Antlera pushed a commit to Antlera/DeepSpeed that referenced this pull request Jun 27, 2025
@Flink-ddd
Contributor Author

Hi @tohtana ,

Thanks for approving the workflows!

I see the nv-a6000-fastgen unit test passed successfully, which is great news. It looks like the other test on the nv-v100-legacy runner was cancelled after timing out in the queue.

Just wanted to give you a heads-up on this CI status. Please let me know if there's anything I can do to help. Thanks again for your time!

@tohtana merged commit 8abdd98 into deepspeedai:main on Jun 30, 2025
3 of 5 checks passed
@tohtana
Contributor

tohtana commented Jun 30, 2025

Hi @Flink-ddd,

As we might need some more time to get the CI runner back, I forcibly merged this PR.
Your change is only about setup.py, and one of the tests that installs the package has passed. It shouldn't cause an issue.

Thank you for your contribution!

@Flink-ddd
Contributor Author

Hi @tohtana ,

That's fantastic news! Thank you so much for looking into the CI issue and for your trust in merging this.

I'm confident this packaging change is safe since it's self-contained, but I'll be on standby to help if any unexpected issues arise.

Thanks again for everything, and I'm happy to contribute again in the future!

lpnpcs pushed a commit to lpnpcs/DeepSpeed that referenced this pull request Jul 30, 2025
mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Oct 4, 2025