Harden KV Cache qparams #262

Draft: wants to merge 5 commits into base: main

Conversation

kylesayrs
Contributor

Purpose

  • Harden the logic that initializes KV cache quantization parameters (qparams)
    • Handle the case where the first parameter yielded by parameters() is not the weight

Prerequisites

horheynm
horheynm previously approved these changes Feb 26, 2025
rahul-tuli
rahul-tuli previously approved these changes Feb 27, 2025

expected_shape = 1 # per tensor

param = next(module.parameters())
Member

WHAAAAAAAAAAAAAAT 😨 we just assumed next(model.parameters()) gives us the weight param

Contributor Author

Sorry, this needs to be reworked. This is meant to be called on an "attention" module, so we need to get the k_proj and v_proj attributes, not the weight attribute.

But yes, making any assumption about the ordering of module.parameters() is incorrect.
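
To make the point above concrete, here is a minimal sketch of looking up the key/value projections by name instead of relying on the order of parameters(). The helper name get_kv_proj_weights and the KV_PROJ_NAMES tuple are hypothetical and not part of this PR, and the demo uses lightweight stand-in objects rather than real torch modules so that it runs without PyTorch. The helper only relies on getattr, so it should apply equally to an attention module that exposes k_proj and v_proj submodules with a weight attribute.

```python
from types import SimpleNamespace

# Assumed attribute names, taken from the review comment above.
KV_PROJ_NAMES = ("k_proj", "v_proj")


def get_kv_proj_weights(attn_module):
    """Return {name: weight} for the key/value projections.

    Hypothetical helper: it looks projections up by name rather than
    by their position in attn_module.parameters().
    """
    weights = {}
    for name in KV_PROJ_NAMES:
        proj = getattr(attn_module, name, None)
        weight = getattr(proj, "weight", None)
        if weight is None:
            raise AttributeError(f"expected attention module to have {name}.weight")
        weights[name] = weight
    return weights


def fake_linear(dtype, device):
    """Stand-in for a linear layer: an object with a .weight that has dtype/device."""
    return SimpleNamespace(weight=SimpleNamespace(dtype=dtype, device=device))


# Stand-in attention module; q_proj deliberately differs from k_proj/v_proj.
attn = SimpleNamespace(
    q_proj=fake_linear("float32", "cpu"),
    k_proj=fake_linear("bfloat16", "cuda:0"),
    v_proj=fake_linear("bfloat16", "cuda:0"),
)

kv_weights = get_kv_proj_weights(attn)
scale_dtype = kv_weights["k_proj"].dtype
device = kv_weights["k_proj"].device
```

With this shape, a module that lacks k_proj or v_proj fails with an explicit AttributeError instead of silently picking up the dtype and device of whichever parameter happens to come first.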

Base automatically changed from kylesayrs/fix-registered-offloading to main February 27, 2025 17:49
@dsikka dsikka dismissed stale reviews from brian-dellabetta, rahul-tuli, and horheynm February 27, 2025 17:49

The base branch was changed.

param = next(module.parameters())
scale_dtype = param.dtype
device = param.device
weight_param = getattr(module, "weight", next(module.parameters()))
Member

Should this be q_proj weights?
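
A side note on the getattr fallback quoted in the diff above: Python evaluates the default argument before calling getattr, so next(module.parameters()) runs even when module.weight exists, and it raises StopIteration on a module with no parameters. The sketch below shows one possible lazy alternative. The helper name reference_param and the CountingModule stand-in are hypothetical and exist only for illustration. Which parameter should actually serve as the reference is left open here.

```python
def reference_param(module):
    """Prefer module.weight; otherwise the first parameter, or None if there are none.

    Hypothetical helper: the fallback is only evaluated when it is needed.
    """
    weight = getattr(module, "weight", None)
    if weight is not None:
        return weight
    # next() with a default avoids StopIteration on parameter-free modules.
    return next(iter(module.parameters()), None)


class CountingModule:
    """Stand-in module that counts how often parameters() is called."""

    def __init__(self, weight=None, params=()):
        if weight is not None:
            self.weight = weight
        self._params = list(params)
        self.calls = 0

    def parameters(self):
        self.calls += 1
        return iter(self._params)


# Eager form from the diff: parameters() is consumed although weight exists.
eager_mod = CountingModule(weight="W", params=["W"])
eager = getattr(eager_mod, "weight", next(eager_mod.parameters()))

# Lazy form: parameters() is never touched when weight exists.
lazy_mod = CountingModule(weight="W", params=["W"])
lazy = reference_param(lazy_mod)

# A module without weight or parameters yields None instead of raising.
empty_result = reference_param(CountingModule())
```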

@kylesayrs kylesayrs marked this pull request as draft April 30, 2025 17:21
4 participants