Fix int16 TOSA.TABLE LUT zeroed when output range uses <16 bits (#20401) #20401

Draft
ryan-monroe wants to merge 1 commit into pytorch:main from ryan-monroe:export-D107331163

@ryan-monroe ryan-monroe commented Jun 18, 2026

Summary:

`InsertTableOpsPass.generate_16_bit_table_values` builds the int16 `TOSA.TABLE` lookup for unary ops (sigmoid, tanh, ...). It computes `rshift = ceil(log2(max_table_value)) + 1 - 16` to fit the table into 16 signed bits, then does `lut_values >> rshift`, assuming the table fills ~16 bits (its own comment notes "for int16, rshift == 0").

When the op's output range uses fewer than 16 bits this breaks. A sigmoid output is in `[0, 1]`; quantized with a small scale (e.g. `1/4096`), the largest table value is `4096` (13 bits), so `rshift = 13 - 16 = -3`. `lut_values >> -3` is an undefined negative right-shift; on the host the shift count is masked and the entire table is zeroed, so the activation returns 0 for every input. This makes any int16 `TABLE` op with a small output range (e.g. a sigmoid in a Squeeze-and-Excitation block) degenerate.
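
For reference, the arithmetic in this paragraph can be reproduced with a minimal sketch. `compute_rshift` is a hypothetical helper, not the code in `InsertTableOpsPass`; it only mirrors the formula quoted in the summary, and the `32767` case is an assumed example of a table that fills the signed 16-bit range.

```python
import math


def compute_rshift(max_table_value: int, target_bits: int = 16) -> int:
    """Unclamped shift as quoted in the summary: ceil(log2(max)) + 1 - 16.

    Hypothetical helper for illustration; assumes max_table_value > 0.
    """
    return math.ceil(math.log2(max_table_value)) + 1 - target_bits


# Sigmoid output in [0, 1] quantized with scale 1/4096: largest entry is 4096.
small_range = compute_rshift(4096)

# Assumed example of a table whose largest entry uses the full int16 range.
full_range = compute_rshift(32767)

print(small_range)  # -3, a negative right-shift count
print(full_range)   # 0, the case the original comment expects
```

Note that plain Python integers refuse a negative shift count (`4096 >> -3` raises `ValueError`), so the silent zeroing described above is specific to the tensor shift on the host, not something this sketch reproduces.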

Fix: clamp `rshift` to >= 0. When it would be negative the values already fit in int16, so no shift is needed; this restores the documented `rshift == 0` / `rescale_lshift == -7` case. The fix is general -- it covers any int16 `TABLE` op whose output range is small.
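
A minimal sketch of the clamp, assuming the table is a plain list of Python integers rather than the tensor the pass actually uses. The names `clamped_rshift` and `shift_table` are hypothetical and exist only for illustration; the real change lives in `generate_16_bit_table_values`.

```python
import math


def clamped_rshift(max_table_value: int, target_bits: int = 16) -> int:
    """Shift needed to fit the table into target_bits, never negative.

    Hypothetical helper; assumes max_table_value > 0.
    """
    raw = math.ceil(math.log2(max_table_value)) + 1 - target_bits
    # Clamp: a negative value means the table already fits, so do not shift.
    return max(raw, 0)


def shift_table(lut_values, target_bits: int = 16):
    """Return (shifted table, rshift) for a list of integer table entries."""
    rshift = clamped_rshift(max(abs(v) for v in lut_values), target_bits)
    return [v >> rshift for v in lut_values], rshift


# Small-range table (three points of the sigmoid ramp): left untouched.
shifted, rshift = shift_table([0, 2048, 4096])
print(rshift, shifted)  # 0 [0, 2048, 4096]

# Assumed wide-range table: still shifted down to fit in int16.
wide_shifted, wide_rshift = shift_table([0, 100000, 200000])
print(wide_rshift, wide_shifted)  # 3 [0, 12500, 25000]
```

With the clamp, the small-range table passes through unchanged while tables that genuinely exceed 16 bits keep the existing behavior.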

Differential Revision: D107331163

@ryan-monroe ryan-monroe requested a review from digantdesai as a code owner June 18, 2026 21:59

pytorch-bot Bot commented Jun 18, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20401

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 Pending, 3 Unrelated Failures, 2 Unclassified Failures

As of commit 74ebdfc with merge base 6f8a889 (image):

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla Bot commented Jun 18, 2026

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: christine-long-meta / name: Christine Long (7a6db1f)

@meta-cla meta-cla Bot added the CLA Signed label Jun 18, 2026

meta-codesync Bot commented Jun 18, 2026

Contributor

@ryan-monroe has exported this pull request. If you are a Meta employee, you can view the originating Diff in D107331163.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@meta-codesync meta-codesync Bot changed the title from "Fix int16 TOSA.TABLE LUT zeroed when output range uses <16 bits" to "Fix int16 TOSA.TABLE LUT zeroed when output range uses <16 bits (#20401)" Jun 18, 2026
ryan-monroe added a commit to ryan-monroe/executorch that referenced this pull request Jun 18, 2026
…rch#20401)

Summary:

`InsertTableOpsPass.generate_16_bit_table_values` builds the int16 `TOSA.TABLE` lookup for unary ops (sigmoid, tanh, ...). It computes `rshift = ceil(log2(max_table_value)) + 1 - 16` to fit the table into 16 signed bits, then does `lut_values >> rshift`, assuming the table fills ~16 bits (its own comment notes "for int16, rshift == 0").

When the op's output range uses fewer than 16 bits this breaks. A sigmoid output is in `[0, 1]`; quantized with the scale the observer picks (here `1/4096`), the largest table value is `4096` (13 bits), so `rshift = 13 - 16 = -3`. `lut_values >> -3` is an undefined negative right-shift; on the host the shift count is masked and the entire table is zeroed, so the on-device activation returns 0 for every input.

On the Auth (ECAPA-TDNN) U85 model this zeroed the `Sigmoid` in every Squeeze-and-Excitation block, collapsing the SE channel-attention scale to 0 and dropping PTQ<->FVP SQNR from ~87 dB to 0 dB. The ideal table is a perfectly well-conditioned sigmoid ramp (`0 -> 2048 -> 4096`, 159 distinct levels) -- the shift, not the qparams, was the problem.
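
The 0 dB figure is what an all-zero output would give under one common SQNR definition, `10 * log10(sum(ref^2) / sum((ref - test)^2))`. The source does not state which formula was used, so the sketch below is an assumption, and `sqnr_db` and the sample values are hypothetical.

```python
import math


def sqnr_db(reference, test):
    """Signal-to-quantization-noise ratio in dB (assumed definition)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    if noise == 0:
        return math.inf  # identical outputs: no quantization noise at all
    return 10.0 * math.log10(signal / noise)


# Hypothetical reference outputs and a fully zeroed test output.
reference = [0.25, 0.5, 0.75, 1.0]
zeroed = [0.0, 0.0, 0.0, 0.0]

# When the test output is all zeros the error equals the signal,
# so the ratio is 1 and the SQNR is 0 dB.
print(sqnr_db(reference, zeroed))  # 0.0
```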

Fix: clamp `rshift` to >= 0. When it would be negative the values already fit in int16, so no shift is needed; this restores the documented `rshift == 0` / `rescale_lshift == -7` case. The fix is general -- it covers any int16 `TABLE` op whose output range is small.

This patches both copies of the pass (`xplat/executorch/...` and `fbcode/executorch/...`); the fbcode copy is the one fbcode test targets build, so both must stay in sync.

A full writeup (localization, the LUT before/after, FVP verification, and two secondary findings -- a runner-memory HardFault and the graph-sim e2e path) is in this Google Doc: https://docs.google.com/document/d/1WGAR01mdnwOLKcQBtC4qzYLPWB1Lh4fp_8fHF4tXORE/edit

Differential Revision: D107331163
ryan-monroe added a commit to ryan-monroe/executorch that referenced this pull request Jun 19, 2026
@ryan-monroe ryan-monroe force-pushed the export-D107331163 branch 2 times, most recently from 2cda5ae to b6383b4 Compare June 22, 2026 16:34
ryan-monroe added a commit to ryan-monroe/executorch that referenced this pull request Jun 22, 2026
ryan-monroe pushed a commit to ryan-monroe/executorch that referenced this pull request Jun 30, 2026
…rch#20401)

Summary:

`InsertTableOpsPass.generate_16_bit_table_values` builds the int16 `TOSA.TABLE` lookup for unary ops (sigmoid, tanh, ...). It computes `rshift = ceil(log2(max_table_value)) + 1 - 16` to fit the table into 16 signed bits, then does `lut_values >> rshift`, assuming the table fills ~16 bits (its own comment notes "for int16, rshift == 0").

When the op's output range uses fewer than 16 bits this breaks. A sigmoid output is in `[0, 1]`; quantized with the scale the observer picks (here `1/4096`), the largest table value is `4096` (13 bits), so `rshift = 13 - 16 = -3`. `lut_values >> -3` is an undefined negative right-shift; on the host the shift count is masked and the entire table is zeroed, so the on-device activation returns 0 for every input.

On the model this zeroed the `Sigmoid` in every Squeeze-and-Excitation block, collapsing the channel-attention scale to 0 and dropping PTQ<->FVP SQNR from ~87 dB to 0 dB. The ideal table is a perfectly well-conditioned sigmoid ramp (`0 -> 2048 -> 4096`, 159 distinct levels) -- the shift, not the qparams, was the problem.

Fix: clamp `rshift` to >= 0. When it would be negative the values already fit in int16, so no shift is needed; this restores the documented `rshift == 0` / `rescale_lshift == -7` case. The fix is general -- it covers any int16 `TABLE` op whose output range is small.

This patches both copies of the pass (`xplat/executorch/...` and `fbcode/executorch/...`); the fbcode copy is the one fbcode test targets build, so both must stay in sync.

Differential Revision: D107331163
@rascani rascani requested a review from zingo June 30, 2026 17:50
ryan-monroe pushed a commit to ryan-monroe/executorch that referenced this pull request Jun 30, 2026
…rch#20401)

Summary:

`InsertTableOpsPass.generate_16_bit_table_values` builds the int16 `TOSA.TABLE` lookup for unary ops (sigmoid, tanh, ...). It computes `rshift = ceil(log2(max_table_value)) + 1 - 16` to fit the table into 16 signed bits, then does `lut_values >> rshift`, assuming the table fills ~16 bits (its own comment notes "for int16, rshift == 0").

When the op's output range uses fewer than 16 bits this breaks. A sigmoid output is in `[0, 1]`; quantized with a small scale (e.g. `1/4096`), the largest table value is `4096` (13 bits), so `rshift = 13 - 16 = -3`. `lut_values >> -3` is an undefined negative right-shift; on the host the shift count is masked and the entire table is zeroed, so the activation returns 0 for every input. This makes any int16 `TABLE` op with a small output range (e.g. a sigmoid in a Squeeze-and-Excitation block) degenerate.

Fix: clamp `rshift` to >= 0. When it would be negative the values already fit in int16, so no shift is needed; this restores the documented `rshift == 0` / `rescale_lshift == -7` case. The fix is general -- it covers any int16 `TABLE` op whose output range is small.

Differential Revision: D107331163
@christine-long-meta christine-long-meta marked this pull request as draft June 30, 2026 19:26
@christine-long-meta

Contributor

Converting this PR to a draft because I cannot re-export the commandeered diff to it while the owner is on PTO. I will create a new PR to work around this.

Labels

ciflow/trunk, CLA Signed, meta-exported, module: arm

2 participants