Fix int16 TOSA.TABLE LUT zeroed when output range uses <16 bits (#20401) by ryan-monroe · Pull Request #20401 · pytorch/executorch

ryan-monroe · 2026-06-18T21:59:26Z

Summary:

InsertTableOpsPass.generate_16_bit_table_values builds the int16 TOSA.TABLE lookup for unary ops (sigmoid, tanh, ...). It computes rshift = ceil(log2(max_table_value)) + 1 - 16 to fit the table into 16 signed bits, then does lut_values >> rshift, assuming the table fills ~16 bits (its own comment notes "for int16, rshift == 0").

When the op's output range uses fewer than 16 bits this breaks. A sigmoid output is in [0, 1]; quantized with a small scale (e.g. 1/4096), the largest table value is 4096 (13 bits), so rshift = 13 - 16 = -3. lut_values >> -3 is an undefined negative right-shift; on the host the shift count is masked and the entire table is zeroed, so the activation returns 0 for every input. This makes any int16 TABLE op with a small output range (e.g. a sigmoid in a Squeeze-and-Excitation block) degenerate.
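The arithmetic above can be checked with a short sketch. It reproduces only the formula quoted in the summary; the helper name `raw_rshift` is hypothetical and is not taken from the pass itself. It deliberately does not perform the negative shift, since that result is platform-dependent.

```python
import math


def raw_rshift(max_table_value: int, bits: int = 16) -> int:
    """Unclamped shift, per the formula quoted in the PR summary.

    `raw_rshift` is a hypothetical name used for illustration only.
    """
    return math.ceil(math.log2(max_table_value)) + 1 - bits


# Sigmoid output in [0, 1] quantized with scale 1/4096:
# the largest table value is 1.0 / (1/4096) = 4096.
small_range_max = round(1.0 / (1 / 4096))

# A table that fills the int16 range gives the documented rshift == 0 case.
full_range_max = 32767

print(small_range_max, raw_rshift(small_range_max))
print(full_range_max, raw_rshift(full_range_max))
```

The first call yields a negative shift count, which is the case the pass did not anticipate.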

Fix: clamp rshift to >= 0. When it would be negative the values already fit in int16, so no shift is needed; this restores the documented rshift == 0 / rescale_lshift == -7 case. The fix is general -- it covers any int16 TABLE op whose output range is small.
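A minimal sketch of the clamp described above, assuming the table is a plain list of Python integers. The names `clamped_rshift` and `shift_table` are hypothetical and this is not the actual patch; the handling of an all-zero table is also an assumption made here to keep the sketch runnable.

```python
import math


def clamped_rshift(max_table_value: int, bits: int = 16) -> int:
    """Shift needed to fit the table into `bits` signed bits, never negative."""
    if max_table_value <= 0:
        # Assumption for this sketch: an all-zero table needs no shift.
        return 0
    raw = math.ceil(math.log2(max_table_value)) + 1 - bits
    # The fix: when raw is negative the values already fit, so do not shift.
    return max(raw, 0)


def shift_table(lut_values):
    """Apply the clamped right shift to every table entry."""
    rshift = clamped_rshift(max(abs(v) for v in lut_values))
    return [v >> rshift for v in lut_values], rshift


# Small output range (sigmoid with scale 1/4096): the table is left unchanged.
small_table, small_shift = shift_table([0, 2048, 4096])

# Wider range: the values are shifted down to fit in 16 signed bits.
wide_table, wide_shift = shift_table([0, 50000, 100000])

print(small_table, small_shift)
print(wide_table, wide_shift)
```

With the clamp in place the small-range table passes through untouched, while tables that genuinely exceed 16 bits are still shifted as before.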

Differential Revision: D107331163

pytorch-bot · 2026-06-18T21:59:31Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20401

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 Pending, 3 Unrelated Failures, 2 Unclassified Failures

As of commit 74ebdfc with merge base 6f8a889:

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

trunk / test-models-macos-coreml (edsr) / macos-job (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
trunk / test-models-macos-cpu (vit, xnnpack-quantization-delegation) / macos-job (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

pull / test-llama-runner-linux (fp32, xnnpack+quantize_kv, linux.arm64.2xlarge, executorch-ubuntu-22.04-... / linux-job (gh) (matched linux rule in flaky-rules.json)
The process '/usr/bin/git' failed with exit code 128
Test Vulkan Backend / test-vulkan / test-backend-linux (vulkan, models) / linux-job (gh) (matched linux rule in flaky-rules.json)
The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
trunk / test-arm-backend-ethos-u (test_pytest_models_ethos_u85) / linux-job (gh) (matched linux rule in flaky-rules.json)
The process '/usr/bin/git' failed with exit code 128

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2026-06-18T21:59:33Z

The committers listed above are authorized under a signed CLA.

✅ login: christine-long-meta / name: Christine Long (7a6db1f)

meta-codesync · 2026-06-18T21:59:34Z

@ryan-monroe has exported this pull request. If you are a Meta employee, you can view the originating Diff in D107331163.

github-actions · 2026-06-18T22:00:24Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

…rch#20401)

Summary:

`InsertTableOpsPass.generate_16_bit_table_values` builds the int16 `TOSA.TABLE` lookup for unary ops (sigmoid, tanh, ...). It computes `rshift = ceil(log2(max_table_value)) + 1 - 16` to fit the table into 16 signed bits, then does `lut_values >> rshift`, assuming the table fills ~16 bits (its own comment notes "for int16, rshift == 0").

When the op's output range uses fewer than 16 bits this breaks. A sigmoid output is in `[0, 1]`; quantized with the scale the observer picks (here `1/4096`), the largest table value is `4096` (13 bits), so `rshift = 13 - 16 = -3`. `lut_values >> -3` is an undefined negative right-shift; on the host the shift count is masked and the entire table is zeroed, so the on-device activation returns 0 for every input.

On the Auth (ECAPA-TDNN) U85 model this zeroed the `Sigmoid` in every Squeeze-and-Excitation block, collapsing the SE channel-attention scale to 0 and dropping PTQ<->FVP SQNR from ~87 dB to 0 dB. The ideal table is a perfectly well-conditioned sigmoid ramp (`0 -> 2048 -> 4096`, 159 distinct levels) -- the shift, not the qparams, was the problem.

Fix: clamp `rshift` to >= 0. When it would be negative the values already fit in int16, so no shift is needed; this restores the documented `rshift == 0` / `rescale_lshift == -7` case. The fix is general -- it covers any int16 `TABLE` op whose output range is small.

This patches both copies of the pass (`xplat/executorch/...` and `fbcode/executorch/...`); the fbcode copy is the one fbcode test targets build, so both must stay in sync.

A full writeup (localization, the LUT before/after, FVP verification, and two secondary findings -- a runner-memory HardFault and the graph-sim e2e path) is in this Google Doc: https://docs.google.com/document/d/1WGAR01mdnwOLKcQBtC4qzYLPWB1Lh4fp_8fHF4tXORE/edit

Differential Revision: D107331163

christine-long-meta · 2026-06-30T19:27:09Z

Converting this PR to a draft, as I cannot re-export the commandeered diff to it because the owner is on PTO. I will create a new PR to get around this problem.

ryan-monroe requested a review from digantdesai as a code owner June 18, 2026 21:59

meta-cla Bot added the CLA Signed label Jun 18, 2026

github-actions Bot added the ciflow/trunk and module: arm labels Jun 18, 2026

meta-codesync Bot added the meta-exported label Jun 18, 2026

meta-codesync Bot temporarily deployed to cadence June 18, 2026 21:59 Inactive

meta-codesync Bot changed the title ~~Fix int16 TOSA.TABLE LUT zeroed when output range uses <16 bits~~ Fix int16 TOSA.TABLE LUT zeroed when output range uses <16 bits (#20401) Jun 18, 2026

ryan-monroe force-pushed the export-D107331163 branch from 894a5ec to 813a584 Compare June 18, 2026 22:08

ryan-monroe force-pushed the export-D107331163 branch 2 times, most recently from 2cda5ae to b6383b4 Compare June 22, 2026 16:34

ryan-monroe force-pushed the export-D107331163 branch from b6383b4 to 7a6db1f Compare June 30, 2026 17:47

rascani requested a review from zingo June 30, 2026 17:50

ryan-monroe force-pushed the export-D107331163 branch from 398e060 to 4d52808 Compare June 30, 2026 18:52

ryan-monroe force-pushed the export-D107331163 branch from 4d52808 to 74ebdfc Compare June 30, 2026 19:00

christine-long-meta marked this pull request as draft June 30, 2026 19:26

ryan-monroe wants to merge 1 commit into pytorch:main from ryan-monroe:export-D107331163
