[aes] Improve GHASH masking to reduce SCA leakage #18

vogelpi · 2025-01-23T13:52:52Z

This PR contains several commits and I recommend reviewing them one by one. The most substantial one is the last one. I recommend reviewing this with the masked block diagram at hand.

With these changes in place, the design passes formal masking verification in Alma, also when transient effects are considered.

hw/ip/aes/rtl/aes_ghash.sv

andrea-caforio

If I understand the workflow correctly, the tool flags some transient leakages, which are then patched by adding blankers etc. Is there a formalism that codifies such additional countermeasures on top of the masking methodology, so that they can already be included during the first implementation phase?

andrea-caforio · 2025-01-24T06:15:50Z

hw/ip/aes/rtl/aes_ghash.sv

          end
        end
      end

+      GHASH_MASKED_SETTLE: begin


Does the settle state add one more cycle to the computation, or is it offset by forwarding the result of the last addition directly to the output?

vogelpi

Yes, it's a bubble cycle. The design really does nothing in this state. It is required to prevent a new register value (written at the active clock edge when entering this state) from being combined with a previous intermediate result that may still be present on some downstream wires.

andrea-caforio · 2025-01-24T06:25:50Z

hw/ip/aes/rtl/aes_ghash.sv

+          // Note: Once the multiplication finishes, Share 0 of the state depends on Share 0 of the
+          // hash subkey. Thus, we don't forward it to the second multiplier as this may lead to
+          // undesirable SCA leakage inside the multiplier.
+          // When doing the first block only, we have to start computing another correction term
+          // using the second multiplier in the next clock cycle, i.e., S1 * H1.


Does deferring the computation of the correction term of the first block to the next clock cycle also shift all other correction term multiplications for the following blocks?

vogelpi

No, this correction term is used just once throughout an entire message. Instead of computing it and storing it in a separate 128-bit register (expensive!), we compute it and use it directly. All other blocks use the two correction terms that we compute once at the beginning and then store in registers. Since these terms are used once per block, it pays off to store them in registers.

Theoretically, we could also compute them for every block and make the multiplier faster (this would probably be more efficient from an area perspective), but that is not nice from an SCA viewpoint because the same operation would be computed on the same inputs many times.
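To illustrate why correction terms show up at all: multiplication in GF(2^128) distributes over XOR, so if the state is split as S = S0 ^ S1 and the hash subkey as H = H0 ^ H1, the product S * H expands into four share-wise products. The Python sketch below checks that identity. It is a generic illustration under stated assumptions, not the schedule implemented in aes_ghash.sv: `gf128_mul` and `masked_product_terms` are hypothetical helper names, and the field uses a plain (non-reflected) bit order rather than GCM's convention.

```python
import secrets

MASK128 = (1 << 128) - 1
REDUCTION = 0x87  # low terms of x^128 + x^7 + x^2 + x + 1


def gf128_mul(a: int, b: int) -> int:
    """Multiply two GF(2^128) elements (shift-and-add, LSB-first over b)."""
    result = 0
    for _ in range(128):
        if b & 1:
            result ^= a
        b >>= 1
        # Multiply a by x and reduce if the x^128 term appears.
        carry = a >> 127
        a = (a << 1) & MASK128
        if carry:
            a ^= REDUCTION
    return result


def masked_product_terms(s0: int, s1: int, h0: int, h1: int) -> list:
    """Return the four share-wise products whose XOR equals (s0^s1) * (h0^h1)."""
    return [gf128_mul(s, h) for s in (s0, s1) for h in (h0, h1)]


if __name__ == "__main__":
    # Hypothetical random shares, for illustration only.
    s0, s1, h0, h1 = (secrets.randbits(128) for _ in range(4))
    recombined = 0
    for term in masked_product_terms(s0, s1, h0, h1):
        recombined ^= term
    assert recombined == gf128_mul(s0 ^ s1, h0 ^ h1)
    print("share-wise products recombine to S * H")
```

The term S1 * H1 named in the diff comment above has the shape of one of these four products. How the hardware schedules, stores, and adds the terms is not modeled here.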

vogelpi · 2025-01-24T10:23:52Z

If I understand the workflow correctly, the tool flags some transient leakages, which are then patched by adding blankers etc. Is there a formalism that codifies such additional countermeasures on top of the masking methodology, so that they can already be included during the first implementation phase?

Thanks for your review, @andrea-caforio. Your understanding is correct.

To a limited extent, there are rules that can be followed whenever logic is time-multiplexed between masking shares (such as the second multiplier here, or parts of the ALU in a CPU). For example: don't process two shares back to back, and use one-hot muxes. We do have some guidance for OTBN programmers, for example. But I fear that eventually it is always pretty much tailored to the implementation at hand. Also, adding this stuff and doing it right takes a lot of time. For a first implementation, one typically needs something quickly that one can then start optimizing.
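As a rough behavioral picture of the "use one-hot muxes" rule, the Python sketch below models an AND-OR multiplexer in which every data input is gated by exactly one select bit before the results are ORed together. The function name `onehot_mux` and its parameters are hypothetical and chosen for illustration; the sketch says nothing about gate-level timing, which is what the formal tools actually analyze.

```python
from typing import Sequence


def onehot_mux(inputs: Sequence[int], sel: Sequence[bool], width: int = 128) -> int:
    """Select one of `inputs` using a one-hot select vector (AND-OR structure)."""
    if len(inputs) != len(sel):
        raise ValueError("need exactly one select bit per input")
    if sum(bool(s) for s in sel) != 1:
        raise ValueError("select vector must be one-hot")
    all_ones = (1 << width) - 1
    out = 0
    for data, enable in zip(inputs, sel):
        # Each input is masked by its own select bit only.
        gate = all_ones if enable else 0
        out |= data & gate
    return out


if __name__ == "__main__":
    # Hypothetical operand values.
    print(hex(onehot_mux([0xAA, 0xBB, 0xCC], [False, True, False])))
```

In hardware, the select bits of such a mux would come straight from registers, as the commit message in this PR describes for add_in_mux, so that no decoding logic sits between the control register and the gating.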

This commit adds a wrapper module suitable for formal or simulation-based masking verification using Alma or PROLEAD, respectively. Also, it adds the required setup files to kick off the formal masking verification using Alma. Signed-off-by: Pirmin Vogel <[email protected]>

It turns out that the B input is scanned instead of the A input. For SCA hardening, it's better to connect the hash subkey to the unscanned input, i.e., input A. Signed-off-by: Pirmin Vogel <[email protected]>

This output is high during the second to the last clock cycle, i.e., in the cycle before the output becomes valid or one cycle before ack_o asserts. Signed-off-by: Pirmin Vogel <[email protected]>

Forwarding the unscanned Operand A (typically used for the secret) in case of SCA hardened designs is not ideal whereas forwarding a deterministic value is less ideal when focusing on FI hardening. This commit adds a parameter to choose what to forward before the result is ready. Signed-off-by: Pirmin Vogel <[email protected]>
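The multiplier-related commit messages above (scanned Operand B, `ack_pre_o`, and the choice of what to forward while busy) can be pictured with a small cycle-level model. The Python sketch below is only that, a sketch: `gf128_mul_serial`, `bits_per_cycle`, and `forward_operand_a` are hypothetical names, the slice width and LSB-first slice order are assumptions, and the field uses a plain (non-reflected) bit order. It does not describe the actual prim_gf_mult implementation.

```python
MASK128 = (1 << 128) - 1
REDUCTION = 0x87  # low terms of x^128 + x^7 + x^2 + x + 1


def gf128_mul_serial(a, b, bits_per_cycle=32, forward_operand_a=False):
    """Yield (ack_pre, ack, out) per clock cycle of a slice-serial multiplier.

    Operand B is scanned in slices of `bits_per_cycle` bits; Operand A is not.
    While busy, `out` is either zero or Operand A, depending on the flag.
    """
    if 128 % bits_per_cycle != 0 or 128 // bits_per_cycle < 2:
        raise ValueError("need at least two whole cycles")
    cycles = 128 // bits_per_cycle
    operand_a = a
    acc = 0
    for cycle in range(cycles):
        for _ in range(bits_per_cycle):  # consume one slice of Operand B
            if b & 1:
                acc ^= a
            b >>= 1
            carry = a >> 127
            a = (a << 1) & MASK128
            if carry:
                a ^= REDUCTION
        ack = cycle == cycles - 1      # result valid in the last cycle
        ack_pre = cycle == cycles - 2  # high one cycle before ack
        busy_value = operand_a if forward_operand_a else 0
        yield ack_pre, ack, (acc if ack else busy_value)


if __name__ == "__main__":
    for ack_pre, ack, out in gf128_mul_serial(2, 1 << 127):
        print(ack_pre, ack, hex(out))
```

With `forward_operand_a=False`, the output stays at zero until the result is valid, which matches the SCA-oriented configuration described in this PR (output zero instead of the hash subkey while busy).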

This commit implements a series of improvements for the GHASH masking scheme to reduce the SCA leakage. With these changes, the implementation successfully passes formal masking verification using Alma in transient mode, i.e., when glitches are considered. Prior to this commit, the implementation would pass masking verification in stable mode only. The following improvements have been made:
- The result of the final addition of Share 1 of S and the unmasked GHASH state is no longer stored into the GHASH state register but directly forwarded to the output, and the state input to this addition is blanked. The input multiplexer (ghash_in_mux) loses one input. (The ghash_state_mux for the unmasked implementation gains one input.)
- The two 3-input multiplexers selecting the operands for the addition with the GHASH state (add_in_mux) are replaced by one-hot multiplexers with registered control signals.
- The Operand B inputs of both GF multipliers are now blanked. The 3-input multiplexer selecting Operand B of the second GF multiplier is replaced by a one-hot multiplexer with registered control signal. In addition, the last input slice of Operand B for this multiplier is registered. This allows switching the multiplexer during the last clock cycle of the multiplication to avoid some undesirable transient leakage occurring upon saving the result of the multiplication into the GHASH state register (and this new value propagating through the multiplexer into the multiplier again).
- The GF multipliers are configured to output zero instead of Operand A (the hash subkey) while busy.
- The state input for the addition required for the generation of the correction term for Share 0 is blanked.
- Between adding the correction terms to the GHASH state for the last time and unmasking the GHASH state, a bubble cycle is added to allow signals to fully settle, thereby avoiding undesirable transient effects that would unmask the uncorrected state shares.
The overall area impact of these changes is low (+0.16 kGE in Yosys + nangate45). Signed-off-by: Pirmin Vogel <[email protected]>
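Several of the items above rely on blanking an input. As a minimal behavioral sketch, a blanker can be modeled as AND-gating a value with an enable signal, so that the downstream logic sees zero unless the input is actually needed. The names `blank` and `add_with_blanked_state` are hypothetical and used only for illustration; the sketch does not capture the timing properties that make blanking effective in hardware.

```python
def blank(value: int, enable: bool, width: int = 128) -> int:
    """Pass `value` through when `enable` is set, otherwise drive zero."""
    mask = (1 << width) - 1
    return (value & mask) if enable else 0


def add_with_blanked_state(state: int, operand: int, state_enable: bool) -> int:
    """GF(2) addition (XOR) where the state input passes through a blanker."""
    return blank(state, state_enable) ^ operand


if __name__ == "__main__":
    # Hypothetical values: with the blanker disabled, only the operand shows up.
    print(hex(add_with_blanked_state(0xF0, 0x0F, state_enable=True)))
    print(hex(add_with_blanked_state(0xF0, 0x0F, state_enable=False)))
```

With `state_enable=False`, the adder output depends on the operand alone, which is the effect the blanked state inputs in the list above aim for.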

vogelpi requested review from nasahlpa and andrea-caforio January 23, 2025 13:52

nasahlpa reviewed Jan 23, 2025

vogelpi force-pushed the aes-gcm-masking-fixes branch from 21cc5a3 to 74f03b7 January 23, 2025 16:59

andrea-caforio approved these changes Jan 24, 2025

vogelpi added 5 commits February 4, 2025 21:31

[aes/rtl] Switch inputs of GF multipliers inside the GHASH block

ac9b614

It turns out that the B input is scanned instead of the A input. For SCA hardening, it's better to connect the hash subkey to the unscanned input, i.e., input A. Signed-off-by: Pirmin Vogel <[email protected]>

[prim_gf_mult] Add ack_pre_o output

ec59467

This output is high during the second to the last clock cycle, i.e., in the cycle before the output becomes valid or one cycle before ack_o asserts. Signed-off-by: Pirmin Vogel <[email protected]>

vogelpi force-pushed the aes-gcm-masking-fixes branch from 74f03b7 to 9b2ee66 February 4, 2025 20:32

vogelpi merged commit ac93331 into aes-gcm-review Feb 6, 2025
12 of 17 checks passed
