[aes] Improve GHASH masking to reduce SCA leakage #18

Merged
merged 5 commits into from
Feb 6, 2025

Conversation

Owner

@vogelpi vogelpi commented Jan 23, 2025

This PR contains several commits and I recommend reviewing them one by one. The most substantial one is the last one. I recommend reviewing this with the masked block diagram at hand.

With these changes in place, the design passes formal masking verification in Alma considering also transient effects.

@vogelpi vogelpi force-pushed the aes-gcm-masking-fixes branch from 21cc5a3 to 74f03b7 Compare January 23, 2025 16:59
Collaborator

@andrea-caforio andrea-caforio left a comment

If I understand the workflow correctly, the tool signals some transient leakages, which are then patched by adding blankers etc. Does a formalism exist that codifies such additional countermeasures on top of the masking methodology, so that they can already be included during the first implementation phase?

end
end
end

GHASH_MASKED_SETTLE: begin
Collaborator

Does the settle state add one more cycle to the computation or is it offset by forwarding the result
of the last addition directly to the output?

Owner Author

Yes, it's a bubble cycle. The design really does nothing in this cycle. It's required to prevent a new register value (written at the active clock edge when entering this state) from being combined with a previous intermediate result potentially still present on some downstream wires.

Comment on lines +831 to +835
// Note: Once the multiplication finishes, Share 0 of the state depends on Share 0 of the
// hash subkey. Thus, we don't forward it to the second multiplier as this may lead to
// undesirable SCA leakage inside the multiplier.
// When doing the first block only, we have to start computing another correction term
// using the second multiplier in the next clock cycle, i.e., S1 * H1.
Collaborator

By deferring the computation of the correction term of the first block to the next clock cycle, does this also
shift all other correction term multiplications for the following blocks?

Owner Author

No, this correction term is just used once throughout an entire message. Instead of computing it and storing it in a separate 128-bit register (expensive!) we compute it and directly use it. All other blocks use the two correction terms we compute once at the beginning and then store into registers. Since these terms are used once per block, it pays off to store them in registers.

Owner Author

Theoretically, we could compute them as well for every block and make the multiplier faster (would probably be more efficient from an area perspective) but it's not nice from an SCA viewpoint, because you compute the same operation on the same inputs many times.
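
To make the correction-term discussion in this thread concrete, here is a minimal Python sketch, not the RTL. It assumes a plain (non-reflected) bit order for GF(2^128), and the names `gf128_mul`, `state`, `mask` and `h` are hypothetical. It only illustrates the general identity that multiplication distributes over XOR, so multiplying a masked value by the hash subkey leaves a term `mask * h` that has to be removed or tracked afterwards. The actual share-wise scheme in the design is more involved.

```python
import secrets

# x^128 + x^7 + x^2 + x + 1 in plain bit order (assumption: GCM itself uses
# a bit-reflected representation, which this sketch ignores).
POLY = (1 << 128) | 0x87


def gf128_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication of two GF(2^128) elements (both < 2^128)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> 128:  # degree reached 128: reduce modulo POLY
            a ^= POLY
    return result


# Hypothetical values: an unmasked state, a random mask, and a hash subkey.
state = secrets.randbits(128)
mask = secrets.randbits(128)
h = secrets.randbits(128)

masked_product = gf128_mul(state ^ mask, h)  # what a masked datapath computes
correction = gf128_mul(mask, h)              # the correction term
unmasked_product = masked_product ^ correction  # equals state * h
```

In this toy model, storing `correction` once and reusing it per block corresponds to the register-based approach described above, while recomputing it every block would repeat the same operation on the same inputs.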

@vogelpi
Owner Author

vogelpi commented Jan 24, 2025

> If I understand the workflow correctly, the tool signals some transient leakages, which are then patched by adding blankers etc. Does a formalism exist that codifies such additional countermeasures on top of the masking methodology, so that they can already be included during the first implementation phase?

Thanks for your review, @andrea-caforio. Your understanding is correct.

To a limited extent, there are rules that can be followed whenever logic is time-multiplexed between masking shares (such as the second multiplier here, or parts of the ALU in a CPU). For example: don't process two shares back to back, and use one-hot muxes. We do have some guidance for OTBN programmers, for example. But I fear that, eventually, it's always pretty much tailored to the implementation at hand. Also, adding this stuff and doing it right takes a lot of time. For a first implementation, one typically needs something quickly that one can then start optimizing.
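
As a rough illustration of the one-hot mux rule, here is a behavioral Python sketch. It is an assumption-laden model, not the primitives used in the RTL, and `blank` and `onehot_mux` are hypothetical names. Each input is gated by its own select bit and the gated values are ORed together, so a deselected input contributes all zeros instead of being steered by a shared encoded select signal.

```python
WIDTH = 128                 # assumed data width for this sketch
ALL_ONES = (1 << WIDTH) - 1


def blank(value: int, enable: int) -> int:
    """Blanker: pass `value` when enabled, otherwise force an all-zero output."""
    return value & (ALL_ONES if enable else 0)


def onehot_mux(inputs, sel_onehot):
    """AND-OR mux: gate every input with its own select bit, then OR the results."""
    if len(inputs) != len(sel_onehot):
        raise ValueError("need exactly one select bit per input")
    if sum(1 for s in sel_onehot if s) != 1:
        raise ValueError("select must be one-hot")
    out = 0
    for value, sel in zip(inputs, sel_onehot):
        out |= blank(value, sel)
    return out


# Usage: select the second of three hypothetical operands.
selected = onehot_mux([0xA, 0xB, 0xC], [0, 1, 0])
```

A software model like this cannot show glitches; it only captures the structure that makes each input's contribution depend on a single select bit.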

This commit adds a wrapper module suitable for formal or simulation-
based masking verification using Alma or PROLEAD, respectively. Also,
it adds the required setup files to kick off the formal masking
verification using Alma.

Signed-off-by: Pirmin Vogel <[email protected]>

It turns out that the B input is scanned instead of the A input. For
SCA hardening, it's better to connect the hash subkey to the
unscanned input, i.e., input A.

Signed-off-by: Pirmin Vogel <[email protected]>

This output is high during the second-to-last clock cycle, i.e.,
in the cycle before the output becomes valid or one cycle before ack_o
asserts.

Signed-off-by: Pirmin Vogel <[email protected]>

Forwarding the unscanned Operand A (typically used for the secret) in
case of SCA hardened designs is not ideal whereas forwarding a
deterministic value is less ideal when focusing on FI hardening. This
commit adds a parameter to choose what to forward before the result is
ready.

Signed-off-by: Pirmin Vogel <[email protected]>

This commit implements a series of improvements for the GHASH masking
scheme to reduce the SCA leakage. With these changes, the implementation
successfully passes formal masking verification using Alma in transient
mode, i.e., when glitches are considered. Prior to this commit, the
implementation would pass masking verification in stable mode only.

The following improvements have been made:

- The result of the final addition of Share 1 of S and the unmasked
  GHASH state is no longer stored into the GHASH state register but
  directly forwarded to the output, and the state input to this addition
  is blanked. The input multiplexer (ghash_in_mux) loses one input.
  (The ghash_state_mux for the unmasked implementation gains one input.)

- The two 3-input multiplexers selecting the operands for the addition
  with the GHASH state (add_in_mux) are replaced by one-hot multiplexers
  with registered control signals.

- The Operand B inputs of both GF multipliers are now blanked.
  The 3-input multiplexer selecting Operand B of the second GF
  multiplier is replaced by a one-hot multiplexer with registered
  control signal. In addition, the last input slice of Operand B for
  this multiplier is registered. This allows switching the
  multiplexer during the last clock cycle of the multiplication to avoid
  some undesirable transient leakage occurring upon saving the result of
  the multiplication into the GHASH state register (and this new value
  propagating through the multiplexer into the multiplier again).

- The GF multipliers are configured to output zero instead of Operand A
  (the hash subkey) while busy.

- The state input for the addition required for the generation of the
  correction term for Share 0 is blanked.

- Between adding the correction terms to the GHASH state for the last
  time and unmasking the GHASH state, a bubble cycle is added to allow
  signals to fully settle, thereby avoiding undesirable transient
  effects that could unmask the uncorrected state shares.

The overall area impact of these changes is low (+0.16 kGE in Yosys +
nangate45).

Signed-off-by: Pirmin Vogel <[email protected]>

@vogelpi vogelpi force-pushed the aes-gcm-masking-fixes branch from 74f03b7 to 9b2ee66 Compare February 4, 2025 20:32
@vogelpi vogelpi merged commit ac93331 into aes-gcm-review Feb 6, 2025
12 of 17 checks passed
3 participants