[aes] Base RTL implementation of GCM extension #2
Support for AES-GCM will be added in a backward compatible manner. Existing software won't need to change. Signed-off-by: Pirmin Vogel <[email protected]>
Signed-off-by: Pirmin Vogel <[email protected]>
998629f to 3aaa967
@@ -79,6 +85,13 @@ module aes_control
  output logic cipher_data_out_clear_o,
  input logic cipher_data_out_clear_i,

  // GHASH control and sync
  output sp2v_e ghash_in_valid_o,
These sparse two-value signals are only used in the AES block. Why is that?
We use sparse two-value signals for FI hardening purposes. Implementing that hardening is quite cumbersome at the moment. Since the GHASH block is not yet final, I have not done the hardening there yet. But I wanted to keep the interface between the cores as stable as possible.
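To illustrate why a sparse two-value encoding helps against fault injection, here is a minimal Python model. The 3-bit code words and helper names are assumptions chosen for illustration and may not match the actual `sp2v_e` values in the RTL. The point is only that the two valid code words differ in every bit, so a single-bit flip always yields an invalid value that can be flagged instead of silently toggling the signal.

```python
# Hypothetical 3-bit sparse encoding (assumption; the real sp2v_e values may differ).
SP2V_WIDTH = 3
SP2V_HIGH = 0b011
SP2V_LOW = 0b100


def sp2v_decode(raw):
    """Decode a raw sparse signal into (value, error).

    Any code word other than the two valid ones is reported as an error.
    """
    if raw == SP2V_HIGH:
        return True, False
    if raw == SP2V_LOW:
        return False, False
    return False, True


def single_bit_faults(raw):
    """Return all values reachable from `raw` by flipping exactly one bit."""
    return [raw ^ (1 << i) for i in range(SP2V_WIDTH)]


# Every single-bit fault on a valid code word is detected as an error.
faults = single_bit_faults(SP2V_HIGH) + single_bit_faults(SP2V_LOW)
all_detected = all(sp2v_decode(f)[1] for f in faults)
```

With a plain one-bit `logic` signal, the same single-bit fault would flip the value without any way to notice it.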
aes_ctrl_ns = !cipher_dec_key_gen_i ? CTRL_PRNG_UPDATE : CTRL_FINISH;
end

CTRL_PRNG_UPDATE: begin
Why is this state being superseded by CTRL_GHASH_READY?
I wanted to properly decouple the valid from the ready in the various handshake pairs. After investigating the FSM I realized I could do something similar to how we handled the clearing PRNG. Then I realized that the clearing PRNG actually doesn't need a handshake for the updating function (it's always ready, see separate commit). That's why I changed the integration of the PRNG and then re-purposed this FSM state to integrate the GHASH block.
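The decoupling of valid and ready can be pictured with a small cycle-based Python model. This is only a sketch: the function name and the per-cycle lists are hypothetical and not taken from the RTL. A transfer happens in exactly those cycles where the producer asserts valid and the consumer asserts ready, and neither side waits for the other before raising its own signal.

```python
def handshake_transfers(valid, ready):
    """Return the cycle indices in which a valid/ready transfer occurs."""
    return [cycle for cycle, (v, r) in enumerate(zip(valid, ready)) if v and r]


# Producer raises valid in cycle 1 and holds it until the data is accepted.
valid = [False, True, True, False]

# A consumer that only becomes ready in cycle 2 delays the transfer.
ready_late = [False, False, True, True]
transfers_late = handshake_transfers(valid, ready_late)

# A consumer that is always ready accepts data whenever valid is high,
# so the handshake degenerates into a single "update" signal.
ready_always = [True, True, True, True]
transfers_always = handshake_transfers(valid, ready_always)
```

The always-ready case corresponds to the clearing PRNG mentioned above: since it never stalls, no real handshake is needed for it.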
hw/ip/aes/rtl/aes_core.sv (outdated)
end

// Avoid aggressive synthesis optimizations.
logic [3:0][3:0][7:0] ghash_state_done_buf [NumShares];
In the worst case, what could be optimized away if the signal is not being buffered?
The mux could be optimized away, because most tools realize that when the output is not marked as valid, we don't store it into the output registers anyway. Without the mux, the two shares of the GHASH state will constantly get combined at the output.
But for the real hardened design, I think we won't need this. We can re-use existing logic inside GHASH to add the shares at the very end of the computation. This will take one more clock cycle but it will be more efficient area-wise.
I've now removed this. It saves around 300 GEs :-)
Thanks for your review @andrea-caforio!
This commit adds a first implementation of the GHASH module required for AES-GCM support. This first version uses a single, pipelined GF(2^128) multiplier and 3 128-bit registers for the GHASH state, the hash subkey and the encrypted initial counter block J_0 (= S). The latency of the GF multiplier is matched to the latency of the cipher core. Signed-off-by: Pirmin Vogel <[email protected]>
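For readers unfamiliar with GHASH, the following is a minimal bit-serial Python reference model of the GF(2^128) multiplication and the GHASH recurrence as defined in NIST SP 800-38D. It is a sketch for checking values in software, not a description of the pipelined multiplier in this commit. The function names are hypothetical, and blocks are assumed to be 128-bit big-endian integers (the leftmost bit of the block is the most significant bit of the integer).

```python
# Reduction constant R = 11100001 || 0^120 from NIST SP 800-38D.
R = 0xE1 << 120


def gf128_mul(x, y):
    """Multiply two GF(2^128) elements using the GCM bit ordering."""
    z = 0
    v = y
    for i in range(128):
        # Bit i of x in GCM notation is the (127 - i)-th integer bit.
        if (x >> (127 - i)) & 1:
            z ^= v
        # Multiply v by the field element "x" and reduce if a bit falls off.
        if v & 1:
            v = (v >> 1) ^ R
        else:
            v >>= 1
    return z


def ghash(h, blocks):
    """Compute GHASH over 128-bit blocks with hash subkey h."""
    y = 0
    for block in blocks:
        y = gf128_mul(y ^ block, h)
    return y


def gcm_tag(h, s, blocks):
    """Combine the GHASH result with S, the encrypted initial counter block J_0."""
    return ghash(h, blocks) ^ s
```

The three values `y`, `h` and `s` in this model play the role of the three 128-bit registers mentioned in the commit message: the GHASH state, the hash subkey, and S.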
For historical reasons, the clearing PRNG had a req/ack interface for both the pseudo-random data and the reseed interface. The former is actually not required as the PRNG can always provide suitable pseudo-random data, even in case of an outstanding reseed request. This commit thus simplifies the req/ack interface to a single update signal, similar to what is used for the masking PRNG. In addition to simplifying the code, this enables re-using the main control FSM state previously used for performing the handshake with the clearing PRNG, to perform an actually required handshake with another functional block such as the GHASH block used for AES-GCM. Signed-off-by: Pirmin Vogel <[email protected]>
Signed-off-by: Pirmin Vogel <[email protected]>
In contrast to the regular CTR mode where the counter performs inc128(), the counter only performs inc32() in GCM, i.e., the counter wraps at 32 bits. Signed-off-by: Pirmin Vogel <[email protected]>
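The difference between the two increment functions can be shown with a short Python model. The function and constant names are hypothetical; counter blocks are treated as 128-bit big-endian integers. In `inc32()`, only the rightmost 32 bits are incremented modulo 2^32 and the upper 96 bits are left untouched, whereas `inc128()` lets the carry propagate through the whole block.

```python
MASK128 = (1 << 128) - 1
MASK32 = (1 << 32) - 1


def inc128(block):
    """Increment the full 128-bit counter block (regular CTR mode)."""
    return (block + 1) & MASK128


def inc32(block):
    """Increment only the low 32 bits; the upper 96 bits stay unchanged (GCM)."""
    upper = block & ~MASK32 & MASK128
    lower = (block + 1) & MASK32
    return upper | lower


# A counter block whose low 32 bits are about to wrap.
iv96 = 0xCAFEBABEFACEDBADDECAF888
block = (iv96 << 32) | 0xFFFFFFFF

wrapped_gcm = inc32(block)    # low 32 bits wrap to zero, IV part unchanged
wrapped_ctr = inc128(block)   # carry propagates into the upper 96 bits
```

The two functions agree on every block except those whose low 32 bits are all ones.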
This commit optimizes the clearing logic of the GHASH block to clear internal registers by loading the cipher core output after the cipher output has cleared its internal state. This allows reducing the internal muxing logic by 2x 128-bit wide 2-to-1 muxes and the big 128-bit wide 5-to-1 state mux loses one input, too. When using the open source synthesis flow, this helps reduce the area by roughly 1 kGE. Signed-off-by: Pirmin Vogel <[email protected]>
3aaa967 to d249304
CHANGE AUTHORIZED: hw/ip/aes/data/aes.hjson
This PR contains the base RTL implementation of the AES-GCM extension. The GHASH implementation in this PR is unhardened against SCA. The hardened implementation will be part of a follow-up PR.
This implementation has been tested using a basic Verilator testbench and some NIST vectors.