Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SFPIADD doesn't take two cycles. #27

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jasondavies
Copy link

I don't have access to any documentation for this instruction, but from my own testing I don't think two cycles are used. Probably confusion with SFPMAD?

@jasondavies
Copy link
Author

/cc @ttmtrajkovic just in case you haven't seen this.

Also wondering if you are able to share any additional documentation on SFP instructions for Wormhole, I think you mentioned you would check to see what can be shared publicly?

I'm mainly interested in SFPSHFT2, SFPSHFT, SFPLZ, SFPAND, SFPOR, SFPMOV, SFPABS, SFPSWAP. Essentially anything that involves int32. They're mostly pretty well understood but there may well be new information in the documentation.

@nvelickovicTT
Copy link
Contributor

@jasondavies I can't comment about the SFP docs publishing, but let me check with relevant folks internally.
As far as the 2 cycles requirement for some SFP ops goes:

  • It is true that on WH this is required. Not because the OP takes 2 cycles to executed, but because the result is not immediately available. So for WH this additional NOP has to stay to avoid potential race.

  • For BH this should not be required. But before we disable it, it has to pass through a sweep of tests to make sure there are no regressions. This is a small performance optimization, which is not high priority at the moment.

@jasondavies
Copy link
Author

  • It is true that on WH this is required. Not because the OP takes 2 cycles to executed, but because the result is not immediately available. So for WH this additional NOP has to stay to avoid potential race.

In my testing on Wormhole the result always seemed to be immediately available for integer addition, hence my confusion. Maybe I need to run more tests?

@nvelickovicTT
Copy link
Contributor

nvelickovicTT commented Mar 4, 2025

Not sure how you are testing. On WH, if SFPSTORE comes immediately after SFPIADD there are no guarantees you will get the correct result. That is HW spec. I guess depending on the situation and what order of instruction is actually in the binary, SFPSTORE and SFPADD might end up being not consecutive and you don't see an issue.

Bottom line is, to be aligned with the HW spec on WH we have these NOPs, which guarantee correct operation under all circumstances.

On BH they shouldn't be necessary, but before we remove them there, we need to do extensive sweep test to gain confidence that it is working as expected.

@jasondavies
Copy link
Author

Ah, maybe by "immediately available" you mean for SFPSTORE only?

  • Disassembling example uses of integer addition doesn't show any inserted NOPs e.g. SFPIADD with result immediately used by SFPMUL.
  • There's at least one example of hand-coded use of SFPIADD where the result is used immediately afterwards without NOP.

But these cases don't cover SFPSTORE.

@jasondavies
Copy link
Author

jasondavies commented Mar 4, 2025

Actually, even disassembly of a simple test like the following shows the compiler doesn't insert a NOP prior to SFPSTORE for Wormhole:

vInt a = dst_reg[0];
vInt b = dst_reg[dst_offset * 64];
dst_reg[0] = a + b;
dst_reg++;
sfpload  L1,0,12,3
sfpload  L0,128,12,3
sfpiadd  L0,L1,0x000,4
sfpstore 0,L0,12,3
ttincrwc 0,2,0,0

Just to be absolutely clear, I'm talking about SFPIADD not SFPADD :)

@nvelickovicTT
Copy link
Contributor

nvelickovicTT commented Mar 4, 2025

Good point @jasondavies. I need to clarify the HW spec with our HW team and get back to you. Indeed it seems like SFPIADD instruction should take 1-cycle to produce results.
But the LLKs historically seem to have been putting the NOP on it, and not on the SFPADDI which is a 2-cycle operation.

@rdjogoTT , @rtawfik01 do you have an idea why this is?
Looking at ckernel_sfpu_typecast.h I see that SFPADDI (which takes 2 cycles) is not followed by a NOP. And SFPIADD is, but the spec doesn't list it as 2-cycle instruction.

@jasondavies
Copy link
Author

Looking at ckernel_sfpu_typecast.h I see that SFPADDI (which takes 2 cycles) is not followed by a NOP.

In this example I don't think you need a NOP because the next instruction (SFPENCC) does not consume a result from SFPADDI.

@jasondavies
Copy link
Author

jasondavies commented Mar 4, 2025

Just to be clear on my understanding: everything that uses the fp32 floating point multiply/add unit takes 2 cycles (before result can be consumed). This includes SFPADD, SFPMUL, SFPMAD, SFPMULI, SFPADDI.

But, 32-bit signed integer addition (SFPIADD) doesn't use the floating-point multiply/add unit, and only takes a single cycle in all my testing, and this is consistent with the compiler output in terms of being NOP-free.

@nvelickovicTT
Copy link
Contributor

Yes, sorry, I misread the instruction.
What you are saying makes sense. I will double check with HW team about SFPIADD to be sure.

@rdjogoTT
Copy link
Contributor

rdjogoTT commented Mar 4, 2025

Yeah I think @jasondavies is correct here, good catch. And indeed, the typecast example does not need a NOP because the SFPENCC that follows is independent.

@nvelickovicTT
Copy link
Contributor

@jasondavies HW team confirmed that this should be a 1-cycle instruction, so no NOP should be necessary.

To be on the safe side I would like to run our internal test pipelines against these changes to verify.
This repo doesn't yet have these handy and we test through tt-metal. Can you create a branch in tt-metal repo that references this LLK branch. Someone on our side can than trigger the job, as I am not sure if you would have the privileges.

@jasondavies
Copy link
Author

Can you create a branch in tt-metal repo that references this LLK branch.

Do you want me to rebase against mt/nop_throttling which isn't merged to this repo but is being pointed to from tt-metal?

@nvelickovicTT
Copy link
Contributor

Seems I'm not seeing the same thing as you. I see tt-metal pointing to commit c124c67.
Could you rebase your tt-llk branch on this commit, so that the only diff in tt-metal is your commit?

@jasondavies
Copy link
Author

Ah, it looks like c124c67 isn't merged to main for this repo, but it's not the HEAD for mt/nop_throttling either. Anyway, I've rebased my branch so that it includes c124c67. So any sweep tests should now only check c124c67 plus the SFPIADD changes.

@jasondavies
Copy link
Author

Looks like that commit is actually merged but with a different commit ID. I will rebase when tenstorrent/tt-metal#18617 is merged.

@nvelickovicTT
Copy link
Contributor

Sounds good, thanks!

@jasondavies
Copy link
Author

Can I get someone to trigger those internal test pipelines on tenstorrent/tt-metal#18611?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants