SFPIADD doesn't take two cycles. #27
base: main
/cc @ttmtrajkovic just in case you haven't seen this. I'm also wondering whether you can share any additional documentation on SFP instructions for Wormhole; I think you mentioned you would check what can be shared publicly. I'm mainly interested in SFPSHFT2, SFPSHFT, SFPLZ, SFPAND, SFPOR, SFPMOV, SFPABS and SFPSWAP, essentially anything that involves int32. They're mostly well understood, but the documentation may well contain new information.
@jasondavies I can't comment about the SFP docs publishing, but let me check with relevant folks internally.
In my testing on Wormhole, the result always seemed to be immediately available for integer addition, hence my confusion. Maybe I need to run more tests?
Not sure how you are testing. On WH, if SFPSTORE comes immediately after SFPIADD, there is no guarantee you will get the correct result; that is the HW spec. Depending on the situation and the actual order of instructions in the binary, SFPSTORE and SFPADD might end up not being consecutive, so you don't see an issue. Bottom line: to be aligned with the HW spec on WH we have these NOPs, which guarantee correct operation under all circumstances. On BH they shouldn't be necessary, but before we remove them there, we need to run an extensive sweep test to gain confidence that everything works as expected.
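The hazard rule in this comment can be modeled with a tiny helper. This is a minimal sketch, not Tenstorrent's compiler logic: `nops_required` is a hypothetical name, and the latencies plugged in below (2 cycles for SFPIADD under the WH spec as stated here, 1 cycle as observed in testing) are assumptions taken from this thread rather than from official documentation.

```cpp
#include <algorithm>

// Hypothetical helper (not part of any Tenstorrent toolchain): how many NOPs
// must separate a producer from a consumer so the consumer never reads a
// stale value.
//   latency      - cycles until the producer's result is readable (assumed)
//   distance     - issue slots from producer to consumer (1 = back-to-back)
//   reads_result - true if the consumer reads the producer's destination LREG
constexpr int nops_required(int latency, int distance, bool reads_result) {
    // An independent follower never needs padding; a dependent one needs
    // enough NOPs to cover whatever latency the distance does not.
    return reads_result ? std::max(0, latency - distance) : 0;
}

// SFPIADD immediately followed by an SFPSTORE of its result:
static_assert(nops_required(2, 1, true) == 1, "2-cycle assumption: one NOP");
static_assert(nops_required(1, 1, true) == 0, "1-cycle result: no NOP");
// An independent follower needs no NOP, whatever the producer's latency:
static_assert(nops_required(2, 1, false) == 0, "no dependency, no hazard");
```

Under this model the disagreement comes down to a single number: whether SFPIADD's latency is 1 or 2 decides whether the back-to-back SFPSTORE needs a NOP.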
Ah, maybe by "immediately available" you mean for SFPSTORE only?
But these cases don't cover SFPSTORE.
Actually, even the disassembly of a simple test like the following shows that the compiler doesn't insert a NOP prior to SFPSTORE for Wormhole:

```cpp
vInt a = dst_reg[0];
vInt b = dst_reg[dst_offset * 64];
dst_reg[0] = a + b;
dst_reg++;
```

```
sfpload L1,0,12,3
sfpload L0,128,12,3
sfpiadd L0,L1,0x000,4
sfpstore 0,L0,12,3
ttincrwc 0,2,0,0
```

Just to be absolutely clear, I'm talking about SFPIADD, not SFPADD :)
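To make the dependency chain in that listing explicit, the sketch below replays it cycle by cycle and counts the stalls (NOPs) a conservative scheduler would need. `Instr`, `count_nops` and `listing` are hypothetical names. The register operands were hand-decoded and are assumptions (reading `sfpiadd L0,L1,...` as L0 = L0 + L1), as is the 1-cycle SFPLOAD latency, chosen because the compiler issues the add right after the load.

```cpp
#include <array>
#include <cstddef>

// Hypothetical instruction record; -1 means "no register".
struct Instr {
    const char* op;
    int dst;      // LREG written, or -1 if none
    int src[2];   // LREGs read, or -1 if unused
    int latency;  // assumed cycles until dst can be read
};

// Stall (insert NOPs) whenever an instruction would read an LREG whose latest
// value is not ready yet. Returns the total number of NOPs inserted.
template <std::size_t N>
constexpr int count_nops(const std::array<Instr, N>& prog) {
    int ready[8] = {};  // cycle at which each LREG's latest value is readable
    int cycle = 0;      // issue cycle of the current instruction
    int nops = 0;
    for (std::size_t i = 0; i < N; ++i) {
        const Instr& ins = prog[i];
        for (int s : ins.src) {
            if (s >= 0 && s < 8 && ready[s] > cycle) {
                nops += ready[s] - cycle;  // pad until the operand is ready
                cycle = ready[s];
            }
        }
        if (ins.dst >= 0 && ins.dst < 8) ready[ins.dst] = cycle + ins.latency;
        ++cycle;
    }
    return nops;
}

// The disassembly above, with SFPIADD's latency left as a parameter.
constexpr std::array<Instr, 5> listing(int iadd_latency) {
    return {{
        {"sfpload", 1, {-1, -1}, 1},           // L1 <- a
        {"sfpload", 0, {-1, -1}, 1},           // L0 <- b
        {"sfpiadd", 0, {0, 1}, iadd_latency},  // L0 <- L0 + L1 (assumed decoding)
        {"sfpstore", -1, {0, -1}, 0},          // dst_reg[0] <- L0
        {"ttincrwc", -1, {-1, -1}, 0},         // dst_reg++
    }};
}

static_assert(count_nops(listing(1)) == 0, "1-cycle SFPIADD: NOP-free listing");
static_assert(count_nops(listing(2)) == 1, "2-cycle SFPIADD: one NOP before sfpstore");
```

If SFPIADD really had a 2-cycle latency, this model would require one NOP between `sfpiadd` and `sfpstore`; the NOP-free compiler output is consistent only with the 1-cycle case.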
Good point @jasondavies. I need to clarify the HW spec with our HW team and get back to you. Indeed, it seems like the SFPIADD instruction should take 1 cycle to produce its result. @rdjogoTT, @rtawfik01, do you have an idea why this is?
In this example I don't think you need a NOP because the next instruction (SFPENCC) does not consume a result from SFPADDI.
Just to be clear on my understanding: everything that uses the fp32 floating-point multiply/add unit takes 2 cycles (before its result can be consumed). This includes SFPADD, SFPMUL, SFPMAD, SFPMULI and SFPADDI. But 32-bit signed integer addition (SFPIADD) doesn't use the floating-point multiply/add unit and takes only a single cycle in all my testing, which is consistent with the compiler output being NOP-free.
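The latency split described in this comment can be written down as a lookup. This sketch only encodes the claim made here (fp32 MAD-unit instructions take 2 cycles, SFPIADD takes 1); the `SfpOp` enum and the function names are hypothetical, and the numbers are not taken from official Wormhole documentation.

```cpp
// Hypothetical subset of SFPU opcodes discussed in this thread.
enum class SfpOp { SFPADD, SFPMUL, SFPMAD, SFPMULI, SFPADDI, SFPIADD };

// True for instructions that go through the fp32 multiply/add unit.
constexpr bool uses_fp32_mad_unit(SfpOp op) {
    switch (op) {
        case SfpOp::SFPADD:
        case SfpOp::SFPMUL:
        case SfpOp::SFPMAD:
        case SfpOp::SFPMULI:
        case SfpOp::SFPADDI:
            return true;
        case SfpOp::SFPIADD:
            return false;  // 32-bit integer add, not routed through the MAD unit
    }
    return false;
}

// Cycles before the result can be consumed, per the claim above (assumed).
constexpr int result_latency(SfpOp op) { return uses_fp32_mad_unit(op) ? 2 : 1; }

// NOPs needed when a dependent instruction immediately follows `op`.
constexpr int nops_before_dependent(SfpOp op) { return result_latency(op) - 1; }

static_assert(nops_before_dependent(SfpOp::SFPMAD) == 1, "MAD unit: one NOP");
static_assert(nops_before_dependent(SfpOp::SFPIADD) == 0, "integer add: none");
```

Keeping the classification in one place means a change such as dropping the NOPs on BH, once sweep tests confirm it, touches a single table rather than every call site.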
Yes, sorry, I misread the instruction.
Yeah, I think @jasondavies is correct here; good catch. And indeed, the typecast example does not need a NOP because the SFPENCC that follows is independent.
@jasondavies The HW team confirmed that this should be a 1-cycle instruction, so no NOP should be necessary. To be on the safe side, I would like to run our internal test pipelines against these changes to verify.
Force-pushed from 386900e to 236a50a.
Do you want me to rebase against mt/nop_throttling, which isn't merged into this repo but is referenced by tt-metal?
It seems I'm not seeing the same thing as you: I see tt-metal pointing to commit c124c67.
Force-pushed from 236a50a to f119fc2.
Looks like that commit is actually merged, but with a different commit ID. I will rebase once tenstorrent/tt-metal#18617 is merged.
Sounds good, thanks!
Can I get someone to trigger those internal test pipelines on tenstorrent/tt-metal#18611?
I don't have access to any documentation for this instruction, but from my own testing I don't think two cycles are used. Probably confusion with SFPMAD?