-
Notifications
You must be signed in to change notification settings - Fork 398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fast Path Math.min/max_F/D #7617
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just typo and format changes
generateConditionalBranchInstruction(cg, TR::InstOpCode::bnun, node, nan_label, condReg); | ||
// Move the NaN which is in one of trgReg or src2Reg to trgReg by fadd | ||
generateTrg1Src2Instruction(cg, TR::InstOpCode::fadd, node, trgReg, trgReg, src2Reg); | ||
// Go to the nan_lavel for NaN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small typo nan_lavel
generateTrg1Src2Instruction(cg, | ||
max ? TR::InstOpCode::xsmaxdp : TR::InstOpCode::xsmindp, | ||
node, | ||
trgReg, src1Reg, src2Reg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instruction generation calls are usually in one-line or few if needed.
2571b9d
to
d6e3454
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@hzongaro can you review and merge this? |
Performance test for jdk11 SIMDDoubleMaxMinBench
|
generateConditionalBranchInstruction(cg, TR::InstOpCode::bnun, node, nan_label, condReg); | ||
// Move the NaN which is in one of trgReg or src2Reg to trgReg by fadd | ||
generateTrg1Src2Instruction(cg, TR::InstOpCode::fadd, node, trgReg, trgReg, src2Reg); | ||
// Go to the nan_label for NaN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will run into functional bugs (and difficult to debug) sooner or later: since src1Reg is now used in controlFlow area, you will run into register allocation bugs under high register pressure if it is not added in dependency condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a condition, is that correct?
e9aa67f
to
d6b87f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
dep->addPostCondition(condReg, TR::RealRegister::NoReg); | ||
dep->addPostCondition(trgReg, TR::RealRegister::NoReg); | ||
dep->addPostCondition(src1Reg, TR::RealRegister::NoReg); | ||
dep->addPostCondition(src2Reg, TR::RealRegister::NoReg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guessed it is acceptable to be a little conservative. adding src1Reg into dependency condition is not necessary for integer side, but it is not wrong either.
dep->addPostCondition(condReg, TR::RealRegister::NoReg); | ||
dep->addPostCondition(trgReg, TR::RealRegister::NoReg); | ||
dep->addPostCondition(src1Reg, TR::RealRegister::NoReg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed this bug previously: trgReg can be the same register as src1Reg in some cases, such that registerAllocator can get into endless loop when the same register appeared twice or more times in a dep condition.
Re-enable the fast pathing of Math.min/max for floating points with the behaviour of +/-0.0 and NaN values correctly addressed. Signed-off-by: Luke Li <[email protected]>
d6b87f0
to
3bae6ee
Compare
@hzongaro the PR is ready now, can you review and merge this? |
Re-enable the fast pathing of Math.min/max for floating points with the behaviour of +/-0.0 and NaN values correctly addressed.