Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast Path Math.min/max_F/D #7617

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

luke-li-2003
Copy link
Contributor

Re-enable the fast pathing of Math.min/max for floating points with the behaviour of +/-0.0 and NaN values correctly addressed.

@luke-li-2003
Copy link
Contributor Author

Copy link
Contributor

@rmnattas rmnattas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just typo and format changes

generateConditionalBranchInstruction(cg, TR::InstOpCode::bnun, node, nan_label, condReg);
// Move the NaN which is in one of trgReg or src2Reg to trgReg by fadd
generateTrg1Src2Instruction(cg, TR::InstOpCode::fadd, node, trgReg, trgReg, src2Reg);
// Go to the nan_lavel for NaN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small typo nan_lavel

generateTrg1Src2Instruction(cg,
max ? TR::InstOpCode::xsmaxdp : TR::InstOpCode::xsmindp,
node,
trgReg, src1Reg, src2Reg);
Copy link
Contributor

@rmnattas rmnattas Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instruction generation calls are usually in one-line or few if needed.

@luke-li-2003 luke-li-2003 force-pushed the FastPathMathMinMaxFD branch 2 times, most recently from 2571b9d to d6e3454 Compare January 22, 2025 19:49
Copy link
Contributor

@rmnattas rmnattas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@luke-li-2003
Copy link
Contributor Author

luke-li-2003 commented Jan 23, 2025

@hzongaro can you review and merge this?

@luke-li-2003
Copy link
Contributor Author

Performance test for jdk11 SIMDDoubleMaxMinBench

Branch Score
Baseline 14180465.000000
FastPathMathMinMaxFD 15760804.000000

generateConditionalBranchInstruction(cg, TR::InstOpCode::bnun, node, nan_label, condReg);
// Move the NaN which is in one of trgReg or src2Reg to trgReg by fadd
generateTrg1Src2Instruction(cg, TR::InstOpCode::fadd, node, trgReg, trgReg, src2Reg);
// Go to the nan_label for NaN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will run into functional bugs (and difficult to debug) sooner or later: since src1Reg is now used in controlFlow area, you will run into register allocation bugs under high register pressure if it is not added in dependency condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a condition, is that correct?

@luke-li-2003 luke-li-2003 force-pushed the FastPathMathMinMaxFD branch 2 times, most recently from e9aa67f to d6b87f0 Compare January 27, 2025 19:59
Copy link
Contributor

@zl-wang zl-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

dep->addPostCondition(condReg, TR::RealRegister::NoReg);
dep->addPostCondition(trgReg, TR::RealRegister::NoReg);
dep->addPostCondition(src1Reg, TR::RealRegister::NoReg);
dep->addPostCondition(src2Reg, TR::RealRegister::NoReg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guessed it is acceptable to be a little conservative. adding src1Reg into dependency condition is not necessary for integer side, but it is not wrong either.

dep->addPostCondition(condReg, TR::RealRegister::NoReg);
dep->addPostCondition(trgReg, TR::RealRegister::NoReg);
dep->addPostCondition(src1Reg, TR::RealRegister::NoReg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this bug previously: trgReg can be the same register as src1Reg in some cases, such that registerAllocator can get into endless loop when the same register appeared twice or more times in a dep condition.

Re-enable the fast pathing of Math.min/max for floating points
with the behaviour of +/-0.0 and NaN values correctly addressed.

Signed-off-by: Luke Li <[email protected]>
@luke-li-2003
Copy link
Contributor Author

@hzongaro the PR is ready now, can you review and merge this?

@hzongaro hzongaro self-assigned this Feb 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants