-
Notifications
You must be signed in to change notification settings - Fork 11
Use native ulong shifts when custom shift is not needed #50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use native ulong shifts when custom shift is not needed #50
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors shift operations in the UInt256 implementation so that native ulong shifts are used when the custom shift behavior is not required, improving performance and simplifying the code.
- Introduces aggressively inlined helpers NativeLsh and NativeRsh.
- Replaces many calls to the custom Lsh/Rsh with native shift operations in Udivrem, Div64, Lsh, and Rsh.
- Adds an explicit DivideByZeroException branch in Udivrem to handle invalid divisor cases.
|
Comparison with the package benchmarks: Current `Lsh` Benchmarks
Previous `Lsh` Benchmarks
|
|
I'm running benchmarks for the shift ATM. Will provide comparison and then proceed with the review. |
src/Nethermind.Int256/UInt256.cs
Outdated
| } | ||
| else | ||
| { | ||
| throw new DivideByZeroException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this behaviour change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like in Evm we check for 0 before calling divide, so should be ok (behaviour)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a separate method, non inlinable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in f08fba3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the existing method for throwing the exception - just followed the same as ThrowOverflowException and am now passing the message as argument
|
On my machine there's some boost for Master: 45.702 ns ± 0.3279 master
shift optimized
|
Scooletz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add asserts, don't throw in the body. I added benchmarks in a comment and for AddMod_UInt256 it ~3% faster
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
| internal static ulong NativeLsh(ulong a, int n) | ||
| { | ||
| return a << n; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we Debug.Assert the requirement when this method can be called (n > 64).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Done in 2def59d
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
| internal static ulong NativeRsh(ulong a, int n) | ||
| { | ||
| return a >> n; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we Debug.Assert the requirement when this method can be called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Done in 2def59d
src/Nethermind.Int256/UInt256.cs
Outdated
| } | ||
| else | ||
| { | ||
| throw new DivideByZeroException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a separate method, non inlinable?
src/Nethermind.Int256/UInt256.cs
Outdated
|
|
||
| [DoesNotReturn] | ||
| private static void ThrowDivideByZeroException() => throw new DivideByZeroException("y == 0"); | ||
| private static void ThrowDivideByZeroException(string message) => throw new DivideByZeroException(message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd scrap the message; will add unneeded string load code; also is obvious what is zero
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, just didn't want to change the existing approach. Done in 9740891
In some cases, we are using our
ulongLshandRshimplementations instead of the native ones. This would be necessary in the case thatn == 64, since C# masks the shift amount by 64 and as sucha << 64 == a, but we wanta << 64 == 0.However, it seems like in some of these cases in
UInt256we can, in fact, consider some restrictions on the values ofnand replace the custom with the native shift operations, avoiding some extra steps.Udivrem,shiftisLeadingZeroes(FirstNonZero(d.u[0], d.u[1], d.u[2], d.u[3])). So, we know this could only be 64 if all of them are zeroes. But in this casedwould be zero which means a division by zero.shiftcan be zero though. This means shifting byshiftcan be native while by64 - shiftwould still require the custom shift.Div64it's a similar situation.shift < 64sincedcan only have 64 leading zeroes if it is zero, butshiftcan still be zero. This also means shifting byshiftcan be native while by64 - shiftwould still require the custom shift.Lsh(UInt256,...)andRsh(UInt256,...), we know that, in the bit shift section at the end,ncannot be 64 as we have already done all the word shifts we could do before. We also know it can't be zero as we are returning early in the cases it's only 0-4 word shifts. So, sincenis strictly in [1..63], this means we can replace all the shifts with native ones.With a very simple benchmark this seems to have had a significant effect in
Lshperformance, but requires better testing.Whether these benefits justify the additional complexity is also an important point.