BUG: Wrong sum in groupby rolling due to precision issues #38752
Comments
Hi @AkariGale , closing as you haven't written anything, but if you fill out the report and ping us we'll reopen |
Hi @MarcoGorelli ! Sorry, I accidentally posted an empty report. Corrected. |
first bad commit: [bad52a9] PERF: Use Indexers to implement groupby rolling (#34052) cc @mroeschke |
Thanks for the report. You are correct that the first summation influences the result. Since the summation of values is carried across the groups, it appears this example hits the numerical precision limits of our summation algorithm; replacing the values in your example with integers yields the correct result.
|
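To illustrate the kind of effect described in the comment above, here is a minimal sketch in plain Python. It is not pandas' actual implementation: `rolling_sum_incremental` and `rolling_sum_fresh` are hypothetical helper names, and the values are made up for the illustration. It assumes a sliding sum that adds the entering value and subtracts the leaving one, so rounding error picked up while large values are in the window stays in the running total after they leave.

```python
import math


def rolling_sum_incremental(values, window):
    """Sliding sum that keeps one running total (add new value, subtract old one)."""
    out = []
    running = 0.0
    for i, x in enumerate(values):
        running += x
        if i >= window:
            running -= values[i - window]
        out.append(running)
    return out


def rolling_sum_fresh(values, window):
    """Sliding sum that re-sums every window from scratch."""
    return [
        math.fsum(values[max(0, i - window + 1): i + 1])
        for i in range(len(values))
    ]


# Hypothetical data: one huge value (think "first group") followed by
# small values (think "second group"), processed as one continuous array.
values = [1e16, 1.0, 1.0, 1.0]

incremental = rolling_sum_incremental(values, window=2)
fresh = rolling_sum_fresh(values, window=2)

print(incremental[-1])  # 0.0, the 1.0s were absorbed while 1e16 was in the total
print(fresh[-1])        # 2.0, the exact sum of the last window
```

With integer-valued inputs of moderate size every intermediate sum is exact, which is consistent with the observation that replacing the values with integers gives the correct result.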
Could you provide a smaller example reproducing this? See https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports |
Hi @simonjayhawkins - any reason to move this off 1.2.2 if it's a regression, and to add "needs info"? I don't disagree, I just don't understand. A smaller reproducible example would be great of course, but isn't there enough info here to reproduce the issue? |
@MarcoGorelli 1.2.2 release is today - so this issue is not being fixed rn |
Yeah, makes sense, but this was moved off of 1.2.2 17 days ago, so I was just trying to understand the rationale behind that |
oh it's not even clear this is actually fixable as it's a deliberate change |
removed milestone as was a regression in 1.1 release, not 1.2. #38752 (comment), #38721 (comment) (of course if we have a regression fix for older releases, we would consider whether a backport is feasible) 'needs info' was used to indicate awaiting response from OP #38752 (comment) |
Thanks for the report, but as mentioned I suspect that this is an implementation artifact (numeric precision in the algorithm) that may be impossible to address. Closing, but if you could provide a more minimal example to pinpoint the issue, it would be helpful for the team to see if it could be addressed. |
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Problem description
In the code above I show that the last values of `value` in the two rollings are not equal: using rolling on the full dataframe, I obtain `0.015625000000100003`; using rolling on `id=2`, I obtain `1e-13`. It seems that the aggregation in the second group depends on the first group.
If I replace `a` with `0`, I get `2.980242e-08` in the first rolling. And if I replace `a` with `1`, I get the correct answer, `1e-13`.
Expected Output
1e-13
Output of
pd.show_versions()