[CALCITE-6652] RelDecorrelator can't decorrelate query with limit 1 #4181
base: main
If no one reviews it, I will start reviewing it this weekend; it may take some time.
Thank you. If you have any questions or need further clarification, feel free to ask.
I will try to review some more later, after I understand the relationship between order by and min/max.
Perhaps it's enough to check that the orderby does null handling in the expected way.
// MIN/MAX returns NULL, while LIMIT 1 returns 0 rows.
// However, in the decorrelate, we add correlated variables to the group list
// to ensure equivalence when Correlate is transformed to Join. When the group list
// is non-empty, MIN/MAX will also return 0 rows if input with 0 rows.(This behavior
"if the input has 0 rows."
I think that the references to the other calcite issue should not be in this comment - they will become obsolete if the issue is closed, and they don't really clarify what is going on here.
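The row-count difference described in the code comment under review can be illustrated with a minimal sketch in plain Java collections. This is not Calcite code: the class `EmptyInputSemantics`, its `Row` type, and the three methods are hypothetical names introduced only to model a global MAX (empty group list), a grouped MAX, and ORDER BY ... DESC LIMIT 1 over the same input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; none of these names exist in Calcite.
public class EmptyInputSemantics {
  /** A row with a grouping key and a nullable value. */
  static final class Row {
    final int key;
    final Integer value;
    Row(int key, Integer value) { this.key = key; this.value = value; }
  }

  /** Global MAX (empty group list): always one row, holding null for empty input. */
  static List<Integer> globalMax(List<Row> rows) {
    Integer max = null;
    for (Row r : rows) {
      if (r.value != null && (max == null || r.value > max)) {
        max = r.value;
      }
    }
    return Collections.singletonList(max);
  }

  /** MAX grouped by key: one entry per key seen, so no entries for empty input. */
  static Map<Integer, Integer> groupedMax(List<Row> rows) {
    Map<Integer, Integer> result = new TreeMap<>();
    for (Row r : rows) {
      Integer current = result.get(r.key);
      if (!result.containsKey(r.key)
          || (r.value != null && (current == null || r.value > current))) {
        result.put(r.key, r.value);
      }
    }
    return result;
  }

  /** ORDER BY value DESC NULLS LAST LIMIT 1: zero rows for empty input. */
  static List<Integer> orderByDescLimitOne(List<Row> rows) {
    List<Integer> values = new ArrayList<>();
    for (Row r : rows) {
      values.add(r.value);
    }
    values.sort(Comparator.nullsLast(Comparator.<Integer>reverseOrder()));
    return values.isEmpty() ? values : values.subList(0, 1);
  }

  public static void main(String[] args) {
    List<Row> empty = new ArrayList<>();
    System.out.println("global MAX rows:  " + globalMax(empty).size());
    System.out.println("grouped MAX rows: " + groupedMax(empty).size());
    System.out.println("LIMIT 1 rows:     " + orderByDescLimitOne(empty).size());
  }
}
```

On empty input the grouped form and the LIMIT 1 form both produce no rows, while the global form produces a single null; that is the gap the comment says is closed by adding the correlated variables to the group list.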
//
// Rewrite logic:
//
// If sorted with no OFFSET and FETCH = 1, and only one collation field,
The check for fetch == 1 is not here (so you can mention that it's enforced by the caller).
Moreover, for symmetry, should this comment be within the if branch?
}

final int newIdx = requireNonNull(frame.oldToNewOutputs.get(collation.getFieldIndex()));
RelBuilder.AggCall aggCall = relBuilder.push(frame.r)
I am wondering: sorting has collation, but min/max do not.
For example, sorting can specify what happens to nulls.
This was my exact same thought after a quick look at the PR... is the proposed conversion 100% equivalent in all cases, also when nulls are involved? What about when the original Sort is nulls-first and the relevant field has null values?
Yes, as mentioned in JIRA, I missed this. This conversion is only valid if nulls-last.
Could we have some unit tests with nulls-first, to confirm that the conversion is not applied in that case?
I would think that MAX works for NULL LAST only, and MIN for NULL FIRST.
Am I wrong?
For MIN/MAX, null values are ignored in the databases I know (I didn't find a corresponding description in the SQL standard). So I think both asc and desc should be nulls-last, i.e., return null only if all values are null.
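The null-handling argument above can be checked with a small sketch in plain Java, assuming (as the comment does) that MIN/MAX ignore nulls. The class `NullOrderingCheck` and its methods are hypothetical names for illustration only and are not part of Calcite; the sorted-then-first-element helper stands in for ORDER BY ... LIMIT 1 on non-empty input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration; none of these names exist in Calcite.
public class NullOrderingCheck {
  /** MAX that ignores nulls; returns null only if there is no non-null value. */
  static Integer max(List<Integer> values) {
    Integer best = null;
    for (Integer v : values) {
      if (v != null && (best == null || v > best)) {
        best = v;
      }
    }
    return best;
  }

  /** MIN that ignores nulls; returns null only if there is no non-null value. */
  static Integer min(List<Integer> values) {
    Integer best = null;
    for (Integer v : values) {
      if (v != null && (best == null || v < best)) {
        best = v;
      }
    }
    return best;
  }

  /** First row after sorting, i.e. ORDER BY ... LIMIT 1 (assumes non-empty input). */
  static Integer firstAfterSort(List<Integer> values, Comparator<Integer> order) {
    List<Integer> sorted = new ArrayList<>(values);
    sorted.sort(order);
    return sorted.get(0);
  }

  public static void main(String[] args) {
    List<Integer> values = Arrays.asList(3, null, 1, 2);
    Comparator<Integer> asc = Comparator.naturalOrder();
    Comparator<Integer> desc = Comparator.reverseOrder();

    // NULLS LAST: the first sorted row matches the aggregate in both directions.
    System.out.println(Objects.equals(firstAfterSort(values, Comparator.nullsLast(desc)), max(values)));
    System.out.println(Objects.equals(firstAfterSort(values, Comparator.nullsLast(asc)), min(values)));

    // NULLS FIRST: the first sorted row is null, which neither aggregate returns here.
    System.out.println(firstAfterSort(values, Comparator.nullsFirst(desc)));
    System.out.println(firstAfterSort(values, Comparator.nullsFirst(asc)));
  }
}
```

Under that assumption, a nulls-first sort surfaces the null row ahead of every value in either direction, so neither MIN nor MAX reproduces it whenever a null and a non-null value coexist.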
a7d9b79 to 6982d0b
6982d0b to 58d47f8
@mihaibudiu @rubenada Thanks for your feedback! I've updated the code and added a null-value test; please review when you have time.
@@ -795,6 +801,146 @@ private static void shiftMapping(Map<Integer, Integer> mapping, int startIndex,
return null;
}

private @Nullable Frame decorrelateFetchOneSort(Sort sort, final Frame frame) {
Maybe I'm a bit too paranoid, but just to be on the safe side, I'd propose the new methods (decorrelateFetchOneSort and decorrelateSortAsAggregate) to be protected instead of private; the reason is that if, for whatever reason, a downstream project wants to "switch off" this conversion, they could simply extend the RelDecorrelator with their own child class and override these methods to simply return null, right?
I agree that we need a safe way to avoid errors. Usually users invoke decorrelation by calling decorrelateQuery. So I wonder whether we could add something like SqlToRelConverter#CONFIG (which could be combined with [CALCITE-6674]) to choose whether to enable it. WDYT?
It may be a cleaner solution, but IMO maybe a bit overkill for this particular purpose. I think protected methods would suffice to provide a way to "fallback". Let's see if there are other opinions on this subject.
Got it, I've made the changes. Thanks for the suggestion.
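The "fallback via protected methods" idea discussed in this thread can be sketched as below. This is only a stand-in: `SketchDecorrelator` is a hypothetical class, not Calcite's RelDecorrelator, plans are modeled as plain strings instead of Frame/RelNode, and what the caller does when the hook returns null is an assumption (here it simply keeps the original plan). Only the method name decorrelateFetchOneSort and the "override to return null" idea come from the discussion.

```java
// Hypothetical illustration of the override pattern; not Calcite's real API.
public class OverrideSketch {
  /** Stand-in for the decorrelator, with the rewrite exposed as a protected hook. */
  static class SketchDecorrelator {
    /** Returns a rewritten plan, or null to signal that the rewrite was not applied. */
    protected String decorrelateFetchOneSort(String sort) {
      return "Aggregate(" + sort + ")";
    }

    /** Assumption: when the hook returns null, the original plan is kept. */
    String decorrelate(String sort) {
      String rewritten = decorrelateFetchOneSort(sort);
      return rewritten != null ? rewritten : sort;
    }
  }

  /** A downstream subclass that switches the conversion off. */
  static class NoFetchOneRewrite extends SketchDecorrelator {
    @Override protected String decorrelateFetchOneSort(String sort) {
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(new SketchDecorrelator().decorrelate("Sort"));
    System.out.println(new NoFetchOneRewrite().decorrelate("Sort"));
  }
}
```

Compared with a configuration flag, this keeps the public surface unchanged: a project that never subclasses the decorrelator sees no difference.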
I am assuming that the quidem test results have been checked with another database as well.
This PR handles decorrelation for queries that contain LIMIT 1. There are two main types of conversions: