Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Value_Type.Null, fix edge cases for read_many #11737

Merged
merged 63 commits into from
Jan 24, 2025

Conversation

radeusgd
Copy link
Member

@radeusgd radeusgd commented Dec 2, 2024

Pull Request Description

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    TypeScript,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.
  • If meaningful changes were made to logic or tests affecting Enso Cloud integration in the libraries,
    or the Snowflake database integration, a run of the Extra Tests has been scheduled.
    • If applicable, it is suggested to paste a link to a successful run of the Extra Tests.

@radeusgd radeusgd added the CI: No changelog needed Do not require a changelog entry for this PR. label Dec 2, 2024
@radeusgd radeusgd self-assigned this Dec 2, 2024
@radeusgd radeusgd force-pushed the wip/radeusgd/6281-value-type-null branch 3 times, most recently from c57057b to 22a2371 Compare December 3, 2024 17:22
@radeusgd radeusgd marked this pull request as ready for review December 3, 2024 17:23
@radeusgd
Copy link
Member Author

radeusgd commented Dec 3, 2024

Changes affect Snowflake so I'm running the Extra suite:

I expect it to fail because it is failing on develop. Yet I want to see if there are no more failures than what is expected.

@radeusgd
Copy link
Member Author

radeusgd commented Dec 3, 2024

Followup to fix Nothing * Nothing: #11751 and Nothing in expressions in DB #11757.

@@ -628,6 +626,9 @@ make_custom_cast column target_value_type type_mapping =
result = Ref.new Nothing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this logic hard to follow. The return value would be either Nothing or the return value of result.put expr -- but isn't the the return value of Ref.put the old value?

Copy link
Member Author

@radeusgd radeusgd Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value of put is discarded.

The function returns ref.get at the very end.

I agree it is not ideal. This is a hack to get multiline if without adding a ton of indentation to the else branches. There's just too many branches here (and while sometimes too many branches is bad, here it is just intrinsic to this function) to make it readable.

Once we get multiline if (#6408) I will be more than happy to replace this with just:

if cond1 then
    ...
else if cond2 then
    ...
else
    Nothing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think I should rewrite it with increasing indentation?

I'd rather keep it as is as IMO it is readable once the reasons are explained.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it makes more sense now.

@radeusgd
Copy link
Member Author

radeusgd commented Dec 19, 2024

Followup to fix Nothing * Nothing: #11751 and Nothing in expressions in DB #11757.

I think we could merge this PR and then improve the missing bits from these 2 tickets in a follow-up PR. Unless there is any opposing views and we want to do it as a single change?

@@ -628,6 +626,9 @@ make_custom_cast column target_value_type type_mapping =
result = Ref.new Nothing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it makes more sense now.

Comment on lines +374 to +375
if bucket_1.is_nothing || bucket_2.is_nothing then False else
bucket_1 == bucket_2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if bucket_1.is_nothing || bucket_2.is_nothing then False else
bucket_1 == bucket_2
bucket_1 == bucket_2

Since we know either_type_is_not_ordered is false here.

@@ -29,6 +31,10 @@ public ColumnStorage apply(
return boolStorage.makeNegated();
}

if (storage.getType() instanceof NullType) {
return new NullStorage(Math.toIntExact(storage.getSize()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason getSize() can return long but NullStorage can only accept int? Is an exception here possible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getSize is the Enso-facing one so it returns long because that is the default storage for integers in Enso.

All of our storages take int size because most of them are backed by arrays (Null is inded an exception) so they cannot hold more than int can index.

Thanks for pointing this out, making me rethink this - long getSize() is just wrapper for int size() that 'translates' to the more directly Enso-friendly long value, but I should have just used size() here and avoided the conversion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm after further checking - this function gets ColumnStorage which only has the long returning one in its interface...

I guess I'll just keep the toIntExact conversion. It shouldn't happen as long as our storages are int based. Once they aren't - the other code will need to be updated anyway. Does that seem ok?

# Conflicts:
#	std-bits/table/src/main/java/org/enso/table/aggregations/Sum.java
#	std-bits/table/src/main/java/org/enso/table/data/column/builder/Builder.java
#	std-bits/table/src/main/java/org/enso/table/data/column/builder/TypedBuilderImpl.java
#	std-bits/table/src/main/java/org/enso/table/data/column/operation/cast/ToBigDecimalConverter.java
#	std-bits/table/src/main/java/org/enso/table/data/column/operation/cast/ToBigIntegerConverter.java
#	std-bits/table/src/main/java/org/enso/table/data/column/operation/cast/ToBooleanStorageConverter.java
#	std-bits/table/src/main/java/org/enso/table/data/column/operation/cast/ToDateStorageConverter.java
#	std-bits/table/src/main/java/org/enso/table/data/column/operation/cast/ToDateTimeStorageConverter.java
#	std-bits/table/src/main/java/org/enso/table/data/column/operation/cast/ToIntegerStorageConverter.java
#	std-bits/table/src/main/java/org/enso/table/data/column/operation/cast/ToTextStorageConverter.java
#	std-bits/table/src/main/java/org/enso/table/data/column/operation/cast/ToTimeOfDayStorageConverter.java
#	std-bits/table/src/main/java/org/enso/table/data/table/Column.java
# Conflicts:
#	std-bits/table/src/main/java/org/enso/table/data/column/builder/Builder.java
#	std-bits/table/src/main/java/org/enso/table/data/column/builder/ObjectBuilder.java
#	std-bits/table/src/main/java/org/enso/table/data/table/Column.java
Comment on lines 158 to +160
public static Column fromRepeatedItem(
String name, Value item, int repeat, ProblemAggregator problemAggregator) {
if (repeat < 0) {
throw new IllegalArgumentException("Repeat count must be non-negative.");
}

Object converted = Polyglot_Utils.convertPolyglotValue(item);

if (converted == null) {
var builder = Builder.getForType(AnyObjectType.INSTANCE, repeat, problemAggregator);
builder.appendNulls(repeat);
return new Column(name, builder.seal());
}

StorageType storageType = StorageType.forBoxedItem(converted);
Builder builder = Builder.getForType(storageType, repeat, problemAggregator);
Context context = Context.getCurrent();
for (int i = 0; i < repeat; i++) {
builder.append(converted);
context.safepoint();
}

return new Column(name, builder.seal());
return new Column(name, Storage.fromRepeatedItem(item, repeat, problemAggregator));
Copy link
Member Author

@radeusgd radeusgd Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdunkerley I've allowed myself to overwrite some of your cleanups from the other PR in this particular method with a fully custom Storage::fromRepeatedItem.

It also relies on Builder.getForType for the generic case but also has 'specializations' that create a more compact storage directly (NullStorage and LongConstantStorage that are constructed in O(1) instead of O(N)).

I think we do want this 'optimization'. In the future we can even add more ConstantStorage kinds.

The major question is if Storage class is the right place for this method. I imagine once we start adding more backends it may not necessarily be. But for now I think it is relatively good place. So I'd keep it here and move it when it makes sense.

Does that seem OK?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes fine to do. Will rebase and cope :)

@radeusgd
Copy link
Member Author

radeusgd commented Jan 22, 2025

New Extra tests run after the new changes:

Comment on lines 158 to +160
public static Column fromRepeatedItem(
String name, Value item, int repeat, ProblemAggregator problemAggregator) {
if (repeat < 0) {
throw new IllegalArgumentException("Repeat count must be non-negative.");
}

Object converted = Polyglot_Utils.convertPolyglotValue(item);

if (converted == null) {
var builder = Builder.getForType(AnyObjectType.INSTANCE, repeat, problemAggregator);
builder.appendNulls(repeat);
return new Column(name, builder.seal());
}

StorageType storageType = StorageType.forBoxedItem(converted);
Builder builder = Builder.getForType(storageType, repeat, problemAggregator);
Context context = Context.getCurrent();
for (int i = 0; i < repeat; i++) {
builder.append(converted);
context.safepoint();
}

return new Column(name, builder.seal());
return new Column(name, Storage.fromRepeatedItem(item, repeat, problemAggregator));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes fine to do. Will rebase and cope :)

@radeusgd radeusgd added the CI: Ready to merge This PR is eligible for automatic merge label Jan 24, 2025
@mergify mergify bot merged commit bc32447 into develop Jan 24, 2025
42 of 44 checks passed
@mergify mergify bot deleted the wip/radeusgd/6281-value-type-null branch January 24, 2025 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Introduce a Value_Type.Null.
4 participants