Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Saving data links to a DB_Table #11371

Merged
merged 32 commits into from
Oct 24, 2024

Conversation

radeusgd
Copy link
Member

@radeusgd radeusgd commented Oct 21, 2024

Pull Request Description

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    TypeScript,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.
  • If meaningful changes were made to logic or tests affecting Enso Cloud integration in the libraries,
    or the Snowflake database integration, a run of the Extra Tests has been scheduled.
    • If applicable, it is suggested to paste a link to a successful run of the Extra Tests.

Copy link
Contributor

@somebody1234 somebody1234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approving new schemas in datalinkSchema.json (code review, schema may want to be reviewed separately by libs team)

@radeusgd
Copy link
Member Author

This is how the datalinks look like in the GUI:

  • Saving a table directly stores it by-name:
t.save_as_data_link (Enso_File.home / "postgres-table.datalink")

image

  • Simple query without interpolations (this could also be typed in by the user creating the datalink inside of the GUI):
t.rename_columns ["Y"] . save_as_data_link (Enso_File.home / "postgres-simple-query.datalink")

image

  • More complex serialized queries - these are not meant to be typed in by the user, but we still need to display them somehow.
t.set (t.at "X" * 1000 + 45) "Z" . save_as_data_link (Enso_File.home / "postgres-query.datalink")

image

@radeusgd
Copy link
Member Author

I've added a warning if the query that is being saved refers to temporary tables - as after a session is closed those are destroyed and the datalink will most likely become invalid.

image
and after pressing the play button:
image

This also works if the query is more complex and comes from more tables:
image

(warning is also displayed if only some of the set of tables are temporary)

referred_temporary_tables = _find_referred_temporary_tables table.connection table.context
if referred_temporary_tables.is_nothing then result else
warning = Illegal_State.Error "The saved query seems to refer to tables "+referred_temporary_tables.to_text+" which are temporary. Such tables may cease to exist once the session is closed, so the saved data link will no longer be valid and will fail to open."
Warning.attach warning result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should just be an error. Datalinks are mainly used to store things for the long term over many sessions, so I assume there isn't really a use case for storing a datalink just during one sessions (but perhaps I'm wrong about that).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I wanted to mention it, but I forgot.

In the doc comment I noted:

   Note that this is a heuristic and it may potentially lead to false positives
   if aliasing table names exist across schemas. Supporting tables with clashing
   names across schemas is something that may need to be revised overall in the
   Database library.

Indeed I'm not 100% sure of our table mapping in cases where one has two tables with the same name across two accessible schemas. This heuristic is simple and it could lead to false positives. If we do more 'reading tables across schemas' it'd probably have to be updated.

That's why I wanted to make it only a warning. If the warning is a false positive, it can be dismissed and one can continue working.

But if it were an error, it could prevent users from creating a valid datalink.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, this is not ideal - ideally we'd fix the heuristic to make sure it always works - and then error would make sense.

But I'm not sure if I know how to fix this heuristic properly. The whole idea of warning on temporary tables is something I added as an addition to this PR. Trying to fix the heuristic to be 100% accurate is well out of scope of this PR as it would require much more work and will probably be backend-specific. This was making me reconsider if I should add this check at all, but I decided that a relatively accurate check as a warning is better than no check at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, I can revert the check for now and create a ticket to create a more accurate implementation - as it can take more time to do it, so it would need to be scheduled.

@radeusgd radeusgd added CI: Keep up to date Automatically update this PR to the latest develop. and removed CI: Keep up to date Automatically update this PR to the latest develop. labels Oct 23, 2024
Copy link
Member

@jdunkerley jdunkerley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving but with some comments that are worth scanning over and implementing as you agree

@@ -300,7 +300,7 @@ Text.find self pattern:(Regex | Text)=".*" case_sensitivity=Case_Sensitivity.Sen
## This matches `aABbbbc` @ character 0 and `aBC` @ character 11
"aABbbbccccaaBCaaaa".find_all "a[ab]+c" Case_Sensitivity.Insensitive
Text.find_all : Text -> Case_Sensitivity -> Vector Match ! Regex_Syntax_Error | Illegal_Argument
Text.find_all self pattern=".*" case_sensitivity=Case_Sensitivity.Sensitive =
Text.find_all self pattern:Text=".*" case_sensitivity:Case_Sensitivity=..Sensitive =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Text.find_all self pattern:Text=".*" case_sensitivity:Case_Sensitivity=..Sensitive =
Text.find_all self pattern:(Regex | Text)=".*" case_sensitivity:Case_Sensitivity=..Sensitive =

otherwise we can't feed a regex object in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooops, I did not realize that Regex is allowed here.

This should have tripped some tests, but didn't. Shows that we should up our test coverage a bit.

@@ -394,7 +394,7 @@ Text.to_regex self case_insensitive=False = Regex.compile self case_insensitive
'azbzczdzezfzg'.split ['b', 'zez'] == ['az', 'zczd', 'fzg']
@delimiter make_delimiter_selector
Text.split : Text | Vector Text -> Case_Sensitivity -> Boolean -> Vector Text ! Illegal_Argument
Text.split self delimiter="," case_sensitivity=Case_Sensitivity.Sensitive use_regex=False =
Text.split self delimiter="," case_sensitivity:Case_Sensitivity=..Sensitive use_regex=False =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Text.split self delimiter="," case_sensitivity:Case_Sensitivity=..Sensitive use_regex=False =
Text.split self delimiter="," case_sensitivity:Case_Sensitivity=..Sensitive use_regex:Boolean=False =

We should get rid of use_regex and take Regex or Text.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like an out of scope change for this PR IMHO. Maybe I shouldn't have started the type changes at all - but I just wanted to do the trivial ones and thought they could be done 'in the meantime'.

But this requires logical changes and changes to the tests, I'd prefer to do it separately.

Comment on lines +122 to +125
prepared_statement : SQL_Statement ->
SQL_Builder.from_fragments prepared_statement.fragments
raw_code : Text ->
SQL_Builder.code raw_code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we do this as conversion on SQL_Builder?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer this to be explicit.

Comment on lines 2842 to 2845
Creates a Data Link that will act as a view into the query represented by
this table.
@on_existing_file (Existing_File_Behavior.widget include_backup=False include_append=False)
save_as_data_link self destination (on_existing_file:Existing_File_Behavior = Existing_File_Behavior.Error) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a type on destination (File?)

We should have arguments section in documentation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not add a type as currently it is only accepting only Enso_File but in the future we could allow saving datalinks to other destinations, so I wanted to keep it open ended.

I guess Writable_File should do the trick.

Or I should not be too open ended yet and just do : Enso_File. We can always expand the type later as that's a compatible change.

Comment on lines +42 to +46
word = case link_type of
DB_Data_Link_Type.Database -> "connection"
DB_Data_Link_Type.Table _ -> "table"
DB_Data_Link_Type.Query _ -> "query"
DB_Data_Link_Type.SQL_Statement _ -> "query"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method on DB_Data_Link?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what this method should be called, it's very context-dependent how this conversion is handled.

@radeusgd radeusgd force-pushed the wip/radeusgd/11295-save-table-as-datalink branch from 69994d4 to 034e490 Compare October 24, 2024 11:41
@radeusgd
Copy link
Member Author

radeusgd commented Oct 24, 2024

Snowflake run after code review updates:

@radeusgd radeusgd added the CI: Ready to merge This PR is eligible for automatic merge label Oct 24, 2024
@mergify mergify bot merged commit ca9df70 into develop Oct 24, 2024
39 checks passed
@mergify mergify bot deleted the wip/radeusgd/11295-save-table-as-datalink branch October 24, 2024 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Save a DB_Table as a data link
4 participants