Feature: Support hierarchical data with optimised methods #17271

Open
rad-pat opened this issue Jan 13, 2025 · 6 comments

@rad-pat

rad-pat commented Jan 13, 2025

It would be great to have some optimised functions to process hierarchical data rather than having to build recursive CTEs to get information about trees of data - siblings/children etc.

This link provides information about one particular implementation, but it could perhaps be done differently https://help.sap.com/docs/SAP_HANA_PLATFORM/4f9859d273254e04af6ab3e9ea3af286/9aa2f6d912f24dc5baca772bd40745fc.html?locale=en-US&version=2.0.05
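
For context, here is a minimal sketch of the recursive-CTE approach that the request wants to avoid. It uses Python's built-in `sqlite3` module only so the query can be run anywhere; the `org` table, its `child`/`parent` columns, and the sample rows are assumptions made up for illustration, not anything from Databend or SAP HANA.

```python
import sqlite3

# Hypothetical parent-child table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE org (child TEXT PRIMARY KEY, parent TEXT)")
conn.executemany(
    "INSERT INTO org VALUES (?, ?)",
    [
        ("root", None),
        ("a", "root"),
        ("b", "root"),
        ("a1", "a"),
        ("a2", "a"),
        ("b1", "b"),
    ],
)

# Descendants need a recursive CTE: start from the direct children,
# then repeatedly join the table back onto the rows found so far.
DESCENDANTS_SQL = """
WITH RECURSIVE descendants(node, depth) AS (
    SELECT child, 1 FROM org WHERE parent = :start
    UNION ALL
    SELECT o.child, d.depth + 1
    FROM org AS o
    JOIN descendants AS d ON o.parent = d.node
)
SELECT node, depth FROM descendants ORDER BY depth, node
"""

# Siblings only need a self-join, but still have to be spelled out by hand.
SIBLINGS_SQL = """
SELECT s.child
FROM org AS n
JOIN org AS s ON s.parent = n.parent
WHERE n.child = :node AND s.child <> :node
ORDER BY s.child
"""


def descendants(start):
    """Return (node, depth) rows for every node below `start`."""
    return conn.execute(DESCENDANTS_SQL, {"start": start}).fetchall()


def siblings(node):
    """Return the other nodes that share a parent with `node`."""
    return [row[0] for row in conn.execute(SIBLINGS_SQL, {"node": node})]
```

With the sample rows, `descendants("a")` returns `[("a1", 1), ("a2", 1)]` and `siblings("a")` returns `["b"]`. Each new question (ancestors, leaves, and so on) needs its own hand-written query of this kind.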

@rad-pat rad-pat added the C-feature Category: feature label Jan 13, 2025
@inviscid

There does appear to be some discussion about handling nested/hierarchical data in Arrow and Parquet here: https://arrow.apache.org/blog/2022/10/08/arrow-parquet-encoding-part-2/

This would be a very nice feature to have, since most "business" data includes some form of organizational layout such as accounts, cost centers, departments, products, etc. Often this data is jagged (branches of differing depth), which makes using a predetermined number of table columns problematic.

The ability to quickly find ancestors, descendants, siblings, leaves, etc. without having to run recursive CTEs would be a very welcome capability.

@rad-pat
Author

rad-pat commented Jan 14, 2025

> There does appear to be some discussion about handling nested/hierarchical data in Arrow and Parquet here: https://arrow.apache.org/blog/2022/10/08/arrow-parquet-encoding-part-2/

I think this is referring to hierarchical data such as a JSON structure, which I believe is a VARIANT column in Databend, not to actually building a tree structure from parent-child or level relationships within a table of data.

To have the ability to create a TREE structure from a table and then use the speed of Rust to answer the ancestors, descendants, siblings, and leaves questions would be great. Perhaps the usage could be a little similar to a CTE.

e.g. pseudocode:

with TREE(table, parent_col, child_col) as t
select * 
from data_table d
where d.tree_node_col in siblings(t, node1, node2);
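
To make the idea concrete, here is a rough sketch of what such a tree index might compute once it is built from parent/child rows. It is plain Python rather than Rust or SQL, and everything in it is hypothetical: the `Tree` class, its method names, and the reading of `siblings(t, node1, node2)` as "the union of each node's siblings, excluding the nodes passed in" are assumptions, not a proposed Databend API. It also assumes the data is acyclic.

```python
class Tree:
    """Hypothetical index built once from (parent, child) pairs."""

    def __init__(self, edges):
        self.parent = {}    # child -> parent
        self.children = {}  # parent -> list of children, in input order
        for parent, child in edges:
            self.parent[child] = parent
            self.children.setdefault(parent, []).append(child)

    def ancestors(self, node):
        """Walk parent links upward; nearest ancestor first."""
        out = []
        while node in self.parent:
            node = self.parent[node]
            out.append(node)
        return out

    def descendants(self, node):
        """Iterative pre-order traversal of everything below `node`."""
        out = []
        stack = list(reversed(self.children.get(node, [])))
        while stack:
            current = stack.pop()
            out.append(current)
            stack.extend(reversed(self.children.get(current, [])))
        return out

    def leaves(self, node):
        """Descendants of `node` that have no children of their own."""
        return [n for n in self.descendants(node) if n not in self.children]

    def siblings(self, *nodes):
        """Assumed semantics: union of siblings, minus the input nodes."""
        found = set()
        for node in nodes:
            parent = self.parent.get(node)
            if parent is not None:
                found.update(self.children[parent])
        return sorted(found - set(nodes))


# Illustrative data only.
tree = Tree([("root", "a"), ("root", "b"), ("a", "a1"), ("a", "a2"), ("b", "b1")])
```

For this sample, `tree.siblings("a1", "b1")` gives `["a2"]` and `tree.ancestors("a1")` gives `["a", "root"]`. The point is that the index is built in one pass and each question then becomes a dictionary walk instead of a recursive query.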

@b41sh
Member

b41sh commented Jan 15, 2025

This looks a bit similar to the json_path functions. We can consider supporting these functions; they should be useful.

@b41sh b41sh self-assigned this Jan 15, 2025
@sundy-li
Member

SAP supports CREATE COLUMN TABLE

@rad-pat
Author

rad-pat commented Jan 15, 2025

> This looks a bit similar to the json_path functions. We can consider supporting these functions; they should be useful.

I wonder if I can leverage the json_path functions for what we need in the interim. Are those functions fully implemented, @b41sh?

@rad-pat
Author

rad-pat commented Jan 17, 2025

> I wonder if I can leverage the json_path functions for what we need in the interim. Are those functions fully implemented, @b41sh?

For now it seems that I cannot get the answers I desire using json_path, because a recursive descent operator is not yet available. I raised an issue about that in the JSONB repo; I hope that is the correct place.
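
For readers unfamiliar with the term, recursive descent (written `..` in JSONPath, as in `$..name`) collects matching members at any depth of a document. The sketch below shows the behaviour in plain Python; the `recursive_descent` helper and the sample document are assumptions for illustration only, and it does not use or describe any Databend or JSONB function.

```python
def recursive_descent(value, key):
    """Yield every value stored under `key` at any depth, in document order."""
    if isinstance(value, dict):
        for k, v in value.items():
            if k == key:
                yield v
            # Keep descending, since matches may be nested below this member.
            yield from recursive_descent(v, key)
    elif isinstance(value, list):
        for item in value:
            yield from recursive_descent(item, key)


# Illustrative nested document: a tree stored as JSON-like data.
doc = {
    "name": "root",
    "children": [
        {"name": "a", "children": [{"name": "a1"}]},
        {"name": "b"},
    ],
}

names = list(recursive_descent(doc, "name"))
```

Here `names` is `["root", "a", "a1", "b"]`, which is roughly what `$..name` would select. Without such an operator, a path expression can only reach a fixed, known depth.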
