
Infer schema from gremlin/sparql query results for Neptune qpt #2543

Open · wants to merge 3 commits into master
Conversation

VenkatasivareddyTR (Contributor)
Issue #, if available:

Description of changes:
Currently we are generating the schema from AWS Glue for Neptune QPT as well; this change will get the schema from gremlin/sparql query results for Neptune QPT. Test results: Neptune_QPT_Tests.xlsx

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

```diff
@@ -296,7 +291,7 @@ public GetTableResponse doGetQueryPassthroughSchema(BlockAllocator allocator, Ge
     Client client = neptuneConnection.getNeptuneClientConnection();
     GraphTraversalSource graphTraversalSource = neptuneConnection.getTraversalSource(client);
     String gremlinQuery = qptArguments.get(NeptuneQueryPassthrough.QUERY);
-    gremlinQuery = gremlinQuery.concat(".limit(1)");
+    gremlinQuery = gremlinQuery.concat(".limit(10)");
```
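For context, a minimal sketch of how a schema could be inferred from the sampled query results. The `GremlinSchemaInference` class and the `inferArrowType` mapping are hypothetical illustrations (it assumes each result deserializes to a property map, e.g. from `valueMap()`), not the connector's actual implementation:

```java
import java.util.Map;

import org.apache.arrow.vector.types.FloatingPointPrecision;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Schema;
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Result;
import org.apache.tinkerpop.gremlin.driver.ResultSet;

import com.amazonaws.athena.connector.lambda.data.SchemaBuilder;

public class GremlinSchemaInference
{
    // Infer an Arrow schema from the first 10 rows of a gremlin query.
    public static Schema inferSchema(Client client, String gremlinQuery)
    {
        ResultSet results = client.submit(gremlinQuery + ".limit(10)");
        SchemaBuilder schemaBuilder = SchemaBuilder.newBuilder();
        for (Result result : results) {
            @SuppressWarnings("unchecked")
            Map<String, Object> row = (Map<String, Object>) result.getObject();
            // Union the fields seen across the sampled rows.
            row.forEach((name, value) -> schemaBuilder.addField(name, inferArrowType(value)));
        }
        return schemaBuilder.build();
    }

    // Minimal Java-to-Arrow type mapping, for illustration only.
    private static ArrowType inferArrowType(Object value)
    {
        if (value instanceof Integer || value instanceof Long) {
            return new ArrowType.Int(64, true);
        }
        if (value instanceof Double || value instanceof Float) {
            return new ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE);
        }
        return ArrowType.Utf8.INSTANCE;
    }
}
```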
Contributor

Is it possible that the first 10 rows of data are different from the rest of the rows?

Example:
Could the first 10 rows contain a result set that has no field called resource_type while row 1120 has it? If yes, will the inferred schema drop the column? (And vice versa: if the column is absent from the first 10 rows and data comes in at row 1120, will there be an issue when reading the data?)

It is okay if this is not supported, but I want to document the behavior in the documentation.

Thanks team

VenkatasivareddyTR (Contributor, Author)

Yes, we can update the documentation about this limitation.

We determine the schema by examining the first 10 records. If the data exhibits schema variations in later rows, we may not accurately capture the complete schema. A similar approach is followed in the DynamoDB connector as well.
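To make that limitation concrete, a self-contained sketch with made-up rows showing that a field first appearing after the sampled prefix is absent from the inferred column set (`SamplingLimitationDemo` and the field names are hypothetical):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SamplingLimitationDemo
{
    public static void main(String[] args)
    {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 0; i < 1120; i++) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("code", "JFK");
            row.put("city", "New York");
            if (i >= 10) {
                row.put("resource_type", "airport"); // only present past the sampled prefix
            }
            rows.add(row);
        }

        // Infer columns from the first 10 rows only, mirroring the .limit(10) sampling.
        Set<String> inferredColumns = new LinkedHashSet<>();
        rows.stream().limit(10).forEach(row -> inferredColumns.addAll(row.keySet()));

        System.out.println(inferredColumns); // [code, city] -- resource_type is dropped
    }
}
```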
