
Infer schema from gremlin/sparql query results for Neptune qpt #2543

Open · wants to merge 3 commits into master
Conversation

VenkatasivareddyTR (Contributor)
Issue #, if available:

Description of changes:
Currently we are generating the schema from AWS Glue for Neptune QPT as well; this change will get the schema from gremlin/sparql query results for Neptune QPT. Test results: Neptune_QPT_Tests.xlsx

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

```diff
@@ -296,7 +291,7 @@ public GetTableResponse doGetQueryPassthroughSchema(BlockAllocator allocator, Ge
     Client client = neptuneConnection.getNeptuneClientConnection();
     GraphTraversalSource graphTraversalSource = neptuneConnection.getTraversalSource(client);
     String gremlinQuery = qptArguments.get(NeptuneQueryPassthrough.QUERY);
-    gremlinQuery = gremlinQuery.concat(".limit(1)");
+    gremlinQuery = gremlinQuery.concat(".limit(10)");
```
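For context, a minimal sketch of how a schema could be inferred from the sampled query results. The `GremlinSchemaInference` class and the `inferArrowType` mapping are hypothetical illustrations (it assumes each result deserializes to a property map, e.g. from `valueMap()`), not the connector's actual implementation:

```java
import java.util.Map;

import org.apache.arrow.vector.types.FloatingPointPrecision;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Schema;
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Result;
import org.apache.tinkerpop.gremlin.driver.ResultSet;

import com.amazonaws.athena.connector.lambda.data.SchemaBuilder;

public class GremlinSchemaInference
{
    // Infer an Arrow schema from the first 10 rows of a gremlin query.
    public static Schema inferSchema(Client client, String gremlinQuery)
    {
        ResultSet results = client.submit(gremlinQuery + ".limit(10)");
        SchemaBuilder schemaBuilder = SchemaBuilder.newBuilder();
        for (Result result : results) {
            @SuppressWarnings("unchecked")
            Map<String, Object> row = (Map<String, Object>) result.getObject();
            // Union the fields seen across the sampled rows.
            row.forEach((name, value) -> schemaBuilder.addField(name, inferArrowType(value)));
        }
        return schemaBuilder.build();
    }

    // Minimal Java-to-Arrow type mapping, for illustration only.
    private static ArrowType inferArrowType(Object value)
    {
        if (value instanceof Integer || value instanceof Long) {
            return new ArrowType.Int(64, true);
        }
        if (value instanceof Double || value instanceof Float) {
            return new ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE);
        }
        return ArrowType.Utf8.INSTANCE;
    }
}
```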
Contributor

Is it possible that the first 10 rows of data are different from the rest of the rows?

Example:
Could the first 10 rows contain a result set that has no field called resource_type while row 1120 has it? If yes, will the inferred schema drop the column? (And vice versa: if the column is absent from the first 10 rows and data comes in at row 1120, will there be an issue when reading the data?)

It is okay if this is not supported, but I want to document the behavior in the documentation.

Thanks team

VenkatasivareddyTR (Contributor, Author)

Yes, we can update the documentation about this limitation.

We determine the schema by examining the first 10 records. If the data exhibits schema variations in later rows, we may not accurately capture the complete schema. A similar approach is followed in the DynamoDB connector as well.
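To make that limitation concrete, a self-contained sketch with made-up rows showing that a field first appearing after the sampled prefix is absent from the inferred column set (`SamplingLimitationDemo` and the field names are hypothetical):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SamplingLimitationDemo
{
    public static void main(String[] args)
    {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 0; i < 1120; i++) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("code", "JFK");
            row.put("city", "New York");
            if (i >= 10) {
                row.put("resource_type", "airport"); // only present past the sampled prefix
            }
            rows.add(row);
        }

        // Infer columns from the first 10 rows only, mirroring the .limit(10) sampling.
        Set<String> inferredColumns = new LinkedHashSet<>();
        rows.stream().limit(10).forEach(row -> inferredColumns.addAll(row.keySet()));

        System.out.println(inferredColumns); // [code, city] -- resource_type is dropped
    }
}
```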
