NDC Spec v0.2.0 support #666

BenoitRanque · 2025-01-01T20:22:54Z

What

This PR updates ndc-postgres to ndc spec v0.2.0
This includes a lot of changes to tests. These have been justified in individual commits.

How

BenoitRanque

Note: Failing tests: we expect failing tests related to the deprecation of the root column comparison.
These will be fixed in a separate PR, to be merged on this one before merging to main.

This has now been merged.

BenoitRanque · 2025-01-14T08:42:47Z

crates/configuration/src/version4/to_runtime_configuration.rs

                    },
                )
            })
            .collect(),
    )
 }

+/// Infer scalar type representation from scalar type name, if necessary. Defaults to JSON representation
+fn convert_or_infer_type_representation(


V0.2.0 requires type representation.

Type representation comes from introspection configuration, and may be absent.

So, if type representation is missing, we infer the type based on the name and fetch the corresponding type representation from the default introspection configuration.

BenoitRanque · 2025-01-16T22:27:14Z

crates/configuration/src/version3/mod.rs

                    },
                )
            })
            .collect(),
    )
 }

+/// Infer scalar type representation from scalar type name, if necessary. Defaults to JSON representation
+fn convert_or_infer_type_representation(


V0.2.0 requires type representation, but configuration did not require configuration to be present.

To maximize compatibility with older configuration versions, we infer missing type representations based on scalar type name. If missing, we default to JSON representation.

BenoitRanque · 2025-01-16T22:30:28Z

crates/configuration/src/version4/comparison.rs

@@ -29,22 +29,22 @@ impl ComparisonOperatorMapping {
            ComparisonOperatorMapping {
                operator_name: "<=".to_string(),
                exposed_name: "_lte".to_string(),
-                operator_kind: OperatorKind::Custom,
+                operator_kind: OperatorKind::LessThanOrEqual,


Default introspection configuration changed to tag lt(e),gt(e) operators.

This will only affect new configurations, so any deployments with existing configuration will see no change in behavior.

BenoitRanque · 2025-01-16T22:36:27Z

crates/connectors/ndc-postgres/src/schema/mod.rs

+                                function_name.as_str(),
+                                function_definition.return_type.as_str(),
+                            ) {
+                                ("sum", "float8" | "int8") => {


v0.2.0 adds standard aggregate functions. These have specific expectations, such as sum needing to return a scalar represented as either Float64 or Int64.

We check for specific aggregate functions returning matching data types, and mark applicable functions as such.

Non-compliant functions (eg. sum on interval types which are represented as strings) will be tagged as custom aggregate functions

All looks good - let's add these comments into the code where it makes sense.

Added a comment. Also changed up the code to fetch the referenced (possibly scalar) type, and match based on that representation.
If we fail to match we assume it's not a scalar, and the functions will be custom.

This should be better than matching on scalar type names.

BenoitRanque · 2025-01-16T22:37:23Z

crates/connectors/ndc-postgres/src/schema/mod.rs

    Ok(models::SchemaResponse {
        collections,
        procedures,
        functions: vec![],
        object_types,
        scalar_types,
+        capabilities: Some(models::CapabilitySchemaInfo {


Adding this is required, but also means we will see a change in returned schemas, even if configuration has not been changed.

This is fine, the entire schema types are going to change because of the ndc-models bump anyway.

BenoitRanque · 2025-01-16T23:45:37Z

crates/query-engine/translation/src/translation/query/filtering.rs

+                        field_path,
+                        scope,
+                    } => {
+                        let scoped_table = current_table_scope.scoped_table(scope)?;


Apply scope, if any, before traversing path

BenoitRanque · 2025-01-16T23:54:40Z

crates/query-engine/translation/src/translation/query/sorting.rs

+                                                args: vec![column],
+                                            }
+                                        }
+                                        OrderByAggregate::CountStar | OrderByAggregate::Count => {


Count Star and Count actually behave the same, where we only count left-hand rows that actually exists.

This is important, as left joins + count(*) will actually count all rows, even if there were no matching left-hand rows.

I believe those semantics are correct, but something to double check.

I've looked through here and I'm not sure, perhaps a question for @daniel-chambers ?

CountStar and Count(column) are not semantically the same. As mentioned in that doc, CountStar counts all rows, Count(column) counts all rows that have a non-null value in column. This is actually consistent with the usual behaviour COUNT(*) and COUNT(column) in SQL.

I'd need to see the SQL that this generates to know what's actually going on here... because this is counting across a join, you can't use count(*) as Benoit pointed out, but I'd like to know what we're actually counting then. I tried to follow the code and it exceeded my 10:30pm patience 😂. My gut says that in this situation a "count(*)" should be count("the join key column"), which may be what's going on here.

Regardless, it needs to satisfy the semantics I mentioned above.

The generated SQL for CountStar looks like this:

LEFT OUTER JOIN LATERAL ( SELECT COUNT("%1_ORDER_PART_Album"."count") AS "count" FROM ( SELECT 1 AS "count" FROM "public"."Album" AS "%1_ORDER_PART_Album" WHERE ( "%0_Artist"."ArtistId" = "%1_ORDER_PART_Album"."ArtistId" ) ) AS "%1_ORDER_PART_Album" ) AS "%2_ORDER_FOR_Artist" ON ('true')

There's a previous step that generates the inner SQL, and gives us a reference to the column we can count on, which is either a synthetic column with a value 1 for CountStar, or the column to count, or the column to aggregate on when aggregating with a custom function.

Which is to say, I'm pretty confident we are doing the right thing here

Do we have a test that confirms this works? If not, can we add one please?

The SQL I shared above is from a test so I think we're covered in terms of SQL generation working as intended

BenoitRanque · 2025-01-16T23:58:22Z

crates/query-engine/translation/src/translation/query/sorting.rs

 }
+
+enum OrderByAggregate {


We created a new enum for the various ordering aggregates

BenoitRanque · 2025-01-16T23:58:55Z

crates/query-engine/translation/src/translation/query/sorting.rs

@@ -703,10 +740,10 @@ fn translate_targets(
                                // Aggregates do not have a field path.
                                field_path: (&None).into(),
                                expression: sql::ast::Expression::Value(sql::ast::Value::Int4(1)),
-                                aggregate: Some(sql::ast::Function::Unknown("COUNT".to_string())),
+                                aggregate: Some(OrderByAggregate::CountStar),


We used our new ordering aggregate enum instead of a direct SQL AST function.

crates/tests/tests-common/src/request.rs

crates/tests/tests-common/src/router.rs

Cargo.toml

danieljharvey · 2025-01-21T09:35:34Z

Cargo.toml

@@ -53,15 +53,18 @@ nonempty = "0.10"
 percent-encoding = "2"
 prometheus = "0.13"
 ref-cast = "1"
-reqwest = "0.11"
+reqwest = "0.12"


Can we bump this separately before the PR goes in? Good not to mix up functional and non-functional changes.

No. ndc-test needs to be updated alongside ndc-models and ndc-sdk, and it uses reqwest 12, which is the reason for this change.

crates/configuration/src/version3/mod.rs

crates/query-engine/translation/tests/goldenfiles/aggregate_distinct_albums/request.json

crates/configuration/src/version3/mod.rs

crates/connectors/ndc-postgres/src/schema/mod.rs

danieljharvey · 2025-01-21T10:59:44Z

crates/query-engine/metadata/src/metadata/database.rs

@@ -30,7 +30,7 @@ pub struct ScalarType {
    pub description: Option<String>,
    pub aggregate_functions: BTreeMap<models::AggregateFunctionName, AggregateFunction>,
    pub comparison_operators: BTreeMap<models::ComparisonOperatorName, ComparisonOperator>,
-    pub type_representation: Option<TypeRepresentation>,
+    pub type_representation: TypeRepresentation,


I am starting to think we should just use ndc_models::TypeRepresentation here instead of this type which appears to be a complete copy? Feel bugs will lurk in subtle differences between the two, and I don't understand what the indirection buys us.

Agreed, I'll take this chance to do that. I didn't change it originally to keep changes to a minimum

This could be done in a separate PR to be fair, before or after this one.

danieljharvey · 2025-01-21T11:02:40Z

crates/query-engine/translation/src/translation/query/aggregates.rs

+                    // postgres SUM aggregate returns null if no input rows are provided
+                    // however, the ndc spec requires that SUM aggregates over no input rows return 0
+                    // we achieve this with COALESCE, falling back to 0 if the aggregate expression returns null
+                    if function.as_str() == "sum" {


👍 Thank you for the explanatory comments.

...uery-engine/translation/tests/goldenfiles/aggregate_limit_offset_order_by/configuration.json

danieljharvey · 2025-01-22T16:42:06Z

crates/query-engine/translation/tests/goldenfiles/nested_recursive_relationship/request.json

@@ -31,15 +32,15 @@
  "collection_relationships": {
    "ArtistAlbums": {
      "column_mapping": {
-        "ArtistId": "ArtistId"
+        "ArtistId": ["ArtistId"]


This column_mapping type change seems to be the main change to request files - is there anything else I should expect to see?

You'll also see any requests with a left-handed path be replaced with a exists expression, since left-handed path was deprecated.

danieljharvey · 2025-01-23T11:19:37Z

crates/query-engine/translation/src/translation/helpers.rs

-pub struct RootAndCurrentTables {
-    /// The root (top-most) table in the query.
-    pub root_table: TableSourceAndReference,
+pub struct TableScope {


Does this TableScope refactor still make sense in ndc-models 0.1.0? If so I would really like us to apply it in a separate PR before this one, as it's the source of a lot of changes I can see.

If we could split out the TableScope changes from the ndc-models changes then I think this will become a much clearer atomic change.

TableScope is for v0.2, as it enables supporting named scopes, which replace root column references. We can apply it in a subsequent PR, but some tests won't pass until then

danieljharvey · 2025-01-24T09:07:06Z

crates/query-engine/translation/src/translation/query/sorting.rs

@@ -97,11 +97,15 @@ struct Column(models::FieldName);
 /// An aggregate operation to select from a table used in an order by.
 #[derive(Debug)]
 enum Aggregate {
-    CountStarAggregate,
-    SingleColumnAggregate {
+    StarCount,


These new names are better, but again, this change is arbitrary and just more noise, this kind of thing should just be a tiny gardening PR.

That's fair, the suffix was removed to make the linter happy

danieljharvey · 2025-01-24T09:26:12Z

crates/query-engine/translation/tests/snapshots/tests__aggregate_limit_offset_order_by.snap

@@ -23,7 +24,7 @@ FROM
              COUNT("%2_Invoice"."InvoiceId") AS "InvoiceId_count",
              min("%2_Invoice"."Total") AS "Total__min",
              max("%2_Invoice"."Total") AS "Total__max",
-              sum("%2_Invoice"."Total") AS "Total__sum",
+              coalesce(sum("%2_Invoice"."Total"), 0) AS "Total__sum",


danieljharvey · 2025-01-24T09:29:08Z

crates/tests/tests-common/goldenfiles/select_where_in_column.json

@@ -19,11 +19,8 @@
      "operator": "_in",
      "value": {
        "type": "column",


What is going on with this change?

ComparisonValue::Column no longer uses ComparisonTarget to pick the column. Instead, the necessary column and pathing details are inlined onto the enum variant.

This got flattened

danieljharvey · 2025-01-31T11:17:26Z

crates/query-engine/translation/src/translation/query/filtering.rs

@@ -434,18 +445,17 @@ fn translate_comparison_pathelements(
 /// translate a comparison target.
 fn translate_comparison_target(
    env: &Env,
-    state: &mut State,
-    root_and_current_tables: &RootAndCurrentTables,
+    _state: &mut State,


Remove unused arguments pls.

danieljharvey · 2025-01-31T11:56:42Z

There's a failing e2e test here that seems to have an error returning relationships from a mutation, can we add a test for this here please? (and fix whatever comes up, obv)

danieljharvey · 2025-02-14T21:43:11Z

.github/workflows/e2e-tests.yaml

          trap 'just stop-everything' EXIT
-          RUST_LOG=DEBUG ./crates/postgres/static/run-postgres-tests.sh '.*' "$NDC_POSTGRES_BINARY" "$ENGINE_BINARY"
+          RUST_LOG=DEBUG ./crates/postgres_v0_2/static/run-postgres-tests.sh '.*' "$NDC_POSTGRES_BINARY" "$ENGINE_BINARY"


This now explicitly runs the new e2e tests that use ndc-models 0.2.0

Note we are pointing to a specific sdk revision We should tag a release and point to that

…n does not include a type representation, we infer one based on the scalar type name. We default to JSON representation if we don't recognize the scalar type. The mapping is pulled from the default introspection configuration. This should enable a smooth upgrade, but we may need to publish a new version of the configuration with a mechanism to guarantee type representations, later.

Note! This is a regression with regards to named scopes, which replace the previously supported RootTableColumn. There was technically no way to consume this api from the engine, so this is not a major issue, and will be addressed in an upcoming PR.

Type representations are no longer optional Schema Response now includes a reference to the scalar type to be used for count results. AggregateFunctionDefinition is now an enum, so we map based on function name. Note! We are currently lying by omission about the return types. Postgres aggregates will return NULL if aggregating over no rows, except COUNT. We should have a discussion about wether we want to change aggregate function definitions to reflect this behavior, whether all these scalars will be implicitly nullable, or whether we want to change the SQL using COALESCE to default to some value when no rows are present. Arguably, there's no proper MAX, MIN, or AVG default values. As for SUM, ndc-test expects all SUM return values to be either represented as 64 bit integers or 64 bit floats. Postgres has types like INTERVAL, which is represented as a string, and can be aggregated with SUM. We need to discuss whether any of the above needs to be revisited. We cannot represent intervals as float64 or int64.

…y tables in scope for an exists, instead of only root and current. (#674)  ### What  `ComparisonTarget::RootCollectionColumn` was removed, to be replaced by [named scopes](https://github.com/hasura/ndc-spec/blob/36855ff20dcbd7d129427794aee9746b895390af/rfcs/0015-named-scopes.md). This PR implements the replacement functionality.  ### How  This PR replaces RootAndCurrentTables, with TableScope, a struct that keeps track of the current table and any tables in scope for exists expression. See the accompanying review for details on the code itself.

…(e) operators

…nfer based on scalar type name

…ing whether we can mark aggregate functions

…ning block of mutations

…dels-020

BenoitRanque · 2025-03-17T21:01:09Z

Merged main into this branch

…dels-020

BenoitRanque commented Jan 1, 2025

View reviewed changes

danieljharvey requested a review from a team January 7, 2025 18:47

BenoitRanque commented Jan 14, 2025

View reviewed changes

BenoitRanque force-pushed the benoit/eng-362-update-ndc-postgres-to-ndc_models-020 branch 2 times, most recently from 4234509 to 829886f Compare January 16, 2025 22:11

BenoitRanque commented Jan 17, 2025

View reviewed changes