Use full type name when checking max schema key #1373

Merged: 1 commit merged into develop on Jan 22, 2025
Conversation

@pondzix pondzix (Contributor) commented Jan 21, 2025

Scenario:

* The input batch contains data using a schema, let's say link_click
* The link_click schema is used as a context AND as an entity/self-describing event
* We have multiple versions of the link_click schema

When translated into the content of the shredding_complete JSON message, it would contain lines like this:
```
      "types": [
        {
          "schemaKey": "iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-0",
          "snowplowEntity": "SELF_DESCRIBING_EVENT"
        },
        ....
        {
          "schemaKey": "iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1",
          "snowplowEntity": "CONTEXT"
        }
      ]
```

In such a scenario it looks like we skip the necessary warehouse migration for the self-describing column. We only execute the migration for the context:

```
INFO Migration: Migrating contexts_com_snowplowanalytics_snowplow_link_click_1 AddColumn(Fragment("ALTER TABLE atomic.events ADD COLUMN contexts_com_snowplowanalytics_snowplow_link_click_1 ARRAY"),List()) (pre-transaction)
```

but never for unstruct_event_com_snowplowanalytics_snowplow_link_click_1, which results in an error when inserting data into the table:

```
ERROR Error executing transaction. Sleeping for 30 seconds for the first time
net.snowflake.client.jdbc.SnowflakeSQLException: SQL compilation error: error line 1 at position 1,779
invalid identifier 'UNSTRUCT_EVENT_COM_SNOWPLOWANALYTICS_SNOWPLOW_LINK_CLICK_1'
```

It seems to be caused by this [line](https://github.com/snowplow/snowplow-rdb-loader/blob/fffcbe460d7960714116aa7fe606a5ffbb4fd31d/modules/loader/src/main/scala/com/snowplowanalytics/snowplow/rdbloader/discovery/DataDiscovery.scala#L213), where we group incoming types by name and find the max schema key per group. But the name doesn't contain the unstruct/context prefix, so it's possible to "lose" a type when a schema is used both as a context and as an unstruct event, with different versions.

I think the solution here could be to use the full type name instead of the simple short name. The full name takes into account whether the type is a context or an unstruct event, so types are still grouped, but with the additional prefix. A minimal sketch of the idea follows below.
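
To make the grouping issue concrete, here is a minimal, self-contained Scala sketch. It is a hypothetical model (the case classes, the `fullName` helper and the column prefixes are illustrative, not the loader's actual `DataDiscovery` code): grouping by the short schema name collapses both usages of `link_click` into one group and drops one of them, while grouping by a prefixed full name keeps both.

```
// Hypothetical, simplified model of the grouping step (not the loader's actual types).
object FullNameGroupingSketch extends App {

  final case class SchemaKey(vendor: String, name: String, version: String)

  sealed trait SnowplowEntity
  case object SelfDescribingEvent extends SnowplowEntity
  case object Context extends SnowplowEntity

  final case class ShreddedType(schemaKey: SchemaKey, entity: SnowplowEntity)

  val types: List[ShreddedType] = List(
    ShreddedType(SchemaKey("com.snowplowanalytics.snowplow", "link_click", "1-0-0"), SelfDescribingEvent),
    ShreddedType(SchemaKey("com.snowplowanalytics.snowplow", "link_click", "1-0-1"), Context)
  )

  // Current behaviour (simplified): group by the short schema name only.
  // Both usages of link_click fall into a single "link_click" group, so taking the
  // max version per group keeps one entry and the SELF_DESCRIBING_EVENT usage is lost.
  // (Version comparison is reduced to lexicographic ordering for this sketch.)
  val byShortName: List[ShreddedType] =
    types
      .groupBy(_.schemaKey.name)
      .values
      .map(_.maxBy(_.schemaKey.version))
      .toList

  // Proposed behaviour (simplified): group by a "full" name that carries the
  // unstruct_event_ / contexts_ prefix, so the two usages stay in separate groups
  // and both column migrations get planned.
  def fullName(t: ShreddedType): String = {
    val prefix = t.entity match {
      case SelfDescribingEvent => "unstruct_event_"
      case Context             => "contexts_"
    }
    prefix + t.schemaKey.vendor.replace('.', '_') + "_" + t.schemaKey.name
  }

  val byFullName: List[ShreddedType] =
    types
      .groupBy(fullName)
      .values
      .map(_.maxBy(_.schemaKey.version))
      .toList

  println(s"grouped by short name: ${byShortName.size} type(s) kept") // 1 -> a type is lost
  println(s"grouped by full name:  ${byFullName.size} type(s) kept")  // 2 -> both usages kept
}
```
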
@pondzix pondzix force-pushed the fix/full_type_name branch from 030eabd to d95b3e3 on January 22, 2025 10:51
@pondzix pondzix merged commit d95b3e3 into develop Jan 22, 2025
2 checks passed