Parquet files with different schemas fail in Databricks loader (close #1085)
We have an issue where we read data from multiple Parquet files with different schemas (an optional column exists only in some of the files).
It generates the following exception in Databricks:
`com.databricks.backend.common.rpc.SparkDriverExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: [MISSING_COLUMN] Column 'unstruct_event_com_lego_3dcatalogue_like_product_1' does not exist. Did you mean one of the following?`
Recreating the issue in a Databricks notebook and testing different options revealed that we had to add FORMAT_OPTIONS with mergeSchema to fix the issue.
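As a hedged illustration (not taken from the original report), the Scala sketch below reproduces the same failure mode at the Spark read level: two Parquet files whose schemas differ by an optional column, read with and without schema merging. The paths and column name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// In a Databricks notebook `spark` already exists; this keeps the sketch self-contained.
val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Two Parquet files whose schemas differ by one optional column (hypothetical paths/names).
Seq("e1").toDF("event_id")
  .write.mode("overwrite").parquet("/tmp/events/a")
Seq(("e2", "like")).toDF("event_id", "unstruct_event_like_product_1")
  .write.mode("overwrite").parquet("/tmp/events/b")

// Without schema merging, Spark may infer the schema from a file that lacks the
// optional column, so selecting it fails with a "column does not exist" error.
spark.read.parquet("/tmp/events/a", "/tmp/events/b")
  .select("unstruct_event_like_product_1") // may throw AnalysisException [MISSING_COLUMN]

// With mergeSchema the per-file schemas are unioned, so the column resolves
// (it is null for rows coming from files that do not contain it).
spark.read.option("mergeSchema", "true")
  .parquet("/tmp/events/a", "/tmp/events/b")
  .select("unstruct_event_like_product_1")
  .show()
```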
modules/databricks-loader/src/main/scala/com/snowplowanalytics/snowplow/loader/databricks/Databricks.scala (1 addition, 0 deletions)
```diff
@@ -111,6 +111,7 @@ object Databricks {
           SELECT $frSelectColumns from '$frPath' $frAuth
         )
         FILEFORMAT = PARQUET
+        FORMAT_OPTIONS('MERGESCHEMA' = 'TRUE')
         COPY_OPTIONS('MERGESCHEMA' = 'TRUE')""";
       case _: Statement.ShreddedCopy =>
         throw new IllegalStateException("Databricks Loader does not support migrations")
```
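For context, here is a hedged sketch of roughly what the rendered statement looks like with this change applied, as one might run it from a notebook. The target table, column list, and source path are hypothetical stand-ins for the interpolated `$frSelectColumns`, `$frPath`, and `$frAuth` values (the auth clause is omitted).

```scala
// Roughly the COPY INTO the loader now issues (hypothetical table/path; auth omitted).
// `spark` is the ambient SparkSession in a Databricks notebook.
val copyInto =
  """COPY INTO snowplow.events
    |FROM (
    |  SELECT app_id, collector_tstamp, unstruct_event_com_lego_3dcatalogue_like_product_1
    |  FROM '/mnt/transformed/run-2022-01-01/'
    |)
    |FILEFORMAT = PARQUET
    |FORMAT_OPTIONS('MERGESCHEMA' = 'TRUE')
    |COPY_OPTIONS('MERGESCHEMA' = 'TRUE')""".stripMargin

spark.sql(copyInto)
```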