CCLOG-2401: Support for hierarchical ORC data and logical types #651

Merged
merged 11 commits into from
Feb 23, 2023

snehashisp
Member

Problem

ORC support in the current connector has several limitations.

  • Support for ORC is currently limited to non-hierarchical (flattened) struct data. Nested structs are not properly supported.
  • Logical types (Date, Timestamp, Time, Decimal) are not inferred as their corresponding Hive types. Instead they are inferred as their base types (int32, int64 and bytes[]).
  • Arrays and maps were written using ArrayPrimitiveWritable and MapWritable respectively, which failed because OrcStruct requires native Java List and Map types for array and map data. The conversion of arrays and maps also did not support struct subtypes or nested arrays.
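
To make the logical-type limitation concrete, here is a minimal sketch of the kind of mapping that is missing. It is not the connector's code: `HiveTypeSketch`, `hiveType`, and the fixed precision of 38 are assumptions made for illustration, and base types are passed in as Hive type names. Only the Connect logical type names and the Decimal `scale` parameter come from Kafka Connect itself. Time is left out because this PR does not state which Hive type it should map to.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not the connector's OrcUtil.
final class HiveTypeSketch {
    // Logical type names as defined by Kafka Connect's data classes.
    static final String DATE = "org.apache.kafka.connect.data.Date";
    static final String TIMESTAMP = "org.apache.kafka.connect.data.Timestamp";
    static final String DECIMAL = "org.apache.kafka.connect.data.Decimal";

    // Assumption: Connect's Decimal carries only a scale, so this sketch
    // falls back to Hive's maximum decimal precision.
    static final int ASSUMED_PRECISION = 38;

    // Returns a Hive type name. Without a logical name the base type is
    // returned unchanged, which is the behaviour the PR describes as the bug
    // when it is applied to logical types too.
    static String hiveType(String logicalName, String baseType, Map<String, String> params) {
        if (logicalName == null) {
            return baseType;
        }
        switch (logicalName) {
            case DATE:
                return "date";
            case TIMESTAMP:
                return "timestamp";
            case DECIMAL:
                String scale = params.getOrDefault("scale", "0");
                return "decimal(" + ASSUMED_PRECISION + "," + scale + ")";
            default:
                // Unknown logical types keep their base type in this sketch.
                return baseType;
        }
    }

    public static void main(String[] args) {
        System.out.println(hiveType(DATE, "int", Map.of()));
        System.out.println(hiveType(DECIMAL, "binary", Map.of("scale", "2")));
        System.out.println(hiveType(null, "bigint", Map.of()));
    }
}
```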

Solution

Some of these problems were identified, and a solution proposed, in #636. This PR extends that work.

  • OrcUtil is refactored and improved to support hierarchical ORC data.
  • Support for parsing logical types correctly is added in common and will be pulled in here once that is merged. The changes required in OrcUtil to write logical types are included here.
  • Arrays and maps are now written correctly.
  • Unit tests and integration tests are added.
  • OrcFileReader is updated so that it can recover the full hierarchical Connect schema.
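
The recursive shape of such a conversion can be sketched as follows. This is a simplified stand-in that uses only the Java standard library, not the actual OrcUtil code: `NestedConvertSketch`, `Row`, and `Leaf` are hypothetical names, where `Leaf` plays the role of a wrapped primitive value and `Row` plays the role of a struct. The point it illustrates is that containers stay native Java `List` and `Map` instances at every depth while their elements, including nested arrays and structs, are converted recursively.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; the real connector works with ORC and Connect types.
final class NestedConvertSketch {
    // Stand-in for a wrapped primitive value.
    static final class Leaf {
        final Object value;
        Leaf(Object value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof Leaf && Objects.equals(value, ((Leaf) o).value);
        }
        @Override public int hashCode() { return Objects.hashCode(value); }
        @Override public String toString() { return "Leaf(" + value + ")"; }
    }

    // Stand-in for a struct: an ordered list of field values.
    static final class Row {
        final List<Object> fields;
        Row(List<Object> fields) { this.fields = fields; }
        @Override public String toString() { return "Row" + fields; }
    }

    // Converts a value recursively. Lists and maps remain native Java
    // collections; structs and nested arrays are handled at any depth.
    static Object convert(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof Row) {
            List<Object> out = new ArrayList<>();
            for (Object field : ((Row) value).fields) {
                out.add(convert(field));
            }
            return new Row(out);
        }
        if (value instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object element : (List<?>) value) {
                out.add(convert(element));
            }
            return out;
        }
        if (value instanceof Map) {
            Map<Object, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                out.put(convert(e.getKey()), convert(e.getValue()));
            }
            return out;
        }
        return new Leaf(value);
    }

    public static void main(String[] args) {
        Object nested = List.of(List.of(1, 2), new Row(List.of("a", Map.of("k", 3))));
        System.out.println(convert(nested));
    }
}
```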
Does this solution apply anywhere else?
  • yes
  • no
If yes, where?

Test Strategy

Testing done:
  • Unit tests
  • Integration tests
  • System tests
  • Manual tests

Release Plan

@snehashisp snehashisp requested a review from a team as a code owner February 6, 2023 08:12
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ snehashisp
❌ jsims-slower

Member

@sudeshwasnik sudeshwasnik left a comment

lgtm! thanks @snehashisp !

@snehashisp snehashisp merged commit 00f47be into master Feb 23, 2023
@snehashisp snehashisp deleted the log-types branch February 23, 2023 04:54
4 participants