[server][common][vpj] Introduce ComplexVenicePartitioner to materialized view #1509

Merged
merged 5 commits into from
Feb 22, 2025

Conversation

xunyin8
Contributor

@xunyin8 xunyin8 commented Feb 7, 2025

[server][common][vpj] Introduce ComplexVenicePartitioner to materialized view

This change does not work for records that are large enough to be chunked. Proper chunking support is needed and will be addressed in a separate PR.

  1. Introduced ComplexVenicePartitioner, which extends VenicePartitioner and offers a new API to partition by value and to support a possible one-to-many partition mapping.

  2. Added a value provider of type Lazy<GenericRecord> to VeniceViewWriter's processRecord API so that the deserialized value can be accessed if needed, e.g. when a ComplexVenicePartitioner is involved.

  3. MergeConflictResultWrapper and WriteComputeResultWrapper will now provide the deserialized value on a best-effort basis. This is useful when we have already deserialized the value for a partial update operation, so that the deserialized value can be provided directly to the materialized view writer.

  4. Refactored VeniceWriter to expose some APIs to child classes, and introduced ComplexVeniceWriter, which extends VeniceWriter. The reasoning is that ComplexVeniceWriter will have different APIs, used in MaterializedViewWriter and CompositeVeniceWriter, for writing to materialized view partition(s), potentially involving a ComplexVenicePartitioner. Alternatively, we could push common logic from VeniceWriter into AbstractVeniceWriter. However, ComplexVeniceWriter shares so much logic with VeniceWriter (chunking, DIV support, pubSubAdapter, etc.) that doing so would make AbstractVeniceWriter too specialized to offer the flexibility needed to support something like CompositeVeniceWriter.

  5. Overrode putLargeValue in ComplexVeniceWriter to skip chunking and writing of large messages. Once we have proper chunking support, we need to be careful not to re-chunk when writing the same value to different partitions in ComplexVeniceWriter.
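
For readers unfamiliar with the idea in item 1, here is a minimal, self-contained sketch of value-based partitioning with a one-to-many mapping. It is not the API added in this PR: the interface `ValueBasedPartitioner`, the class `RegionFanOutPartitioner`, the method name `getPartitionIds`, and the `regions` field are all hypothetical, and a plain `Map` stands in for the deserialized Avro record.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for a value-based partitioner; not the Venice API. */
interface ValueBasedPartitioner {
  /** Returns zero or more target partitions for a record, based on key and value. */
  int[] getPartitionIds(byte[] keyBytes, Map<String, Object> value, int numPartitions);
}

/** Routes a record to one partition per entry in its (assumed) "regions" list field. */
final class RegionFanOutPartitioner implements ValueBasedPartitioner {
  @Override
  public int[] getPartitionIds(byte[] keyBytes, Map<String, Object> value, int numPartitions) {
    Object regions = value == null ? null : value.get("regions");
    if (!(regions instanceof List)) {
      // No usable value field: fall back to plain key-based partitioning.
      return new int[] { Math.floorMod(Arrays.hashCode(keyBytes), numPartitions) };
    }
    // One candidate partition per region; duplicates are collapsed so the
    // same partition is never written twice for one record. An empty list
    // yields an empty array, i.e. the record is written to no partition.
    return ((List<?>) regions).stream()
        .mapToInt(r -> Math.floorMod(Objects.hashCode(r), numPartitions))
        .distinct()
        .sorted()
        .toArray();
  }
}
```

The point of the sketch is only that the return type is a set of partitions rather than a single id, which is why the writer side needs the deserialized value and a way to write one record to several partitions.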

How was this PR tested?

Added a new integration test with A/A, W/C and a new value-based test partitioner.
Will add new unit tests once we have some consensus on the API changes.

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to explain your proposed changes and call out the behavior change.

Contributor

@FelixGV FelixGV left a comment

Just some early thoughts... did not read the whole PR yet. But hopefully useful in terms of discussing the API changes.

@xunyin8 xunyin8 force-pushed the value-based-partitioner branch from 0d80a11 to fa6c001 Compare February 13, 2025 07:18
@xunyin8 xunyin8 changed the title [server][common][vpj] Introduce VeniceComplexPartitioner to materialized view [server][common][vpj] Introduce ComplexVenicePartitioner to materialized view Feb 13, 2025
@xunyin8 xunyin8 force-pushed the value-based-partitioner branch from fa6c001 to fee9bc7 Compare February 13, 2025 07:30
@xunyin8 xunyin8 force-pushed the value-based-partitioner branch from fee9bc7 to ca52bc5 Compare February 18, 2025 07:23
Contributor

@gaojieliu gaojieliu left a comment

Code change looks good overall.
I do think we need to take care of the comment I just left, which is very tricky as it is a race condition.

@xunyin8 xunyin8 force-pushed the value-based-partitioner branch 3 times, most recently from d17e50b to 942ee52 Compare February 21, 2025 07:21
Contributor

@gaojieliu gaojieliu left a comment

I think we need to actively look up the previous value when there is a complex view writer:
https://github.com/linkedin/venice/blob/main/clients/da-vinci-client/src/main/java/com/linkedin/davinci/kafka/consumer/ActiveActiveStoreIngestionTask.java#L458

    if (hasChangeCaptureView) {
      /**
       * Since this function will update the transient cache before writing the view, and if there is
       * a change capture view writer, we need to lookup first, otherwise the transient cache will be populated
       * when writing to the view after this function.
       */
      oldValueProvider.get();
    }

I guess the condition here can be: delete operation plus a complex partitioner.
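
To illustrate why the ordering matters, here is a small standalone sketch of the hazard, not the ingestion code itself. `MemoizedLazy` is a hypothetical stand-in for the lazy old-value provider, and a `HashMap` stands in for the transient cache. The assumption is that the view writer evaluates the provider only after the cache has already been updated for a delete.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical memoizing supplier: the loader runs at most once, on first get(). */
final class MemoizedLazy<T> implements Supplier<T> {
  private final Supplier<T> loader;
  private boolean loaded;
  private T value;

  MemoizedLazy(Supplier<T> loader) {
    this.loader = loader;
  }

  @Override
  public synchronized T get() {
    if (!loaded) {
      value = loader.get();
      loaded = true;
    }
    return value;
  }
}

final class OldValueLookupDemo {
  /** Returns the "old value" a view writer would observe after a delete is applied. */
  static String valueSeenByViewWriter(boolean forceLookupFirst) {
    Map<String, String> transientCache = new HashMap<>();
    transientCache.put("k", "old");
    MemoizedLazy<String> oldValueProvider = new MemoizedLazy<>(() -> transientCache.get("k"));

    if (forceLookupFirst) {
      // Resolve the old value while the cache still holds it.
      oldValueProvider.get();
    }
    // The delete updates the cache before the view is written.
    transientCache.remove("k");
    // The view writer evaluates the provider only now.
    return oldValueProvider.get();
  }
}
```

With `forceLookupFirst` set, the view writer still sees the pre-delete value. Without it, the lookup happens after the cache update and the old value is gone, which would leave a value-based partitioner with nothing to partition by on a delete.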

…zed view

The change will not work if record is actually large and chunked. Proper chunking
support is needed and will be addressed in a separate PR.

1. Introduced VeniceComplexPartitioner which extends VenicePartitioner and offers
a new API to partition by value and provide possible one-to-many partition mapping.

2. Added value provider of type Lazy<GenericRecord> to VeniceViewWriter's processRecord
API to access deserialized value if needed. e.g. when a VeniceComplexPartitioner is
involved.

3. MergeConflictResult will now provide deserialized value in a best effort manner.
This is useful when we already deserialized the value for a partial update operation
so that the deserialized value can be provided directly to the materialized view writer.

4. Refactored VeniceWriter to expose an API to write to desired partition with new
DIV. This is only used by the new method writeWithComplexPartitioner for now to handle
the partitioning and writes of the same value to multiple partitions. However, this newly
exposed API should also come in handy when we build proper chunking support to forward chunks
to predetermined view topic partitions.

5. writeWithComplexPartitioner in VeniceWriter will re-chunk when writing to each partition.
This should be optimized when we build proper chunking support.
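
As a rough illustration of the optimization hinted at in item 5, the sketch below chunks a payload once and hands the same chunk list to every target partition. It is only one possible approach, not the planned design: `ChunkOnceFanOut`, `chunk`, and `write` are hypothetical names, and a map of partition id to chunk list stands in for the actual produce calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: chunk a large value once, then fan out to partitions. */
final class ChunkOnceFanOut {
  /** Splits a payload into consecutive chunks of at most chunkSize bytes. */
  static List<byte[]> chunk(byte[] payload, int chunkSize) {
    List<byte[]> chunks = new ArrayList<>();
    for (int off = 0; off < payload.length; off += chunkSize) {
      chunks.add(Arrays.copyOfRange(payload, off, Math.min(payload.length, off + chunkSize)));
    }
    return chunks;
  }

  /** "Writes" the payload to each partition, reusing one chunk list for all of them. */
  static Map<Integer, List<byte[]>> write(byte[] payload, int chunkSize, int[] partitions) {
    List<byte[]> chunks = chunk(payload, chunkSize); // chunking happens exactly once
    Map<Integer, List<byte[]>> written = new HashMap<>();
    for (int partition : partitions) {
      written.put(partition, chunks); // same chunks forwarded, no re-chunking
    }
    return written;
  }
}
```

The per-partition cost then reduces to forwarding already-built chunks rather than repeating the split for every partition.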
@xunyin8 xunyin8 force-pushed the value-based-partitioner branch from 942ee52 to 23eaeb8 Compare February 21, 2025 21:08
Contributor

@gaojieliu gaojieliu left a comment

Thanks for the change!

@xunyin8 xunyin8 merged commit 2f3a731 into linkedin:main Feb 22, 2025
58 checks passed
@xunyin8 xunyin8 deleted the value-based-partitioner branch February 22, 2025 01:12
4 participants