[server][common][vpj] Introduce ComplexVenicePartitioner to materialized view #1509
Just some early thoughts... did not read the whole PR yet. But hopefully useful in terms of discussing the API changes.
Code change looks good overall.
I do think we need to take care of the comment I just left, which is very tricky as it is a race condition.
I think we need to actively look up the previous value when there is a complex view writer:
https://github.com/linkedin/venice/blob/main/clients/da-vinci-client/src/main/java/com/linkedin/davinci/kafka/consumer/ActiveActiveStoreIngestionTask.java#L458
    if (hasChangeCaptureView) {
      /**
       * Since this function will update the transient cache before writing the view, and if there is
       * a change capture view writer, we need to lookup first, otherwise the transient cache will be populated
       * when writing to the view after this function.
       */
      oldValueProvider.get();
    }
I guess the condition can be: delete op + complex partitioner.
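The ordering concern in this comment can be illustrated with a minimal, self-contained Java sketch. The `Lazy` class, the `transientCache` map, and `processDelete` are hypothetical stand-ins and not Venice's actual classes. The sketch only shows that a lazily evaluated old-value lookup has to be forced before the cache is modified; otherwise a delete leaves nothing for a value-based partitioner to inspect.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class OldValueLookupSketch {
  /** Hypothetical memoizing holder; the supplier runs at most once, on first get(). */
  static final class Lazy<T> {
    private final Supplier<T> supplier;
    private T value;
    private boolean resolved;

    Lazy(Supplier<T> supplier) {
      this.supplier = supplier;
    }

    T get() {
      if (!resolved) {
        value = supplier.get();
        resolved = true;
      }
      return value;
    }
  }

  /** Stand-in for the transient record cache (assumption, for illustration only). */
  public static final Map<String, String> transientCache = new HashMap<>();

  /**
   * Applies a delete to the cache and returns the old value that a view writer
   * would observe afterwards through the lazy provider.
   */
  public static String processDelete(String key, boolean lookupFirst) {
    Lazy<String> oldValueProvider = new Lazy<>(() -> transientCache.get(key));
    if (lookupFirst) {
      // Force the lookup while the cache still holds the previous value.
      oldValueProvider.get();
    }
    transientCache.remove(key); // the delete is applied to the cache here
    // The view writer runs after the cache update and asks for the old value.
    return oldValueProvider.get();
  }

  public static void main(String[] args) {
    transientCache.put("k", "old");
    System.out.println(processDelete("k", true)); // prints "old"

    transientCache.put("k", "old");
    System.out.println(processDelete("k", false)); // prints "null"
  }
}
```

With `lookupFirst` set to false, the provider is resolved only after the delete has reached the cache, so the old value is lost.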
…zed view
The change will not work if the record is actually large and chunked. Proper chunking support is needed and will be addressed in a separate PR.
1. Introduced VeniceComplexPartitioner, which extends VenicePartitioner and offers a new API to partition by value and provide a possible one-to-many partition mapping.
2. Added a value provider of type Lazy<GenericRecord> to VeniceViewWriter's processRecord API to access the deserialized value if needed, e.g. when a VeniceComplexPartitioner is involved.
3. MergeConflictResult will now provide the deserialized value on a best-effort basis. This is useful when we have already deserialized the value for a partial update operation, so that the deserialized value can be provided directly to the materialized view writer.
4. Refactored VeniceWriter to expose an API to write to a desired partition with new DIV. For now this is only used by the new method writeWithComplexPartitioner to handle the partitioning and writes of the same value to multiple partitions. However, this newly exposed API should also come in handy when we build proper chunking support to forward chunks to predetermined view topic partitions.
5. writeWithComplexPartitioner in VeniceWriter will re-chunk when writing to each partition. This should be optimized when we build proper chunking support.
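The one-to-many write described in this commit message could look roughly like the sketch below. The `ValuePartitioner` interface, `writeToAll`, and the `BY_REGION` example are hypothetical names chosen for illustration and do not reflect the signatures in the PR. A plain list of strings stands in for the producer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class FanOutWriteSketch {
  /** Hypothetical value-based partitioner: a record may map to zero or more partitions. */
  public interface ValuePartitioner {
    int[] getPartitionIds(Map<String, Object> value, int numPartitions);
  }

  /**
   * Writes the same key once per partition chosen by the partitioner.
   * Returns the recorded writes as "partition:key" strings instead of producing to a topic.
   */
  public static List<String> writeToAll(
      String key,
      Map<String, Object> value,
      ValuePartitioner partitioner,
      int numPartitions) {
    List<String> writes = new ArrayList<>();
    for (int partition : partitioner.getPartitionIds(value, numPartitions)) {
      writes.add(partition + ":" + key);
    }
    return writes;
  }

  /** Example partitioner: one partition per distinct entry in the value's "regions" list. */
  public static final ValuePartitioner BY_REGION = (value, numPartitions) -> {
    Object regions = value.get("regions");
    if (!(regions instanceof List)) {
      return new int[0]; // no usable field, so the record goes to no partition
    }
    return ((List<?>) regions).stream()
        .mapToInt(r -> Math.floorMod(Objects.hashCode(r), numPartitions))
        .distinct()
        .toArray();
  };

  public static void main(String[] args) {
    Map<String, Object> value = new java.util.HashMap<>();
    value.put("regions", java.util.Arrays.asList("a", "b"));
    System.out.println(writeToAll("k", value, BY_REGION, 4));
  }
}
```

In this sketch each target partition receives an independent write, which mirrors the note that the value is re-chunked per partition until proper chunking support exists.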
Thanks for the change!
[server][common][vpj] Introduce ComplexVenicePartitioner to materialized view
The change will not work if the record is actually large and chunked. Proper chunking support is needed and will be addressed in a separate PR.
Introduced ComplexVenicePartitioner, which extends VenicePartitioner and offers a new API to partition by value and provide a possible one-to-many partition mapping.
Added a value provider of type Lazy<GenericRecord> to VeniceViewWriter's processRecord API to access the deserialized value if needed, e.g. when a ComplexVenicePartitioner is involved.
MergeConflictResultWrapper and WriteComputeResultWrapper will now provide the deserialized value on a best-effort basis. This is useful when we have already deserialized the value for a partial update operation, so that the deserialized value can be provided directly to the materialized view writer.
Refactored VeniceWriter to expose some APIs to child classes. Introduced ComplexVeniceWriter, which extends VeniceWriter. The reasoning is that ComplexVeniceWriter will have different APIs, used in MaterializedViewWriter and CompositeVeniceWriter, to write to materialized view partition(s), potentially involving a ComplexVenicePartitioner. Alternatively, we could push common logic from VeniceWriter to AbstractVeniceWriter. However, ComplexVeniceWriter shares so much logic with VeniceWriter (chunking, DIV support, pubSubAdapter, etc.) that doing so would make AbstractVeniceWriter too specialized and unable to offer the flexibility needed to support something like CompositeVeniceWriter.
Overrode putLargeValue in ComplexVeniceWriter to skip chunking and writing large messages. Once we have proper chunking support, we need to be careful not to re-chunk when writing the same value to different partitions in ComplexVeniceWriter.
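The best-effort value provider described above can be sketched as follows. This is an illustration under stated assumptions: `valueProvider`, `deserialize`, and the call counter are hypothetical, a `String` stands in for `GenericRecord`, and a plain `Supplier` stands in for Venice's `Lazy`. The idea is that an already deserialized value is reused when available, and deserialization otherwise happens at most once and only if a consumer actually asks for the value.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class ValueProviderSketch {
  /** Counts how often the (hypothetical) expensive deserialization actually runs. */
  public static final AtomicInteger deserializeCalls = new AtomicInteger();

  /** Stand-in for an expensive deserialization step. */
  public static String deserialize(byte[] serialized) {
    deserializeCalls.incrementAndGet();
    return new String(serialized, StandardCharsets.UTF_8);
  }

  /**
   * Returns a provider that reuses alreadyDeserialized when it is non-null,
   * and otherwise deserializes lazily and caches the result.
   */
  public static Supplier<String> valueProvider(String alreadyDeserialized, byte[] serialized) {
    if (alreadyDeserialized != null) {
      return () -> alreadyDeserialized; // best effort: the value was produced earlier
    }
    return new Supplier<String>() {
      private String cached;
      private boolean resolved;

      @Override
      public synchronized String get() {
        if (!resolved) {
          cached = deserialize(serialized);
          resolved = true;
        }
        return cached;
      }
    };
  }

  public static void main(String[] args) {
    byte[] bytes = "record".getBytes(StandardCharsets.UTF_8);

    Supplier<String> lazy = valueProvider(null, bytes);
    lazy.get();
    lazy.get();
    System.out.println(deserializeCalls.get()); // prints 1

    Supplier<String> reused = valueProvider("record", bytes);
    reused.get();
    System.out.println(deserializeCalls.get()); // still prints 1
  }
}
```

A view writer that never needs the value, for example one using a key-only partitioner, never calls `get()` and so pays no deserialization cost.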
How was this PR tested?
Added a new integration test with A/A, W/C, and a new value-based test partitioner.
Will add new unit tests once we have some consensus on the API changes.
Does this PR introduce any user-facing changes?