-
Notifications
You must be signed in to change notification settings - Fork 114
[controller] Add support for configuring unclean leader election for realtime topics #2443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[controller] Add support for configuring unclean leader election for realtime topics #2443
Conversation
| .put(TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, Integer.toString(minIsrConfig))); | ||
| pubSubTopicConfiguration.getUncleanLeaderElectionEnable() | ||
| .ifPresent( | ||
| uncleanLeaderElection -> topicProperties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting used here. unmarshallProperties is used in the createTopic method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only set if setting is present. This will make sure we use cluster configs by default.
|
Do we need to synchronize this configuration across real time topic and version topic? If not, it might be better clarify the decision in this PR or through code comments as part of the configuration.
|
|
TODO: This PR needs to add the ULE config to the store configuration so we may keep track of which store has this setting enabled/disabled. Currently, the config only shows the setting new RTs will adopt. |
13938a8 to
2f0d39c
Compare
Add uncleanLeaderElectionEnabledForRTTopics as a store-level config using the tri-state pattern (NOT_SPECIFIED/ENABLED/DISABLED). When NOT_SPECIFIED, falls back to the cluster-level config. This enables per-store tracking of ULE settings and preserves the setting during store migration. Changes: - New Avro schema versions (StoreMetaValue v41, AdminOperation v96) - Store interface/impl (Store, ZKStore, ReadOnlyStore, SystemStore, StoreInfo) - UpdateStoreQueryParams with migration constructor support - Controller logic (VeniceParentHelixAdmin, AdminExecutionTask, VeniceHelixAdmin) - RealTimeTopicSwitcher store-level override with cluster-level fallback - resolveUncleanLeaderElection helper for store-then-cluster resolution - Tests for override, fallback, and resolution logic Co-Authored-By: Claude Opus 4.6 <[email protected]>
Problem Statement
Venice currently doesn't provide a way to configure
unclean.leader.election.enablefor Kafka topics. For realtime topics, disabling unclean leader election is important to prevent data loss at the cost of availability. Operators need fine-grained control to disable unclean leader election specifically for RT topics while keeping the cluster default for version topics.Solution
Added new configuration
kafka.unclean.leader.election.enable.rt.topicsthat allows operators to control unclean leader election for realtime topics only:false, prevents data loss by ensuring only in-sync replicas can become leaders for RT topicsImplementation details:
uncleanLeaderElectionEnablefield toPubSubTopicConfigurationApacheKafkaAdminAdapterto marshall/unmarshall the Kafka topic config propertyTopicManagerto accept and propagate the config when creating topicsVeniceHelixAdminandRealTimeTopicSwitcherto read from cluster config and pass to RT topic creationTesting:
Code changes
kafka.unclean.leader.election.enable.rt.topics, default: uses Kafka cluster default (not explicitly set)Concurrency-Specific Checks
Both reviewer and PR author to verify
How was this PR tested?
PubSubTopicConfigurationTest.testUncleanLeaderElectionEnableConfiguration- tests field storage and retrievalApacheKafkaAdminAdapterTest.testUncleanLeaderElectionEnableConfig- tests marshalling/unmarshallingRealTimeTopicSwitcherTest.testUncleanLeaderElectionConfigForRTTopics- tests config propagationTopicManagerE2ETest.testUncleanLeaderElectionConfigForRealtimeTopic- end-to-end test verifying config is correctly applied to actual Kafka topicsRealTimeTopicSwitcherTest.testEnsurePreconditionsto handle new parameterOptional.empty()Does this PR introduce any user-facing or breaking changes?
This is a new optional configuration that defaults to not being set (uses cluster defaults). Existing behavior is unchanged unless operators explicitly configure it.