Commit 36ebed0

KAFKA-10021: Changed Kafka backing stores to use shared admin client to get end offsets and create topics (apache#9780)
The existing `Kafka*BackingStore` classes used by Connect all use `KafkaBasedLog`, which needs to frequently get the end offsets for the internal topic to know whether they are caught up. `KafkaBasedLog` uses its consumer to get the end offsets and to consume the records from the topic.

However, the Connect internal topics are often written very infrequently. This means that when the `KafkaBasedLog` used in the `Kafka*BackingStore` classes is already caught up and its last consumer poll is waiting for new records to appear, the call to the consumer to fetch end offsets will block until the consumer returns, either after a new record is written (unlikely) or after the consumer's `fetch.max.wait.ms` setting (defaults to 500ms) elapses and the consumer returns no more records. In other words, the call to `KafkaBasedLog.readToEnd()` may block for some period of time even though it is already caught up to the end.

Instead, we want `KafkaBasedLog.readToEnd()` to always return quickly when the log is already caught up. The best way to do this is to have the `Kafka*BackingStore` classes use the admin client (rather than the consumer) to fetch end offsets for the internal topic. The consumer and the admin API both use the same `ListOffset` broker API, so the functionality is ultimately the same, but we don't have to block for any ongoing consumer activity.

Each Connect distributed runtime includes three instances of the `Kafka*BackingStore` classes, which means we have three instances of `KafkaBasedLog`. We don't want three instances of the admin client, and should have all three instances of `KafkaBasedLog` share a single admin client instance. In fact, each `Kafka*BackingStore` instance currently creates, uses and closes an admin client instance when it checks and initializes that store's internal topic. If we change the `Kafka*BackingStore` classes to share one admin client instance, we can change that initialization logic to also reuse the supplied admin client instance.

The final challenge is that `KafkaBasedLog` has been used by projects outside of Apache Kafka. While `KafkaBasedLog` is definitely not in the public API for Connect, we can make these changes in ways that are backward compatible: create new constructors and deprecate the old constructors. Connect can be changed to only use the new constructors, and this will give time for any downstream users to make changes.

These changes are implemented as follows:

1. Add a `KafkaBasedLog` constructor that accepts a supplier from which it can get an admin instance, and deprecate the old constructor. We need a supplier rather than just passing an instance because `KafkaBasedLog` is instantiated before Connect starts up, so we need to create the admin instance only when needed. At the same time, we'll change the existing init function parameter from a no-arg function to one that accepts an admin instance as an argument, allowing that init function to reuse the shared admin instance used by the `KafkaBasedLog`. Note: if no admin supplier is provided (as in the deprecated constructor, which is no longer used in AK), the consumer is still used to get the latest offsets.
2. Add to the `Kafka*BackingStore` classes a new constructor with the same parameters plus an admin supplier, and deprecate the old constructor. When each class instantiates its `KafkaBasedLog` instance, it passes the admin supplier and an init function that takes an admin instance.
3. Create a new `SharedTopicAdmin` that lazily creates the `TopicAdmin` (and underlying Admin client) when required, and closes the admin objects when the `SharedTopicAdmin` is closed.
4. Modify the existing `TopicAdmin` (used only in Connect) to encapsulate the logic of fetching end offsets using the admin client, simplifying the logic in `KafkaBasedLog` mentioned in #1 above. Doing this also makes it easier to test that logic.
5. Change `ConnectDistributed` to create a `SharedTopicAdmin` instance (that is `AutoCloseable`) before creating the `Kafka*BackingStore` instances, passing the `SharedTopicAdmin` (which is an admin supplier) to all three `Kafka*BackingStore` objects, and finally always closing the `SharedTopicAdmin` upon termination. (Shutdown of the worker occurs outside of the `ConnectDistributed` code, so modify `DistributedHerder` to take in its constructor additional `AutoCloseable` objects that should be closed when the herder is closed, and then modify `ConnectDistributed` to pass the `SharedTopicAdmin` as one of those `AutoCloseable` instances.)
6. Change `MirrorMaker` similarly to `ConnectDistributed`.
7. Change existing unit tests to no longer use deprecated constructors.
8. Add unit tests for new functionality.

Author: Randall Hauch <[email protected]>
Reviewer: Konstantine Karantasis <[email protected]>
1 parent c488a9b commit 36ebed0

14 files changed, +724 -75 lines changed


connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorMaker.java
+13 -4

@@ -36,6 +36,7 @@
 import org.apache.kafka.connect.connector.policy.AllConnectorClientConfigOverridePolicy;
 import org.apache.kafka.connect.connector.policy.ConnectorClientConfigOverridePolicy;
 
+import org.apache.kafka.connect.util.SharedTopicAdmin;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -233,20 +234,28 @@ private void addHerder(SourceAndTarget sourceAndTarget) {
         plugins.compareAndSwapWithDelegatingLoader();
         DistributedConfig distributedConfig = new DistributedConfig(workerProps);
         String kafkaClusterId = ConnectUtils.lookupKafkaClusterId(distributedConfig);
-        KafkaOffsetBackingStore offsetBackingStore = new KafkaOffsetBackingStore();
+        // Create the admin client to be shared by all backing stores for this herder
+        Map<String, Object> adminProps = new HashMap<>(config.originals());
+        ConnectUtils.addMetricsContextProperties(adminProps, distributedConfig, kafkaClusterId);
+        SharedTopicAdmin sharedAdmin = new SharedTopicAdmin(adminProps);
+        KafkaOffsetBackingStore offsetBackingStore = new KafkaOffsetBackingStore(sharedAdmin);
         offsetBackingStore.configure(distributedConfig);
         Worker worker = new Worker(workerId, time, plugins, distributedConfig, offsetBackingStore, CLIENT_CONFIG_OVERRIDE_POLICY);
         WorkerConfigTransformer configTransformer = worker.configTransformer();
         Converter internalValueConverter = worker.getInternalValueConverter();
-        StatusBackingStore statusBackingStore = new KafkaStatusBackingStore(time, internalValueConverter);
+        StatusBackingStore statusBackingStore = new KafkaStatusBackingStore(time, internalValueConverter, sharedAdmin);
         statusBackingStore.configure(distributedConfig);
         ConfigBackingStore configBackingStore = new KafkaConfigBackingStore(
                 internalValueConverter,
                 distributedConfig,
-                configTransformer);
+                configTransformer,
+                sharedAdmin);
+        // Pass the shared admin to the distributed herder as an additional AutoCloseable object that should be closed when the
+        // herder is stopped. MirrorMaker has multiple herders, and having the herder own the close responsibility is much easier than
+        // tracking the various shared admin objects in this class.
         Herder herder = new DistributedHerder(distributedConfig, time, worker,
                 kafkaClusterId, statusBackingStore, configBackingStore,
-                advertisedUrl, CLIENT_CONFIG_OVERRIDE_POLICY);
+                advertisedUrl, CLIENT_CONFIG_OVERRIDE_POLICY, sharedAdmin);
         herders.put(sourceAndTarget, herder);
     }

connect/runtime/src/main/java/org/apache/kafka/connect/cli/ConnectDistributed.java
+14 -4

@@ -36,12 +36,14 @@
 import org.apache.kafka.connect.storage.KafkaStatusBackingStore;
 import org.apache.kafka.connect.storage.StatusBackingStore;
 import org.apache.kafka.connect.util.ConnectUtils;
+import org.apache.kafka.connect.util.SharedTopicAdmin;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import java.net.URI;
 import java.util.Arrays;
 import java.util.Collections;
+import java.util.HashMap;
 import java.util.Map;
 
 /**
@@ -101,7 +103,12 @@ public Connect startConnect(Map<String, String> workerProps) {
         URI advertisedUrl = rest.advertisedUrl();
         String workerId = advertisedUrl.getHost() + ":" + advertisedUrl.getPort();
 
-        KafkaOffsetBackingStore offsetBackingStore = new KafkaOffsetBackingStore();
+        // Create the admin client to be shared by all backing stores.
+        Map<String, Object> adminProps = new HashMap<>(config.originals());
+        ConnectUtils.addMetricsContextProperties(adminProps, config, kafkaClusterId);
+        SharedTopicAdmin sharedAdmin = new SharedTopicAdmin(adminProps);
+
+        KafkaOffsetBackingStore offsetBackingStore = new KafkaOffsetBackingStore(sharedAdmin);
         offsetBackingStore.configure(config);
 
         ConnectorClientConfigOverridePolicy connectorClientConfigOverridePolicy = plugins.newPlugin(
@@ -112,17 +119,20 @@ public Connect startConnect(Map<String, String> workerProps) {
         WorkerConfigTransformer configTransformer = worker.configTransformer();
 
         Converter internalValueConverter = worker.getInternalValueConverter();
-        StatusBackingStore statusBackingStore = new KafkaStatusBackingStore(time, internalValueConverter);
+        StatusBackingStore statusBackingStore = new KafkaStatusBackingStore(time, internalValueConverter, sharedAdmin);
         statusBackingStore.configure(config);
 
         ConfigBackingStore configBackingStore = new KafkaConfigBackingStore(
                 internalValueConverter,
                 config,
-                configTransformer);
+                configTransformer,
+                sharedAdmin);
 
+        // Pass the shared admin to the distributed herder as an additional AutoCloseable object that should be closed when the
+        // herder is stopped. This is easier than having to track and own the lifecycle ourselves.
         DistributedHerder herder = new DistributedHerder(config, time, worker,
                 kafkaClusterId, statusBackingStore, configBackingStore,
-                advertisedUrl.toString(), connectorClientConfigOverridePolicy);
+                advertisedUrl.toString(), connectorClientConfigOverridePolicy, sharedAdmin);
 
         final Connect connect = new Connect(herder, rest);
         log.info("Kafka Connect distributed worker initialization took {}ms", time.hiResClockMs() - initStart);
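
Review note: the wiring above hands one `SharedTopicAdmin` to all three backing stores, and the commit message describes it as creating the underlying admin lazily and closing it when the holder is closed. The source of `SharedTopicAdmin` is not shown in this excerpt, so the following is only a minimal sketch of that idea using JDK types. `LazyShared`, `FakeAdmin` and `LazySharedDemo` are hypothetical names invented for illustration and are not part of Kafka.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for an expensive client such as Connect's TopicAdmin. */
final class FakeAdmin implements AutoCloseable {
    private boolean closed;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

/** Hypothetical holder: creates the shared instance on first use and closes it when the holder is closed. */
final class LazyShared<T extends AutoCloseable> implements Supplier<T>, AutoCloseable {
    private final Supplier<T> factory;
    private T instance;
    private boolean closed;

    LazyShared(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public synchronized T get() {
        if (closed) {
            throw new IllegalStateException("The shared instance has already been closed");
        }
        if (instance == null) {
            instance = factory.get(); // created only when a caller first needs it
        }
        return instance;
    }

    @Override
    public synchronized void close() {
        closed = true;
        if (instance == null) {
            return; // nothing was ever created, so there is nothing to close
        }
        try {
            instance.close();
        } catch (Exception e) {
            throw new IllegalStateException("Failed to close the shared instance", e);
        } finally {
            instance = null;
        }
    }
}

final class LazySharedDemo {
    public static void main(String[] args) {
        LazyShared<FakeAdmin> shared = new LazyShared<>(FakeAdmin::new);
        // Three users of the supplier all observe the same instance.
        FakeAdmin offsets = shared.get();
        FakeAdmin status = shared.get();
        FakeAdmin configs = shared.get();
        System.out.println("same instance: " + (offsets == status && status == configs));
        shared.close();
        System.out.println("closed: " + offsets.isClosed());
    }
}
```

Because the holder is itself a `Supplier`, it can be passed wherever an admin supplier is expected, and because it is `AutoCloseable`, whoever owns shutdown can close it once for all users.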

connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java
+34 -3

@@ -28,6 +28,7 @@
 import org.apache.kafka.common.utils.LogContext;
 import org.apache.kafka.common.utils.ThreadUtils;
 import org.apache.kafka.common.utils.Time;
+import org.apache.kafka.common.utils.Utils;
 import org.apache.kafka.connect.connector.Connector;
 import org.apache.kafka.connect.connector.policy.ConnectorClientConfigOverridePolicy;
 import org.apache.kafka.connect.errors.AlreadyExistsException;
@@ -67,6 +68,7 @@
 import javax.crypto.SecretKey;
 import javax.ws.rs.core.Response;
 import java.util.ArrayList;
+import java.util.Arrays;
 import java.util.Collection;
 import java.util.Collections;
 import java.util.HashSet;
@@ -139,6 +141,7 @@ public class DistributedHerder extends AbstractHerder implements Runnable {
 
     private final Time time;
     private final HerderMetrics herderMetrics;
+    private final List<AutoCloseable> uponShutdown;
 
     private final String workerGroupId;
     private final int workerSyncTimeoutMs;
@@ -186,16 +189,33 @@ public class DistributedHerder extends AbstractHerder implements Runnable {
 
     private final DistributedConfig config;
 
+    /**
+     * Create a herder that will form a Connect cluster with other {@link DistributedHerder} instances (in this or other JVMs)
+     * that have the same group ID.
+     *
+     * @param config the configuration for the worker; may not be null
+     * @param time the clock to use; may not be null
+     * @param worker the {@link Worker} instance to use; may not be null
+     * @param kafkaClusterId the identifier of the Kafka cluster to use for internal topics; may not be null
+     * @param statusBackingStore the backing store for statuses; may not be null
+     * @param configBackingStore the backing store for connector configurations; may not be null
+     * @param restUrl the URL of this herder's REST API; may not be null
+     * @param connectorClientConfigOverridePolicy the policy specifying the client configuration properties that may be overridden
+     *                                            in connector configurations; may not be null
+     * @param uponShutdown any {@link AutoCloseable} objects that should be closed when this herder is {@link #stop() stopped},
+     *                     after all services and resources owned by this herder are stopped
+     */
     public DistributedHerder(DistributedConfig config,
                              Time time,
                              Worker worker,
                              String kafkaClusterId,
                              StatusBackingStore statusBackingStore,
                              ConfigBackingStore configBackingStore,
                              String restUrl,
-                             ConnectorClientConfigOverridePolicy connectorClientConfigOverridePolicy) {
+                             ConnectorClientConfigOverridePolicy connectorClientConfigOverridePolicy,
+                             AutoCloseable... uponShutdown) {
         this(config, worker, worker.workerId(), kafkaClusterId, statusBackingStore, configBackingStore, null, restUrl, worker.metrics(),
-             time, connectorClientConfigOverridePolicy);
+             time, connectorClientConfigOverridePolicy, uponShutdown);
         configBackingStore.setUpdateListener(new ConfigUpdateListener());
     }
 
@@ -210,7 +230,8 @@ public DistributedHerder(DistributedConfig config,
                       String restUrl,
                       ConnectMetrics metrics,
                       Time time,
-                      ConnectorClientConfigOverridePolicy connectorClientConfigOverridePolicy) {
+                      ConnectorClientConfigOverridePolicy connectorClientConfigOverridePolicy,
+                      AutoCloseable... uponShutdown) {
         super(worker, workerId, kafkaClusterId, statusBackingStore, configBackingStore, connectorClientConfigOverridePolicy);
 
         this.time = time;
@@ -224,6 +245,7 @@ public DistributedHerder(DistributedConfig config,
         this.keySignatureVerificationAlgorithms = config.getList(DistributedConfig.INTER_WORKER_VERIFICATION_ALGORITHMS_CONFIG);
         this.keyGenerator = config.getInternalRequestKeyGenerator();
         this.isTopicTrackingEnabled = config.getBoolean(TOPIC_TRACKING_ENABLE_CONFIG);
+        this.uponShutdown = Arrays.asList(uponShutdown);
 
         String clientIdConfig = config.getString(CommonClientConfigs.CLIENT_ID_CONFIG);
         String clientId = clientIdConfig.length() <= 0 ? "connect-" + CONNECT_CLIENT_ID_SEQUENCE.getAndIncrement() : clientIdConfig;
@@ -677,6 +699,15 @@ public void halt() {
         }
     }
 
+    @Override
+    protected void stopServices() {
+        try {
+            super.stopServices();
+        } finally {
+            this.uponShutdown.forEach(closeable -> Utils.closeQuietly(closeable, closeable != null ? closeable.toString() : "<unknown>"));
+        }
+    }
+
     @Override
     public void stop() {
         log.info("Herder stopping");
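
Review note: the `stopServices()` override above closes every `uponShutdown` object in a `finally` block, after the superclass services are stopped, and does so quietly so that one failing close does not prevent the others. The sketch below reproduces that shape with JDK types only. `BaseService`, `OwningService` and the local `closeQuietly` helper are hypothetical stand-ins, not the Kafka classes or `Utils.closeQuietly` itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical base class standing in for the herder's superclass. */
class BaseService {
    final List<String> events = new ArrayList<>();

    protected void stopServices() {
        events.add("base stopped");
    }
}

/** Hypothetical service that closes extra resources after its own services stop. */
class OwningService extends BaseService {
    private final List<AutoCloseable> uponShutdown;

    OwningService(AutoCloseable... uponShutdown) {
        this.uponShutdown = Arrays.asList(uponShutdown);
    }

    @Override
    protected void stopServices() {
        try {
            super.stopServices();
        } finally {
            // Runs even if super.stopServices() throws; each close is isolated from the others.
            for (AutoCloseable closeable : uponShutdown) {
                closeQuietly(closeable);
            }
        }
    }

    /** Local helper for this sketch: swallow and record failures instead of propagating them. */
    private void closeQuietly(AutoCloseable closeable) {
        if (closeable == null) {
            return;
        }
        try {
            closeable.close();
        } catch (Exception e) {
            events.add("close failed: " + e.getMessage());
        }
    }
}

final class OwningServiceDemo {
    public static void main(String[] args) {
        List<String> closed = new ArrayList<>();
        OwningService service = new OwningService(
                () -> closed.add("shared admin"),
                () -> { throw new IllegalStateException("boom"); },
                () -> closed.add("other resource"));
        service.stopServices();
        System.out.println(service.events);
        System.out.println(closed);
    }
}
```

Taking the resources as varargs keeps existing call sites source-compatible, since callers that pass nothing simply get an empty list.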

connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaConfigBackingStore.java
+21 -17

@@ -62,6 +62,7 @@
 import java.util.concurrent.ExecutionException;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.TimeoutException;
+import java.util.function.Supplier;
 
 /**
  * <p>
@@ -224,6 +225,7 @@ public static String COMMIT_TASKS_KEY(String connectorName) {
     // Connector and task configs: name or id -> config map
     private final Map<String, Map<String, String>> connectorConfigs = new HashMap<>();
     private final Map<ConnectorTaskId, Map<String, String>> taskConfigs = new HashMap<>();
+    private final Supplier<TopicAdmin> topicAdminSupplier;
 
     // Set of connectors where we saw a task commit with an incomplete set of task config updates, indicating the data
     // is in an inconsistent state and we cannot safely use them until they have been refreshed.
@@ -241,11 +243,17 @@ public static String COMMIT_TASKS_KEY(String connectorName) {
 
     private final WorkerConfigTransformer configTransformer;
 
+    @Deprecated
     public KafkaConfigBackingStore(Converter converter, WorkerConfig config, WorkerConfigTransformer configTransformer) {
+        this(converter, config, configTransformer, null);
+    }
+
+    public KafkaConfigBackingStore(Converter converter, WorkerConfig config, WorkerConfigTransformer configTransformer, Supplier<TopicAdmin> adminSupplier) {
         this.lock = new Object();
         this.started = false;
         this.converter = converter;
         this.offset = -1;
+        this.topicAdminSupplier = adminSupplier;
 
         this.topic = config.getString(DistributedConfig.CONFIG_TOPIC_CONFIG);
         if (this.topic == null || this.topic.trim().length() == 0)
@@ -471,6 +479,7 @@ KafkaBasedLog<String, byte[]> setupAndCreateKafkaBasedLog(String topic, final Wo
 
         Map<String, Object> adminProps = new HashMap<>(originals);
         ConnectUtils.addMetricsContextProperties(adminProps, config, clusterId);
+        Supplier<TopicAdmin> adminSupplier = topicAdminSupplier != null ? topicAdminSupplier : () -> new TopicAdmin(adminProps);
         Map<String, Object> topicSettings = config instanceof DistributedConfig
                 ? ((DistributedConfig) config).configStorageTopicSettings()
                 : Collections.emptyMap();
@@ -481,30 +490,25 @@ KafkaBasedLog<String, byte[]> setupAndCreateKafkaBasedLog(String topic, final Wo
                 .replicationFactor(config.getShort(DistributedConfig.CONFIG_STORAGE_REPLICATION_FACTOR_CONFIG))
                 .build();
 
-        return createKafkaBasedLog(topic, producerProps, consumerProps, new ConsumeCallback(), topicDescription, adminProps);
+        return createKafkaBasedLog(topic, producerProps, consumerProps, new ConsumeCallback(), topicDescription, adminSupplier);
     }
 
     private KafkaBasedLog<String, byte[]> createKafkaBasedLog(String topic, Map<String, Object> producerProps,
                                                               Map<String, Object> consumerProps,
                                                               Callback<ConsumerRecord<String, byte[]>> consumedCallback,
-                                                              final NewTopic topicDescription, final Map<String, Object> adminProps) {
-        Runnable createTopics = new Runnable() {
-            @Override
-            public void run() {
-                log.debug("Creating admin client to manage Connect internal config topic");
-                try (TopicAdmin admin = new TopicAdmin(adminProps)) {
-                    // Create the topic if it doesn't exist
-                    Set<String> newTopics = admin.createTopics(topicDescription);
-                    if (!newTopics.contains(topic)) {
-                        // It already existed, so check that the topic cleanup policy is compact only and not delete
-                        log.debug("Using admin client to check cleanup policy of '{}' topic is '{}'", topic, TopicConfig.CLEANUP_POLICY_COMPACT);
-                        admin.verifyTopicCleanupPolicyOnlyCompact(topic,
-                                DistributedConfig.CONFIG_TOPIC_CONFIG, "connector configurations");
-                    }
-                }
+                                                              final NewTopic topicDescription, Supplier<TopicAdmin> adminSupplier) {
+        java.util.function.Consumer<TopicAdmin> createTopics = admin -> {
+            log.debug("Creating admin client to manage Connect internal config topic");
+            // Create the topic if it doesn't exist
+            Set<String> newTopics = admin.createTopics(topicDescription);
+            if (!newTopics.contains(topic)) {
+                // It already existed, so check that the topic cleanup policy is compact only and not delete
+                log.debug("Using admin client to check cleanup policy of '{}' topic is '{}'", topic, TopicConfig.CLEANUP_POLICY_COMPACT);
+                admin.verifyTopicCleanupPolicyOnlyCompact(topic,
+                        DistributedConfig.CONFIG_TOPIC_CONFIG, "connector configurations");
             }
         };
-        return new KafkaBasedLog<>(topic, producerProps, consumerProps, consumedCallback, Time.SYSTEM, createTopics);
+        return new KafkaBasedLog<>(topic, producerProps, consumerProps, adminSupplier, consumedCallback, Time.SYSTEM, createTopics);
     }
 
     @SuppressWarnings("unchecked")
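
Review note: this file combines three moves: a deprecated constructor that delegates with a `null` supplier, a fallback supplier built when none was provided, and an init callback that changes from a no-arg `Runnable` to a `Consumer` receiving the admin. A minimal sketch of that pattern with JDK types follows. `Client` and `Store` are hypothetical names, and the sketch does not model topic creation or the real `KafkaBasedLog` lifecycle.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical client; `origin` records who created it. */
final class Client {
    final String origin;

    Client(String origin) {
        this.origin = origin;
    }
}

/** Hypothetical store showing the backward-compatible constructor pair. */
final class Store {
    private final Supplier<Client> clientSupplier;

    /** Old signature, kept so existing callers still compile. */
    @Deprecated
    Store() {
        this(null);
    }

    /** New signature: callers may pass a shared supplier. */
    Store(Supplier<Client> clientSupplier) {
        this.clientSupplier = clientSupplier;
    }

    /** Runs the initializer against the supplied client, or a private one if no supplier was given. */
    Client initialize(Consumer<Client> initializer) {
        Supplier<Client> supplier = clientSupplier != null
                ? clientSupplier
                : () -> new Client("private");
        Client client = supplier.get();
        initializer.accept(client); // the callback reuses the client instead of creating its own
        return client;
    }
}

final class StoreDemo {
    public static void main(String[] args) {
        Client shared = new Client("shared");
        Store modern = new Store(() -> shared);
        @SuppressWarnings("deprecation")
        Store legacy = new Store();
        System.out.println(modern.initialize(c -> { }).origin);
        System.out.println(legacy.initialize(c -> { }).origin);
    }
}
```

Passing the client into the callback is what lets initialization reuse the shared instance. The old `Runnable` form had no parameter, so it had to create and close its own client.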
