
Commit 86f796a

[FLINK-35808] Let ConsumerConfig.(KEY|VALUE)_DESERIALIZER_CLASS_CONFIG be overridable by user in KafkaSourceBuilder (#108)
## What is the purpose of the change

Let `ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG` and `ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG` be overridable by the user in `KafkaSourceBuilder`, in order to enable the large-message use case discussed in this [mailing list discussion](https://lists.apache.org/thread/spl88o63sjm2dv4l5no0ym632d2yt2o6). This lets users implement the [claim-check large message pattern](https://developer.confluent.io/patterns/event-processing/claim-check/) simply by specifying a `value.deserializer` that resolves the claim check but otherwise passes the bytes through (see the sketch after this description), without bringing any of those concerns into the Flink codebase.

Note: [overriding `value.serializer` is already supported on the producer side.](https://github.com/apache/flink-connector-kafka/blob/15d3fbd4e65dae6c334e2386dd337d2bf423c216/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/sink/KafkaSinkBuilder.java#L82-L83)

Other reading:

- https://www.kai-waehner.de/blog/2020/08/07/apache-kafka-handling-large-messages-and-files-for-image-video-audio-processing/
- https://www.conduktor.io/kafka/how-to-send-large-messages-in-apache-kafka/#Option-1:-using-an-external-store-(GB-size-messages)-0

## Brief change log

- Updates the key and value deserializers to be overridable by users in `KafkaSourceBuilder`

## Verifying this change

- [x] Tests that both the key and value deserializers can be overridden
- [x] Tests that the user-supplied deserializer(s) return bytes (`byte[]`)

## Does this pull request potentially affect one of the following parts?

- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
- The serializers: yes
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: no
- The S3 file system connector: no

## Documentation

- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable
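For illustration, here is a minimal sketch of the kind of pass-through `value.deserializer` this change enables. The class name, the `claim-check://` marker convention, and the `fetchFromObjectStore` helper are hypothetical, not part of this commit; the only contract the builder requires is `Deserializer<byte[]>`:

```java
import org.apache.kafka.common.serialization.Deserializer;

import java.nio.charset.StandardCharsets;

/**
 * Hypothetical claim-check deserializer: resolves references to an external
 * object store, otherwise passes the record bytes through unchanged.
 */
class ClaimCheckDeserializer implements Deserializer<byte[]> {

    private static final String CLAIM_CHECK_PREFIX = "claim-check://";

    @Override
    public byte[] deserialize(String topic, byte[] data) {
        if (data == null) {
            return null;
        }
        String value = new String(data, StandardCharsets.UTF_8);
        if (value.startsWith(CLAIM_CHECK_PREFIX)) {
            // Small marker message: fetch the real payload from the external store.
            return fetchFromObjectStore(value.substring(CLAIM_CHECK_PREFIX.length()));
        }
        // Ordinary message: pass the raw bytes through unchanged.
        return data;
    }

    private byte[] fetchFromObjectStore(String reference) {
        // Placeholder: look the payload up in S3/GCS/... by reference.
        throw new UnsupportedOperationException("wire up an object store client here");
    }
}
```

Paired with the existing producer-side `value.serializer` override, this gives a symmetric claim-check pipeline without any changes to job code.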
1 parent 4429b78 · commit 86f796a

File tree

8 files changed: +114 −17 lines


docs/content.zh/docs/connectors/datastream/kafka.md

Lines changed: 0 additions & 2 deletions

```diff
@@ -222,8 +222,6 @@ Kafka Source supports both streaming and batch execution modes. By default, the KafkaSo…
 The configuration of the Kafka consumer can be found in the [Apache Kafka documentation](http://kafka.apache.org/documentation/#consumerconfigs).
 
 Note that the builder will override the following options even if they are specified:
-- ```key.deserializer``` is always set to ByteArrayDeserializer
-- ```value.deserializer``` is always set to ByteArrayDeserializer
 - ```auto.offset.reset.strategy``` is overridden by OffsetsInitializer#getAutoOffsetResetStrategy()
 - ```partition.discovery.interval.ms``` is overridden to -1 in batch mode
```

docs/content.zh/docs/connectors/table/upsert-kafka.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -136,7 +136,7 @@ of all available metadata fields.
     <td>
     This option can pass arbitrary Kafka configurations. The suffix name must match a configuration key defined in the <a href="https://kafka.apache.org/documentation/#configuration">Kafka configuration documentation</a>.
     Flink automatically removes the "properties." prefix from the option name and passes the transformed key and value to the KafkaClient. For example, you can use <code>'properties.allow.auto.create.topics' = 'false'</code>
-    to disable automatic topic creation. However, some options, such as <code>'key.deserializer'</code> and <code>'value.deserializer'</code>, cannot be passed this way, because Flink will override their values.
+    to disable automatic topic creation. However, some options, such as <code>'auto.offset.reset'</code>, cannot be passed this way, because Flink will override their values.
     </td>
     </tr>
     <tr>
```

docs/content/docs/connectors/datastream/kafka.md

Lines changed: 0 additions & 2 deletions

```diff
@@ -235,8 +235,6 @@ for more details.
 
 Please note that the following keys will be overridden by the builder even if
 it is configured:
-- ```key.deserializer``` is always set to ```ByteArrayDeserializer```
-- ```value.deserializer``` is always set to ```ByteArrayDeserializer```
 - ```auto.offset.reset.strategy``` is overridden by ```OffsetsInitializer#getAutoOffsetResetStrategy()```
 for the starting offsets
 - ```partition.discovery.interval.ms``` is overridden to -1 when
```

docs/content/docs/connectors/table/kafka.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -233,7 +233,7 @@ Connector Options
     <td style="word-wrap: break-word;">(none)</td>
     <td>String</td>
     <td>
-    This can set and pass arbitrary Kafka configurations. Suffix names must match the configuration key defined in <a href="https://kafka.apache.org/documentation/#configuration">Kafka Configuration documentation</a>. Flink will remove the "properties." key prefix and pass the transformed key and values to the underlying KafkaClient. For example, you can disable automatic topic creation via <code>'properties.allow.auto.create.topics' = 'false'</code>. But there are some configurations that do not support to set, because Flink will override them, e.g. <code>'key.deserializer'</code> and <code>'value.deserializer'</code>.
+    This can set and pass arbitrary Kafka configurations. Suffix names must match the configuration key defined in <a href="https://kafka.apache.org/documentation/#configuration">Kafka Configuration documentation</a>. Flink will remove the "properties." key prefix and pass the transformed key and values to the underlying KafkaClient. For example, you can disable automatic topic creation via <code>'properties.allow.auto.create.topics' = 'false'</code>. But there are some configurations that do not support to set, because Flink will override them, e.g. <code>'auto.offset.reset'</code>.
     </td>
     </tr>
     <tr>
```

docs/content/docs/connectors/table/upsert-kafka.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -144,7 +144,7 @@ Connector Options
     <td style="word-wrap: break-word;">(none)</td>
     <td>String</td>
     <td>
-    This can set and pass arbitrary Kafka configurations. Suffix names must match the configuration key defined in <a href="https://kafka.apache.org/documentation/#configuration">Kafka Configuration documentation</a>. Flink will remove the "properties." key prefix and pass the transformed key and values to the underlying KafkaClient. For example, you can disable automatic topic creation via <code>'properties.allow.auto.create.topics' = 'false'</code>. But there are some configurations that do not support to set, because Flink will override them, e.g. <code>'key.deserializer'</code> and <code>'value.deserializer'</code>.
+    This can set and pass arbitrary Kafka configurations. Suffix names must match the configuration key defined in <a href="https://kafka.apache.org/documentation/#configuration">Kafka Configuration documentation</a>. Flink will remove the "properties." key prefix and pass the transformed key and values to the underlying KafkaClient. For example, you can disable automatic topic creation via <code>'properties.allow.auto.create.topics' = 'false'</code>. But there are some configurations that do not support to set, because Flink will override them, e.g. <code>'auto.offset.reset'</code>.
     </td>
     </tr>
     <tr>
```

flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/KafkaSourceBuilder.java

Lines changed: 46 additions & 6 deletions

```diff
@@ -31,9 +31,12 @@
 import org.apache.kafka.clients.consumer.ConsumerConfig;
 import org.apache.kafka.common.TopicPartition;
 import org.apache.kafka.common.serialization.ByteArrayDeserializer;
+import org.apache.kafka.common.serialization.Deserializer;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import java.lang.reflect.ParameterizedType;
+import java.lang.reflect.Type;
 import java.util.Arrays;
 import java.util.Collection;
 import java.util.List;
@@ -379,8 +382,6 @@ public KafkaSourceBuilder<OUT> setRackIdSupplier(SerializableSupplier<String> ra
      * created.
      *
      * <ul>
-     *   <li><code>key.deserializer</code> is always set to {@link ByteArrayDeserializer}.
-     *   <li><code>value.deserializer</code> is always set to {@link ByteArrayDeserializer}.
      *   <li><code>auto.offset.reset.strategy</code> is overridden by {@link
      *       OffsetsInitializer#getAutoOffsetResetStrategy()} for the starting offsets, which is by
      *       default {@link OffsetsInitializer#earliest()}.
@@ -405,8 +406,6 @@ public KafkaSourceBuilder<OUT> setProperty(String key, String value) {
      * created.
      *
      * <ul>
-     *   <li><code>key.deserializer</code> is always set to {@link ByteArrayDeserializer}.
-     *   <li><code>value.deserializer</code> is always set to {@link ByteArrayDeserializer}.
      *   <li><code>auto.offset.reset.strategy</code> is overridden by {@link
      *       OffsetsInitializer#getAutoOffsetResetStrategy()} for the starting offsets, which is by
      *       default {@link OffsetsInitializer#earliest()}.
@@ -457,11 +456,11 @@ private void parseAndSetRequiredProperties() {
         maybeOverride(
                 ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                 ByteArrayDeserializer.class.getName(),
-                true);
+                false);
         maybeOverride(
                 ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                 ByteArrayDeserializer.class.getName(),
-                true);
+                false);
         if (!props.containsKey(ConsumerConfig.GROUP_ID_CONFIG)) {
             LOG.warn(
                     "Offset commit on checkpoint is disabled because {} is not specified",
@@ -534,6 +533,47 @@ private void sanityCheck() {
         if (stoppingOffsetsInitializer instanceof OffsetsInitializerValidator) {
             ((OffsetsInitializerValidator) stoppingOffsetsInitializer).validate(props);
         }
+        if (props.containsKey(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG)) {
+            checkDeserializer(props.getProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG));
+        }
+        if (props.containsKey(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG)) {
+            checkDeserializer(props.getProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG));
+        }
+    }
+
+    private void checkDeserializer(String deserializer) {
+        try {
+            Class<?> deserClass = Class.forName(deserializer);
+            if (!Deserializer.class.isAssignableFrom(deserClass)) {
+                throw new IllegalArgumentException(
+                        String.format(
+                                "Deserializer class %s is not a subclass of %s",
+                                deserializer, Deserializer.class.getName()));
+            }
+
+            // Get the generic type information
+            Type[] interfaces = deserClass.getGenericInterfaces();
+            for (Type iface : interfaces) {
+                if (iface instanceof ParameterizedType) {
+                    ParameterizedType parameterizedType = (ParameterizedType) iface;
+                    Type rawType = parameterizedType.getRawType();
+
+                    // Check if it's Deserializer<byte[]>
+                    if (rawType == Deserializer.class) {
+                        Type[] typeArguments = parameterizedType.getActualTypeArguments();
+                        if (typeArguments.length != 1 || typeArguments[0] != byte[].class) {
+                            throw new IllegalArgumentException(
+                                    String.format(
+                                            "Deserializer class %s does not deserialize byte[]",
+                                            deserializer));
+                        }
+                    }
+                }
+            }
+        } catch (ClassNotFoundException e) {
+            throw new IllegalArgumentException(
+                    String.format("Deserializer class %s not found", deserializer), e);
+        }
     }
 
     private boolean offsetCommitEnabledManually() {
```
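With the override flag flipped to `false`, `maybeOverride` applies `ByteArrayDeserializer` only as a default, so a user-supplied class survives into the consumer properties. A hedged usage sketch — the bootstrap servers, topic, group id, and the `ClaimCheckDeserializer` class from the earlier sketch are placeholders; `setProperty` and `build` are the builder methods shown in the diff above:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;

import org.apache.kafka.clients.consumer.ConsumerConfig;

public class ClaimCheckSourceExample {
    public static void main(String[] args) {
        KafkaSource<String> source =
                KafkaSource.<String>builder()
                        .setBootstrapServers("broker:9092") // placeholder address
                        .setTopics("large-messages") // placeholder topic
                        .setGroupId("claim-check-demo") // placeholder group id
                        // Previously forced to ByteArrayDeserializer; now kept,
                        // subject to the Deserializer<byte[]> check in sanityCheck():
                        .setProperty(
                                ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                                ClaimCheckDeserializer.class.getName())
                        .setValueOnlyDeserializer(new SimpleStringSchema())
                        .build();
    }
}
```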

flink-connector-kafka/src/test/java/org/apache/flink/connector/kafka/source/KafkaSourceBuilderTest.java

Lines changed: 65 additions & 0 deletions

```diff
@@ -27,15 +27,20 @@
 import org.apache.kafka.clients.admin.AdminClient;
 import org.apache.kafka.clients.consumer.ConsumerConfig;
 import org.apache.kafka.common.TopicPartition;
+import org.apache.kafka.common.serialization.ByteArrayDeserializer;
 import org.apache.kafka.common.serialization.StringDeserializer;
 import org.junit.jupiter.api.Test;
 import org.junit.jupiter.api.extension.ExtendWith;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
 
 import java.util.Collections;
 import java.util.HashMap;
 import java.util.Map;
 import java.util.Set;
 import java.util.regex.Pattern;
+import java.util.stream.Stream;
 
 import static org.assertj.core.api.Assertions.assertThat;
 import static org.assertj.core.api.Assertions.assertThatThrownBy;
@@ -191,6 +196,27 @@ public void testSettingCustomKafkaSubscriber() {
                         "Cannot use partitions for consumption because a ExampleCustomSubscriber is already set for consumption.");
     }
 
+    @ParameterizedTest
+    @MethodSource("provideSettingCustomDeserializerTestParameters")
+    public void testSettingCustomDeserializer(String propertyKey, String propertyValue) {
+        final KafkaSource<String> kafkaSource =
+                getBasicBuilder().setProperty(propertyKey, propertyValue).build();
+        assertThat(
+                        kafkaSource
+                                .getConfiguration()
+                                .get(ConfigOptions.key(propertyKey).stringType().noDefaultValue()))
+                .isEqualTo(propertyValue);
+    }
+
+    @ParameterizedTest
+    @MethodSource("provideInvalidCustomDeserializersTestParameters")
+    public void testSettingInvalidCustomDeserializers(
+            String propertyKey, String propertyValue, String expectedError) {
+        assertThatThrownBy(() -> getBasicBuilder().setProperty(propertyKey, propertyValue).build())
+                .isInstanceOf(IllegalArgumentException.class)
+                .hasMessageContaining(expectedError);
+    }
+
     private KafkaSourceBuilder<String> getBasicBuilder() {
         return new KafkaSourceBuilder<String>()
                 .setBootstrapServers("testServer")
@@ -206,4 +232,43 @@ public Set<TopicPartition> getSubscribedTopicPartitions(AdminClient adminClient)
             return Collections.singleton(new TopicPartition("topic", 0));
         }
     }
+
+    private static Stream<Arguments> provideSettingCustomDeserializerTestParameters() {
+        return Stream.of(
+                Arguments.of(
+                        ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
+                        TestByteArrayDeserializer.class.getName()),
+                Arguments.of(
+                        ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
+                        TestByteArrayDeserializer.class.getName()));
+    }
+
+    private static Stream<Arguments> provideInvalidCustomDeserializersTestParameters() {
+        String deserOne = String.class.getName();
+        String deserTwo = "NoneExistentClass";
+        String deserThree = StringDeserializer.class.getName();
+        return Stream.of(
+                Arguments.of(
+                        ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
+                        deserOne,
+                        String.format(
+                                "Deserializer class %s is not a subclass of org.apache.kafka.common.serialization.Deserializer",
+                                deserOne)),
+                Arguments.of(
+                        ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
+                        deserTwo,
+                        String.format("Deserializer class %s not found", deserTwo)),
+                Arguments.of(
+                        ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
+                        deserThree,
+                        String.format(
+                                "Deserializer class %s does not deserialize byte[]", deserThree)),
+                Arguments.of(
+                        ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
+                        deserThree,
+                        String.format(
+                                "Deserializer class %s does not deserialize byte[]", deserThree)));
+    }
+
+    private class TestByteArrayDeserializer extends ByteArrayDeserializer {}
 }
```
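A side note on the generic-type check these tests exercise: `checkDeserializer` inspects only the parameterized interfaces declared directly on the class, so there are two common ways to pass it. The class names below are illustrative, not part of this commit:

```java
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.Deserializer;

// 1. Declare Deserializer<byte[]> directly: the check sees the byte[] type
//    argument on the declared interface and accepts the class.
class PassThroughDeserializer implements Deserializer<byte[]> {
    @Override
    public byte[] deserialize(String topic, byte[] data) {
        return data;
    }
}

// 2. Extend ByteArrayDeserializer, like TestByteArrayDeserializer above: the
//    subclass declares no parameterized interface of its own, so the loop in
//    checkDeserializer finds nothing to reject and the class is accepted.
class WrappingDeserializer extends ByteArrayDeserializer {}
```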

flink-python/pyflink/datastream/connectors/kafka.py

Lines changed: 0 additions & 4 deletions

```diff
@@ -629,8 +629,6 @@ def set_property(self, key: str, value: str) -> 'KafkaSourceBuilder':
         Note that the following keys will be overridden by the builder when the KafkaSource is
         created.
 
-        * ``key.deserializer`` is always set to ByteArrayDeserializer.
-        * ``value.deserializer`` is always set to ByteArrayDeserializer.
         * ``auto.offset.reset.strategy`` is overridden by AutoOffsetResetStrategy returned by
           :class:`KafkaOffsetsInitializer` for the starting offsets, which is by default
           :meth:`KafkaOffsetsInitializer.earliest`.
@@ -652,8 +650,6 @@ def set_properties(self, props: Dict) -> 'KafkaSourceBuilder':
         Note that the following keys will be overridden by the builder when the KafkaSource is
         created.
 
-        * ``key.deserializer`` is always set to ByteArrayDeserializer.
-        * ``value.deserializer`` is always set to ByteArrayDeserializer.
         * ``auto.offset.reset.strategy`` is overridden by AutoOffsetResetStrategy returned by
           :class:`KafkaOffsetsInitializer` for the starting offsets, which is by default
           :meth:`KafkaOffsetsInitializer.earliest`.
```
