Kafka Connect HTTP Connector

Kafka Connect connector that enables Change Data Capture from JSON/HTTP APIs into Kafka.

This connector is for you if

You want to (live) replicate a dataset exposed through JSON/HTTP API
You want to do so efficiently
You want to capture only changes, not full snapshots
You want to do so via configuration, with no custom coding
You want to be able to extend the connector if it comes to that

Examples

Getting Started

If your Kafka Connect deployment is automated and packaged with Maven, you can unpack the artifact on Kafka Connect plugins folder.

<plugin>
    <artifactId>maven-dependency-plugin</artifactId>
    <execution>
        <id>copy-kafka-connect-plugins</id>
        <phase>prepare-package</phase>
        <goals>
            <goal>unpack</goal>
        </goals>
        <configuration>
            <outputDirectory>${project.build.directory}/docker-build/plugins</outputDirectory>
            <artifactItems>
                <artifactItem>
                    <groupId>com.github.castorm</groupId>
                    <artifactId>kafka-connect-http</artifactId>
                    <version>0.7.6</version>
                    <type>tar.gz</type>
                    <classifier>plugin</classifier>
                </artifactItem>
            </artifactItems>
        </configuration>
    </execution>
</plugin>

Otherwise, you'll have to do it manually by downloading the package from the Releases Page.

More details on how to Install Connectors.

Source Connector

com.github.castorm.kafka.connect.http.HttpSourceConnector

Extension points

The connector can be easily extended by implementing your own version of any of the components below.

These are better understood by looking at the source task implementation:

public List<SourceRecord> poll() throws InterruptedException {

    throttler.throttle(offset.getTimestamp().orElseGet(Instant::now));

    HttpRequest request = requestFactory.createRequest(offset);

    HttpResponse response = requestExecutor.execute(request);

    List<SourceRecord> records = responseParser.parse(response);

    return recordSorter.sort(records).stream()
            .filter(recordFilterFactory.create(offset))
            .collect(toList());
}

public void commitRecord(SourceRecord record) {
    offset = Offset.of(record.sourceOffset(), record.timestamp());
}

Timer: Throttling HttpRequests

Controls the rate at which HTTP requests are executed by informing the task, how long until the next execution is due.

http.timer
public interface Timer extends Configurable {

    Long getRemainingMillis();

    default void reset(Instant lastZero) {
        // Do nothing
    }
}
Type: Class

Default: com.github.castorm.kafka.connect.timer.AdaptableIntervalTimer

Available implementations:

com.github.castorm.kafka.connect.timer.FixedIntervalTimer

com.github.castorm.kafka.connect.timer.AdaptableIntervalTimer

Throttling HttpRequests with FixedIntervalThrottler

Throttles rate of requests based on a fixed interval.

http.timer.interval.millis

Interval in between requests

Type: Long

Default: 60000

Throttling HttpRequests with AdaptableIntervalThrottler

Throttles rate of requests based on a fixed interval. However, it has two modes of operation, with two different intervals:

Up to date: No new records, or they have been created since last poll
Catching up: There were new records in last poll but they were created long ago (longer than interval)

http.timer.interval.millis

Interval in between requests when up-to-date

Type: Long

Default: 60000

http.timer.catchup.interval.millis

Interval in between requests when catching up

Type: Long

Default: 30000

HttpRequestFactory: Creating a HttpRequest

The first thing our connector will need to do is creating a HttpRequest.

http.request.factory
public interface HttpRequestFactory extends Configurable {

    HttpRequest createRequest(Offset offset);
}
Type: Class

Default: com.github.castorm.kafka.connect.http.request.template.TemplateHttpRequestFactory

Available implementations:

com.github.castorm.kafka.connect.http.request.template.TemplateHttpRequestFactory

http.offset.initial

Initial offset, comma separated list of pairs.

Example: property1=value1, property2=value2

Type: String

Default: ""

Creating a HttpRequest with TemplateHttpRequestFactory

This HttpRequestFactory is based on template resolution.

http.request.method

Http method to use in the request.

Type: String

Default: GET

http.request.url

Http url to use in the request.

Required

Type: String

http.request.headers

Http headers to use in the request, , separated list of : separated pairs.

Example: Name: Value, Name2: Value2

Type: String

Default: ""

http.request.params

Http query parameters to use in the request, & separated list of = separated pairs.

Example: name=value & name2=value2

Type: String

Default: ""

http.request.body

Http body to use in the request.

Type: String

Default: ""

http.request.template.factory
public interface TemplateFactory {

    Template create(String template);
}

public interface Template {

    String apply(Offset offset);
}
Class responsible for creating the templates that will be used on every request.

Type: Class

Default: com.github.castorm.kafka.connect.http.request.template.freemarker.BackwardsCompatibleFreeMarkerTemplateFactory

Available implementations:

com.github.castorm.kafka.connect.http.request.template.freemarker.BackwardsCompatibleFreeMarkerTemplateFactory Implementation based on FreeMarker which accepts offset properties without offset namespace (Deprecated)

com.github.castorm.kafka.connect.http.request.template.freemarker.FreeMarkerTemplateFactory Implementation based on FreeMarker

com.github.castorm.kafka.connect.http.request.template.NoTemplateFactory

Creating a HttpRequest with FreeMarkerTemplateFactory

FreeMarker templates will have the following data model available:

offset
- key
- timestamp
- ... (custom offset properties)

Accessing any of the above withing a template can be achieved like this:

http.request.params=after=${offset.timestamp}

For a complete understanding of the features provided by FreeMarker, please, refer to the User Manual

HttpClient: Executing a HttpRequest

Once our HttpRequest is ready, we have to execute it to get some results out of it. That's the purpose of the HttpClient

http.client
public interface HttpClient extends Configurable {

    HttpResponse execute(HttpRequest request) throws IOException;
}
Type: Class

Default: com.github.castorm.kafka.connect.http.client.okhttp.OkHttpClient

Available implementations:

com.github.castorm.kafka.connect.http.client.okhttp.OkHttpClient

Executing a HttpRequest with OkHttpClient

Uses a OkHttp client.

http.client.connection.timeout.millis

Timeout for opening a connection

Type: Long

Default: 2000

http.client.read.timeout.millis

Timeout for reading a response

Type: Long

Default: 2000

http.client.connection.ttl.millis

Time to live for the connection

Type: Long

Default: 300000

http.client.max-idle

Maximum number of idle connections in the connection pool

Type: Integer

Default: 1

HttpAuthenticator: Authenticating a HttpRequest

When executing the request, authentication might be required. The HttpAuthenticator is responsible for resolving the authentication header to be included in the request HttpAuthenticator

http.auth
public interface HttpAuthenticator extends Configurable {

    Optional<String> getAuthorizationHeader();
}
Type: Class

Default: com.github.castorm.kafka.connect.http.auth.ConfigurableHttpAuthenticator

Available implementations:

com.github.castorm.kafka.connect.http.auth.ConfigurableHttpAuthenticator

com.github.castorm.kafka.connect.http.auth.NoneHttpAuthenticator

com.github.castorm.kafka.connect.http.auth.BasicHttpAuthenticator

Authenticating a HttpRequest with ConfigurableHttpAuthenticator

Allows selecting the athentication type via configuration property

http.auth.type

Type of authentication

Type: String

Default: None

Available options:

None

Basic

Authenticating a HttpRequest with BasicHttpAuthenticator

Allows selecting the athentication type via configuration property

http.auth.user

Type: String

Default: ``

http.auth.password

Type: String

Default: ``

HttpResponseParser: Parsing a HttpResponse

Once our HttpRequest has been executed, as a result we'll have to deal with a HttpResponse and translate it into the list of SourceRecords expected by Kafka Connect.

http.response.parser
public interface HttpResponseParser extends Configurable {

    List<SourceRecord> parse(HttpResponse response);
}
Type: Class

Default: com.github.castorm.kafka.connect.http.response.PolicyHttpResponseParser

Available implementations:

com.github.castorm.kafka.connect.http.response.PolicyHttpResponseParser

com.github.castorm.kafka.connect.http.response.KvHttpResponseParser

Parsing a HttpResponse with PolicyHttpResponseParser

Vets the HTTP response deciding whether the response should be processed, skipped or failed. This decision is delegated to a HttpResponsePolicy. When the decision is to process the response, this processing is delegated to a secondary HttpResponseParser.

HttpResponsePolicy: Vetting a HttpResponse

http.response.policy
public interface HttpResponsePolicy extends Configurable {

    HttpResponseOutcome resolve(HttpResponse response);

    enum HttpResponseOutcome {
        PROCESS, SKIP, FAIL
    }
}
Type: Class

Default: com.github.castorm.kafka.connect.http.response.StatusCodeHttpResponsePolicy

Available implementations:

com.github.castorm.kafka.connect.http.response.StatusCodeHttpResponsePolicy

http.response.policy.parser

Type: Class

Default: com.github.castorm.kafka.connect.http.response.KvHttpResponseParser

Available implementations:

com.github.castorm.kafka.connect.http.response.KvHttpResponseParser

Vetting a HttpResponse with StatusCodeHttpResponsePolicy

Does response vetting based on HTTP status codes in the response and the configuration below.

http.response.policy.codes.process

Comma separated list of code ranges that will result in the parser processing the response

Example: 200..205, 207..210

Type: String

Default: 200..299

http.response.policy.codes.skip

Comma separated list of code ranges that will result in the parser skipping the response

Example: 300..305, 307..310

Type: String

Default: 300..399

Parsing a HttpResponse with KvHttpResponseParser

Parses the HTTP response into a key-value SourceRecord. This process is decomposed in two steps:

Parsing the HttpResponse into a KvRecord
Mapping the KvRecord into a SourceRecord

http.response.record.parser
public interface KvRecordHttpResponseParser extends Configurable {

    List<KvRecord> parse(HttpResponse response);
}
Type: Class

Default: com.github.castorm.kafka.connect.http.response.jackson.JacksonKvRecordHttpResponseParser

Available implementations:

com.github.castorm.kafka.connect.http.response.jackson.JacksonKvRecordHttpResponseParser

http.response.record.mapper
public interface KvSourceRecordMapper extends Configurable {

    SourceRecord map(KvRecord record);
}
Type: Class

Default: com.github.castorm.kafka.connect.http.record.SchemedKvSourceRecordMapper

Available implementations:

com.github.castorm.kafka.connect.http.record.SchemedKvSourceRecordMapper Maps key to a Struct schema with a single property key, and value to a Struct schema with a single property value

com.github.castorm.kafka.connect.http.record.StringKvSourceRecordMapper Maps both key and value to a String schema

Parsing a HttpResponse with JacksonKvRecordHttpResponseParser

Uses Jackson to look for the records in the response.

http.response.list.pointer

JsonPointer to the property in the response body containing an array of records

Example: /items

Type: String

Default: /

http.response.record.pointer

JsonPointer to the individual record to be used as kafka record body. Useful when the object we are interested in is under a nested structure

Type: String

Default: /

http.response.record.key.pointer

JsonPointer to the comma separated list of properties that compound, uniquely identify the individual record to be used as key in kafka record key This is especially important on partitioned topics

Example: /id

Type: String

Default: ""

http.response.record.timestamp.pointer

JsonPointer to the timestamp of the individual record to be used as kafka record timestamp This is especially important to track progress, enable latency calculations, improved throttling and feedback to TemplateHttpRequestFactory

Type: String

Default: ""

http.response.record.timestamp.parser

Class responsible for converting the timestamp property captured above into a java.time.Instant.

Type: String

Default: com.github.castorm.kafka.connect.http.response.timestamp.EpochMillisOrDelegateTimestampParser

Available implementations:

com.github.castorm.kafka.connect.http.response.timestamp.EpochMillisTimestampParser Implementation that captures the timestamp as an epoch millis long

com.github.castorm.kafka.connect.http.response.timestamp.EpochMillisOrDelegateTimestampParser Implementation that tries to capture as epoch millis or delegates to another parser in case of failure

com.github.castorm.kafka.connect.http.response.timestamp.DateTimeFormatterTimestampParser Implementation based on based on a DateTimeFormatter

com.github.castorm.kafka.connect.http.response.timestamp.NattyTimestampParser Implementation based on Natty parser

http.response.record.timestamp.parser.pattern

When using DateTimeFormatterTimestampParser, a custom pattern can be specified

Type: String

Default: yyyy-MM-dd'T'HH:mm:ss[.SSS]X

http.response.record.timestamp.parser.zone

Timezone of the timestamp. Accepts ZoneId valid identifiers

Type: String

Default: UTC

http.response.record.offset.pointer

Comma separated list of key=/value pairs where the key is the name of the property in the offset, and the value is the JsonPointer to the value being used as offset for future requests This is the mechanism that enables sharing state in between HttpRequests. HttpRequestFactory implementations receive this Offset.

One of the roles of the offset, even if not required for preparing the next request, is helping in deduplication of already seen records, by providing a sense of progress, assuming consistent ordering. (e.g. even if the response returns some repeated results in between requests because they have the same timestamp, anything prior to the last seen offset will be ignored). see OffsetFilterFactory

Example: id=/itemId

Type: String

Default: ""

Mapping a KvRecord into SourceRecord with SimpleKvSourceRecordMapper

Once we have our KvRecord we have to translate it into what Kafka Connect is expecting: SourceRecords

Embeds the record properties into a common simple envelope to enable schema evolution. This envelope simply contains a key and a value properties with customizable field names.

Here is also where we'll tell Kafka Connect to what topic and on what partition do we want to send our record.

kafka.topic

Name of the topic where the record will be sent to

Required

Type: String

Default: ""

http.record.schema.key.property.name

Name of the key property in the key-value envelope

Type: String

Default: key

http.record.schema.value.property.name

Name of the value property in the key-value envelope

Type: String

Default: value

SourceRecordSorter: Sorting SourceRecords

Some Http resources not designed for CDC, return snapshots with most recent records first. In this cases de-duplication is especially important, as subsequent request are likely to produce similar results. The de-duplication mechanisms offered by this connector are order-dependent, as they are usually based on timestamps.

To enable de-duplication in cases like this, we can instruct the connector to assume a specific order direction, either ASC, DESC, or IMPLICIT, where implicit figures it out based on records' timestamps.

http.record.sorter
public interface SourceRecordSorter extends Configurable {

    List<SourceRecord> sort(List<SourceRecord> records);
}
Type: Class

Default: com.github.castorm.kafka.connect.http.record.OrderDirectionSourceRecordSorter

Available implementations:

com.github.castorm.kafka.connect.http.record.OrderDirectionSourceRecordSorter

http.response.list.order.direction

Order direction of the results in the response list.

Type: Enum { ASC, DESC, IMPLICIT }

Default: IMPLICIT

SourceRecordFilterFactory: Filtering out SourceRecord

There are cases when we'll be interested in filtering out certain records. One of these would be de-duplication.

http.record.filter.factory
public interface SourceRecordFilterFactory extends Configurable {

    Predicate<SourceRecord> create(Offset offset);
}
Type: Class

Default: com.github.castorm.kafka.connect.http.record.OffsetRecordFilterFactory

Available implementations:

com.github.castorm.kafka.connect.http.record.OffsetRecordFilterFactory

com.github.castorm.kafka.connect.http.record.OffsetTimestampRecordFilterFactory

com.github.castorm.kafka.connect.http.record.PassthroughRecordFilterFactory

Filtering out SourceRecord with OffsetTimestampRecordFilterFactory

De-duplicates based on Offset's timestamp, filtering out records with earlier or the same timestamp. Useful when timestamp is used to filter the HTTP resource, but the filter does not have full timestamp precision. Assumptions:

Records are ordered by timestamp
No two records can contain the same timestamp (to whatever precision the HTTP resource uses)

If the latter assumption cannot be satisfied, check OffsetRecordFilterFactory to try and prevents data loss.

Filtering out SourceRecord with OffsetRecordFilterFactory

De-duplicates based on Offset's timestamp, key and any other custom property present in the Offset, filtering out records with earlier timestamps, or when in the same timestamp, only those up to the last seen Offset properties. Useful when timestamp alone is not unique but together with some other Offset property is. Assumptions:

Records are ordered by timestamp
There is an Offset property that uniquely identify records (e.g. key)
There won't be new items preceding already seen ones

Development

Building

mvn package

Running the tests

mvn test

Releasing

Update CHANGELOG.md and README.md files.
Prepare release: mvn release:clean release:prepare

Contributing

Contributions are welcome via pull requests, pending definition of code of conduct.

Versioning

We use SemVer for versioning.

Authors

Cástor Rodríguez - Only contributor so far - castorm

License

This project is licensed under the Apache 2.0 License - see the LICENSE.txt file for details

Built With

Maven - Dependency Management
Kafka Connect - The framework for our connectors
OkHttp - HTTP Client
Jackson - Json deserialization
FreeMarker - Template engine
Natty - Date parser

Acknowledgments

Inspired by llofberg/kafka-connect-rest

Name		Name	Last commit message	Last commit date
Latest commit History 301 Commits
.github		.github
config		config
docs		docs
examples		examples
kafka-connect-http-test		kafka-connect-http-test
kafka-connect-http		kafka-connect-http
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
LICENSE.txt		LICENSE.txt
NOTICE.md		NOTICE.md
README.md		README.md
THIRD-PARTY.txt		THIRD-PARTY.txt
_config.yml		_config.yml
pom.xml		pom.xml

License

euranova/kafka-connect-http

Folders and files

Latest commit

History

Repository files navigation

Kafka Connect HTTP Connector

This connector is for you if

Examples

Getting Started

Source Connector

Extension points

Timer: Throttling HttpRequests

http.timer

Throttling HttpRequests with FixedIntervalThrottler

http.timer.interval.millis

Throttling HttpRequests with AdaptableIntervalThrottler

http.timer.interval.millis

http.timer.catchup.interval.millis

HttpRequestFactory: Creating a HttpRequest

http.request.factory

http.offset.initial

Creating a HttpRequest with TemplateHttpRequestFactory

http.request.method

http.request.url

http.request.headers

http.request.params

http.request.body

http.request.template.factory

Creating a HttpRequest with FreeMarkerTemplateFactory

HttpClient: Executing a HttpRequest

http.client

Executing a HttpRequest with OkHttpClient

http.client.connection.timeout.millis

http.client.read.timeout.millis

http.client.connection.ttl.millis

http.client.max-idle

HttpAuthenticator: Authenticating a HttpRequest

http.auth

Authenticating a HttpRequest with ConfigurableHttpAuthenticator

http.auth.type

Authenticating a HttpRequest with BasicHttpAuthenticator

http.auth.user

http.auth.password

HttpResponseParser: Parsing a HttpResponse

http.response.parser

Parsing a HttpResponse with PolicyHttpResponseParser

HttpResponsePolicy: Vetting a HttpResponse

http.response.policy

http.response.policy.parser

Vetting a HttpResponse with StatusCodeHttpResponsePolicy

http.response.policy.codes.process

http.response.policy.codes.skip

Parsing a HttpResponse with KvHttpResponseParser

http.response.record.parser

http.response.record.mapper

Parsing a HttpResponse with JacksonKvRecordHttpResponseParser

http.response.list.pointer

http.response.record.pointer

http.response.record.key.pointer

http.response.record.timestamp.pointer

http.response.record.timestamp.parser

http.response.record.timestamp.parser.pattern

http.response.record.timestamp.parser.zone

http.response.record.offset.pointer

Mapping a KvRecord into SourceRecord with SimpleKvSourceRecordMapper

kafka.topic

http.record.schema.key.property.name

http.record.schema.value.property.name

SourceRecordSorter: Sorting SourceRecords

http.record.sorter

http.response.list.order.direction

SourceRecordFilterFactory: Filtering out SourceRecord

http.record.filter.factory

Filtering out SourceRecord with OffsetTimestampRecordFilterFactory

Filtering out SourceRecord with OffsetRecordFilterFactory

Development

Building

Running the tests

Releasing

`http.timer`

`http.timer.interval.millis`

`http.timer.interval.millis`

`http.timer.catchup.interval.millis`

`http.request.factory`

`http.offset.initial`

`http.request.method`

`http.request.url`

`http.request.headers`

`http.request.params`

`http.request.body`

`http.request.template.factory`

`http.client`

`http.client.connection.timeout.millis`

`http.client.read.timeout.millis`

`http.client.connection.ttl.millis`

`http.client.max-idle`

`http.auth`

`http.auth.type`

`http.auth.user`

`http.auth.password`

`http.response.parser`

`http.response.policy`

`http.response.policy.parser`

`http.response.policy.codes.process`

`http.response.policy.codes.skip`

`http.response.record.parser`

`http.response.record.mapper`

`http.response.list.pointer`

`http.response.record.pointer`

`http.response.record.key.pointer`

`http.response.record.timestamp.pointer`

`http.response.record.timestamp.parser`

`http.response.record.timestamp.parser.pattern`

`http.response.record.timestamp.parser.zone`

`http.response.record.offset.pointer`

`kafka.topic`

`http.record.schema.key.property.name`

`http.record.schema.value.property.name`

`http.record.sorter`

`http.response.list.order.direction`

`http.record.filter.factory`

Packages