Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AmazonS3MoveCleanupPolicy doesn't cleanup files during KafkaConnect workers restart #671

Open
arnitolog opened this issue Sep 23, 2024 · 1 comment

Comments

@arnitolog
Copy link

Hello,
I have KafkaConnect with 2 workers and FilePulse connector with 4 tasks. During the rolling restart of the workers, connector stops cleaning up files from S3. At the same time once the restart is finished, new files are processed and cleaned up successfully.
All not-cleaned files have the status "COMMITTED" in the status topic.
here is my connector config:

    aws.s3.region: "us-east-1"
    aws.s3.bucket.name: "test-xxxxx-data-us-east-1"
    aws.s3.bucket.prefix: "emails/"

    topic: "test-xxxxx-data"
    fs.listing.class: io.streamthoughts.kafka.connect.filepulse.fs.AmazonS3FileSystemListing
    fs.listing.interval.ms: 5000

    fs.cleanup.policy.class: io.streamthoughts.kafka.connect.filepulse.fs.clean.AmazonS3MoveCleanupPolicy
    fs.cleanup.policy.move.success.aws.prefix.path: "success"
    fs.cleanup.policy.move.failure.aws.prefix.path: "failure"

    allow.tasks.reconfiguration.after.timeout.ms: 120000
    tasks.reader.class: io.streamthoughts.kafka.connect.filepulse.fs.reader.AmazonS3BytesArrayInputReader
    tasks.file.status.storage.bootstrap.servers: kafka-cluster-kafka-bootstrap:9093
    tasks.file.status.storage.topic: test-xxxxx-data-status-internal
    tasks.file.status.storage.topic.partitions: 10
    tasks.file.status.storage.topic.replication.factor: 3

There are no errors in the logs.
Just a thought, is it possible/makes sense to check file status by AmazonS3MoveCleanupPolicy retrospectively and if it has "COMMITTED" status clean it up?

@arnitolog arnitolog changed the title AmazonS3MoveCleanupPolicy doesn't cleanup after restart AmazonS3MoveCleanupPolicy doesn't cleanup files during KafkaConnect workers restart Sep 23, 2024
@fhussonnois
Copy link
Member

Hey @arnitolog , I know it's old question and I don't know if you succeed to find a solution for your problem but here are some details about the cleanup policy.

The cleanup process is handle by the connector thread. Files can be cleanup either on the COMPLETED (default) or COMMITTED status depending on the value of the fs.cleanup.policy.triggered.on property.

However, you may have encountered a bug. If the connector thread is killed or restarted while the file status changes from COMPLETED to COMMITTED, once restarted, the connector thread missed the state transition and did not clean up the file.

One solution is to configure the property fs.cleanup.policy.triggered.on=COMMITTED then doing a rolling-restart of Kafka Connect should trigger the cleanup of files stuck in COMMITTED state

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants