How could I persist Object Meta Data along with Object into S3 using the Connector #109

Open
FelixKJose opened this issue Aug 2, 2019 · 6 comments

@FelixKJose

I have a requirement to persist object metadata along with the object, so that we can later query it in Amazon Athena and so that applications can pull only the metadata instead of the entire object. Does the connector support persisting metadata (which the AWS S3 SDK supports)?
I have seen good provisions for dynamically building the S3 object key from object fields, but I couldn't find a way to derive metadata and persist it along with the object.

@abhisheksahani

abhisheksahani commented Aug 2, 2019 via email

@FelixKJose
Author

Thank you Abhishek.
But if we have a web application that needs only the metadata rather than the entire object, that is not possible unless I persist the metadata (e.g., created user, created date, company ID).
The AWS SDK supports this:
PutObjectRequest putObjectRequest = new PutObjectRequest(container, key, new ByteArrayInputStream(payload), objectMetaData);
amazonS3.putObject(putObjectRequest);

S3 then lets you retrieve just the metadata using AmazonS3.getObjectMetadata(bucket, key).
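
To illustrate why a metadata-only read is cheap: getObjectMetadata issues a HEAD request, and S3 returns user-defined metadata as response headers prefixed with `x-amz-meta-`. The following is a minimal sketch of separating those headers from the system ones, using only the Java standard library. The class name `UserMetadataHeaders`, the method `extractUserMetadata`, and the sample header values are hypothetical and exist only for illustration; this is not the SDK's implementation.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the AWS SDK.
class UserMetadataHeaders {
    // Prefix S3 uses for user-defined metadata headers.
    static final String PREFIX = "x-amz-meta-";

    // Returns only the user-defined entries, with the prefix stripped
    // and the keys lower-cased.
    static Map<String, String> extractUserMetadata(Map<String, String> responseHeaders) {
        Map<String, String> userMetadata = new TreeMap<>();
        for (Map.Entry<String, String> header : responseHeaders.entrySet()) {
            String name = header.getKey().toLowerCase(Locale.ROOT);
            if (name.startsWith(PREFIX) && name.length() > PREFIX.length()) {
                userMetadata.put(name.substring(PREFIX.length()), header.getValue());
            }
        }
        return userMetadata;
    }

    public static void main(String[] args) {
        // Sample headers as a HEAD response might carry them (values are made up).
        Map<String, String> headers = new TreeMap<>();
        headers.put("Content-Length", "1048576");
        headers.put("x-amz-server-side-encryption", "AES256");
        headers.put("x-amz-meta-created-user", "felix");
        headers.put("x-amz-meta-company-id", "42");
        System.out.println(extractUserMetadata(headers));
    }
}
```

Only the two `x-amz-meta-` entries survive the filter; the object body is never transferred.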

@FelixKJose
Author

Can someone please give me an answer for this?

@OneCricketeer

OneCricketeer commented Sep 17, 2019

If your question is about the S3 connector, that repo is here: https://github.com/confluentinc/kafka-connect-storage-cloud

It's not clear what metadata you would expect a Kafka connector to add other than what it generically knows about (topic name, partition, and offset).

It seems the only metadata that is added, though, is the SSE algorithm: https://github.com/confluentinc/kafka-connect-storage-cloud/blob/master/kafka-connect-s3/src/main/java/io/confluent/connect/s3/storage/S3OutputStream.java#L180-L193

@FelixKJose
Author

Yes, I was asking whether I could add custom metadata alongside the SSE algorithm, for example appId or user name. Could the Kafka publisher send some metadata along with the message, so that it gets stored with the S3 object?

Object metadata reference from AWS S3:
https://docs.aws.amazon.com/AmazonS3/latest/user-guide/add-object-metadata.html
Specifically, I am talking about user-defined metadata.
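
As a rough sketch of what is being asked for, the code below maps publisher-supplied headers to S3 user-defined metadata. It is not something the connector does today. Record headers are modeled as a plain `Map<String, byte[]>` so the example needs no Kafka or AWS dependency; the class `HeaderMetadataMapper`, the method `toUserMetadata`, and the allowlist approach are all assumptions made for illustration. The 2 KB cap reflects S3's documented limit on user-defined metadata, counted here as the UTF-8 bytes of keys plus values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not part of the S3 connector or the AWS SDK.
class HeaderMetadataMapper {
    // S3 limits user-defined metadata to 2 KB per object.
    static final int MAX_USER_METADATA_BYTES = 2 * 1024;

    // Copies allowlisted headers into a metadata map with lower-cased keys,
    // failing if the combined size exceeds the S3 limit.
    static Map<String, String> toUserMetadata(Map<String, byte[]> recordHeaders,
                                              Set<String> allowedKeys) {
        Map<String, String> userMetadata = new LinkedHashMap<>();
        int totalBytes = 0;
        for (Map.Entry<String, byte[]> header : recordHeaders.entrySet()) {
            if (!allowedKeys.contains(header.getKey()) || header.getValue() == null) {
                continue; // skip headers that were not requested or carry no value
            }
            String key = header.getKey().toLowerCase(Locale.ROOT);
            String value = new String(header.getValue(), StandardCharsets.UTF_8);
            totalBytes += key.getBytes(StandardCharsets.UTF_8).length
                    + value.getBytes(StandardCharsets.UTF_8).length;
            if (totalBytes > MAX_USER_METADATA_BYTES) {
                throw new IllegalArgumentException(
                        "User-defined metadata exceeds " + MAX_USER_METADATA_BYTES + " bytes");
            }
            userMetadata.put(key, value);
        }
        return userMetadata;
    }

    public static void main(String[] args) {
        Map<String, byte[]> headers = new LinkedHashMap<>();
        headers.put("appId", "billing".getBytes(StandardCharsets.UTF_8));
        headers.put("userName", "felix".getBytes(StandardCharsets.UTF_8));
        headers.put("traceId", "abc-123".getBytes(StandardCharsets.UTF_8));
        Set<String> allowed = new HashSet<>(Arrays.asList("appId", "userName"));
        System.out.println(toUserMetadata(headers, allowed));
    }
}
```

One design question this sketch leaves open: the S3 sink typically writes many records into a single object, so a real implementation would have to decide which record's headers apply to that object.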

@OneCricketeer

Sure, it could, but the connector currently does not make that configurable, and that should be an issue for a different repo: https://github.com/confluentinc/kafka-connect-storage-cloud
