[Question] Recommended config for CSV without primary key #146
What is the recommended source config when loading a CSV into Kafka without unique primary keys? The data will be consumed by Postgres and other sinks.
Comments
@scheung38 Keys do not need to be unique. If two records have the same key you will still get two records into Kafka; they will just end up in the same partition.
I'm not sure about the KSQL part; I'd have to look into that. I don't think it supports compound keys, so you might need to use a single message transform to make just "order-01" your key. If you are looking for the aggregate view of order-01, you might need to use Kafka Streams instead and build a hierarchy.
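For illustration, a single message transform chain like the one described above can be wired straight into the source connector config. This is only a sketch, assuming the SpoolDirCsvSourceConnector from this repo, a CSV header column named `order_id`, and placeholder topic names and paths; property names may vary by connector version, so check the connector documentation rather than copying this verbatim.

```json
{
  "name": "csv-orders-source",
  "config": {
    "connector.class": "com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector",
    "topic": "orders-csv",
    "input.path": "/data/csv/input",
    "finished.path": "/data/csv/finished",
    "error.path": "/data/csv/error",
    "input.file.pattern": "^orders.*\\.csv$",
    "csv.first.row.as.header": "true",
    "schema.generation.enabled": "true",

    "transforms": "setKey,extractKey",
    "transforms.setKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.setKey.fields": "order_id",
    "transforms.extractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractKey.field": "order_id"
  }
}
```

With no unique key required, duplicate keys produced by the transform are fine: as noted above, records that share a key simply land in the same partition.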
Then could we hash each row on the fly to provide uniqueness? It might be better to write a UDF that hashes several columns, say C, D, E for hash1 and E, F, G for hash2. A hash from just one field such as 'order-01' won't provide sufficient uniqueness; it needs several fields.
Maybe. Given that I don't fully understand this data, I can't tell you for sure. You could also build a single message transform that concatenates a few fields. Think order-01:8:2020, etc. I personally would want to aggregate this to another topic that is keyed by order-1 with an array of the other content. Something like key: order-1, value: [{},{},{}], where the values are the combination of those rows. That would give me all the data of an order in a single record.
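A minimal Kafka Streams sketch of that aggregation idea, assuming the rows are already keyed by order id (for example via the transform above) and carry each CSV row as a JSON string value; the topic names, application id, and broker address are placeholders.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class OrderAggregator {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-aggregator"); // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker

        StreamsBuilder builder = new StreamsBuilder();

        // "orders-csv" / "orders-by-id" are hypothetical topic names.
        KStream<String, String> rows = builder.stream(
                "orders-csv", Consumed.with(Serdes.String(), Serdes.String()));

        // Collect every row of an order into a JSON-array-style string:
        // key: order-1, value: [{...},{...},{...}]
        KTable<String, String> ordersById = rows
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                .aggregate(
                        () -> "[]",
                        (orderId, row, agg) -> agg.equals("[]")
                                ? "[" + row + "]"
                                : agg.substring(0, agg.length() - 1) + "," + row + "]",
                        Materialized.with(Serdes.String(), Serdes.String()));

        ordersById.toStream().to("orders-by-id", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```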
This is just a tiny sample; we will get a large amount of data. Other than the first two columns, the rest of the fields should be different enough to generate a unique hash as a primary key for each row. Why put it into a new topic? Could we not append the hash as a new column?
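Appending a hash column is possible with a custom single message transform. The rough sketch below assumes schemaless records (Map values) and hard-codes columns C, D, E purely for illustration; a real implementation would take the column names and output field from configuration.

```java
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical transform that concatenates a fixed set of columns, hashes
 * them, and appends the digest to the value as a new "row_hash" field.
 */
public class RowHash<R extends ConnectRecord<R>> implements Transformation<R> {

    // Assumed column names; the real CSV headers would be supplied via configure().
    private static final List<String> HASH_FIELDS = Arrays.asList("C", "D", "E");

    @Override
    public R apply(R record) {
        if (!(record.value() instanceof Map)) {
            return record; // pass through anything that is not a schemaless row
        }
        @SuppressWarnings("unchecked")
        Map<String, Object> value = new HashMap<>((Map<String, Object>) record.value());

        // Concatenate the chosen columns and hash the result.
        StringBuilder joined = new StringBuilder();
        for (String field : HASH_FIELDS) {
            joined.append(value.get(field)).append('|');
        }
        value.put("row_hash", sha256(joined.toString()));

        return record.newRecord(record.topic(), record.kafkaPartition(),
                record.keySchema(), record.key(),
                null, value, record.timestamp());
    }

    private static String sha256(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public ConfigDef config() { return new ConfigDef(); }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}
```

The same digest could also be promoted to the record key with a ValueToKey/ExtractField chain like the one shown earlier, if a key per row is preferred over an extra column.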