Description:
The Avro Map extension is often referenced in conjunction with Siddhi's extension for Kafka. Unfortunately, none of the available examples for using the two together seem to work!
The culprit appears to be Confluent's support for Avro in Kafka. It uses Confluent's Schema Registry to assign a schema ID to each unique schema it encounters, and it prefixes every message it stores in Kafka with a 5-byte header (a zero "magic" byte followed by the 4-byte schema ID) placed just before the Avro-serialized payload. Those five bytes are not stripped away before this mapper is invoked, and the extra header throws the binary decoding off kilter.
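A minimal sketch of splitting that header off, assuming Confluent's wire format (one magic byte `0x00`, then the schema ID as a 4-byte big-endian integer, then the Avro binary payload). The function name `split_confluent_frame` is hypothetical and not part of Siddhi or any Confluent library.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0  # first byte of every Confluent-framed message
HEADER_LEN = 5  # 1 magic byte + 4-byte schema ID


def split_confluent_frame(message: bytes) -> Tuple[int, bytes]:
    """Split a Confluent-framed Kafka value into (schema_id, avro_payload)."""
    if len(message) < HEADER_LEN:
        raise ValueError("message too short to carry a Confluent header")
    if message[0] != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {message[0]:#04x}")
    # Bytes 1..4 hold the schema ID as a big-endian 32-bit integer
    (schema_id,) = struct.unpack(">i", message[1:HEADER_LEN])
    # Everything after the header is plain Avro binary
    return schema_id, message[HEADER_LEN:]
```

For example, `split_confluent_frame(b"\x00\x00\x00\x00\x2a" + b"xyz")` returns `(42, b"xyz")`. Only the returned payload would be handed to an Avro decoder.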
The same incorrect decoding can be reproduced by taking a message from an Avro-enabled topic and simply retaining the 5-byte header, and the problem can be cured by removing those five out-of-band header bytes.
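The effect can be sketched with a hand-rolled decoder for a hypothetical record schema: a string `name` followed by a double `amount`. The field names come from this issue, but their types and order are assumptions, and `sample.avsc` itself is not shown here. In Avro binary, a string starts with a zigzag varint length, so the header's leading `0x00` decodes as an empty string, and the `amount` double is then read from the schema-ID bytes plus the start of the real payload.

```python
import struct

# Hypothetical schema (assumption): record with fields
#   {"name": "name", "type": "string"}, {"name": "amount", "type": "double"}


def write_long(n: int) -> bytes:
    """Avro zigzag + varint encoding of a 64-bit long."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf: bytes, pos: int):
    """Decode an Avro zigzag varint at pos; return (value, next_pos)."""
    shift = acc = 0
    while True:
        byte = buf[pos]
        pos += 1
        acc |= (byte & 0x7F) << shift
        if byte < 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1), pos


def encode_record(name: str, amount: float) -> bytes:
    raw = name.encode("utf-8")
    # string = varint length + UTF-8 bytes; double = 8 bytes little-endian
    return write_long(len(raw)) + raw + struct.pack("<d", amount)


def decode_record(buf: bytes) -> dict:
    length, pos = read_long(buf, 0)
    name = buf[pos:pos + length].decode("utf-8")
    (amount,) = struct.unpack_from("<d", buf, pos + length)
    return {"name": name, "amount": amount}


payload = encode_record("widget", 0.9629494268310558)  # "widget" is made up
framed = b"\x00" + struct.pack(">i", 1) + payload      # hypothetical schema ID 1

print(decode_record(payload))     # the original record
print(decode_record(framed))      # empty name, nonsense amount
print(decode_record(framed[5:]))  # header stripped: original record again
```

The garbage `amount` produced here will not match the value observed in this issue, since the schema and IDs are invented; the point is only that the empty name and huge double follow directly from reading the header as payload.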
Affected Product Version:
2.0.6
OS, DB, other environment details and versions:
macOS Mojave (10.14.4), running Kafka 2.3.1 Docker images on Docker Desktop Community 2.1.0.4
Confirm that retaining Confluent's 5-byte Schema Registry header yields the same incorrect result:
avro-tools fragtojson --schema-file sample.avsc onemsg.dat
Observe that the name is again absent and the amount is still 1.410940917531979E224, not the expected 0.9629494268310558.
Alternatively, consider providing a separate map extension that leverages https://github.com/AbsaOSS/ABRiS, an Avro bridge for Apache Spark that supports encoding and decoding with a user-provided schema, a Confluent Schema Registry, or both. The first of these three cases is what Siddhi's existing mapper covers; the other two are the cases it does not.
The difference between ABRiS's two Schema Registry cases (one with a user-provided schema, the other without) boils down to whether the registry supplies the source-of-truth schema for encoding/decoding, or only verifies compatibility with a user-provided source-of-truth schema that is used for encoding/decoding. In both cases, ABRiS correctly adds or removes Confluent's 5-byte header, keeping its content consistent with Schema Registry semantics.
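The three cases can be sketched as a single schema-selection function. This is not ABRiS's API: `resolve_schema`, `fetch_schema`, and `is_compatible` are hypothetical names, the registry lookup is injected as a callable, and the compatibility check is left to the caller (real Avro schema resolution rules are far richer than the string equality used in the example below).

```python
from typing import Callable, Optional


def resolve_schema(
    user_schema: Optional[str],
    schema_id: Optional[int],
    fetch_schema: Callable[[int], str],
    is_compatible: Callable[[str, str], bool],
) -> str:
    """Pick the schema to encode/decode with, following the three cases above."""
    if schema_id is None:
        # Case 1: no registry; the user-provided schema is all there is
        if user_schema is None:
            raise ValueError("a user schema is required when no registry is used")
        return user_schema
    registry_schema = fetch_schema(schema_id)
    if user_schema is None:
        # Case 2: the registry is the source of truth
        return registry_schema
    # Case 3: the user schema is the source of truth; the registry only vets it
    if not is_compatible(user_schema, registry_schema):
        raise ValueError(f"user schema is incompatible with registry schema {schema_id}")
    return user_schema
```

For instance, with an in-memory stand-in such as `registry = {7: '"string"'}`, calling `resolve_schema(None, 7, registry.__getitem__, lambda a, b: a == b)` returns the registry's schema, while passing a mismatched user schema with the same ID raises `ValueError`.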
Hold on a minute. Maybe I missed it before, but this extension already accepts a schema registry argument. It seems to want a hard-coded reference to the expected schema ID rather than a topic name, so this issue may boil down to user error on my part. Revisiting it this evening, it looks as though I simply failed to set the right options for my use case. I expect to revisit this use case again soon, and will then return to either clarify or close this issue. In the meantime, I'm leaving this half-embarrassed comment to help anyone who happens to look at it before then.
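Going from a topic name to a schema ID can be sketched against the Schema Registry REST API, assuming Confluent's default `TopicNameStrategy` (subject `<topic>-value` for message values, `<topic>-key` for keys). The helper names and the example topic `payments` are hypothetical; `latest_schema_id` performs a real HTTP call and is not exercised here.

```python
import json
import urllib.parse
import urllib.request


def subject_for(topic: str, is_key: bool = False) -> str:
    # Default TopicNameStrategy: "<topic>-key" or "<topic>-value"
    return f"{topic}-{'key' if is_key else 'value'}"


def latest_version_url(registry_url: str, topic: str, is_key: bool = False) -> str:
    subject = urllib.parse.quote(subject_for(topic, is_key), safe="")
    return f"{registry_url.rstrip('/')}/subjects/{subject}/versions/latest"


def schema_by_id_url(registry_url: str, schema_id: int) -> str:
    return f"{registry_url.rstrip('/')}/schemas/ids/{schema_id}"


def latest_schema_id(registry_url: str, topic: str) -> int:
    """Ask the registry which schema ID is currently registered for a topic's values."""
    with urllib.request.urlopen(latest_version_url(registry_url, topic)) as resp:
        return json.load(resp)["id"]
```

An ID obtained this way is what a hard-coded schema ID setting would otherwise have to be filled in with by hand; note that a topic's latest ID can change as new schema versions are registered.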
Steps to reproduce:
sample.json
avro-tools fragtojson --schema-file sample.avsc onemsg.dat
Observe that the name is again absent and the amount is still 1.410940917531979E224, not the expected 0.9629494268310558.
Observe that this time the 5-byte-shorter message deserialized to exactly what our original input was!
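Producing the 5-byte-shorter file can be sketched as follows, assuming the captured message in `onemsg.dat` starts with Confluent's header (a `0x00` magic byte plus a 4-byte schema ID). The function name and the output filename are hypothetical.

```python
from pathlib import Path

HEADER_LEN = 5  # 1 magic byte (0x00) + 4-byte big-endian schema ID


def strip_confluent_header(src: Path, dst: Path) -> int:
    """Copy src to dst without its Confluent header; return the payload size."""
    data = src.read_bytes()
    if len(data) < HEADER_LEN or data[0] != 0:
        raise ValueError(f"{src} does not start with a Confluent header")
    payload = data[HEADER_LEN:]
    dst.write_bytes(payload)
    return len(payload)
```

After calling `strip_confluent_header(Path("onemsg.dat"), Path("onemsg-payload.dat"))`, the resulting file can be fed to the same command, `avro-tools fragtojson --schema-file sample.avsc onemsg-payload.dat`.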