I like this approach, especially launching separate nodes per stream. Makes sense to me. I'm a little confused on the extractor bit though: In the old

Configuration examples or docs would help

They are related. For example, the framer splits on CSV lines but not on individual fields, which have to be selected separately. If that makes sense? We could change it so that there's one thing that splits records and canonicalizes them into a JSON structure, and then provide a jq-compatible syntax for mapping individual fields to topics. (I'd love a different word than 'frame'.)
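To make that proposal a bit more concrete, here is a rough sketch (not code from this PR) of a single step that canonicalizes a CSV record into a JSON-like structure, followed by a path-based mapping of fields to topics. The names `canonicalize_csv`, `select`, and the `mapping` layout are hypothetical, and the dotted-path lookup is only a stand-in for a real jq-compatible syntax:

```python
import csv
import io
import json


def canonicalize_csv(line, columns):
    """Parse one CSV record into a JSON-compatible dict keyed by column name."""
    row = next(csv.reader(io.StringIO(line)))
    return dict(zip(columns, row))


def select(record, path):
    """Resolve a dotted path such as '.position.lat' against nested dicts."""
    value = record
    for key in path.strip(".").split("."):
        value = value[key]
    return value


# Hypothetical mapping: topic suffix -> path into the canonical record.
mapping = {"depth": ".depth", "temperature": ".temp"}

record = canonicalize_csv("12.5,4.1,OK", ["depth", "temp", "status"])
routed = {topic: select(record, path) for topic, path in mapping.items()}
print(json.dumps(routed))
```

With this shape, the same `mapping` could apply to any input format that can be canonicalized into a dict, so the field selection would no longer depend on how records were split.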
figuernd left a comment
Documentation / usage examples are the main thing missing for me
    else:
        fr = RawFramer()
    if isinstance(cfg.extractor, DelimitedExtractorConfig):
        ex = DelimitedExtractor(cfg.extractor.pattern, cfg.extractor.fields)
Yeah how would a delimited extractor work with a JsonFramer?
This pull request replaces the `network_data_capture` node with a new `io_node`, which provides approximately a superset of the functionality of `network_data_capture` and `bridge_node` from `ds_util_nodes`.

An `io_node` instance is composed of a Transport, a Framer, and an Extractor:

- `UDPTransport` -- for UDP ports
- `SerialTransport` -- for serial ports
- `RawFramer` -- each read from the data stream returns a single complete unit
- `DelimitedFramer` -- buffers data and splits on a delimiter, such as a line ending
- `JsonFramer` -- buffers data and splits on fully formed JSON objects
- `DelimitedExtractor` -- splits a unit on a delimiter and extracts specified indices
- `JsonExtractor` -- parses a unit as JSON and extracts specified fields by path (a subset of JSONPath)

An example config:
Note
Extractors are intended for ad hoc data collection and are not a substitute for a proper driver node; use a driver node whenever possible.
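To illustrate the Framer and Extractor roles described above, here is a minimal sketch. It is not the implementation in this PR: the `Sketch*` class names and the `feed`/`extract` methods are hypothetical, and it assumes that the extractor's `pattern` is a regular expression and that `fields` maps a field name to an index.

```python
import json
import re


class SketchDelimitedFramer:
    """Buffers incoming bytes and returns complete units split on a delimiter."""

    def __init__(self, delimiter=b"\n"):
        self.delimiter = delimiter
        self.buffer = b""

    def feed(self, data):
        self.buffer += data
        # Everything before the last delimiter is complete; the tail stays buffered.
        *units, self.buffer = self.buffer.split(self.delimiter)
        return units


class SketchJsonFramer:
    """Buffers text and returns each fully formed top-level JSON object."""

    def __init__(self):
        self.buffer = ""
        self.decoder = json.JSONDecoder()

    def feed(self, text):
        self.buffer += text
        units = []
        while True:
            self.buffer = self.buffer.lstrip()  # raw_decode rejects leading whitespace
            if not self.buffer:
                break
            try:
                obj, end = self.decoder.raw_decode(self.buffer)
            except json.JSONDecodeError:
                break  # treated as incomplete: wait for more data
            units.append(obj)
            self.buffer = self.buffer[end:]
        return units


class SketchDelimitedExtractor:
    """Splits a unit on a regex pattern and picks fields by index."""

    def __init__(self, pattern, fields):
        self.pattern = re.compile(pattern)
        self.fields = fields  # assumed layout: field name -> index

    def extract(self, unit):
        parts = self.pattern.split(unit)
        return {name: parts[i] for name, i in self.fields.items() if i < len(parts)}


framer = SketchDelimitedFramer()
extractor = SketchDelimitedExtractor(r",", {"depth": 0, "temp": 1})

first = framer.feed(b"12.5,4.1\n13.0,4")  # second line is still incomplete
second = framer.feed(b".2\n")             # completes the buffered line
parsed = [extractor.extract(u.decode()) for u in first + second]

json_framer = SketchJsonFramer()
json_units = json_framer.feed('{"a": 1} {"b"') + json_framer.feed(": 2}")
```

One limitation of this sketch: the JSON framer cannot tell malformed input from incomplete input, so a corrupt object would stall the buffer. A real implementation would need a size limit or a resynchronization strategy.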
Raw messages are published to `~in`, while parsed fields are published to `~in/<field_name>`. The node also subscribes to `~out` for outbound traffic.

Only a single I/O stream is supported per node. This is a simplification so that the node does not have to loop over connections and shuttle around complex state objects. Because each stream runs in its own node, an uncaught exception in one stream cannot disrupt the others. Where multiple connections are needed, multiple nodes should be started through the launch file. (Configuration improvements will help to make this easier for operators.)
Configuration is validated with Pydantic, which produces reasonably human-friendly error messages.
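As a rough illustration of the kind of validation Pydantic gives, here is a small sketch. The models `ExtractorConfigSketch` and `NodeConfigSketch` and their fields are hypothetical and are not the config schema from this PR; only `BaseModel`, `ValidationError`, and `ValidationError.errors()` are real Pydantic API.

```python
from typing import Dict

from pydantic import BaseModel, ValidationError


class ExtractorConfigSketch(BaseModel):
    pattern: str            # assumed: delimiter pattern
    fields: Dict[str, int]  # assumed: field name -> index


class NodeConfigSketch(BaseModel):
    port: int
    extractor: ExtractorConfigSketch


# A well-formed config parses into typed, nested objects.
good = NodeConfigSketch(
    **{"port": 5000, "extractor": {"pattern": ",", "fields": {"depth": 0}}}
)

# A bad config raises one error that lists every problem with its location.
try:
    NodeConfigSketch(**{"port": "not-a-port", "extractor": {"pattern": ","}})
except ValidationError as err:
    problems = err.errors()
    print(err)
```

Each entry in `problems` carries a `loc` tuple such as `("extractor", "fields")`, which is what lets the error message point an operator at the exact key that needs fixing.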
- `io_node` implementation
- `network_data_capture`
- `network_data_capture` and `bridge_node`

Closes #78 and resolves a blocker for #68.