
Styx Data Path


This document describes the Styx forwarding path for HTTP requests. It is written here for the benefit of new developers.

Specifically we will cover the following:

  • Styx live request/response messages
  • Content body publishers
  • Http Pipeline Handler

Request/Response Messages

Styx is a cut-through proxy (https://en.wikipedia.org/wiki/Cut-through_switching). It begins forwarding a received message as soon as the HTTP headers have been received; it does not wait for the message body. The body streams through afterwards, as it arrives.

The Styx LiveHttpRequest and LiveHttpResponse message types are designed for cut-through proxying. They expose all HTTP headers, but the content is available asynchronously, via a ByteStream publisher:

class LiveHttpRequest {
    ...
    public ByteStream body() { ... }
    ...
}

A ByteStream is just a Reactive Streams Publisher:

public class ByteStream implements Publisher<Buffer> { ... }

A LiveHttpRequest propagates synchronously through the Styx core: it is passed on the call stack, from handler to handler and from interceptor to interceptor. In the opposite direction, the LiveHttpResponse comes back as an asynchronous reactive Publisher event.
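
For illustration, here is roughly what this looks like from an interceptor's point of view. This sketch assumes the Styx HttpInterceptor API; the class name NoOpInterceptor is ours:

import com.hotels.styx.api.Eventual;
import com.hotels.styx.api.HttpInterceptor;
import com.hotels.styx.api.LiveHttpRequest;
import com.hotels.styx.api.LiveHttpResponse;

// The request arrives as a plain method argument (a synchronous call),
// while the response comes back wrapped in an asynchronous Eventual.
public class NoOpInterceptor implements HttpInterceptor {
    @Override
    public Eventual<LiveHttpResponse> intercept(LiveHttpRequest request, Chain chain) {
        // `request` is available right here on the call stack ...
        return chain.proceed(request);  // ... but the response is an asynchronous event.
    }
}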

Both the request and response bodies always flow through asynchronously, via the ByteStream publisher.

ByteStream must support flow control: without it, Styx would ingest too much data at once and run out of memory.

ByteStream Content Publishers

A ByteStream is a Reactive Streams Publisher, and as such it supports non-blocking back pressure. There are other good sources of information about the reactive back pressure protocol, so I won’t go into detail here. But in a nutshell (a concrete sketch follows this list):

  • The Subscriber requests N events.
  • The Publisher emits at most the requested N events, and never more.
  • The Publisher must discard, queue, or otherwise avoid sending excess events until the subscriber requests more.
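
To make the contract concrete, here is a minimal publisher sketch using the Reactive Streams interfaces. It emits integers rather than content buffers, the class name is hypothetical, and it ignores thread safety and other spec subtleties:

import org.reactivestreams.Publisher;
import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

// A publisher that wants to emit the numbers 0..9, but never emits
// more events than the subscriber has requested so far.
class CountingPublisher implements Publisher<Integer> {
    @Override
    public void subscribe(Subscriber<? super Integer> subscriber) {
        subscriber.onSubscribe(new Subscription() {
            private int next;
            private boolean done;

            @Override
            public void request(long n) {
                // Emit at most n events, then wait for the next request.
                while (n-- > 0 && next < 10) {
                    subscriber.onNext(next++);
                }
                if (next == 10 && !done) {
                    done = true;
                    subscriber.onComplete();
                }
            }

            @Override
            public void cancel() {
                done = true;
                next = 10;
            }
        });
    }
}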

Obviously we can’t just discard random user data packets. Therefore the ByteStream content producer queues any excess data and suspends the underlying Netty channel until the subscriber is ready to accept more. This is how Styx implements flow control (see the sketch after this list):

  • Queue excess data packets.
  • Suspend the Netty channel by setting its autoRead property to false. This stops Netty from issuing more channelRead events until we ask for more. This then fills the TCP receive window, and eventually slows down the sender (remote end).
  • A back pressure request from a subscriber releases some queued content data packets, and asks Netty to produce more data.
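
A rough sketch of this mechanism, using a hypothetical helper class (the real Styx code is structured differently):

import java.util.ArrayDeque;
import java.util.Queue;

import com.hotels.styx.api.Buffer;
import io.netty.channel.Channel;

// Queue excess buffers and toggle Netty's autoRead flag to
// pause and resume the inbound byte flow.
class ContentQueueSketch {
    private final Queue<Buffer> queue = new ArrayDeque<>();
    private final Channel channel;

    ContentQueueSketch(Channel channel) {
        this.channel = channel;
    }

    // Called on the Netty IO thread for each channelRead event.
    void onContent(Buffer buffer) {
        queue.add(buffer);
        channel.config().setAutoRead(false);  // stop reading; the TCP window fills up
    }

    // Called when the subscriber issues a back pressure request.
    void onRequest(long n) {
        // ... emit up to n queued buffers to the subscriber ...
        channel.config().setAutoRead(true);   // resume reading from the socket
    }
}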

The ByteStream subscriber uses reactive back pressure requests to control the rate at which the publisher produces events. When the Styx HTTP response writer (and request writer) consumes the byte stream in order to proxy the content body, it asks for one buffer at a time. Specifically, it requests the next content packet only after the previous packet has been successfully written to the Netty socket channel.
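
A sketch of that "one buffer at a time" pattern (illustrative only; the real Styx writers are more involved, and we assume an outbound handler knows how to encode a Buffer for the wire):

import com.hotels.styx.api.Buffer;
import io.netty.channel.Channel;
import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

// Requests the next buffer only after the previous one
// has been successfully written to the Netty channel.
class WriteDrivenSubscriber implements Subscriber<Buffer> {
    private final Channel channel;
    private Subscription subscription;

    WriteDrivenSubscriber(Channel channel) {
        this.channel = channel;
    }

    @Override
    public void onSubscribe(Subscription s) {
        subscription = s;
        s.request(1);                        // start with a single buffer
    }

    @Override
    public void onNext(Buffer buffer) {
        channel.writeAndFlush(buffer).addListener(future -> {
            if (future.isSuccess()) {
                subscription.request(1);     // write succeeded: ask for the next buffer
            } else {
                subscription.cancel();       // write failed: stop consuming the stream
            }
        });
    }

    @Override
    public void onError(Throwable cause) {
        channel.close();
    }

    @Override
    public void onComplete() {
        // Body fully written.
    }
}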

FlowControllingHttpContentProducer

A ByteStream is agnostic about its underlying data source. You can create a ByteStream from a Java String like so:

ByteStream.from("Hello!", UTF_8);

Or from another Publisher:

Publisher<Buffer> p = createPublisher(...);
new ByteStream(p);

The FlowControllingPublisher and FlowControllingHttpContentProducer classes implement a reactive Publisher that sources ByteStream content from a Netty channel. The former is just a shallow shim implementing the Publisher interface, while the latter does all the heavy lifting.
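
A rough sketch of that split (the names and signatures below are illustrative, not the actual Styx internals):

import com.hotels.styx.api.Buffer;
import org.reactivestreams.Publisher;
import org.reactivestreams.Subscriber;

// Stand-in for the real producer: it owns the queue, the state
// machine, and the back pressure accounting.
interface ContentProducerSketch {
    void onSubscribed(Subscriber<? super Buffer> subscriber);
}

// The shim merely adapts the Publisher interface onto the producer.
class FlowControllingPublisherSketch implements Publisher<Buffer> {
    private final ContentProducerSketch producer;

    FlowControllingPublisherSketch(ContentProducerSketch producer) {
        this.producer = producer;
    }

    @Override
    public void subscribe(Subscriber<? super Buffer> subscriber) {
        producer.onSubscribed(subscriber);  // the heavy lifting happens in the producer
    }
}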

The primary purpose of FlowControllingHttpContentProducer is to implement non-blocking back pressure and to enforce the Reactive Streams event ordering. It also mediates between two asynchronous event sources:

  • Netty pipeline - produces content data (Buffer) events on the Netty IO thread.
  • Content subscriber - issues reactive back pressure requests and consumes the content events. The subscriber runs on a separate thread.

As it publishes "real time" networking events, it has some additional considerations:

  • It cannot lose any user data. All content chunks are queued until a subscriber subscribes.
  • It allows only one subscription. There is little point in having a second subscription halfway through the stream.
  • It has to deal with network errors, timeouts, and so on.

We implemented FlowControllingHttpContentProducer using a Finite State Machine (FSM).

An FSM is a great tool for managing complexity in an asynchronous environment. It makes it easy to reason about the content stream state, which would be notoriously difficult otherwise. Perhaps most importantly, we can enumerate all possible state/transition pairs to ensure that every scenario is covered.

Other important scenarios to consider are:

  • Subscriber times out
  • Network times out
  • Threading model and lock-free operation
  • TCP connection closure

The FSM Design

The FSM regulates the packet flow as per the reactive back pressure protocol. It receives content data packets from the Netty channel pipeline and emits them to the subscriber as requested. The specifics depend on the producer state.

All data packets go through a FIFO queue, from which they are dequeued according to reactive back pressure requests.
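
The dequeue step might look roughly like this (a sketch with a hypothetical class name; it ignores demand overflow and thread safety, both of which the real producer must handle):

import java.util.ArrayDeque;
import java.util.Queue;

import com.hotels.styx.api.Buffer;
import org.reactivestreams.Subscriber;

// Emit queued buffers, but only as many as the subscriber has requested.
class DrainSketch {
    private final Queue<Buffer> queue = new ArrayDeque<>();
    private long demand;  // outstanding back pressure demand

    void onRequest(long n, Subscriber<? super Buffer> subscriber) {
        demand += n;
        while (demand > 0 && !queue.isEmpty()) {
            subscriber.onNext(queue.poll());
            demand--;
        }
    }
}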

The FSM has four basic states (a transition sketch follows the state descriptions):

BUFFERING

  • The initial state.
  • Waiting for a subscription; the subscriber has not yet subscribed.
  • The FSM queues all received data because it does not yet have a subscriber to send it to.

BUFFERING_COMPLETED

  • Still waiting for a subscription.
  • The message body is fully received. Therefore no more content chunk events are expected.
  • The full message body sits in the queue until a subscriber subscribes.

STREAMING

  • Has an active subscription.
  • The message body is still ongoing. It has not yet been fully received.
  • The FSM keeps emitting content chunks from the FIFO, subject to the reactive back pressure protocol.

EMITTING_BUFFERED_CONTENT

  • Has an active subscription.
  • The message body is fully received. Therefore no more content chunk events are expected.
  • There is still data in the FIFO that has not yet been emitted, due to back pressure constraints.
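
A sketch of these states and the main transitions between them (illustrative; the real FlowControllingHttpContentProducer handles more events, such as errors, timeouts, and connection closures):

// The four basic producer states.
enum State { BUFFERING, BUFFERING_COMPLETED, STREAMING, EMITTING_BUFFERED_CONTENT }

class FsmSketch {
    private State state = State.BUFFERING;

    // The last HTTP content chunk arrived: the body is fully received.
    void onLastContentChunk() {
        switch (state) {
            case BUFFERING:  state = State.BUFFERING_COMPLETED; break;
            case STREAMING:  state = State.EMITTING_BUFFERED_CONTENT; break;
            default:         break;  // unexpected event for this state
        }
    }

    // A subscriber subscribed to the ByteStream.
    void onSubscribed() {
        switch (state) {
            case BUFFERING:           state = State.STREAMING; break;
            case BUFFERING_COMPLETED: state = State.EMITTING_BUFFERED_CONTENT; break;
            default:                  break;  // only one subscription is allowed
        }
    }
}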