Contributor

@MahdiBM MahdiBM commented Oct 16, 2025

Implement AsyncSequence/split() functions similar to String/split() functions in std-lib.
Implement AsyncSequence/splitLines() functions similar to String/split(whereSeparator: \.isNewline) in std-lib.

Motivation:

Provide an easy way for users to split the data incoming from an async sequence, using their preferred separator.
Provide an easy way for users to split the data incoming from an async sequence, on new lines.

Modifications:

Add internal SplitMessageDecoder: NIOSingleStepByteToMessageDecoder.
Add public NIOSplitLinesMessageDecoder: NIOSingleStepByteToMessageDecoder.
Add public AsyncSequence/splitLines(omittingEmptySubsequences:maximumBufferSize:) -> AsyncSeq<ByteBuffer>.
Add public AsyncSequence/splitUTF8Lines(omittingEmptySubsequences:maximumBufferSize:) -> AsyncSeq<String>.

Result:

Users can easily split the data.
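
For reference, the std-lib behaviour that the description compares against can be checked with plain `String` values. This is only a sketch of the standard-library side, not of the new NIO API; the `lines(of:omittingEmpty:)` helper is a hypothetical name introduced for illustration.

```swift
// Standard-library baseline: String.split(whereSeparator: \.isNewline).
// `lines(of:omittingEmpty:)` is a hypothetical helper, not part of swift-nio.
func lines(of text: String, omittingEmpty: Bool) -> [String] {
    text.split(omittingEmptySubsequences: omittingEmpty, whereSeparator: \.isNewline)
        .map { String($0) }
}

// "\r\n" is a single Character in Swift, so it acts as one separator.
let sample = "first\r\nsecond\n\nthird"
let kept = lines(of: sample, omittingEmpty: false)
let dropped = lines(of: sample, omittingEmpty: true)

print(kept)     // ["first", "second", "", "third"]
print(dropped)  // ["first", "second", "third"]
```

The empty element in `kept` comes from the two consecutive "\n" characters, which is the case `omittingEmptySubsequences` controls.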

@MahdiBM MahdiBM force-pushed the mmbm-split-lines-take-2 branch 4 times, most recently from f0c655d to 6d6aaf1 Compare October 16, 2025 10:11
@MahdiBM MahdiBM marked this pull request as ready for review October 16, 2025 10:17
Contributor Author

MahdiBM commented Oct 16, 2025

@glbrntt please review whenever you can 🙂

@MahdiBM MahdiBM force-pushed the mmbm-split-lines-take-2 branch 2 times, most recently from 987e507 to eee0202 Compare October 16, 2025 13:14
Contributor Author

MahdiBM commented Oct 16, 2025

@glbrntt I actually agree with you that split isn't that helpful by itself, but it's a nice building block for splitting lines.
So basically we get this generic split function for free, in addition to splitLines.

This is the whole splitLines implementation right now, which mainly uses this "splitDecoder":

/// A decoder which splits the data into subsequences that are separated by a line break.
///
/// Use `AsyncSequence/splitLines(omittingEmptySubsequences:maximumBufferSize:)` to create a
/// `NIODecodedAsyncSequence` that uses this decoder.
///
/// The following Characters are considered line breaks, similar to
/// standard library's `String.split(whereSeparator: \.isNewline)`:
/// - "\n" (U+000A): LINE FEED (LF)
/// - U+000B: LINE TABULATION (VT)
/// - U+000C: FORM FEED (FF)
/// - "\r" (U+000D): CARRIAGE RETURN (CR)
/// - "\r\n" (U+000D U+000A): CR-LF
///
/// The following Characters are NOT considered line breaks, unlike in
/// standard library's `String.split(whereSeparator: \.isNewline)`:
/// - U+0085: NEXT LINE (NEL)
/// - U+2028: LINE SEPARATOR
/// - U+2029: PARAGRAPH SEPARATOR
///
/// This is because these characters would require unicode and data-encoding awareness, which
/// are outside swift-nio's scope.
///
/// Usage:
/// ```swift
/// let baseSequence = MyAsyncSequence<ByteBuffer>(...)
/// let splitLinesSequence = baseSequence.splitLines()
///
/// for try await buffer in splitLinesSequence {
///     print("Split by line breaks!\n", buffer.hexDump(format: .detailed))
/// }
/// ```
public struct NIOSplitLinesMessageDecoder: NIOSingleStepByteToMessageDecoder {
    public typealias InboundOut = ByteBuffer

    @usableFromInline
    var splitDecoder: NIOSplitMessageDecoder
    @usableFromInline
    var previousSeparatorWasCR: Bool

    @inlinable
    init(omittingEmptySubsequences: Bool) {
        self.splitDecoder = NIOSplitMessageDecoder(
            omittingEmptySubsequences: omittingEmptySubsequences,
            whereSeparator: Self.isLineBreak
        )
        self.previousSeparatorWasCR = false
    }

    /// - "\n" (U+000A): LINE FEED (LF)
    /// - U+000B: LINE TABULATION (VT)
    /// - U+000C: FORM FEED (FF)
    /// - "\r" (U+000D): CARRIAGE RETURN (CR)
    /// - "\r\n" (U+000D U+000A): CR-LF
    ///
    /// "\r\n" is manually accounted for during the decoding.
    @inlinable
    static func isLineBreak(_ byte: UInt8) -> Bool {
        // First check <= \r. Most bytes won't pass this check, so we can return earlier than if we checked >= \n first.
        byte <= UInt8(ascii: "\r") && byte >= UInt8(ascii: "\n")
    }

    /// Decode the next message from the given buffer.
    @inlinable
    mutating func decode(buffer: inout ByteBuffer, hasReceivedLastChunk: Bool) throws -> InboundOut? {
        while true {
            guard
                let (slice, separator) = try self.splitDecoder.decode(
                    buffer: &buffer,
                    hasReceivedLastChunk: hasReceivedLastChunk
                )
            else {
                return nil
            }

            // If we are getting rid of empty subsequences then it doesn't matter if we detect
            // \r\n as CR+LF, or as a CR + a LF. The backing decoder gets rid of the empty subsequence
            // anyway. Therefore, we can return early right here and skip the rest of the logic.
            if self.splitDecoder.omittingEmptySubsequences {
                return slice
            }

            // "\r\n" is 2 bytes long, so we need to manually account for it.
            switch separator {
            case UInt8(ascii: "\n") where slice.readableBytes == 0:
                let isCRLF = self.previousSeparatorWasCR
                self.previousSeparatorWasCR = false
                if isCRLF {
                    continue
                }
            case UInt8(ascii: "\r"):
                self.previousSeparatorWasCR = true
            default:
                self.previousSeparatorWasCR = false
            }

            return slice
        }
    }

    /// Decode the next message separated by one of the ASCII line breaks.
    /// To be used when we're still receiving data.
    @inlinable
    public mutating func decode(buffer: inout ByteBuffer) throws -> InboundOut? {
        try self.decode(buffer: &buffer, hasReceivedLastChunk: false)
    }

    /// Decode the next message separated by one of the ASCII line breaks.
    /// To be used when the last chunk of data has been received.
    @inlinable
    public mutating func decodeLast(buffer: inout ByteBuffer, seenEOF: Bool) throws -> InboundOut? {
        try self.decode(buffer: &buffer, hasReceivedLastChunk: true)
    }
}
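
To make the CR-LF handling easier to follow, here is a simplified, non-streaming model of the same rules on a plain `[UInt8]`. It is a sketch under assumptions, not the decoder itself: `splitLinesModel` is a hypothetical helper, it sees all bytes at once instead of working incrementally on a `ByteBuffer`, and its handling of a trailing separator follows `String.split`, which may differ from what the real decoder does at the end of the stream.

```swift
// Simplified model of the line-break rules; hypothetical helper, not part of swift-nio.
func splitLinesModel(_ bytes: [UInt8], omittingEmptySubsequences: Bool) -> [[UInt8]] {
    var result: [[UInt8]] = []
    var current: [UInt8] = []
    var previousSeparatorWasCR = false

    for byte in bytes {
        // Same range check as isLineBreak: LF (0x0A), VT (0x0B), FF (0x0C), CR (0x0D).
        guard byte <= UInt8(ascii: "\r") && byte >= UInt8(ascii: "\n") else {
            current.append(byte)
            previousSeparatorWasCR = false
            continue
        }
        // An LF directly after a CR completes a CR-LF pair and must not
        // produce an extra empty subsequence.
        let completesCRLF = byte == UInt8(ascii: "\n") && previousSeparatorWasCR && current.isEmpty
        previousSeparatorWasCR = byte == UInt8(ascii: "\r")
        if completesCRLF { continue }

        if !(omittingEmptySubsequences && current.isEmpty) {
            result.append(current)
        }
        current = []
    }
    // Whatever follows the last separator forms the final element.
    if !(omittingEmptySubsequences && current.isEmpty) {
        result.append(current)
    }
    return result
}

let input = Array("a\r\nb\n\nc\rd".utf8)
let all = splitLinesModel(input, omittingEmptySubsequences: false)
    .map { String(decoding: $0, as: UTF8.self) }
print(all)  // ["a", "b", "", "c", "d"]

// U+2028 is encoded as the three bytes E2 80 A8, none of which is in 0x0A...0x0D,
// so a byte-wise check does not treat it as a line break.
let unicode = splitLinesModel(Array("x\u{2028}y".utf8), omittingEmptySubsequences: false)
print(unicode.count)  // 1
```

The second call illustrates why NEL, LINE SEPARATOR, and PARAGRAPH SEPARATOR are left out: recognizing them would require decoding multi-byte sequences rather than inspecting single bytes.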

@MahdiBM MahdiBM force-pushed the mmbm-split-lines-take-2 branch from eee0202 to 4ceefc7 Compare October 16, 2025 17:50
glbrntt pushed a commit that referenced this pull request Oct 17, 2025
…3412)

### Motivation:

`ByteBuffer.firstIndex` is of suboptimal performance.
The default Collection implementations don't go through any "magic
underscored" functions like `_customIndexOfEquatableElement`.

### Modifications:

Manually implement `firstIndex(where:)`.

### Result:

Basically a free performance boost: 2x+ even for small buffers of
a few hundred bytes.

There are some usages of this function in `BufferedReader`. Those will
become much faster.
This function is also used in `ByteBufferView.trim(limitingElements:)`.

Also makes #3411 stuff faster. See:
#3411 (comment)
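
The general idea of a manual `firstIndex(where:)` can be illustrated outside of NIO. The sketch below is not the code from that commit: `ByteStorage` is a hypothetical stand-in for a byte view, and the manual method simply scans contiguous memory instead of relying on the generic `Collection` default. Whether and how much this helps depends on the concrete type; the numbers quoted above are the commit author's.

```swift
// Hypothetical byte collection standing in for a buffer view; not NIO's ByteBufferView.
struct ByteStorage: Collection {
    var bytes: [UInt8]

    var startIndex: Int { bytes.startIndex }
    var endIndex: Int { bytes.endIndex }
    subscript(position: Int) -> UInt8 { bytes[position] }
    func index(after i: Int) -> Int { i + 1 }

    // Manual implementation: walks the contiguous storage directly. On the concrete
    // type this method is preferred over the generic Collection extension method.
    func firstIndex(where predicate: (UInt8) throws -> Bool) rethrows -> Int? {
        try bytes.withUnsafeBufferPointer { (pointer) -> Int? in
            for offset in 0..<pointer.count {
                if try predicate(pointer[offset]) {
                    return offset
                }
            }
            return nil
        }
    }
}

let storage = ByteStorage(bytes: Array("hello\nworld".utf8))
if let newline = storage.firstIndex(where: { $0 == UInt8(ascii: "\n") }) {
    print("first line break at offset", newline)  // first line break at offset 5
}
```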
@MahdiBM MahdiBM force-pushed the mmbm-split-lines-take-2 branch 3 times, most recently from e905683 to d025100 Compare October 17, 2025 16:05
@MahdiBM MahdiBM changed the title Implement AsyncSequence/split(whereSeparator:) Implement AsyncSequence/splitLines() Oct 17, 2025
@MahdiBM MahdiBM force-pushed the mmbm-split-lines-take-2 branch 5 times, most recently from 87f0589 to 41c1ac0 Compare October 19, 2025 16:37
@MahdiBM MahdiBM requested review from Lukasa and glbrntt October 20, 2025 07:30
Contributor Author

MahdiBM commented Oct 20, 2025

@glbrntt (cc @Lukasa) FYI there are a few new, smaller refinement commits after
Revert "Use ByteToMessageDecoderVerifier ....

Contributor

@glbrntt glbrntt left a comment

One question but otherwise looks good, thanks @MahdiBM!

@MahdiBM MahdiBM requested a review from glbrntt October 21, 2025 10:07
@glbrntt glbrntt added the 🆕 semver/minor Adds new public API. label Oct 21, 2025
Contributor

@glbrntt glbrntt left a comment

Thanks for bearing with me on this one Mahdi!

Contributor

@glbrntt glbrntt left a comment

Looks like the tests don't compile on Swift 6.0

@MahdiBM MahdiBM requested a review from glbrntt October 21, 2025 10:53
Contributor Author

MahdiBM commented Oct 21, 2025

@glbrntt I think it was a trailing comma issue. Let's see if that's the case, or whether custom comments are simply unavailable in 6.0.

@MahdiBM MahdiBM force-pushed the mmbm-split-lines-take-2 branch from 1c5d498 to 83359c0 Compare October 21, 2025 13:43
@glbrntt glbrntt merged commit 0469372 into apple:main Oct 21, 2025
58 of 62 checks passed