Implement AsyncSequence/splitLines()
#3411
f0c655d
to
6d6aaf1
Compare
@glbrntt please review whenever you can 🙂
987e507
to
eee0202
Compare
@glbrntt I actually agree with you about split not being that helpful by itself, but it's a nice building block for line-split. This is the whole splitLines implementation right now, which mainly uses this "splitDecoder":
/// A decoder which splits the data into subsequences that are separated by a line break.
///
/// Use `AsyncSequence/splitLines(omittingEmptySubsequences:maximumBufferSize:)` to create a
/// `NIODecodedAsyncSequence` that uses this decoder.
///
/// The following Characters are considered line breaks, similar to
/// standard library's `String.split(whereSeparator: \.isNewline)`:
/// - "\n" (U+000A): LINE FEED (LF)
/// - U+000B: LINE TABULATION (VT)
/// - U+000C: FORM FEED (FF)
/// - "\r" (U+000D): CARRIAGE RETURN (CR)
/// - "\r\n" (U+000D U+000A): CR-LF
///
/// The following Characters are NOT considered line breaks, unlike in
/// standard library's `String.split(whereSeparator: \.isNewline)`:
/// - U+0085: NEXT LINE (NEL)
/// - U+2028: LINE SEPARATOR
/// - U+2029: PARAGRAPH SEPARATOR
///
/// This is because these characters would require unicode and data-encoding awareness, which
/// are outside swift-nio's scope.
///
/// Usage:
/// ```swift
/// let baseSequence = MyAsyncSequence<ByteBuffer>(...)
/// let splitLinesSequence = baseSequence.splitLines()
///
/// for try await buffer in splitLinesSequence {
///     print("Split by line breaks!\n", buffer.hexDump(format: .detailed))
/// }
/// ```
public struct NIOSplitLinesMessageDecoder: NIOSingleStepByteToMessageDecoder {
    public typealias InboundOut = ByteBuffer

    @usableFromInline
    var splitDecoder: NIOSplitMessageDecoder
    @usableFromInline
    var previousSeparatorWasCR: Bool

    @inlinable
    init(omittingEmptySubsequences: Bool) {
        self.splitDecoder = NIOSplitMessageDecoder(
            omittingEmptySubsequences: omittingEmptySubsequences,
            whereSeparator: Self.isLineBreak
        )
        self.previousSeparatorWasCR = false
    }

    /// Returns whether the byte is one of the following ASCII line breaks:
    /// - "\n" (U+000A): LINE FEED (LF)
    /// - U+000B: LINE TABULATION (VT)
    /// - U+000C: FORM FEED (FF)
    /// - "\r" (U+000D): CARRIAGE RETURN (CR)
    /// - "\r\n" (U+000D U+000A): CR-LF
    ///
    /// "\r\n" is manually accounted for during the decoding.
    @inlinable
    static func isLineBreak(_ byte: UInt8) -> Bool {
        // First check <= \r. Most bytes won't pass this check, so we can return earlier than if we checked >= \n first.
        byte <= UInt8(ascii: "\r") && byte >= UInt8(ascii: "\n")
    }

    /// Decode the next message from the given buffer.
    @inlinable
    mutating func decode(buffer: inout ByteBuffer, hasReceivedLastChunk: Bool) throws -> InboundOut? {
        while true {
            guard
                let (slice, separator) = try self.splitDecoder.decode(
                    buffer: &buffer,
                    hasReceivedLastChunk: hasReceivedLastChunk
                )
            else {
                return nil
            }

            // If we are getting rid of empty subsequences then it doesn't matter if we detect
            // \r\n as CR+LF, or as a CR + a LF. The backing decoder gets rid of the empty subsequence
            // anyway. Therefore, we can return early right here and skip the rest of the logic.
            if self.splitDecoder.omittingEmptySubsequences {
                return slice
            }

            // "\r\n" is 2 bytes long, so we need to manually account for it.
            switch separator {
            case UInt8(ascii: "\n") where slice.readableBytes == 0:
                let isCRLF = self.previousSeparatorWasCR
                self.previousSeparatorWasCR = false
                if isCRLF {
                    continue
                }
            case UInt8(ascii: "\r"):
                self.previousSeparatorWasCR = true
            default:
                self.previousSeparatorWasCR = false
            }

            return slice
        }
    }

    /// Decode the next message separated by one of the ASCII line breaks.
    /// To be used when we're still receiving data.
    @inlinable
    public mutating func decode(buffer: inout ByteBuffer) throws -> InboundOut? {
        try self.decode(buffer: &buffer, hasReceivedLastChunk: false)
    }

    /// Decode the next message separated by one of the ASCII line breaks.
    /// To be used when the last chunk of data has been received.
    @inlinable
    public mutating func decodeLast(buffer: inout ByteBuffer, seenEOF: Bool) throws -> InboundOut? {
        try self.decode(buffer: &buffer, hasReceivedLastChunk: true)
    }
}
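To make the CR-LF bookkeeping above easier to follow, here is a minimal sketch of the same idea on a plain `[UInt8]` that is fully in memory. It is not the swift-nio implementation: `splitLinesSketch` is a hypothetical helper that uses no NIO types and ignores chunked input and buffer limits.

```swift
/// Hypothetical helper, not part of swift-nio: splits a complete byte array on
/// the ASCII line breaks LF, VT, FF and CR, treating a CR that is immediately
/// followed by an LF as a single separator.
func splitLinesSketch(_ bytes: [UInt8], omittingEmptySubsequences: Bool = true) -> [[UInt8]] {
    var lines: [[UInt8]] = []
    var current: [UInt8] = []
    var previousSeparatorWasCR = false

    for byte in bytes {
        // Same range check as `isLineBreak` above: 0x0A through 0x0D.
        let isLineBreak = byte <= UInt8(ascii: "\r") && byte >= UInt8(ascii: "\n")
        guard isLineBreak else {
            current.append(byte)
            previousSeparatorWasCR = false
            continue
        }

        // An LF directly after a CR completes a CR-LF pair. The CR already
        // closed the line, so the LF must not produce an extra empty line.
        let completesCRLF =
            byte == UInt8(ascii: "\n") && current.isEmpty && previousSeparatorWasCR
        previousSeparatorWasCR = byte == UInt8(ascii: "\r")
        if completesCRLF {
            continue
        }

        if !(omittingEmptySubsequences && current.isEmpty) {
            lines.append(current)
        }
        current = []
    }

    // Whatever follows the final separator is the last line.
    if !(omittingEmptySubsequences && current.isEmpty) {
        lines.append(current)
    }
    return lines
}

let input = Array("a\r\nb\n\nc".utf8)
let lines = splitLinesSketch(input, omittingEmptySubsequences: false)
print(lines.map { String(decoding: $0, as: UTF8.self) })
// prints ["a", "b", "", "c"]
```

The "\r\n" pair yields one split, while the two consecutive "\n" bytes yield an empty line between "b" and "c", which is the case the `previousSeparatorWasCR` flag exists to distinguish.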
eee0202
to
4ceefc7
Compare
…3412)
### Motivation:
`ByteBuffer.firstIndex` has suboptimal performance. The default `Collection` implementations don't go through any "magic underscored" functions like `_customIndexOfEquatableElement`.
### Modifications:
Manually implement `firstIndex(where:)`.
### Result:
A basically free performance boost: 2x or more, even for smaller buffers of a few hundred bytes. `BufferedReader` uses this function in a few places, and those will become much faster. `ByteBufferView.trim(limitingElements:)` uses it as well. This also makes the code in #3411 faster. See: #3411 (comment)
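As a rough illustration of what "manually implement `firstIndex(where:)`" can look like, the sketch below scans contiguous storage directly instead of going through the generic `Collection` machinery. It is an assumption-laden stand-in, not the code from the referenced commit: `firstIndexSketch` is a hypothetical free function over `[UInt8]`, and it makes no performance claim of its own.

```swift
/// Hypothetical stand-in, not the swift-nio implementation: finds the first
/// byte matching `predicate` by walking the array's contiguous storage.
func firstIndexSketch(in bytes: [UInt8], where predicate: (UInt8) -> Bool) -> Int? {
    bytes.withUnsafeBufferPointer { pointer in
        // A plain loop over the raw buffer, with no per-element index
        // advancement through generic collection protocols.
        for index in pointer.indices {
            if predicate(pointer[index]) {
                return index
            }
        }
        return nil
    }
}

let bytes = Array("hello\nworld".utf8)
let newlineIndex = firstIndexSketch(in: bytes) { $0 == UInt8(ascii: "\n") }
print(newlineIndex as Any)
// prints Optional(5)
```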
e905683
to
d025100
Compare
AsyncSequence/split(whereSeparator:)
AsyncSequence/splitLines()
87f0589
to
41c1ac0
Compare
@glbrntt (cc @Lukasa) FYI there are a few new, smaller refinement commits, after
f1bdb4d
to
cd1dced
Compare
One question but otherwise looks good, thanks @MahdiBM!
Thanks for bearing with me on this one Mahdi!
Looks like the tests don't compile on Swift 6.0
@glbrntt I think it was a trailing comma issue, let's see if that's the case or custom comments are simply unavailable in 6.0.
…'s issue)" This reverts commit 7654587.
1c5d498
to
83359c0
Compare
- Implement `AsyncSequence/split()` functions similar to `String/split()` functions in std-lib.
- Implement `AsyncSequence/splitLines()` functions similar to `String/split(whereSeparator: \.isNewline)` in std-lib.

Motivation:
- Provide an easy way for users to split the data incoming from an async sequence, using their preferred separator.
- Provide an easy way for users to split the data incoming from an async sequence, on new lines.

Modifications:
- Add `internal SplitMessageDecoder: NIOSingleStepByteToMessageDecoder`.
- Add `public NIOSplitLinesMessageDecoder: NIOSingleStepByteToMessageDecoder`.
- Add `public AsyncSequence/splitLines(omittingEmptySubsequences:maximumBufferSize:) -> AsyncSeq<ByteBuffer>`.
- Add `public AsyncSequence/splitUTF8Lines(omittingEmptySubsequences:maximumBufferSize:) -> AsyncSeq<String>`.

Result:
Users can easily split the data.
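
For reference, this is the std-lib behavior the description compares against, shown as a small standard-library-only sketch. It does not call the new swift-nio functions. It shows that `Character.isNewline` treats "\r\n" as one separator and also splits on U+2028, whose UTF-8 bytes fall outside the 0x0A to 0x0D range that a byte-level check can see.

```swift
let text = "first\r\nsecond\n\nthird\u{2028}fourth"

// "\r\n" is a single Character, so it produces one split, not two.
let stdlibLines = text.split(omittingEmptySubsequences: false, whereSeparator: \.isNewline)
print(stdlibLines)
// prints ["first", "second", "", "third", "fourth"]

// U+2028 is encoded as three UTF-8 bytes, none of which is an ASCII line break,
// so a purely byte-based splitter would leave "third" and "fourth" joined.
let separatorBytes = Array("\u{2028}".utf8)
let hasASCIILineBreak = separatorBytes.contains { (0x0A...0x0D).contains($0) }
print(hasASCIILineBreak)
// prints false
```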