Commit 47118c3

Newests updates to drop ABI annotations
1 parent 09b508a commit 47118c3

1 file changed: +9 -50 lines changed

proposals/nnnn-utf8span-safe-utf8-processing.md

Lines changed: 9 additions & 50 deletions
@@ -42,31 +42,11 @@ We propose a non-escapable `UTF8Span` which exposes `String` functionality for v
 `UTF8Span` is a borrowed view into contiguous memory containing validly-encoded UTF-8 code units.

 ```swift
-@frozen
-public struct UTF8Span: Copyable, ~Escapable {
-  @usableFromInline
-  internal var _unsafeBaseAddress: UnsafeRawPointer?
-
-  /*
-   A bit-packed count and flags (such as isASCII)
-
-   ╔═══════╦═════╦══════════╦═══════╗
-   ║ b63   ║ b62 ║ b61:56   ║ b56:0 ║
-   ╠═══════╬═════╬══════════╬═══════╣
-   ║ ASCII ║ NFC ║ reserved ║ count ║
-   ╚═══════╩═════╩══════════╩═══════╝
-
-   ASCII means the contents are all-ASCII (<0x7F).
-   NFC means contents are in normal form C for fast comparisons.
-   SSC means single-scalar Characters (i.e. grapheme clusters): every
-     `Character` holds only a single `Unicode.Scalar`.
-   */
-  @usableFromInline
-  internal var _countAndFlags: UInt64
-}
-
+public struct UTF8Span: Copyable, ~Escapable, BitwiseCopyable {}
 ```

+`UTF8Span` is a trivial struct and is 2 words in size on 64-bit platforms.
+
 ### UTF-8 validation

 We propose new API for identifying where and what kind of encoding errors are present in UTF-8 content.
@@ -166,7 +146,6 @@ extension Unicode.UTF8 {
    ╚═════════════════╩══════╩═════╩═════╩═════╩═════╩═════╩═════╩══════╝

   */
-  @frozen
   public struct EncodingError: Error, Sendable, Hashable, Codable {
     /// The kind of encoding error
     public var kind: Unicode.UTF8.EncodingError.Kind
@@ -185,7 +164,6 @@ extension Unicode.UTF8 {

 extension UTF8.EncodingError {
   /// The kind of encoding error encountered during validation
-  @frozen
   public struct Kind: Error, Sendable, Hashable, Codable, RawRepresentable {
     public var rawValue: UInt8

@@ -247,7 +225,6 @@ extension UTF8Span {
   public func makeUnicodeScalarIterator() -> UnicodeScalarIterator

   /// Iterate the `Unicode.Scalar`s contents of a `UTF8Span`.
-  @frozen
   public struct UnicodeScalarIterator: ~Escapable {
     public let codeUnits: UTF8Span

@@ -292,14 +269,14 @@ extension UTF8Span {
     ///
     /// Returns the number of `Unicode.Scalar`s skipped over, which can be 0
     /// if at the start of the UTF8Span.
-    public mutating func skipBack() -> Bool
+    public mutating func skipBack() -> Int

     /// Move `codeUnitOffset` to the start of the previous `n` scalars,
     /// without decoding them.
     ///
     /// Returns the number of `Unicode.Scalar`s skipped over, which can be
     /// fewer than `n` if at the start of the UTF8Span.
-    public mutating func skipBack(by n: Int) -> Bool
+    public mutating func skipBack(by n: Int) -> Int

     /// Reset to the nearest scalar-aligned code unit offset `<= i`.
     public mutating func reset(roundingBackwardsFrom i: Int)
@@ -335,14 +312,13 @@ extension UTF8Span {

 ```

-
 ### Character processing

 We similarly propose a `UTF8Span.CharacterIterator` type that can do grapheme-breaking forwards and backwards.

 The `CharacterIterator` assumes that the start and end of the `UTF8Span` is the start and end of content.

-Any scalar-aligned position is a valid place to start or reset the grapheme-breaking algorithm to, though you could get different `Character` output if if resetting to a position that isn't `Character`-aligned relative to the start of the `UTF8Span` (e.g. in the middle of a series of regional indicators).
+Any scalar-aligned position is a valid place to start or reset the grapheme-breaking algorithm to, though you could get different `Character` output if resetting to a position that isn't `Character`-aligned relative to the start of the `UTF8Span` (e.g. in the middle of a series of regional indicators).

 ```swift

@@ -357,8 +333,9 @@ extension UTF8Span {
   public struct CharacterIterator: ~Escapable {
     public let codeUnits: UTF8Span

-    /// The byte offset of the start of the next `Character`. This is
-    /// always scalar-aligned and `Character`-aligned.
+    /// The byte offset of the start of the next `Character`. This is always
+    /// scalar-aligned. It is always `Character`-aligned relative to the last
+    /// call to `reset` (or the start of the span if not called).
     public var currentCodeUnitOffset: Int { get private(set) }

     public init(_ span: UTF8Span)
@@ -827,23 +804,5 @@ Finally, in the future there will likely be some kind of `Container` protocol fo

 Karoy Lorentey, Karl, Geordie_J, and fclout, contributed to this proposal with their clarifying questions and discussions.

-<!--
-
-Pending questions:
-
-1) How should we talk about `_countAndFlags` and the frozenness of `UTF8Span` and its stored properties?
-
-We want to be able to communicate to SE what the type is and how it could evolve.
-
-Basically, I want to say that this is a trivial 2-word struct whose lifetime is statically managed. Trivial 2-word comes from `@frozen` and listing its stored members in the proposal and statically managed comes from mentioning the `: ~Escapable`. This is similar to how the `Span` proposal specified both `@frozen` and the stored members (it did omit `@usableFromInline`).
-
-If we are going to talk about the layout in the proposal, then the next question is whether it makes some sense to talk about the custom hand-coded bit interpretation for some of that layout. It is very much ABI and it shows potential evolution directions and constraints. I could see arguments either way.
-
-2) Should we have a public unsafe unchecked initializer that skips UTF-8 validation?
-
-We'd want the developer to be very sure that it is in fact valid UTF-8. For example, Rust has `from_utf8_unchecked()`.
-
-Or, we keep it internal and have `String` use it.


--->
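
The `skipBack` hunks above change the return type from `Bool` to `Int`, which matches the doc comments' "number of `Unicode.Scalar`s skipped over". The following is a minimal sketch of that contract, not the proposal's implementation. It assumes a hypothetical `ScalarCursor` type over a plain `[UInt8]` that is already known to be valid UTF-8, and the `startingAt:` parameter is invented for the demo.

```swift
/// Hypothetical stand-in for the proposal's scalar iterator.
/// Assumption: `codeUnits` already holds valid UTF-8.
struct ScalarCursor {
  let codeUnits: [UInt8]
  var currentCodeUnitOffset: Int

  // `startingAt:` is an illustrative parameter, not part of the proposal.
  init(_ codeUnits: [UInt8], startingAt offset: Int) {
    precondition(offset >= 0 && offset <= codeUnits.count)
    self.codeUnits = codeUnits
    self.currentCodeUnitOffset = offset
  }

  /// Moves back over up to `n` scalars without decoding them.
  /// Returns how many scalars were skipped, which is fewer than `n`
  /// when the start of the buffer is reached first.
  mutating func skipBack(by n: Int) -> Int {
    var skipped = 0
    while skipped < n && currentCodeUnitOffset > 0 {
      // Step back past continuation bytes (0b10xx_xxxx) to the lead byte.
      repeat {
        currentCodeUnitOffset -= 1
      } while currentCodeUnitOffset > 0
        && codeUnits[currentCodeUnitOffset] & 0xC0 == 0x80
      skipped += 1
    }
    return skipped
  }

  /// Skips back one scalar; returns 0 if already at the start.
  mutating func skipBack() -> Int {
    return skipBack(by: 1)
  }
}

// "a" (1 code unit), U+00E9 (2 code units), U+20AC (3 code units)
let bytes = Array("a\u{E9}\u{20AC}".utf8)
var cursor = ScalarCursor(bytes, startingAt: bytes.count)
let first = cursor.skipBack(by: 2)   // skips U+20AC and U+00E9
let second = cursor.skipBack(by: 5)  // only "a" is left to skip
let third = cursor.skipBack()        // already at the start
print(first, second, third)
```

An `Int` result lets the caller tell "skipped fewer than requested" apart from "skipped nothing", which a `Bool` cannot express for the `by n:` overload.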
