Commit 2a3025a

perf: canonical encoding via type parameters
Signed-off-by: Liam Gray <[email protected]>
1 parent bef3820 commit 2a3025a

6 files changed, +212 -138 lines changed

ciborium/Cargo.toml

Lines changed: 0 additions & 1 deletion
@@ -34,7 +34,6 @@ hex = "0.4"
 [features]
 default = ["std"]
 std = ["ciborium-io/std", "serde/std"]
-canonical = ["std"]
 
 [package.metadata.docs.rs]
 all-features = true

ciborium/README.md

Lines changed: 23 additions & 11 deletions
@@ -11,20 +11,13 @@ Ciborium contains CBOR serialization and deserialization implementations for ser
 
 ## Quick Start
 
-You're probably looking for [`from_reader()`](crate::de::from_reader)
-and [`into_writer()`](crate::ser::into_writer), which are
-the main functions. Note that byte slices are also readers and writers and can be
-passed to these functions just as streams can.
+You're probably looking for [`from_reader()`](crate::de::from_reader),
+[`to_vec()`](crate::ser::to_vec), and [`into_writer()`](crate::ser::into_writer),
+which are the main functions. Note that byte slices are also readers and writers
+and can be passed to these functions just as streams can.
 
 For dynamic CBOR value creation/inspection, see [`Value`](crate::value::Value).
 
-## Features
-- `std`: enabled by default.
-- `canonical`: allows serializing with a `CanonicalizationScheme` for deterministic
-outputs. Incurs a small performance penalty (~20% slower) when serializing
-without a canonicalization scheme, and a large penalty (~100% slower) when
-serializing with a canonicalization scheme.
-
 ## Design Decisions
 
 ### Always Serialize Numeric Values to the Smallest Size
@@ -96,4 +89,23 @@ be avoided because it can be fragile as it exposes invariants of your Rust
 code to remote actors. We might consider adding this in the future. If you
 are interested in this, please contact us.
 
+### Canonical Encodings
+
+The ciborium crate has support for various canonical encodings during
+serialization.
+
+- [`NoCanonicalization`](crate::canonical::NoCanonicalization): the default,
+numbers are still encoded in their smallest form, but map keys are not
+sorted for maximum serialization speed.
+- [`Rfc7049`](crate::canonical::Rfc7049): the canonicalization scheme from
+RFC 7049 that sorts map keys in a length-first order. Eg.
+`["a", "b", "aa"]`.
+- [`Rfc8949`](crate::canonical::Rfc8949): the canonicalization scheme from
+RFC 8949 that sorts map keys in a bytewise lexicographic order. Eg.
+`["a", "aa", "b"]`.
+
+To use canonicalization, you must enable the `std` feature. See the examples
+in [`to_vec_canonical`](crate::ser::to_vec_canonical) and
+[`into_writer_canonical`](crate::ser::into_writer_canonical) for more.
+
 License: Apache-2.0
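
The two key orders named in the README text above can be illustrated with plain comparators. This is only a sketch and not code from the commit: `length_first`, `bytewise`, and `sorted_keys` are hypothetical names, and the comparators are applied to raw key bytes to mirror the README's examples. The RFCs define the order over each key's encoded bytes, and this diff does not show which form ciborium compares.

```rust
use std::cmp::Ordering;

/// Length-first order (the rule the README attributes to RFC 7049):
/// shorter keys sort first, ties are broken bytewise.
fn length_first(a: &[u8], b: &[u8]) -> Ordering {
    a.len().cmp(&b.len()).then_with(|| a.cmp(b))
}

/// Bytewise lexicographic order (the rule the README attributes to RFC 8949).
fn bytewise(a: &[u8], b: &[u8]) -> Ordering {
    a.cmp(b)
}

/// Hypothetical helper: returns the keys sorted with the given comparator.
/// It compares the raw UTF-8 bytes of each key, not a CBOR encoding.
fn sorted_keys(keys: &[&str], cmp: fn(&[u8], &[u8]) -> Ordering) -> Vec<String> {
    let mut out: Vec<String> = keys.iter().map(|k| k.to_string()).collect();
    out.sort_by(|a, b| cmp(a.as_bytes(), b.as_bytes()));
    out
}
```

Calling `sorted_keys(&["b", "aa", "a"], length_first)` yields the order `a, b, aa`, while `bytewise` yields `a, aa, b`, matching the two examples in the added README section.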

ciborium/src/canonical.rs

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+//! Canonicalization support for CBOR serialization.
+//!
+//! Supports various canonicalization schemes for deterministic CBOR serialization. The default is
+//! [NoCanonicalization] for the fastest serialization. Canonical serialization is around 2x slower.
+
+/// Which canonicalization scheme to use for CBOR serialization.
+///
+/// Can only be initialized with the `std` feature enabled.
+#[doc(hidden)]
+#[derive(Debug, Copy, Clone, PartialEq, Eq)]
+pub enum CanonicalizationScheme {
+    /// Sort map keys in output according to [RFC 7049]'s deterministic encoding spec.
+    ///
+    /// Also aligns with [RFC 8949 4.2.3]'s backwards compatibility sort order.
+    ///
+    /// Uses length-first map key ordering. Eg. `["a", "b", "aa"]`.
+    #[cfg(feature = "std")]
+    Rfc7049,
+
+    /// Sort map keys in output according to [RFC 8949]'s deterministic encoding spec.
+    ///
+    /// Uses bytewise lexicographic map key ordering. Eg. `["a", "aa", "b"]`.
+    #[cfg(feature = "std")]
+    Rfc8949,
+}
+
+/// Don't sort map key output.
+pub struct NoCanonicalization;
+
+/// Sort map keys in output according to [RFC 7049]'s deterministic encoding spec.
+///
+/// Also aligns with [RFC 8949 4.2.3]'s backwards compatibility sort order.
+///
+/// Uses length-first map key ordering. Eg. `["a", "b", "aa"]`.
+#[cfg(feature = "std")]
+pub struct Rfc7049;
+
+/// Sort map keys in output according to [RFC 8949]'s deterministic encoding spec.
+///
+/// Uses bytewise lexicographic map key ordering. Eg. `["a", "aa", "b"]`.
+#[cfg(feature = "std")]
+pub struct Rfc8949;
+
+/// Trait for canonicalization schemes.
+///
+/// See implementors:
+/// - [NoCanonicalization] for no canonicalization (fastest).
+/// - [Rfc7049] for length-first map key sorting.
+/// - [Rfc8949] for bytewise lexicographic map key sorting.
+pub trait Canonicalization {
+    /// True if keys should be cached and sorted.
+    const IS_CANONICAL: bool;
+
+    /// Determines which sorting implementation to use.
+    const SCHEME: Option<CanonicalizationScheme>;
+}
+
+impl Canonicalization for NoCanonicalization {
+    const IS_CANONICAL: bool = false;
+    const SCHEME: Option<CanonicalizationScheme> = None;
+}
+
+#[cfg(feature = "std")]
+impl Canonicalization for Rfc7049 {
+    const IS_CANONICAL: bool = true;
+    const SCHEME: Option<CanonicalizationScheme> = Some(CanonicalizationScheme::Rfc7049);
+}
+
+#[cfg(feature = "std")]
+impl Canonicalization for Rfc8949 {
+    const IS_CANONICAL: bool = true;
+    const SCHEME: Option<CanonicalizationScheme> = Some(CanonicalizationScheme::Rfc8949);
+}
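
A minimal sketch of how a trait with associated constants like the one above could drive behaviour through a type parameter. The enum, marker structs, and trait are redeclared locally (without the `std` feature gates) so the block compiles on its own. The `order_keys` function is hypothetical and is not the serializer code from this commit, which is not shown in this diff.

```rust
// Local stand-ins mirroring the items added in `canonical.rs`.
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub enum CanonicalizationScheme {
    Rfc7049,
    Rfc8949,
}

pub struct NoCanonicalization;
pub struct Rfc7049;
pub struct Rfc8949;

pub trait Canonicalization {
    const IS_CANONICAL: bool;
    const SCHEME: Option<CanonicalizationScheme>;
}

impl Canonicalization for NoCanonicalization {
    const IS_CANONICAL: bool = false;
    const SCHEME: Option<CanonicalizationScheme> = None;
}

impl Canonicalization for Rfc7049 {
    const IS_CANONICAL: bool = true;
    const SCHEME: Option<CanonicalizationScheme> = Some(CanonicalizationScheme::Rfc7049);
}

impl Canonicalization for Rfc8949 {
    const IS_CANONICAL: bool = true;
    const SCHEME: Option<CanonicalizationScheme> = Some(CanonicalizationScheme::Rfc8949);
}

/// Hypothetical consumer: orders a list of key byte strings according to
/// the scheme selected by the type parameter `C`.
pub fn order_keys<C: Canonicalization>(mut keys: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
    match C::SCHEME {
        // No canonicalization: keep the caller's order untouched.
        None => {}
        // Length-first: shorter keys first, ties broken bytewise.
        Some(CanonicalizationScheme::Rfc7049) => {
            keys.sort_by(|a, b| a.len().cmp(&b.len()).then_with(|| a.cmp(b)))
        }
        // Bytewise lexicographic order.
        Some(CanonicalizationScheme::Rfc8949) => keys.sort(),
    }
    keys
}
```

Because `C::SCHEME` is a constant of the chosen type, each instantiation such as `order_keys::<Rfc8949>(keys)` matches on a value known at compile time. That is presumably the motivation behind the commit title, though the diff itself does not state it.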

ciborium/src/lib.rs

Lines changed: 25 additions & 16 deletions
@@ -6,20 +6,13 @@
 //!
 //! # Quick Start
 //!
-//! You're probably looking for [`from_reader()`](crate::de::from_reader)
-//! and [`into_writer()`](crate::ser::into_writer), which are
-//! the main functions. Note that byte slices are also readers and writers and can be
-//! passed to these functions just as streams can.
+//! You're probably looking for [`from_reader()`](crate::de::from_reader),
+//! [`to_vec()`](crate::ser::to_vec), and [`into_writer()`](crate::ser::into_writer),
+//! which are the main functions. Note that byte slices are also readers and writers
+//! and can be passed to these functions just as streams can.
 //!
 //! For dynamic CBOR value creation/inspection, see [`Value`](crate::value::Value).
 //!
-//! # Features
-//! - `std`: enabled by default.
-//! - `canonical`: allows serializing with a `CanonicalizationScheme` for deterministic
-//! outputs. Incurs a small performance penalty (~20% slower) when serializing
-//! without a canonicalization scheme, and a large penalty (~100% slower) when
-//! serializing with a canonicalization scheme.
-//!
 //! # Design Decisions
 //!
 //! ## Always Serialize Numeric Values to the Smallest Size
@@ -90,6 +83,25 @@
 //! be avoided because it can be fragile as it exposes invariants of your Rust
 //! code to remote actors. We might consider adding this in the future. If you
 //! are interested in this, please contact us.
+//!
+//! ## Canonical Encodings
+//!
+//! The ciborium crate has support for various canonical encodings during
+//! serialization.
+//!
+//! - [`NoCanonicalization`](crate::canonical::NoCanonicalization): the default,
+//! numbers are still encoded in their smallest form, but map keys are not
+//! sorted for maximum serialization speed.
+//! - [`Rfc7049`](crate::canonical::Rfc7049): the canonicalization scheme from
+//! RFC 7049 that sorts map keys in a length-first order. Eg.
+//! `["a", "b", "aa"]`.
+//! - [`Rfc8949`](crate::canonical::Rfc8949): the canonicalization scheme from
+//! RFC 8949 that sorts map keys in a bytewise lexicographic order. Eg.
+//! `["a", "aa", "b"]`.
+//!
+//! To use canonicalization, you must enable the `std` feature. See the examples
+//! in [`to_vec_canonical`](crate::ser::to_vec_canonical) and
+//! [`into_writer_canonical`](crate::ser::into_writer_canonical) for more.
 
 #![cfg_attr(not(feature = "std"), no_std)]
 #![deny(missing_docs)]
@@ -99,6 +111,7 @@
 
 extern crate alloc;
 
+pub mod canonical;
 pub mod de;
 pub mod ser;
 pub mod tag;
@@ -113,11 +126,7 @@ pub use crate::ser::{into_writer, Serializer};
 
 #[doc(inline)]
 #[cfg(feature = "std")]
-pub use crate::ser::to_vec;
-
-#[doc(inline)]
-#[cfg(feature = "canonical")]
-pub use crate::ser::{into_writer_canonical, to_vec_canonical};
+pub use crate::ser::{into_writer_canonical, to_vec, to_vec_canonical};
 
 #[cfg(feature = "std")]
 #[doc(inline)]
