feat: Add Tid::now and Tid::from_datetime constructors #277

Open: wants to merge 4 commits into base: main
Changes from 2 commits
51 changes: 50 additions & 1 deletion atrium-api/src/types/string.rs
@@ -9,6 +9,22 @@ use regex::Regex;
use serde::{de::Error, Deserialize, Deserializer, Serialize, Serializer};
use std::{cmp, ops::Deref, str::FromStr, sync::OnceLock};

// Reference: https://github.com/bluesky-social/indigo/blob/9e3b84fdbb20ca4ac397a549e1c176b308f7a6e1/repo/tid.go#L11-L19
fn s32_encode(mut i: u64) -> String {
    const S32_CHAR: &[u8] = b"234567abcdefghijklmnopqrstuvwxyz";

    let mut s = String::new();
    for _ in 0..13 {
        let c = i & 0x1F;
        s.push(S32_CHAR[c as usize] as char);

        i >>= 5;
    }

    // Reverse the string to convert it to big-endian format.
    s.as_str().chars().rev().collect()
}

/// Common trait implementations for Lexicon string formats that are newtype wrappers
/// around `String`.
macro_rules! string_newtype {
@@ -410,7 +426,7 @@ impl Serialize for Language {

/// A [Timestamp Identifier].
///
-/// [Timestamp Identifier]: https://atproto.com/specs/record-key#record-key-type-tid
+/// [Timestamp Identifier]: https://atproto.com/specs/tid
#[derive(Clone, Debug, PartialEq, Eq, Serialize, Hash)]
#[serde(transparent)]
pub struct Tid(String);
@@ -436,6 +452,27 @@ impl Tid {
}
}

/// Construct a new timestamp with the specified clock ID.
///
/// Clock IDs 0-31 can be used as an ad-hoc clock ID if you are not concerned
Owner commented:

Is the clock ID really limited to 0-31? I ask because the spec says it is 10 bits, which suggests a value in the range 0-1023. If you know where the 0-31 range comes from, I would like to hear it.

Author commented:

Here's some more info that I've found online: bluesky-social/atproto#1160 (comment)
It appears that the clock ID partitioning did not make it into the specification - but based on this, clocks 0-31 are ad-hoc identifiers (and one is randomly chosen by the reference implementation).

Owner commented:

Thanks for the reply! I don't think we need to bother writing about the 0-31 range as long as it is not explicitly stated in the specification.

It seems more important to have a mechanism to ensure that a value larger than the previously issued timestamp is generated in order to avoid collisions.

Author commented:

Sounds good!

> It seems more important to have a mechanism to ensure that a value larger than the previously issued timestamp is generated in order to avoid collisions.

Agreed, though I'm still up in the air about who should be responsible for this.
The upstream libraries written by Bluesky handle this because they have millisecond-level precision and conflicts are very likely to occur at that level.
However, they do this by maintaining global state, which is far less than ideal.

There are other factors as well - like I'm not really sure if a conflict matters if an application is writing separate records.
E.g. AFAIK, you could have a com.example.foo and a com.example.bar with the same record key.

I'm wondering if we should just advise application developers about this hazard and have them handle conflicts, such as repeatedly calling Tid::now() if it returns the same value.
Or maybe provide a wrapper function or something?

Let me keep thinking on this...
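
For illustration only, one possible shape for the "wrapper" idea in this thread is a caller-owned generator that remembers the last timestamp it issued and bumps it by one microsecond when the clock has not advanced, so no global state is needed. This is a sketch under assumptions, not part of the PR: `TidClock`, `next_at`, and `next` are hypothetical names, and the encoder is a standalone copy of the PR's `s32_encode` logic so the snippet compiles without `chrono`.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

const S32_CHAR: &[u8] = b"234567abcdefghijklmnopqrstuvwxyz";

// Standalone copy of the 13-character encoding used by `s32_encode` in this PR.
fn s32_encode(mut i: u64) -> String {
    let mut buf = [0u8; 13];
    // Fill from the last position backwards so the result is big-endian.
    for slot in buf.iter_mut().rev() {
        *slot = S32_CHAR[(i & 0x1F) as usize];
        i >>= 5;
    }
    String::from_utf8(buf.to_vec()).expect("alphabet is ASCII")
}

/// Hypothetical helper (not in atrium-api): issues strictly increasing TIDs
/// for a single clock ID without relying on global state.
struct TidClock {
    clock_id: u64,
    last_micros: u64,
}

impl TidClock {
    fn new(clock_id: u32) -> Self {
        // Keep only the low 10 bits, as `from_datetime` does.
        Self { clock_id: u64::from(clock_id) & 0x3FF, last_micros: 0 }
    }

    /// Issue a TID for an explicit timestamp in microseconds since the Unix epoch.
    fn next_at(&mut self, now_micros: u64) -> String {
        // If the clock stalled or went backwards, step 1 µs past the last value.
        let micros = now_micros.max(self.last_micros + 1);
        self.last_micros = micros;
        s32_encode(((micros << 10) & 0x7FFF_FFFF_FFFF_FC00) | self.clock_id)
    }

    /// Issue a TID for the current system time.
    fn next(&mut self) -> String {
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock is before the Unix epoch")
            .as_micros() as u64;
        self.next_at(now)
    }
}
```

A caller would hold one `TidClock` per writer and call `next()` for each record. Two calls within the same microsecond then yield different, increasing TIDs. Collisions across separate processes or clock IDs are outside what this sketch addresses.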

/// with this parameter.
pub fn from_datetime(cid: u32, time: chrono::DateTime<chrono::Utc>) -> Self {
    let time = time.timestamp_micros() as u64;

    // The TID is laid out as follows:
    // 0TTTTTTTTTTTTTTT TTTTTTTTTTTTTTTT TTTTTTTTTTTTTTTT TTTTTTCCCCCCCCCC
    let tid = (time << 10) & 0x7FFF_FFFF_FFFF_FC00 | (cid as u64) & 0x3FF;
    Self(s32_encode(tid))
}
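
To make the bit layout in the comment above concrete, the following sketch inverts it: it turns a 13-character TID back into its microsecond timestamp and 10-bit clock ID. `tid_decode` is a hypothetical helper written for this illustration and is not part of the PR or of atrium-api. It assumes the same alphabet as `s32_encode` and rejects any first character whose value would not fit in a `u64`.

```rust
const S32_CHAR: &[u8] = b"234567abcdefghijklmnopqrstuvwxyz";

/// Hypothetical helper: split a TID into
/// (microseconds since the Unix epoch, clock ID).
fn tid_decode(tid: &str) -> Option<(u64, u16)> {
    if tid.len() != 13 {
        return None;
    }

    let mut value: u64 = 0;
    for (idx, byte) in tid.bytes().enumerate() {
        // Map the character back to its 5-bit value; unknown bytes are rejected.
        let digit = S32_CHAR.iter().position(|&c| c == byte)? as u64;
        // 13 * 5 = 65 bits, so the first character may only carry 4 bits.
        if idx == 0 && digit >= 16 {
            return None;
        }
        value = (value << 5) | digit;
    }

    // 0TTT...TTT CCCCCCCCCC: 53 timestamp bits above 10 clock ID bits.
    let micros = (value >> 10) & 0x1F_FFFF_FFFF_FFFF;
    let clock_id = (value & 0x3FF) as u16;
    Some((micros, clock_id))
}
```

Applied to the value used in the `tid_construct` test, `tid_decode("3lh5234mwy222")` gives `Some((1_738_430_999_000_000, 0))`, that is, 1738430999 seconds expressed in microseconds, with clock ID 0.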

/// Construct a new [Tid] that represents the current time.
///
/// Clock IDs 0-31 can be used as an ad-hoc clock ID if you are not concerned
/// with this parameter.
pub fn now(cid: u32) -> Self {
    Self::from_datetime(cid, chrono::Utc::now())
}

/// Returns the TID as a string slice.
pub fn as_str(&self) -> &str {
self.0.as_str()
@@ -766,6 +803,18 @@ mod tests {
}
}

#[test]
fn tid_encode() {
    assert_eq!(s32_encode(0), "2222222222222");
    assert_eq!(s32_encode(1), "2222222222223");
}

#[test]
fn tid_construct() {
    let tid = Tid::from_datetime(0, chrono::DateTime::from_timestamp(1738430999, 0).unwrap());
    assert_eq!(tid.as_str(), "3lh5234mwy222");
}
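
A further test in the same spirit could pin down what happens to clock IDs outside the 10-bit range, since `from_datetime` silently masks `cid` with `0x3FF`. The sketch below is an assumption-laden stand-in rather than a test against the real type: `tid_from_micros` is a hypothetical function that mirrors the arithmetic of `Tid::from_datetime` on raw microseconds, so it runs without `chrono`.

```rust
const S32_CHAR: &[u8] = b"234567abcdefghijklmnopqrstuvwxyz";

// Standalone copy of the PR's encoding logic.
fn s32_encode(mut i: u64) -> String {
    let mut buf = [0u8; 13];
    for slot in buf.iter_mut().rev() {
        *slot = S32_CHAR[(i & 0x1F) as usize];
        i >>= 5;
    }
    String::from_utf8(buf.to_vec()).expect("alphabet is ASCII")
}

// Hypothetical stand-in for `Tid::from_datetime`, taking raw microseconds.
fn tid_from_micros(cid: u32, micros: u64) -> String {
    s32_encode(((micros << 10) & 0x7FFF_FFFF_FFFF_FC00) | (u64::from(cid) & 0x3FF))
}

#[test]
fn tid_clock_id_is_masked() {
    let t = 1_738_430_999_000_000;
    // 1023 is the largest clock ID that survives the mask.
    assert_eq!(tid_from_micros(1023, t), "3lh5234mwy2zz");
    // 1024 wraps around to clock ID 0.
    assert_eq!(tid_from_micros(1024, t), tid_from_micros(0, t));
}
```

Whether out-of-range clock IDs should wrap like this or be rejected is a design question the PR leaves open. The sketch only records the behavior implied by the mask.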

#[test]
fn valid_tid() {
for valid in ["3jzfcijpj2z2a", "7777777777777", "3zzzzzzzzzzzz"] {