fix
staffik committed Jan 30, 2025
1 parent becbde0 commit 61ef63f
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions neps/nep-0568.md
@@ -116,15 +116,15 @@ This approach minimizes complexity while maintaining consistency across hot and

Since [Stateless Validation][NEP-508], tracking all shards is no longer required. Currently, shard cleanup has not been implemented. We propose a cleanup mechanism that will also handle post-resharding cleanup.

-When garbage collection clears the last block of an epoch on the canonical chain, we check what shards were tracked in that epoch by checking what shards exist in `TrieChanges` at this block. In the same way, we collect shards that were tracked in following epochs up to the current one. We only remove shard State if:
-- It was tracked in the old epoch (for which we just garbage collected the last block).
-- It was not tracked later, it is not tracked currently, and it won't be tracked in the next epoch.
+When garbage collection removes the last block of an epoch from the canonical chain, we determine which shards were tracked during that epoch by examining the shards present in `TrieChanges` at that block. Similarly, we collect information on shards tracked in subsequent epochs, up to the present one. A shard State is removed only if:
+- It was tracked in the old epoch (for which the last block has just been garbage collected).
+- It was not tracked in later epochs, is not currently tracked, and will not be tracked in the next epoch.
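
The two removal conditions above can be read as a set filter. The following is a minimal sketch under stated assumptions, not the code from the NEP: `ShardId`, `shards_to_clean_up`, and all parameter names are hypothetical, and the tracked-shard sets are passed in directly rather than read from `TrieChanges` or a shard tracker.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a shard identifier (the NEP works with `ShardUId`).
type ShardId = u64;

/// Returns the shards whose State may be removed after the last block of the
/// old epoch has been garbage collected. All inputs are assumed to be known.
fn shards_to_clean_up(
    gced_epoch_shards: &HashSet<ShardId>,    // tracked in the old epoch
    later_epochs_shards: &[HashSet<ShardId>], // tracked in each later epoch
    currently_tracked: &HashSet<ShardId>,
    next_epoch_tracked: &HashSet<ShardId>,
) -> Vec<ShardId> {
    let mut result: Vec<ShardId> = gced_epoch_shards
        .iter()
        .copied()
        .filter(|shard| {
            // Keep only shards that nothing after the old epoch refers to.
            !later_epochs_shards.iter().any(|epoch| epoch.contains(shard))
                && !currently_tracked.contains(shard)
                && !next_epoch_tracked.contains(shard)
        })
        .collect();
    // Sort for a deterministic result, since HashSet iteration order is arbitrary.
    result.sort_unstable();
    result
}
```

For instance, if shards 1, 2 and 3 were tracked in the old epoch, shard 2 was tracked in a later epoch, and shard 3 will be tracked in the next epoch, only shard 1 qualifies for removal.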

-To support resharding, we track `ShardUId` prefixes instead of tracked shards. A parent shard's State remains until it is no longer referenced in `DBCol::StateShardUIdMapping` by any descendant shard. And when we stop tracking all descendant shards, then we clean up the parent shard's (and all its descendants) State, and remove all mappings to the parent from `DBCol::StateShardUIdMapping`.
+To ensure compatibility with resharding, instead of checking tracked shards directly, we analyze the `ShardUId` prefixes they use. A parent shard's state is retained as long as it remains referenced in `DBCol::StateShardUIdMapping` by any descendant shard. Once all descendant shards are no longer tracked, we clean up the parent shard's state (along with its descendants) and remove all mappings to the parent from `DBCol::StateShardUIdMapping`.
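
The prefix-based check can be illustrated with an in-memory map standing in for `DBCol::StateShardUIdMapping`. This is only a sketch: the `ShardUId` alias, `state_prefix`, and `prefix_is_unused` are hypothetical names, and it assumes each descendant maps directly to the ancestor whose prefix it uses.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the real `ShardUId` type.
type ShardUId = u32;

/// Resolves the database key prefix a shard uses: the mapped ancestor if the
/// shard has a mapping entry, otherwise the shard's own `ShardUId`.
fn state_prefix(mapping: &HashMap<ShardUId, ShardUId>, shard: ShardUId) -> ShardUId {
    mapping.get(&shard).copied().unwrap_or(shard)
}

/// A prefix is safe to clean up only if no tracked shard resolves to it.
fn prefix_is_unused(
    mapping: &HashMap<ShardUId, ShardUId>,
    prefix: ShardUId,
    tracked_shards: &HashSet<ShardUId>,
) -> bool {
    tracked_shards
        .iter()
        .all(|&shard| state_prefix(mapping, shard) != prefix)
}
```

With children 10 and 11 both mapped to parent 1, the parent's prefix stays in use while either child is tracked, and becomes removable once neither is.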

#### Negative refcounts

-Note that some trie keys (e.g. `TrieKey::DelayedReceipt`) are duplicated among children shards, but the corresponding State is not duplicated. `DBCol::State` column is reference counted, which means some data will be reference counted once, but referenced by both children. That would result in negative refcounts when the data is later removed by both children. To mitigate it, we change the RocksDB `refcount_merge` behavior so that negative refcounts are clamped to 0.
+Some trie keys (e.g. `TrieKey::DelayedReceipt`) are shared among child shards, but their corresponding State is not duplicated. The `DBCol::State` column uses reference counting, meaning some data is counted only once, despite being referenced by multiple child shards. This can result in negative refcounts when the data is later removed. To mitigate this, we modify the RocksDB `refcount_merge` behavior so that negative refcounts are clamped to 0.
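
To make the clamping concrete, here is a simplified model of a refcount merge. It is an illustration only: `merge_refcount_clamped` is a hypothetical function operating on plain integers, it does not reproduce the actual `refcount_merge` signature or value encoding, and clamping once after summing all deltas is an assumption.

```rust
/// Applies a sequence of refcount deltas to an existing refcount and clamps
/// the result to 0 so that it never goes negative.
/// Hypothetical model: the real merge works on encoded RocksDB values.
fn merge_refcount_clamped(existing: i64, deltas: &[i64]) -> i64 {
    let sum = deltas
        .iter()
        .fold(existing, |acc, delta| acc.saturating_add(*delta));
    // Data counted once but removed by both children would otherwise reach -1.
    sum.max(0)
}
```

In this model, a value with refcount 1 that is removed by both child shards receives two `-1` deltas and ends at 0 rather than -1.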

### Stateless Validation

@@ -535,7 +535,7 @@ fn set_shard_uid_mapping(&mut self, child_shard_uid: ShardUId, parent_shard_uid:
}
```

-When a node stops tracking all descendants of a shard, garbage collection would eventually clear the last block of the last epoch where the last descendant was tracked. The descendant would then appear in the result of:
+When a node stops tracking all descendants of a shard, garbage collection will eventually clear the last block of the last epoch in which the last descendant was tracked. The descendant will then appear in the result of:

```rust
fn get_potential_shards_for_cleanup(..., last_block_of_gced_epoch) -> Result<Vec<ShardUId>> {
@@ -552,7 +552,7 @@ fn get_potential_shards_for_cleanup(..., last_block_of_gced_epoch) -> Result<Vec
}
```

-Then, `gc_state()` would be called, it would map the descendant `ShardUId` to the parent `ShardUId`, so that now the parent shard is a potential shard for cleanup. Then we would detect that since `gced_epoch` nobody used the parent `ShardUId` as a database key prefix, so we can remove the State under this prefix (including parent and all descendants) and associated entries from `DBCol::StateShardUIdMapping`.
+Then, `gc_state()` is called, mapping the descendant `ShardUId` to the parent `ShardUId`, making the parent shard a candidate for cleanup. We then detect that since `gced_epoch`, the parent `ShardUId` has not been used as a database key prefix. As a result, we can safely remove the state under this prefix (including the parent and all descendants) along with the associated entries from `DBCol::StateShardUIdMapping`.

```rust
fn gc_state(potential_shards_for_cleanup, gced_epoch, shard_tracker, store_update) {
