
Commit 34da143

More lattice technology updates
1 parent 2ed2810 commit 34da143

File tree

1 file changed: +15 -7 lines changed


docs/overview/lattice.md

Lines changed: 15 additions & 7 deletions
@@ -93,18 +93,26 @@ Lattice values are a map of public keys to signed and timestamped metadata descr

The P2P lattice operates in a manner similar to [Kademlia](https://en.wikipedia.org/wiki/Kademlia), allowing the location of arbitrary peers on the Internet without depending on any decentralised location service. In the Kademlia model, peers only need to store metadata for other peers that they are relatively "near" to in cryptographic space, making this a highly efficient and fault-tolerant decentralised service.
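
To make the notion of "nearness" concrete: Kademlia-style DHTs measure distance between identifiers with the XOR metric, so a peer only needs to track other peers whose IDs are close to its own under that metric. A minimal sketch of the metric (illustrative only, not Convex's routing implementation):

```java
// Minimal sketch of the Kademlia XOR distance metric (illustrative only, not
// Convex's routing implementation). Two peer IDs are "near" when their XOR is
// numerically small, i.e. when they share a long common prefix in cryptographic space.
import java.math.BigInteger;

public class XorDistance {
    static BigInteger distance(byte[] a, byte[] b) {
        return new BigInteger(1, a).xor(new BigInteger(1, b));
    }

    public static void main(String[] args) {
        byte[] self = {0x12, 0x34, (byte) 0xAB};
        byte[] near = {0x12, 0x34, (byte) 0xAF};        // shares a long prefix with self
        byte[] far  = {(byte) 0xF2, 0x34, (byte) 0xAB}; // differs in the first bits
        System.out.println(distance(self, near)); // small
        System.out.println(distance(self, far));  // large
    }
}
```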

- ### Efficiency and scalability
+ ## Efficiency and scalability

- How do we make a global, decentralised data structure of unlimited scale fast (or even possible)? There are a number of key engineering ideas here.
+ How do we build a global, decentralised data structure of unlimited scale? How do we make it fast? Or even feasible?

- **Structural sharing** - in immutable persistent data structures means that when changes to a large lattice value are made, a new lattice value is produced which shares most of its data with the original value. This means that storage and processing is only required
+ There are a number of key engineering ideas here. We've been building and stress-testing lattice technology for 5+ years, which has given us some unique implementation advantages and insights:

- **Delta transmission** - building upon structural sharing, it is possible to only transmit the deltas (changes) when a new lattice value is communicated. This assumes that the recipient will have the original data, but this is a good assumption if they are already participating in the same lattice (and if they don't they can simply acquire it again...). This means that network / communication requirements are also only proportional to the number of changes
+ **Structural sharing** - using immutable [persistent data structures](https://en.wikipedia.org/wiki/Persistent_data_structure) means that when changes to a large lattice value are made, a new lattice value is produced which shares most of its data with the original value. This means that storage and processing is only required in proportion to the changes actually made.
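
To see roughly why this is cheap, here is a toy immutable tree in Java (not the actual lattice data structures): an update rebuilds only the nodes on the path to the change and reuses every other subtree by reference.

```java
// Illustrative only: a toy immutable pair-tree showing structural sharing.
// An "update" builds new nodes only along the path to the change; all other
// subtrees are reused by reference, so the cost is proportional to the depth
// of the change, not the size of the whole structure.
final class Node {
    final Object left;   // either a Node or a leaf value
    final Object right;
    Node(Object left, Object right) { this.left = left; this.right = right; }
}

public class StructuralSharingDemo {
    // Replace the leftmost leaf, rebuilding only the left spine of the tree.
    static Object updateLeftmost(Object tree, Object newValue) {
        if (!(tree instanceof Node n)) return newValue;              // reached a leaf
        return new Node(updateLeftmost(n.left, newValue), n.right);  // right subtree shared as-is
    }

    public static void main(String[] args) {
        Node original = new Node(new Node("a", "b"), new Node("c", "d"));
        Node updated  = (Node) updateLeftmost(original, "A");
        // The entire right subtree is the same object, not a copy:
        System.out.println(updated.right == original.right); // true
    }
}
```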

- **Merge coalescing** - A node may receive multiple lattice values from different sources in a short amount of time. With a series of repeated merges, it produces a new lattice value incorporation all of these updates. It then only needs to produce and transmit one new lattice value (typically with a much smaller much smaller delta than the sum of those received). This coalescing behaviour therefore automatically reduces traffic and scales the load to the amount that nodes can handle.
+ **Selective attention** - nodes may select whichever subsets of the lattice they are interested in handling on a self-sovereign basis. This means that participants can scale their resource usage based on their own needs. For example, a Convex peer operator might elect to participate only in the Convex consensus lattice and a small subset of DLFS drives representing data that the peer operator needs to access and maintain.

- **Embedded encodings** - Merkle trees have the disadvantage that they require the computation of a cryptographic hash at every branch of the tree. This can become expensive with a large number of small branches, so the Lattice makes use of a novel efficient encoding scheme that compresses multiple values into a single Merkle tree branch (but while still maintaining the property of having a unique encoding with a content-addressable hash).
+ **Delta transmission** - building upon structural sharing, it is possible to only transmit the deltas (changes) when a new lattice value is communicated. This assumes that the recipient will have the original data, but this is a good assumption if they are already participating in the same lattice (and if they don't, they can simply acquire it again...). This means that network / communication requirements are only ever (at most) proportional to the number of changes made in regions of the Lattice that a specific node has chosen to participate in.
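
A hedged sketch of the idea (hypothetical types, not the Convex wire protocol): when sending a new value, any branch whose hash the recipient is already known to hold can be skipped, so the bytes on the wire scale with the novel branches only.

```java
// Illustrative sketch of delta transmission (hypothetical types, not the Convex
// wire protocol). Only branches whose hash the recipient is not known to hold
// get queued for sending; shared subtrees are skipped entirely.
import java.util.List;
import java.util.Set;

public class DeltaSend {
    record Branch(String hash, byte[] encoding, List<Branch> children) {}

    static void collectNovel(Branch b, Set<String> recipientHas, List<byte[]> outbound) {
        if (recipientHas.contains(b.hash())) return; // recipient already has this subtree
        outbound.add(b.encoding());                  // novel branch: transmit its encoding
        for (Branch child : b.children()) {
            collectNovel(child, recipientHas, outbound);
        }
    }
}
```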

+ **Merge coalescing** - A node may receive multiple lattice values from different sources in a short amount of time. With a series of repeated merges, it produces a new lattice value incorporating all of these updates. It then only needs to produce and transmit one new lattice value (typically with a much smaller delta than the sum of those received). This coalescing behaviour therefore automatically reduces traffic and scales the load to the amount that nodes can individually handle (typically, network transmission bandwidth will be the primary bottleneck, since local lattice value merges are very fast).
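
One way to picture the coalescing behaviour (a sketch under assumed types, not the peer implementation): drain whatever has arrived in the inbox, fold it into the current value with repeated merges, and broadcast the single result.

```java
// Illustrative sketch of merge coalescing (hypothetical types, not the Convex peer
// implementation). The merge function stands in for any commutative, associative,
// idempotent lattice join.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BinaryOperator;

public class Coalescer<V> {
    private final BlockingQueue<V> inbox = new LinkedBlockingQueue<>();
    private final BinaryOperator<V> merge;

    public Coalescer(BinaryOperator<V> merge) { this.merge = merge; }

    public void receive(V value) { inbox.add(value); }

    /** Merge everything currently queued into `current`, producing one new value to broadcast. */
    public V coalesce(V current) {
        V result = current;
        for (V incoming; (incoming = inbox.poll()) != null; ) {
            result = merge.apply(result, incoming); // repeated local merges are cheap
        }
        return result;                              // transmit this single value once
    }
}
```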

+ **Embedded encodings** - Merkle trees have the disadvantage that they require the computation of a cryptographic hash at every branch of the tree. This can become expensive with a large number of small branches, so the Lattice makes use of a novel efficient encoding scheme (outlined in [CAD003](../cad/003_encoding/README.md)) that compresses multiple embedded values into a single Merkle tree branch (while still maintaining the important property of having a unique encoding with a content-addressable hash). Typical branches might be around 1000 bytes on average (and never less than 141 bytes), which ensures efficiency from a hashing perspective and also keeps overall storage requirements near-optimal.
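
A hedged sketch of the embedding rule, assuming a simple size threshold: the 140-byte cutoff below is an illustrative assumption consistent with the "never less than 141 bytes" figure above, not a quote of the CAD003 specification.

```java
// Illustrative embedding rule (assumed threshold, not the CAD003 spec): child values
// with a small encoding are embedded directly inside the parent's encoding, and only
// larger children become separate hashed Merkle branches.
public class EmbeddingRule {
    static final int EMBED_LIMIT = 140; // assumed cutoff; separate branches are then always >= 141 bytes

    static boolean isEmbedded(byte[] childEncoding) {
        return childEncoding.length <= EMBED_LIMIT;
    }
}
```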

+ **Branching factor** - there is a trade-off with branching factors in Merkle trees. Too low, and your tree becomes excessively deep, with a lot of extra intermediate hashes to store and compute. Too high, and the encoding of a single branch becomes large, meaning that small changes result in a lot of redundant copying. Lattice values are optimised to provide efficient branching ratios for different use cases (typically ~10). In all cases, the number of branches, the encoding size and the cost of navigating to a direct child branch are guaranteed to be `O(1)` by design.
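
To see why a branching factor of around 10 keeps trees shallow: depth grows only logarithmically with the number of entries, so even a billion-entry structure needs only about nine levels. A simple arithmetic sketch:

```java
// Quick arithmetic: with branching factor b, the depth needed to index n entries
// is roughly log_b(n). With b = 10 (the typical ratio mentioned above), a billion
// entries fit in about 9 levels, so a small change touches only ~9 branches.
public class BranchingDepth {
    public static void main(String[] args) {
        long n = 1_000_000_000L; // a billion entries
        int b = 10;              // typical lattice branching factor
        int depth = 0;
        for (long capacity = 1; capacity < n; capacity *= b) depth++;
        System.out.println(depth); // 9
    }
}
```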

**Orthogonal persistence** - Lattice values can exist in memory or on other storage media (typically local disks). From a developer perspective, these are effectively identical: there is no need to treat in-memory and externally stored values differently. However, values are efficiently loaded from storage and cached on demand, so that most of the time the lattice behaves like a very fast in-memory database, despite being potentially much larger than local RAM.
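
A minimal sketch of the load-on-demand behaviour (a hypothetical helper, not Convex's actual reference machinery): the reference behaves like an in-memory value, but the underlying data is only read from storage the first time it is needed, then cached.

```java
// Illustrative lazy reference (hypothetical helper, not Convex's reference implementation).
// The value is fetched from storage on first use and cached thereafter.
import java.util.function.Supplier;

public class LazyRef<T> {
    private T cached;                  // null until first loaded
    private final Supplier<T> loader;  // e.g. reads and decodes the value from local disk storage

    public LazyRef(Supplier<T> loader) { this.loader = loader; }

    public synchronized T get() {
        if (cached == null) cached = loader.get(); // load and cache on demand
        return cached;
    }
}
```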

- **Fast comparison** - Lattice values enable some extremely quick comparison algorithms which lattice technology fully exploits. Most simply, checking the identity of any two values is simply the comparison of two cryptographic hashes, which can be done in `O(1)` time. More sophisticated comparisons include computing differences between lattice data structures (typically `O(n)` or `O(n log n)` where `n` is the number of changes).
+ **Fast comparison** - Lattice values enable some extremely quick comparison algorithms, which lattice technology fully exploits. Most simply, checking the identity of any two values is just the comparison of two cryptographic hashes, which can be done in `O(1)` time. Perhaps surprisingly, computing the common prefix of two vectors of arbitrary length is also just `O(1)`, which is heavily exploited to compare transaction orderings efficiently in CPoS. More sophisticated comparisons include computing differences between multiple lattice data structures (typically `O(n)` or `O(n log n)`, where `n` is the size of the differences). It is thanks to these comparison algorithms that we are able to implement extremely fast lattice merge operations.
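
A sketch of the `O(1)` identity check (SHA-256 is used here purely for illustration; it stands in for whichever content hash the lattice actually uses): once two values' hashes are known, equality of arbitrarily large structures reduces to comparing two fixed-size byte arrays.

```java
// Illustrative O(1) identity check via content hashes. SHA-256 is an illustrative
// stand-in here, not necessarily the hash function used by the lattice itself.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

public class HashIdentity {
    static byte[] hash(byte[] encoding) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(encoding);
    }

    public static void main(String[] args) throws Exception {
        byte[] a = hash("encoding of a potentially huge lattice value".getBytes(StandardCharsets.UTF_8));
        byte[] b = hash("encoding of a potentially huge lattice value".getBytes(StandardCharsets.UTF_8));
        // Content-addressable values: equal hashes mean identical values,
        // and the comparison itself is a fixed-size, constant-time check.
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```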

+ **Garbage collection** - lattice values work *extremely* well with a model of lazy garbage collection. Technically, you can keep lattice values as long as you like (they are immutable and content-addressable after all, so never go stale). However, sooner or later you are likely to hit storage constraints. In this case, you can simply retain the subset of the lattice(s) you are interested in, as identified by the current root value(s), and discard all other values. This works for both in-memory caches (e.g. leveraging the JVM GC) and long-term storage (e.g. `convex etch gc`).
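
One way to picture the retention model (a generic sketch with hypothetical types, not the actual storage implementation): walk everything reachable from the root value(s) you still care about, remember those hashes, and drop every stored entry whose hash is not in that set.

```java
// Illustrative retention-based GC sketch (hypothetical types, not the actual store):
// everything reachable from the retained roots is kept; all other stored entries
// can be discarded, since values are immutable and never go stale.
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RetainReachable {
    record Branch(String hash, List<Branch> children) {}

    static void mark(Branch b, Set<String> reachable) {
        if (!reachable.add(b.hash())) return;       // already visited this subtree
        for (Branch child : b.children()) mark(child, reachable);
    }

    /** Keep only stored entries reachable from the given root values. */
    static void gc(Map<String, byte[]> store, List<Branch> roots) {
        Set<String> reachable = new HashSet<>();
        for (Branch root : roots) mark(root, reachable);
        store.keySet().retainAll(reachable);        // discard everything else
    }
}
```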
