Mutable Data (Naming, Real-Time, Guarantees)

Short Description

In one sentence or paragraph.

Enable a multitude of interaction patterns among users, machines, or a mix of both. In other words: what are the essential primitives that dynamic applications need in order to exist, and what guarantees (consistency, availability, persistence, authenticity, etc.) must the underlying layer provide so that powerful and complete applications can be built on the Distributed Web?

Long Description

Mutable Data in a Permanent Datastore? WAT?! If this is your reaction, don't despair, we've been there too. However, the good news is that achieving mutability over immutable datastores is not a new idea; in fact, if you are checking this document through Git/GitHub, you are experiencing just that.

The trick is to use pointers. While each data structure has its own reference (in the case of IPFS, that reference will be immutable and represented by a Content Identifier (CID)), we can achieve mutability by creating a meta data structure, known as a record, that points to the latest version of the data that the user is requesting. This is analogous to a developer updating the master branch to point to the latest commit (e.g. when a pull request is merged).
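The pointer idea can be sketched in a few lines. This is a minimal illustration, not how IPFS is implemented: `ImmutableStore` and `NameRecords` are hypothetical names, and a hex SHA-256 digest stands in for a real CID.

```python
import hashlib


class ImmutableStore:
    """Content-addressed store: a block's key is the hash of its bytes."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # Assumption: a hex SHA-256 digest stands in for a real CID.
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blocks[key]


class NameRecords:
    """Mutable layer: maps a stable name to the key of the latest version."""

    def __init__(self):
        self._records = {}

    def publish(self, name: str, key: str) -> None:
        self._records[name] = key

    def resolve(self, name: str) -> str:
        return self._records[name]


store = ImmutableStore()
names = NameRecords()

v1 = store.put(b"hello, dWeb")
names.publish("my-blog", v1)

v2 = store.put(b"hello again, dWeb")
names.publish("my-blog", v2)  # only the pointer moves; v1 is still stored

latest = store.get(names.resolve("my-blog"))
```

Old versions stay addressable by their own key, much like earlier commits remain reachable after a branch moves forward.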

Mutable Data was an essential component of the Web 2.0 revolution, when websites stopped being simple static documents and became fully fledged applications. Now, achieving scalable and secure ways to mutate data will enable a whole new set of applications for the Web 3.0.

These applications will have different types of requirements, from millions of records being updated every second to millions of users interacting and mutating a data source. Creating a sound solution to achieve Mutable Data for the dWeb requires considering the three parts of the problem:

Update Propagation - As noted, different applications will have different requirements, e.g. real-time delivery, at-least-once or at-most-once delivery, confirmation of delivery, conflict resolution, etc.
Update Authentication - How to trust the parties that are propagating the updates, and how to trust that they are not hiding any important update.
Interface/API - Supporting a database-like experience, including APIs that enable a user to query, update and search over multiple records.

All of these problems should be solved in a manner that is tolerant of unreliable connections and offline operation.

State of the Art

This survey of the State of the Art is by no means complete. However, it should provide a good entry point for learning about the existing work. If something fundamental is missing, please consider submitting a PR to augment this survey.

Within the IPFS Ecosystem

Existing attempts and strategies

IPNS, the InterPlanetary Naming System

IPNS is the naming solution built into IPFS Core. It takes inspiration from the Self-certifying File System (SFS) and relies on signed records, plus a distributed record store for storing and sharing them. When a user receives a signed record, they can verify its signature, read the updated pointer and resolve it accordingly. Today IPNS can be used over multiple routers: the DHT, PubSub and Cloudflare Workers.

https://docs.ipfs.io/guides/concepts/ipns
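The "verify the signature, then follow the pointer" flow described above can be sketched as follows. This is a toy model under loud assumptions, not the IPNS record format or API: `Record`, `sign_record`, `is_valid` and `select_latest` are hypothetical names, the `sequence` counter is an assumption of this sketch, and HMAC-SHA256 stands in for a public-key signature. A real self-certifying scheme uses public-key signatures so that any reader can verify a record without holding a secret.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    value: str        # path the name currently points to
    sequence: int     # assumed counter that grows with every update
    signature: bytes


def _payload(value: str, sequence: int) -> bytes:
    return f"{sequence}:{value}".encode()


def sign_record(key: bytes, value: str, sequence: int) -> Record:
    # Assumption: HMAC-SHA256 stands in for a public-key signature.
    sig = hmac.new(key, _payload(value, sequence), hashlib.sha256).digest()
    return Record(value, sequence, sig)


def is_valid(key: bytes, record: Record) -> bool:
    expected = hmac.new(
        key, _payload(record.value, record.sequence), hashlib.sha256
    ).digest()
    return hmac.compare_digest(expected, record.signature)


def select_latest(key: bytes, records):
    """Return the valid record with the highest sequence number, or None."""
    valid = [r for r in records if is_valid(key, r)]
    return max(valid, key=lambda r: r.sequence, default=None)
```

A reader that collects several candidate records from the record store would discard those that fail verification and follow the pointer of the newest remaining one.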

dnslink

DNSLink leverages the Domain Name System (DNS) to store a TXT record that contains a hash. This is comparable to an A record that stores an IP address: in this case, an IPFS node resolves the TXT record, reads the hash and fetches the corresponding content for the user. All IPFS project websites are loaded through this naming system (e.g. https://ipfs.io).

https://docs.ipfs.io/guides/concepts/dnslink
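As a rough illustration of the resolution step, the sketch below extracts a content path from a list of TXT values. It assumes the `dnslink=<path>` value format described in the DNSLink documentation and performs no DNS lookup itself. `parse_dnslink` is a hypothetical helper and `QmExampleHash` is a placeholder, not a real hash.

```python
from typing import Iterable, Optional


def parse_dnslink(txt_values: Iterable[str]) -> Optional[str]:
    """Return the content path from the first well-formed dnslink TXT value."""
    prefix = "dnslink="
    for raw in txt_values:
        value = raw.strip().strip('"')
        if not value.startswith(prefix):
            continue  # unrelated TXT record, e.g. SPF
        path = value[len(prefix):]
        # Expect /<namespace>/<identifier>[/...], e.g. /ipfs/<hash>
        parts = path.split("/")
        if len(parts) >= 3 and parts[0] == "" and parts[1] and parts[2]:
            return path
    return None


# TXT values as they might be returned by a DNS lookup (made-up data).
records = ["v=spf1 -all", "dnslink=/ipfs/QmExampleHash"]
path = parse_dnslink(records)
```

The returned path would then be handed to an IPFS node for resolution, which is outside the scope of this sketch.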

libp2p's floodsub/gossipsub

libp2p currently offers two PubSub implementations, floodsub and gossipsub.

https://github.com/libp2p/specs/blob/master/pubsub/README.md
https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/README.md
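To give a feel for the simplest of the two strategies, here is a toy in-memory model of flood-style propagation. It is not the libp2p API: the `Peer` class and its methods are hypothetical. The sketch assumes that a peer forwards each new message to every neighbor subscribed to the topic and uses a seen-set to drop duplicates.

```python
from collections import defaultdict


class Peer:
    def __init__(self, name: str):
        self.name = name
        self.neighbors = []             # directly connected peers
        self.topics = set()             # topics this peer subscribes to
        self.seen = set()               # message ids already handled
        self.inbox = defaultdict(list)  # topic -> delivered payloads

    def connect(self, other: "Peer") -> None:
        self.neighbors.append(other)
        other.neighbors.append(self)

    def subscribe(self, topic: str) -> None:
        self.topics.add(topic)

    def publish(self, topic: str, msg_id: str, payload: str) -> None:
        self.receive(topic, msg_id, payload)

    def receive(self, topic: str, msg_id: str, payload: str) -> None:
        if msg_id in self.seen:
            return  # duplicate: already delivered and forwarded
        self.seen.add(msg_id)
        if topic in self.topics:
            self.inbox[topic].append(payload)
        # Flood: forward to every neighbor subscribed to the topic.
        for peer in self.neighbors:
            if topic in peer.topics:
                peer.receive(topic, msg_id, payload)
```

Flooding is simple and robust but sends many redundant copies, which is the kind of overhead that gossip-based routing aims to reduce.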

peer-base's CRDTs implementations

CRDTs (Conflict-free Replicated Data Types) are an extraordinary building block for data mutability without requiring central mediation. The field of CRDTs is vast and there have been many developments. Below you can find some of the implementations built to work with IPFS.

https://github.com/ipfs-shipyard/js-delta-crdts
https://github.com/ipfs-shipyard/peer-crdt
https://github.com/ipfs-shipyard/peer-crdt-ipfs
https://orbitdb.org/
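For readers new to CRDTs, a grow-only counter (G-Counter) is one of the simplest examples. The sketch below is a generic state-based version and is not taken from any of the libraries linked above. The `GCounter` class is a hypothetical name.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged with element-wise max."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Taking the max per replica makes merge commutative,
        # associative and idempotent, so replicas converge.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


alice = GCounter("alice")
bob = GCounter("bob")
alice.increment(2)   # alice updates while offline
bob.increment(3)     # bob updates concurrently

alice.merge(bob)     # replicas exchange state in any order
bob.merge(alice)
```

Because merging is order-independent and safe to repeat, replicas can sync over unreliable connections without a central coordinator.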

Textile's Hub, ThreadsDB and Powergate

Textile offers a set of tools for creating decentralized applications built on top of IPFS, libp2p and Filecoin. Their documentation is pretty amazing and keeps evolving. To learn more, follow the links below:

https://docs.textile.io/hub/introduction/
https://docs.textile.io/threads/introduction/
https://docs.textile.io/powergate/introduction/
Tutorial on how to use Textile from IPFS Camp (Outdated)

Within the broad Research Ecosystem

How do people try to solve this problem?

There are multiple decades of research and projects aimed at solving this Open Problem. Don't feel discouraged by the sheer number of articles; it is really highly valuable work!

Data Structures
Access Control
Applications & Libraries

Known shortcomings of existing solutions

What are the limitations on those solutions?

Given the sheer number of solutions and experiments that tackle the multiple challenges of this Open Problem, one common shortcoming is the lack of a qualitative and quantitative assessment of what is possible today: how these solutions compare with existing centralized alternatives, and whether they can take on the load required by the multiple use cases.

Solving this Open Problem

What is the impact

Mutable Data is a fundamental building block for creating applications. Improving the existing solutions and creating new ones that support a large user set in a distributed context will enable the growing number of Internet users to be part of the dWeb.

What defines a complete solution?

What hard constraints should it obey? Are there additional soft constraints that a solution would ideally obey?

We are led to believe that a complete solution for the Mutable Data Open Problem is not a single solution, but a set of solutions, each capable of adjusting to the type of use case.

Each of the solutions proposed should present, and repeatably demonstrate, how it handles the most typical use cases and how it behaves under different loads and network conditions. An overview of these typical use cases:

writers (i.e. publishers) to readers (i.e. consumers)
- 1 to many - Blog
- some to many - Code Repository
writers/readers to writers/readers
- 1 to 1 - Chat
- few to many - Collaborative Documents
- many to many - Social Network, Forums, Chat Rooms

More use cases defined at https://github.com/libp2p/research-pubsub/blob/master/USECASES.md

In addition, a solution should also contemplate answers to the following questions:

What are the guarantees that users can expect?
Does the system rely on a centralized component for it to work?

Additionally, we leave here a set of questions to spark the thinking:

What form do Access Control and/or Capabilities take in a context that is not mediated by a centralized point?
Should the system allow for these Access Control and/or Capabilities to be mutable, transferable or revocable?
What kind of cryptography needs to exist for access and revocation to happen?

Other

Existing Conversations/Threads

IPNS Improvement Design Exploration
DNS over IPFS
IPFS database: pubsub, consistency and persistence
Implement databases over IPFS using Persistent Balanced Trees
Petnames for multihashes
pluggable resolvers
Optimizing IPNS
Aggregation --> CRDTs discussion
