TypeCellOS · nperez0111 · Mar 10, 2025 · Mar 10, 2025 · Mar 10, 2025 · Mar 10, 2025
diff --git a/adr/2025_03_10-suggestions/README.md b/adr/2025_03_10-suggestions/README.md
@@ -0,0 +1,172 @@
+# Versioning & Suggestions
+
+Due to the similarity between document versioning, red lining, track changes and suggestions, it is worth seeing an overview of all of their requirements to make sure we consider solutions that resolve multiple of them.
+
+- A diff between two versions of a document, is essentially a set of suggestions that takes your previous document, and transforms it into the more recent document.
+- Red-lining and suggestions are the same, red-lining just allows you to attach metadata to the suggestion, like a comment of why the change was made.
+- Track changes, is entering a mode where any edit you make, becomes a suggestion to the document.
+
+## Versioning Overview
+
+Versioning has already been implemented for Y.js & y-prosemirror, but to be explicit about the sort of thing we are looking for it is good to explicitly state the feature set versioning should have:
+
+- A user can take _snapshots_ of the current document state, and store that in a separate data store
+- These snapshots can be viewed independently at any point in time later
+- These snapshots can be _compared_ with each other
+  - This allows the user to see what changed between the two snapshots
+  - Changes should show:
+    - The content that was changed (i.e. additions or deletions)
+    - Updates to content (i.e. changing an attribute)
+    - Which user identifiers modified the content
+    - Ideally, a timestamp of when the content was modified
+    - Changes should only show content that was present in both documents (i.e. if some content was added and subsequently deleted in some version between the two snapshots, because it is not present in the final document of either snapshot, it should not show)
+  - These changes can be visualized in a number of ways: as a unified diff, split diff, or even stacked diff. _See [Appendix](#displaying-changes) for more information_
+
+**Prior Art:**
+
+- Google Docs [Video here]
+
+### Suggestions Overview
+
+Suggestions are the ability to make changes to a document and have those changes tracked separately from the document content. This allows modifications to the document in a way that can be audited, and either approved or rejected by another party.
+
+- A user can turn on a mode, suggestion mode, which makes all of their edits to a document be tracked, but stored separately from the document itself.
+  - The original document should be unmodified when suggestion mode is turned back off.
+- These stored changes, can be reviewed, and either approved or rejected separately.
+  - Approving a suggestion should apply the change to the document, Rejecting should drop that suggestion
+- A suggestion may become "invalid" if the document that the change has been modified in a way that the change can no longer be applied (e.g. an insertion into a deleted paragraph, or a deletion of an already deleted paragraph)
+- A suggestion should store:
+  - The content of the suggestion based on the type of suggestion
+  - **insertions:** where to insert the change, the content to insert
+  - **deletions:** where to delete the content, the content to be deleted
+  - **updates:** where to update the content, what key to update, what value to set the key to
+  - A timestamp of when the suggestion was proposed
+  - Which user proposed the suggestion
+  - An optional thread identifier, for discussion around why the change was made
+
+**Prior Art:**
+
+- Google Docs: [video here]
+
+### Similarities
+
+Versioning and suggestions have quite a bit of overlap between them, they both rely on having a set of changes which store:
+
+- user: who made the change (the change's attribution)
+- timestamp: of when the change was made
+- content: what changed & it's location in the document
+
+Ideally, we could re-use some of the logic and data structures between them for both features, since they overlap in representation and visualization to users.
+
+## What we need to decide
+
+### How should Y.js snapshots be compared
+
+Y.js already has versioning in the form of snapshots of the Y.js document, at a point in time. In Y.js, the snapshots have the nice property of having some form of a history associated to those edits. To figure out what changed, you could calculate a diff between the two histories of the document. Doing a comparison based on the document content alone, would not meet our requirement of being able to display who made the change and when. Luckily, in Y.js, there is additional metadata that we can extract from Y.js which can tell us which `clientID` made the change (`clientID` being an ID a user assumes when making edits to a document). More on how to calculate that diff here: [Diff two Y.js Snapshots](./diff-yjs-snapshots.md)
+
+This would be the most straightforward way to capture changes across versions of a document. The question then becomes how to display those changes within a prosemirror-based editor, which sort of depends on the approach taken for suggestions.
+
+### How should suggestions be stored
+
+There are a few different mental models we can use to represent or derive these changes:
+
+- [_Branched document histories_, and diffs between them](./suggestions-as-history.md)
+- [Stored _patches_ and later applying that patch to a document](./suggestions-as-patches.md)
+- [New _content types_ which represent the changes within the document content](./suggestions-as-content.md)
+
+### AttributionManager
+
+Currently, Y.js has been using a `PermanentUserData` class to keep track and store the `clientID`s that a specific user has used previously. Kevin has some ideas on an improved system for this, which he can elaborate on in detail, later. This will be crucial to how we can attribute users in prior versions of the document.
+
+If I were to share my wish-list for features this should have, it would be able to answer questions like:
+
+- Given a `clientID`, what `userID` made this change?
+- At what time was this `clientID` first used?
+  - This gives approximate times for attributing content to a time, without having to store an additional timestamp per change).
+  - Most people don't need the specific time, they just want to know relative time "Is this days, months or years old?"
+  - Auditing should be done at a different level, and not in a user writable data structure like a YDoc
+
+I'd like to have Kevin give more details on the sort of features/properties we can expect from this. It will be a crucial feature to the implementation, and likely to be worked on first since it will be foundational to how the versioning works.
+
+## Summary
+
+This document was purposefully written from the perspective that Y.js is one of many document storage and synchronization mechanisms. And, we would like to keep our editor de-coupled from these specific implementation details as much as possible to encourage code re-use between different implementations.
+
+## Appendix
+
+This is a collection of scattered thoughts on the _diff_-erent aspects of diffing  (sorry, had to), and their complications.
+
+### Diffing HTML
+
+Diffing HTML is fundamentally different than diffing by the semantics of a document. This, is probably best described in an example:
+
+Take the following document:
+
+```html
+<p>Some text</p>
+<p>More text</p>
+```
+
+Let's say that the user deletes the range of text from "text" and collapses the two paragraphs into 1 paragraph, resulting in this document:
+
+```html
+<p>Some More text<p>
+```
+
+If we examine the changes, this is actually multiple operations:
+
+- deletion of the word "text"
+- insertion of the words "More text" at the position where "text" was
+- deletion of 2nd paragraph
+
+If we were to represent this change using HTML elements it was meant to look like this:
+
+```html
+<p>Some <del>text<p>
+<p></del>More text</p>
+```
+
+Yet, a diff would represent it like this:
+
+```html
+<p>Some <del>text</del><ins>More text</ins></p>
+<del><p>More text</p></del>
+```
+
+They ultimately result in the same in the final document but, the difference is in their semantics and user expectations. There are probably ways to simplify these changes in such a way that it is closer to what the user actually intended with the change (e.g. Prosemirror's [simplifyChanges](https://github.com/ProseMirror/prosemirror-changeset)), but it gets complicated rather quickly and we have to face that it may not always be easy to represent/display.
+
+### Displaying changes
+
+There are actually a few different ways to display changes in a document. Above, I was describing changes in what is called a _unified diff_. where changes are inter-leaved into the document using inline content and markers for where the inserted and deleted content exists. But, that is just one way to display a diff, you can also display diffs using a _split diff_ like so:
+
+```txt
+❯ delta a b 
+1/a ⟶   2/b
+──────────────────────────────────────────────────────
+
+───┐
+1: │
+───┘
+│  1 │<p>Some text</p>     │    │
+│  2 │<p>More text</p>     │    │
+│    │                     │  1 │<p>Some More text</p>
+```
+
+And, there is _stacked diffs_, too!
+
+```txt
+❯ diff a b   
+1,2c1
+< <p>Some text</p>
+< <p>More text</p>
+\ No newline at end of file
+---
+> <p>Some More text</p>
+\ No newline at end of file
+```
+
+The nice thing about these different ways of displaying diffs, is that they are better at representing certain kinds of changes (especially when they are colorized unlike these examples). For example, for block-level changes, the unified diff seems to capture this change the best, it becomes clear that two lines became one.
+
+So, there are multiple ways to display diffs, and they can be advantageous in different situations. Ideally, our solution would have flexibility in displaying changes in any of these ways.
+
+> As a matter of practicality, we are also looking into using these different ways of displaying diffs for things like displaying LLM responses, like if an LLM were to rewrite some content, it could display in a split diff view to show the changes the LLM actually made. So, this is not just a theoretical want, but something we would want to build this year.
diff --git a/adr/2025_03_10-suggestions/diff-yjs-snapshots.md b/adr/2025_03_10-suggestions/diff-yjs-snapshots.md
@@ -0,0 +1,32 @@
+
+# Diff two Y.js snapshots
+
+Y.js keeps track of metadata around what changes have been made to a document in order to efficiently synchronize changes. So, to compare a document with it's previous version, you can leverage this metadata by taking a snapshot of the full history of a document, and compare the insertions and deletions that have been made between the two snapshots. Because they share history, you can figure out what has been modified from _this point_ in history compared to _that point_ in history. For clarity, let's call this a `y-diff` for now to make sure it is different than a full content diff.
+
+## Changes as _attributes_
+
+Kevin has already implemented a preliminary version of the `y-diff` between two documents. Specifically, by using `ychange` attributes to add metadata onto elements to indicate whether they were modified between the two versions. This approach works for simple documents at the moment, with some limitations around:
+
+- Certain features are missing (due to this being preliminary work) like
+  - XYZ
+- Controlling the display of the diff:
+  - y-prosemirror is controlling how the diffs are rendered so doing something like a stacked diff would not be possible
+  - Trying to simplify changes to a more human understandable version as described in the HTML diffing section would not be possible, since y-prosemirror directly writes the changes it finds to elements
+- Inability to edit content while displaying diffs
+  - For something like suggestions to work in the future, there needs to be a way to modify the document & still present the suggestions that have been made
+
+## Changes as _marks_
+
+Another approach, would be to model what is currently being represented as attributes, but instead use marks similar to [`@handlewithcare/prosemirror-suggest-changes`'s schema](https://github.com/handlewithcarecollective/prosemirror-suggest-changes?tab=readme-ov-file#schema) where the content changes are represented as marks. This would allow flexibility in display (because of support for [markviews](https://prosemirror.net/docs/ref/#view.MarkView) which can decouple content from presentation).
+
+Though, I question whether it _should_ be the responsibility of `y-prosemirror` to be both finding the difference and modifying the content of the document to display those differences. It feels like `y-prosemirror` should be scoped to the integration of yjs & prosemirror, and this is somewhat out of scope for it. I would be okay with `y-prosemirror` providing utilities to find the `y-diff` between documents in the y-prosemirror format, but the responsibility of how that integrates with prosemirror should be the responsibility of the caller of that function, rather than, built into the `y-sync` plugin.
+
+## Changes as a _list of diffs_
+
+One approach for this would be for y-prosemirror to provide the `y-diff` between two documents as a list of modifications, which can then be interpreted by another program to choose how to display or render those sets of changes. This is similar in approach to the snapshot compare extension that I implemented.
+
+This is non-trivial, because the shape of the document is nested and almost arbitrary. Difficult, but doable - this is exactly the sort of approach that [Snapshot Compare feature](https://tiptap.dev/docs/collaboration/documents/snapshot-compare#page-title) works on. It diffs two snapshots of Y.js documents, and is able to create a list of diffs between them, which can be displayed within the editor content. But, this feature is view-only, how would we be able to apply only specific segments?
+
+> Also, it is worth noting that this feature not only did the diff based on the document content, but actually by diffing the Y.js data structure itself, to attribute specific changes to specific users.
+
+There is a neat little trick in how these diffs are generated, it was to not just a list of changes, but actually as granular Prosemirror steps. Why do it like this? This means, that to apply only a single diff, would be as easy as applying only a single step of a transform! I never actually got around to implementing that part of the feature for time constraints, but that should 'just work' ™. Another nice property of this, is that, theoretically, you could also replace `y-sync` to do diffs with the new update that came in and have granular updates to a prosemirror document, which gives a much nicer experience for prosemirror devs whose plugins do not have to be aware of Yjs replacing the document content on each update.
diff --git a/adr/2025_03_10-suggestions/images/doc+suggestions.png b/adr/2025_03_10-suggestions/images/doc+suggestions.png
diff --git a/adr/2025_03_10-suggestions/images/doc+suggestions_merged.png b/adr/2025_03_10-suggestions/images/doc+suggestions_merged.png
diff --git a/adr/2025_03_10-suggestions/images/doc+suggestions_patch.png b/adr/2025_03_10-suggestions/images/doc+suggestions_patch.png
diff --git a/adr/2025_03_10-suggestions/suggestions-as-content.md b/adr/2025_03_10-suggestions/suggestions-as-content.md
@@ -0,0 +1,22 @@
+# Storing changes as _content_ within the document
+
+Another way to represent changes to content is actually to embed markers of where the change was made directly into the content. This is actually how Microsoft Office implements their [Tracked Changes feature](https://www.oxygenxml.com/doc/versions/27.0/ug-editor/topics/author-managing-changes.html), which should give you an idea of the viability of such an approach. By storing the changes directly in the document, you have flexibility on whether to display the changes, and accept or reject them, since all states are present already in the document content.
+
+## Implementation details
+
+This would introduce 3 new marks for content in the editor, `insertion`s `deletion`s and `modification`s. When entering suggestion mode:
+
+- a change which would normally result in a deletion is instead marking that content with the `deletion` mark
+- an insertion would both insert the new content, but also wrap it with a `insertion` mark
+- any change to attributes would mark the content with a `modification`
+There will be additional metadata that could be tracked as attributes to these marks too, such as timestamp, or even a reference to a comment.
+
+## Pros
+
+- [There is prior art](https://github.com/handlewithcarecollective/prosemirror-suggest-changes) that we can leverage (and [this one](https://github.com/Atypon-OpenSource/manuscripts-track-changes-plugin) and [this one](https://github.com/davefowler/prosemirror-suggestion-mode/) and [this one](https://gitlab.coko.foundation/wax/wax-prosemirror/-/blob/master/wax-prosemirror-core/src/utilities/track-changes/trackedTransaction.js?ref_type=heads))
+- Importing from something like DOCX, would be a 1-1 mapping as you can just map the DOCX insertions and deletions to our equivalents
+
+## Cons
+
+- Adds complexity to displaying content in the editor, may need to rely on things like decorations to hide content we do not want to display
+- Since it is not a separate document, the permissions have to be the same as having write permissions to the document
diff --git a/adr/2025_03_10-suggestions/suggestions-as-history.md b/adr/2025_03_10-suggestions/suggestions-as-history.md
@@ -0,0 +1,31 @@
+# Storing changes as separate _document histories_ and diffs between them
+
+Branched document histories is the idea that you can think of a suggestion of the document as having a "different history" of a document; where, a user made some changes to a copy of the document, which can later be merged back into the document somehow.
+
+![Branched document histories](./images/doc+suggestions.png)
+
+This would work very similarly to how the version diffing would work, and is intended to mostly re-use that logic.
+
+## Implementation
+
+To implement this, each user who suggests changes is given a separate document which they would have access to modify the contents of. This allows the user to own that document, while guaranteeing the original document stays unmodified.
+
+![Branched document histories merged](./images/doc+suggestions_merged.png)
+
+When viewing what suggestions are available on a document, the frontend would have to "collect" the suggestions from each suggestion document, apply an updates those documents may not have seen yet, diff the documents to see what changes are in the suggestion document compared to the original document and finally display those changes to the user in the original document somehow.
+
+## Pros
+
+- Allows reuse of the version comparison logic
+- A suggestion-only user is easy to implement, since they can be given access to a copy of the Ydoc, without affecting the original document
+- Guaranteed that a suggestion cannot affect the original document without any server-side inspection of the content
+
+## Cons
+
+- Requires diffing an entire YDoc to derive what the change actually was
+  - Performance might be an issue especially since it scales by number of users who suggest
+- Unclear whether to diff the content at Y.js level or to diff the JSON document
+  - If diffing at the Y.js level, not everyone uses Y.js and we may not want to pay that bundle/complexity cost
+  - If diffing on the JSON, are we just recreating the version snapshot comparison and duplicating the work involved?
+- Complicated to setup
+  - Storage of the branched versions need to be kept up to date (i.e. must apply new updates since the branch) and need to store that document separately