Skip to content

Latest commit

 

History

History
419 lines (301 loc) · 10.4 KB

format.adoc

File metadata and controls

419 lines (301 loc) · 10.4 KB

Mendoza patch format

Introduction

A Mendoza patch is quite different from a patch produced by tools like diff, and to understand Mendoza it helps to understand why diff produces patches in the way it does. A patch produced by diff

  • … is made for a human to read and understand. Therefore it’s based on simple operations (keep text, insert text, delete text).

  • … can be applied even if the source has been changed a bit. This is accomplished by including parts of the context around every part.

  • … is designed for text, not structured documents.

Mendoza on the other hand (as mentioned in the README) is designed to be consumed by computers which works on exact versions of documents. As such, Mendoza has more parallels to compression algorithms than to diffing algorithms: Compression algorithms are all about being able to reconstruct the target and less about describing changes. You can look at a Mendoza patch as a program which executes with the left-side as input and produces the right-side as output, and a Mendoza decoder is a virtual machine which runs this program.

Patch format

Conceptually, a Mendoza patch is a list of operations:

type Patch = Operation[]

type Operation = {
  opcode: Opcode
  params: Param[]
}

type Opcode = int8

type Param = string | uint | JSON

Every operation is identified by an opcode (an 8-bit number) and has a fixed number of parameters. Parameters are either strings, positive numbers, or JSON values. Some sequences of operations are very common (e.g. a PushField followed by Copy) and therefore Mendoza also includes composite operations (e.g. PushFieldCopy). Composite operations are merely shortcuts for multiple primitive operations to make the patches a bit smaller.

JSON representation

To minimize the space, Mendoza uses a single flat array when representing a patch as JSON. Each operation is encoded with its opcode followed by its parameters.

A Mendoza patch in JSON representation
[
  18,      // DeleteField
  0,       //   … at index 0
  10,      // PushFieldCopy
  0,       //   … at index 0
  14,      // ReturnIntoObjectPop
  "name"   //   … with "name"
]

Execution model

Applying a patch involves executing the operations while maintaining the following state:

1. State used in a Mendoza decoder
  • An input stack, used for traversing the left document. Every entry of the input stack also stores the key of where it came from.

  • An output stack, used for producing the right document.

2. Decoding of a Mendoza patch
  • Place the left document on the input and output stack.

  • Execute each operation.

  • The top value on the output stack is now the result (i.e. the right document).

Note that this means an empty patch (i.e. no operations) will produce a right document which is equivalent to the left document.

Examples / tutorial

Here are some examples which also serves as a mini tutorial.

Step-by-step example

Let’s look at the following Mendoza patch

  • ObjectDeleteField("name")

  • ObjectSetFieldValue("age", 30)

  • PushFieldCopy("name")

  • ReturnIntoObjectPop("fullName")

applied on the following document

{
  "name": "Michael Bluth",
  "age": 20,
}
Initial state
  • Input stack: [root]

  • Output stack: [root]

After applying ObjectDeleteField("name")
  • Input stack: [root]

  • Output stack: [{"age": 20}]

After applying ObjectSetFieldValue("age", 30)
  • Input stack: [root]

  • Output stack: [{"age": 30}]

After applying PushFieldCopy("name")
  • Input stack: [root, "Michael Bluth"]

  • Output stack: [{"age": 30}, "Michael Bluth"]

After applying ReturnIntoObjectPop("fullName")
  • Input stack: [root]

  • Output stack: [{"age": 30, "fullName" "Michael Bluth"}]

Starting from scratch

Sometimes it’s better to start with a blank object and copy over the fields you need:

  • Blank()

  • ObjectCopyField("name")

  • ObjectSetFieldValue("age", 30)

Pushing fields

The PushField operation is used for entering fields in objects.

The following example will modify the zip code in a nested object:

  • PushFieldCopy("user")

  • PushFieldCopy("address")

  • SetFieldValue("zip", 1234)

  • ReturnIntoObjectSameKeyPop()

  • ReturnIntoObjectSameKeyPop()

Note that entering a field remembers the key where it come from, which ReturnIntoObjectSameKey() then uses to set it.

Dealing with arrays

Arrays are typically dealt with by pushing with a blank value and then using ArrayAppendValue and ArrayAppendValue. ArrayAppendSlice refers to indices in the old array (e.g. the input value).

  • PushFieldBlank("skills")

  • ArrayAppendSlice(0, 2)

  • ArrayAppendValue("Go")

  • ReturnIntoObjectSameKeyPop()

List of primitive operations

In this section we’ll use these additional terms:

  • The input value is the value at the top of the input stack.

  • The output value is the value at the top of the output stack.

Value operation

Parameters
  • value: JSON

The Value operation pushes a new value onto the output stack.

Copy operation

Parameters

None

The Copy operation pushes the input value onto the output stack.

Blank operation

Parameters

None

The Blank operation pushes an empty value onto the output stack. This empty value will be treated as either a string, array, or object depending on the next operations.

ReturnIntoArray operation

Parameters

None

The ReturnIntoArray operation takes the current output value, pops the output stack, and then pushes it onto the new output value (i.e. the value before it in the stack). The new output value must be an array.

ReturnIntoObject operation

Parameters
  • key: string

The OpReturnIntoObject operation takes the current output value, pops the output stack, and then stores it on the new output value (i.e. the value before it in the stack) with the given key. The new output value must be an object.

ReturnIntoObjectSameKey operation

Parameters

None

The OpReturnIntoObjectSameKey operation first finds the key that was used to push the current input value (see PushField), then it takes the current output value, pops the output stack, and stores it on the new output value (i.e. the value before it in the stack) with the given key. The new output value must be an object.

PushField operation

Parameters
  • keyIdx: uint

The PushField operation looks up a field in the input value (which must an object), and then pushes the value onto the input stack. keyIdx refers to the nth key (after you sort them lexically) in the object. The key is stored together with the value in the input stack so that ReturnIntoObjectSameKey can access it later.

PushElement operation

Parameters

None

The PushElement operation looks up an element in the input value (which must an array), and then pushes the value onto the input stack.

PushParent operation

Parameters
  • pos: uint

The PushParent operation looks up a value earlier in the input stack and pushes it onto the input stack. pos=0 pushes the parent, pos=1 pushes the grand parent, and so forth. There’s no way of duplicating the current input value.

Pop operation

Parameters

None

The Pop operation pops the input stack.

ObjectDeleteField operation

Parameters
  • key: string

The ObjectDeleteField operation deletes a field in the output value (which must be an object).

ArrayAppendValue operation

Parameters
  • key: JSON

The ArrayAppendValue operation appends a JSON value to the output value (which must be an array).

ArrayAppendSlice operation

Parameters
  • left: uint

  • right: uint

The ArrayAppendSlice operation slices the input value (which must be an array) and appends it to the output value (which must also be an array). The left index is inclusive and the right index is exclusive (i.e. left=3, right=5 slices two values).

StringAppendString operation

Parameters
  • value: string

The StringAppendString operation appends a string value to the output value (which must be a string).

StringAppendSlice operation

Parameters
  • left: uint

  • right: uint

The StringAppendSlice operation slices the input value (which must be a string) and appends it to the output value (which must also be a string). The left index is inclusive and the right index is exclusive (i.e. left=3, right=5 slices two values). The indices refers to byte offsets in UTF-8 encoding.

Overview over operations with opcodes

Opcode (8-bit) Name Type Description

0

Value

Output

1

Copy

Output

2

Blank

Output

3

ReturnIntoArray

Output

4

ReturnIntoObject

Output

5

ReturnIntoObjectSameKey

Output

6

PushField

Input

7

PushElement

Input

8

PushParent

Input

9

Pop

Input

10

PushFieldCopy

Composite

PushField + Copy

11

PushFieldBlank

Composite

PushField + Blank

12

PushElementCopy

Composite

PushElement + Copy

13

PushElementBlank

Composite

PushElement + Blank

14

ReturnIntoObjectPop

Composite

ReturnIntoObject + Pop

15

ReturnIntoObjectSameKeyPop

Composite

ReturnIntoObjectSameKey + Pop

16

ReturnIntoArrayPop

Composite

ReturnIntoArray + Pop

17

ObjectSetFieldValue

Composite

Value + ReturnIntoObject

18

ObjectCopyField

Composite

PushField + Copy + ReturnIntoObjectSameKey + Pop

19

ObjectDeleteField

Output

20

ArrayAppendValue

Output

21

ArrayAppendSlice

Output

22

StringAppendString

Output

23

StringAppendSlice

Output