Schemas

Overview

A Schema is a runtime representation of a dataset schema. Abstractly, a schema is a set of classes, which are analogous to tables in SQL. Each class has a key and a type. Keys are absolute URIs, and types are terms in a grammar of algebraic data types generated by the primitive types and two kinds of composite types (sums and products).

The tasl JavaScript library represents schemas with a regular ES6 class, Schema, exported at the top level. To instantiate a schema using the class constructor, we need to pass in a runtime representation of each class's type. Types are represented as regular JavaScript objects in the types namespace, each discriminated by a .kind property.

declare class Schema {
  constructor(readonly classes: Record<string, types.Type>)
  count(): number
  get(key: string): types.Type
  has(key: string): boolean
  keys(): Iterable<string>
  values(): Iterable<types.Type>
  entries(): Iterable<[string, types.Type, number]>
  isEqualTo(schema: Schema): boolean
}

namespace types {
  type Type = URI | Literal | Product | Coproduct | Reference

  type URI = { kind: "uri" }
  type Literal = { kind: "literal"; datatype: string }
  type Product = { kind: "product"; components: Record<string, Type> }
  type Coproduct = { kind: "coproduct"; options: Record<string, Type> }
  type Reference = { kind: "reference"; key: string }
}

Notice that the Schema class and types namespace are two separate top-level exports. types contains TypeScript types and utility methods for working with the building blocks of schemas, while the Schema class is mostly treated as an opaque object once instantiated. This is a pattern that the other data structures follow as well - the Instance constructor takes values from the values namespace, and the Mapping constructor takes values from the expressions namespace.
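Because every type carries a .kind discriminant, code that inspects a type usually takes the shape of a recursive switch. Here's a minimal sketch of that pattern. It redeclares the type shapes locally so that it runs without importing tasl, and collectReferences is a hypothetical helper for illustration, not a function the library is known to export.

```typescript
// Local mirror of the type declarations above, so the sketch is self-contained.
type URI = { kind: "uri" }
type Literal = { kind: "literal"; datatype: string }
type Product = { kind: "product"; components: Record<string, Type> }
type Coproduct = { kind: "coproduct"; options: Record<string, Type> }
type Reference = { kind: "reference"; key: string }
type Type = URI | Literal | Product | Coproduct | Reference

// Hypothetical helper (not part of tasl): walk a type recursively and
// collect the keys of every class it references, without duplicates.
function collectReferences(type: Type, found: string[] = []): string[] {
  switch (type.kind) {
    case "product":
      for (const key of Object.keys(type.components)) {
        const component = type.components[key]
        if (component !== undefined) collectReferences(component, found)
      }
      break
    case "coproduct":
      for (const key of Object.keys(type.options)) {
        const option = type.options[key]
        if (option !== undefined) collectReferences(option, found)
      }
      break
    case "reference":
      if (found.indexOf(type.key) === -1) found.push(type.key)
      break
    default:
      // "uri" and "literal" types contain no references
      break
  }
  return found
}

const book: Type = {
  kind: "product",
  components: {
    "http://schema.org/identifier": { kind: "uri" },
    "http://schema.org/author": {
      kind: "reference",
      key: "http://schema.org/Person",
    },
  },
}

const references = collectReferences(book)
// references is ["http://schema.org/Person"]
```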

Here's an example schema.

import { Schema } from "tasl"

const schema = new Schema({
  "http://schema.org/Person": {
    kind: "product",
    components: {
      "http://schema.org/name": {
        kind: "product",
        components: {
          "http://schema.org/givenName": {
            kind: "literal",
            datatype: "http://www.w3.org/2001/XMLSchema#string",
          },
          "http://schema.org/familyName": {
            kind: "literal",
            datatype: "http://www.w3.org/2001/XMLSchema#string",
          },
        },
      },
      "http://schema.org/email": { kind: "uri" },
    },
  },
  "http://schema.org/Book": {
    kind: "product",
    components: {
      "http://schema.org/name": {
        kind: "literal",
        datatype: "http://www.w3.org/2001/XMLSchema#string",
      },
      "http://schema.org/identifier": { kind: "uri" },
      "http://schema.org/author": {
        kind: "reference",
        key: "http://schema.org/Person",
      },
    },
  },
})

Type factory methods

Our example is very structured but also very verbose. The types namespace has a factory method for each kind of type that can help us simplify this.

declare namespace types {
  function uri(): URI
  function literal(datatype: string): Literal
  function product(components: Record<string, Type>): Product
  function coproduct(options: Record<string, Type>): Coproduct
  function reference(key: string): Reference
}

Here's the same example schema re-written using these factory methods.

import { Schema, types } from "tasl"

const schema = new Schema({
  "http://schema.org/Person": types.product({
    "http://schema.org/name": types.product({
      "http://schema.org/givenName": types.literal(
        "http://www.w3.org/2001/XMLSchema#string"
      ),
      "http://schema.org/familyName": types.literal(
        "http://www.w3.org/2001/XMLSchema#string"
      ),
    }),
    "http://schema.org/email": types.uri(),
  }),
  "http://schema.org/Book": types.product({
    "http://schema.org/name": types.literal(
      "http://www.w3.org/2001/XMLSchema#string"
    ),
    "http://schema.org/identifier": types.uri(),
    "http://schema.org/author": types.reference("http://schema.org/Person"),
  }),
})
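To make the relationship between the two styles concrete, here's a small sketch that assumes the factory methods do nothing more than build the plain objects shown in the first example. The functions below are local stand-ins written for illustration, not the library's actual implementation, which may do more (validation, for instance).

```typescript
type URI = { kind: "uri" }
type Literal = { kind: "literal"; datatype: string }
type Product = { kind: "product"; components: Record<string, Type> }
type Coproduct = { kind: "coproduct"; options: Record<string, Type> }
type Reference = { kind: "reference"; key: string }
type Type = URI | Literal | Product | Coproduct | Reference

// Local stand-ins for the factory methods (assumption: thin object builders).
const uri = (): URI => ({ kind: "uri" })
const literal = (datatype: string): Literal => ({ kind: "literal", datatype })
const product = (components: Record<string, Type>): Product => ({
  kind: "product",
  components,
})
const coproduct = (options: Record<string, Type>): Coproduct => ({
  kind: "coproduct",
  options,
})
const reference = (key: string): Reference => ({ kind: "reference", key })

// The same type written both ways.
const viaFactories: Type = product({
  "http://schema.org/email": uri(),
  "http://schema.org/author": reference("http://schema.org/Person"),
})

const viaObjects: Type = {
  kind: "product",
  components: {
    "http://schema.org/email": { kind: "uri" },
    "http://schema.org/author": {
      kind: "reference",
      key: "http://schema.org/Person",
    },
  },
}

// Under the assumption above, both values serialize identically.
const sameShape = JSON.stringify(viaFactories) === JSON.stringify(viaObjects)
```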

Standard type constants

Passing explicit datatype URIs into types.literal(...) for every literal type is still a hassle. In addition to the five factory methods, one for each kind of type, the types namespace also defines constants for the unit type (the product type with no components), strings, booleans, 32- and 64-bit floats, 8-, 16-, 32- and 64-bit signed and unsigned integers, byte arrays, and JSON values.

This is essentially the standard library of common types that should cover the needs of most schemas.

declare namespace types {
  const unit: Product

  const string: Literal
  const boolean: Literal
  const f32: Literal
  const f64: Literal
  const i64: Literal
  const i32: Literal
  const i16: Literal
  const i8: Literal
  const u64: Literal
  const u32: Literal
  const u16: Literal
  const u8: Literal
  const bytes: Literal
  const JSON: Literal
}

The datatypes that these literals use are from the XSD namespace, with the exception of JSON, which (confusingly) is defined in the JSON-LD spec as a term in the rdf namespace.

| name | datatype |
| --- | --- |
| string | http://www.w3.org/2001/XMLSchema#string |
| boolean | http://www.w3.org/2001/XMLSchema#boolean |
| f32 | http://www.w3.org/2001/XMLSchema#float |
| f64 | http://www.w3.org/2001/XMLSchema#double |
| i64 | http://www.w3.org/2001/XMLSchema#long |
| i32 | http://www.w3.org/2001/XMLSchema#int |
| i16 | http://www.w3.org/2001/XMLSchema#short |
| i8 | http://www.w3.org/2001/XMLSchema#byte |
| u64 | http://www.w3.org/2001/XMLSchema#unsignedLong |
| u32 | http://www.w3.org/2001/XMLSchema#unsignedInt |
| u16 | http://www.w3.org/2001/XMLSchema#unsignedShort |
| u8 | http://www.w3.org/2001/XMLSchema#unsignedByte |
| bytes | http://www.w3.org/2001/XMLSchema#hexBinary |
| JSON | http://www.w3.org/1999/02/22-rdf-syntax-ns#JSON |
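The mapping above is simple enough to write out by hand. The sketch below builds a local table of the same literals and adds a reverse lookup from datatype URI to short name, which can be handy when printing types. Both `standard` and `standardName` are hypothetical names chosen for this example; they are not exports of tasl.

```typescript
type Literal = { kind: "literal"; datatype: string }

const xsd = "http://www.w3.org/2001/XMLSchema#"
const rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

const literal = (datatype: string): Literal => ({ kind: "literal", datatype })

// Hypothetical local table mirroring the standard literal constants.
const standard = {
  string: literal(xsd + "string"),
  boolean: literal(xsd + "boolean"),
  f32: literal(xsd + "float"),
  f64: literal(xsd + "double"),
  i64: literal(xsd + "long"),
  i32: literal(xsd + "int"),
  i16: literal(xsd + "short"),
  i8: literal(xsd + "byte"),
  u64: literal(xsd + "unsignedLong"),
  u32: literal(xsd + "unsignedInt"),
  u16: literal(xsd + "unsignedShort"),
  u8: literal(xsd + "unsignedByte"),
  bytes: literal(xsd + "hexBinary"),
  JSON: literal(rdf + "JSON"),
}

// Hypothetical helper: find the short name for a datatype URI,
// or undefined when the datatype is not one of the standard ones.
function standardName(datatype: string): string | undefined {
  for (const key of Object.keys(standard) as (keyof typeof standard)[]) {
    if (standard[key].datatype === datatype) return key
  }
  return undefined
}

const shortName = standardName("http://www.w3.org/2001/XMLSchema#unsignedByte")
// shortName is "u8"
```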

Here's the same example schema rewritten to use the types.string constant instead of the types.literal(...) factory.

import { Schema, types } from "tasl"

const schema = new Schema({
  "http://schema.org/Person": types.product({
    "http://schema.org/name": types.product({
      "http://schema.org/givenName": types.string,
      "http://schema.org/familyName": types.string,
    }),
    "http://schema.org/email": types.uri(),
  }),
  "http://schema.org/Book": types.product({
    "http://schema.org/name": types.string,
    "http://schema.org/identifier": types.uri(),
    "http://schema.org/author": types.reference("http://schema.org/Person"),
  }),
})

.tasl DSL

An even more concise way to instantiate schemas is to use the .tasl DSL with the parseSchema method. The DSL supports comments and URI namespaces, which dramatically improve readability.

The DSL is documented at https://tasl.io.

declare function parseSchema(input: string): Schema

import { parseSchema } from "tasl"

parseSchema(`
namespace s http://schema.org/

class s:Person {
  s:name -> {
    s:familyName -> string
    s:givenName -> string
  }
  s:email -> uri
}

class s:Book {
  s:name -> string
  s:identifier -> uri
  s:author -> * s:Person
}
`)
// Schema {
//   classes: {
//     'http://schema.org/Person': { kind: 'product', components: [Object] },
//     'http://schema.org/Book': { kind: 'product', components: [Object] }
//   }
// }

Binary codec

Schemas can be encoded to and decoded from Uint8Arrays with the top-level encodeSchema and decodeSchema methods.

declare function encodeSchema(schema: Schema): Uint8Array
declare function decodeSchema(data: Uint8Array): Schema

import { parseSchema, encodeSchema, decodeSchema } from "tasl"

const schema = parseSchema(`
namespace s http://schema.org/

class s:Person {
  s:name -> string
  s:email -> uri
}
`)

encodeSchema(schema)
// Uint8Array(124) [
//     1,   1,  24, 104, 116, 116, 112,  58,  47,  47, 115,  99,
//   104, 101, 109,  97,  46, 111, 114, 103,  47,  80, 101, 114,
//   115, 111, 110,   2,   0,   2,  23, 104, 116, 116, 112,  58,
//    47,  47, 115,  99, 104, 101, 109,  97,  46, 111, 114, 103,
//    47, 101, 109,  97, 105, 108,   0,   4,  22, 104, 116, 116,
//   112,  58,  47,  47, 115,  99, 104, 101, 109,  97,  46, 111,
//   114, 103,  47, 110,  97, 109, 101,   0,   1,  39, 104, 116,
//   116, 112,  58,  47,  47, 119, 119, 119,  46, 119,  51,  46,
//   111, 114, 103,  47,
//   ... 24 more items
// ]

decodeSchema(encodeSchema(schema))
// Schema {
//   classes: {
//     'http://schema.org/Person': { kind: 'product', components: [Object] }
//   }
// }

schema.isEqualTo(decodeSchema(encodeSchema(schema)))
// true

Advanced type utilities

The types namespace also has methods that implement the subtype relation over types, as well as methods for computing infima and suprema in the induced partial order.

Type comparison methods

We can compare types with types.isSubtypeOf and types.isEqualTo.

declare namespace types {
  function isSubtypeOf(x: Type, y: Type): boolean
  function isEqualTo(x: Type, y: Type): boolean
}

The subtype relation (denoted ≤ in writing) is defined by cases:

  • The URI type is a subtype of itself
  • A literal type X is a subtype of a literal type Y if and only if X and Y have the same datatype
  • A product type X is a subtype of the product type Y if and only if
    • for every component key K in X, Y has a component with key K, and the type X(K) is a subtype of the type Y(K)
  • A coproduct type X is a subtype of the coproduct type Y if and only if
    • for every option key K in Y, X has an option with key K, and the type X(K) is a subtype of the type Y(K)
  • A reference type X is a subtype of a reference type Y if and only if X and Y reference the same class
  • If two types X and Y are of different kinds, then neither X ≤ Y nor Y ≤ X

Intuitively, a type X is a subtype of a type Y if X structurally matches Y except that X may be missing some of Y's product components and may have some extra coproduct options.
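The case analysis above translates almost directly into a recursive function. Here's a sketch written against locally declared type shapes so that it runs without tasl. It follows the rules as stated in this section and is not a copy of the library's implementation.

```typescript
type URI = { kind: "uri" }
type Literal = { kind: "literal"; datatype: string }
type Product = { kind: "product"; components: Record<string, Type> }
type Coproduct = { kind: "coproduct"; options: Record<string, Type> }
type Reference = { kind: "reference"; key: string }
type Type = URI | Literal | Product | Coproduct | Reference

// Sketch of the subtype relation, one branch per case in the list above.
function isSubtypeOf(x: Type, y: Type): boolean {
  if (x.kind === "uri" && y.kind === "uri") return true
  if (x.kind === "literal" && y.kind === "literal") {
    return x.datatype === y.datatype
  }
  if (x.kind === "reference" && y.kind === "reference") {
    return x.key === y.key
  }
  if (x.kind === "product" && y.kind === "product") {
    // Every component of X must appear in Y with a compatible type.
    for (const key of Object.keys(x.components)) {
      const a = x.components[key]
      const b = y.components[key]
      if (a === undefined || b === undefined || !isSubtypeOf(a, b)) return false
    }
    return true
  }
  if (x.kind === "coproduct" && y.kind === "coproduct") {
    // Every option of Y must appear in X with a compatible type.
    for (const key of Object.keys(y.options)) {
      const a = x.options[key]
      const b = y.options[key]
      if (a === undefined || b === undefined || !isSubtypeOf(a, b)) return false
    }
    return true
  }
  // Types of different kinds are never related.
  return false
}

const str: Type = {
  kind: "literal",
  datatype: "http://www.w3.org/2001/XMLSchema#string",
}
const empty: Type = { kind: "product", components: {} }
const named: Type = {
  kind: "product",
  components: { "http://schema.org/name": str },
}

const emptyBelowNamed = isSubtypeOf(empty, named) // true
const namedBelowEmpty = isSubtypeOf(named, empty) // false
```

Note the asymmetry between the two composite branches: the product branch iterates over the keys of X, while the coproduct branch iterates over the keys of Y.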

import { types } from "tasl"

types.isSubtypeOf(types.uri(), types.uri()) // true

types.isSubtypeOf(types.uri(), types.string) // false

types.isSubtypeOf(
  types.product({}),
  types.product({ "http://schema.org/name": types.string })
) // true

types.isSubtypeOf(
  types.product({ "http://schema.org/name": types.string }),
  types.product({})
) // false

types.isSubtypeOf(
  types.product({ "http://schema.org/name": types.string }),
  types.product({ "http://schema.org/name": types.boolean })
) // false

types.isSubtypeOf(
  types.product({ "http://schema.org/name": types.string }),
  types.product({
    "http://schema.org/name": types.product({
      "http://schema.org/givenName": types.string,
      "http://schema.org/familyName": types.string,
    }),
  })
) // false

types.isSubtypeOf(
  types.product({
    "http://schema.org/gender": types.coproduct({
      "http://schema.org/Male": types.unit,
      "http://schema.org/Female": types.unit,
      "http://schema.org/value": types.string,
    }),
  }),
  types.product({
    "http://schema.org/gender": types.coproduct({
      "http://schema.org/Male": types.unit,
      "http://schema.org/Female": types.unit,
    }),
  })
) // true

types.isSubtypeOf(
  types.product({
    "http://schema.org/gender": types.coproduct({
      "http://schema.org/Male": types.unit,
      "http://schema.org/Female": types.unit,
    }),
  }),
  types.product({
    "http://schema.org/gender": types.coproduct({
      "http://schema.org/Male": types.unit,
      "http://schema.org/Female": types.unit,
      "http://schema.org/value": types.string,
    }),
  })
) // false

types.isSubtypeOf(
  types.product({
    "http://schema.org/author": types.reference("http://schema.org/Person"),
  }),
  types.product({
    "http://schema.org/name": types.string,
    "http://schema.org/author": types.reference("http://schema.org/Person"),
  })
) // true

The subtype relation is reflexive (X ≤ X), transitive (if X ≤ Y and Y ≤ Z then X ≤ Z), and antisymmetric (if X ≤ Y and Y ≤ X then X = Y), which means the subtype relation forms a partial order over types. Every two types X and Y are related in exactly one of four ways:

  1. X is a strict subtype of Y ((X ≤ Y) ∧ ¬(Y ≤ X))
  2. Y is a strict subtype of X ((Y ≤ X) ∧ ¬(X ≤ Y))
  3. X and Y are equal ((X ≤ Y) ∧ (Y ≤ X))
  4. X and Y are incomparable (¬(X ≤ Y) ∧ ¬(Y ≤ X))

types.isEqualTo(x, y) is equivalent to types.isSubtypeOf(x, y) && types.isSubtypeOf(y, x).
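The four cases can be computed from two calls to any "less than or equal" test. The sketch below is generic over the comparison function, so it would work with types.isSubtypeOf as well. To stay runnable without tasl, the demonstration uses a toy order on lists of component keys, where one list is below another when all of its keys appear in the other. Both classify and keysLeq are hypothetical names for this example.

```typescript
type Relation = "strict subtype" | "strict supertype" | "equal" | "incomparable"

// Classify how x relates to y under a partial order given by `leq`.
function classify<T>(leq: (a: T, b: T) => boolean, x: T, y: T): Relation {
  const xy = leq(x, y)
  const yx = leq(y, x)
  if (xy && yx) return "equal"
  if (xy) return "strict subtype"
  if (yx) return "strict supertype"
  return "incomparable"
}

// Toy stand-in for the subtype test: key-list inclusion.
const keysLeq = (a: string[], b: string[]): boolean =>
  a.every((key) => b.indexOf(key) !== -1)

const r1 = classify(keysLeq, [], ["name"]) // "strict subtype"
const r2 = classify(keysLeq, ["name"], []) // "strict supertype"
const r3 = classify(keysLeq, ["name"], ["name"]) // "equal"
const r4 = classify(keysLeq, ["name"], ["email"]) // "incomparable"
```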

Type bound methods

Lastly, the types namespace also has methods for computing the greatest common subtype and least common supertype of types with respect to the subtype relation. These are more formally known as infimum and supremum, respectively.

The greatest common subtype of types X and Y is a maximal type Z such that Z is a subtype of both X and Y. Conversely, the least common supertype of types X and Y is a minimal type Z such that X and Y are both subtypes of Z.

declare namespace types {
  function hasCommonBounds(x: Type, y: Type): boolean
  function greatestCommonSubtype(x: Type, y: Type): Type
  function leastCommonSupertype(x: Type, y: Type): Type
}

In general, the infima and suprema of arbitrary types X and Y are not guaranteed to exist. The method types.hasCommonBounds checks whether two types have an infimum and supremum (if they have one then they also have the other). types.greatestCommonSubtype and types.leastCommonSupertype will throw an error if called with types that do not have common bounds.

Intuitively, types.greatestCommonSubtype and types.leastCommonSupertype are two complementary ways of "merging" two types by either discarding extra product components and keeping extra coproduct options, or keeping extra product components and discarding extra coproduct options, respectively.
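Here's one way the two "merges" could be sketched as a single recursive function with a mode flag. This is an illustration under assumptions, not the library's code: bound and mergeEntries are hypothetical names, the type shapes are redeclared locally, and only the two error messages for literals and mismatched kinds are taken from this document. The sketch throws whenever two entries under a shared key cannot be merged.

```typescript
type URI = { kind: "uri" }
type Literal = { kind: "literal"; datatype: string }
type Product = { kind: "product"; components: Record<string, Type> }
type Coproduct = { kind: "coproduct"; options: Record<string, Type> }
type Reference = { kind: "reference"; key: string }
type Type = URI | Literal | Product | Coproduct | Reference

type Mode = "subtype" | "supertype"

// Merge two keyed collections. Shared keys are merged recursively;
// keys present on only one side are kept only when keepExtra is true.
function mergeEntries(
  a: Record<string, Type>,
  b: Record<string, Type>,
  keepExtra: boolean,
  mode: Mode
): Record<string, Type> {
  const result: Record<string, Type> = {}
  for (const key of Object.keys(a)) {
    const left = a[key]
    const right = b[key]
    if (left === undefined) continue
    if (right !== undefined) result[key] = bound(left, right, mode)
    else if (keepExtra) result[key] = left
  }
  if (keepExtra) {
    for (const key of Object.keys(b)) {
      const right = b[key]
      if (right !== undefined && a[key] === undefined) result[key] = right
    }
  }
  return result
}

// mode "subtype" sketches greatestCommonSubtype,
// mode "supertype" sketches leastCommonSupertype.
function bound(x: Type, y: Type, mode: Mode): Type {
  if (x.kind === "uri" && y.kind === "uri") return x
  if (x.kind === "literal" && y.kind === "literal") {
    if (x.datatype === y.datatype) return x
    throw new Error("cannot unify unequal literal types")
  }
  if (x.kind === "reference" && y.kind === "reference") {
    if (x.key === y.key) return x
    // Message wording here is this sketch's own.
    throw new Error("cannot unify unequal reference types")
  }
  if (x.kind === "product" && y.kind === "product") {
    // Extra components survive only in the supertype.
    const keepExtra = mode === "supertype"
    return {
      kind: "product",
      components: mergeEntries(x.components, y.components, keepExtra, mode),
    }
  }
  if (x.kind === "coproduct" && y.kind === "coproduct") {
    // Extra options survive only in the subtype.
    const keepExtra = mode === "subtype"
    return {
      kind: "coproduct",
      options: mergeEntries(x.options, y.options, keepExtra, mode),
    }
  }
  throw new Error("cannot unify types of different kinds")
}

const nameOnly: Type = {
  kind: "product",
  components: {
    "http://schema.org/name": {
      kind: "literal",
      datatype: "http://www.w3.org/2001/XMLSchema#string",
    },
  },
}
const emailOnly: Type = {
  kind: "product",
  components: { "http://schema.org/email": { kind: "uri" } },
}

const lower = bound(nameOnly, emailOnly, "subtype") // product with no components
const upper = bound(nameOnly, emailOnly, "supertype") // product with both components
```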

import { types } from "tasl"

types.greatestCommonSubtype(types.uri(), types.uri()) // { kind: "uri" }
types.leastCommonSupertype(types.uri(), types.uri()) // { kind: "uri" }

types.greatestCommonSubtype(
  types.product({ "http://schema.org/name": types.string }),
  types.product({ "http://schema.org/email": types.uri() })
) // { kind: 'product', components: {} }

types.leastCommonSupertype(
  types.product({ "http://schema.org/name": types.string }),
  types.product({ "http://schema.org/email": types.uri() })
)
// {
//   kind: 'product',
//   components: {
//     'http://schema.org/email': { kind: 'uri' },
//     'http://schema.org/name': {
//       kind: 'literal',
//       datatype: 'http://www.w3.org/2001/XMLSchema#string'
//     }
//   }
// }

types.greatestCommonSubtype(
  types.coproduct({
    "http://example.com/foo": types.unit,
    "http://example.com/bar": types.unit,
  }),
  types.coproduct({
    "http://example.com/foo": types.unit,
    "http://example.com/baz": types.unit,
  })
)
// {
//   kind: 'coproduct',
//   options: {
//     'http://example.com/baz': { kind: 'product', components: {} },
//     'http://example.com/foo': { kind: 'product', components: {} },
//     'http://example.com/bar': { kind: 'product', components: {} }
//   }
// }

types.leastCommonSupertype(
  types.coproduct({
    "http://example.com/foo": types.unit,
    "http://example.com/bar": types.unit,
  }),
  types.coproduct({
    "http://example.com/foo": types.unit,
    "http://example.com/baz": types.unit,
  })
)
// {
//   kind: 'coproduct',
//   options: { 'http://example.com/foo': { kind: 'product', components: {} } }
// }

types.greatestCommonSubtype(types.string, types.boolean)
// Uncaught Error: cannot unify unequal literal types

types.greatestCommonSubtype(
  types.product({ "http://schema.org/name": types.string }),
  types.product({
    "http://schema.org/name": types.product({
      "http://schema.org/givenName": types.string,
      "http://schema.org/familyName": types.string,
    }),
  })
)
// Uncaught Error: cannot unify types of different kinds

The operations types.greatestCommonSubtype and types.leastCommonSupertype are both associative and commutative. The relation types.hasCommonBounds is reflexive and symmetric, but not necessarily transitive.

If X ≤ Y then their greatest common subtype is X and their least common supertype is Y. The converse does not hold: types that are incomparable (neither X ≤ Y nor Y ≤ X) often do have common bounds, as the two single-component product types in the example above show. So types.hasCommonBounds(x, y) is not equivalent to types.isSubtypeOf(x, y) || types.isSubtypeOf(y, x).