Core concepts

A while back ago I started to fiddle with Microsofts CTP edition of code first in Entity framework 4. The product is great but I wanted something else, I wanted something “more” schemaless. I turned to MongoDB and wrote together an open source driver targeting .Net 4. Still, there was things that I didn’t like so I built a document DB over Lucene. I relatively quickly discovered that I missed all the great infrastructure that SQL-server provides. Security, replication, scheduler etc, so I prototyped a solution that used JSON to create a document/structure provider over SQL-server, namely: Simple Structure Oriented Db (SisDb).

How is data stored?

All objects (structures) you pass to SisoDb is stored as JSON and as key-value indexes. SisoDb stores your POCO-graphs (plain old clr objects) using JSON which enables us to go from a POCO to persistable JSON. For each type of entity there will be three tables (tables are created on the fly). One table, the Structure-table, is to be seen as the master table and it holds the Id of the entity and the JSON-representation. The second table, Indexes-table, holds the value of each property in the object graph, with the purpose of providing a more performant and query friendly representation of the entity. The third table, Uniques-table, holds values of the scalar properties that you mark as unique. Unique constraints. To enforce data integrity, there's of course relationships between all tables.

The complete graph is serialized and deserialized so there's no magical lazy loading and proxies. Instead you have to think a bit before designing deep graphs and have to learn your self in thinking in documents. It isn't a relational data model you are working in. You can off course reference other documents and include them in the single query to reduce the amount of roundtrips and avoid n+1 scenarios. But when working with documents, duplication of data is more "ok" than with RDMS models.

Tables

The tables in the database that are required for an entity are created on the fly and is nothing you should have to care about. For each structure there will be three tables created. For the structure "Customer" the following tables will be created:

CustomerStructure
CustomerIndexes
CustomerUniques

CustomerStructure

Holds the id and the data for the entity. The data is currently represented as JSON, handled by the serialization framework from ServiceStack.

CustomerIndexes

The primary concern of this table is for querying. All Sql-queries generated by SisoDb, unless you query by id, are translated and executed against this table. It holds the member paths and the typed values of each scalar property in the object hierarchy. From v4.0, the table grows vertically, in a key-value fashion.

If a member is part of something that is enumerable, e.g: ProductNo of an OrderLine in Order - will result in several rows with the same member path in the Indexes-table. The member path will be: Order.Lines.ProductNo.

CustomerUniques

As with the Indexes-table, it is key-value oriented, but the value is always a checksum generated for each scalar property that has been marked with the SisoDb.Annotations.UniqueAttribute. This attribute can be used in one of two ways. Either you mark a scalar property to be unique per instance (e.g. OrderLine.ProductNo) or per type (e.g. Order.OrderNo).

JSON

The JSON representation of the structure is what's being deserialized back when you query for your documents. This gives you opportunities to store an structure as a class or interface and then return it as something completely else. You can read more about it here: Store as X, return as Y.

Serializer

The default JSON serializer being used is one of the most performant serializer I've found, namely: ServiceStack.Text. Read more about a comparison here: http://daniel.wertheim.se/2011/02/07/json-net-vs-servicestack/

Key-value indexes

The primary concern of this table is for querying. All Sql-queries generated by SisoDb, unless you query by id, are translated and executed against this table. It gets created the first time you use execute a command against a UnitOfWork or QueryEngine for a certain type of structure (Person, Customer, Order, ...) a schema for that structure is constructed, and cached in the Database instance you created via your SisoDbFactory. This schema is essentially made up of property accessors, that via IL EMITs accesses the values of your structures and gives them a key. The key is the complete member path of the property.

public class Customer
{
    public Guid StructureId { get; set; }
    public int CustomerNo { get; set; }
    public Address Address { get; set; }
}

public class Address
{
    public string Street { get; set; }
    public string Zip { get; set; }
    public string City { get; set; }
}

This will create six different cached index accessors, with the member path and type info:

StructureId : Guid
CustomerNo : int
Address.Street : string
Address.Zip : string
Address.City : string

Now, everytime you insert or update a structure into the database, this cached accessors will be used to extract the values of the structure. Each value will be stored using the member path as the key and the value as the typed value in a certain indexes table. If you look at JSON and the indexes as data, this distinction of the JSON being stored in one set and the indexes in another set. The store the JSON in Essent and the Indexes in Lucene. I'm looking at giving you an option to select whether you want to store the indexes in the SQL-server or in Lucene.

Control what to index

Per default every scalar property in the object graph is extracted and given a key and a value, but you can control what to index. Read more about it here: Control what to index

Unique constraints

There's one attribute you can use in SisoDb, and it's the UniqueAttribute. Using it you can mark a scalar property as being unique, either per instance or per type.

public class Customer
{
    public Guid StructureId { get; set; }

    [Unique(UniqueModes.PerType)]
    public int CustomerNo { get; set; }

    ...
}

When doing this, a checksum is generated for the CustomerNo and is inserted in the Unques-table, which has some unique constraints in it.

Model updates

Well if you keep your self to the open-closed principle and just add new properties nothing is needed, it all works. You also don't have to do anything if you drop properties. So, if you remove a member from the model, values for the member is deleted from the indexes-table and the uniques-table. The structure-table will of course contain the "truth", the JSON, which will not be updated until the structure is "touched" - fetched and re-saved.

More complex model updates?

There is the concept of using a structure set updater to handle more complex model updates. Read more about it here

Provide feedback

Saved searches

Use saved searches to filter your results more quickly