-
Notifications
You must be signed in to change notification settings - Fork 25
Core concepts
A while back ago I started to fiddle with Microsofts CTP edition of code first in Entity framework 4. The product is great but I wanted something else, I wanted something “more” schemaless. I turned to MongoDB and wrote together an open source driver targeting .Net 4. Still, there was things that I didn’t like so I built a document DB over Lucene. I relatively quickly discovered that I missed all the great infrastructure that SQL-server provides. Security, replication, scheduler etc, so I prototyped a solution that used JSON to create a document/structure provider over SQL-server, namely: Simple Structure Oriented Db (SisDb).
All objects (structures) you pass to SisoDb is stored as JSON and as key-value indexes. SisoDb stores your POCO-graphs (plain old clr objects) using JSON which enables us to go from a POCO to persistable JSON. For each type of entity there will be three tables (tables are created on the fly). One table, the Structure-table, is to be seen as the master table and it holds the Id of the entity and the JSON-representation. The second table, Indexes-table, holds the value of each property in the object graph, with the purpose of providing a more performant and query friendly representation of the entity. The third table, Uniques-table, holds values of the scalar properties that you mark as unique. Unique constraints. To enforce data integrity, there's of course relationships between all tables.
The complete graph is serialized and deserialized so there's no magical lazy loading and proxies. Instead you have to think a bit before designing deep graphs and have to learn your self in thinking in documents. It isn't a relational data model you are working in. You can off course reference other documents and include them in the single query to reduce the amount of roundtrips and avoid n+1 scenarios. But when working with documents, duplication of data is more "ok" than with RDMS models.
The tables in the database that are required for an entity are created on the fly and is nothing you should have to care about. For each structure there will be three tables created. For the structure "Customer" the following tables will be created:
- CustomerStructure
- CustomerIndexes
- CustomerUniques
Holds the id and the data for the entity. The data is currently represented as JSON, handled by the serialization framework from ServiceStack.
The primary concern of this table is for querying. All Sql-queries generated by SisoDb, unless you query by id, are translated and executed against this table. It holds the member paths and the typed values of each scalar property in the object hierarchy. From v4.0, the table grows vertically, in a key-value fashion.
If a member is part of something that is enumerable, e.g: ProductNo
of an OrderLine
in Order
- will result in several rows with the same member path in the Indexes-table. The member path will be: Order.Lines.ProductNo
.
As with the Indexes-table, it is key-value oriented, but the value is always a checksum generated for each scalar property that has been marked with the SisoDb.Annotations.UniqueAttribute
. This attribute can be used in one of two ways. Either you mark a scalar property to be unique per instance (e.g. OrderLine.ProductNo) or per type (e.g. Order.OrderNo).
The JSON representation of the structure is what's being deserialized back when you query for your documents. This gives you opportunities to store an structure as a class or interface and then return it as something completely else. You can read more about it here: Store as X, return as Y.
The default JSON serializer being used is one of the most performant serializer I've found, namely: ServiceStack.Text. Read more about a comparison here: http://daniel.wertheim.se/2011/02/07/json-net-vs-servicestack/
The primary concern of this table is for querying. All Sql-queries generated by SisoDb, unless you query by id, are translated and executed against this table. It gets created the first time you use execute a command against a UnitOfWork or QueryEngine for a certain type of structure (Person, Customer, Order, ...) a schema for that structure is constructed, and cached in the Database instance you created via your SisoDbFactory. This schema is essentially made up of property accessors, that via IL EMITs accesses the values of your structures and gives them a key. The key is the complete member path of the property.
public class Customer
{
public Guid StructureId { get; set; }
public int CustomerNo { get; set; }
public Address Address { get; set; }
}
public class Address
{
public string Street { get; set; }
public string Zip { get; set; }
public string City { get; set; }
}
This will create six different cached index accessors, with the member path and type info:
- StructureId : Guid
- CustomerNo : int
- Address.Street : string
- Address.Zip : string
- Address.City : string
Now, everytime you insert or update a structure into the database, this cached accessors will be used to extract the values of the structure. Each value will be stored using the member path as the key and the value as the typed value in a certain indexes table. If you look at JSON and the indexes as data, this distinction of the JSON being stored in one set and the indexes in another set. The store the JSON in Essent and the Indexes in Lucene. I'm looking at giving you an option to select whether you want to store the indexes in the SQL-server or in Lucene.
Per default every scalar property in the object graph is extracted and given a key and a value, but you can control what to index. Read more about it here: Control what to index
There's one attribute you can use in SisoDb, and it's the UniqueAttribute. Using it you can mark a scalar property as being unique, either per instance or per type.
public class Customer
{
public Guid StructureId { get; set; }
[Unique(UniqueModes.PerType)]
public int CustomerNo { get; set; }
...
}
When doing this, a checksum is generated for the CustomerNo and is inserted in the Unques-table, which has some unique constraints in it.
Well if you keep your self to the open-closed principle and just add new properties nothing is needed, it all works. You also don't have to do anything if you drop properties. So, if you remove a member from the model, values for the member is deleted from the indexes-table and the uniques-table. The structure-table will of course contain the "truth", the JSON, which will not be updated until the structure is "touched" - fetched and re-saved.
There is the concept of using a structure set updater to handle more complex model updates. Read more about it here