LumoSQL · May 6, 2020
diff --git a/README.md b/README.md (+36 -36)
diff --git a/images/lumo-architecture-intro.jpg b/images/lumo-architecture-intro.jpg (553 KB)
diff --git a/images/lumo-architecture-lumosql-theoretical-future.svg b/images/lumo-architecture-lumosql-theoretical-future.svg (+160)
diff --git a/images/lumo-architecture-online-db-server-scale.svg b/images/lumo-architecture-online-db-server-scale.svg (+208)
diff --git a/images/lumo-architecture-online-db-server.svg b/images/lumo-architecture-online-db-server.svg (+157)
diff --git a/images/lumo-architecture-sqlite-overview.svg b/images/lumo-architecture-sqlite-overview.svg (+158)
diff --git a/images/lumo-architecture-sqlite-parts.svg b/images/lumo-architecture-sqlite-parts.svg (+111)
diff --git a/images/lumo-diagram-library.odg b/images/lumo-diagram-library.odg (20.7 KB)
diff --git a/images/lumo-doc-standards-intro.jpg b/images/lumo-doc-standards-intro.jpg (85 KB)
diff --git a/images/lumo-ecosystem-intro.png b/images/lumo-ecosystem-intro.png (190 KB)
diff --git a/images/lumo-implementation-intro.jpg b/images/lumo-implementation-intro.jpg (188 KB)
diff --git a/images/lumo-logo-temp.svg b/images/lumo-logo-temp.svg (+288)
diff --git a/images/lumo-logo.png b/images/lumo-logo.png (3.66 KB)
diff --git a/images/lumo-project-aims-intro.jpg b/images/lumo-project-aims-intro.jpg (93 KB)
diff --git a/images/lumo-relevant-codebases-intro.jpg b/images/lumo-relevant-codebases-intro.jpg (4.42 MB)
diff --git a/images/lumo-signature.svg b/images/lumo-signature.svg (+599)
diff --git a/lumo-architecture.md b/lumo-architecture.md (+219)
diff --git a/lumo-benchmarking.md b/lumo-benchmarking.md (+345)
diff --git a/lumo-corruption-detection-and-magic.md b/lumo-corruption-detection-and-magic.md (+143)
diff --git a/lumo-doc-standards.md b/lumo-doc-standards.md (+290)
diff --git a/lumo-help-alien-language.md b/lumo-help-alien-language.md (+27)
diff --git a/lumo-implementation.md b/lumo-implementation.md (+145)
diff --git a/lumo-landscape.md b/lumo-landscape.md (+410)
diff --git a/lumo-legal-aspects.md b/lumo-legal-aspects.md (+177)
diff --git a/lumo-not-forking.md b/lumo-not-forking.md (+301)
diff --git a/lumo-project-aims.md b/lumo-project-aims.md (+117)
diff --git a/lumo-quickstart.md b/lumo-quickstart.md (+253)
diff --git a/lumo-relevant-codebases.md b/lumo-relevant-codebases.md (+169)
diff --git a/lumo-relevant-knowledgebase.md b/lumo-relevant-knowledgebase.md (+122)
@@ -1,37 +1,37 @@
-## Welcome to GitHub Pages
+<!-- SPDX-License-Identifier: CC-BY-SA-4.0 -->
+<!-- SPDX-FileCopyrightText: 2020 The LumoSQL Authors -->
+<!-- SPDX-ArtifactOfProjectName: LumoSQL -->
+<!-- SPDX-FileType: Documentation -->
+<!-- SPDX-FileComment: Original by Dan Shearer, 2020 -->
+
+
+LumoSQL
+=======
+
+![](./images/lumo-logo-temp.svg "LumoSQL logo")
+
+
+Table of Contents
+=================
+
+Welcome to the LumoSQL project, which builds on the excellent
+[SQLite](https://sqlite.org/) project without forking it.  LumoSQL is an SQL database
+which can be used in embedded applications identically to SQLite, but also
+optionally with different storage backends and other additional behaviour.
+LumoSQL emphasises benchmarking, code reuse and modern database implementation.
+
+* [Quick Start](./lumo-quickstart.md)
+* [LumoSQL Project Aims](./lumo-project-aims.md)
+* Creating a LumoSQL Ecosystem
+    + [The LumoSQL Landscape](./lumo-landscape.md)
+    + [Codebases relevant to LumoSQL](./lumo-relevant-codebases.md)
+    + [Full Knowledgebase Relevant to LumoSQL](./lumo-relevant-knowledgebase.md)
+* LumoSQL in Technical Detail
+    + [Architecture](./lumo-architecture.md)
+    + [Implementation](./lumo-implementation.md)
+        * [Not-forking Scheme](./lumo-not-forking.md)
+        * [Corruption Detection and Magic](./lumo-corruption-detection-and-magic.md)
+* [Benchmarking](./lumo-benchmarking.md)
+* [Legal Aspects](./lumo-legal-aspects.md)
+* [LumoSQL Documentation Standards](./lumo-doc-standards.md)
 
-You can use the [editor on GitHub](https://github.com/LumoSQL/lumosql.github.io/edit/master/README.md) to maintain and preview the content for your website in Markdown files.
-
-Whenever you commit to this repository, GitHub Pages will run [Jekyll](https://jekyllrb.com/) to rebuild the pages in your site, from the content in your Markdown files.
-
-### Markdown
-
-Markdown is a lightweight and easy-to-use syntax for styling your writing. It includes conventions for
-
-```markdown
-Syntax highlighted code block
-
-# Header 1
-## Header 2
-### Header 3
-
-- Bulleted
-- List
-
-1. Numbered
-2. List
-
-**Bold** and _Italic_ and `Code` text
-
-[Link](url) and ![Image](src)
-```
-
-For more details see [GitHub Flavored Markdown](https://guides.github.com/features/mastering-markdown/).
-
-### Jekyll Themes
-
-Your Pages site will use the layout and styles from the Jekyll theme you have selected in your [repository settings](https://github.com/LumoSQL/lumosql.github.io/settings). The name of this theme is saved in the Jekyll `_config.yml` configuration file.
-
-### Support or Contact
-
-Having trouble with Pages? Check out our [documentation](https://help.github.com/categories/github-pages-basics/) or [contact support](https://github.com/contact) and we’ll help you sort it out.
@@ -0,0 +1,219 @@
+<!-- SPDX-License-Identifier: CC-BY-SA-4.0 -->
+<!-- SPDX-FileCopyrightText: 2020 The LumoSQL Authors -->
+<!-- SPDX-ArtifactOfProjectName: LumoSQL -->
+<!-- SPDX-FileType: Documentation -->
+<!-- SPDX-FileComment: Original by Dan Shearer, 2020 -->
+
+
+Table of Contents
+=================
+
+   * [LumoSQL Architecture](#lumosql-architecture)
+   * [Table of Contents](#table-of-contents)
+   * [Online Database Servers](#online-database-servers)
+   * [SQLite as an Embedded Database](#sqlite-as-an-embedded-database)
+   * [How LumoSQL Architecture Differs from SQLite](#how-lumosql-architecture-differs-from-sqlite)
+   * [Database Storage Systems](#database-storage-systems)
+      * [WALs in SQLite](#wals-in-sqlite)
+      * [Single-level Store](#single-level-store)
+
+LumoSQL Architecture
+====================
+
+![](./images/lumo-architecture-intro.jpg "Shanghai Skyline from Pxfuel, CC0 license, https://www.pxfuel.com/en/free-photo-oyvbv")
+
+
+# Online Database Servers
+
+All of the most-used databases other than SQLite work over a network, here
+called "online databases". This includes Postgresql, MariaDB, MySQL, SQLServer,
+Oracle, and so on.
+
+![](./images/lumo-architecture-online-db-server.svg "What an online server database looks like")
+
+An online database server has clients that connect to the server over a
+network. Once a network connection is opened, SQL queries are made by the
+client and data is returned from the server. Although all databases use one of
+the variants of the same SQL language, the means of connection is specific to each
+database. 
+
+For example, on a typical Debian Linux server there are these well-known ports:
+
+```
+foo@zanahoria:/etc$ grep sql /etc/services
+
+ms-sql-s        1433/tcp                        # Microsoft SQL Server
+ms-sql-m        1434/tcp                        # Microsoft SQL Monitor
+mysql           3306/tcp                        # MySQL
+postgresql      5432/tcp                        # PostgreSQL Database
+mysql-proxy     6446/tcp                        # MySQL Proxy
+```
+
+with many other port assignments for other databases.
+
+In the diagram above, each UserApp has a network connection to the SQL Database
+Server on a TCP port, for example 5432 if it is Postgresql. The UserApps could be
+running from anywhere on the internet, including on mobile devices. There is a
+limit to how many users a single database server can serve. That limit is at
+least in the many thousands, but internet applications often reach it.
+
+![](./images/lumo-architecture-online-db-server-scale.svg "How an online database server scales")
+
+The most obvious way to scale an online database is to add more RAM, CPU and storage to a single server. This is called "Scale Up", and all code still runs in a single address space. The alternative is to add more servers and distribute queries between them. This is called "Scale Out".
+
+Nati Shalom describes the difference in the [article Scale-Out vs Scale-Up](http://ht.ly/cAhPe):
+
+> One of the common ways to best utilize multi-core architecture in a context
+> of a single application is through concurrent programming. Concurrent
+> programming on multi-core machines (scale-up) is often done through
+> multi-threading and in-process message passing also known as the Actor
+> model. Distributed programming does something similar by distributing jobs
+> across machines over the network. There are different patterns associated
+> with this model such as Master/Worker, Tuple Spaces, BlackBoard, and
+> MapReduce. This type of pattern is often referred to as scale-out
+> (distributed).
+>
+> Conceptually, the two models are almost identical as in both cases we break a
+> sequential piece of logic into smaller pieces that can be executed in
+> parallel. Practically, however, the two models are fairly different from an
+> implementation and performance perspective. The root of the difference is the
+> existence (or lack) of a shared address space. In a multi-threaded scenario
+> you can assume the existence of a shared address space, and therefore data
+> sharing and message passing can be done simply by passing a reference. In
+> distributed computing, the lack of a shared address space makes this type of
+> operation significantly more complex. Once you cross the boundaries of a
+> single process you need to deal with partial failure and consistency. Also,
+> the fact that you can’t simply pass an object by reference makes the process
+> of sharing, passing or updating data significantly more costly (compared with
+> in-process reference passing), as you have to deal with passing of copies of
+> the data which involves additional network and serialization and
+> de-serialization overhead.
+
+# SQLite as an Embedded Database
+
+The user applications are tightly connected to the SQLite library. Whether by
+dynamic linking to a copy of the library shared across the whole operating
+system, or static linking so that it is part of the same program as the user
+application, there is no networking involved. Making an SQL query and getting a
+response involves a cascade of function calls from the app to the library to
+the operating system and back again, typically taking less than 10 milliseconds,
+depending on the hardware used. An online database cannot expect to get
+faster results than 100 milliseconds, often much more depending on network and
+hardware. An online database relies on the execution of hundreds of millions
+of lines of code on at least two computers, whereas SQLite relies on the
+execution of some hundreds of thousands on just one computer.
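
As a rough illustration of the in-process model described above, the sketch below uses Python's standard `sqlite3` module, which calls the SQLite library inside the same process. The table name and data are invented for the example. No socket or server is involved at any point.

```python
import sqlite3

# Open a database that lives entirely inside this process.
# ":memory:" avoids touching disk; a file path would behave the same way.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE greetings (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO greetings (body) VALUES (?)", ("hello from the library",))
conn.commit()

# This query is a chain of function calls into the SQLite library,
# not a request sent over a network to a server.
rows = conn.execute("SELECT id, body FROM greetings").fetchall()
conn.close()
```

An online database would need a host, a port and a client driver before the first query could run.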
+
+![](./images/lumo-architecture-sqlite-overview.svg "Overview of a SQLite being an embedded database server")
+
+
+![](./images/lumo-architecture-sqlite-parts.svg "The simplest view of the three parts to SQLite in typical embedded use")
+
+
+# How LumoSQL Architecture Differs from SQLite
+
+![](./images/lumo-architecture-lumosql-theoretical-future.svg "Where LumoSQL architecture is headed")
+
+# Database Storage Systems
+
+LumoSQL has several features that are in advance of every other
+widely-used database. With the first prototype complete with an LMDB backend,
+LumoSQL is already the first major SQL database to move away from batch
+processing, since it has a backend that does not use Write-Ahead Logs.  LumoSQL
+also needs to be able to use both the original SQLite and additional storage
+mechanisms, and any or all of these storage backends at once. Not all future
+storage will be on local disk, or btree key-values.
+
+[Write-ahead Logging in Transactional Databases](https://en.wikipedia.org/wiki/Write-ahead_logging) has been the only
+way since the 1990s that atomicity and durability are provided in
+databases. A version of the same technique is used in filesystems, where it is
+called [journalling](https://en.wikipedia.org/wiki/Journaling_file_system).
+Write-ahead Logging (WAL) is a method of making sure that all modifications to
+a file are first written to a separate log, and then they are merged (or
+updated) into a master file in a later step. If this update operation is
+aborted or interrupted, the log has enough information to undo the updates and
+reset the database to the state before the update began. Implementations need
+to solve the problem of WAL files growing without bound, which means some kind
+of whole-database snapshot or checkpoint is required.
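
The log-then-merge idea in the paragraph above can be sketched as a toy model. This is an in-memory illustration only, under heavy simplifying assumptions: the class name `ToyWAL`, its methods and the `fail_after` parameter are all hypothetical, and nothing here reflects how SQLite or any other database actually lays out its log.

```python
_MISSING = object()  # marks a key that did not exist before a change


class ToyWAL:
    """Toy write-ahead log: changes go to a log first, then merge into master."""

    def __init__(self):
        self.master = {}   # stands in for the master database file
        self.log = []      # each entry: [key, new_value, before_image]
        self.merged = 0    # number of log entries merged so far

    def write(self, key, value):
        # Step 1: record the change in the log only; master is untouched.
        self.log.append([key, value, None])

    def checkpoint(self, fail_after=None):
        # Step 2: merge logged changes into master, saving a before-image
        # of each value so that an interrupted merge can be undone.
        for entry in self.log[self.merged:]:
            if fail_after is not None and self.merged >= fail_after:
                raise RuntimeError("simulated crash during merge")
            key, new_value, _ = entry
            entry[2] = self.master.get(key, _MISSING)
            self.master[key] = new_value
            self.merged += 1
        # Everything merged: the log can be emptied so it does not grow forever.
        self.log.clear()
        self.merged = 0

    def undo(self):
        # Recovery: walk the merged entries backwards and restore old values.
        for key, _, before in reversed(self.log[:self.merged]):
            if before is _MISSING:
                self.master.pop(key, None)
            else:
                self.master[key] = before
        self.merged = 0
```

If `checkpoint(fail_after=1)` is interrupted after merging one of two pending changes, `undo()` returns `master` to its state before the merge began, and the log still holds both changes for a later retry.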
+
+WALs seek to address issues with concurrent transactions, and reliability in
+the face of crashes or errors. There are decades of theory around how to
+implement WAL, and it is a significant part of any University course in
+database internals. As well as somewhat-reliable commit and rollback, it is the
+WAL that lets all the main databases in use offer online backup features, and
+point-in-time recovery. Every WAL feature and benefit comes down to being able
+to have a stream of atomic operations that can be replayed forwards or
+backwards.
+
+WAL is inherently batch-oriented. The more a WAL-based database tries to be
+real time, the more expensive it is to keep all WAL functionality working.
+
+The WAL implementation in the most common networked databases is comprehensive
+and usually kept as a rarely-seen technical feature. Postgresql is an exception, 
+going out of its way to inform administrators how the WAL system works and what 
+can be done with access to the log files.
+
+All the most common networked databases describe their WAL implementation and
+most offer some degree of control over it:
+
+* [Postgresql](https://www.postgresql.org/docs/12/wal-intro.html)
+* [SQL Server](https://docs.microsoft.com/en-us/sql/relational-databases/sql-server-transaction-log-architecture-and-management-guide?view=sql-server-ver15)
+* [Oracle Log Writer Process](https://docs.oracle.com/en/database/oracle/oracle-database/19/cncpt/process-architecture.html#GUID-B6BE2C31-1543-4504-9763-6FFBBF99DC85)
+* [MySQL ReDo Log](https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-logging.html)
+* [MariaDB Undo Log](https://mariadb.com/kb/en/library/innodb-undo-log/)
+
+Companies have invested billions of Euros into these codebases, with stability
+and reliability as their first goal. And yet, even with all the runtime
+advantages of huge resources and stable datacentre environments, these
+companies can't make WALs fully deliver on reliability.
+
+These issues are well-described in the case of Postgresql. Postgresql has an
+easier task than SQLite in the sense it is not intended for unpredictable
+embedded use cases, and also that Postgresql has a large amount of code
+dedicated to safe WAL handling.  Even so, Postgresql still requires its users
+to make compromises regarding reliability. For example [this WAL mitigation
+article](https://dzone.com/articles/postgresql-why-and-how-wal-bloats)
+describes a few of the tradeoffs of merge frequency vs reliability in the case
+of a crash. This is a very real problem for every traditional database and that
+includes SQLite - which does not have a fraction of the WAL-handling code of
+the large databases, and which is frequently deployed in embedded use cases
+where crashes and resets happen very frequently.
+
+## WALs in SQLite 
+
+SQLite WALs are special.
+
+The [SQLite WAL](https://www.sqlite.org/draft/wal.html) requires multiple
+files to be maintained in sync, otherwise there will be corruption. Unlike the
+other databases listed here, SQLite has no pre-emptive corruption detection and
+only fairly basic on-demand detection.
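
The multiple files mentioned above can be observed directly. The sketch below, using Python's standard `sqlite3` module, switches a scratch database to WAL mode and checks for the `-wal` and `-shm` companion files while the connection is open. The file and table names are arbitrary, and the example assumes a local filesystem where SQLite is able to enable WAL mode.

```python
import os
import sqlite3
import tempfile


def wal_companion_files(db_path):
    """Report which WAL-related companion files exist next to db_path."""
    return {suffix: os.path.exists(db_path + suffix) for suffix in ("-wal", "-shm")}


with tempfile.TemporaryDirectory() as tmpdir:
    db_path = os.path.join(tmpdir, "example.db")

    conn = sqlite3.connect(db_path)
    # The pragma returns the journal mode that is now in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    conn.commit()

    # While the connection is open, committed data may live in the -wal file
    # rather than in the main database file.
    while_open = wal_companion_files(db_path)
    conn.close()
```

Copying only `example.db` while the connection is open could therefore miss recent commits, which is one way the files can fall out of step.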
+
+## Single-level Store
+
+Single-level store concepts are well-explained in [Howard Chu's 2013 MDB Paper](./lumo-relevant-knowledgebase.md#list-of-sqlite-code-related-knowledge):
+
+> One fundamental concept behind the MDB approach is known as "Single-Level
+> Store". The basic idea is to treat all of computer memory as a single address
+> space. Pages of storage may reside in primary storage (RAM) or in secondary
+> storage (disk) but the actual location is unimportant to the application. If
+> a referenced page is currently in primary storage the application can use it
+> immediately, if not a page fault occurs and the operating system brings the
+> page into primary storage. The concept was introduced in 1964 in the Multics
+> operating system but was generally abandoned by the early 1990s as data
+> volumes surpassed the capacity of 32 bit address spaces. (We last knew of it
+> in the Apollo DOMAIN operating system, though many other Multics-influenced
+> designs carried it on.) With the ubiquity of 64 bit processors today this
+> concept can again be put to good use. (Given a virtual address space limit of
+> 63 bits that puts the upper bound of database size at 8 exabytes. Commonly
+> available processors today only implement 48 bit address spaces, limiting us
+> to 47 bits or 128 terabytes.) Another operating system requirement for this
+> approach to be viable is a Unified BufferCache. While most POSIX-based
+> operating systems have supported an mmap() system call for many years, their
+> initial implementations kept memory managed by the VM subsystem separate from
+> memory managed by the filesystem cache. This was not only wasteful
+> (again, keeping data cached in two places at once) but also led to coherency
+> problems - data modified through a memory map was not visible using
+> filesystem read() calls, or data modified through a filesystem write() was not
+> visible in the memory map. Most modern operating systems now have filesystem
+> and VM paging unified, so this should not be a concern in most deployments.
+
+
@@ -0,0 +1,143 @@
+<!-- SPDX-License-Identifier: CC-BY-SA-4.0 -->
+<!-- SPDX-FileCopyrightText: 2020 The LumoSQL Authors -->
+<!-- SPDX-ArtifactOfProjectName: LumoSQL -->
+<!-- SPDX-FileType: Documentation -->
+<!-- SPDX-FileComment: Original by Dan Shearer, 2020 -->
+
+Table of Contents
+=================
+
+   * [Summary of SQL Database Corruption Detection](#summary-of-sql-database-corruption-detection)
+   * [SQLite and Integrity Checking](#sqlite-and-integrity-checking)
+   * [LumoSQL Checksums and the SQLite On-disk File Format](#lumosql-checksums-and-the-sqlite-on-disk-file-format)
+   * [Design for Corruption Detection](#design-for-corruption-detection)
+
+
+![](./images/lumo-corruption-detection-and-magic-intro.png "XXXXXXXX")
+
+# Summary of SQL Database Corruption Detection
+
+One of the short-term goals stated in the [LumoSQL Project Aims](./lumo-project-aims.md) is:
+
+> LumoSQL will improve SQLite quality and privacy compliance by introducing
+> optional on-disk checksums for storage backends including to the original
+> SQLite btree format. This will give real-time row-level corruption detection.
+
+It seems quite extraordinary that in 2020 none of the major online databases -
+not Postgresql, Oracle, MariaDB, SQLServer or others - have the ability to check
+during a SELECT operation that the row being read from disk is exactly the row
+that was previously written. There are many reasons why data can get modified,
+deleted or overwritten outwith the control of the database, and the ideal way
+to respond to this is to notify the database when a corrupt row is accessed.
+All that is needed is for a hash of the row to be stored with the row when it
+is written.
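
To make the idea concrete, here is a minimal application-level sketch of storing a hash with each row and checking it on read. It is not how LumoSQL implements checksums, which is intended to happen inside the storage backend. The table layout and the helper names `row_hash`, `insert_row` and `read_row` are hypothetical.

```python
import hashlib
import sqlite3


def row_hash(*fields):
    """Hash a row's fields; length prefixes keep ("ab", "c") distinct from ("a", "bc")."""
    h = hashlib.sha256()
    for field in fields:
        data = str(field).encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def insert_row(conn, name, balance):
    # The hash is computed from the row contents and stored alongside them.
    conn.execute(
        "INSERT INTO accounts (name, balance, row_hash) VALUES (?, ?, ?)",
        (name, balance, row_hash(name, balance)),
    )


def read_row(conn, name):
    row = conn.execute(
        "SELECT name, balance, row_hash FROM accounts WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    stored_name, balance, stored_hash = row
    # Recompute the hash on read and compare with what was written.
    if row_hash(stored_name, balance) != stored_hash:
        raise ValueError(f"row for {name!r} does not match its stored hash")
    return stored_name, balance


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER, row_hash TEXT)"
)
insert_row(conn, "alice", 100)
insert_row(conn, "bob", 50)

# Simulate a change made behind the application's back: the balance is
# altered without the stored hash being updated.
conn.execute("UPDATE accounts SET balance = 999 WHERE name = 'bob'")
```

Reading `alice` succeeds, while reading `bob` raises an error that names the affected row, so the damage is bounded to a known list of rows.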
+
+All the major online databases have the capacity for an external process to
+check disk files for database corruption, as does SQLite. This is very
+different from real-time integrity checking, because such an external check
+cannot be done in real time.
+
+Knowing that a corruption problem is limited to a row or an itemised
+list of rows reduces a general "database corruption problem" down to a bounded
+reconstruction task. Users can have confidence in the remainder of a database
+even if there is corruption found in some rows.
+
+This problem has been recognised and solved inefficiently at the SQL level by various projects. Two of these are
+[Periscope Data's Per-table Multi-database Solution](https://www.periscopedata.com/blog/hashing-tables-to-ensure-consistency-in-postgres-redshift-and-mysql) and [Percona's Postgresql Public Key Row Tracking](https://www.percona.com/blog/2018/10/12/track-postgresql-row-changes-using-public-private-key-signing/). By using SQL code rather than modifying the database internals there is a performance hit. Both these companies specialise in performance optimisation but choose not to apply it to this feature, suggesting they are not convinced of high demand from users.
+
+Interestingly, all the big online databases have row-level security, which has many similarities to the problem of corruption detection. 
+
+For those databases that offer encryption, this is effectively page-level or
+column-based hashes and therefore there is corruption detection by implication.
+However this is not row-based checksumming, and it is not on by default in any
+of the most common databases.
+
+It is possible to introduce a checksum on database pages more easily than for
+every row, and transparently to database users. However, knowing a database
+page is corrupt isn't much help to the user, because there could be many rows
+in a single page.
+
+# SQLite and Integrity Checking
+
+The SQLite developers go to great lengths to avoid database corruption, within their project goals. Nevertheless, corrupted SQLite databases are an everyday occurrence.
+
+SQLite does have checksums already in some places:
+
+* for the journal transaction log (superseded by the Write Ahead Log system)
+* for each database page when using the closed-source SQLite Encryption Extension
+* for each page in a WAL file
+
+SQLite also has [PRAGMA integrity_check](https://www.sqlite.org/pragma.html#pragma_integrity_check) and
+[PRAGMA quick_check](https://www.sqlite.org/pragma.html#pragma_quick_check)
+which do partial checking, and which do not require exclusive access to the
+database. These checks have to scan the database file sequentially and verify
+the logic of its structure, because there are no checksums available to make it
+work more quickly.
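
Both pragmas can be run from any SQLite client. The sketch below uses Python's standard `sqlite3` module on a freshly created in-memory database. The table is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO t (body) VALUES (?)", [("a",), ("b",), ("c",)])
conn.commit()

# Each pragma returns rows of text. A single row containing "ok" means that
# no problems were found; otherwise each row describes one problem.
full = conn.execute("PRAGMA integrity_check").fetchall()
quick = conn.execute("PRAGMA quick_check").fetchall()
conn.close()
```

These checks verify the logical structure of the file. They cannot tell whether a well-formed row still holds the values that were originally written.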
+
+None of these are even close to the accuracy, reliability and speed of row-level corruption detection. 
+
+SQLite does have a file change counter in its database header, in 
+[offset 24 of the official file format](https://www.sqlite.org/fileformat.html), however this
+is not itself subject to integrity checks nor does it contain information about the rest of the file,
+so it is a hint rather than a guarantee.
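
The counter can be read straight from the file header as a 4-byte big-endian integer. The sketch below does so before and after a write, assuming the default rollback-journal mode (the counter is not necessarily updated for every transaction in WAL mode). The helper names are hypothetical.

```python
import os
import sqlite3
import struct
import tempfile


def read_change_counter(db_path):
    """Return the file change counter stored at offset 24 of the header."""
    with open(db_path, "rb") as f:
        header = f.read(100)  # the SQLite database header is 100 bytes
    if header[:16] != b"SQLite format 3\x00":
        raise ValueError("not an SQLite 3 database file")
    (counter,) = struct.unpack(">I", header[24:28])
    return counter


def run_and_close(db_path, sql):
    """Run one statement in its own connection, then commit and close."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=DELETE")  # rollback-journal mode
    conn.execute(sql)
    conn.commit()
    conn.close()


with tempfile.TemporaryDirectory() as tmpdir:
    db_path = os.path.join(tmpdir, "counter.db")
    run_and_close(db_path, "CREATE TABLE t (x INTEGER)")
    before = read_change_counter(db_path)
    run_and_close(db_path, "INSERT INTO t VALUES (1)")
    after = read_change_counter(db_path)
```

The counter shows that something changed, but not what changed, and nothing protects the counter itself from being overwritten.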
+
+SQLite needs row-level integrity checking even more than the online databases because:
+
+* SQLite embedded and IoT use cases often involve frequent power loss, which is the most likely time for corruption to occur.
+* an SQLite database is an ordinary filesystem disk file stored wherever the user decided, which can often be deleted or overwritten by any unprivileged process.
+* it is easy to back up an SQLite database partway through a transaction, meaning that the restore will be corrupted
+* SQLite does not have robust locking mechanisms available for access by multiple processes at once, since it relies on lockfiles and Posix advisory locking 
+* SQLite provides the [VFS API Interface](https://www.sqlite.org/vfs.html) which users can easily misuse to ignore locking via the sql3_*v2 APIs
+* the on-disk file format is seemingly often corrupted regardless of use case. Better evidence on this is needed, but authors of SQLite data file recovery software (see listing in [SQLite Relevant Knowledgebase](./lumo-relevant-knowledgebase.md)) indicate high demand for their services. Informal shows of hands at conferences indicate that SQLite users expect corruption.
+
+sqlite.org has a much more detailed, but still incomplete, summary of [How to Corrupt an SQLite Database](https://www.sqlite.org/howtocorrupt.html).
+
+# LumoSQL Checksums and the SQLite On-disk File Format 
+
+The SQLite database format is widely used as a de facto standard. LumoSQL ships
+with the lumo-backend-mdb-traditional which is the unmodified SQLite on-disk
+format, the same code generating the same data. There is no corruption
+detection included in the file format for this backend.  However corruption
+detection is available for the traditional backend, and other backends that do
+not have scope for checksums in their headers. For all of these backends,
+LumoSQL offers a separate metadata file containing integrity information.
+
+The new backend lumo-backend-mdb-updated adds row-level checksums in the header
+but is otherwise identical to the traditional SQLite MDB format. 
+
+There is an argument that any change at all is the same as having a completely
+different format.  This is not a strong argument against adding checksums to
+the traditional SQLite on-disk format because with encryption increasingly
+becoming mandatory, the standard cannot apply. The sqlite.org closed-source SEE
+solution is described as "All database content, including the metadata, is
+encrypted so that to an outside observer the database appears to be white
+noise." Other solutions are possible involving metadata that is not encrypted
+(but definitely checksummed), but in any case, there is no on-disk standard for
+SQLite databases with encryption.
+
+# Design for Corruption Detection
+
+All LumoSQL backends can have corruption detection enabled, with the metadata
+stored either directly in the backend database files, or in a separate file.
+When a user switches on checksums for a database, metadata needs to be stored.
+
+This depends on two new functions, provided by backend-magic.c and needed in
+any case for labelling LumoSQL databases: lumosql_set_magic() and
+lumosql_get_magic(). These functions write and read a unique metadata
+signature in a LumoSQL database.
+
+1. If possible, magic is inserted into the existing header.
+
+2. If not, a separate "metadata" b-tree is created which contains a key "magic"
+and the appropriate value. get_magic() will look for the special metadata
+b-tree and the "magic" key.
+
+After LumoSQL has determined how and where metadata will be stored, the high-level design for row-level checksums is:
+
+1. An internally maintained row hash is updated with every change to a row.
+2. If a corruption is detected on read, LumoSQL should make maximum relevant fuss. At minimum, it should return [error code 11, SQLITE_CORRUPT](https://www.sqlite.org/rescode.html#corrupt).
+3. An additional SQL user command is added that exposes this hash in a column so that user-level logic can do not only corruption detection, but also change detection.
+
+At a later stage a column checksum can be added giving change detection on a table, or corruption detection for read-only tables.
+
+In the case where there is a separate metadata file, a function pair in lumo-backend-magic.c reads and writes a whole-of-file checksum for the database. This can't be done where metadata is stored in the main database file.
+
+
@@ -0,0 +1,290 @@
+<!-- SPDX-License-Identifier: CC-BY-SA-4.0 -->
+<!-- SPDX-FileCopyrightText: 2020 The LumoSQL Authors -->
+<!-- SPDX-ArtifactOfProjectName: LumoSQL -->
+<!-- SPDX-FileType: Documentation -->
+<!-- SPDX-FileComment: Original by Dan Shearer, 2020 -->
+
+
+Table of Contents
+=================
+
+   * [LumoSQL Documentation Standards](#lumosql-documentation-standards)
+   * [Contributions to LumoSQL Documentation are Welcome](#contributions-to-lumosql-documentation-are-welcome)
+   * [LumoSQL Respects Documentation for SQLite, LMDB and More](#lumosql-respects-documentation-for-sqlite-lmdb-and-more)
+   * [Text Standards and Tools](#text-standards-and-tools)
+   * [Diagram Standards and Tools](#diagram-standards-and-tools)
+      * [LumoSQL Diagram Signature](#lumosql-diagram-signature)
+      * [Using the LumoSQL Diagram Library](#using-the-lumosql-diagram-library)
+      * [Adding Diagrams](#adding-diagrams)
+      * [Diagram Style Guide](#diagram-style-guide)
+   * [Image Standards and Tools](#image-standards-and-tools)
+   * [Previewing Markdown before Pushing](#previewing-markdown-before-pushing)
+   * [Copyright for LumoSQL Documentation](#copyright-for-lumosql-documentation)
+   * [Metadata Header for Text Files](#metadata-header-for-text-files)
+   * [Human Languages - 人类语言](#human-languages---人类语言)
+   * [Creating and Maintaining Table of Contents](#creating-and-maintaining-table-of-contents)
+   * [Tidying Markdown (mostly not required)](#tidying-markdown-mostly-not-required)
+
+
+LumoSQL Documentation Standards
+===============================
+
+This chapter covers how LumoSQL documentation should be written and maintained. 
+
+![](./images/lumo-doc-standards-intro.jpg "Image from Wikimedia Commons, https://commons.wikimedia.org/wiki/File:Chinese_books_at_a_library.jpg")
+
+# Contributions to LumoSQL Documentation are Welcome
+
+The first rule of LumoSQL documentation is "Yes please, we'd be delighted to
+receive patches and pull requests, in any way you want to make them". Anyone
+who has gone to the trouble to write down something useful about LumoSQL is our
+friend. We know there's a lot to fix.
+
+If you want to make a quick documentation fix, then edit the Markdown and send
+it to us by any means you like, especially a Github Issue or Pull Request. You
+might just want to send us some improved paragraphs on their own. If this
+sounds like you, stop reading now and get on with sending us text :-)
+
+If you want to do something more serious with the documentation then you need
+to read on, learning about our standards, recommended tools and processes. 
+
+* The main website text, under the directory `doc/`.
+    + Text, such as this document you are reading, stored in the directory `doc/www`
+    + Images, such as PNG or JPEG format, stored in `doc/www/images`
+    + Images that are captured from videos and in the docs as thumbnails, also in `doc/www/images`
+
+The Markdown files are standalone and complete - you can read them online just as they are.
+
+The file `doc/www/Makefile` is an evolving tool to test these Markdown files, and soon will also
+generate images and probably the tables of contents.
+
+# LumoSQL Respects Documentation for SQLite, LMDB and More
+
+LumoSQL Documentation is standalone in every way, including formats, tools and standards.
+
+However, LumoSQL documentation refers to and should be consulted together with the [SQLite
+documentation](https://www.sqlite.org/docs.html), because with the following
+exceptions, LumoSQL works (or should work) in exactly the same way as SQLite.
+LumoSQL definitely does not want to duplicate SQLite documentation, and regards the
+excellent SQLite documentation as definitive except where indicated. 
+
+Differences with SQLite arise:
+
+* Where there is an extra/different storage backend to the SQLite Btree storage system
+* Where there are extra parameters in the user interface (commandline, API, pragmas) for another backend
+* When describing how the LumoSQL source tree works
+* When LumoSQL is working as other than an embedded library
+* When LumoSQL has an extra/different frontend to the SQLite SQL processor
+
+It isn't only SQLite documentation that LumoSQL embraces. There is also [LMDB
+Documentation](http://www.lmdb.tech/doc/), and more to come as LumoSQL integrates more
+components. It is very important that LumoSQL not attempt to replicate these
+other documentation efforts that are kept up to date along with the corresponding code.
+
+# Text Standards and Tools
+
+LumoSQL documentation will be written in [Github-flavoured
+Markdown](https://github.github.com/gfm/) as supported by many tools including
+the well-known [Pandoc](https://pandoc.org), version 2.0 or higher. LumoSQL documentation will not be
+highly specific to any system. The main extensions that Github-flavoured Markdown
+(GFM) adds are tables and code blocks, and a single switch in Pandoc can change
+that dependency.
+
+Text encoding will be [UTF-8](https://en.wikipedia.org/wiki/UTF-8). Here is
+one [expert anecdote about why UTF-8 matters](https://yihui.org/en/2018/11/biggest-regret-knitr/).
+
+Versions of Pandoc earlier than 2.0 did not support Markdown well as an output format, and the 
+Lua extension system was insufficient for LumoSQL's HTML generation needs.
+
+One difference between Pandoc Markdown and GFM is the number of spaces for nested lists. Two
+spaces are sufficient for GFM, but Pandoc requires four spaces.
+
+# Diagram Standards and Tools
+
+## LumoSQL Diagram Signature
+
+The LumoSQL Diagram Signature is identical to the LumoSQL image signature. It should be 
+placed on the bottom right hand corner of all diagrams created for LumoSQL, but not on
+diagrams from other sources unless modified for LumoSQL.
+
+## Using the LumoSQL Diagram Library
+
+The file images/lumo-diagram-library.odg is a LibreOffice Draw document containing all 
+the elements likely needed for LumoSQL technical diagrams. If you find that you need to
+add a new element when making a diagram, you should also add it to this document.
+
+The lumo-signature file is to be added to the base of all LumoSQL diagrams and images.
+It contains the logo and copyright string.
+
+All other diagrams in images/ are PNG format final diagrams and SVG format process
+diagrams kept for ease of editing, as exported by LibreOffice, Inkscape and others.
+
+## Adding Diagrams
+
+The current process for making diagrams is as follows.
+
+1. Make in LibreOffice Draw.
+    1. Reset corners of box elements to their proper radii (LibreOffice modifies this when scaling boxes).
+    2. Export as SVG.
+2. Convert to PNG and add signature.
+    1. Trim borders and output: `$ convert -density 200 -trim MyLibreOfficeOutput.svg MyNewDiagram.png`
+    2. Re-border with space for the logo (adjust border as required if the signature doesn't fit): `$ convert MyNewDiagram.png -bordercolor white -border 40x40 -gravity south -splice 0x80 MyNewDiagram.png`
+    3. Add logo and copyright information: `$ composite -density 200 -gravity SouthEast lumo-signature.svg MyNewDiagram.png MyNewDiagram.png`
+
+## Diagram Style Guide
+
+* Colour palette: LibreOffice 'standard'.
+* Fonts: *Source (Han) Sans Medium* or *Noto Sans Medium* due to their on-screen clarity and good language support (both are 100% compatible).
+* Corner radii: OS and large container boxes: 0.4, small box elements: 0.25.
+
+# Image Standards and Tools
+
+Images for LumoSQL documentation will be stored in /images/ and the
+filenames should start with `lumo-` . PNG should be the default image format, 
+followed by JPG. 
+
+Include attribution in the alt-text tag. All images should have attribution,
+even if the LumoSQL project provided them.  The caption should be left out if
+the image is self-evident and the alt-text also explains what the image is.
+This example is approximately from the top of this chapter:
+
+```
+![Optional caption, eg "Chart of Badgers vs Profit"](./images/lumo-doc-standards-intro.jpg "Image from Wikimedia Commons, https://commons.wikimedia.org/wiki/File:Chinese_books_at_a_library.jpg")
+```
+
+# Previewing Markdown before Pushing
+
+It's best to check syntax before pushing changes, which means rendering
+Markdown into HTML that is hopefully close to what Github produces. Here are three ways of doing that:
+
+* The Makefile and support files in bin/ use Pandoc to render the GFM to HTML in /tmp.
+* The excellent [Editor.md](https://github.com/pandao/editor.md) does a great job of rendering,
+as can be seen at [The Online Installation](https://pandao.github.io/editor.md/en.html). You can paste GFM into it and see it rendered, WYSIWYG-style. You can download the HTML for
+Editor.md and run it locally. (Editor.md is also an editor, and it adds its own features, but you don't need to use it for that.)
+* If it suits your workflow, you can use the Preview button in the Github user interface.
+
+# Copyright for LumoSQL Documentation
+
+LumoSQL documentation is original, and is licensed under
+[Creative Commons By-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/legalcode),
+except where indicated. Mostly it's better to link to the original, but if you
+need to quote a few paragraphs of someone else's documentation then attribute them, and if
+you need more than that, check the license on the original.
+
+The Creative Commons copyright applies to all LumoSQL documentation media.
+
+Some documentation or media brings conditions of use with it, especially
+attribution, and this must be respected.
+
+# Metadata Header for Text Files
+
+The first lines of all LumoSQL documentation files should always be something like this:
+
+```
+   <!-- SPDX-License-Identifier: CC-BY-SA-4.0 -->
+   <!-- SPDX-FileCopyrightText: 2020 The LumoSQL Authors -->
+   <!-- SPDX-ArtifactOfProjectName: LumoSQL -->
+   <!-- SPDX-FileType: Documentation -->
+   <!-- SPDX-FileComment: Original by Dan Shearer, 2020 -->
+```
+
+# Human Languages - 人类语言
+
+English is currently the main documentation language. Others are welcome, and
+not just as translations. For example, embedded SQL is particularly important
+in China and we welcome original content. To make it feel welcoming, we have tried
+to make all the illustrative images in LumoSQL inclusive of the Chinese language.
+
+# Creating and Maintaining Table of Contents
+
+LumoSQL had to make a decision about creating navigable ToC indexes. We would rather not 
+write our own tools or scripts. At the moment the following is what we have.
+
+The problem we have is summarised in a [well-known Github bug report](https://github.com/isaacs/github/issues/215):
+
+> When I see a manually generated table of contents, it makes me sad.
+> When I see a huge README that is impossible to navigate without it, it makes me even sadder.
+> LaTeX has it. Gollum has it. Pandoc has it. So why not Github Format Markdown?
+
+**LumoSQL Decision as of March 2020**: ToC Markdown must appear in the raw markdown. That means a TOC
+needs to be created and then inserted into the original source markdown file
+rather than automatically generated as part of an online rendering process or offline pipeline.
+
+**Non-markdown metadata won't work:** With Pandoc, when writing, say, a report
+in Markdown, a tiny bit of metadata at the top of the file allows us to say
+`\tableofcontents`  and `/usr/bin/pandoc` will then produce a beautiful PDF,
+and also other formats such as HTML.  However, LumoSQL documentation needs to
+be processed by renderers that are a lot less sophisticated than Pandoc,
+including the Github markup processor. So we can't rely on metadata.
+
+**Pandoc's Markdown output is improving but not yet good enough:** Pandoc can 
+read Markdown and output Markdown, including a ToC.  A command such as 
+
+```pandoc --standalone -f gfm -t gfm --toc -o lumo-output.md -i lumo-input.md```
+
+is supposed to work and probably does; we just haven't seen it yet. Pandoc's Markdown
+output used to be poor, but since version 2.0 it has improved a lot. Pandoc --toc is
+hopefully the eventual answer, although as of 2.9 it doesn't seem to work at all, despite
+the documentation claiming it does.
+
+**We are left with ad-hoc processing solutions for now:**
+
+* Use the Github API: The most practical solution we have for now is the
+[github-markdown-toc](https://github.com/ekalinin/github-markdown-toc) bash
+script:
+
+```
+    $ wget https://raw.githubusercontent.com/ekalinin/github-markdown-toc/master/gh-md-toc
+    $ chmod a+x gh-md-toc
+    $ ./gh-md-toc some-lumosql-document.md > /tmp/toc.md
+```
+
+Then insert the file /tmp/toc.md into the document using your editor. It's not
+a pretty operation but given all the other advantages of Markdown it seems a
+small price to pay. This script can now be found in `www/bin/gh-md-toc`.
+It uses the Github API and therefore produces canonical results, so that means
+it needs internet access. After more testing, perhaps we can trust the
+`--insert` option and then include gh-md-toc in the documentation Makefile.
+
+The way the API works is made clear in the script's comments:
+
+	# Converts local md file into html by GitHub
+	# $ curl -X POST --data '{"text": "Hello world github/linguist#1 **cool**, and #1!"}' https://api.github.com/markdown
+	# <p>Hello world github/linguist#1 <strong>cool</strong>, and #1!</p>'"
+
+gh-md-toc will insert a TOC between these markers:
+
+```
+    <!--ts-->
+    <!--te-->
+```
+
+meaning TOC could be handled in the Makefile, but that requires further thought.
+
+* There are also options for doing Markdown TOC in editors such as vim, for example [vim-markdown-toc](https://github.com/mzlogin/vim-markdown-toc)
+
+* Editor.md, referred to in the "Previewing Markdown Before Pushing" section
+above, will generate a table of contents where it sees the token `[TOC]` and a
+dropdown index TOC menu where it sees `[TOCM]`. However since the output is HTML
+not markdown it is not so useful to LumoSQL (but it is very beautiful.)
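
To make the marker mechanism concrete, here is a rough offline sketch of what filling the `<!--ts-->` / `<!--te-->` markers involves. It is not gh-md-toc and not a proposal to replace it: the function names are hypothetical, `github_anchor` only approximates the anchors Github generates (it ignores duplicate headings, for instance), and only `#`-style headings are recognised.

```python
import re

TOC_START = "<!--ts-->"
TOC_END = "<!--te-->"


def github_anchor(heading):
    """Approximate the anchor Github derives from a heading."""
    slug = heading.strip().lower()
    # Drop everything except word characters, spaces and hyphens.
    slug = re.sub(r"[^\w\- ]", "", slug)
    return "#" + slug.replace(" ", "-")


def build_toc(markdown_text):
    """Build a nested bullet list from ATX ('#') headings."""
    entries = []
    in_fence = False
    for line in markdown_text.splitlines():
        stripped = line.lstrip()
        # A line that opens or closes a fence has exactly one ``` on it.
        if stripped.startswith("```") and stripped.count("```") == 1:
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # '#' lines inside code fences are not headings
        match = re.match(r"(#{1,6})\s+(.*?)\s*$", line)
        if match:
            indent = "   " * (len(match.group(1)) - 1)
            title = match.group(2)
            entries.append(f"{indent}* [{title}]({github_anchor(title)})")
    return "\n".join(entries)


def insert_toc(markdown_text):
    """Replace whatever sits between the two markers with a fresh ToC."""
    start = markdown_text.index(TOC_START) + len(TOC_START)
    end = markdown_text.index(TOC_END)
    toc = build_toc(markdown_text)
    return markdown_text[:start] + "\n" + toc + "\n" + markdown_text[end:]
```

Because everything between the markers is replaced on each run, calling `insert_toc` repeatedly on the same file leaves it unchanged, which is the property a Makefile target would need.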
+
+# Tidying Markdown (mostly not required)
+
+Tidying is about automatically adjusting the whitespace, pagebreaks and general formatting
+to be neat and consistent. But maybe you don't even need to: just write tidy
+text in the first place.
+
+If you want to clean up someone else's Markdown, then stop and ask first.
+Automated cleanups and prettifiers change hundreds of lines in a file without any
+effect on the output, and that makes a diff impossible to review, effectively
+rebasing it and destroying the history.
+
+The documentation Makefile is not going to include any Markdown tidying because
+of the potential for making things worse. As of version 2.0 Pandoc works better
+for cleaning up markdown but isn't perfect. Parameters to experiment with include:
+
+```
+  -t gfm            (triggers a few defaults, including headers in ATX style)
+  --wrap=preserve   (mostly limits changes to making headings ATX style)
+  --columns=85      (stops most links breaking in editors doing syntax highlighting)
+```
+  
@@ -0,0 +1,27 @@
+<!-- SPDX-License-Identifier: CC-BY-SA-4.0 -->
+<!-- SPDX-FileCopyrightText: 2020 The LumoSQL Authors -->
+<!-- SPDX-ArtifactOfProjectName: LumoSQL -->
+<!-- SPDX-FileType: Documentation -->
+<!-- SPDX-FileComment: Original by Dan Shearer, 2020 -->
+
+
+ ![](./images/lumo-alien-language-intro.jpg "Confused Red Panda")
+
+All You Need to Know
+====================
+
+At the LumoSQL project we are improving something called SQLite that most people
+depend on, but are not aware of. 
+
+Choose Your Own Adventure
+=========================
+
+If you want to be happier, you can see [More Pictures of Cute Red Pandas]()
+
+> Or, you can choose to [Learn a bit about SQLite]()
+
+If you want to 
+
+
+
+
@@ -0,0 +1,145 @@
+<!-- SPDX-License-Identifier: CC-BY-SA-4.0 -->
+<!-- SPDX-FileCopyrightText: 2020 The LumoSQL Authors -->
+<!-- SPDX-ArtifactOfProjectName: LumoSQL -->
+<!-- SPDX-FileType: Documentation -->
+<!-- SPDX-FileComment: Original by Dan Shearer, 2020 -->
+
+
+Table of Contents
+=================
+
+   * [LumoSQL Implementation](#lumosql-implementation)
+   * [Table of Contents](#table-of-contents)
+   * [Changes to SQLite](#changes-to-sqlite)
+      * [Lockfile/tempfile Pushed to Backend](#lockfiletempfile-pushed-to-backend)
+      * [SQLite API Interception Points](#sqlite-api-interception-points)
+      * [SQLite Virtual Machine Layer](#sqlite-virtual-machine-layer)
+
+
+LumoSQL Implementation
+======================
+
+![](./images/lumo-implementation-intro.jpg "Metro Station Construction Futian Shenzhen China, CC license, https://www.flickr.com/photos/dcmaster/36740345496")
+
+
+# Changes to SQLite
+
+## Lockfile/tempfile Pushed to Backend
+
+SQLite API Interception Points
+------------------------------
+
+The process LumoSQL has largely completed as of March 2020 is:
+
+1. Identify the correct API choke points to control, then
+2. Find useful chunks of code we want to switch between at these choke
+points to demonstrate the design.
+
+   The API interception points are:
+
+  1. Setup APIs/commandline/Pragmas, where we pass in info about what
+front/backends we want to use or initialise. Note that SQLite is
+zero-config, so supplying no information to LumoSQL must always be an option.
+Nevertheless, if a user wants to select a particular backend, or have
+encryption or networking etc there will be some setup. Sqlite.org provides a
+large number of controls in pragmas and the commandline already.
+
+  2. SQL processing front ends. Code exists (see [Relevant Codebases](./lumo-relevant-codebases.md))
+that implements MySQL-like behaviour in parallel with supporting SQLite semantics.
+There is a choice of codebases to do that with, covering different approaches to the problem.
+
+  3. Transaction interception and handling, which in the case of the LMDB
+backend will be pass-through but in other backends may be for replicated
+storage, or backup. This interception point would be in ```wal.c``` if all
+backends used a write-ahead log and used it in a similar way, but they do not.
+Instead this is where the new ```backend.c``` API interception point will be
+used - see further down in this document.  This is where, for example, we can
+choose to add replication features to the standard SQLite btree storage
+backend.
+
+  4. Storage backends, being a choice of native SQLite btree or LMDB today, and
+swiftly after that other K-V stores. This is the choke point where we expect to
+introduce [libkv](./lumo-relevant-codebases#libkv), or a modification of libkv.
+
+  5. Network layers, which will be at all of the above, depending whether they
+are for client access to the parser, or replicating transactions, or being
+plain remote storage etc.
+
+In most if not all cases it needs to be possible to have multiple choices
+active at once, including the obvious cases of multiple parsers and multiple
+storage backends, for example. This is because one of the important new use
+cases for LumoSQL will be conversion between formats, dialects and protocols.
+
+Having designed the API architecture we can then produce a single LumoSQL tree
+with these choke point APIs in place and proof of two things:
+
+1. ability to have stock-standard identical SQLite APIs and on-disk
+btree format, and
+
+2. an example of an alternative chunk of code at each choke point:
+MySQL; T-pipe writing out the transaction log in a text file; LMDB .
+Not necessarily with the full flexibility of having all code active at
+once if that's too hard (ie able to take any input SQL and store in
+any backend)
+
+   and then, having demonstrated we have a major step forward for the entire world,
+
+3. Identify what chunks of SQLite we really don't want to support any more.
+   Like maybe the ramdisk pragma given that we can/should/might have an
+in-memory storage backend, which initially might just be LMDB with overcommit
+switched off. This is where testing and benchmarking really matters.
+
+SQLite Virtual Machine Layer
+----------------------------
+
+In order to support multiple backends, LumoSQL needs to have a more general way
+of matching capabilities to what is available, whether a superset or a subset of
+what SQLite currently does. This needs to be done in such a way that it remains
+easy to track upstream SQLite.
+
+The SQLite architecture has the SQL virtual machine in the middle of everything:
+
+* `vdbeapi.c` has all the functions called by the parser
+* `vdbe.c` is the implementation of the virtual machine, and it is
+from here that calls are made into `btree.c`
+
+All changes to SQLite storage code will be in `vdbe.c`, to insert an
+API shim layer for arbitrary backends. All BtreeXX function calls will
+be replaced with backendXX calls.
+
+`lumo-backend.c` will contain:
+
+* a switch between different backends
+* a virtual method table of function calls that can be stacked, for
+layering some generic functionality on any backends that need it as
+follows
+
+`lumo-index-handler.c` is for backends that need help with index
+and/or key handling. For example some cannot have arbitrary length
+keys, like LMDB. RocksDB and others do not suffer from this.
+
+`lumo-transaction-handler.c` is for backends that do not have full
+transaction support. RocksDB for example is not MVCC, and this will
+add that layer. Similarly this is where we can implement functionality
+to upgrade RO transactions to RW with a commit counter.
+
+`lumo-crypto.c` provides encryption services transparently to backends
+depending on a decision made in `lumo-backend.c`, which will cover
+everything except backend-specific metadata. Full disk encryption of
+everything has to happen at a much lower layer, like SQLite's idea of
+a VFS. The VFS concept will not translate entirely, because the very first
+alternative backend is based on mmap, which will need special handling.
+So we are for now expecting to implement a `lumo-vfs-mmap.c` and a `lumo-vfs.c`.
+
+`lumo-vfs.c` provides VFS services to backends, and is invoked by
+backends. `lumo-vfs.c` may call lumo-crypto for full file encryption
+including backend metadata depending on the VFS being implemented.
+
+Backend implementations will be in files such as `backend-lmdb.c`,
+`backend-btree.c`, `backend-rocksdb.c` etc.
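
The combination of a backend switch and stackable layers can be pictured with a toy model. The sketch below uses Python purely for brevity (the files named above are C), and every class, function and limit in it is hypothetical: a dict stands in for each key-value store, and the 16-byte key limit is an arbitrary illustrative value rather than the limit of any real backend.

```python
import hashlib


class DictBackend:
    """Stand-in for a storage backend; a dict plays the K-V store."""

    max_key_len = None  # None means keys of any length are accepted

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        if self.max_key_len is not None and len(key) > self.max_key_len:
            raise ValueError("key too long for this backend")
        self._store[key] = value

    def get(self, key):
        return self._store[key]


class ShortKeyBackend(DictBackend):
    """Backend with a key-length limit (the value 16 is illustrative)."""

    max_key_len = 16


class IndexHandler:
    """Layer stacked on a backend that cannot store long keys.

    Long keys are replaced by a fixed-length digest. A real handler would
    also have to keep the original key and preserve key ordering; this
    toy version does neither.
    """

    def __init__(self, backend):
        self._backend = backend

    def _shorten(self, key):
        if len(key) <= self._backend.max_key_len:
            return key
        return hashlib.sha256(key).digest()[: self._backend.max_key_len]

    def put(self, key, value):
        self._backend.put(self._shorten(key), value)

    def get(self, key):
        return self._backend.get(self._shorten(key))


# Hypothetical registry; the names only echo the backends mentioned above.
BACKENDS = {"btree": DictBackend, "lmdb": ShortKeyBackend}


def open_backend(name):
    """The 'switch': pick a backend, then stack only the layers it needs."""
    backend = BACKENDS[name]()
    if backend.max_key_len is not None:
        backend = IndexHandler(backend)
    return backend
```

The caller sees the same `put`/`get` interface whichever backend is selected, which is the property the shim layer in `vdbe.c` is meant to rely on.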
+
+This new architecture means:
+
+1. Features such as WALs or paging or network paging etc are specific to the backend, and invisible to any other LumoSQL or SQLite code.
+2. Bug-for-bug compatibility with the original SQLite btree.c can be maintained (except in the case of encryption, which no open source users have access to anyway.)
+3. New backends with novel features (and LMDB is novel enough, for a first example!) can be introduced without disturbing other code, and being able to be benchmarked and tested safely.
+
+
+
+
@@ -0,0 +1,177 @@
+<!-- SPDX-License-Identifier: CC-BY-SA-4.0 -->
+<!-- SPDX-FileCopyrightText: 2020 The LumoSQL Authors -->
+<!-- SPDX-ArtifactOfProjectName: LumoSQL -->
+<!-- SPDX-FileType: Documentation -->
+<!-- SPDX-FileComment: Original by Dan Shearer, 2020 -->
+
+![](./images/lumo-legal-aspects-intro.png "XXXXXXXX")
+
+Table of Contents
+=================
+
+   * [Table of Contents](#table-of-contents)
+   * [LumoSQL Licensing](#lumosql-licensing)
+   * [Why MIT? Why Not MIT?](#why-mit-why-not-mit)
+   * [In Detail: Patents, MIT and Apache 2.0](#in-detail-patents-mit-and-apache-20)
+   * [In Detail: the SQLite Public Domain Licensing Problem](#in-detail-the-sqlite-public-domain-licensing-problem)
+   * [History and Rationale](#history-and-rationale)
+   * [Encryption Legal Issues](#encryption-legal-issues)
+   * [LumoSQL Requirements and Decisions](#lumosql-requirements-and-decisions)
+
+# LumoSQL Licensing
+
+SQLite is released as [Public Domain](https://www.sqlite.org/copyright.html).
+In order to both respect and improve on this, the [LumoSQL Project Aims](lumo-project-aims.md) make this promise to SQLite users:
+
+> LumoSQL will not come with legal terms less favourable than SQLite. LumoSQL
+> will try to improve the legal standing and safety worldwide as compared to
+> SQLite.
+
+To achieve this LumoSQL has made these policy decisions:
+
+* New LumoSQL code is licensed under the [MIT License](https://opensource.org/licenses/MIT), as used by many large corporations worldwide
+* LumoSQL documentation is licensed under the [Creative Commons](https://creativecommons.org/licenses/by-sa/4.0/)
+* Existing and future SQLite code is relicensed by the act of being distributed under the terms of the MIT license
+* Open Source code from elsewhere, such as backend data stores, remains under the terms of the original license except where distribution under MIT effectively relicenses it
+* Open Content documentation from elsewhere remains under the terms of the original license. No documentation is used in LumoSQL unless it can be freely mixed with any other documentation. 
+
+The effects of these policy decisions are:
+
+* LumoSQL users gain certainty as compared with SQLite users because they have a
+license that is recognised in jurisdictions worldwide. 
+
+* LumoSQL users do not lose any rights. For example, the MIT license permits use
+with fully proprietary software, by anyone. Whatever users do today with
+SQLite they can continue to do with LumoSQL. 
+
+* While MIT does require users to include a copy of the license and the
+copyright notice, the MIT license also permits the user to remove the
+sentence requiring this from the license (thus re-licensing LumoSQL.) 
+
+# Why MIT? Why Not MIT?
+
+Github's [License Chooser for MIT](https://choosealicense.com/licenses/mit/) describes the MIT license as:
+
+> A short and simple permissive license with conditions only requiring
+> preservation of copyright and license notices. Licensed works, modifications,
+> and larger works may be distributed under different terms and without source
+> code. 
+
+The MIT license aims to get out of the way of software developers, and despite
+some flaws it appears to do so reliably.
+
+In addition, MIT is popular. As documented [on Wikipedia](https://en.wikipedia.org/wiki/MIT_License) MIT appears to be the most-used open source license. Popularity matters, because all licenses are in part a matter of community belief and momentum.  Microsoft released
+[.NET Core](https://en.wikipedia.org/wiki/.NET_Core) and Facebook released
+[React](https://en.wikipedia.org/wiki/React_(web_framework)) under the MIT license, and
+these companies are very cautious about the validity of the licenses they use.
+
+In a forensic article analysing [the 171 words of the MIT license](https://writing.kemitchell.com/2016/09/21/MIT-License-Line-by-Line.html) as they apply in the US, lawyer Kyle E. Mitchell writes in his conclusion:
+
+> The MIT License is a legal classic. The MIT License works. It is by no means
+> a panacea for all software IP ills, in particular the software patent
+> scourge, which it predates by decades. But MIT-style licenses have served
+> admirably... We’ve seen that despite some crusty verbiage and lawyerly
+> affectation, one hundred and seventy one little words can get a hell of a lot
+> of legal work done, clearing a path for open-source software through a dense
+> underbrush of intellectual property and contract.
+
+Overall, in LumoSQL we have concluded that the MIT license is solid and it is
+better than any other mainstream license for existing SQLite users. It is
+certainly better than the SQLite Public Domain terms.
+
+# In Detail: Patents, MIT and Apache 2.0
+
+LumoSQL has a narrower range of possible licenses because of its nature as an
+embedded library, where it is tightly combined with users' code. This means
+that the terms and conditions for using LumoSQL have to be as open as possible
+to accommodate all the different legal statuses of software that users combine
+with LumoSQL. And the status that worries corporate lawyers the most is
+"unknown". What if you aren't completely sure of the patent status of the
+software, or the intentions of your company? And where there is uncertainty,
+users are wise not to commit.
+
+LumoSQL has tried hard to bring more certainty, not less, and this is tricky when it comes to patents.
+
+Software patents are an issue in many jurisdictions. The MIT license includes a
+grant of patents to its users, as [explained in an Opensource.com article](https://opensource.com/article/18/3/patent-grant-mit-license),
+including in the grant "... to deal in the software without restriction." While the
+Apache 2.0 license specifically grants patent rights (as do the GPL and MPL), they are not more generous than the MIT license. There is some debate that varies by jurisdiction about exactly how clear the patent grant is, as documented in [the patent section on Wikipedia](https://en.wikipedia.org/wiki/MIT_License#Relation_to_patents).
+
+The difficulty is that the Apache 2.0 (similar to the GPL and MPL) license also
+includes a *patent retaliation* clause:
+
+> If You institute patent litigation against any entity (including a
+> cross-claim or counterclaim in a lawsuit) alleging that the Work or a
+> Contribution incorporated within the Work constitutes direct or contributory
+> patent infringement, then any patent licenses granted to You under this
+> License for that Work shall terminate as of the date such litigation is
+> filed.  
+
+The intention is progressive and seemingly a Good Thing - after all, unless you
+are a patent troll, who wants more pointless patent litigation? However the
+effect is that the Apache 2.0 license brings with it the requirement to check
+for patent issues in any code it is connected to. It also is possible that the
+company using LumoSQL actually does want the liberty to take software patent
+action in court. So whether by the risk or the constraint, Apache 2.0 brings with it
+significant change compared to SQLite's license terms in countries that recognise them. 
+
+MIT has only a patent grant, not retaliation. That is why LumoSQL does not use the Apache 2.0 license.
+
+
+# In Detail: the SQLite Public Domain Licensing Problem
+
+There are numerous reasons other than licensing why SQLite is less open source
+than it appears, and these are covered in the [LumoSQL Landscape](./lumo-landscape.md). As to licensing, SQLite is distributed as
+Public Domain software, and this is mentioned by D Richard Hipp in his [2016 Changelog Podcast Interview](https://changelog.com/podcast/201). Although he is aware of the problems, Hipp has decided not to introduce changes.
+
+The [Open Source Initiative](https://opensource.org/node/878) explains the Public Domain problem like this:
+
+> “Public Domain” means software (or indeed anything else that could be
+> copyrighted) that is not restricted by copyright. It may be this way because
+> the copyright has expired, or because the person entitled to control the
+> copyright has disclaimed that right. Disclaiming copyright is only possible
+> in some countries, and copyright expiration happens at different times in
+> different jurisdictions (and usually after such a long time as to be
+> irrelevant for software). As a consequence, it’s impossible to make a
+> globally applicable statement that a certain piece of software is in the
+> public domain.
+
+Germany and Australia are examples of countries in which Public Domain is not
+normally recognised, which means that legal certainty is not possible for users
+in these countries who need it or want it. This is why the Open Source
+Initiative does not recommend it, and why it does not appear on the [SPDX License List](https://spdx.org/licenses/).
+
+The SPDX License List is a tool used by many organisations to understand where they stand legally with the millions of lines of code they are using. David A Wheeler has produced a helpful [SPDX Tutorial](https://github.com/david-a-wheeler/spdx-tutorial). All code and documentation developed by the LumoSQL project has an SPDX identifier.
+
+# History and Rationale
+
+SQLite Version 1 used the gdbm key-value store. This was under the GPL and
+therefore so was SQLite. gdbm is limited, and is not a binary tree. When
+Richard Hipp replaced it for SQLite version 2, he also dropped the GPL. SQLite
+has been released as "Public Domain"
+
+
+# Encryption Legal Issues
+
+SQLite is not available with encryption. There are two common ways of adding encryption to SQLite, both of which have legal implications: 
+
+1. Purchasing the [SQLite Encryption Extension](https://www.hwaci.com/sw/sqlite/see.html) (SEE) from Richard Hipp's company Hwaci. The SEE is proprietary software, and cannot be used with open source applications.
+2. [SQLcipher](https://www.zetetic.net/sqlcipher/), which has an open core model. The BSD-licensed open source version requires users to publish copyright notices, and the more capable commercial editions are available on similar terms to SEE, and therefore cannot be used with open source applications.
+
+There are many other ways of adding encryption to SQLite, some of which are listed in the [Knowledgebase Relevant to LumoSQL](./lumo-relevant-knowledgebase.md).
+
+The legal issues addressed in LumoSQL encryption include:
+
+* Usability. Encryption should be available with LumoSQL in the core source code without any additional legal considerations.
+* Unencumbered. No encryption code is used that may reasonably be subject to action by companies (eg copyright claims) or governments (eg export regulations). Crypto code will be reused from known-safe sources.
+* Compliant with minimum requirements in various jurisdictions. With encryption being legally mandated or strongly recommended in many jurisdictions for particular use cases (banking, handling personal data, government data, etc) there are also minimum requirements. LumoSQL will not ship crypto code that fails minimum crypto requirements.
+* Conspicuously *non-compliant* with maximum requirements in any jurisdiction. LumoSQL will not limit its encryption mechanisms or strength to comply with any legal restrictions, in common with other critical open source infrastructure. LumoSQL crypto tries to be as hard to break as possible regardless of the use case or jurisdiction.
+
+
+Local laws
+EU laws
+Facts of Privacy and security
+
+# LumoSQL Requirements and Decisions
+
+
@@ -0,0 +1,301 @@
+<!-- SPDX-License-Identifier: CC-BY-SA-4.0 -->
+<!-- SPDX-FileCopyrightText: 2020 The LumoSQL Authors -->
+<!-- SPDX-ArtifactOfProjectName: LumoSQL -->
+<!-- SPDX-FileType: Documentation -->
+<!-- SPDX-FileComment: Original by Claudio Calvelli, March 2020 -->
+
+
+Table of Contents
+=================
+
+   * [Not-Forking Upstream Source Code Tracker](#not-forking-upstream-source-code-tracker)
+   * [Table of contents](#table-of-contents)
+   * [Upstream definition file <a name="user-content-upstream"></a>](#upstream-definition-file-)
+      * [git](#git)
+      * [download](#download)
+   * [Modification definition file <a name="user-content-modification"></a>](#modification-definition-file-)
+   * [Example Configuration directory <a name="user-content-example"></a>](#example-configuration-directory-)
+   * [Not-forking tool <a name="user-content-tool"></a>](#not-forking-tool-)
+
+Not-Forking Upstream Source Code Tracker
+========================================
+
+The LumoSQL project incorporates software from other projects and some of that
+software needs some modifications.  Rather than fork our own version, we have
+developed a mechanism which we call "not-forking" to semi-automatically track
+upstream changes.
+
+The mechanism is similar to applying patches; however patches need to be
+constantly updated as upstream sources change, and the not-forking mechanism
+helps with that. The overall effect is something like git cherry-picking, 
+except that it also copes with:
+* human-style software versioning
+* code that is not maintained in the same git repo
+* code that is not maintained in git, but is just patches or in some other VCS
+* custom processing that is needed to be run for a specific patch
+* failing with an error asking for human intervention to solve differences with upstream
+
+etc.
+
+Each project tracked by not-forking needs to define what to track, and what
+changes to apply. This is done by providing a number of files in a directory;
+the minimum requirement is an upstream definition file; other files can also be
+present indicating what modifications to apply (if none are provided, the
+upstream sources are used unchanged).
+
+# Upstream definition file <a name="upstream"></a>
+
+The file `upstream.conf` has a simple "key = value" format with one such
+key, value pair per line: blank lines and lines whose first nonblank
+character is a hash (`#`) are ignored; long lines can be split into multiple
+lines by ending a line with a backslash meaning continuation into the
+next line.
+
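As a rough illustration of the format described above, the following Python sketch reads such "key = value" text. It is not the not-forking tool's own parser: the function name `parse_key_values` is hypothetical, conditional lines are rejected rather than evaluated, and joining continuation lines before checking for comments is an assumption.

```python
def parse_key_values(text):
    """Parse 'key = value' lines into a dict (illustrative sketch only)."""
    result = {}
    pending = ""  # text carried over from lines ending in a backslash
    for raw in text.splitlines():
        line = pending + raw
        if line.endswith("\\"):
            # Continuation: drop the backslash and join with the next line.
            pending = line[:-1]
            continue
        pending = ""
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are ignored
        if stripped.split(None, 1)[0] in ("if", "else", "endif"):
            raise ValueError("conditional blocks are not handled in this sketch")
        key, sep, value = stripped.partition("=")
        if not sep:
            raise ValueError(f"not a 'key = value' line: {stripped!r}")
        # A repeated key simply overwrites the earlier value (last one wins).
        result[key.strip()] = value.strip()
    return result
```

Because later assignments overwrite earlier ones, a file that sets `vcs` twice ends up with the second value.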
+There is a special line format to indicate conditionals; currently, the
+only condition which can be tested is whether the version number is in
+a specified range, using the syntax:
+
+```
+if version [>[=] FIRST_VERSION] [<[=] LAST_VERSION]
+...
+[else ...]
+endif
+```
+
+If a key is present more than once, the last value seen wins; therefore,
+it is possible to define a key inside a conditional block, and then to
+define it again outside the block to provide a default value.
+
+The only key which must be present is `vcs`, and there is no default.
+It indicates what kind of version control system to use to obtain upstream
+sources; the value is the name of a version control module defined by the
+not-forking mechanism. At the time of writing, `git` and `download` are the
+valid values. In general, the documentation for the corresponding version
+control module defines what else is present in the `upstream.conf` file; this
+document briefly describes the configuration for the above two modules.
+
+Optionally, two other keys can be present: `compare` and `subtree`.
+
+The `compare` key indicates what method to use to compare two different
+version numbers; if omitted, it defaults to `version`, which compares
+"normal" software version numbers: sequences of digits compare
+numerically, and sequences of letters compare alphabetically, with the
+exception that a suffix "-alpha" or "-beta" causes the version to sort
+before the same string without the suffix. Examples of version
+numbers in order are:
+
+- `0.9a` < `0.9z` < `0.10` < `1.0` < `1.1-alpha` < `1.1-beta` < `1.1` < `1.1a`
+
+This definition will even cope with the numbering schemes used by TeX and
+METAFONT, whose version numbers approach "Pi" and "e" respectively. The
+definition can be extended to deal with other version numbering schemes used
+by normal software; however, it will never work correctly with the version
+numbers used by some software such as the
+[CLC-INTERCAL](https://en.wikipedia.org/wiki/INTERCAL#Version_Numbers)
+compiler.
+
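The ordering rules for the default `version` comparison can be sketched as a sort key. This is only an illustration under stated assumptions, not the tool's implementation: `version_key` is a hypothetical name, and the rule that a digit run sorts before a letter run in the same position is an assumption the text does not spell out.

```python
import re

# Rank of a pre-release suffix; a version without a suffix ranks highest.
_PRERELEASE = {"-alpha": 0, "-beta": 1}


def version_key(version):
    """Return a sort key for a 'normal' version string (illustrative sketch)."""
    rank = 2
    for suffix, value in _PRERELEASE.items():
        if version.endswith(suffix):
            version = version[: -len(suffix)]
            rank = value
            break
    # Digit runs compare numerically, letter runs alphabetically. Tagging each
    # token with 0 or 1 keeps integers and strings from being compared directly.
    tokens = tuple(
        (0, int(tok)) if tok.isdigit() else (1, tok)
        for tok in re.findall(r"\d+|[A-Za-z]+", version)
    )
    return tokens, rank
```

Sorting the example versions listed above with `sorted(versions, key=version_key)` reproduces the order shown in the list.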
+The `subtree` key indicates a directory inside the sources to use instead
+of the top level.
+
+## git
+
+The upstream sources are available via a public git repository; the following
+keys are understood, of which only the first is required:
+
+- `repos` (or `repository`) is a valid argument to the `git clone` command.
+- optionally, `branch` to select a branch within the repository.
+- optionally, `version` to convert a version string to a tag: the value is
+either a single string which is prefixed to the version number, or two
+strings separated by a space, where the first is prefixed and the second appended.
+- optionally, `user` and `password` can be specified to obtain access to the
+repository (this is currently not implemented; all repositories must be
+accessible without authentication).
+
+A software version can be identified by a generic git commit ID, or by a
+version string similar to the one described for the `compare` key, if the
+repository offers that as an option.
+
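The `version` key described above behaves like a small string template. A minimal sketch, assuming a hypothetical helper name `version_to_tag` and that the one or two strings are separated by whitespace:

```python
def version_to_tag(version_setting, version):
    """Build a git tag name from the `version` key value (illustrative sketch)."""
    parts = version_setting.split()
    prefix = parts[0] if parts else ""          # first string is prefixed
    suffix = parts[1] if len(parts) > 1 else ""  # optional second string is appended
    return f"{prefix}{version}{suffix}"
```

For example, with `version = version-` in `upstream.conf`, version `3.31.1` would map to the tag `version-3.31.1`, which matches the style of tag names on the SQLite GitHub mirror.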
+## download
+
+The upstream sources are released as published versions and downloaded
+directly; the following keys need to be present:
+
+- `uri` indicates where to obtain these sources, and can contain the special
+symbol `%V` to indicate the version or `%%` to indicate a literal percent
+sign (`%`).
+
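The `%V` and `%%` symbols can be expanded in a single left-to-right pass, so that a doubled percent sign is never mistaken for the start of `%V`. A minimal sketch, assuming a hypothetical helper name `expand_uri` and a made-up example URI:

```python
import re


def expand_uri(template, version):
    """Expand %V to the version and %% to a literal percent sign (sketch)."""

    def replace(match):
        return version if match.group(0) == "%V" else "%"

    # One pass over the string: each "%V" or "%%" is replaced exactly once,
    # and the replacement text is not scanned again.
    return re.sub(r"%V|%%", replace, template)
```

With this sketch, `expand_uri("https://example.org/pkg-%V.tar.gz", "1.2")` gives `https://example.org/pkg-1.2.tar.gz`.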
+TBC - we also need to say how to unpack the sources etc
+
+# Modification definition file <a name="modification"></a>
+
+There can be zero or more modification definition files in the configuration
+directory; each file has a name ending in `.mod` and they are processed
+in lexicographic order according to the "C" locale (rather than the current
+locale, to guarantee consistent ordering). Note that only files are
+considered; if the configuration directory contains subdirectories, these
+are ignored, but files in there can be referenced by the `.mod` files.
+
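Ordering in the "C" locale amounts to comparing file names byte by byte, independent of the user's language settings. A minimal sketch of the selection and ordering rule, assuming a hypothetical helper name `list_mod_files`:

```python
import os
from pathlib import Path


def list_mod_files(config_dir):
    """Return the names of the .mod files in processing order (sketch)."""
    names = [
        entry.name
        for entry in Path(config_dir).iterdir()
        # Only regular files count; subdirectories are ignored.
        if entry.is_file() and entry.name.endswith(".mod")
    ]
    # Sorting on the encoded bytes gives "C" locale order, in which
    # digits sort before uppercase letters, and uppercase before lowercase.
    return sorted(names, key=os.fsencode)
```

One consequence of byte ordering is that a file named `B.mod` is processed before `a.mod`, so numeric prefixes such as `10-` and `20-` are a simple way to make the intended order explicit.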
+Each modification definition file starts with an initial part whose format
+is similar to the Upstream definition file described above ("key = value"
+pairs, possibly with conditional blocks); this initial part ends with a line
+containing just dashes, and the rest of the file, referred to as the "final
+part", is interpreted based on information from the initial part.
+
+The following keys are currently understood:
+
+- `version`: the value has the same format as the condition on the
+`if version` specification in the Upstream definition file: one or two
+strings separated by whitespace, one of the strings starting with `<`
+or `<=` and the other starting with `>` or `>=` to indicate a maximum,
+minimum or range of versions.  One use of this key is to indicate that
+a modification is only necessary up to a particular version, because
+for example that modification has been accepted by upstream and is
+no longer necessary.  Another use of this key is to identify versions
+in which substantial upstream changes make it difficult to specify a
+modification which works for every possible version. Specifying this
+key is essentially equivalent to putting the whole `.mod` file in
+a conditional.
+- `method`: the method used to specify the modification; currently, the
+value can be either `patch`, indicating that the final part of the file is
+in a format suitable for passing as standard input to the "patch" program,
+or `replace`, indicating that one or more files in the upstream must be
+completely replaced. For `replace`, the final part of the file contains one
+or more lines with the format "old-file = new-file", where both are relative
+paths: the first is relative to the root of the extracted upstream sources,
+and the second is relative to the configuration directory.
+
+Other keys are interpreted depending on the value of `method`; there are
+currently no other keys for the `replace` method, and the following for
+the `patch` method:
+
+- `options`: options to pass to the "patch" program (default: "-Nsp1")
+- `list`: extra options to the "patch" program to list what it would do
+instead of actually doing it (this is used internally to figure out
+what changes; the default currently assumes the "patch" program provided
+by most Linux distributions)
+
+# Example Configuration directory <a name="example"></a>
+
+Obtaining SQLite sources and replacing btree.c and btreeInt.h with the ones
+from sqlightning, and applying a patch to vdbeaux.c:
+
+File `upstream.conf`:
+
+```
+vcs   = git
+repos = https://github.com/sqlite/sqlite.git
+```
+
+File `btree.mod`:
+
+```
+method = replace
+--
+src/btree.c    = files/btree.c
+src/btreeInt.h = files/btreeInt.h
+```
+
+File `vdbeaux.mod`:
+
+```
+method = patch
+--
+--- sqlite-git/src/vdbeaux.c    2020-02-17 19:53:07.030886721 +0100
++++ new/src/vdbeaux.c      2020-03-21 13:52:24.861586555 +0100
+@@ -2778,7 +2778,7 @@
+      for(i=0; i<db->nDb; i++){
+        Btree *pBt = db->aDb[i].pBt;
+        if( sqlite3BtreeIsInTrans(pBt) ){
+-        char const *zFile = sqlite3BtreeGetJournalname(pBt);
++        char const *zFile = BackendGetJournal(pBt);
+          if( zFile==0 ){
+            continue;  /* Ignore TEMP and :memory: databases */
+          }
+```
+
+Files `files/btree.c` and `files/btreeInt.h`: the new contents.
+
+A more complete example can be found in the directory "not-fork.d/sqlite"
+which tracks upstream updates from SQLite.
+
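To make the structure of a `.mod` file concrete, the sketch below splits a file at the first line consisting only of dashes and then reads the `replace` mapping from the final part. The helper names `split_mod` and `parse_pairs` are hypothetical, conditional blocks are not handled, and skipping `#` comment lines in the final part is an assumption.

```python
def split_mod(text):
    """Split .mod text into (initial part, final part) line lists (sketch)."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        stripped = line.strip()
        # The separator is the first line made up of dashes and nothing else.
        if stripped and set(stripped) == {"-"}:
            return lines[:index], lines[index + 1:]
    raise ValueError("no separator line of dashes found")


def parse_pairs(lines):
    """Read 'left = right' lines into a dict, skipping blanks and comments."""
    pairs = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        left, sep, right = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'key = value' line: {line!r}")
        pairs[left.strip()] = right.strip()
    return pairs
```

Applied to the `btree.mod` example above, the initial part yields `method = replace`, and the final part maps `src/btree.c` to `files/btree.c` and `src/btreeInt.h` to `files/btreeInt.h`. A patch header line such as `--- sqlite-git/src/vdbeaux.c` is not mistaken for the separator, because it contains more than dashes.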
+# Not-forking tool <a name="tool"></a>
+
+The `tool` directory contains a script, `not-fork`, which runs the not-forking
+mechanism on a directory.  Usage is:
+
+not-fork \[OPTIONS\] \[NAME\]...
+
+where the following options are available:
+
+- `-i`INPUT\_DIRECTORY (or `--input=`INPUT\_DIRECTORY)
+is a not-forking configuration directory as specified
+in this document; default is `not-fork.d` within the current directory
+- `-o`OUTPUT\_DIRECTORY (or `--output=`OUTPUT\_DIRECTORY)
+is the place where the modified upstream sources will
+be stored, and it can be either a directory created by a previous run of
+this tool, or a new directory (missing or empty directory); default is
+`sources` within the current directory; note that existing sources in
+this directory may be overwritten or deleted by the tool
+- `-c`CACHE\_DIRECTORY (or `--cache=`CACHE\_DIRECTORY)
+is a place used by the program to keep downloads
+and working copies; it must be either a new (missing or empty) directory
+or a directory created by a previous run of the tool; default is
+`.cache/LumoSQL/not-fork` inside the user's home directory
+- `-v`VERSION (or `--version=`VERSION) will retrieve the specified VERSION
+of the next NAME (this option must be repeated for each NAME, in the
+assumption that different projects have different version numbering)
+- `-c`COMMIT\_ID (or `--commit=`COMMIT\_ID) is similar to `-v` but
+only works for version control modules which support commit identifiers,
+and will retrieve the corresponding commit for the next NAME, whether
+or not it has an official version number; this is incompatible with `-v`
+- `-q` (or `--query`) completes all necessary downloads but does not
+extract the sources or apply modifications; instead it shows some
+information about what has been downloaded, including a version number
+if available.
+
+If neither VERSION nor COMMIT\_ID is specified, the default is the latest
+available version, if it can be determined, or else an error message.
+If more than one NAME is specified, VERSION and COMMIT\_ID need to
+be provided before each NAME: the assumption is that different
+software projects use different version numbers.
+
+If one or more NAMEs are specified, the tool will obtain the upstream
+sources as described in INPUT\_DIRECTORY/NAME for each of the NAMEs
+specified, and attempt to apply all the required modifications; if that
+succeeds, OUTPUT\_DIRECTORY/NAME will contain the modified sources ready
+to use; if that fails, an error message will explain the problem and if
+possible suggest corrective action (for example, if `patch` determines
+that a file has changed so much that it cannot figure out how to apply
+a supplied patch, the error message will indicate this and suggest
+obtaining a new patch for that version of the sources).
+
+If no NAMEs are specified, the tool will process all subdirectories
+of INPUT\_DIRECTORY. In this special case, any VERSION or COMMIT\_ID
+specified will apply to all rather than just the name immediately
+following them.
+
+The tool looks for a configuration file located at
+`$HOME/.config/LumoSQL/not-fork.conf` to read defaults; if the file exists
+and is readable, any non-comment, non-empty lines are processed before
+any command-line options with an implicit `--` prepended and with spaces
+around the first `=` removed, if present: so for example a file containing:
+
+```
+cache = /var/cache/LumoSQL/not-fork
+```
+
+would change the default cache from `.cache/LumoSQL/not-fork` in the user's
+home directory to the above directory inside `/var/cache`; it can still
+be overridden by specifying `-c`/`--cache` on the command line.
+
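The rule for turning configuration file lines into default options can be sketched as follows. This is an illustration only: `config_to_args` is a hypothetical name, and treating lines that start with `#` as comments is an assumption, since the text above does not state the comment syntax.

```python
import re


def config_to_args(text):
    """Convert not-fork.conf lines into long command-line options (sketch)."""
    args = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip empty lines and (assumed) comment lines
        # Remove the spaces around the first "=" only, then prepend "--".
        args.append("--" + re.sub(r"\s*=\s*", "=", line, count=1))
    return args
```

The example line `cache = /var/cache/LumoSQL/not-fork` becomes `--cache=/var/cache/LumoSQL/not-fork`. Because these defaults are processed before the real command-line options, an option given on the command line still takes precedence.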
+The program will refuse to overwrite the output directory if it cannot
+determine that it has been created by a previous run and that files have
+not been modified since; in this case, delete the output directory
+completely, or rename it to something else, and run the program again.
+There is currently no option to override this safety feature.
+
+We plan to add logging to the not-forking tool, in which all messages are
+written to a log file (under control of configuration), while the subset
+of messages selected by the verbosity setting will go to standard output;
+this will allow us to increase the amount of information provided and make
+it available if there is a processing error; however in the current version
+this is just planned, and not yet implemented.
+
@@ -0,0 +1,117 @@
+<!-- SPDX-License-Identifier: CC-BY-SA-4.0 -->
+<!-- SPDX-FileCopyrightText: 2020 The LumoSQL Authors -->
+<!-- SPDX-ArtifactOfProjectName: LumoSQL -->
+<!-- SPDX-FileType: Documentation -->
+<!-- SPDX-FileComment: Original by Dan Shearer, 2020 -->
+
+
+Table of Contents
+=================
+
+   * [Overall Objective of LumoSQL](#overall-objective-of-lumosql)
+   * [Table of Contents](#table-of-contents)
+   * [Aims](#aims)
+   * [Short Term Goals](#short-term-goals)
+
+
+![](./images/lumo-project-aims-intro.jpg "Mongolian horseback archery, rights request pending from https://www.toursmongolia.com/")
+
+Overall Objective of LumoSQL
+============================
+
+	To create a Privacy-compliant Open Source Database Platform with Modern Design and Benchmarking,
+	usable either embedded or online.
+
+This is the guide for every aspect of the project, which will ensure that
+LumoSQL offers features that money can't buy, and draws together an
+SQLite-related ecosystem.
+
+The rest of this document will be updated frequently in 2020, and over time
+will become more strategic, with less listing of specific new features.
+
+Aims
+====
+
+* SQLite upstream promise: LumoSQL will not fork SQLite, and will offer 100%
+  compatibility with SQLite by default, and contribute to SQLite where possible.
+  This especially includes the SQLite user interface mechanisms of pragmas, 
+  library APIs, and commandline parameters.
+
+* Legal promise: LumoSQL will not come with legal terms less favourable than 
+  SQLite. LumoSQL will try to improve the legal standing and safety worldwide
+  as compared to SQLite.
+
+* Developer contract: LumoSQL will have stable APIs ([Application Programming Interfaces](https://en.wikipedia.org/wiki/Application_programming_interface#Libraries_and_frameworks)) for features found in multiple unrelated SQLite downstream projects:
+  backends, frontends, encryption, networking and more. 
+
+* Devops contract: LumoSQL will reduce risk by making it possible to omit
+  compilation of many features, and will have stable ABIs ([Application Binary Interfaces](https://en.wikipedia.org/wiki/Application_binary_interface)) so as to not break dynamically-linked applications.
+
+* Ecosystem creation: LumoSQL will offer consolidated contact, code curation, bug tracking,
+  licensing, and community communications across all these features from
+  other projects. Bringing together SQLite code contributions under one umbrella reduces 
+  technical risk in many ways, from inconsistent use of threads to tracking updated versions.
+
+
+Short Term Goals
+================
+
+* LumoSQL will have three canonical and initial backends: btree (the existing
+SQLite btree, ported to a new backend system); a test backend such as text or
+csv; and the LMDB backend. Control over these interfaces will be through the
+same user interface mechanisms as the rest of LumoSQL, and SQLite.
+
+* LumoSQL will improve SQLite quality and privacy compliance by introducing
+optional on-disk checksums for storage backends, including the original
+SQLite btree format.  This will give real-time row-level corruption detection.
+
+* LumoSQL will improve SQLite quality and privacy compliance by introducing
+optional storage backends that are more crash-resistant, starting with LMDB
+followed by others.
+
+* LumoSQL will improve SQLite integrity in persistent storage by introducing
+optional row-level checksums.
+
+* LumoSQL will provide the benefits of Open Source and an open project
+by continuing to accept and review contributions in an open way, using
+GitHub and having diverse contributors, and being careful to use open
+source licenses.
+
+* LumoSQL will improve SQLite design by intercepting APIs at a very small
+number of critical choke-points, and giving the user optional choices at
+these choke points. The choices will be for alternative storage backends,
+front end parsers, encryption, networking and more, all without removing
+the zero-config and embedded advantages of SQLite
+
+* LumoSQL will provide a means of tracking upstream SQLite, by making
+sure that anything other than the API chokepoints can be synched at each
+release, or more often if need be
+
+* LumoSQL will provide updated, public testing tools, with results published
+and instructions for reproducing the test results. This also means
+excluding parts of the LumoSQL test suite that don't apply to new backends
+
+* LumoSQL will provide benchmarking tools, otherwise as per the testing
+tools
+
+* LumoSQL will ensure that new code remains optional by means of modularity at
+compile time and also at runtime. As an illustration of modularity, at compile
+time nearly all 30 million lines of the Linux kernel can be excluded, leaving
+just 200k lines. Runtime modularity will be controlled through the same user
+interfaces as the rest of LumoSQL.
+
+* LumoSQL will ensure that new code can all be active at once, e.g.
+multiple backends or frontends for conversion between/upgrading from one
+format or protocol to another. This is crucial to provide continuity and
+supported upgrade paths for users, for example, users who want to become
+privacy-compliant without disrupting their end users.
+
+* Over time, LumoSQL will carefully consider the potential benefits of dropping
+some of the most ancient parts of SQLite when merging from upstream, provided
+it does not conflict with any of the other goals in this document. Eliminating
+SQLite code can be done by a not-forking mechanism similar to the one used to
+keep in synch with the SQLite upstream.
+
+
+
+
@@ -0,0 +1,253 @@
+<!-- SPDX-License-Identifier: AGPL-3.0-only -->
+<!-- SPDX-FileCopyrightText: 2020 The LumoSQL Authors, 2019 Oracle -->
+<!-- SPDX-ArtifactOfProjectName: LumoSQL -->
+<!-- SPDX-FileType: Documentation -->
+<!-- SPDX-FileComment: Original by Dan Shearer, 2020 -->
+
+
+Table of Contents
+=================
+
+   * [About LumoSQL](#about-lumosql)
+   * [About the LumoSQL Project](#about-the-lumosql-project)
+   * [LumoSQL Interfaces Are Almost the Same as SQLite](#lumosql-interfaces-are-almost-the-same-as-sqlite)
+   * [Building and Installing LumoSQL](#building-and-installing-lumosql)
+      * [Directory layout](#directory-layout)
+      * [Linux/Unix](#linuxunix)
+         * [Build environment](#build-environment)
+         * [Using the Makefile tool](#using-the-makefile-tool)
+   * [Running LumoSQL](#running-lumosql)
+      * [Windows](#windows)
+      * [Android](#android)
+   * [Speed tests / benchmarking](#speed-tests--benchmarking)
+   * [Which LMDB version?](#which-lmdb-version)
+   * [References](#references)
+
+
+About LumoSQL
+=============
+
+LumoSQL is a combination of two embedded data storage C language libraries:
+[SQLite](https://sqlite.org) and [LMDB](https://github.com/LMDB/lmdb). LumoSQL
+is an updated version of Howard Chu's 2013
+[proof of concept](https://github.com/LMDB/sqlightning) combining the codebases.
+Howard's LMDB library has become a ubiquitous replacement for
+[bdb](https://sleepycat.com/) on the basis of performance, reliability, and
+license, so the 2013 claims of it greatly increasing the performance of SQLite
+seemed credible. D Richard Hipp's SQLite is used in thousands of software
+projects, and since three of them are Google's Android, Mozilla's Firefox and
+Apple's iOS, an improved version of SQLite will benefit billions of people.
+
+About the LumoSQL Project
+=========================
+
+LumoSQL was started in December 2019 by Dan Shearer, who did the original source
+tree archaeology, patching and test builds. Keith Maxwell joined shortly after
+and contributed version management to the Makefile and the benchmarking tools.
+
+A main goal of the LumoSQL Project is to create and maintain an improved version of
+SQLite without forking it, although there are other goals as well.
+
+LumoSQL is supported by the [NLNet Foundation](https://nlnet.nl).
+
+If you are interested in contributing to LumoSQL please see [CONTRIBUTING](/CONTRIBUTING.md).
+
+
+
+LumoSQL Interfaces Are Almost the Same as SQLite
+================================================
+
+Your interaction with the LumoSQL interface (commandline, PRAGMAs and API) is
+almost identical to SQLite. You use the same APIs, the same command shell
+environment, the same SQL statements, and the same PRAGMAs to work with the
+database created by LumoSQL as you would if you were using SQLite.
+
+To learn how to use SQLite, see the [SQLite Documentation](https://sqlite.org/docs.html).
+
+That said, there are a few small differences between the two interfaces.
+
+# Building and Installing LumoSQL
+
+## Directory layout
+
+In order to build LumoSQL and SQLite and to use different versions of the LMDB
+library, we use the following directory layout:
+
+```
+.
+├── bld-LMDB_?.?.?    Build artifacts for LumoSQL (src and src-lmdb)
+├── bld-SQLite-?.?.?  Build artifacts for sqlite (src-sqlite)
+├── LICENSES          License files, in line with https://reuse.software/spec/
+├── lmdb-backend      C source code to use SQLite with an LMDB backend
+├── src-lmdb          Clone of LMDB source code
+├── src-sqlite        Clone of sqlite.org git mirror
+└── tool              Cut down version of speedtest.tcl
+```
+
+## Linux/Unix
+
+
+### Build environment
+
+On Ubuntu 18.04 LTS, Debian Stable (buster), and on any reasonably recent
+Debian or Ubuntu-derived distribution, you need only:
+
+```sh
+sudo apt install git build-essential tcl
+sudo apt build-dep sqlite3
+```
+
+(`apt build-dep` requires `deb-src` lines uncommented in /etc/apt/sources.list).
+
+On Fedora 30, and on any reasonably recent Fedora-derived distribution:
+
+```sh
+sudo dnf install --assumeyes \
+  git make gcc ncurses-devel readline-devel glibc-devel autoconf tcl-devel
+```
+
+The maintainers test building LumoSQL on Debian, Fedora, Gentoo and Ubuntu.
+Container images with the dependencies installed are available at
+<https://quay.io/repository/keith_maxwell/lumosql-build> and the build steps are
+in <https://github.com/maxwell-k/containers>.
+
+### Using the Makefile tool
+
+Start with a clone of this repository as the current directory:
+
+```sh
+git clone https://github.com/LumoSQL/LumoSQL.git
+```
+
+To build either (a) specific versions of SQLite or (b) sqlightning using
+different versions of LMDB, use commands like those below changing the version
+numbers to suit. A list of tested version numbers is in the table
+[below](#which-lmdb-version).
+
+```sh
+make bld-SQLite-3.7.17
+make bld-LMDB_0.9.9
+```
+
+# Running LumoSQL
+
+Libraries and a command line shell are built with the following names:
+
+- `lumosql`: the command line shell. It operates identically to the SQLite `sqlite3` shell.
+- `liblumosql`: the library that provides the LumoSQL SQL interface. It is the equivalent of the SQLite `libsqlite3` library.
+
+## Windows
+
+LumoSQL is not supported on Windows as of March 2020. We are aiming for May 2020. Want to help?
+
+## Android
+
+LumoSQL is not supported on Android as of March 2020. We are aiming for July 2020. Want to help?
+
+# Speed tests / benchmarking
+
+Benchmarking a single binary takes approximately 4 minutes, depending
+on hardware.
+
+The instructions in this section explain how to benchmark four different
+versions:
+
+| V.  | SQLite | LMDB   | Repository | Report filename    |
+| --- | ------ | ------ | ---------- | ------------------ |
+| A.  | 3.7.17 | -      | SQLite     | SQLite-3.7.17.html |
+| B.  | 3.30.1 | -      | SQLite     | SQLite-3.30.1.html |
+| C.  | 3.7.17 | 0.9.9  | LumoSQL    | LMDB_0.9.9.html    |
+| D.  | 3.7.17 | 0.9.16 | LumoSQL    | LMDB_0.9.16.html   |
+
+To benchmark the four versions above use:
+
+```sh
+make benchmark
+```
+
+The "Repository" column means:
+
+<dl>
+<dt>SQLite</dt>
+<dd>
+
+<https://github.com/sqlite/sqlite>
+
+</dd>
+<dt>LumoSQL</dt>
+<dd>
+
+<https://github.com/LumoSQL/LumoSQL> (this repository)
+
+</dd>
+</dl>
+
+# Which LMDB version?
+
+`mc_orig` was removed and `mc_backup` added to `mdb.c` in
+<https://github.com/LMDB/lmdb/commit/be47ca766713f55e5b3abd18120514fdad7d90f2>
+first released in `LMDB_0.9.7` on 14 August 2013. `LMDB_0.9.8` was 9 September
+2013 and `LMDB_0.9.9` was 24 October 2013.
+<https://github.com/LMDB/sqlightning/commit/58b473f3d5570fca94b88398e0e4314208a077cd>
+adapted `sqlightning` to this change on 12 September 2013. So the first
+version to try is `LMDB_0.9.8`, but this fails with:
+`sqlite3.c:38156:2: error: unknown type name ‘mdb_hash_t’`.
+
+The build likely needs
+[this commit](https://github.com/LMDB/lmdb/commit/01dfb2083dd690707a062cabb03801bfad1a6859),
+found through a
+[GitHub comparison](https://github.com/LMDB/lmdb/compare/LMDB_0.9.8...LMDB_0.9.9).
+
+| Tag         | Date       | Compiles | Speed test | Files | Ins. | De. |
+| ----------- | ---------- | -------- | ---------- | ----: | ---: | --: |
+| LMDB_0.9.8  | 2013-09-09 | ✗        | -          |     - |    - |   - |
+| LMDB_0.9.9  | 2013-10-24 | ✓        | ✓          |     6 |  577 | 540 |
+| LMDB_0.9.10 | 2013-11-12 | ✓        | ✓          |     5 |  216 | 121 |
+| LMDB_0.9.11 | 2014-01-15 | ✓        | ✓          |     6 |  443 | 273 |
+| LMDB_0.9.12 | 2014-06-18 | ✓        | ✓          |    12 |  516 | 333 |
+| LMDB_0.9.13 | 2014-06-18 | ✓        | ✓          |     3 |   28 |  22 |
+| LMDB_0.9.14 | 2014-09-20 | ✓        | ✓          |    23 | 2331 | 441 |
+| LMDB_0.9.15 | 2015-06-19 | ✓        | ✓          |    24 |  388 | 187 |
+| LMDB_0.9.16 | 2015-08-14 | ✓        | ✓          |     5 |   44 |  19 |
+| LMDB_0.9.17 | 2015-11-30 | ✓        | ✗          |    10 | 1072 | 565 |
+| LMDB_0.9.18 | 2016-02-05 | ✓        | ✗          |    24 |  303 |  57 |
+| LMDB_0.9.19 | 2016-12-28 | ✓        | ✗          |     6 |  684 | 447 |
+| LMDB_0.9.21 | 2017-06-01 | ✓        | ✗          |    23 |   81 |  50 |
+| LMDB_0.9.22 | 2018-03-22 | ✓        | ✗          |    23 |   74 |  58 |
+| LMDB_0.9.23 | 2018-12-19 | ✓        | ✗          |     4 |   52 |   9 |
+| LMDB_0.9.24 | 2019-07-19 | ✓        | ✗          |     6 |   16 |  11 |
+
+The [GitHub LMDB mirror](https://github.com/LMDB/lmdb/releases) does not include
+a release `LMDB_0.9.20`; releases before 0.9.8 are not shown.
+
+<dl>
+<dt>Compiles</dt>
+<dd>✓ means the process documented above completes successfully.</dd>
+<dt>Speed test</dt>
+<dd>✓ means the cut down version of the speed test in
+<code>./tool/speedtest.tcl</code> passes.</dd>
+<dt>Files</dt>
+<dd>The number of files changed between the previous release and this one, as
+reported by <code>git diff --shortstat</code>.</dd>
+<dt>Ins.</dt>
+<dd>The number of insertions as for the "Files" column.</dd>
+<dt>De.</dt>
+<dd>The number of deletions as for the "Files" column.</dd>
+</dl>
+
+A **?** means that this has not been tested, and a **-** means that it is not
+applicable at present.
+
+# References
+
+- The
+  [Fedora Spec file for "sqlite3"](https://apps.fedoraproject.org/packages/sqlite/sources/)
+  lists dependencies.
+- The [documentation](https://sqlite.org/whynotgit.html#getthecode) linking to
+  the [official SQLite GitHub mirror](https://github.com/sqlite/sqlite)
+- ["sqlightning" repository](https://github.com/LMDB/sqlightning)
+- Early benchmarking of 3.7.17 by Howard Chu: <https://pastebin.com/B5SfEieL>
+- Benchmarking
+  <https://github.com/google/leveldb/blob/master/benchmarks/db_bench_sqlite3.cc>
@@ -0,0 +1,122 @@
+<!-- SPDX-License-Identifier: CC-BY-SA-4.0 -->
+<!-- SPDX-FileCopyrightText: 2020 The LumoSQL Authors -->
+<!-- SPDX-ArtifactOfProjectName: LumoSQL -->
+<!-- SPDX-FileType: Documentation -->
+<!-- SPDX-FileComment: Original by Dan Shearer, 2019 -->
+
+
+Table of Contents
+=================
+
+   * [Knowledge Relevant to LumoSQL](#knowledge-relevant-to-lumosql)
+   * [List of SQLite Code-related Knowledge](#list-of-sqlite-code-related-knowledge)
+   * [List of On-disk SQLite Format-related Knowledge](#list-of-on-disk-sqlite-format-related-knowledge)
+   * [List of Relevant Benchmarking and Test Knowledge](#list-of-relevant-benchmarking-and-test-knowledge)
+   * [List of Just a Few SQLite Encryption Projects](#list-of-just-a-few-sqlite-encryption-projects)
+   * [List of from-scratch MySQL SQL and MySQL Server implementations](#list-of-from-scratch-mysql-sql-and-mysql-server-implementations)
+
+Knowledge Relevant to LumoSQL
+=============================
+
+LumoSQL has many antecedents and relevant codebases.  This document is intended
+to be a terse list of published source code for reference of LumoSQL
+developers. Although it is stored with the rest of the LumoSQL documentation
+and referred to throughout, it is a standalone document.
+
+Everything listed here is open source, except for software produced by
+sqlite.org or the commercial arm hwaci.com. There are many closed-source
+products that extend and reuse SQLite in various ways, none of which have been
+considered by the LumoSQL project.
+
+# List of SQLite Code-related Knowledge
+
+SQLite code has been incorporated into many other projects, and besides there are many other relevant key-value stores and libraries.
+
+| Project | Last modified | Description   |
+| ------------- | ------------- | --------|
+| [sqlightning](https://github.com/LMDB/sqlightning)  | 2013 | SQLite ported to the LMDB key-value store |
+| [Original MDB Paper](https://www.openldap.org/pub/hyc/mdb-paper.pdf) | 2012 | Paper by Howard Chu describing the motivations, design and constraints of the LMDB key-value store |
+| [SQLHeavy](https://github.com/btrask/sqlheavy)  | 2016 | sqlightning updated, and ported to LevelDB, LMDB, RocksDB and more, with a key-value store library abstraction |
+| [libkvstore](https://github.com/btrask/libkvstore) | 2016 | The k-v store abstraction library used by SQLHeavy |
+| [SQLite 4](https://sqlite.org/src4/tree?ci=trunk) | 2014 | Abandoned new version of SQLite with improved backend support and other features |
+| [Sleepycat/Oracle BDB](https://fossies.org/linux/misc/db-18.1.32.tar.gz) | current | The original ubiquitous Unix K-V store, disused in open source since Oracle's 2013 license change; the API template for most of the k-v btree stores around. Now includes many additional features including full MVCC transactions, networking and replication. This link is a mirror of code from download.oracle.com, which requires a login | 
+| [Sleepycat/Oracle BDB-SQL](https://fossies.org/linux/misc/db-18.1.32.tar.gz) | current | Port of SQLite to the Sleepycat/Oracle transactional bdb K-V store. As of 5th March 2020 this mirror is identical to Oracle's login-protected tarball for db 18.1.32 | 
+| [rqlite](https://github.com/rqlite/rqlite) | current | Distributed database with networking and Raft consensus on top of SQLite nodes |
+| [Bedrock](https://github.com/Expensify/Bedrock) | current | WAN-replicated blockchain multimaster database built on SQLite. Has MySQL emulation |
+| [sql.js](https://github.com/kripken/sql.js/) | current | SQLite compiled to JavaScript WebAssembly through Emscripten |
+| [ActorDB](https://github.com/biokoda/actordb) | current | SQLite with a data sharding/distribution system across clustered nodes. Each node stores data in LMDB, which is connected to SQLite at the SQLite WAL layer |
+| [WAL-G](https://github.com/wal-g/wal-g) | current | Backup/replication tool that intercepts the WAL journal log for each of Postgres, MySQL, MongoDB and Redis |
+| [sqlite3odbc](https://github.com/gdev2018/sqlite3odbc) | current | ODBC driver for SQLite by [Christian Werner](http://www.ch-werner.de/sqliteodbc/) as used by many projects including LibreOffice |
+| [Spatialite](https://www.gaia-gis.it/fossil/libspatialite/index)| current | Geospatial GIS extension to SQLite, similar to PostGIS |
+| [Gigimushroom's Database Backend Engine](https://github.com/gigimushroom/DatabaseBackendEngine) | 2019 | A good example of an alternative BTree storage engine implemented using SQLite's Virtual Table Interface. This approach is not what LumoSQL has chosen for many reasons, but this code demonstrates that virtual tables can work, and also that storage engines implemented as virtual tables can be ported to be LumoSQL backends. |
+
+# List of On-disk SQLite Format-related Knowledge
+
+The on-disk file format is important to many SQLite use cases, and introspection tools are both important and rare. Other K-V stores also have third-party on-disk introspection tools. There are advantages to having investigative tools that do not use the original/canonical source code to read and write these databases. The SQLite file format is promoted as being a stable, backwards-compatible transport (recommended by the Library of Congress as an archive format) but it also has significant drawbacks as discussed elsewhere in the LumoSQL documentation.
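
As a minimal sketch of what such an independent tool does, the following Python reads the fixed 100-byte header at the start of an SQLite database file without calling any SQLite code. The field offsets follow the published SQLite file format description. The function name `read_sqlite_header` and the choice of fields are illustrative assumptions, not part of LumoSQL or of any tool listed below.

```python
import struct

SQLITE_MAGIC = b"SQLite format 3\x00"
TEXT_ENCODINGS = {1: "UTF-8", 2: "UTF-16le", 3: "UTF-16be"}


def read_sqlite_header(path):
    """Decode a few fields of the 100-byte SQLite header (hypothetical helper)."""
    with open(path, "rb") as f:
        header = f.read(100)
    if len(header) < 100 or header[:16] != SQLITE_MAGIC:
        raise ValueError("not an SQLite 3 database file")

    # All multi-byte integers in the header are stored big-endian.
    page_size, write_version, read_version = struct.unpack(">HBB", header[16:20])
    change_counter, page_count = struct.unpack(">II", header[24:32])
    text_encoding, user_version = struct.unpack(">Ii", header[56:64])
    (library_version,) = struct.unpack(">I", header[96:100])

    return {
        # A stored value of 1 stands for a page size of 65536 bytes.
        "page_size": 65536 if page_size == 1 else page_size,
        # Write version 2 marks a WAL database, 1 a rollback-journal one.
        "journal_format": "WAL" if write_version == 2 else "rollback",
        "read_version": read_version,
        "change_counter": change_counter,
        "page_count": page_count,
        "text_encoding": TEXT_ENCODINGS.get(text_encoding, "unknown"),
        "user_version": user_version,
        "library_version": library_version,
    }
```

A reader like this depends only on the documented byte layout, so it can inspect a file that the canonical library refuses to open, which is the property forensic and recovery tools rely on.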
+
+| Project | Last modified | Description |
+| ------- | ------------- | ----------- |
+| [A standardized corpus for SQLite database forensics](https://www.sciencedirect.com/science/article/pii/S1742287618300471) | current | Sample SQLite databases and evaluations of 5 tools that do extraction and recovery from SQLite, including Undark and SQLite Deleted Records Parser |
+| [FastoNoSQL](https://github.com/fastogt/fastonosql) | current | GUI inspector and management tool for on-disk databases including LMDB and LevelDB |
+| [Undark](https://github.com/inflex/undark) | 2016 | SQLite deleted and corrupted data recovery tool |
+| [SQLite Deleted Records Parser](https://github.com/mdegrazia/SQLite-Deleted-Records-Parser) | 2015 | Script to recover deleted entries in an SQLite database |
+| [lua-mdb](https://github.com/catwell/cw-lua/tree/master/lua-mdb) | 2016 | Parse and investigate LMDB file format |
+
+(The forensics and data recovery industry has many tools that diagnose SQLite
+database files. Some are open source but many are not. A list of tools commonly
+cited by forensics practitioners, none of which LumoSQL has downloaded or tried,
+is: Belkasoft Evidence Center, BlackBag BlackLight, Cellebrite UFED Physical
+Analyser, DB Browser for SQLite, Magnet AXIOM and Oxygen Forensic Detective.)
+
+# List of Relevant SQL Checksumming-related Knowledge
+
+| Project | Last modified | Description |
+| ------- | ------------- | ----------- |
+| [eXtended Keccak Code Package](https://github.com/XKCP/XKCP)  | current | Code from https://keccak.team for very fast peer-reviewed hashing |
+| [SQL code for Per-table Multi-database Solution](https://www.periscopedata.com/blog/hashing-tables-to-ensure-consistency-in-postgres-redshift-and-mysql) | 2014 | Periscope's SQL row hashing solution for Postgres, Redshift and MySQL |
+| [SQL code for Public Key Row Tracking](https://www.percona.com/blog/2018/10/12/track-postgresql-row-changes-using-public-private-key-signing/) | 2018 | Percona's SQL row integrity solution for Postgresql using public key crypto |
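
Row hashing of the general kind these projects describe can be sketched in a few lines. The following is an illustrative assumption and not the Periscope or Percona method: it hashes each row with SHA3-256 (the standardised Keccak variant shipped in Python's `hashlib`) and folds the row digests into one digest per table. The helper names `row_digest` and `table_digest` are hypothetical.

```python
import hashlib
import sqlite3


def row_digest(row):
    """Hash one row; each value is length-prefixed so column boundaries are unambiguous."""
    h = hashlib.sha3_256()
    for value in row:
        encoded = repr(value).encode("utf-8")
        h.update(len(encoded).to_bytes(8, "big"))
        h.update(encoded)
    return h.digest()


def table_digest(conn, table):
    """Fold all row digests of a rowid table, in rowid order, into one hex digest."""
    quoted = '"' + table.replace('"', '""') + '"'  # quote the identifier
    h = hashlib.sha3_256()
    for row in conn.execute(f"SELECT * FROM {quoted} ORDER BY rowid"):
        h.update(row_digest(row))
    return h.hexdigest()
```

Two copies of a table with the same rows in the same rowid order give the same digest, and any changed value gives a different one. This sketch assumes an ordinary rowid table, so it would need a different ordering clause for `WITHOUT ROWID` tables.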
+
+# List of Relevant Benchmarking and Test Knowledge
+
+Benchmarking is a big part of LumoSQL, to determine if changes are an improvement. The trouble is that SQLite and other top databases are not really benchmarked in a realistic and consistent way, despite SQL server benchmarking using tools like TPC being an obsessive industry in itself, and despite there being a myriad of testing tools released with SQLite, Postgresql, MariaDB etc. In practical terms there is no way of comparing the most-used databases with each other, or of being sure that the tests that do exist are in any way realistic, or even of simply reproducing results that other people have found. LumoSQL covers so many codebases and use cases that better SQL benchmarking is a project requirement. Benchmarking and testing overlap, which is addressed in the code and docs.
+
+The well-described [testing of SQLite](https://sqlite.org/testing.html) involves some open code, some closed code, and many ad hoc processes. Clearly the SQLite team have an internal culture of testing that has benefitted the world. However that is very different to reproducible testing, which is in turn very different to reproducible benchmarking, and that is even without considering whether the benchmarking is a reasonable approximation of actual use cases.
+
+To highlight how poorly SQL benchmarking is done: there are virtually no test harnesses that cover encrypted databases and/or encrypted database connections, despite encryption being frequently required, and despite crypto implementation decisions making a very big difference in performance.
+
+| Project | Last modified | Description | 
+| ------- | ------------- | ----------- |
+| [Dangers and complexity of sqlite3 benchmarking](https://www.cs.utexas.edu/~vijay/papers/apsys17-sqlite.pdf)| n/a | Helpful 2017 paper: "...changing just one parameter in SQLite can change the performance by 11.8X... up to 28X difference in performance" |
+| [sqllogictest](https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki)|2017 | [sqlite.org code](https://www.sqlite.org/sqllogictest/artifact/2c354f3d44da6356) to [compare the results](https://gerardnico.com/data/type/relation/sql/test) of many SQL statements between multiple SQL servers, either SQLite or an ODBC-supporting server |
+| [TCL SQLite tests](https://github.com/sqlite/sqlite/tree/master/test)|current| These are a mixture of code coverage tests, unit tests and test coverage. Actively maintained. |
+| [Yahoo Cloud Serving Benchmark](https://github.com/brianfrankcooper/YCSB/)| current | Benchmarking tool for K-V stores and cloud-accessible databases |
+| [Example Android Storage Benchmark](https://github.com/greenrobot/android-database-performance) | 2018 | This code is an example of the very many Android benchmarking/testing tools. This needs further investigation |
+| [Sysbench](https://github.com/akopytov/sysbench) | current | A multithreaded generic benchmarking tool, with one well-supported use case being networked SQL servers, and [MySQL in particular](https://www.percona.com/blog/2019/04/25/creating-custom-sysbench-scripts/) |
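
The paper listed in this table reports that changing a single SQLite parameter can change performance many times over, so a reproducible result has to record its parameters alongside its timing. A minimal sketch of that idea, using Python's standard `sqlite3` module, follows. The function name `time_inserts`, the table layout and the workload are assumptions made for illustration, and this is not the LumoSQL benchmarking code.

```python
import os
import sqlite3
import tempfile
import time

ALLOWED_SYNCHRONOUS = {"OFF", "NORMAL", "FULL", "EXTRA"}


def time_inserts(synchronous, rows=200):
    """Time `rows` one-row transactions under a given PRAGMA synchronous setting."""
    if synchronous not in ALLOWED_SYNCHRONOUS:
        raise ValueError(f"unsupported synchronous setting: {synchronous}")
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "bench.db"))
        conn.execute(f"PRAGMA synchronous = {synchronous}")
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
        conn.commit()
        start = time.perf_counter()
        for i in range(rows):
            conn.execute("INSERT INTO t (payload) VALUES (?)", (f"row-{i}",))
            conn.commit()  # one transaction per row, so sync cost dominates
        elapsed = time.perf_counter() - start
        count = conn.execute("SELECT count(*) FROM t").fetchone()[0]
        conn.close()
    # Return the settings with the measurement so the run can be repeated.
    return {
        "sqlite_version": sqlite3.sqlite_version,
        "synchronous": synchronous,
        "rows": count,
        "seconds": elapsed,
    }
```

Calling `time_inserts("OFF")` and `time_inserts("FULL")` on the same machine gives two records that differ in one named parameter. The size of the difference depends on the storage hardware and filesystem, which is why the library version and setting are stored with each timing.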
+
+
+# List of Just a Few SQLite Encryption Projects
+
+Encryption is a major problem for SQLite users looking for open code. There are no official implementations in open source, although the APIs are documented (seemingly by an SCM mistake years ago (?), see sqlite3-dbx below) and most solutions use the SQLite extension interface. This means that there are many mutually-incompatible implementations, several of them seeming to be very popular. None appear to have received encryption certification (?) and none seem to publish test results to reassure users about compatibility with SQLite upstream or with the file format. Besides the closed source solution from sqlite.org, there are also at least three other closed source options not listed here. This choice between either closed source or fragmented solutions is a poor security approach from the point of view of maintenance as well as peer-reviewed security. This means that SQLite in 2020 does not have a good approach to privacy.
+
+| Project | Last modified | Description | 
+| ------- | ------------- | ----------- |
+| [SQLite Encryption Extension](https://www.sqlite.org/see/doc/release/www/readme.wiki)(SEE)| current | Info about the proprietary, closed source official SQLite crypto solution, illustrating that there is little to be compatible with in the wider SQLite landscape. This is a standalone product. The API is published and used by some open source code. |
+| [SQLCipher](https://github.com/sqlcipher/sqlcipher) | current | Adds at-rest encryption to SQLite [at the pager level](https://www.zetetic.net/sqlcipher/design/), using OpenSSL (the default) or optionally other providers. Uses an open core licensing model, and the less-capable open source version is BSD licensed with a requirement that users publish copyright notices. Uses the SEE API. |
+| [sqleet](https://github.com/resilar/sqleet) | current | Implements ChaCha20-Poly1305 encryption with PBKDF2-HMAC-SHA256 key derivation, also at the pager level. Public Domain (not Open Source, similar to SQLite) |
+| [sqlite3-dbx](https://github.com/newsoft/sqlite3-dbx) | kinda-current | Accidentally-published but unretracted code on sqlite.org fully documents crypto APIs used by SEE |
+| [SQLite3-Encryption](https://github.com/darkman66/SQLite3-Encryption) | current | No crypto libraries (DIY crypto!) and based on the similar-sounding SQLite3-with-Encryption project | 
+| [wxSqlite3](https://github.com/utelle/wxsqlite3/) | current | wxWidgets C++ wrapper, that also implements SEE-equivalent crypto. Licensed under the LGPL |
+
+... there are many more crypto projects for SQLite. 
+
+# List of from-scratch MySQL SQL and MySQL Server implementations
+
+If we want to make SQLite able to process MySQL queries there is a lot of existing code in this area to consider. There are at least 80 projects on GitHub which implement some or all of the MySQL network-parse-optimise-execute SQL pathway, and a few of them implement all of it. None of those reviewed so far used MySQL or MariaDB code to do so. Perhaps that is because the SQL processing code alone in these databases is many times bigger than the whole of SQLite, and it isn't even clear how to add them to this table if we wanted to. Only a few of these projects put a MySQL frontend on SQLite, but two well-maintained projects do, showing us two ways of implementing this.
+
+| Project | Last modified | Description |
+| ------- | ------------- | ----------- |
+| [Bedrock](https://github.com/Expensify/Bedrock) | current | The MySQL compatibility seems to be popular and is actively supported but it is also small. It speaks the MySQL/MariaDB protocol accurately but doesn't seem to try very hard to match MySQL SQL language semantics and extensions, rather relying on the fact that SQLite substantially overlaps with MySQL. |
+| [TiDB](https://github.com/pingcap/tidb/) | current | Distributed database with MySQL emulation as the primary dialect and referred to throughout the code, with frequent detailed bugfixes on deviations from MySQL SQL language behaviour. |
+| [phpMyAdmin parser](https://github.com/phpmyadmin/sql-parser) | current | A very complete parser for MySQL code, demonstrating that completeness is not the unrealistic goal some claim it to be |
+| [Go MySQL Server](https://github.com/src-d/go-mysql-server) | current | A MySQL server written in Go that executes queries but mostly leaves the backend for the user to implement. Intended to put a compliant MySQL server on top of arbitrary backend sources. |
+| [ClickHouse MySQL Frontend](https://github.com/ClickHouse/ClickHouse/tree/146109fe27074229a38cd704d60f23ec7bd2ed67/base/mysqlxx) | current | Yandex's [ClickHouse](https://clickhouse.tech/) has a MySQL frontend.|