A POC about bring-your-own-database (BYOD).
- Git clone & cd into the repo:

  ```bash
  git clone https://github.com/protojimmylin/byod-poc
  cd byod-poc
  ```
- Launch the databases:

  ```bash
  docker-compose up
  ```
- Install the Python deps:

  ```bash
  python3 -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```
- Run the Python script (see `main.py` for more detail):

  ```bash
  python main.py
  ```
- Clean up (do this only when you want to wipe all database data/config):

  ```bash
  docker-compose down --volumes
  ```
- Creating tables with the following code succeeds on Postgres but fails on MySQL:

  ```python
  users = Table(
      "users",
      metadata,
      Column("id", Integer, primary_key=True),
      Column("name", String),
      Column("fullname", String),
  )
  ```

  MySQL says:

  ```
  sqlalchemy.exc.CompileError: (in table 'users', column 'name'): VARCHAR requires a length on dialect mysql
  ```
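  A minimal sketch of the portable fix, assuming an explicit length is acceptable for every string column (the lengths here are illustrative, not from the repo):

  ```python
  from sqlalchemy import Table, Column, Integer, String, MetaData

  metadata = MetaData()

  # String(length) compiles to VARCHAR(length) on every dialect,
  # so this works on MySQL as well as on Postgres.
  users = Table(
      "users",
      metadata,
      Column("id", Integer, primary_key=True),
      Column("name", String(50)),
      Column("fullname", String(100)),
  )
  ```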
- `metadata.drop_all` may get stuck because of foreign-key constraints. Dropping tables with a raw SQL statement that includes the CASCADE keyword may work, but then we need to write more raw SQL.
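  For example, on Postgres (a sketch; the table name is illustrative):

  ```sql
  DROP TABLE users CASCADE;
  ```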
- Creating and dropping tables is not something we do very frequently, and the SQL statements are very static, so we can just write raw SQL for every kind of database.
- We only use the BYOD solution for some tables, so maybe we can write a series of tests to make sure every operation we do through the ORM works with Postgres/MySQL/MSSQL and so on (a sketch follows below).
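  A minimal sketch of such a test matrix, assuming pytest and hypothetical connection URLs for the docker-compose databases:

  ```python
  import pytest
  from sqlalchemy import create_engine, text

  # Hypothetical URLs; the real ones depend on docker-compose.yml.
  DATABASE_URLS = [
      "postgresql://user:pass@localhost:5432/poc",
      "mysql+pymysql://user:pass@localhost:3306/poc",
      "mssql+pyodbc://sa:pass@localhost:1433/tempdb?driver=ODBC+Driver+18+for+SQL+Server",
  ]

  @pytest.mark.parametrize("url", DATABASE_URLS)
  def test_basic_roundtrip(url):
      engine = create_engine(url)
      with engine.connect() as conn:
          # The real ORM/Core operations we use in production would go here;
          # a trivial SELECT just proves the dialect/driver combination works.
          assert conn.execute(text("SELECT 1")).scalar() == 1
  ```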
- Ubuntu needs the MSSQL ODBC driver installed to connect to MSSQL. (Ref)
- The bash script can't be executed directly; I ended up copy-pasting these lines manually:

  ```bash
  sudo su
  curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
  curl https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/prod.list > /etc/apt/sources.list.d/mssql-release.list
  sudo apt-get update
  sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18
  sudo ACCEPT_EULA=Y apt-get install -y mssql-tools18
  source ~/.bashrc
  sudo apt-get install -y unixodbc-dev
  ```
- After installing the driver, there is still a certificate issue I haven't figured out yet:

  ```
  sqlalchemy.exc.OperationalError: (pyodbc.OperationalError) ('08001', '[08001] [unixODBC][Microsoft][ODBC Driver 18 for SQL Server]SSL Provider: [error:0A000086:SSL routines::certificate verify failed:self-signed certificate] (-1) (SQLDriverConnect)') (Background on this error at: https://sqlalche.me/e/14/e3q8)
  ```
- It seems the MSSQL docker image doesn't provide a way to set up an initial DB. I might need to set it up with a script. (Ref)
- The certificate issue above may be solved by adding the `TrustServerCertificate=yes` parameter to the connection string (a sketch follows below).
- The missing-initial-DB issue may be worked around by using the default `msdb` or `tempdb` database.
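  A minimal sketch of such a connection string, assuming pyodbc and placeholder credentials:

  ```python
  from sqlalchemy import create_engine

  # Placeholder credentials; TrustServerCertificate=yes tells the ODBC driver
  # to accept the container's self-signed certificate.
  engine = create_engine(
      "mssql+pyodbc://sa:Passw0rd@localhost:1433/tempdb"
      "?driver=ODBC+Driver+18+for+SQL+Server"
      "&TrustServerCertificate=yes"
  )
  ```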
- An async ORM needs an async driver, or this error happens:

  ```
  sqlalchemy.exc.InvalidRequestError: The asyncio extension requires an async driver to be used. The loaded 'psycopg2' is not async
  ```
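  A minimal sketch of the async setup for Postgres, assuming the `asyncpg` driver is installed and a placeholder URL; it also shows the `run_sync(...)` bridge mentioned in the next point:

  ```python
  import asyncio
  from sqlalchemy import MetaData
  from sqlalchemy.ext.asyncio import create_async_engine

  metadata = MetaData()

  async def main():
      # "postgresql+asyncpg://" selects the async driver instead of psycopg2.
      engine = create_async_engine("postgresql+asyncpg://user:pass@localhost/poc")
      async with engine.begin() as conn:
          # run_sync() lets synchronous helpers (like create_all)
          # run on an async connection.
          await conn.run_sync(metadata.create_all)
      await engine.dispose()

  asyncio.run(main())
  ```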
- Some databases like MSSQL may not have an asynchronous driver, but we can still use something like `conn.run_sync(...)` (as sketched above).
- SQLAlchemy's API is a little messy and some of the naming is weird: there are the Core and ORM APIs plus a lot of legacy ones, with syntaxes that are not compatible with each other, and `sessionmaker()` returns a session factory?
- Calling a model's `__repr__` in an asynchronous block will fail if the `__repr__` includes a field defined with `relationship(...)`.
- The delete/update syntax doesn't seem to be the same as the original synchronous one.
- FastAPI's documentation seems good and we are going to use it with SQLAlchemy. Maybe I should use it more.
- Postgres, MySQL, and MSSQL all provide some kind of version/schema information in their system tables.
- There are also special functions, like `version()` or `pg_database_size()` in Postgres, that can be "called" from SQL.
- Some of them even provide information about the OS (like `version()` in Postgres).
- The way to get this information differs a lot among these databases; we may need to write it case by case (a sketch follows below).
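  A minimal sketch of such a case-by-case check, assuming an existing SQLAlchemy engine; the MSSQL query uses `@@VERSION`, which is standard T-SQL:

  ```python
  from sqlalchemy import text

  # One version query per dialect; the system tables/functions differ
  # from database to database.
  VERSION_QUERIES = {
      "postgresql": "SELECT version()",
      "mysql": "SELECT version()",
      "mssql": "SELECT @@VERSION",
  }

  def get_server_version(engine):
      query = VERSION_QUERIES[engine.dialect.name]
      with engine.connect() as conn:
          return conn.execute(text(query)).scalar()
  ```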
- SQLAlchemy also does some environment checking inside some of its functions, like `_get_server_version_info`.
- If there are existing test cases for these environment checks, we can use them first and improve them later.
- Otherwise we have to decide exactly what we want to know and check the items one by one.
- It is hard to make a client's DB work properly this way.
- Maybe we should still tell our clients that BYOD is not 100% guaranteed to work, because they own the DB, which is a part of our product.
- We can still provide this kind of environment check as a basic precheck & diagnosis of a client's DB.
- And when our backend goes wrong at runtime because of a client's DB, we should still be ready to handle it in our backend's code.
- Flyway depends on Java, but the backend code that works with Flyway can be written in any other language and call Flyway through its command-line interface.
- Flyway itself provides a docker image too, so we don't have to worry about the Java dependency. We can just mount the migration files into Flyway's container and let it read them, connect to our client's DB, and do the migrations (a sketch follows below).
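  A minimal sketch of that invocation, assuming the official `flyway/flyway` image and placeholder connection details:

  ```bash
  docker run --rm \
    -v "$(pwd)/sql:/flyway/sql" \
    flyway/flyway \
    -url=jdbc:postgresql://db-host:5432/clientdb \
    -user=flyway -password=secret \
    migrate
  ```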
- For the migration files themselves, Flyway works best with Java: you can write Java code to do a migration, like:

  ```java
  package db.migration;

  import org.flywaydb.core.api.migration.BaseJavaMigration;
  import org.flywaydb.core.api.migration.Context;

  import java.sql.PreparedStatement;

  /**
   * Example of a Java-based migration.
   */
  public class V1_2__Another_user extends BaseJavaMigration {
      public void migrate(Context context) throws Exception {
          try (PreparedStatement statement = context
                  .getConnection()
                  .prepareStatement("INSERT INTO test_user (name) VALUES ('Obelix')")) {
              statement.execute();
          }
      }
  }
  ```
- But it supports SQL-based migration files, too. We can just write a SQL file like:

  ```sql
  CREATE TABLE MyTable (
      MyColumn VARCHAR(100) NOT NULL
  );
  ```

  mount it at `/flyway/sql`, and Flyway will read it.
- We want to create an empty DB ourselves from the beginning so that everything is recorded by Flyway; this is not a must, but it makes maintenance much easier.
- We don't want our normal developers to be able to get the credentials for our clients' databases, and we don't want to use CI/CD to do the migrations. We want to build a GUI, and our clients should click a button themselves to trigger the migration.
- That means we need to trigger migrations from Python programmatically, which makes Flyway not a very good choice.
- Maybe we can use SQLAlchemy or Django migration tools instead.
- Alembic is a database migration tool written by the author of SQLAlchemy.
- Alembic uses a `.py` file to describe each migration, and the table schema is written with SQLAlchemy, like this:

  ```python
  # 1975ea83b712_create_account_table.py
  from alembic import op
  import sqlalchemy as sa

  def upgrade():
      op.create_table(
          'account',
          sa.Column('id', sa.Integer, primary_key=True),
          sa.Column('name', sa.String(50), nullable=False),
          sa.Column('description', sa.Unicode(200)),
      )

  def downgrade():
      op.drop_table('account')
  ```
- Every migration file has two functions, `upgrade` and `downgrade`, plus some other information, all written in Python.
- Migration file names start with a revision ID (like a short Git commit hash), and Alembic provides commands to move the DB to each version of the migrations, like:

  ```bash
  alembic upgrade head
  alembic upgrade 1975ea83b712
  alembic downgrade -1
  alembic upgrade 1975ea83b712+2
  ```
- There are other commands, like `alembic history`, to check the history of the DB versions.
- Alembic provides many command-line APIs, but we want to use these functions from Python programmatically (a sketch follows below).
- And this should be easy, because Alembic has a set of simple internal APIs written in Python, and all the commands are just function calls. Ref
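  A minimal sketch of that, assuming a standard `alembic.ini` plus migrations directory; the URL is a placeholder:

  ```python
  from alembic import command
  from alembic.config import Config

  cfg = Config("alembic.ini")
  # Point Alembic at the client's database instead of the one in alembic.ini.
  cfg.set_main_option("sqlalchemy.url", "postgresql://user:pass@client-host/clientdb")

  command.upgrade(cfg, "head")  # same as `alembic upgrade head`
  command.history(cfg)          # same as `alembic history`
  ```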
- Write the MSSQL part, too. (Done)
- Find out how to do an environment check while our clients set up the database connection. (Doing)
- Make the ORM functions asynchronous. (Partially done) (Ref: https://fastapi.tiangolo.com/advanced/async-sql-databases/, https://docs.sqlalchemy.org/en/14/orm/extensions/asyncio.html#synopsis-core)
- Check the source code around the `chat_message` table.
- Try to make those queries work without joining other tables.
- Do research on how to handle MySQL's VARCHAR and Postgres's TEXT fields (one possible direction is sketched below).
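  One possible direction for the VARCHAR/TEXT question: a sketch using SQLAlchemy's `with_variant`; the table and the length of 255 are illustrative, not from the repo:

  ```python
  import sqlalchemy as sa

  # Use TEXT by default, but fall back to a bounded VARCHAR on MySQL,
  # which requires a length for indexable string columns.
  PortableText = sa.Text().with_variant(sa.String(255), "mysql")

  comments = sa.Table(
      "comments",
      sa.MetaData(),
      sa.Column("id", sa.Integer, primary_key=True),
      sa.Column("body", PortableText),
  )
  ```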