Skip to content

Conversation

@marcelofern
Copy link
Contributor

@marcelofern marcelofern commented Jan 16, 2025

Background

Modern applications usually run schema changes before they deploy new
code to production. For example, in blue/green deployment, there is a
gradual transfer of traffic from the servers using the old code (blue),
to servers using the new code (green).

Once all traffic is migrated to the "green" servers, the "blue" servers
usually remain ready in case the application needs to be rolled back.

In this context, a Python/Django application that just performed a
RenameModel operation, the "blue" servers will still hold the old Django
state in which the table is believed to have the old name. Only the
"green" servers will have the correct Django state with the correct new
table name. This means that any queries performed by the "blue" server
against the model being renamed will fail.

Analogously, if the migration has to be reverted and the traffic
gradually moved from green to blue, the green servers would in turn have
the wrong Django state.

SaferRenameModelPart1

This new class performs the following SQL statements to rename a table
in a safer way than Django's default operation (RenameModel)

BEGIN;

ALTER TABLE foo RENAME TO new_foo;

CREATE VIEW foo AS SELECT * FROM new_foo;

COMMIT;

This is the first step (out of two) to rename a table in a situation
where both old code (before the change) and new code (after the change)
might be running at the same time.

The ALTER TABLE acquires a quick AccessExclusive lock, and the CREATE
VIEW AS SELECT operation is also very quick, as it only creates a "view
alias" to the renamed table. This view alias accepts all operations
(INSERT, UPDATE, DELETE, SELECT). The docs provide further explanation:

Simple views are automatically updatable: the system will allow
INSERT, UPDATE and DELETE statements to be used on the view in the same
way as on a regular table. A view is automatically updatable if it
satisfies all of the following conditions: (...) - All columns in the
view's select list must be simple references to columns of the
underlying relation. They cannot be expressions, literals or functions.
System columns cannot be referenced, either.

It doesn't matter if the table:

  • has a trigger
  • has a constraint

The view alias will work properly.

Note that this operation does not update the name of other relations
that have been created using the old table name, for example Foreign Key
constraints, M2M tables, etc. Django actually drops constraints and
recreates them from scratch during the RenameModel operation, which is
another reason to not use Django's RenameModel.

SaferRenameModelPart2

Introduce SaferRenameModelPart2

This is the second and final step for performing a safer RenameModel
operation.

Essentially, this operation is clean-up only. It allows for removing the
view created by step 1 once the user has attested that no more "blue"
servers are using the view.

It essentially performs a:

DROP VIEW IF EXISTS <view_name>;

There is one important limitation of this operation. Namely, the old/new
states are different from the Part 1 operation since Part 2 will run in a
different migration. This means that we lose access to old_model and
only have access to the new_model.

Without access to the old model, we don't know what the old table name
was. In this regard, the old_table_name parameter is required so that
the routine can know the name of the view that needs to be dropped.

@github-actions
Copy link

github-actions bot commented Jan 16, 2025

Coverage Report Results

Name Stmts Miss Branch BrPart Cover
src/django_pg_migration_tools/indexes.py 9 0 0 0 100%
src/django_pg_migration_tools/management/commands/migrate_with_timeouts.py 138 0 20 0 100%
src/django_pg_migration_tools/operations.py 541 0 108 0 100%
src/django_pg_migration_tools/timeouts.py 73 0 30 0 100%
TOTAL 761 0 158 0 100%

3 empty files skipped.

@marcelofern marcelofern force-pushed the safer-rename-model branch 2 times, most recently from ac96099 to 92e5d51 Compare January 16, 2025 21:57
@marcelofern marcelofern changed the title RESET LATER Safer Rename Model Jan 16, 2025
@marcelofern marcelofern force-pushed the safer-rename-model branch 2 times, most recently from 7a43e83 to a5c234e Compare January 17, 2025 04:53
@marcelofern marcelofern marked this pull request as ready for review January 27, 2025 22:57
@marcelofern marcelofern requested a review from a team January 27, 2025 22:57
raise TableRenameMustBeInsideTransaction(
"Can't rename a table outside a transaction. Please set "
"atomic = True on the migration."
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@timb07 timb07 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure about the status of this:

Note that this operation does not update the name of other relations
that have been created using the old table name, for example Foreign Key
constraints, M2M tables, etc. Django actually drops constraints and
recreates them from scratch during the RenameModel operation, which is
another reason to not use Django's RenameModel.

If this operation leaves FKs (etc.) in place, unlike Django, that should be called out quite explicitly.

In any case, there should be some addition to the docs explaining how FKs (etc.) will be handled, or what to expect to see after a table with a FK pointing to it is renamed (e.g. with an example).

@marcelofern
Copy link
Contributor Author

If this operation leaves FKs (etc.) in place, unlike Django, that should be called out quite explicitly.
@timb07

I've added a few more clarifications here: 5548782

In any case, there should be some addition to the docs explaining how FKs (etc.) will be handled, or what to expect to see after a table with a FK pointing to it is renamed (e.g. with an example).

To clarify, there's no handling of FKs by this process. It's not needed
because there's no change in database state. Particularly, the

ALTER TABLE ... RENAME

operation will handle all objects linked to the table if the table name
is changed. Postgres gives us that for free.

I think that the problem here is that Django actually does more than it should.
Namely, dropping constraints and recreating them from scratch so that they can
have a fresh name, that is costly for FKs. In this case, that's explained in
the commit that adds the docs here which includes the output of the DDLs
performed by Django. Hope that clarifies?

Modern applications usually run schema changes before they deploy new
code to production. For example, in blue/green deployment, there is a
gradual transfer of traffic from the servers using the old code (blue),
to servers using the new code (green).

Once all traffic is migrated to the "green" servers, the "blue" servers
usually remain ready in case the application needs to be rolled back.

In this context, a Python/Django application that just performed a
RenameModel operation, the "blue" servers will still hold the old Django
state in which the table is believed to have the old name. Only the
"green" servers will have the correct Django state with the correct new
table name. This means that any queries performed by the "blue" server
against the model being renamed will fail.

Analogously, if the migration has to be reverted and the traffic
gradually moved from green to blue, the green servers would in turn have
the wrong Django state.

This new class performs the following SQL statements to rename a table
in a safer way than Django's default operation (RenameModel)

  BEGIN;

  ALTER TABLE foo RENAME TO new_foo;

  CREATE VIEW foo AS SELECT * FROM new_foo;

  COMMIT;

This is the first step (out of two) to rename a table in a situation
where both old code (before the change) and new code (after the change)
might be running at the same time.

The ALTER TABLE acquires a quick AccessExclusive lock, and the CREATE
VIEW AS SELECT operation is also very quick, as it only creates a "view
alias" to the renamed table. This view alias accepts all operations
(INSERT, UPDATE, DELETE, SELECT). The docs provide further explanation:

> Simple views are automatically updatable: the system will allow
> INSERT, UPDATE and DELETE statements to be used on the view in the same
> way as on a regular table. A view is automatically updatable if it
> satisfies all of the following conditions: (...) - All columns in the
> view's select list must be simple references to columns of the
> underlying relation. They cannot be expressions, literals or functions.
> System columns cannot be referenced, either.

It doesn't matter if the table:

- has a trigger
- has a constraint

The view alias will work properly.

Note that this operation does not update the name of other relations
that have been created using the old table name, for example Foreign Key
constraints, M2M tables, etc. Django actually drops constraints and
recreates them from scratch during the RenameModel operation, which is
another reason to not use Django's RenameModel.
This is the second and final step for performing a safer RenameModel
operation.

Essentially, this operation is clean-up only. It allows for removing the
view created by step 1 once the user has attested that no more "blue"
servers are using the view.

It essentially performs a:

  DROP VIEW IF EXISTS <view_name>;

There is one important limitation of this operation. Namely, the old/new
states are different from the Part 1 operation since Part 2 will run in a
different migration. This means that we lose access to `old_model` and
only have access to the `new_model`.

Without access to the old model, we don't know what the old table name
was. In this regard, the old_table_name parameter is required so that
the routine can know the name of the view that needs to be dropped.
Copy link
Contributor

@timb07 timb07 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the very delayed re-review.

Most of my previous concerns have been addressed — thanks. But the SQL in the docs needs fixing as noted in order for it to address the point that the way Django does things results in constraint names changing (which the SaferRenameModel operations don't mimic).

Comment on lines +1074 to +1077
SET CONSTRAINTS "myapp_bar_foo_id_f5927bae_fk_myapp_newfoo_id" IMMEDIATE;
ALTER TABLE "myapp_bar" DROP CONSTRAINT "myapp_bar_foo_id_f5927bae_fk_myapp_newfoo_id";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would the constraint name myapp_bar_foo_id_f5927bae_fk_myapp_newfoo_id include "newfoo" at this point? (I suspect that this example has been put together by hand.)

From here:

I think that the problem here is that Django actually does more than it should.
Namely, dropping constraints and recreating them from scratch so that they can
have a fresh name, that is costly for FKs. In this case, that's explained in
the commit that adds the docs here which includes the output of the DDLs
performed by Django. Hope that clarifies?

I would expect to see a different constraint name after Django drops the old one and makes a new one. So no, the example here doesn't currently clarify :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants