Respect strict/non-strict MySQL modes #29

Merged
merged 16 commits into from
Mar 25, 2025

@JanJakes JanJakes commented Mar 13, 2025

Summary

This PR implements the emulation of MySQL non-strict mode (disabled STRICT_TRANS_TABLES/STRICT_ALL_TABLES) in SQLite, and its modification and retrieval using SET/SELECT queries.

The non-strict mode is a requirement for WPDB, but in other environments (phpMyAdmin, etc.), it's likely to be left to the MySQL default, which is STRICT_TRANS_TABLES (strict mode for InnoDB tables).

Closes #17.

Strict vs. non-strict mode

The implementation was done in the PHP SQLite driver code, and from a high-level perspective, it does the following:

  1. In non-strict mode, INSERT INTO table (optionally some columns) <select-or-values> is rewritten to INSERT INTO table (all table columns) SELECT <non-strict-mode-adjusted-values> FROM (<select-or-values>).
  2. In non-strict mode, UPDATE table SET column = <value> is rewritten to UPDATE table SET column = COALESCE(<value>, <implicit-default>), when the column is non-nullable.
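
To make the two rewrites concrete, here is a minimal sketch run directly against SQLite through Python's sqlite3 module. The table t, its columns, and the exact rewritten SQL are assumptions for illustration; the real driver composes these statements in PHP and may emit a different form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT NOT NULL, score INTEGER NOT NULL)"
)

# MySQL input (non-strict mode): INSERT INTO t (id) VALUES (1)
# Assumed rewritten form: all table columns are listed, and the omitted
# NOT NULL columns receive their implicit defaults ('' and 0).
conn.execute(
    "INSERT INTO t (id, name, score) "
    "SELECT column1, '', 0 FROM (VALUES (1)) WHERE true"
)
after_insert = conn.execute("SELECT id, name, score FROM t").fetchall()

conn.execute("UPDATE t SET name = 'Jan' WHERE id = 1")

# MySQL input (non-strict mode): UPDATE t SET name = NULL
# Assumed rewritten form: NULL falls back to the implicit default.
conn.execute("UPDATE t SET name = COALESCE(NULL, '')")
after_update = conn.execute("SELECT id, name, score FROM t").fetchall()
```

Without the rewrite, both original statements would violate the NOT NULL constraints in SQLite.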

Details

When zooming into the details, the logic described above is quite involved and has to handle the following:

  • We need to read the column metadata to determine the implicit defaults.
  • An INSERT statement may list no fields, or any fields in any order.
  • With INSERT INTO ... SELECT ... we don't know what fields the SELECT returns, and we need to figure that out.
  • With UPDATE statements, a NULL may set a NULL value or an implicit default, depending on the column definition.
  • All of the above should work for temporary tables as well.

SET/SELECT SQL mode

The PR also implements the following statements:

  • SET for session sql_mode value.
  • SELECT for session sql_mode value.

The SET statement is a surprisingly complex one, with various syntax flavors and multi-command support. Statements like SET @var = '...', SESSION sql_mode = '...', @@GLOBAL.time_zone = '...', @@debug = '...', ... are valid, including strange semantics, such as carrying the variable type keywords (SESSION, etc.) through to the subsequent definitions in the same statement.

WPDB

This PR also handles $wpdb->set_sql_mode() for the SQLite driver, as well as initializing the SQL mode for each connection.

Strict/non-strict behavior

Here's a summary of the implemented strict vs. non-strict mode behavior:

When STRICT_TRANS_TABLES or STRICT_ALL_TABLES is enabled:
  1. NULL + NO DEFAULT:     No value saves NULL, NULL saves NULL, DEFAULT saves NULL.
  2. NULL + DEFAULT:        No value saves DEFAULT, NULL saves NULL, DEFAULT saves DEFAULT.
  3. NOT NULL + NO DEFAULT: No value is rejected, NULL is rejected, DEFAULT is rejected.
  4. NOT NULL + DEFAULT:    No value saves DEFAULT, NULL is rejected, DEFAULT saves DEFAULT.

When STRICT_TRANS_TABLES and STRICT_ALL_TABLES are disabled:
  1. NULL + NO DEFAULT:     No value saves NULL, NULL saves NULL, DEFAULT saves NULL.
  2. NULL + DEFAULT:        No value saves DEFAULT, NULL saves NULL, DEFAULT saves DEFAULT.
  3. NOT NULL + NO DEFAULT: No value saves IMPLICIT DEFAULT.
                            NULL is rejected on INSERT, but saves IMPLICIT DEFAULT on UPDATE.
                            DEFAULT saves IMPLICIT DEFAULT.
  4. NOT NULL + DEFAULT:    No value saves DEFAULT.
                            NULL is rejected on INSERT, but saves IMPLICIT DEFAULT on UPDATE.
                            DEFAULT saves DEFAULT.
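
The matrix above can also be read as a small decision function. This is only a sketch of the summary itself, not the driver code; the function name resolve and its string arguments are hypothetical.

```python
def resolve(nullable, has_default, strict, op, value):
    """Return what gets stored for one column.

    op is "insert" or "update"; value is "omitted", "null", or "default"
    (the DEFAULT keyword). All names here are illustrative only.
    """
    if value == "null":
        if nullable:
            return "NULL"
        # NOT NULL column: only a non-strict UPDATE falls back to the implicit default.
        if not strict and op == "update":
            return "IMPLICIT DEFAULT"
        return "REJECTED"
    # Omitted value or explicit DEFAULT keyword.
    if has_default:
        return "DEFAULT"
    if nullable:
        return "NULL"
    return "REJECTED" if strict else "IMPLICIT DEFAULT"
```

For example, resolve(False, False, False, "update", "null") gives "IMPLICIT DEFAULT", while the same call with op="insert" gives "REJECTED".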

Other paths considered and attempted

Initially, I started the implementation with the presumption that handling the strict vs. non-strict mode correctly and dynamically on the PHP driver side would be very hard, and I tried to implement this using other approaches at first.

1. DEFAULT and CHECK expressions

SQLite supports expressions in DEFAULT definitions for columns, as well as in CHECK constraints. The natural first direction would be to make a default conditional to emulate the non-strict mode, or, conversely, make a constraint check conditional to emulate the strict mode.

It seems elegant in theory, but it doesn't work because both DEFAULT and CHECK expressions need to be "constant", which means they can't reference any other columns or tables, and they can only use deterministic functions, that is, functions that always return the same value during a single session (except for date/time functions that are a special case).

This means using these expressions to apply strict/non-strict mode conditionally is not possible (and it's actually not possible to use non-deterministic user-defined functions in other cases either).
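
As a small illustration of the "constant" restriction, the following sketch tries to define a DEFAULT that references another column. The table and column names are made up for the example, and it only covers the DEFAULT case, not CHECK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
try:
    # A DEFAULT expression may not reference another column of the row.
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER DEFAULT (a))")
    default_error = None
except sqlite3.OperationalError as e:
    default_error = str(e)

print(default_error)
```

SQLite rejects the table definition itself, so there is no place to hook in a per-row or per-session condition.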

2. Triggers

The idea here was to use before or after INSERT and UPDATE triggers to modify the new column values to emulate the MySQL implicit defaults. For each table, a specifically tailored trigger could be composed at the time of the table creation or modification.

This turned out to be problematic, and actually nearly impossible without additional complex or dirty hacks because:

  1. SQLite doesn't support modifying the NEW row values in triggers. We can overcome this by using AFTER triggers and updating the values.
  2. It may be tricky to keep the trigger up-to-date across table modifications, renames, etc., although definitely solvable.
  3. There is no way to distinguish an explicit NULL value from an omitted value in an INSERT from within a trigger. This is a dealbreaker, although solvable with views (see below).
  4. There is no way to pass any runtime per-session information into the trigger, so there is no way to tell the trigger what the current SQL mode is. There are no user variables in SQLite, temporary tables can't be referenced from non-temporary triggers, non-deterministic user-defined functions are not allowed in triggers, and user-modifiable pragmas (application_id) are not session-specific. Using a real table for this seemed a bit too hacky to me. However, this can be overcome with views (see below).

The prototype can be found in the following draft PR: #30

3. Views & triggers

SQLite supports views, and even though they are not writable, we can use INSTEAD OF triggers to hijack the values from the INSERT or UPDATE query, and perform custom logic instead. This approach is very similar to the previous one with triggers only, and it solves its main pain points:

  1. While we can't pass any runtime information into the triggers, we can choose at runtime whether we'll execute the INSERT or UPDATE against the original table or against a view that implements "non-strict" triggers.
  2. Distinguishing an explicit NULL value from an omitted one can be done with a hack. The view can define an extra column with a default value, and we can then use that column to sneak in a string of column names that were explicitly used in the insert statement. It's not a nice solution, but it works.

This approach seemed to work, until I ran the full test case and hit the following error: SQLITE_ERROR: sqlite3 result code 1: cannot UPSERT a view. So that was it for views.
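
A minimal reproduction of that limitation, assuming a simplified view and INSTEAD OF trigger (the names t_non_strict and t_non_strict_insert are hypothetical, and the prototype in the draft PR is more elaborate):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE VIEW t_non_strict AS SELECT * FROM t;
    CREATE TRIGGER t_non_strict_insert INSTEAD OF INSERT ON t_non_strict
    BEGIN
        -- Emulate the implicit default for the NOT NULL column.
        INSERT INTO t (id, name) VALUES (NEW.id, COALESCE(NEW.name, ''));
    END;
""")

# A plain INSERT through the view works and stores the implicit default.
conn.execute("INSERT INTO t_non_strict (id, name) VALUES (1, NULL)")
rows = conn.execute("SELECT id, name FROM t").fetchall()

# Adding an UPSERT clause to the same kind of statement is refused.
try:
    conn.execute(
        "INSERT INTO t_non_strict (id, name) VALUES (1, 'x') ON CONFLICT DO NOTHING"
    )
    upsert_error = None
except sqlite3.OperationalError as e:
    upsert_error = str(e)

print(rows, upsert_error)
```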

The prototype can be found in the following draft PR: #31

@JanJakes JanJakes force-pushed the strict-sql-mode branch 10 times, most recently from a70adb8 to bbb6e70 Compare March 21, 2025 14:21
@JanJakes JanJakes changed the title Respect strict SQL modes Respect strict/non-strict MySQL modes Mar 21, 2025
@JanJakes JanJakes linked an issue Mar 21, 2025 that may be closed by this pull request
@JanJakes JanJakes marked this pull request as ready for review March 21, 2025 16:03
@JanJakes JanJakes requested a review from adamziel March 21, 2025 16:03
@adamziel
Contributor

The approach taken here is a good compromise. I'm sorry none of the other paths worked. One thing that bothers me is the proliferation of ways to insert and update data in MySQL. Let's make sure all our bases are covered. Here are a few more query types I can think of:

  • REPLACE
  • LOAD DATA and LOAD XML: we can just bail out on these, but I want to point them out
  • CREATE TABLE AS SELECT
  • DO SELECT
  • ...potentially more

We don't have to actually support all of these on day 1, but it would be useful to at least display a warning when running into an unsupported combination of operation + sql mode.

@adamziel
Contributor

The SET statement is a surprisingly complex one, with various syntax flavors and multi-command support. Statements like SET @var = '...', SESSION sql_mode = '...', @@GLOBAL.time_zone = '...', @@debug = '...', ... are valid, including strange semantics, such as carrying the variable type keywords (SESSION, etc.) through to the subsequent definitions in the same statement.

Would this also support things like SELECT (@last_id := id) FROM wp_posts?

Also, there are at least a few scenarios where we could reasonably bail out, e.g. most hosts would likely not allow a SET that updates mysqld-auto.cnf.

* See:
* https://dev.mysql.com/doc/refman/8.4/en/data-type-defaults.html#data-type-defaults-implicit
*/
const DATA_TYPE_IMPLICIT_DEFAULT_MAP = array(
Contributor

really good find

}
}

// TODO: Support user variables (in-memory or a temporary table).
Contributor

@adamziel Mar 24, 2025

let's communicate lack of support with a warning or error to at least make the developer aware why their code breaks

Contributor Author

👍 Added in d6f6b6d.

}
}

// TODO: Handle GLOBAL, PERSIST, and PERSIST_ONLY types.
Contributor

Let's bail out with an error to make sure we don't give a false illusion of these queries actually working. It should also make it easier for us to spot any plugins relying on these semantics.

Contributor Author

👍 Resolved in d6f6b6d.

@@ -2592,6 +2877,203 @@ private function translate_show_like_or_where_condition( WP_Parser_Node $like_or
return '';
}

/**
* Translate INSERT body, emulating MySQL implicit defaults in non-strict mode.
Contributor

great comment ❤️ it makes the intended purpose very clear 🚀 I want to highlight that for everyone, cc @bgrgicak @brandonpayton @akirk @zaerl

Contributor

@adamziel Mar 24, 2025

The only thing I'd add is the INSERT before the translation and the expected INSERT after the translation – same as in the PR description

Contributor Author

👍 Added the input/output in f922251.

$select_list[] = $stmt->getColumnMeta( $i )['name'];
}
} else {
// When inserting from a VALUES list, SQLite uses "columnN" naming.
Contributor

TIL
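
For readers who want to see the "columnN" naming, here is a quick check through Python's sqlite3 module. It is only an illustration of the SQLite behavior, not part of the driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Select from a bare VALUES list and inspect the result column names.
cursor = conn.execute("SELECT * FROM (VALUES (1, 'a'), (2, 'b'))")
value_columns = [description[0] for description in cursor.description]

print(value_columns)
```

These generated names are what lets the rewritten INSERT refer to the individual values of a VALUES list from the outer SELECT.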

Contributor

@adamziel left a comment

I've left a few nitpicks around being very strict with errors in cases we don't support. The PR is solid, the approach well-researched, and the reasoning clear. It's always a pleasure to review these SQL PRs, thank you for your great work @JanJakes !

@JanJakes
Contributor Author

@adamziel Thanks!

Would this also support things like SELECT (@last_id := id) FROM wp_posts?

Probably not, these seem to be part of the expression grammar, and not part of the SET statement.

Also, there are at least a few scenarios where we could reasonably bail out, e.g. most hosts would likely not allow a SET that updates mysqld-auto.cnf.

True, we'll need to go through a list of the variables at some point.

@JanJakes
Contributor Author

@adamziel

Great point to check whether all ways of data manipulation are covered or disabled!

Let's make sure all our bases are covered – here's a few more query types I can think of:

REPLACE
LOAD DATA and LOAD XML: we can just bail out on these, but I want to point them out
CREATE TABLE AS SELECT
DO SELECT
...potentially more

REPLACE actually works, I forgot to reflect that in the test, adding a test now (commit).

LOAD is not supported at all at the moment, I think that will just fail.

CREATE TABLE AS SELECT is explicitly disabled for now, as we don't support creating information schema data that way.

Can a DO statement insert any data? I don't know if there is any other way to insert data. I also consulted ChatGPT, and it doesn't seem to know of any other ways either.


This made me wonder whether there are any other variants of the UPDATE statement.

I found one I didn't consider before: INSERT ... ON DUPLICATE KEY UPDATE. So I first tried it in MySQL, and to my surprise, MySQL doesn't consider the non-strict mode in the ON DUPLICATE KEY UPDATE clause:

SET SESSION sql_mode = '';

CREATE TABLE t1 (id INT PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO t1 (id) VALUES (1);

-- this works (implicit default)
UPDATE t1 SET name = NULL;

-- this fails with "ERROR 1048 (23000) at line 6: Column 'name' cannot be null"
INSERT INTO t1 (id) VALUES (1) ON DUPLICATE KEY UPDATE name = NULL;

So that's great, since it already behaves that way, and I only added a test.

@adamziel
Contributor

I first tried it in MySQL, and to my surprise, MySQL doesn't consider the non-strict mode in the ON DUPLICATE KEY UPDATE clause

Oh MySQL 🙈

try {
$this->execute_sqlite_query( 'DELETE FROM sqlite_sequence WHERE name = ?', array( $table_name ) );
} catch ( PDOException $e ) {
if ( str_contains( $e->getMessage(), 'no such table' ) ) {
Contributor

Would checking the error code be more reliable here?

Contributor Author

@adamziel For some reason, I'm not getting the correct SQLSTATE back: SQLSTATE[HY000]: General error: 1 no such table: sqlite_sequence

It may be something about the PDO driver in SQLite. The correct SQLSTATE should be 42S02.

Comment on lines +2894 to +2898
* Rewrites a statement body in the following form:
* INSERT INTO table (optionally some columns) <select-or-values>
* To a statement body with the following structure:
* INSERT INTO table (all table columns)
* SELECT <non-strict-mode-adjusted-values> FROM (<select-or-values>) WHERE true
Contributor

❤️

// 4. Compose a new INSERT field list with all columns from the table.
$fragment = '(';
foreach ( $columns as $i => $column ) {
$fragment .= $i > 0 ? ', ' : '';
Contributor

Nothing actionable for this PR but man, I wish we had a more convenient way of building these SQL queries – both for us here and for WordPress core. cc @dmsnell – we've discussed this recently

Contributor Author

@adamziel Just yesterday, I was trying to brainstorm with ChatGPT about a lightweight but powerful SQL builder API in PHP. PHP is quite limited for these types of APIs, especially with typing (we'll never be able to do anything like Drizzle ORM), but it would definitely be great to have something like that for WordPress.

$values = 'insertFromConstructor' === $node->rule_name
? $node->get_first_child_node( 'insertValues' )
: $node->get_first_child_node( 'queryExpressionOrParens' );
$fragment .= ' FROM (' . $this->translate( $values ) . ') WHERE true';
Contributor

Why do we need WHERE true? Is this stitched with AND ... later on? Also, TIL we can do a subquery like that without giving it an alias.

Contributor Author

@adamziel It was failing without that, and then I found it documented:

[Image: Screenshot 2025-03-25 at 15 47 05]

Contributor

TIL. Let's leave a comment with a link inline – it's the kind of thing that seems too easy to remove when not careful

Contributor Author

@adamziel 👍 Added in 3faa553.
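
The content of the screenshot is not recoverable here. Assuming it refers to the parsing ambiguity described in the SQLite UPSERT documentation (an ON after a FROM clause can be read as a join constraint instead of the start of ON CONFLICT), the effect of WHERE true can be sketched as follows. The table t and the statements are illustrative only, not the SQL the driver generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO t (id, name) VALUES (1, 'old')")

insert = "INSERT INTO t (id, name) SELECT column1, column2 FROM (VALUES (1, 'new'))"
upsert = " ON CONFLICT (id) DO UPDATE SET name = excluded.name"

# Without a WHERE clause, the ON after the FROM clause is ambiguous.
try:
    conn.execute(insert + upsert)
    error_without_where = None
except sqlite3.Error as e:
    error_without_where = str(e)

# With WHERE true, the ON can only start the UPSERT clause.
conn.execute(insert + " WHERE true" + upsert)
rows = conn.execute("SELECT id, name FROM t").fetchall()

print(error_without_where, rows)
```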

Co-authored-by: Adam Zieliński <[email protected]>
@JanJakes JanJakes merged commit 4074c52 into develop Mar 25, 2025
10 checks passed
Development

Successfully merging this pull request may close these issues.

Support disabling STRICT_TRANS_TABLES and STRICT_ALL_TABLES
2 participants