MariaDB-server/mysql-test
Monty c419413ec4 MDEV-25292 Atomic CREATE OR REPLACE TABLE
Atomic CREATE OR REPLACE keeps the old table intact if the command
fails or the server crashes. This is done by renaming the original
table to a temporary name as a backup, and restoring it if the CREATE
fails. When the command is complete and logged, the backup table is
deleted.

Atomic replace algorithm

  Two DDL chains are used for CREATE OR REPLACE:
  ddl_log_state_create (C) and ddl_log_state_rm (D).

  1. (C) Log the rename of ORIG to TMP (on revert: rename TMP back to ORIG).
  2. Rename ORIG to TMP.
  3. (C) Log CREATE_TABLE_ACTION of ORIG (on revert: drop ORIG).
  4. Do everything with ORIG (for example, insert data).
  5. (D) Log drop of TMP.
  6. Write query to binlog (this marks (C) to be closed in
     case of failure).
  7. Execute drop of TMP through (D).
  8. Close (C) and (D).

  If there is a failure before 6), we revert the changes in (C).
  Chain (D) is only executed if 6) succeeded ((C) is closed on
  crash recovery).
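The eight steps can be modelled with a toy C++ sketch. It is not the server's ddl_log.cc code: the catalog is an in-memory map, the backup name `#sql-backup-<name>` is invented, failures are injected through a hypothetical `Fail` flag, and replaying a chain's logged actions newest-first is an assumption of the model.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy catalog: table name -> table definition (illustrative only).
using Catalog = std::map<std::string, std::string>;

// Toy DDL log chain: recovery actions replayed newest-first when the
// chain is executed. A closed chain is never replayed.
struct Chain {
  std::vector<std::function<void(Catalog&)>> actions;
  bool closed = false;
  void log(std::function<void(Catalog&)> a) { actions.push_back(std::move(a)); }
  void execute(Catalog& cat) {
    if (closed) return;
    for (auto it = actions.rbegin(); it != actions.rend(); ++it) (*it)(cat);
    closed = true;
  }
  void close() { closed = true; }
};

// Hypothetical failure injection points for the sketch.
enum class Fail { None, DuringCreate, BinlogWrite };

// Returns true if the new table replaced the old one, false if the old
// table was restored.
bool create_or_replace(Catalog& cat, const std::string& orig,
                       const std::string& new_def, Fail fail) {
  if (cat.find(orig) == cat.end()) {  // nothing to replace: plain CREATE
    cat[orig] = new_def;
    return true;
  }
  const std::string tmp = "#sql-backup-" + orig;  // hypothetical backup name
  Chain c, d;  // (C) ~ ddl_log_state_create, (D) ~ ddl_log_state_rm

  // 1. (C) log: on revert, rename TMP back to ORIG.
  c.log([orig, tmp](Catalog& k) { k[orig] = k.at(tmp); k.erase(tmp); });
  // 2. Rename ORIG to TMP.
  cat[tmp] = cat.at(orig);
  cat.erase(orig);
  // 3. (C) log CREATE_TABLE_ACTION: on revert, drop the new ORIG.
  c.log([orig](Catalog& k) { k.erase(orig); });
  // 4. Create ORIG and fill it.
  cat[orig] = new_def;
  if (fail == Fail::DuringCreate) { c.execute(cat); return false; }
  // 5. (D) log drop of TMP.
  d.log([tmp](Catalog& k) { k.erase(tmp); });
  // 6. Write the query to the binary log; failure reverts (C).
  if (fail == Fail::BinlogWrite) { c.execute(cat); return false; }
  // 7. Execute the drop of TMP through (D).
  d.execute(cat);
  // 8. Close (C) and (D).
  c.close();
  d.close();
  return true;
}
```

With `Fail::DuringCreate` or `Fail::BinlogWrite` the map ends up holding only the old definition under the original name; with `Fail::None` only the new definition remains and the backup entry is gone.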

Foreign key errors will be detected at stage 1).

Additional notes

  - CREATE TABLE without REPLACE, and temporary tables, are not
    affected by this commit.
    set @@drop_before_create_or_replace=1 can be used to
    get the old behaviour, in which existing tables are dropped
    by CREATE OR REPLACE.

  - CREATE TABLE is reverted if binlogging the query fails.

  - Engines that have the HTON_EXPENSIVE_RENAME flag set are not
    affected by this commit. Conflicting tables in such engines are
    dropped by CREATE OR REPLACE before the new table is created.

  - Replication execution is not affected by this commit.
    - Replication will first drop the conflicting table and then
      create the new one.

  - CREATE TABLE .. SELECT XID usage is fixed, and there is no longer
    any need to log DROP TABLE via DDL_CREATE_TABLE_PHASE_LOG (see
    comments in do_postlock()). XID is now correctly updated, so it
    disables DDL_LOG_DROP_TABLE_ACTION. Note that the binary log is
    flushed at the final stage, when the table is ready. So if the XID
    is in the binary log, we don't need to drop the table.

  - Three variations of CREATE OR REPLACE are handled:

    1. CREATE OR REPLACE TABLE t1 (..);
    2. CREATE OR REPLACE TABLE t1 LIKE t2;
    3. CREATE OR REPLACE TABLE t1 SELECT ..;

  - The test case uses 6 combinations for engines (aria, aria_notrans,
    myisam, ib, lock_tables, expensive_rename) and 2 combinations for
    binlog types (row, stmt). The combinations help to check differences
    between the results. Failures are tested for all three variations
    above.

  - expensive_rename tests CREATE OR REPLACE without atomic
    replace. The effect should be the same as with the old behaviour
    before this commit.

  - The trigger mechanism is unaffected by this change. This is tested in
    create_replace.test.

  - LOCK TABLES is affected. Lock restoration must be done after the new
    table is created or TMP is renamed back to ORIG.

  - Moved ddl_log_complete() from send_eof() to finalize_ddl(). This
    checkpoint was not executed before for normal CREATE TABLE but is
    executed now.

  - CREATE TABLE will now also roll back if writing to the binary
    log fails. See rpl_gtid_strict.test.
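The notes above list the cases in which CREATE OR REPLACE keeps the old drop-then-create behaviour. A hypothetical C++ sketch of that decision follows; `ReplaceContext`, `ReplaceMode` and `choose_replace_mode()` are illustrative names, not server code, and the InnoDB foreign key restrictions listed under Known issues are left out.

```cpp
#include <cassert>

// Hypothetical summary of the conditions from the notes above; the field
// names are illustrative, not the server's actual data structures.
struct ReplaceContext {
  bool drop_before_create_or_replace = false;  // @@drop_before_create_or_replace=1
  bool temporary_table = false;                // CREATE OR REPLACE TEMPORARY TABLE
  bool engine_expensive_rename = false;        // engine sets HTON_EXPENSIVE_RENAME
  bool replication_applier = false;            // statement applied by replication
};

enum class ReplaceMode { AtomicBackupRename, DropThenCreate };

ReplaceMode choose_replace_mode(const ReplaceContext& ctx) {
  // Any of these conditions keeps the old behaviour: the conflicting
  // table is dropped before the new one is created.
  if (ctx.drop_before_create_or_replace || ctx.temporary_table ||
      ctx.engine_expensive_rename || ctx.replication_applier)
    return ReplaceMode::DropThenCreate;
  // Otherwise the old table is renamed to a backup and restored on failure.
  return ReplaceMode::AtomicBackupRename;
}
```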

backup ddl log changes

- In case of a successful CREATE OR REPLACE, we only log
  the CREATE event, not the DROP TABLE event of the old table.

ddl_log.cc changes

  ddl_log_execute_action() now properly returns error conditions.
  ddl_log_disable_entry() was added to allow disabling a single entry.
  The entry on disk is still reserved until ddl_log_complete() is
  executed.
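A minimal C++ sketch of the difference between disabling an entry and completing a chain, assuming a toy log made of fixed slots. `ToyDdlLog` and its methods are hypothetical stand-ins for ddl_log_disable_entry() and ddl_log_complete(), not their real signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy DDL log file made of fixed slots (hypothetical model).
enum class EntryState { Free, Active, Disabled };

struct ToyDdlLog {
  std::vector<EntryState> slots;

  // Write an entry into the first free slot, growing the log if needed.
  std::size_t write_entry() {
    for (std::size_t i = 0; i < slots.size(); ++i)
      if (slots[i] == EntryState::Free) {
        slots[i] = EntryState::Active;
        return i;
      }
    slots.push_back(EntryState::Active);
    return slots.size() - 1;
  }
  // Like ddl_log_disable_entry(): the action is skipped on recovery,
  // but the slot stays reserved.
  void disable_entry(std::size_t pos) { slots[pos] = EntryState::Disabled; }
  // Like ddl_log_complete(): release every slot used by the chain.
  void complete(const std::vector<std::size_t>& chain) {
    for (std::size_t pos : chain) slots[pos] = EntryState::Free;
  }
  bool runs_on_recovery(std::size_t pos) const {
    return slots[pos] == EntryState::Active;
  }
};
```

A disabled slot is not handed out again by `write_entry()`; only `complete()` makes it reusable.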

On XID usage

  As with all other atomic DDL operations, XID is used to avoid
  inconsistency between master and slave in the case of a crash after
  the binary log is written and before ddl_log_state_create is closed.
  On recovery, XIDs are taken from the binary log and the corresponding
  DDL log events are disabled. This is done by
  ddl_log_close_binlogged_events().
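The recovery rule can be sketched in C++ as follows. `PendingChain` and `recover()` are hypothetical; the real ddl_log_close_binlogged_events() works on on-disk DDL log entries and on the XIDs taken from the binary log.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical record of one pending DDL chain found at recovery, with
// the XID that was attached to its binlog event.
struct PendingChain {
  std::uint64_t xid;
  std::string name;
  bool disabled = false;
};

// Chains whose XID reached the binary log are disabled (the query is
// already logged, so its result is kept); all others are rolled back.
std::vector<std::string> recover(std::vector<PendingChain>& chains,
                                 const std::set<std::uint64_t>& binlogged_xids) {
  std::vector<std::string> rolled_back;
  for (PendingChain& c : chains) {
    if (binlogged_xids.count(c.xid))
      c.disabled = true;
    else
      rolled_back.push_back(c.name);
  }
  return rolled_back;
}
```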

On linking two chains together

  Chains are executed in ascending order of the entry_pos of their
  execute entries. But the entry_pos assignment order is undefined: a
  bigger number may be assigned to the first chain and a smaller number
  to the second chain. In that case the execution order is reversed:
  the second chain is executed first.

  To avoid that, we link one chain to another. While the base chain
  (ddl_log_state_create) is active, the secondary chain
  (ddl_log_state_rm) is not executed. That is, only one of two linked
  chains can be executed.

  The interface ddl_log_link_chains() was defined in "MDEV-22166
  ddl_log_write_execute_entry() extension".
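A toy C++ sketch of the recovery rule described above. `ExecuteEntry`, its `linked_to` field and `recovery_order()` are hypothetical names, not the interface of ddl_log_link_chains(); the sketch only shows that a linked chain is skipped while its base chain is still active, regardless of the entry_pos order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy execute entry of a DDL log chain. `linked_to` is the index of the
// base chain this chain is linked to, or -1 if it has none.
struct ExecuteEntry {
  std::string name;
  unsigned entry_pos;
  bool active;  // chain was not closed before the crash
  int linked_to;
};

// Visit chains by ascending entry_pos. Decisions use the state found at
// crash time (entries are not modified), so running the base chain first
// cannot unblock the chain linked to it.
std::vector<std::string> recovery_order(const std::vector<ExecuteEntry>& entries) {
  std::vector<std::size_t> idx(entries.size());
  for (std::size_t i = 0; i < idx.size(); ++i) idx[i] = i;
  std::sort(idx.begin(), idx.end(), [&](std::size_t a, std::size_t b) {
    return entries[a].entry_pos < entries[b].entry_pos;
  });
  std::vector<std::string> executed;
  for (std::size_t i : idx) {
    const ExecuteEntry& e = entries[i];
    if (!e.active) continue;  // closed chains are never replayed
    if (e.linked_to >= 0 && entries[static_cast<std::size_t>(e.linked_to)].active)
      continue;  // base chain still active: skip the linked chain
    executed.push_back(e.name);
  }
  return executed;
}
```

Even when (D) gets the smaller entry_pos, it cannot drop TMP while (C) still has to rename TMP back to ORIG.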

Atomic info parameters in HA_CREATE_INFO

  Many functions in CREATE TABLE pass the same parameters. These
  parameters are part of the table creation info and should be in
  HA_CREATE_INFO (or a similar structure). Passing parameters via a
  single structure makes adding new data and refactoring much easier.
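A hypothetical C++ illustration of this refactoring: the state for the two DDL log chains travels inside one creation-info structure instead of as separate arguments. `CreateInfoSketch`, `DdlLogState` and `create_table_sketch()` are invented for the example and do not match the real HA_CREATE_INFO layout.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a DDL log chain state.
struct DdlLogState {
  bool active = false;
};

// Hypothetical creation info bundling parameters that would otherwise be
// passed separately through every CREATE TABLE helper.
struct CreateInfoSketch {
  std::string table_name;
  bool or_replace = false;
  DdlLogState* ddl_log_state_create = nullptr;  // chain (C)
  DdlLogState* ddl_log_state_rm = nullptr;      // chain (D)
};

// Before: create_table(name, or_replace, state_create, state_rm, ...).
// After: one argument; adding a field needs no signature changes along
// the call chain.
bool create_table_sketch(const CreateInfoSketch& info) {
  if (info.table_name.empty()) return false;
  if (info.or_replace && info.ddl_log_state_create)
    info.ddl_log_state_create->active = true;  // start chain (C)
  return true;
}
```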

InnoDB changes
  Added ha_innobase::can_be_renamed_to_backup() to check if
  a table with foreign keys can be renamed.

Aria changes:
- Fixed an issue in the Aria engine with CREATE + locked tables
  where, in some cases, data was not properly committed in case
  of a crash.

Other changes:
- Removed some auto variables in log.cc for better code readability.
- Fixed an old bug where CREATE ... SELECT could not auto-repair
  a table that is part of the SELECT.
- Marked MyISAM as not supporting ROLLBACK (not required, but
  done for better consistency with other engines).

Known issues:
- InnoDB tables with foreign key definitions are not fully supported
  with atomic create and replace:
  - ha_innobase::can_be_renamed_to_backup() can detect some cases
    where InnoDB does not support renaming a table with foreign key
    constraints. In this case MariaDB will drop the old table before
    creating the new one.
    The detected cases are:
    - The new and old tables use the same foreign key constraint
      name.
    - The old table has self-referencing constraints.
  - If the old and new tables use the same name for a constraint, the
    creation of the new table will fail. The original table will be
    restored in this case.
  - The above issues will be fixed in a future commit.
- CREATE OR REPLACE TEMPORARY TABLE is not fully atomic. Any conflicting
  table will always be dropped before creating a new one (old behaviour).
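A hypothetical C++ sketch of the two detected cases listed above. `ForeignKey` and `can_rename_old_table_to_backup()` are illustrative names; this does not reflect how ha_innobase::can_be_renamed_to_backup() actually inspects the InnoDB data dictionary.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical foreign key description (not an InnoDB structure).
struct ForeignKey {
  std::string constraint_name;
  std::string referenced_table;
};

// Returns false in the two detected cases, where the old table is dropped
// before the new one is created instead of being renamed to a backup.
bool can_rename_old_table_to_backup(const std::string& old_table,
                                    const std::vector<ForeignKey>& old_fks,
                                    const std::vector<ForeignKey>& new_fks) {
  for (const ForeignKey& fk : old_fks) {
    if (fk.referenced_table == old_table)
      return false;  // old table has a self-referencing constraint
    for (const ForeignKey& nfk : new_fks)
      if (nfk.constraint_name == fk.constraint_name)
        return false;  // old and new tables share a constraint name
  }
  return true;
}
```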

Bug fixes related to this MDEV:

MDEV-36435 Assertion failure in finalize_locked_tables()
MDEV-36439 Assertion `thd_arg->lex->sql_command != SQLCOM_CREATE_SEQUENCE...
MDEV-36498 Failed CoR in non-atomic mode no longer generates DROP in RBR...
MDEV-36508 Temporary files #sql-create-....frm occasionally stay after
           crash recovery

Reverted commits:
MDEV-36685 "CREATE-SELECT may lose in binlog side-effects of
stored-routine", as it did not take into account that it is safe to
clear binlogs if the created table is non-transactional and no other
non-transactional tables are used.
- This was done because it caused extra logging when it is not needed
  (not using any non-transactional tables), and it also did not solve
  side effects when using statement-based logging.
2025-12-27 14:31:51 +02:00
collections
include MDEV-25292 Atomic CREATE OR REPLACE TABLE 2025-12-27 14:31:51 +02:00
lib Merge branch '11.8' into 12.0 2025-05-22 09:22:55 +02:00
main MDEV-25292 Atomic CREATE OR REPLACE TABLE 2025-12-27 14:31:51 +02:00
std_data Merge branch '11.8' into 12.0 2025-06-18 07:50:39 +02:00
suite MDEV-25292 Atomic CREATE OR REPLACE TABLE 2025-12-27 14:31:51 +02:00
asan.supp
CMakeLists.txt
dgcov.pl
lsan.supp
mariadb-stress-test.pl
mariadb-test-run.pl Merge branch '11.8' into 12.0 2025-05-22 09:22:55 +02:00
mtr.out-of-source
purify.supp
README
README-gcov
README.stress
suite.pm
valgrind.supp

This directory contains test suites for the MariaDB server. To run
currently existing test cases, execute ./mysql-test-run in this directory.

Some tests are known to fail on some platforms or be otherwise unreliable.
In the file collections/smoke_test there is a list of tests that are
expected to be stable.

In general you do not have to do "make install", and you can have
a co-existing MariaDB installation; the tests will not conflict with it.
To run the tests in a source directory, you must do "make" first.

In Red Hat distributions, you should run the script as user "mysql".
The user is created with nologin shell, so the best bet is something like
  # su -
  # cd /usr/share/mariadb-test
  # su -s /bin/bash mysql -c ./mysql-test-run

This will use the installed MariaDB executables, but will run a private
copy of the server process (using data files within /usr/share/mariadb-test),
so you need not start the mysqld service beforehand.

You can omit --skip-test-list option if you want to check whether
the listed failures occur for you.

To clean up afterwards, remove the created "var" subdirectory, e.g.
  # su -s /bin/bash - mysql -c "rm -rf /usr/share/mariadb-test/var"

If tests fail on your system, please read the following manual section
for instructions on how to report the problem:

https://mariadb.com/kb/en/reporting-bugs

If you want to use an already running MariaDB server for specific tests,
use the --extern option to mysql-test-run. Please note that in this mode,
you are expected to provide the names of the tests to run.

For example, here is the command to run the "alias" and "analyze" tests
with an external server:

  # mariadb-test-run --extern socket=/tmp/mysql.sock alias analyze

To match your setup, you might need to provide other relevant options.

With no test names on the command line, mysql-test-run will attempt
to execute the default set of tests, which will certainly fail, because
many tests cannot run with an external server (they need to control the
options with which the server is started, restart the server during
execution, etc.)

You can create your own test cases. To create a test case, create a new
file in the main subdirectory using a text editor. The file should have a .test
extension. For example:

  # xemacs main/test_case_name.test

In the file, put a set of SQL statements that create some tables,
load test data, and run some queries to manipulate it.

Your test should begin by dropping the tables you are going to create and
end by dropping them again. This ensures that you can run the test over
and over again.

If you are using mysqltest commands in your test case, you should create
the result file as follows:

  # mariadb-test-run --record test_case_name

  or

  # mariadb-test --record < main/test_case_name.test

If you only have a simple test case consisting of SQL statements and
comments, you can create the result file in one of the following ways:

  # mariadb-test-run --record test_case_name

  # mariadb test < main/test_case_name.test > main/test_case_name.result

  # mariadb-test --record --database test --result-file=main/test_case_name.result < main/test_case_name.test

When this is done, take a look at main/test_case_name.result.
If the result is incorrect, you have found a bug. In this case, you should
edit the result file so that it contains the correct results; this lets us
verify that the bug is corrected in future releases.

If you want to submit your test case you can send it
to developers@lists.mariadb.org or attach it to a bug report on
http://mariadb.org/jira/.

If the test case is really big or if it contains 'not public' data,
then put your .test file and .result file(s) into a tar.gz archive,
add a README that explains the problem, ftp the archive to
ftp://ftp.mariadb.org/private and submit a report to
https://mariadb.org/jira about it.

The latest information about mysql-test-run can be found at:
https://mariadb.com/kb/en/mariadb/mysqltest/

If you want to create .rdiff files, check
https://mariadb.com/kb/en/mariadb/mysql-test-auxiliary-files/