- Removed duplicate words, like "the the" and "to to"
- Removed duplicate lines (one double sort line found in mysql.cc)
- Fixed some typos found while searching for duplicate words.
Command used to find duplicate words:
egrep -rI "\s([a-zA-Z]+)\s+\1\s" | grep -v param
Thanks to Artjoms Rimdjonoks for the command and pointing out the
spelling errors.
main/statistics_json.result is updated for f8ba5ced55 (MDEV-36099)
The test uses 'delete from t1' in many places and then populates
the table again. The natural order of rows in a MyISAM table is well
defined and the test was implicitly relying on that.
before f8ba5ced55 delete was deleting rows one by one, using
ha_myisam::delete_row() because the connection was stuck in rbr mode.
This caused rows to be shown in the reverse insertion order (because of
the delete link list).
MDEV-36099 fixes this bug and the server now correctly uses
ha_myisam::delete_all_rows(). This makes rows to be shown in the
insertion order as expected.
MDEV-35499 Errored-out CREATE-or-REPLACE-SELECT does not log DROP table into binlog
MDEV-35502 Failed at ROW-format binlogging CREATE-TABLE-SELECT should
not generate Incident event
When a CREATE TABLE .. SELECT errors while inserting data, a user
would expect that all changes are rolled back
and the table would not exist after executing the query.
However CREATE-TABLE-SELECT can face an error near the end of its execution
select_create::send_eof() so that the error was never checked which
led to various assert inside binlogging path that should not be
attended at all.
Specifically when binlog_commit() of ha_commit_one_phase() that
CREATE-TABLE-SELECT employs errored out because of a limited cache size
(binlog_commit may try writing to a transactional cache) the cache
was not flushed to binlog. The missed error check allowed further
execution down to trans_commit_implicit() in whose stack
DBUG_ASSERT(!(entry->using_trx_cache && !mngr->trx_cache.empty() &&
mngr->get_binlog_cache_log(TRUE)->error));
fired. In a non-debug build that table remains created/populated
inconsistently with binlog.
The fixes need and install the error checking in select_create::send_eof().
That prevents from any further execution when ha_commit_one_phase() fails
for any reason (typically due to binlog_commit()).
This commit also covers CREATE-or-REPLACE-SELECT that additionally had
a specific issue in that DROP TABLE was not logged the binary log, MDEV-35499.
See changes select_create::abort_result_set().
The current commit also corrects an unnecessary Incident event
logging when CREATE-TABLE-SELECT encounters a binloging issue, MDEV-35502.
The Incident was actually only harmful in this case as the table was
never going to be created, therefore replicated, in such a case.
In "normal" cases when the SELECT phase errors due to binlogging, an
internal incident flag gets reset inside select_create::abort_result_set().
A hunk in select_insert::prepare_eof() addresses a specific kind of
this issue that deals with incorrect computation of the binlog cache type.
Because of that in the OLD version execution was allowed to proceed along
ha_commit_trans()..binlog_commit() while a Pending event was not
flushed to the transactional cache. That might lead to the unnecessary
binlogged Incident despite the select_create::abort_result_set()
measures. However now with the corrected cache type any binlogging error
to flush the Pending event is covered according to the normal case.
non-transaction table, updates to the non-transactional table
NOTE the commit contains few tests overlapping with unfixed yet MDEV-36027.
Thanks to Brandon Nesterenko and Kristian Nielsen for thorough review,
and Kristian additionally for ideas to simplify the patch and some
code contribution.
into a separate transaction_participant structure
handlerton inherits it, so handlerton itself doesn't change.
but entities that only need to participate in a transaction,
like binlog or online alter log, use a transaction_participant
and no longer need to pretend to be a full-blown but invisible
storage engine which doesn't support create table.
for large transaction
Description
===========
When a transaction commits, it copies the binlog events from
binlog cache to binlog file. Very large transactions
(eg. gigabytes) can stall other transactions for a long time
because the data is copied while holding LOCK_log, which blocks
other commits from binlogging.
The solution in this patch is to rename the binlog cache file to
a binlog file instead of copy, if the commiting transaction has
large binlog cache. Rename is a very fast operation, it doesn't
block other transactions a long time.
Design
======
* binlog_large_commit_threshold
type: ulonglong
scope: global
dynamic: yes
default: 128MB
Only the binlog cache temporary files large than 128MB are
renamed to binlog file.
* #binlog_cache_files directory
To support rename, all binlog cache temporary files are managed
as normal files now. `#binlog_cache_files` directory is in the same
directory with binlog files. It is created at server startup if it doesn't
exist. Otherwise, all files in the directory is deleted at startup.
The temporary files are named with ML_ prefix and the memorary address
of the binlog_cache_data object which guarantees it is unique.
* Reserve space
To supprot rename feature, It must reserve enough space at the
begin of the binlog cache file. The space is required for
Format description, Gtid list, checkpoint and Gtid events when
renaming it to a binlog file.
Since binlog_cache_data's cache_log is directly accessed by binlog log,
online alter and wsrep. It is not easy to update all the code. Thus
binlog cache will not reserve space if it is not session binlog cache or
wsrep session is enabled.
- m_file_reserved_bytes
Stores the bytes reserved at the begin of the cache file.
It is initialized in write_prepare() and cleared by reset().
The reserved file header is hide to callers. Thus there is no
change for callers. E.g.
- get_byte_position() still get the length of binlog data
written to the cache, but not the file length.
- truncate(0) will truncate the file to m_file_reserved_bytes but not 0.
- write_prepare()
write_prepare() is called everytime when anything is being written
into the cache. It will call init_file_reserved_bytes() to create
the cache file (if it doesn't exist) and reserve suitable space if
the data written exceeds buffer's size.
* Binlog_commit_by_rotate
It is used to encapsulate the code for remaing a binlog cache
tempoary file to binlog file.
- should_commit_by_rotate()
it is called by write_transaction_to_binlog_events() to check if
a binlog cache should be rename to a binlog file.
- commit()
That is the entry to rename a binlog cache and commit the
transaction. Both rename and commit are protected by LOCK_log,
Thus not other transactions can write anything into the renamed
binlog before it.
Rename happens in a rotation. After the new binlog file is generated,
replace_binlog_file() is called to:
- copy data from the new binlog file to its binlog cache file.
- write gtid event.
- rename the binlog cache file to binlog file.
After that the rotation will continue to succeed. Then the transaction
is committed in a seperated group itself. Its cache file will be
detached and cache log will be reset before calling
trx_group_commit_with_engines(). Thus only Xid event be written.
PURGE BINARY LOGS did not always purge binary logs. This commit fixes
some of the issues and adds notifications if a binary log cannot be
purged.
User visible changes:
- 'PURGE BINARY LOG TO log_name' and 'PURGE BINARY LOGS BEFORE date'
worked differently. 'TO' ignored 'slave_connections_needed_for_purge'
while 'BEFORE' did not. Now both versions ignores the
'slave_connections_needed_for_purge variable'.
- 'PURGE BINARY LOG..' commands now returns 'note' if a binary log cannot
be deleted like
Note 1375 Binary log 'master-bin.000004' is not purged because it is
the current active binlog
- Automatic binary log purges, based on date or size, will write a
note to the error log if a binary log matching the size or date
cannot yet be deleted.
- If 'slave_connections_needed_for_purge' is set from a config or
command line, it is set to 0 if Galera is enabled and 1 otherwise
(old default). This ensures that automatic binary log purge works
with Galera as before the addition of
'slave_connections_needed_for_purge'.
If the variable is changed to 0, a warning will be printed to the error
log.
Code changes:
- Added THD argument to several purge_logs related functions that needed
THD.
- Added 'interactive' options to purge_logs functions. This allowed
me to remove testing of sql_command == SQLCOM_PURGE.
- Changed purge_logs_before_date() to first check if log is applicable
before calling can_purge_logs(). This ensures we do not get a
notification for logs that does not match the remove criteria.
- MYSQL_BIN_LOG::can_purge_log() will write notifications to the user
or error log if a log cannot yet be removed.
- log_in_use() will return reason why a binary log cannot be removed.
Changes to keep code consistent:
- Moved checking of binlog_format for Galera to be after Galera is
initialized (The old check never worked). If Galera is enabled
we now change the binlog_format to ROW, with a warning, instead of
aborting the server. If this change happens a warning will be printed to
the error log.
- Print a warning if Galera or FLASHBACK changes the binlog_format
to ROW. Before it was done silently.
Reviewed by: Sergei Golubchik <serg@mariadb.com>,
Kristian Nielsen <knielsen@knielsen-hq.org>
Similar to #2480.
567b681 introduced safe_strcpy() to minimize the use of C with
potentially unsafe memory overflow with strcpy() whose use is
discouraged.
Replace instances of strcpy() with safe_strcpy() where possible, limited
here to files in the `sql/` directory.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.
Some fixes related to commit f838b2d799 and
Rows_log_event::do_apply_event() and Update_rows_log_event::do_exec_row()
for system-versioned tables were provided by Nikita Malyavin.
This was required by test versioning.rpl,trx_id,row.
Starting with clang-16, MemorySanitizer appears to check that
uninitialized values not be passed by value nor returned.
Previously, it was allowed to copy uninitialized data in such cases.
get_foreign_key_info(): Remove a local variable that was passed
uninitialized to a function.
DsMrr_impl: Initialize key_buffer, because DsMrr_impl::dsmrr_init()
is reading it.
test_bind_result_ext1(): MYSQL_TYPE_LONG is 32 bits, hence we must
use a 32-bit type, such as int. sizeof(long) differs between
LP64 and LLP64 targets.
binlog_space_limit is a variable in Percona server used to limit the total
size of all binary logs.
This implementation is based on code from Percona server 5.7.
In MariaDB we decided to call the variable max-binlog-total-size to be
similar to max-binlog-size. This makes it easier to find in the output
from 'mariadbd --help --verbose'). MariaDB will also support
binlog_space_limit for compatibility with Percona.
Some internal notes to explain implementation notes:
- When running MariaDB does not delete binary logs that are either
used by slaves or have active xid that are not yet committed.
Some implementation notes:
- max-binlog-total-size is by default 0 (no limit).
- max-binlog-total-size can be changed without server restart.
- Binlog file sizes are checked on startup, or if
max-binlog-total-size is set to a value > 0, not for every log write.
The total size of all binary logs is cached and dynamically updated
when updating the binary log on binary log rotation.
- max-binlog-total-size is checked against existing log files during
serverstart, binlog rotation, FLUSH LOGS, when writing to binary log
or when max-binlog-total-size changes value.
- Option --slave-connections-needed-for-purge with 1 as default added.
This allows one to ensure that we do not delete binary logs if there
is less than 'slave-connections-needed-for-purge' connected.
Without this option max-binlog-total-size would potentially delete
binlogs needed by slaves on server startup or when a slave disconnects
as there are then no connected slaves to protect active binlogs.
- PURGE BINARY LOGS TO ... will be executed as if
slave-connectitons-needed-for-purge would be zero. In other words
it will do the purge even if there is no slaves connected. If there
are connected slaves working on the logs, these will be protected.
- If binary log is on and max-binlog-total_size <> 0 then the status
variable 'Binlog_disk_use' shows the current size of all old binary
logs + the state of the current one.
- Removed test of strcmp(log_file_name, log_info.log_file_name) in
purge_logs_before_date() as this is tested in can_purge_logs()
- To avoid expensive calls of log_in_use() we cache the result for the
last log that is in use by a slave. Future calls to can_purge_logs()
for this binary log will be quickly detected and false will be returned
until a slave starts working on a new log.
- Note that after a binary log rotation caused by max_binlog_size,
the last log will not be purged directly as it is still in use
internally. The next binary log write will purge binlogs if needed.
Reviewer:Kristian Nielsen <knielsen@knielsen-hq.org>
Improve the performance of slave connect using B+-Tree indexes on each binlog
file. The index allows fast lookup of a GTID position to the corresponding
offset in the binlog file, as well as lookup of a position to find the
corresponding GTID position.
This eliminates a costly sequential scan of the starting binlog file
to find the GTID starting position when a slave connects. This is
especially costly if the binlog file is not cached in memory (IO
cost), or if it is encrypted or a lot of slaves connect simultaneously
(CPU cost).
The size of the index files is generally less than 1% of the binlog data, so
not expected to be an issue.
Most of the work writing the index is done as a background task, in
the binlog background thread. This minimises the performance impact on
transaction commit. A simple global mutex is used to protect index
reads and (background) index writes; this is fine as slave connect is
a relatively infrequent operation.
Here are the user-visible options and status variables. The feature is on by
default and is expected to need no tuning or configuration for most users.
binlog_gtid_index
On by default. Can be used to disable the indexes for testing purposes.
binlog_gtid_index_page_size (default 4096)
Page size to use for the binlog GTID index. This is the size of the nodes
in the B+-tree used internally in the index. A very small page-size (64 is
the minimum) will be less efficient, but can be used to stress the
BTree-code during testing.
binlog_gtid_index_span_min (default 65536)
Control sparseness of the binlog GTID index. If set to N, at most one
index record will be added for every N bytes of binlog file written.
This can be used to reduce the number of records in the index, at
the cost only of having to scan a few more events in the binlog file
before finding the target position
Two status variables are available to monitor the use of the GTID indexes:
Binlog_gtid_index_hit
Binlog_gtid_index_miss
The "hit" status increments for each successful lookup in a GTID index.
The "miss" increments when a lookup is not possible. This indicates that the
index file is missing (eg. binlog written by old server version
without GTID index support), or corrupt.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
This patch fixes cases where a transaction caused empty writeset to be
replicated. This could happen in the case where a transaction executes
a statement that initially manages to modify some data and therefore
appended keys some for certification. The statement is however rolled
back at some later stage due to some error (for example, a duplicate
key error). After statement rollback the transaction is still alive,
has no other changes. When committing such transaction, an empty
writeset was replicated through Galera.
The fix is to avoid calling into commit hook only when transaction
has appended one or keys for certification *and* has some data in
binlog cache to replicate. Otherwise, the commit is considered empty,
and goes through usual empty commit path.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Use standard handlerton functions for savepoint add/rollback.
To identify the savepoint, the pointer passed is used.
Every table that has online alter in progress maintains a list of
savepoints independently.
Also this removes setting a value to a global variable savepoint_alloc_size
without any protection, which was a race condition bug.
Move all the functions dedicated to online alter to a newly created
online_alter.cc.
With that, make many functions static and simplify the static functions
naming.
Also, rename binlog_log_row_online_alter -> online_alter_log_row.