This is a combined patch of various spelling fixes originally done in
Debian.
* Fix misc typos in MariaDB Server
* Fix spelling of 'allows one to'
Fix the following Lintian nags introduced in commit
c8d040938a:
I: mariadb-backup: spelling-error-in-binary "allows to" "allows one to" [usr/bin/mariadb-backup]
I: mariadb-server-core: spelling-error-in-binary "allows to" "allows one to" [usr/sbin/mariadbd]
I: mariadb-test: spelling-error-in-binary "allows to" "allows one to" [usr/bin/mariadb-client-test-embedded]
I: mariadb-test: spelling-error-in-binary "allows to" "allows one to" [usr/bin/mariadb-test-embedded]
I: mariadb-test: spelling-error-in-binary "allows to" "allows one to" [usr/bin/test-connect-t]
multiple file tablespace
Problem:
=======
- innochecksum was incorrectly interpreting doublewrite buffer
pages as index pages, causing confusion about stale tables
in the system tablespace.
- innochecksum fails to parse the multi-file system tablespace
Solution:
========
1. Rewriting the checksum of doublewrite buffer pages
is skipped.
2. Introduced the option --tablespace-flags which can be used
to initialize page size. With this option, ibdata2, ibdata3 etc.
can be handled without parsing ibdata1.
xtrabackup_backup_func(): Invoke btr_search_sys_create(), because
innodb_shutdown() assumes that it will have been called.
srv_boot(): Invoke btr_search_sys_create(). This fixes assertion failures
in the test innodb.temporary_table.
btr_sea::create(): Do not invoke enable().
buf_pool_t::create(): Instead of invoking btr_sea::create(),
invoke btr_sea::enable() when needed.
mtr_t::trx: New public const data member. If it is nullptr,
per-connection statistics will not be updated. The transaction is
not necessarily in active state. We may merely use it as an "anchor"
for buffering updates of buf_pool.stat.n_page_gets in
trx_t::pages_accessed.
As part of this, we try to create mtr_t less often, reusing one object
in multiple places. Some read operations will
invoke mtr_t::rollback_to_savepoint() to release their own page latches
within a larger mini-transaction.
Reviewed by: Vladislav Lesin
Tested by: Saahil Alam
Fixed the following issues:
- aria_read_index() and aria_read_data(), used by mariabackup, checked
the wrong status from maria_page_crc_check().
- Both functions did infinite retries if crc did not match.
- Wrong usage of ma_check_if_zero() in maria_page_crc_check()
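The retry fix can be pictured with a minimal sketch of a bounded retry loop. This is not the actual Aria or mariabackup code: PageCheck, read_page_with_retries() and the treatment of an all-zero page as valid are assumptions made for illustration only.

```cpp
#include <cassert>
#include <functional>

// Hypothetical outcome of one "read the page, then verify its CRC" step.
enum class PageCheck { OK, ALL_ZERO, CRC_MISMATCH };

// Hypothetical helper: retry a page read a bounded number of times
// instead of looping forever on a persistent CRC mismatch.
inline bool read_page_with_retries(
  const std::function<PageCheck()> &read_and_check, unsigned max_attempts)
{
  assert(max_attempts > 0);
  for (unsigned attempt= 0; attempt < max_attempts; attempt++)
  {
    switch (read_and_check()) {
    case PageCheck::OK:
    case PageCheck::ALL_ZERO:     // assumption: an all-zero page is accepted
      return true;
    case PageCheck::CRC_MISMATCH: // possibly a torn read; try again
      break;
    }
  }
  return false;                   // give up after max_attempts
}
```

The caller decides what a final false means, for example reporting the page as corrupted.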
Author: Thirunarayanan Balathandayuthapani <thiru@mariadb.com>
This was generally good to get done but was also needed to be able to run
mariabackup tests under ASAN.
Things freed:
- Allocated variables (mysql_tmpdir_list, opt_passwd etc)
- InnoDB variables
- Results from SQL queries (a lot of SQL queries did not free their result)
- Allocated sys_vars
- Server variables (mysql_server_end())
- Memory allocated by plugins (encryption)
- Free variables allocated by my_default. (Old code had a bug that caused
these to not be freed)
Other things:
- Moved freeing of mysql_tmpdir_list to main, as the old code did not
free the last mysqltmp_dir allocation. Now we also initialize the
variable only once.
- Fixed a serious, potentially 'crashing at end' bug where we called
free_defaults() with wrong pointers.
- Fixed a bug related to update_malloc_size() where we did not take
into account that it was not changed.
- Fixed a bug in Sys_var_charptr_base where we did not allocate
default values. This could lead to an attempt to free values that
were never allocated in xtrabackup.
- Added sf_have_memory_leak() to be able to easily check if there was
a memory leak when using safemalloc()
- sf_report_leaked_memory() now returns 1 if a memory leak was found.
The innodb_encrypt_log=ON subformat of FORMAT_10_8 is inefficient,
because a new encryption or decryption context is being set up for
every log record payload snippet.
An in-place conversion between the old and new innodb_encrypt_log=ON
format is technically possible. No such conversion has been
implemented, though. There is some overhead with respect to the
unencrypted format (innodb_encrypt_log=OFF): At the end of each
mini-transaction, right before the CRC-32C, an additional 8 bytes will be
reserved for a nonce (really, log_sys.get_flushed_lsn()), which forms
a part of an initialization vector.
log_t::FORMAT_ENC_11: The new format identifier, a UTF-8 encoding of
🗝 U+1F5DD OLD KEY (encryption). In this format, everything except the
types and lengths of log records will be encrypted. Thus, unlike in
FORMAT_10_8, also page identifiers and FILE_ records will be encrypted.
The initialization vector (IV) consists of the 8-byte nonce as well as
the type and length byte(s) of the first record of the mini-transaction.
Page identifiers will no longer form any part of the IV.
The old log_t::FORMAT_ENC_10_8 (innodb_encrypt_log=ON) will be supported
both by mariadb-backup and by crash recovery. Downgrade from the new
format will only be possible if the new server has been running or
restarted with innodb_encrypt_log=OFF. If innodb_encrypt_log=ON,
only the new log_t::FORMAT_ENC_11 will be written.
log_t::is_recoverable(): A new predicate, which holds for all 3
formats.
recv_sys_t::tmp_buf: A heap-allocated buffer for decrypting a
mini-transaction, or for making the wrap-around of a memory-mapped
log file contiguous.
recv_sys_t::start_lsn: The start of the mini-transaction.
Updated at the start of parse_tail().
log_decrypt_mtr(): Decrypt a mini-transaction in recv_sys.tmp_buf.
Theoretically, when reading the log via pread() rather than a read-only
memory mapping, we could modify the contents of log_sys.buf in place.
If we did that, we would have to re-read the last log block into
log_sys.buf before resuming writes, because otherwise that block could be
re-written as a mix of old decrypted data and new encrypted data, which
would cause a subsequent recovery failure unless the log checkpoint had
been advanced beyond this point.
log_decrypt_legacy(): Decrypt a log_t::FORMAT_ENC_10_8 record snippet
on stack. Replaces recv_buf::copy_if_needed().
recv_sys_t::get_backup_parser(): Return a recv_sys_t::parser, that is,
a pointer to an instantiation of parse_mmap or parse_mtr for the current
log format.
recv_sys_t::parse_mtr(), recv_sys_t::parse_mmap(): Add a parameter
template<uint32_t> for the current log_sys.format.
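A minimal C++17 sketch of this pattern follows: one parser instantiation per log format, selected once at run time through a function pointer. The format constants, the parser type and the returned strings are hypothetical stand-ins, not the real recv_sys_t declarations.

```cpp
#include <cstdint>

// Hypothetical format identifiers; the real values are defined in log_t.
constexpr uint32_t FMT_PLAIN= 1, FMT_ENC_OLD= 2, FMT_ENC_NEW= 3;

// Hypothetical parser signature; the real one takes parsing state.
using parser= const char *(*)();

// One instantiation per format; the branches are resolved at compile time.
template<uint32_t format> const char *parse()
{
  if constexpr (format == FMT_ENC_NEW)
    return "decrypt the mini-transaction, then parse";
  else if constexpr (format == FMT_ENC_OLD)
    return "parse, decrypting each record snippet";
  else
    return "parse";
}

// Pick the instantiation once, based on the format found at run time.
inline parser get_parser(uint32_t format)
{
  switch (format) {
  case FMT_ENC_NEW: return parse<FMT_ENC_NEW>;
  case FMT_ENC_OLD: return parse<FMT_ENC_OLD>;
  case FMT_PLAIN:   return parse<FMT_PLAIN>;
  }
  return nullptr; // unrecognized format
}
```

The point of the design is that the per-record code path contains no run-time checks of the format; the only check happens when the pointer is chosen.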
log_parse_start(): Validate the CRC-32C of a mini-transaction.
This has been split from the recv_sys_t::parse() template to
reduce code duplication. These two are the lowest-level functions
that will be instantiated for both recv_buf and recv_ring.
recv_sys_t::parse(): Split into ::log_parse_start() and parse_tail().
Add a parameter template<uint32_t format> to specialize for
log_sys.format at compilation time.
recv_sys_t::parse_tail(): Operate on pointers to contiguous
mini-transaction data. Use a parameter template<bool ENC_10_8>
for special handling of the old innodb_encrypt_log=ON format.
The former recv_buf::get_buf() is being inlined here.
Much of the logic is split into non-inline functions, to avoid
duplicating a lot of code for every template expansion.
log_crypt: Encrypt or decrypt a mini-transaction in place in the
new innodb_encrypt_log=ON format. We will use temporary buffers
so that encryption_ctx_update() can be invoked on integer multiples
of MY_AES_BLOCK_SIZE, except for the last bytes of the encrypted
payload, which will be encrypted or decrypted in place thanks to
ENCRYPTION_FLAG_NOPAD.
log_crypt::append(): Invoke encryption_ctx_update() in MY_AES_BLOCK_SIZE
(16-byte) blocks and scatter/gather shorter data blocks as needed.
log_crypt::finish(): Handle the last (possibly incomplete) block as a
special case, with ENCRYPTION_FLAG_NOPAD.
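The gather-into-blocks idea can be sketched as follows. This is only an illustration under stated assumptions: block_gatherer is a hypothetical class, and the XOR in process() is a placeholder that stands in for encryption_ctx_update(); it is NOT real encryption and does not use the real encryption API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr size_t BLOCK= 16; // AES block size in bytes

// Gathers arbitrarily sized fragments into 16-byte blocks.
class block_gatherer
{
  uint8_t pending[BLOCK];
  size_t pending_len= 0;
  std::vector<uint8_t> out;

  // Placeholder transformation (assumption), one output byte per input byte.
  void process(const uint8_t *in, size_t len)
  {
    for (size_t i= 0; i < len; i++)
      out.push_back(static_cast<uint8_t>(in[i] ^ 0x5a));
  }
public:
  size_t full_blocks= 0;

  // Append a fragment; only whole blocks are passed on to process().
  void append(const uint8_t *in, size_t len)
  {
    while (len)
    {
      size_t n= BLOCK - pending_len;
      if (n > len)
        n= len;
      std::memcpy(pending + pending_len, in, n);
      pending_len+= n; in+= n; len-= n;
      if (pending_len == BLOCK)
      {
        process(pending, BLOCK);
        full_blocks++;
        pending_len= 0;
      }
    }
  }

  // Handle the trailing partial block without padding, so that the
  // output length equals the total input length.
  const std::vector<uint8_t> &finish()
  {
    process(pending, pending_len);
    pending_len= 0;
    return out;
  }
};
```

Keeping the output the same length as the input is what allows a mini-transaction to be transformed in place.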
mtr_t::parse_length(): Parse the length of a log record.
mtr_t::encrypt(): Use log_crypt instead of the old log_encrypt_buf().
recv_buf::crc32c(): Add a parameter for the initial CRC-32C value.
recv_sys_t::rewind(): Operate on pointers to the start of the
mini-transaction and to the first skipped record.
recv_sys_t::trim(): Declare as ATTRIBUTE_COLD so that this rarely
invoked function will not be expanded inline in parse_tail().
recv_sys_t::parse_init(): Handle INIT_PAGE or FREE_PAGE while scanning
to the end of the log.
recv_sys_t::parse_page0(): Handle WRITE to FSP_SPACE_SIZE and
FSP_SPACE_FLAGS.
recv_sys_t::parse_store_if_exists(), recv_sys_t::parse_store(),
recv_sys_t::parse_oom(): Handle page-level log records.
mlog_decode_varint_length(): Make use of __builtin_clz() to avoid a loop
when possible.
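As a rough sketch of the technique, assume that the length of a variable-length integer is encoded as the number of leading 1 bits in its first byte (an assumption about the encoding; the real function may differ in details). The function names below are hypothetical, and __builtin_clz() is the GCC/clang builtin.

```cpp
#include <cstdint>

// Loop version: one extra byte per leading 1 bit of the first byte.
inline unsigned varint_length_loop(uint8_t first)
{
  unsigned len= 1;
  for (; first & 0x80; first= static_cast<uint8_t>(first << 1))
    len++;
  return len;
}

// Loop-free version using the GCC/clang builtin.
inline unsigned varint_length_clz(uint8_t first)
{
  // Move the byte to the top of a 32-bit word and invert it, so that
  // leading 1 bits become leading 0 bits. The low 24 bits are all 1,
  // so the argument is never 0 (for which __builtin_clz is undefined).
  return 1 + static_cast<unsigned>(__builtin_clz(~(uint32_t{first} << 24)));
}
```

Both functions agree for every possible first byte, which is easy to check exhaustively because there are only 256 inputs.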
mlog_decode_varint(): Define only on const byte*, as
ATTRIBUTE_NOINLINE static because it is a rather large function.
recv_buf::decode_varint(): Trivial wrapper for mlog_decode_varint().
recv_ring::decode_varint(): Special implementation.
log_page_modify(): Note that a page will be modified in recovery.
Split from recv_sys_t::parse_tail().
log_parse_file(): Handle non-page log records.
log_record_corrupted(), log_unknown(), log_page_id_corrupted():
Common error reporting functions.
Ever since commit 685d958e38
(MDEV-14425) mariadb-backup --backup has had some trouble keeping up
with write workloads of the mariadbd server.
Debarun Banerjee found out that mariadb-backup --backup was
copying the log in the wrong way and not pausing when it made
sense to do so. This change includes his fix as well as some
dead code removal from xtrabackup_copy_mmap_logfile().
Some earlier changes to the default behaviour of mariadb-backup --backup
will be reverted, by making the configuration parameters OFF by default.
These parameters were basically working around this bug:
* commit 652f33e0a4 (MDEV-30000)
introduced --innodb-log-checkpoint-now and made it ON by default.
Making the server execute a log checkpoint can be really I/O intensive.
* commit 6acada713a (MDEV-34062)
introduced --innodb-log-file-mmap and made it ON by default on
Linux and FreeBSD. There are no documented semantics for what should
happen to a memory mapping when there are concurrent pwrite(2)
operations by other processes. While it appears to work, it is safer
to default to clearly documented semantics.
xtrabackup_copy_logfile(): Add a parameter early_exit.
Always read a log snippet to the start of recv_sys.buf and assign
recv_sys.len to the read length. We used to shift recv_sys.buf
with memmove(). However, on recv_sys_t::PREMATURE_EOF we cannot know
which part of the mini-transaction was correctly read, because that
part of the ib_logfile0 may be concurrently modified by the server.
So, we will reread everything from the start of the mini-transaction.
xtrabackup_backup_func(): Invoke xtrabackup_copy_logfile(true),
allowing it to stop on every recv_sys_t::PREMATURE_EOF.
This will also avoid repeated "Retry" messages when there is no
more redo log to copy.
get_current_lsn(): Execute FLUSH ENGINE LOGS to ensure that
InnoDB will complete any buffered writes to the ib_logfile0
and ensure that everything up to the current LSN has been
written.
backup_wait_for_commit_lsn(): Wait for as much as is really needed.
This avoids an extra 5-second wait at the end of the backup.
xtrabackup_copy_mmap_logfile(): Remove some dead code, and add
debug assertions to demonstrate that the parser can only return
recv_sys_t::OK or recv_sys_t::GOT_EOF.
main/statistics_json.result is updated for f8ba5ced55 (MDEV-36099)
The test uses 'delete from t1' in many places and then populates
the table again. The natural order of rows in a MyISAM table is well
defined and the test was implicitly relying on that.
Before f8ba5ced55, DELETE was deleting rows one by one, using
ha_myisam::delete_row() because the connection was stuck in rbr mode.
This caused rows to be shown in the reverse insertion order (because of
the delete link list).
MDEV-36099 fixes this bug and the server now correctly uses
ha_myisam::delete_all_rows(). This causes rows to be shown in the
insertion order as expected.
Problem:
========
- Mariabackup ignores tables-file option because it fails to
register the new entry in database hash cells. This issue was
caused by the commit 3c312d247c (MDEV-35190).
xb_register_filter_entry() fails to stop when it encounters the
empty list.
Solution:
=========
xb_register_filter_entry(): If there is no next member to
dereference, return a pointer to the existing element.
backup_file_op_fail(): Ignore the FTS internal table if it is being
created in a late phase of backup. mariabackup --prepare should
handle intermediate tables and orphaned FTS internal tables.
check_if_fts_table(): Determine whether the space name belongs to
an internal FTS table.
Added option 'aria-pagecache-segments', default 1.
For values > 1, this splits the aria-pagecache-buffer into the given
number of segments, each independent of the others. Having multiple
pagecaches improves performance when multiple connections run queries
concurrently using different tables.
Each pagecache will use aria-pagecache-buffer/segments amount of
memory, but at least 128K.
The index and data files of each opened table use the segments in
a round-robin fashion.
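The sizing rule and the round-robin assignment can be sketched as below. The names segment_size() and round_robin are hypothetical and do not appear in the Aria sources; the sketch assumes that the number of segments is at least 1.

```cpp
#include <algorithm>
#include <cstddef>

constexpr size_t MIN_SEGMENT_SIZE= 128 * 1024; // "at least 128K"

// Memory given to each segment (hypothetical helper; segments >= 1).
inline size_t segment_size(size_t buffer_size, unsigned segments)
{
  return std::max(buffer_size / segments, MIN_SEGMENT_SIZE);
}

// Hands out segment numbers 0, 1, ..., segments-1, 0, 1, ...
// Not thread-safe; real code would need synchronization.
struct round_robin
{
  unsigned segments;
  unsigned next= 0;

  unsigned assign()
  {
    unsigned s= next;
    next= (next + 1) % segments;
    return s;
  }
};
```

For example, an 8 MiB buffer with 4 segments gives 2 MiB per segment, while a 256 KiB buffer with 4 segments is raised to the 128 KiB minimum per segment.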
Internal changes:
- All programs allocating the maria pagecache themselves should now
call multi_init_pagecache() instead of init_pagecache().
- Pagecache statistics are now stored in 'pagecache_stats' instead of
maria_pagecache. One must call multi_update_pagecache_stats() to
update the statistics.
- Added into PAGECACHE_FILE a pointer to the file's pagecache. This was
done to ensure that index and data file are using the same
pagecache and simplified the checkpoint code.
I kept pagecache in TABLE_SHARE to minimize the changes.
- really_execute_checkpoint() was updated to handle a dynamic number of
pagecaches.
- pagecache_collect_changed_blocks_with_lsn() was slightly changed to
allow it to be called for each pagecache.
- Undefined the unused functions maria_assign_pagecache() and
maria_change_pagecache()
- ma_pagecaches.c is totally rewritten. It now contains all
multi_pagecache functions.
Errors found by QA that are fixed:
MDEV-36872 UBSAN errors in ma_checkpoint.c
MDEV-36874 Behavior upon too small aria_pagecache_buffer_size in case of
multiple segments is not very user-friendly
MDEV-36914 ma_checkpoint.c(285,9): conversion from '__int64' to 'uint'
treated as an error
MDEV-36912 sys_vars.sysvars_server_embedded and
sys_vars.sysvars_server_notembedded fail on x86
backup_file_op_fail(): Ignore the FTS internal table if it is being
created in a late phase of backup. mariabackup --prepare should
handle intermediate tables and orphaned FTS internal tables.
check_if_fts_table(): Modified the code to check for fts auxiliary
table and fts common tables.
Revised some error messages during backup stage commands.
This controls which Linux implementation to use for
innodb_use_native_aio=ON.
innodb_linux_aio=auto is equivalent to innodb_linux_aio=io_uring when
it is available, and falls back to innodb_linux_aio=aio when not.
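The fallback rule for the auto setting amounts to the following sketch. The enum and the function resolve_linux_aio() are hypothetical names for illustration; how availability of io_uring is detected is outside the scope of this sketch and is passed in as a flag.

```cpp
// Hypothetical representation of the innodb_linux_aio values.
enum class linux_aio { AUTO, IO_URING, AIO };

// Resolve "auto" to a concrete backend; explicit choices are kept as is.
inline linux_aio resolve_linux_aio(linux_aio requested, bool io_uring_available)
{
  if (requested != linux_aio::AUTO)
    return requested;
  return io_uring_available ? linux_aio::IO_URING : linux_aio::AIO;
}
```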
Debian packaging is no longer exclusively aio or uring, so
for older Debian or Ubuntu releases, there is a remove_uring directive.
For more recent releases, add mandatory liburing for consistent packaging.
WITH_LIBAIO is now an independent option from WITH_URING.
LINUX_NATIVE_AIO preprocessor constant is renamed to HAVE_LIBAIO,
analogous to existing HAVE_URING.
tpool::is_aio_supported(): A common feature check.
is_linux_native_aio_supported(): Remove. This had originally been added in
mysql/mysql-server@0da310b69d in 2012
to fix an issue where io_submit() on CentOS 5.5 would return EINVAL
for a /tmp/#sql*.ibd file associated with CREATE TEMPORARY TABLE.
But, starting with commit 2e814d4702 InnoDB
temporary tables will be written to innodb_temp_data_file_path.
The 2012 commit said that the error could occur on "old kernels".
Any GNU/Linux distribution that we currently support should be based
on a newer Linux kernel; for example, Red Hat Enterprise Linux 7
was released in 2014.
tpool::create_linux_aio(): Wraps the Linux implementations:
create_libaio() and create_liburing(), each defined in separate
compilation units (aio_linux.cc, aio_libaio.cc, aio_liburing.cc).
The CMake definitions are simplified using target_sources() and
target_compile_definitions(), all available since CMake 2.8.12.
With this change, there is no need to include ${CMAKE_SOURCE_DIR}/tpool
or add TPOOL_DEFINES flags anymore, target_link_libraries(lib tpool)
does all that.
This is joint work with Daniel Black and Vladislav Vaintroub.
log_hdr_buf: Align to an 8-byte boundary, because we will actually
assume at least 4-byte alignment in log_crypt_write_header().
This fixes a regression that had been introduced in
commit 685d958e38 (MDEV-14425)
where a 512-byte alignment requirement was relaxed too much.
Problem:
=========
(1) Mariabackup tries to read the history data from
mysql.mariadb_backup_history and fails with segfault. Reason is that
mariabackup forces innodb_log_checkpoint_now since commit
652f33e0a44661d6093993d49d3e83d770904413 (MDEV-30000).
Mariabackup sends the "innodb_log_checkpoint_now=1" query to server and
reads the result set for the query later in the code because the query
may trigger the page thread to flush the pages. But before reading the
query result for innodb_log_checkpoint_now=1, mariabackup does execute
the select query for the history table (mysql.mariadb_backup_history)
and wrongly reads the query result of innodb_log_checkpoint_now. This leads
to an assertion failure in mariabackup.
(2) The recording of incremental backups has the format as "tar"
when mbstream was used. The xb_stream_fmt_t only had XB_STREAM_FMT_NONE
and XB_STREAM_FMT_XBSTREAM and hence in the mysql.mariadb_backup_history
table the format was recorded as "tar" for the "mbstream" due to the
offset in the xb_stream_name array within mariadb-backup.
(3) Also under Windows the full path of mariabackup was recorded in the
history.
(4) select_incremental_lsn_from_history(): Name of the backup and UUID
of the history record variable could lead to buffer overflow while
copying the variable value from global variable.
Solution:
=========
(1) Move the reading of history data from mysql.mariadb_backup_history
after reading the result of innodb_log_checkpoint_now=1 query
(2) We've removed the "tar" element from the xb_stream_name. As
"xbstream" was never used, the format name is changed to mbstream.
As the table needs alteration, "mbstream" is appended instead of
the unused xbstream in the table. "tar" is left in the enum as
the previous recordings are still possible.
(3) The Windows path separator is used to store just the executable
name as the tool in the mariadb_backup_history table.
(4) select_incremental_lsn_from_history(): Check and validate
the length of incremental history name and incremental history uuid
before copying into temporary buffer
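Solutions (3) and (4) can be illustrated with two small helpers. This is a sketch under assumptions, not the code from mariabackup: tool_name() and copy_checked() are hypothetical names, and accepting '/' in addition to the Windows separator is a choice made here for portability of the example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Strip any directory part, accepting both '/' and '\\' as separators.
inline std::string tool_name(const std::string &path)
{
  size_t pos= path.find_last_of("/\\");
  return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Copy src into dst only if it fits, including the terminating NUL.
// Returns false (leaving dst untouched) when src is too long.
inline bool copy_checked(char *dst, size_t dst_size, const char *src)
{
  size_t len= std::strlen(src);
  if (len >= dst_size)
    return false;
  std::memcpy(dst, src, len + 1);
  return true;
}
```

A caller would treat a false return from copy_checked() as a validation error rather than truncating the name silently.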
Thanks to Daniel Black for contributing the code for solutions (2) and (3).
MSAN has been updated since 2022 when this macro was added,
and as such the workaround for MSAN's deficient understanding
of the fstat/stat syscall behaviour at the time is no longer
required.
As an effective no-op a straight removal is sufficient.
Backing up with mariabackup a datadir containing
ENGINE=Mroonga tables leaves behind the corresponding
*.mrn* files. Those tables are therefore broken once
such backup is restored.
minor style/mtr changes by Daniel Black
Fix the AWS SDK build; it has changed substantially since the plugin was
introduced. There is now a bunch of intermediate C libraries, aws-cpp-crt
and others, and for static linking, the link dependency must be declared.
Also support the AWS C++ SDK in the vcpkg package manager.
Clang ~16+ on MSAN became quite strict with uninitialized
data being passed and returned from functions. Non-debug builds
have a basic optimization that hides these from those builds.
Two InnoDB cases violate the assumptions, however once inlined
with a basic optimization those that existed for uninitialized
values are removed.
(MDEV-36316) rec_set_bit_field_2 calling mach_read_from_2 hits a read of
bits it wasn't actually changing.
(MDEV-36327) The function dict_process_sys_columns_rec left
nth_v_col uninitialized unless it was a virtual column. This was
ok as the function i_s_sys_columns_fill_table also didn't read
this value unless it was a virtual column.
New warnings come from 3 places
1. Warning C5287: Warning comes from json_lib.c from code like
compile_time_assert((int) JSON_VALUE_NULL == (int) JSV_NULL);
2. Warning C5287: A similar warning comes from wc_static_assert() from code
in wolfSSL's header file
3. Warning C5286 in WolfSSL code, where -enum_value
(i.e. multiplying an enum by -1) is used
To fix:
- Disable warnings in WolfSSL code, using /wd<num> flag.
- workaround warning for users of WolfSSL, disable
wc_static_assert() with -DWC_NO_STATIC_ASSERT compile flag
- Rewrite some compile_time_assert in json_lib.c to avoid warning.
- add target_link_libraries(vio ${SSL_LIBRARIES}) so that
vio picks up -DWC_NO_STATIC_ASSERT
The problem was that aria_backup_client code did not initialize
maria_tmpdir, which is used during recovery if repair table is needed to
reconstruct indexes.
page_is_corrupted(): Do not allocate the buffers from stack,
but from the heap, in xb_fil_cur_open().
row_quiesce_write_cfg(): Issue one type of message when we
fail to create the .cfg file.
update_statistics_for_table(), read_statistics_for_table(),
delete_statistics_for_table(), rename_table_in_stat_tables():
Use a common stack buffer for Index_stat, Column_stat, Table_stat.
ha_connect::FileExists(): Invoke push_warning_printf() so that
we can avoid allocating a buffer for snprintf().
translog_init_with_table(): Do not duplicate TRANSLOG_PAGE_SIZE_BUFF.
Let us also globally enable the GCC 4.4 and clang 3.0 option
-Wframe-larger-than=16384 to reduce the possibility of introducing
such stack overflow in the future. For RocksDB and Mroonga we relax
these limits.
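The general shape of such a fix is sketched below: a large buffer moves from the stack frame of a function to a heap allocation owned by a longer-lived object. The names page_cursor and page_is_all_zero() and the buffer size are hypothetical and only illustrate the pattern.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

constexpr size_t PAGE_BUF_SIZE= 64 * 1024; // hypothetical buffer size

// Before: a local `uint8_t buf[PAGE_BUF_SIZE];` would make the stack
// frame larger than 16384 bytes and trigger -Wframe-larger-than=16384.
// After: the buffer is allocated once on the heap (zero-initialized
// here) and reused for every call.
struct page_cursor
{
  std::unique_ptr<uint8_t[]> buf{new uint8_t[PAGE_BUF_SIZE]()};
};

// Only a reference lives in the stack frame of the checking function.
inline bool page_is_all_zero(const page_cursor &cur)
{
  for (size_t i= 0; i < PAGE_BUF_SIZE; i++)
    if (cur.buf[i])
      return false;
  return true;
}
```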
Reviewed by: Vladislav Lesin
Ensure that backup_reset_alter_copy_lock() is called in case of rollback
or error in mysql_inplace_alter_table() or copy_data_between_tables().
Other things:
- Improved error from mariabackup when unexpected DDL operation is
encountered.
- Added assert if backup_ddl_log() is called in the wrong context.