MDEV-37726 moved wsrep-start-position to INSTALL_RUNDATADIR
and made the latter be created by systemd-tmpfiles.
Now the postin scriptlet has to run systemd-tmpfiles explicitly
to make sure INSTALL_RUNDATADIR exists before restarting
the server.
followup for 649216e70d
* remove the file to be --repeat friendly
* specify the correct defaults-group-suffix. Spider doesn't use the
conventional .1 group, so MYSQLD_CMD skips a lot of config options,
in particular it doesn't read --tmpdir and creates files in /tmp
(Variant 2)
SEL_TREE* tree_or(SEL_TREE *X, SEL_TREE *Y) tries to conserve memory by
reusing object *X for the return value when possible.
MDEV-34620 has added logic to disable construction of index_merge plans
for N-way ORs with large N. That logic interfered with object reuse logic:
for the parameters of:
X = SEL_TREE{ trees=[key1_treeA, key2_treeB]}
Y = SEL_TREE{ trees=[key1_treeC]}
we would decide to reuse object X.
For key1, we would produce key_or(key1_treeA, key1_treeC),
but key2_treeB would just be left in place.
Then, we would construct a range scan from key2_treeB.
Fixed by moving the "disable building index_merge plans" logic into a
location where it would not interfere with object reuse logic.
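To make the shape concrete, here is a hedged sketch (table, column and index
names are made up; in the actual bug the OR has many more branches, so that
the MDEV-34620 limit applies):

  -- assuming INDEX(key1col) and INDEX(key2col) on t1
  SELECT * FROM t1
  WHERE (key1col < 10 AND key2col < 10)  -- becomes X = {key1_treeA, key2_treeB}
     OR  key1col > 100;                  -- becomes Y = {key1_treeC}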
The NULL-aware index statistics fix is now controlled by the
FIX_INDEX_STATS_FOR_ALL_NULLS flag and is disabled by default,
to preserve execution plan stability in stable versions.
To enable:
SET @@new_mode = 'FIX_INDEX_STATS_FOR_ALL_NULLS';
Or via command line:
--new-mode=FIX_INDEX_STATS_FOR_ALL_NULLS
Or in configuration file:
[mysqld]
new_mode=FIX_INDEX_STATS_FOR_ALL_NULLS
The `all_nulls_key_parts` bitmap is now calculated in set_statistics_for_table().
it was only truly used in one place, where it needed to compare
its arguments before removing an entry from the cache. in the second
place where it was used, the comparison was redundant: it was only
called to remove, not to compare.
let's replace it with a function that just removes.
don't reload stored routines in the middle of the execution
of a routine. we don't want different iterations of a loop
to see different definitions.
For this: remember Cversion in THD on the first sp cache lookup,
after that only compare versions with this value, not with Cversion.
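As an illustration (a hedged sketch with made-up routine names): every
iteration of the loop below must call the same definition of p2, even if
another connection runs CREATE OR REPLACE PROCEDURE p2 while p1 is executing:

  DELIMITER $$
  CREATE PROCEDURE p2() SELECT 1 $$
  CREATE PROCEDURE p1()
  BEGIN
    DECLARE i INT DEFAULT 0;
    WHILE i < 3 DO
      CALL p2();        -- must not pick up a reloaded p2 mid-loop
      SET i = i + 1;
    END WHILE;
  END $$
  DELIMITER ;
  CALL p1();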
Instead of adding another TABLE_LIST to
Item_func_nextval->table_list->next_local, update
Item_func_nextval->table_list->table with the correct table.
This removes all checking of table_list->table and table_list->next_local
when using sequences.
if a bunch of tables are prelocked, when a table is actually
needed in the routine, open_tables picks one table out of the
prelocked list with the smallest "distance". Distance is simply
the difference between the actual table lock and the requested
table lock. Say, if the prelocked set contains both t1 write-locked
and t1 read-locked, then an UPDATE will prefer the write-locked t1
and a SELECT will prefer the read-locked one. if there's only a
write-locked table in the set, both UPDATE and SELECT will use it.
this doesn't distinguish between UPDATE and INSERT, but INSERT
marks tables with tables->for_insert_data=1, which causes
prelocking to invoke add_internal_tables() and prepare sequences
for execution.
in this bug there were two prelocked t1's, one for INSERT (with
for_insert_data=1) and one for UPDATE.
INSERT picks the second (both are write-locked, so the distance is
the same), its sequence is not prepared, and it crashes.
Let's add for_insert_data as the lowest bit of the distance.
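A hedged sketch of the affected shape (sequence, tables and trigger names are
made up): the trigger body prelocks t1 twice, once for UPDATE and once for
INSERT, and the INSERT must end up with the prelocked t1 whose sequence was
prepared:

  CREATE SEQUENCE s;
  CREATE TABLE t1 (id BIGINT DEFAULT NEXTVAL(s), a INT);
  CREATE TABLE t2 (a INT);
  DELIMITER $$
  CREATE TRIGGER t2_ai AFTER INSERT ON t2 FOR EACH ROW
  BEGIN
    UPDATE t1 SET a = a + 1;            -- t1 prelocked for UPDATE
    INSERT INTO t1 (a) VALUES (NEW.a);  -- t1 prelocked for INSERT, for_insert_data=1
  END $$
  DELIMITER ;
  INSERT INTO t2 VALUES (1);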
Variant 2: "Don't call optimize_stage2 too early"
The affected query had a Split-Materialized derived table inside another
derived table:
  select * -- select#1
  from (
    select * -- select#2
    from t1,
         (select * from t2 ... group by t2.group_col) DT -- select#3
    where t1.col=t2.group_col
  ) TBL;
The optimization went as follows:
JOIN::optimize() calls select_lex->handle_derived(DT_OPTIMIZE) which
calls JOIN::optimize() for all (direct and indirect) children SELECTs.
select#1->handle_derived() calls JOIN::optimize() for select#2 and #3;
select#2->handle_derived() calls JOIN::optimize() for select#3 the second
time. That call would call JOIN::optimize_stage2(), assuming the query
plan choice has been finalized for select#3.
But after that, JOIN::optimize() for select#2 would continue and consider
Split-Materialized for select#3. This would attempt to pick another join
order and cause a crash.
The fix is to have JOIN::optimize() never call optimize_stage2().
Finalizing the query plan choice is now done by a separate call:
select_lex->handle_derived(thd->lex, DT_OPTIMIZE_STAGE2)
which invokes JOIN::optimize_stage2() and saves the query plan.
Problem was that thd->lex->m_sql_cmd is not always set,
especially when the user has not provided ENGINE=xxx, so
requesting option_storage_engine_name from it is not
safe.
Fixed by accessing thd->lex->m_sql_cmd only when the user
has used ENGINE=, and otherwise using ha_default_handlerton
and requesting the engine name from it.
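For illustration (a hedged sketch; the exact statement in the actual bug may
differ), these are the two cases the fix distinguishes:

  CREATE TABLE t1 (a INT) ENGINE=InnoDB; -- ENGINE= given: thd->lex->m_sql_cmd
                                         -- can be consulted for the name
  CREATE TABLE t2 (a INT);               -- no ENGINE=: fall back to
                                         -- ha_default_handlerton for the name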
The Debian library dependencies are auto-detected and populated
as part of shlibs:Depends, and the explicit population
is susceptible to distro packaging changes.
Also fix the server gssapi plugin's dependency on the compiled server
to be consistent with other server plugins.
In Query_log_event::do_apply_event() there is a comparison of the
character set used in the event with the cached charset, and if they
differ, the used charset is changed. Unfortunately, this was done only
if the thread is a replica thread.
Fixed by adding a condition for the Galera applier thread so that the
comparison is done, leading to a charset change if the event
had a different charset.
When all values in an indexed column are NULL, EITS statistics show
avg_frequency == 0. This commit adds logic to distinguish between
"no statistics available" and "all values are NULL" scenarios.
For NULL-rejecting conditions (e.g., t1.col = t2.col), when statistics
confirm all indexed values are NULL, the optimizer can now return a
very low cardinality estimate (1.0) instead of unknown (0.0), since
NULL = NULL never matches.
For non-NULL-rejecting conditions (e.g., t1.col <=> t2.col),
normal cardinality estimation continues to apply since matches are possible.
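For example (made-up tables, with every value of t1.col being NULL and an
index on t1.col):

  -- NULL = NULL is never true, so a ~1 row estimate is safe:
  SELECT * FROM t1, t2 WHERE t1.col = t2.col;
  -- NULL <=> NULL is true, so matches are possible and the normal
  -- estimate still applies:
  SELECT * FROM t1, t2 WHERE t1.col <=> t2.col;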
Changes:
- Added KEY::rec_per_key_null_aware() to check nulls_ratio from column
statistics when avg_frequency is 0
- Modified best_access_path() in sql_select.cc to use the new
rec_per_key_null_aware() method for ref access cost estimation
- The optimization works with single-column and composite indexes,
checking each key part's NULL-rejecting status via notnull_part bitmap
they're very fragile by nature, but let's at least move them
into one file with msan/embedded/ps/32-bit all disabled, to
make the memory usage more predictable.
And remove these restrictions from other test files.
With PermissionsStartOnly deprecated, remove it from the
systemd service file.
Replace Debian's ExecStartPre "install -d" with a tmpfiles
configuration directive that creates the directory.
Debian's ExecStartPost of the mariadb upgrade uses the !
special executable prefix, added in systemd v231, to run
with root privileges.
After moving the systemd service to using environment files
instead of `systemctl set-environment` in 11.6 (MDEV-19210),
these files (wsrep-new-cluster and wsrep-start-position) are located
in /var/lib/mysql along with the socket file on
Fedora/RHEL-based distros. This causes them to have incorrect
selinux permissions and therefore not be readable by systemd.
A solution is to generate these files in the run directory
instead, which already has the correct selinux label mysqld_var_run_t
(mysql-selinux-1.0.12). Dissociating these files from the socket
in the CMake configs can also prove useful for other things.
This also corrects some of the duplicate code in the build
scripts, makes INSTALL_RUNDATADIR a proper install location,
and uses it for the tmpfiles configuration where the temporary files
are created.
Debian's location is /run/mysqld/, matching its INSTALL_UNIX_ADDRDIR,
which is now a temporary location controlled by tmpfiles.
MDEV-35904 - backport of MDEV-19210 to 10.11; as referenced there,
unset environment variables become warnings.
We used to run `systemctl set-environment` to pass
_WSREP_START_POSITION. This is bad because:
* it clutters systemd's environment (yes, pid 1)
* it requires root privileges
* options (like LimitNOFILE=) are not applied
Let's just create an environment file in ExecStartPre= that is read
before ExecStart= kicks in. This way we have _WSREP_START_POSITION
available for the main process without any downsides.
(Preparation for the main patch)
set_statistics_for_table() incorrectly treated indexes with all NULL
values the same as indexes with no statistics, because avg_frequency
is 0 in both cases. This caused the optimizer to ignore valid EITS
data and fall back to engine statistics.
Additionally, KEY::actual_rec_per_key() would fall back to engine
statistics even when EITS was available, and used incorrect pointer
comparison (rec_per_key == 0 instead of nullptr).
Fix by adding Index_statistics::stats_were_read flag to track per-index
whether statistics were actually read from persistent tables, and
restructuring actual_rec_per_key() to prioritize EITS when available.
Fix my_print_help() to print "No currently supported values".
It used to print the first value even if it was hidden.
(cherry picked from commit 8664461e80)
cannot add new warnings in 11.8 anymore:
* remove ER_WARN_DEFAULT_SYNTAX
* use ER_VARIABLE_IGNORED instead
* change the wording in it to be more generic
* simplified new_mode warning-printing code
(cherry picked from commit 399edc7c62)
@@new_mode is a set of flags to control newly introduced features.
Flags are off by default. Setting a flag in @@new_mode will enable
a new or different server behaviour and/or set of features.
We also introduce a new option 'hidden_values' into some system variable
types, to hide values that we do not wish to show as options.
- Don't print hidden values in mysqld --help output.
- Make get_options() use the same logic as check_new_mode_value() does.
- Setting @@new_mode=ALL shouldn't give warnings.
(cherry picked from commit f7387cb13d)
Wrong result is produced when the split-materialized optimization is used for
grouping with ORDER BY and LIMIT.
The fix is to not let the Split-Materialized optimization happen
when the sub-query has an ORDER BY with LIMIT, by returning FALSE early
in the method opt_split.cc#check_for_splittable_materialized()
(a sketch of the affected query shape is given below).
However, with just the above change, there is a side effect of
NOT "using index for group-by" in the scenario when
all the following conditions are met:
1. The query has a derived table with GROUP BY and ORDER BY ... LIMIT
2. It is joined in a way that would allow Split-Materialized
if the ORDER BY ... LIMIT wasn't present
3. There is an index suitable for "using index for group-by"
4. There is no WHERE clause, so "using index for group-by" is applicable,
but the index is not included in "possible_keys".
The reason is that the join_tab's "keys" field wasn't being set in
sql_select.cc#make_join_select(), so that change was made as well
as part of this PR.
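A hedged sketch of the affected query shape (made-up names): a derived table
with GROUP BY plus ORDER BY ... LIMIT, joined on its grouping column:

  SELECT *
  FROM t1,
       (SELECT grp, MAX(val) AS mx
        FROM t2
        GROUP BY grp
        ORDER BY mx DESC
        LIMIT 5) DT
  WHERE t1.col = DT.grp;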
The "FETCH FIRST n ROWS WITH TIES" was not enforced when the SELECT used
"using index for group-by".
This was caused by an optimization which removed the ORDER BY clause
when the GROUP BY clause prescribed a compatible ordering.
Other GROUP BY strategies used workarounds to still handle WITH TIES,
see comment to using_with_ties_and_group_min_max() in this patch for
details. QUICK_GROUP_MIN_MAX_SELECT didn't have a workaround.
Fix this by disabling removal of ORDER BY when
QUICK_GROUP_MIN_MAX_SELECT is used.
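A hedged illustration (made-up names, assuming an index on (a, b) so that the
loose index scan is chosen): the ties on column a must still be returned even
though ORDER BY a matches the GROUP BY order:

  SELECT a, b
  FROM t1
  GROUP BY a, b
  ORDER BY a
  FETCH FIRST 2 ROWS WITH TIES;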
Any InnoDB write workload is marking data pages in the buffer pool dirty.
To make the changes durable, it suffices to write to the write-ahead log
to facilitate crash recovery. The longer we keep pages dirty in the
buffer pool, the better, because the same pages could be modified again
and a single write to the file system could cover several changes.
Eventually, we must write out dirty pages, so that we can advance the
log checkpoint, that is, allow the start of the recovery log to be
discarded. (On a clean shutdown, a full checkpoint to the end of the log
will be made.)
A write workload can be bound by innodb_buffer_pool_size
(we must write out changes and evict data pages to make room for others)
or by innodb_log_file_size
(we must advance the log checkpoint before the tail of the circular
ib_logfile0 would overwrite the previous checkpoint).
In innodb_log_file_size bound workloads, we failed to set an optimal
target for the next checkpoint LSN. If we write out too few pages, then
all writer threads may occasionally be blocked in log_free_check() while
the buf_flush_page_cleaner() thread is resolving the situation. If we
write out too many pages, then the I/O subsystem will be loaded
unnecessarily and there will be some write amplification in case some of
the unnecessarily written pages would be modified soon afterwards.
log_close(): Return the target LSN for buf_flush_ahead(lsn),
bitwise-ORed with the "furious" flag, or the special value 0
indicating that no flushing is needed, which is the usual case.
log_checkpoint_margin(): Use a similar page checkpoint target as
log_close() for the !not_furious case.
mtr_flush_ahead(): A wrapper for buf_flush_ahead().
mtr_t::commit_log_release(): Make some common code non-inline in order
to reduce code duplication.
buf_flush_ahead(lsn, furious=false): Avoid an unnecessary wake-up of the
page cleaner if it is scheduled to wake up once per second.
Co-developed-by: Alessandro Vetere
Reviewed by: Vladislav Lesin
row_raw_format_str(): Treat the invalid value charset_coll==0 as binary.
This could be invoked on FTS_%_CONFIG.key or SYS_FOREIGN.ID
or possibly other key columns.
dtype_is_utf8(): Merge to its only caller row_raw_format_str().
row_raw_format(): Add debug assertions and comments to document
when dtype_get_charset_coll(prtype) may be 0.
rtree_mbr_from_wkb::n_node: Change the data type from int to uint16_t
and pack it together with the added field key_len. In this way, an
increase of sizeof(rtree_mbr_from_wkb) will be avoided.
Problem:
========
- When an R-tree root page becomes full and requires splitting,
InnoDB follows a specific root-raising procedure to maintain
tree integrity. The process involves allocating a new page
(Page X) to hold the current root's content, preserving the
original root page number as the tree's entry point, and
migrating all existing records to Page X.
The root page is then cleared and reconstructed as an
internal node containing a single node pointer with an
MBR that encompasses all spatial objects on Page X.
Subsequently, InnoDB should split the records on Page X
into two spatially optimized groups using the
pick_seeds() and pick_next() algorithms,
creating a second page (Page Y) for Group B records
while retaining Group A records on Page X.
After records are redistributed between Page X and Page Y,
the recalculated MBR for Page X must remain within
or be smaller than the original MBR stored in the
root page's node pointer.
Bug scenario:
============
- When root page 4 becomes full, it triggers a split operation
where the content is copied to page 7 and root page 4 is cleared
to become an internal node.
- During the first split attempt on page 7, Group 1 overflows
and remaining entries are reassigned to Group 2.
- A new page 8 is created and the remaining entry record
is inserted, but the combined size of the remaining entry
record and new record exceeds the page size limit.
- This triggers a second split operation on page 7, where
Group 2 overflows again and entries are moved back to Group 1.
- When the new record is finally inserted into page 7,
it causes the MBR (Minimum Bounding Rectangle) for page 7
to expand beyond its original boundaries.
- Subsequently, when InnoDB attempts to update the parent
page 4 with the new MBR information, it fails to locate
the corresponding internal node, leading to spatial
index corruption and the reported failure.
Problem:
========
- Second split operation should happen on page 8, not on page 7.
- split_rtree_node() considers key_size to estimate
record sizes during the splitting algorithm, which fails to
account for variable-length fields in spatial records.
- In rtr_page_split_and_insert(), when reorganization
succeeds, InnoDB doesn't attempt to insert the entry.
Solution:
========
rtr_page_split_and_insert(): InnoDB should insert the
tuple when btr_page_reorganize() is successful.
rtr_page_split_and_insert(): Use the overflow page
for the consecutive split operation.
split_rtree_node(): Store the record length for each
record in the R-tree node. This should give a proper
estimation while determining the group entries and
is also helpful in overflow validation.
This is a fixup of 41725b4cee where the
symlinks were incorrectly created using absolute paths.
Also fix ps protocol for spider/bugfix.mdev_29502,usual_handler
Analysis:
The maximum length of the field of the temporary table that the cursor
protocol uses is not enough. The field gets this length when we prepare
the json_array_intersect() function and set the max length of the result.
So even though the entire result is sent, only part of it is actually
copied because the field length is not enough.
Fix:
Have a large enough max_length.
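For illustration (the test case in the actual fix may differ), a call like
this must come back complete even when the client runs it via the cursor
protocol:

  SELECT JSON_ARRAY_INTERSECT('[1, 2, 3, 4]', '[2, 4, 6]');
  -- the full intersection array must be returned, not a prefix
  -- truncated to the field's max_length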
Analysis:
While writing the view to the .FRM file, we check the datatype of each column
and append the appropriate type to the string (which will be written to
the frm). This is where the conversion from JSON to longtext happens, because
that is how JSON is stored internally.
Now, during SELECT, when the frm is read it has longtext instead of JSON,
which also results in changing the handler type. Since the handler types
don't match, m_format_json becomes false for that specific column.
Now, when filling the values, since the format is not json, the value does
not get added to the result. Hence the output is NULL.
Fix:
Before writing the view to the FRM file, check if the datatype of the
column is JSON (which means m_format_json will be true). If it is JSON,
append JSON.
buf_read_page_low(): Implement separate code paths for synchronous
and asynchronous read, similar to
commit b2c1ba820b (MDEV-37860),
and always invoke thd_wait_begin() and thd_wait_end()
the same number of times.
This fixes a regression that affects the non-default setting
thread_handling=pool-of-threads and had been introduced in
commit b1ab211dee (MDEV-15053).
Reviewed by: Vladislav Vaintroub
This prevents a segv on NULL result_list->current->result on handler
calls that are not the first. Reasons supporting this change:
- spider_db_result::limit_mode is not called anywhere else. The only
calls to the limit_mode method are on spider_db_conn
- it is very unlikely (impossible?) for the connection to change from
one backend to another between execution of sql statements and
storing result:
  /* in spider_bg_conn_action: */
  if (dbton_handler->execute_sql(
        sql_type,
        conn,
        result_list->quick_mode,
        &spider->need_mons[conn->link_idx])
  // [... 9 lines elided]
  if (!(result_list->bgs_error =
          spider_db_store_result(spider, conn->link_idx,
                                 result_list->table)))
this also means it is very unlikely (impossible?) for the backend
type (dbton_id) to differ between conn->db_conn and
result_list->current->result, also considering that
spider_db_result::dbton_id comes from spider_db_conn::dbton_id:
  spider_db_result::spider_db_result(
    SPIDER_DB_CONN *in_db_conn
  ) : db_conn(in_db_conn), dbton_id(in_db_conn->dbton_id)
Since this was the only call to spider_db_result::limit_mode, we also
remove the method altogether.
Ideally spider should throw an error when local and remote tables have
conflicting definitions.
This fixes the test for --view-protocol and
--mysqld=--loose-spider-disable-group-by-handler
Multi-table UPDATE and DELETE statements employ mysql_select() calls
during their processing, so the server may try to instantiate
`select_handler` in an attempt to push down the statement to
a foreign engine.
However, the current implementation of `select_handler` for FederatedX
pushes down the whole query and not only its select part
(`thd->query()`, see `int ha_federatedx_select_handler::init_scan()`).
The FederatedX engine does not support execution of DML statements
on the remote side, which is why the error occurred.
Solution:
- Add an extra check to only allow pushdown of SELECT statements
to FederatedX
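For example (hypothetical names, with t1 being a FederatedX table), statements
like these go through mysql_select() but must not be pushed down as a whole:

  UPDATE t1, t2 SET t1.a = t2.a WHERE t1.id = t2.id;
  DELETE t1 FROM t1, t2 WHERE t1.id = t2.id;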