Commit Graph

77 Commits

Author SHA1 Message Date
Sergei Petrunia
72c0ba43b2 Code cleanup part #1 2022-01-19 18:10:09 +03:00
Sergei Petrunia
f76e310ace Rename histogram_type=JSON to JSON_HB 2022-01-19 18:10:09 +03:00
Michael Okoko
bff65a813e Implement point selectivity for JSON histograms
* Also merges tests relating to JSON statistics into one file

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
547f805311 Refactor histogram point selectivity
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
63cbd0748b replace range_selectivity methods for Histograms and add tests
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
c129689ddc Use binary search to compute range selectivity
* it also adds an "explain select" statement to the test so that the fprintf calls
  can print the computed intervals to mysqld.1.err

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
69f24c238e Use generic Histogram_base class for Histogram_builders
This fixes the wrong calculation for avg_frequency in json histograms
by replacing the specific histogram objects with the generic Histogram_base class.

It also restores get/set size functions as they were useful in calculating fields
for binary histogram.

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Sergei Petrunia
21e0f5487f MDEV-21130: Histograms: use JSON as on-disk format
A demo of how to use in-memory data structure for histogram.
The patch shows how to
* convert string form of data to binary form
* compare two values in binary form
* compute a fraction for val in [X, Y] range.

grep for GSOC-TODO for notes.
2022-01-19 18:10:08 +03:00
Michael Okoko
fe2e516a50 inform test result of zero hist_size for json histogram
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
bf4d0dcfe2 implement parse and serialize for histogram json 2022-01-19 18:10:08 +03:00
Michael Okoko
9bba595528 remove unneeded shared methods
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
1fa7af749e Split histogram classes and into JSON and binary classes
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Sergei Petrunia
1998b787ac MDEV-21130: Histograms: use JSON as on-disk format
Preparation for handling different kinds of histograms:

- In Column_statistics, change "Histogram histogram" into
  "Histogram *histogram_".  This allows for different kinds
  of Histogram classes with virtual functions.

- [Almost] remove the usage of Histogram->set_values and
  Histogram->set_size. The code outside the histogram should
  not make any assumptions about what/how is stored in the Histogram.

- Introduce drafts of methods to read/save histograms to/from disk.
2022-01-19 18:10:08 +03:00
Michael Okoko
9954aecc2b Store bucket bounds and extend test cases for JSON histogram
This fixes the memory allocation for json histogram builder and add more column types for testing.
Some challenges at the moment include:
* Garbage value at the end of JSON array still persists.
* Garbage value also gets appended to bucket values if the column is a primary key.
* There's a memory leak resulting in a "Warning: Memory not freed" message at the end of tests.

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:07 +03:00
Michael Okoko
2aca7b0c33 Prepare JSON as valid histogram_type
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:07 +03:00
Sergei Golubchik
e841957416 Merge branch '10.3' into 10.4 2021-02-23 09:25:57 +01:00
Sergei Golubchik
0ab1e3914c Merge branch '10.2' into 10.3 2021-02-22 22:42:27 +01:00
Varun Gupta
a461e4d306 MDEV-19474: Histogram statistics are used even with optimizer_use_condition_selectivity=3
The issue here was histogram statistics were being used even when
the level of optimizer_use_condition_selectivity doesn't allow
usage of statistics from histogram.

The histogram statistics are read for a table only when
optimizer_use_condition_selectivity > 3. But the TABLE structure can be
stored in the internal table cache and be reused for the next query.
So in this case the histogram statistics will be available for the next query.

The fix would be to make sure to use the histogram statistics only when
optimizer_use_condition_selectivity > 3.
2021-02-16 11:53:13 +05:30
Marko Mäkelä
4b959bd8df Merge 10.3 into 10.4 2020-07-20 15:34:59 +03:00
Marko Mäkelä
acc58fd835 Merge 10.2 into 10.3 2020-07-20 15:11:59 +03:00
Marko Mäkelä
ca9276e37e Merge 10.1 into 10.2 2020-07-20 14:53:24 +03:00
Varun Gupta
dfdfeecb03 MDEV-22851: Engine independent index statistics are incorrect for large tables on Windows
An oveflow was happening on windows because on Windows sizeof(ulong) is 4 bytes
while it is 8 bytes on Linux.
Switched avg_frequency and avg length for column statistics to ulonglong.
Switched avg_frequency for index statistics to ulonglong.
2020-07-15 11:27:32 +05:30
Marko Mäkelä
8059148154 Merge 10.3 into 10.4 2020-06-03 07:32:09 +03:00
Marko Mäkelä
8300f639a1 Merge 10.2 into 10.3 2020-06-02 10:25:11 +03:00
Marko Mäkelä
d72eebaa3d Merge 10.1 into 10.2 2020-06-01 09:33:03 +03:00
Sergey Vojtovich
c279878493 Thread safe histograms loading
Previously multiple threads were allowed to load histograms concurrently.
There were no known problems caused by this. But given amount of data
races in this code, it'd happen sooner or later.

To avoid scalability bottleneck, histograms loading is protected by
per-TABLE_SHARE atomic variable.

Whenever histograms were loaded by preceding statement (hot-path), a
scalable load-acquire check is performed.

Whenever histograms have to be loaded anew, mutual exclusion for loaders
is established by atomic variable. If histograms are being loaded
concurrently, statement waits until load is completed.

- Table_statistics::total_hist_size moved to TABLE_STATISTICS_CB: only
  meaningful within TABLE_SHARE (not used for collected stats).
- TABLE_STATISTICS_CB::histograms_can_be_read and
  TABLE_STATISTICS_CB::histograms_are_read are replaced with a tri state
  atomic variable.
- Simplified away alloc_histograms_for_table_share().

Note: there's still likely a data race if a thread attempts accessing
histograms data after it failed to load it (because of concurrent load).
It was there previously and goes out of the scope of this effort. One way
of fixing it could be reviving TABLE::histograms_are_read and adding
appropriate checks whenever it is needed.

Part of MDEV-19061 - table_share used for reading statistical tables is
                     not protected
2020-05-29 21:53:54 +04:00
Marko Mäkelä
c11e5cdd12 Merge 10.3 into 10.4 2019-10-10 11:19:25 +03:00
Marko Mäkelä
892378fb9d Merge 10.2 into 10.3 2019-10-09 13:25:11 +03:00
Marko Mäkelä
24232ec12c Merge 10.1 into 10.2 2019-10-09 08:30:23 +03:00
Sergey Vojtovich
adefaeffcc MDEV-19536 - Server crash or ASAN heap-use-after-free in is_temporary_table /
read_statistics_for_tables_if_needed

Regression after 279a907, read_statistics_for_tables_if_needed() was
called after open_normal_and_derived_tables() failure.

Fixed by moving read_statistics_for_tables() call to a branch of
get_schema_stat_record() where result of open_normal_and_derived_tables()
is checked.

Removed THD::force_read_stats, added read_statistics_for_tables() instead.
Simplified away statistics_for_command_is_needed().
2019-10-07 13:30:22 +04:00
Sergey Vojtovich
e43791d4dc Cleanup EITS
Moved EITS allocation inside read_statistics_for_tables_if_needed().
Removed redundant is_safe argument.
2019-10-02 15:23:59 +04:00
Oleksandr Byelkin
c07325f932 Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
Marko Mäkelä
be85d3e61b Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
Marko Mäkelä
26a14ee130 Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru
cb248f8806 Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
Marko Mäkelä
c64265f3f9 Merge 10.3 into 10.4 2018-12-12 14:09:48 +02:00
Marko Mäkelä
839cf16bb2 Merge 10.2 into 10.3 2018-12-12 13:46:06 +02:00
Marko Mäkelä
db1210f939 Merge 10.1 into 10.2 2018-12-12 12:13:43 +02:00
Marko Mäkelä
f77f8f6d1a Merge 10.0 into 10.1 2018-12-12 10:48:53 +02:00
Varun Gupta
9207a838ed MDEV-17255: New optimizer defaults and ANALYZE TABLE
Added to new values to the server variable use_stat_tables.
The values are COMPLEMENTARY_FOR_QUERIES and PREFERABLY_FOR_QUERIES.
Both these values don't allow to collect EITS for queries like
    analyze table t1;
To collect EITS we would need to use the syntax with persistent like
   analyze table t1 persistent for columns (col1,col2...) index (idx1, idx2...) / ALL

Changing the default value from NEVER to PREFERABLY_FOR_QUERIES.
2018-12-09 13:25:27 +05:30
Varun Gupta
4886d14827 MDEV-17032: Estimates are higher for partitions of a table with @@use_stat_tables= PREFERABLY
The problem here is EITS statistics does not calculate statistics for the partitions of the table.
So a temporary solution would be to not read EITS statistics for partitioned tables.

Also disabling reading of EITS for columns that participate in the partition list of a table.
2018-12-07 19:59:45 +05:30
Sergei Golubchik
57e0da50bb Merge branch '10.2' into 10.3 2018-09-28 16:37:06 +02:00
Oleksandr Byelkin
28f08d3753 Merge branch '10.1' into 10.2 2018-09-14 08:47:22 +02:00
Oleksandr Byelkin
31081593aa Merge branch '11.0' into 10.1 2018-09-06 22:45:19 +02:00
Varun Gupta
a9c09c95bd MDEV-15306: Wrong/Unexpected result with the value optimizer_use_condition_selectivity set to 4
Currently for selectivity calculation we perform range analysis for a column even when we don't have any statistics(EITS).
This makes less sense but is used to catch contradiction for WHERE condition.

So the solution is to not perform range analysis for selectivity calculation for columns that do not have statistics.
2018-08-29 02:17:37 +05:30
Marko Mäkelä
7830fb7f45 Merge 10.2 into 10.3 2018-08-28 12:22:56 +03:00
Igor Babaev
4eac5df3fc MDEV-16934 Query with very large IN clause lists runs slowly
This patch introduces support for the system variable eq_range_index_dive_limit
that existed in MySQL starting from 5.6. The variable sets a limit for
index dives into equality ranges. Index dives are performed by optimizer
to estimate the number of rows in range scans. Index dives usually provide
good estimate but they are pretty expensive. To estimate the number of rows
in equality ranges statistical data on indexes can be employed. Its usage gives
not so good estimates but it's cheap. So if the number of equality dives
required by an index scan exceeds the set limit no dives for equality
ranges are performed by the optimizer for this index.

As the new system variable is introduced in a stable version the default
value for it is set to a special value meaning there is no limit for the number
of index dives performed by the optimizer.

The patch partially uses the MySQL code for WL 5957
'Statistics-based Range optimization for many ranges'.
2018-08-17 14:28:39 -07:00
Marko Mäkelä
05459706f2 Merge 10.2 into 10.3 2018-08-03 15:57:23 +03:00
Igor Babaev
095dc81158 MDEV-16757 Memory leak after adding manually min/max statistical data
for blob column

ANALYZE TABLE <table> does not collect statistical data on min/max values
for BLOB columns of <table>. However these values can be added into
mysql.column_stats manually by executing proper statements.
Unfortunately this led to a memory leak because the memory allocated
for these values was never freed.
This patch provides the server with a function to free memory allocated
for min/max statistical values of BLOB types.

Temporarily changed the test case until MDEV-16711 is fixed as without
this fix the test case for MDEV-16757 did not fail only for 10.0.
2018-07-15 16:24:24 -07:00
Igor Babaev
c89bb15c31 MDEV-16757 Memory leak after adding manually min/max statistical data
for blob column

ANALYZE TABLE <table> does not collect statistical data on min/max values
for BLOB columns of <table>. However these values can be added into
mysql.column_stats manually by executing proper statements.
Unfortunately this led to a memory leak because the memory allocated
for these values was never freed.
This patch provides the server with a function to free memory allocated
for min/max statistical values of BLOB types.
2018-07-13 17:48:45 -07:00