Adjust the field type of globalblocks timestamp columns to fixed binary on wmf wikis
Closed, Resolved · Public

Description

Apply T306747 to wmf wikis

  1. ALTERs to run: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GlobalBlocking/+/785376/1/sql/mysql/patch-globalblocks-timestamps.sql
  2. Where to run those changes: centralauth database only
  3. When to run those changes: any time
  4. If the schema change is backwards compatible: Yes
  5. If the schema change has been tested already on some of the test/beta wikis: beta cluster is running with the new schema
  6. If the data should be made available on the labs replicas and/or dumps: no change to the existing rules

This makes gb_timestamp a fixed-length column and removes the empty-string default from gb_expiry.
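
For convenience, this is the ALTER from the linked patch as it was later run on individual hosts in this task (the Gerrit change in point 1 remains the authoritative version):

ALTER TABLE /*_*/globalblocks
  CHANGE gb_timestamp gb_timestamp BINARY(14) NOT NULL,
  CHANGE gb_expiry gb_expiry VARBINARY(14) NOT NULL;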

Progress

  • dbstore1008
  • db2220
  • db2218 master
  • db2208
  • db2200
  • db2198
  • db2187
  • db2182
  • db2168
  • db2159
  • db2150
  • db2122
  • db2121
  • db2098
  • db1236 master
  • db1227
  • db1202
  • db1194
  • db1191
  • db1181
  • db1174
  • db1171
  • db1170
  • db1158 sanitarium master
  • db1155
  • clouddb1021
  • clouddb1018
  • clouddb1014

Event Timeline

I guess:

Where to run those changes: everywhere

Should be:

Where to run those changes: centralauth

Am I right?

Yes, it is a special table; it exists only once per "farm".

Marostegui claimed this task.

I ran this directly on the master with replication enabled, so it propagated to the replicas.

This caused a massive outage and had to be reverted:

root@db1181.eqiad.wmnet[centralauth]> ALTER TABLE   globalblocks CHANGE  gb_timestamp gb_timestamp varchar(14) NOT NULL, CHANGE  gb_expiry gb_expiry VARBINARY(14) DEFAULT '' NOT NULL;
Query OK, 110397 rows affected (1.129 sec)
Records: 110397  Duplicates: 0  Warnings: 0

Checking the optimizer output for this query:

SELECT /* MediaWiki\Extension\GlobalBlocking\GlobalBlocking::getGlobalBlockingBlock  */  gb_id,gb_address,gb_by,gb_by_wiki,gb_reason,gb_timestamp,gb_anon_only,gb_expiry,gb_range_start,gb_range_end  FROM `globalblocks` WHERE (gb_range_start  LIKE '5B85%' ESCAPE '`' ) AND (gb_range_start <= '5B85B2D2') AND (gb_range_end >= '5B85B2D2') AND (gb_expiry > '20220505012805');

The result is the same on the original schema and on the schema after the change. Still investigating.
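
For reference, this kind of plan comparison can be done with plain EXPLAIN (estimated plan) or with ANALYZE FORMAT=JSON, which is what produces the r_rows / r_total_time_ms output pasted further down in this task. A sketch against the query above:

EXPLAIN
SELECT gb_id, gb_address, gb_by, gb_by_wiki, gb_reason, gb_timestamp, gb_anon_only,
       gb_expiry, gb_range_start, gb_range_end
FROM globalblocks
WHERE (gb_range_start LIKE '5B85%' ESCAPE '`')
  AND (gb_range_start <= '5B85B2D2')
  AND (gb_range_end >= '5B85B2D2')
  AND (gb_expiry > '20220505012805');

-- The same statement prefixed with ANALYZE FORMAT=JSON instead of EXPLAIN
-- executes the query and reports the actual r_rows / r_total_time_ms figures.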

Mentioned in SAL (#wikimedia-operations) [2022-05-05T10:30:10Z] <marostegui> Alter globalblocks on db1127 T307501

Altered db1127 and pooled it with very low weight to see if there's some slowness there

db1127 is now fully repooled, and I am not seeing any locks or high latency for those queries (or any). Still checking things

As another test I have altered db1101:3317 without depooling it (to sort of simulate what happened earlier today when the change arrived through replication). The change itself took around 8 seconds (so the table was locked for 8 seconds). There was no trace of pile-ups on the host.
However, during those 8 seconds, there were 8 reports of this query taking 5 seconds to run:

SELECT /* MediaWiki\Extension\GlobalBlocking\GlobalBlocking::getGlobalBlockingBlock  */  gb_id,gb_address,gb_by,gb_by_wiki,gb_reason,gb_timestamp,gb_anon_only,gb_expiry,gb_range_start,gb_range_end  FROM `globalblocks`    WHERE (gb_range_start  LIKE 'v6-2%' ESCAPE '`' ) AND (gb_range_start <= 'v6-2A02C207204105070000000000000099') AND (gb_range_end >= 'v6-2A02C207204105070000000000000099') AND (gb_expiry > '20220505113815')

That explain is:

root@db1101.eqiad.wmnet[centralauth]> explain SELECT /* MediaWiki\Extension\GlobalBlocking\GlobalBlocking::getGlobalBlockingBlock  */  gb_id,gb_address,gb_by,gb_by_wiki,gb_reason,gb_timestamp,gb_anon_only,gb_expiry,gb_range_start,gb_range_end  FROM `globalblocks`    WHERE (gb_range_start  LIKE 'v6-2%' ESCAPE '`' ) AND (gb_range_start <= 'v6-2A02C207204105070000000000000099') AND (gb_range_end >= 'v6-2A02C207204105070000000000000099') AND (gb_expiry > '20220505113815')  ;
+------+-------------+--------------+-------+--------------------+----------+---------+------+------+------------------------------------+
| id   | select_type | table        | type  | possible_keys      | key      | key_len | ref  | rows | Extra                              |
+------+-------------+--------------+-------+--------------------+----------+---------+------+------+------------------------------------+
|    1 | SIMPLE      | globalblocks | range | gb_range,gb_expiry | gb_range | 74      | NULL | 3552 | Using index condition; Using where |
+------+-------------+--------------+-------+--------------------+----------+---------+------+------+------------------------------------+
1 row in set (0.001 sec)

If I revert the change, the query plan is a lot better:

root@db1101.eqiad.wmnet[centralauth]> explain SELECT /* MediaWiki\Extension\GlobalBlocking\GlobalBlocking::getGlobalBlockingBlock  */  gb_id,gb_address,gb_by,gb_by_wiki,gb_reason,gb_timestamp,gb_anon_only,gb_expiry,gb_range_start,gb_range_end  FROM `globalblocks`    WHERE (gb_range_start  LIKE 'v6-2%' ESCAPE '`' ) AND (gb_range_start <= 'v6-2A02C207204105070000000000000099') AND (gb_range_end >= 'v6-2A02C207204105070000000000000099') AND (gb_expiry > '20220505113815')  ;
+------+-------------+--------------+-------+--------------------+-----------+---------+------+------+------------------------------------+
| id   | select_type | table        | type  | possible_keys      | key       | key_len | ref  | rows | Extra                              |
+------+-------------+--------------+-------+--------------------+-----------+---------+------+------+------------------------------------+
|    1 | SIMPLE      | globalblocks | range | gb_range,gb_expiry | gb_expiry | 16      | NULL | 1    | Using index condition; Using where |
+------+-------------+--------------+-------+--------------------+-----------+---------+------+------+------------------------------------+
1 row in set (0.000 sec)

So if this table is queried a lot, I can see how this caused the slowdown across all the hosts of s7.

Reverted the schema change on db1127 too (as the query plan changes there as well).
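
As a side note, one way to confirm that the gb_expiry plan is indeed the fast one (purely a diagnostic sketch, not something recorded in this task) would be to pin the index explicitly and compare timings:

SELECT gb_id, gb_address, gb_by, gb_by_wiki, gb_reason, gb_timestamp, gb_anon_only,
       gb_expiry, gb_range_start, gb_range_end
FROM globalblocks FORCE INDEX (gb_expiry)
WHERE (gb_range_start LIKE 'v6-2%' ESCAPE '`')
  AND (gb_range_start <= 'v6-2A02C207204105070000000000000099')
  AND (gb_range_end >= 'v6-2A02C207204105070000000000000099')
  AND (gb_expiry > '20220505113815');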

This looks like a classic optimizer bug, where sometimes it picks the right index and sometimes it doesn't:

Altering the host:

root@db1101.eqiad.wmnet[centralauth]> ALTER TABLE  /*_*/globalblocks
    -> CHANGE  gb_timestamp gb_timestamp BINARY(14) NOT NULL,
    -> CHANGE  gb_expiry gb_expiry VARBINARY(14) NOT NULL;
Query OK, 110271 rows affected (13.890 sec)
Records: 110271  Duplicates: 0  Warnings: 0

root@db1101.eqiad.wmnet[centralauth]> SELECT /* MediaWiki\Extension\GlobalBlocking\GlobalBlocking::getGlobalBlockingBlock  */  gb_id,gb_address,gb_by,gb_by_wiki,gb_reason,gb_timestamp,gb_anon_only,gb_expiry,gb_range_start,gb_range_end  FROM `globalblocks`    WHERE (gb_range_start  LIKE 'v6-2%' ESCAPE '`' ) AND (gb_range_start <= 'v6-2A02C207204105070000000000000099') AND (gb_range_end >= 'v6-2A02C207204105070000000000000099') AND (gb_expiry > '20220505113815')  ;

<snip>
1 row in set (6.787 sec)

And this is the ANALYZE FORMAT=JSON output, where we can see it picking gb_range:

| {
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 1.067,
    "table": {
      "table_name": "globalblocks",
      "access_type": "range",
      "possible_keys": ["gb_range", "gb_expiry"],
      "key": "gb_range",
      "key_length": "74",
      "used_key_parts": ["gb_range_start", "gb_range_end"],
      "r_loops": 1,
      "rows": 3552,
      "r_rows": 1,
      "r_total_time_ms": 1.0456,
      "filtered": 50,
      "r_filtered": 100,
      "index_condition": "globalblocks.gb_range_start like 'v6-2%' escape '`' and globalblocks.gb_range_start <= 'v6-2A02C207204105070000000000000099' and globalblocks.gb_range_end >= 'v6-2A02C207204105070000000000000099'",
      "attached_condition": "globalblocks.gb_expiry > '20220505113815'"
    }
  }
} |

And now let's revert; the query is back to being super fast:

root@db1101.eqiad.wmnet[centralauth]> ALTER TABLE   globalblocks CHANGE  gb_timestamp gb_timestamp varchar(14) NOT NULL, CHANGE  gb_expiry gb_expiry VARBINARY(14) DEFAULT '' NOT NULL;
Query OK, 110271 rows affected (8.233 sec)
Records: 110271  Duplicates: 0  Warnings: 0

root@db1101.eqiad.wmnet[centralauth]> SELECT /* MediaWiki\Extension\GlobalBlocking\GlobalBlocking::getGlobalBlockingBlock  */  gb_id,gb_address,gb_by,gb_by_wiki,gb_reason,gb_timestamp,gb_anon_only,gb_expiry,gb_range_start,gb_range_end  FROM `globalblocks`    WHERE (gb_range_start  LIKE 'v6-2%' ESCAPE '`' ) AND (gb_range_start <= 'v6-2A02C207204105070000000000000099') AND (gb_range_end >= 'v6-2A02C207204105070000000000000099') AND (gb_expiry > '20220505113815')  ;
<snip>

However, the optimizer still says it would pick gb_range (and it does), yet the query is now super fast:

  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 0.8109,
    "table": {
      "table_name": "globalblocks",
      "access_type": "range",
      "possible_keys": ["gb_range", "gb_expiry"],
      "key": "gb_range",
      "key_length": "74",
      "used_key_parts": ["gb_range_start", "gb_range_end"],
      "r_loops": 1,
      "rows": 3552,
      "r_rows": 1,
      "r_total_time_ms": 0.7956,
      "filtered": 50,
      "r_filtered": 100,
      "index_condition": "globalblocks.gb_range_start like 'v6-2%' escape '`' and globalblocks.gb_range_start <= 'v6-2A02C207204105070000000000000099' and globalblocks.gb_range_end >= 'v6-2A02C207204105070000000000000099'",
      "attached_condition": "globalblocks.gb_expiry > '20220505113815'"
    }
  }
} |
"analyzing_range_alternatives": {
  "range_scan_alternatives": [
    {
      "index": "gb_range",
      "ranges": [
        "(v6-2,v6-2A02C207204105070000000000000099) <= (gb_range_start,gb_range_end) <= (v6-2A02C207204105070000000000000099)"
      ],
      "rowid_ordered": false,
      "using_mrr": false,
      "index_only": false,
      "rows": 3552,
      "cost": 4466.6,
      "chosen": true
    },
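
The analyzing_range_alternatives fragment above is optimizer-trace output; a minimal sketch of how to capture it in a session (assuming MariaDB 10.4+ with the default trace settings):

SET optimizer_trace='enabled=on';
-- run the getGlobalBlockingBlock SELECT from above here
SELECT trace FROM information_schema.optimizer_trace;
SET optimizer_trace='enabled=off';
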
Marostegui raised the priority of this task from Low to Medium. May 6 2022, 1:09 PM

I don't think we should attempt this schema change again. There's not much benefit and it is very risky. We can try to see if the optimizer behaves better on 10.6 once we migrate to it (T301879). Once I have a testing host in s7 I can try to give it a go, and if it still behaves weirdly, we should probably decline this task.

Just pasting this for the record with one other test I have done on db2108 (handler counters before and after the ALTER TABLE).
Before:

Variable_name	Value
Handler_commit	1
Handler_delete	0
Handler_discover	0
Handler_external_lock	0
Handler_icp_attempts	2225
Handler_icp_match	1
Handler_mrr_init	0
Handler_mrr_key_refills	0
Handler_mrr_rowid_refills	0
Handler_prepare	0
Handler_read_first	0
Handler_read_key	1
Handler_read_last	0
Handler_read_next	1
Handler_read_prev	0
Handler_read_retry	0
Handler_read_rnd	0
Handler_read_rnd_deleted	0
Handler_read_rnd_next	0
Handler_rollback	0
Handler_savepoint	0
Handler_savepoint_rollback	0
Handler_tmp_delete	0
Handler_tmp_update	0
Handler_tmp_write	0
Handler_update	0
Handler_write	0

After the schema change:

Variable_name	Value
Handler_commit	1
Handler_delete	0
Handler_discover	0
Handler_external_lock	0
Handler_icp_attempts	99552
Handler_icp_match	99552
Handler_mrr_init	0
Handler_mrr_key_refills	0
Handler_mrr_rowid_refills	0
Handler_prepare	0
Handler_read_first	0
Handler_read_key	23
Handler_read_last	0
Handler_read_next	99552
Handler_read_prev	0
Handler_read_retry	0
Handler_read_rnd	0
Handler_read_rnd_deleted	0
Handler_read_rnd_next	0
Handler_rollback	0
Handler_savepoint	0
Handler_savepoint_rollback	0
Handler_tmp_delete	0
Handler_tmp_update	0
Handler_tmp_write	0
Handler_update	0
Handler_write	0

I have left db2108 with the original table definition, that is, before this schema change.

And now, with the original definition restored, we run the query again:

Variable_name	Value
Handler_commit	1
Handler_delete	0
Handler_discover	0
Handler_external_lock	0
Handler_icp_attempts	2225
Handler_icp_match	1
Handler_mrr_init	0
Handler_mrr_key_refills	0
Handler_mrr_rowid_refills	0
Handler_prepare	0
Handler_read_first	0
Handler_read_key	23
Handler_read_last	0
Handler_read_next	1
Handler_read_prev	0
Handler_read_retry	0
Handler_read_rnd	0
Handler_read_rnd_deleted	0
Handler_read_rnd_next	0
Handler_rollback	0
Handler_savepoint	0
Handler_savepoint_rollback	0
Handler_tmp_delete	0
Handler_tmp_update	0
Handler_tmp_write	0
Handler_update	0
Handler_write	0
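
For completeness, a sketch of the pattern used to capture these per-query Handler_* counters (the same one-liner appears verbatim in later comments). With the bad plan the engine walks ~100 k index entries (Handler_icp_attempts, Handler_read_next), while with the good plan only one row reaches the server (Handler_read_next = 1):

FLUSH STATUS;                 -- reset the session Handler_* counters
-- run the getGlobalBlockingBlock SELECT here
SHOW STATUS LIKE 'Handler%';  -- compare Handler_icp_attempts and Handler_read_next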

Mentioned in SAL (#wikimedia-operations) [2022-05-12T05:34:45Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db2122 T307501', diff saved to https://phabricator.wikimedia.org/P27792 and previous config saved to /var/cache/conftool/dbconfig/20220512-053444-marostegui.json

I have migrated db2122 to 10.6 and it looks like the query, after the alter table, runs fast and the optimizer behaves correctly:

mysql:root@localhost [centralauth]> ALTER TABLE  /*_*/globalblocks
    -> CHANGE  gb_timestamp gb_timestamp BINARY(14) NOT NULL,
    -> CHANGE  gb_expiry gb_expiry VARBINARY(14) NOT NULL;
Query OK, 97872 rows affected (7.521 sec)
Records: 97872  Duplicates: 0  Warnings: 0

mysql:root@localhost [centralauth]> SELECT /* MediaWiki\Extension\GlobalBlocking\GlobalBlocking::getGlobalBlockingBlock  */  gb_id,gb_address,gb_by,gb_by_wiki,gb_reason,gb_timestamp,gb_anon_only,gb_expiry,gb_range_start,gb_range_end  FROM `globalblocks`    WHERE (gb_range_start  LIKE 'v6-2%' ESCAPE '`' ) AND (gb_range_start <= 'v6-2A02C207204105070000000000000099') AND (gb_range_end >= 'v6-2A02C207204105070000000000000099') AND (gb_expiry > '20220505113815')  ;

<snip>
1 row in set (0.020 sec)



mysql:root@localhost [centralauth]>  SHOW STATUS like 'Hand%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 2224  |
| Handler_icp_match          | 1     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 1     |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_delete         | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
27 rows in set (0.001 sec)

I will double-check this again with an eqiad host to make sure this was not just pure luck.

Mentioned in SAL (#wikimedia-operations) [2022-05-12T05:41:38Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db2122 T307501', diff saved to https://phabricator.wikimedia.org/P27793 and previous config saved to /var/cache/conftool/dbconfig/20220512-054138-marostegui.json

And of course, I altered db1127 before migrating it to 10.6 and now the optimizer behaves well...

mysql:root@localhost [centralauth]> FLUSH STATUS; pager cat > /dev/null; SELECT /* MediaWiki\Extension\GlobalBlocking\GlobalBlocking::getGlobalBlockingBlock  */  gb_id,gb_address,gb_by,gb_by_wiki,gb_reason,gb_timestamp,gb_anon_only,gb_expiry,gb_range_start,gb_range_end  FROM `globalblocks`    WHERE (gb_range_start  LIKE 'v6-2%' ESCAPE '`' ) AND (gb_range_start <= 'v6-2A02C207204105070000000000000099') AND (gb_range_end >= 'v6-2A02C207204105070000000000000099') AND (gb_expiry > '20220505113815')  ; nopager; SHOW STATUS like 'Hand%';
Query OK, 0 rows affected (0.000 sec)

PAGER set to 'cat > /dev/null'
1 row in set (0.001 sec)

PAGER set to stdout
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 2224  |
| Handler_icp_match          | 1     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 1     |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_delete         | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
27 rows in set (0.000 sec)

The migration to 10.6, at least, keeps the optimizer working fine.

Reverted the alter table on db1127 and db2140 to leave those hosts consistent with the rest.

Marostegui changed the task status from Open to Stalled. May 27 2022, 12:40 PM

Going to stall this until we are on 10.6

Marostegui changed the task status from Stalled to Open. May 17 2024, 6:21 AM
Marostegui moved this task from Blocked to In progress on the DBA board.

So s7 is fully on 10.6, and I've tried again on db2122:

cumin2024@db2122.codfw.wmnet[centralauth]> ALTER TABLE   globalblocks CHANGE  gb_timestamp gb_timestamp varchar(14) NOT NULL, CHANGE  gb_expiry gb_expiry VARBINARY(14) DEFAULT '' NOT NULL;
Query OK, 0 rows affected (0.036 sec)
Records: 0  Duplicates: 0  Warnings: 0

cumin2024@db2122.codfw.wmnet[centralauth]> explain SELECT /* MediaWiki\Extension\GlobalBlocking\GlobalBlocking::getGlobalBlockingBlock  */  gb_id,gb_address,gb_by,gb_by_wiki,gb_reason,gb_timestamp,gb_anon_only,gb_expiry,gb_range_start,gb_range_end  FROM `globalblocks`    WHERE (gb_range_start  LIKE 'v6-2%' ESCAPE '`' ) AND (gb_range_start <= 'v6-2A02C207204105070000000000000099') AND (gb_range_end >= 'v6-2A02C207204105070000000000000099') AND (gb_expiry > '20220505113815')  ;
+------+-------------+--------------+-------+--------------------+----------+---------+------+------+------------------------------------+
| id   | select_type | table        | type  | possible_keys      | key      | key_len | ref  | rows | Extra                              |
+------+-------------+--------------+-------+--------------------+----------+---------+------+------+------------------------------------+
|    1 | SIMPLE      | globalblocks | range | gb_range,gb_expiry | gb_range | 74      | NULL | 4502 | Using index condition; Using where |
+------+-------------+--------------+-------+--------------------+----------+---------+------+------+------------------------------------+
1 row in set (0.031 sec)

cumin2024@db2122.codfw.wmnet[centralauth]> SELECT /* MediaWiki\Extension\GlobalBlocking\GlobalBlocking::getGlobalBlockingBlock  */  gb_id,gb_address,gb_by,gb_by_wiki,gb_reason,gb_timestamp,gb_anon_only,gb_expiry,gb_range_start,gb_range_end  FROM `globalblocks`    WHERE (gb_range_start  LIKE 'v6-2%' ESCAPE '`' ) AND (gb_range_start <= 'v6-2A02C207204105070000000000000099') AND (gb_range_end >= 'v6-2A02C207204105070000000000000099') AND (gb_expiry > '20220505113815')  ;
+---------+--------------------------+---------------------+------------+------------------------------------------------------------------------------------------------+----------------+--------------+----------------+-------------------------------------+-------------->
| gb_id   | gb_address               | gb_by               | gb_by_wiki | gb_reason                                                                                      | gb_timestamp   | gb_anon_only | gb_expiry      | gb_range_start                      | gb_range_end >
+---------+--------------------------+---------------------+------------+------------------------------------------------------------------------------------------------+----------------+--------------+----------------+-------------------------------------+-------------->
| 3083101 | 2A02:C204:0:0:0:0:0:0/30 | AntiCompositeNumber | metawiki   | [[m:NOP|Open proxy/Webhost]]: Visit the [[m:WM:OP/H|FAQ]] if you are affected <!-- Contabo --> | 20220818144435 |            0 | 20250418144435 | v6-2A02C204000000000000000000000000 | v6-2A02C207FF>
+---------+--------------------------+---------------------+------------+------------------------------------------------------------------------------------------------+----------------+--------------+----------------+-------------------------------------+-------------->
1 row in set (0.034 sec)

The optimizer looks fine now:

cumin2024@db2122.codfw.wmnet[centralauth]>  FLUSH STATUS; pager cat > /dev/null; SELECT /* MediaWiki\Extension\GlobalBlocking\GlobalBlocking::getGlobalBlockingBlock  */  gb_id,gb_address,gb_by,gb_by_wiki,gb_reason,gb_timestamp,gb_anon_only,gb_expiry,gb_range_start,gb_range_end  FROM `globalblocks`    WHERE (gb_range_start  LIKE 'v6-2%' ESCAPE '`' ) AND (gb_range_start <= 'v6-2A02C207204105070000000000000099') AND (gb_range_end >= 'v6-2A02C207204105070000000000000099') AND (gb_expiry > '20220505113815')  ; nopager; SHOW STATUS like 'Hand%';
Query OK, 0 rows affected (0.032 sec)

PAGER set to 'cat > /dev/null'
1 row in set (0.033 sec)

PAGER set to stdout
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 2375  |
| Handler_icp_match          | 1     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 1     |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_delete         | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
27 rows in set (0.032 sec)

I should probably try some other hosts to see if this is just a coincidence or whether 10.6 has really fixed this issue.

Enabled the slow query log on db2122 to capture queries that take longer than 10 seconds.
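
A sketch of the usual way to enable that at runtime (the exact mechanism used here is not recorded; it may equally have been done via configuration):

SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 10;  -- seconds; slower queries go to @@slow_query_log_file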

I've not seen any slow queries logged against the globalblocks table so far. I am going to run the alter on a few more hosts and wait a bit longer.

db2150 looking okay too after the change:

cumin2024@db2150.codfw.wmnet[centralauth]> FLUSH STATUS; pager cat > /dev/null; SELECT /* MediaWiki\Extension\GlobalBlocking\GlobalBlocking::getGlobalBlockingBlock  */  gb_id,gb_address,gb_by,gb_by_wiki,gb_reason,gb_timestamp,gb_anon_only,gb_expiry,gb_range_start,gb_range_end  FROM `globalblocks`    WHERE (gb_range_start  LIKE 'v6-2%' ESCAPE '`' ) AND (gb_range_start <= 'v6-2A02C207204105070000000000000099') AND (gb_range_end >= 'v6-2A02C207204105070000000000000099') AND (gb_expiry > '20220505113815')  ; nopager; SHOW STATUS like 'Hand%';
Query OK, 0 rows affected (0.032 sec)

PAGER set to 'cat > /dev/null'
1 row in set (0.032 sec)

PAGER set to stdout
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 2382  |
| Handler_icp_match          | 1     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 1     |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_delete         | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
27 rows in set (0.032 sec)

One more test on db1191:

cumin2024@db1191.eqiad.wmnet[centralauth]> FLUSH STATUS; pager cat > /dev/null; SELECT /* MediaWiki\Extension\GlobalBlocking\GlobalBlocking::getGlobalBlockingBlock  */  gb_id,gb_address,gb_by,gb_by_wiki,gb_reason,gb_timestamp,gb_anon_only,gb_expiry,gb_range_start,gb_range_end  FROM `globalblocks`    WHERE (gb_range_start  LIKE 'v6-2%' ESCAPE '`' ) AND (gb_range_start <= 'v6-2A02C207204105070000000000000099') AND (gb_range_end >= 'v6-2A02C207204105070000000000000099') AND (gb_expiry > '20220505113815')  ; nopager; SHOW STATUS like 'Hand%';
Query OK, 0 rows affected (0.002 sec)

PAGER set to 'cat > /dev/null'
1 row in set (0.002 sec)

PAGER set to stdout
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 2395  |
| Handler_icp_match          | 1     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 1     |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_delete         | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
27 rows in set (0.002 sec)

cumin2024@db1191.eqiad.wmnet[centralauth]> ALTER TABLE  /*_*/globalblocks
    -> CHANGE  gb_timestamp gb_timestamp BINARY(14) NOT NULL,
    -> CHANGE  gb_expiry gb_expiry VARBINARY(14) NOT NULL;
Query OK, 60660 rows affected (2.343 sec)
Records: 60660  Duplicates: 0  Warnings: 0

cumin2024@db1191.eqiad.wmnet[centralauth]> FLUSH STATUS; pager cat > /dev/null; SELECT /* MediaWiki\Extension\GlobalBlocking\GlobalBlocking::getGlobalBlockingBlock  */  gb_id,gb_address,gb_by,gb_by_wiki,gb_reason,gb_timestamp,gb_anon_only,gb_expiry,gb_range_start,gb_range_end  FROM `globalblocks`    WHERE (gb_range_start  LIKE 'v6-2%' ESCAPE '`' ) AND (gb_range_start <= 'v6-2A02C207204105070000000000000099') AND (gb_range_end >= 'v6-2A02C207204105070000000000000099') AND (gb_expiry > '20220505113815')  ; nopager; SHOW STATUS like 'Hand%';
Query OK, 0 rows affected (0.002 sec)

PAGER set to 'cat > /dev/null'
1 row in set (0.002 sec)

PAGER set to stdout
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 2395  |
| Handler_icp_match          | 1     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 1     |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_delete         | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
27 rows in set (0.002 sec)

As part of T365338 (MariaDB 10.6.18 released) I have installed 10.6.18 and double-checked that the query doesn't generate a full scan on db1174 - all good.

There have been no issues so far, so I am going to go ahead and deploy this in codfw for now.

Mentioned in SAL (#wikimedia-operations) [2024-05-27T07:21:16Z] <marostegui> Deploy schema change on s7 codfw dbmaint T307501

So far no issues in codfw. I am going to wait another day before going ahead in eqiad.

For the record (in case we need to revert again - doesn't look like it will be needed):

CREATE TABLE `globalblocks` (
  `gb_id` int(11) NOT NULL AUTO_INCREMENT,
  `gb_address` varbinary(255) NOT NULL DEFAULT '',
  `gb_by` varbinary(255) NOT NULL DEFAULT '',
  `gb_by_wiki` varbinary(255) NOT NULL DEFAULT '',
  `gb_reason` varbinary(767) NOT NULL,
  `gb_timestamp` varbinary(14) NOT NULL,
  `gb_anon_only` tinyint(1) NOT NULL DEFAULT 0,
  `gb_expiry` varbinary(14) NOT NULL DEFAULT '',
  `gb_range_start` varbinary(35) NOT NULL,
  `gb_range_end` varbinary(35) NOT NULL,
  `gb_by_central_id` int(10) unsigned NOT NULL,
  `gb_target_central_id` int(10) unsigned NOT NULL DEFAULT 0,
  PRIMARY KEY (`gb_id`),
  UNIQUE KEY `gb_address` (`gb_address`,`gb_anon_only`),
  KEY `gb_range` (`gb_range_start`,`gb_range_end`),
  KEY `gb_timestamp` (`gb_timestamp`),
  KEY `gb_expiry` (`gb_expiry`),
  KEY `gb_target_central_id` (`gb_target_central_id`)
) ENGINE=InnoDB AUTO_INCREMENT=3246904 DEFAULT CHARSET=binary
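
After the schema change, the two affected columns should read as follows; this is a sketch derived from the ALTER above, and the rest of the table definition is unchanged:

  `gb_timestamp` binary(14) NOT NULL,
  `gb_expiry` varbinary(14) NOT NULL,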

Mentioned in SAL (#wikimedia-operations) [2024-05-29T09:12:05Z] <marostegui> Deploy schema change on s7 eqiad dbmaint T307501

Marostegui updated the task description.
Marostegui updated the task description.

After 2 years, this task is completed!