

Certain ApiQueryRecentChanges::run API query is too slow, slowing down dewiki
Closed, Declined · Public · PRODUCTION ERROR

Description

MariaDB [db1071:dewiki]> EXPLAIN SELECT /* ApiQueryRecentChanges::run */ rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>='20161024013525') AND rc_namespace IN ('0', '120') AND rc_type IN ('0', '1', '3', '6') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: recentchanges
         type: index
possible_keys: rc_timestamp,rc_namespace_title,rc_ns_usertext,tmp_3,rc_name_type_patrolled_timestamp
          key: rc_timestamp
      key_len: 16
          ref: NULL
         rows: 1239
        Extra: Using where
1 row in set (0.01 sec)

MariaDB [db1071:dewiki]> FLUSH STATUS; pager cat > /dev/null; SELECT /* ApiQueryRecentChanges::run */ rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>='20161024013525') AND rc_namespace IN ('0', '120') AND rc_type IN ('0', '1', '3', '6') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101; nopager; SHOW STATUS like 'Hand%';
Query OK, 0 rows affected (0.00 sec)

PAGER set to 'cat > /dev/null'
101 rows in set (1 min 14.06 sec)

PAGER set to stdout
+----------------------------+---------+
| Variable_name              | Value   |
+----------------------------+---------+
| Handler_commit             | 1       |
| Handler_delete             | 0       |
| Handler_discover           | 0       |
| Handler_external_lock      | 0       |
| Handler_icp_attempts       | 0       |
| Handler_icp_match          | 0       |
| Handler_mrr_init           | 0       |
| Handler_mrr_key_refills    | 0       |
| Handler_mrr_rowid_refills  | 0       |
| Handler_prepare            | 0       |
| Handler_read_first         | 1       |
| Handler_read_key           | 0       |
| Handler_read_last          | 0       |
| Handler_read_next          | 3343340 |
| Handler_read_prev          | 0       |
| Handler_read_rnd           | 0       |
| Handler_read_rnd_deleted   | 0       |
| Handler_read_rnd_next      | 0       |
| Handler_rollback           | 0       |
| Handler_savepoint          | 0       |
| Handler_savepoint_rollback | 0       |
| Handler_tmp_update         | 0       |
| Handler_tmp_write          | 0       |
| Handler_update             | 0       |
| Handler_write              | 0       |
+----------------------------+---------+
25 rows in set (0.00 sec)

I believe this query is impossible to optimize no matter which index is used: it combines three ranges with an ORDER BY, which a BTREE index cannot serve efficiently. A rewrite should be considered (a UNION of SELECTs?), or, if that is not possible, disabling it entirely.
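For intuition on why a UNION rewrite can work where a single index scan cannot: each (namespace, type) combination with a timestamp lower bound is one contiguous, already-sorted index range, and several sorted streams can be merged lazily with early termination once the LIMIT is reached. A minimal sketch of that idea in Python (the data and function names are illustrative, not MediaWiki code):

```python
import heapq

# Simulated index ranges: for each (namespace, type) pair the rows come back
# already sorted by (timestamp, id) -- exactly what a BTREE range scan yields.
ranges = {
    (0, 0): [(5, 1), (7, 3), (9, 8)],
    (0, 1): [(6, 2), (8, 5)],
    (120, 0): [(5, 4), (10, 9)],
}

def merged_top_n(ranges, since, limit):
    """Merge the per-range sorted streams and stop after `limit` rows,
    mimicking UNION ALL + ORDER BY ... LIMIT over per-range index scans."""
    streams = ((row for row in rows if row[0] >= since)
               for rows in ranges.values())
    out = []
    for row in heapq.merge(*streams):
        out.append(row)
        if len(out) == limit:
            break
    return out

print(merged_top_n(ranges, since=6, limit=3))  # [(6, 2), (7, 3), (8, 5)]
```

The key property is that no stream is ever read past what the merge needs, which is why the decomposed query touches hundreds of rows instead of millions.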

Event Timeline

Anomie subscribed.

Hmm. Yeah, theoretically it could realize that it could do this by combining multiple ranges from an appropriate index like we'd do manually with unions, but it doesn't seem like MySQL is likely to be smart enough to do that. It's doing the best it can now by using rc_timestamp and using the rest of the conditions to filter.

For this situation, it looks like we'd actually have to add another condition to the query to make it amenable to using a decent index at all: add AND rc_patrolled IN ('0','1') or else it'll still have to make bad choices instead of being able to directly use rc_name_type_patrolled_timestamp. Then we can decompose it manually into unions, which would look something like this:

(SELECT rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>=20161024013525) AND (rc_namespace='0') AND (rc_type='0') AND (rc_patrolled='0') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101)
UNION ALL
(SELECT rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>=20161024013525) AND (rc_namespace='0') AND (rc_type='0') AND (rc_patrolled='1') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101)
UNION ALL
(SELECT rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>=20161024013525) AND (rc_namespace='0') AND (rc_type='1') AND (rc_patrolled='0') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101)
UNION ALL
(SELECT rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>=20161024013525) AND (rc_namespace='0') AND (rc_type='1') AND (rc_patrolled='1') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101)
UNION ALL
(SELECT rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>=20161024013525) AND (rc_namespace='0') AND (rc_type='3') AND (rc_patrolled='0') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101)
UNION ALL
(SELECT rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>=20161024013525) AND (rc_namespace='0') AND (rc_type='3') AND (rc_patrolled='1') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101)
UNION ALL
(SELECT rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>=20161024013525) AND (rc_namespace='0') AND (rc_type='6') AND (rc_patrolled='0') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101)
UNION ALL
(SELECT rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>=20161024013525) AND (rc_namespace='0') AND (rc_type='6') AND (rc_patrolled='1') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101)
UNION ALL
(SELECT rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>=20161024013525) AND (rc_namespace='120') AND (rc_type='0') AND (rc_patrolled='0') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101)
UNION ALL
(SELECT rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>=20161024013525) AND (rc_namespace='120') AND (rc_type='0') AND (rc_patrolled='1') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101)
UNION ALL
(SELECT rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>=20161024013525) AND (rc_namespace='120') AND (rc_type='1') AND (rc_patrolled='0') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101)
UNION ALL
(SELECT rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>=20161024013525) AND (rc_namespace='120') AND (rc_type='1') AND (rc_patrolled='1') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101)
UNION ALL
(SELECT rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>=20161024013525) AND (rc_namespace='120') AND (rc_type='3') AND (rc_patrolled='0') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101)
UNION ALL
(SELECT rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>=20161024013525) AND (rc_namespace='120') AND (rc_type='3') AND (rc_patrolled='1') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101)
UNION ALL
(SELECT rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>=20161024013525) AND (rc_namespace='120') AND (rc_type='6') AND (rc_patrolled='0') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101)
UNION ALL
(SELECT rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>=20161024013525) AND (rc_namespace='120') AND (rc_type='6') AND (rc_patrolled='1') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101)
ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101;

The worst case on dewiki would be to wind up with 240 subqueries with 5001 rows each, if someone happened to specify 24 of 25 namespaces with all 5 types and both values for rc_patrolled. We could reduce that by adding more indexes, e.g. we could halve it by adding (rc_namespace, rc_type, rc_timestamp) for when rc_patrolled doesn't matter, and if it's common enough (rc_type, rc_timestamp) would really cut down the permutations when all namespaces are being queried. It's already the default to only query 4 of the 5 values for rc_type, so it might not be worth adding a (rc_namespace, rc_timestamp) index to target with the unionization unless it turns out people are overriding that default often enough.
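The combinatorics behind that worst case, spelled out (numbers taken from the paragraph above):

```python
# Worst case on dewiki per the analysis above: a client specifies 24 of the
# 25 namespaces, all 5 rc_type values, and both rc_patrolled values.
namespaces, types, patrolled = 24, 5, 2
subqueries = namespaces * types * patrolled
print(subqueries)          # 240 subqueries in the UNION
print(subqueries * 5001)   # up to 1200240 rows examined before the final LIMIT
```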

Thoughts on that plan before I look at actually writing the code? In particular, should we add a (rc_namespace, rc_type, rc_timestamp) index and/or others?

Well, I would wait a bit before working on this, to check if we really want to support this kind of complex queries in the first place.

I am thinking of a more general approach (maybe?) which would prohibit complex queries and document those limitations: "Error: query too complex. The query you asked for uses a combination of conditions that is known to be slow; please simplify your query." This may sound stupid, but maybe something like that should be enabled for the largest wikis? I honestly do not know.

Your UNION query takes 1.28 sec cold and 0.01 sec hot, far below the original's 1+ minute, and is much more efficient:

+----------------------------+--------+
| Variable_name              | Value  |
+----------------------------+--------+
| Handler_commit             | 1      |
| Handler_delete             | 0      |
| Handler_discover           | 0      |
| Handler_external_lock      | 0      |
| Handler_icp_attempts       | 522954 |
| Handler_icp_match          | 373    |
| Handler_mrr_init           | 0      |
| Handler_mrr_key_refills    | 0      |
| Handler_mrr_rowid_refills  | 0      |
| Handler_prepare            | 0      |
| Handler_read_first         | 0      |
| Handler_read_key           | 16     |
| Handler_read_last          | 0      |
| Handler_read_next          | 361    |
| Handler_read_prev          | 0      |
| Handler_read_rnd           | 101    |
| Handler_read_rnd_deleted   | 0      |
| Handler_read_rnd_next      | 365    |
| Handler_rollback           | 0      |
| Handler_savepoint          | 0      |
| Handler_savepoint_rollback | 0      |
| Handler_tmp_update         | 0      |
| Handler_tmp_write          | 364    |
| Handler_update             | 0      |
| Handler_write              | 0      |
+----------------------------+--------+
25 rows in set (0.01 sec)

The problem I see here is maintainability and simplicity. I would need a product manager to take informed and consistent decisions on this; I do not know myself.

Regarding indexes, this works well, but I am open for new index creation as long as it is indeed useful to speed up queries.

Well, I would wait a bit before working on this, to check if we really want to support this kind of complex queries in the first place.

I am thinking of a more general approach (maybe?) which would prohibit complex queries and document those limitations: "Error: query too complex. The query you asked for uses a combination of conditions that is known to be slow; please simplify your query." This may sound stupid, but maybe something like that should be enabled for the largest wikis? I honestly do not know.

One trouble there is that making something that has worked for a long time suddenly stop working is liable to break a lot of API users' code. If we can reasonably avoid doing that, it would be better to avoid it.

A second trouble is that this particular example isn't really all that complex of a query: "I want to see changes since $TIME in either of two namespaces, that aren't 'external' (i.e. from Wikidata)". And the "that aren't 'external'" bit isn't even being explicitly requested, it's the default. It's not filtering by user, tag, flags (e.g. minor, bot, anon, redirect, patrolled), etc.

The problem I see here is maintainability and simplicity. I would need a product manager to take informed and consistent decisions on this; I do not know myself.

There is no product manager for the action API. For informed and consistent decisions for the action API people generally ask me.

@Tgr and @Legoktm most often give me code review, we can bring them in easily enough.

Regarding indexes, this works well, but I am open for new index creation as long as it is indeed useful to speed up queries.

In this case, it would be to reduce the number of subqueries in the union. I'd think the (rc_namespace, rc_type, rc_timestamp) index would be useful here as a better target for unionizing than the existing rc_name_type_patrolled_timestamp.

I'd think the (rc_namespace, rc_type, rc_timestamp) index would be useful here as a better target for unionizing than the existing rc_name_type_patrolled_timestamp.

There was some pending open index suggestion for recentchanges. We need to identify that (if it was the same or another index) and establish a plan for applying all pending design changes on the rc table. Changes on rc "should be easy".

This query asks for the recent content changes. That (separating content changes from changes in supplemental namespaces) is central to change patrolling, which is one of the main roles of Wikipedia editors. I doubt breaking the functionality is an option.

I guess you could make Database::select handle unions by adding some sort of placeholder tokens - $db->select( $table, $fields, [ 'rc_namespace' => $db->uniqueToken( 'namespace' ), 'rc_type' => $db->uniqueToken( 'type' ) ], __METHOD__, [ 'UNION ALL' => [ 'namespace' => $namespaces, 'type' => $types ] ] ); which would then iterate through all namespace/type combinations, string-replace the tokens, and join the queries with UNION ALL. It's mildly disgusting but reasonably maintainable.

Then again, a simple FORCE INDEX (rc_name_type_patrolled_timestamp) brings the execution time down to nearly the same level (3s cold / 0.25s warm), so I'd just go with that.

I wouldn't bother trying to make Database::select somehow magically do it; I'd just build the subqueries with Database::selectSQLText() and combine them with Database::unionQueries(). Chances are I'd only do the unionization when $wgMiserMode is active and $dbr->unionSupportsOrderAndLimit() is true (i.e. not SQLite or MSSQL).

FORCE INDEX (rc_name_type_patrolled_timestamp) is going to fetch all matching rows and filesort them; while that works ok-ish for this query with only around 50000 matching rows on dewiki, it tends not to scale all that well when a slightly altered query gets into millions of rows.
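As a language-neutral illustration of the decomposition itself (the real implementation would go through MediaWiki's Database layer as discussed above; the helper name here is invented, and the interpolated values are assumed to be pre-validated integers — production code would use the DB layer's quoting):

```python
from itertools import product

# Column list taken from the query in the task description.
FIELDS = ("rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, "
          "rc_type, rc_deleted, rc_this_oldid, rc_last_oldid")

def build_union_query(since, namespaces, types, patrolled, limit):
    """Decompose the multi-range query into one equality-only subquery per
    (namespace, type, patrolled) combination, so each can walk the
    rc_name_type_patrolled_timestamp index, then merge with UNION ALL."""
    subqueries = [
        f"(SELECT {FIELDS} FROM recentchanges "
        f"WHERE rc_timestamp >= {since} AND rc_namespace = {ns} "
        f"AND rc_type = {t} AND rc_patrolled = {p} "
        f"ORDER BY rc_timestamp ASC, rc_id ASC LIMIT {limit})"
        for ns, t, p in product(namespaces, types, patrolled)
    ]
    return ("\nUNION ALL\n".join(subqueries)
            + f"\nORDER BY rc_timestamp ASC, rc_id ASC LIMIT {limit}")

# Parameters from the original dewiki query: 2 namespaces x 4 types x 2
# patrolled values = 16 subqueries.
sql = build_union_query(20161024013525, (0, 120), (0, 1, 3, 6), (0, 1), 101)
print(sql.count("SELECT"))  # 16
```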

Change 359501 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] ApiQueryRecentChanges: Make better use of the rc_name_type_patrolled_timestamp index

https://gerrit.wikimedia.org/r/359501

Perhaps. If so, this task seems the more useful of the two. The discussion on T101502 seems focused around the index forcing which was removed with rMW2f1cfcf306bb: API: Remove index forcing in ApiQueryRecentChanges.

Could this be another case of MariaDB getting the optimizer fixed in a new version, as it doesn't happen on 10.1.36 or 10.1.37 for the original query?

root@db1070.eqiad.wmnet[dewiki]> EXPLAIN SELECT /* ApiQueryRecentChanges::run */ rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>='20161024013525') AND rc_namespace IN ('0', '120') AND rc_type IN ('0', '1', '3', '6') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: recentchanges
         type: range
possible_keys: rc_timestamp,rc_ns_usertext,rc_name_type_patrolled_timestamp,rc_ns_actor,rc_namespace_title_timestamp
          key: rc_timestamp
      key_len: 16
          ref: NULL
         rows: 518658
        Extra: Using index condition; Using where
1 row in set (0.00 sec)

root@db1070.eqiad.wmnet[dewiki]> FLUSH STATUS; pager cat > /dev/null; SELECT /* ApiQueryRecentChanges::run */ rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>='20161024013525') AND rc_namespace IN ('0', '120') AND rc_type IN ('0', '1', '3', '6') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101; nopager; SHOW STATUS like 'Hand%';
Query OK, 0 rows affected (0.00 sec)

PAGER set to 'cat > /dev/null'
101 rows in set (0.01 sec)

PAGER set to stdout
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 219   |
| Handler_icp_match          | 219   |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 185   |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
26 rows in set (0.05 sec)

root@db1070.eqiad.wmnet[dewiki]> select @@version;
+-----------------+
| @@version       |
+-----------------+
| 10.1.36-MariaDB |
+-----------------+
1 row in set (0.00 sec)
root@db1100.eqiad.wmnet[dewiki]> EXPLAIN SELECT /* ApiQueryRecentChanges::run */ rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>='20161024013525') AND rc_namespace IN ('0', '120') AND rc_type IN ('0', '1', '3', '6') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: recentchanges
         type: range
possible_keys: rc_timestamp,rc_ns_usertext,tmp_3,rc_name_type_patrolled_timestamp,rc_ns_actor,rc_namespace_title_timestamp
          key: rc_timestamp
      key_len: 16
          ref: NULL
         rows: 568680
        Extra: Using index condition; Using where
1 row in set (0.01 sec)

root@db1100.eqiad.wmnet[dewiki]> FLUSH STATUS; pager cat > /dev/null; SELECT /* ApiQueryRecentChanges::run */ rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>='20161024013525') AND rc_namespace IN ('0', '120') AND rc_type IN ('0', '1', '3', '6') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101; nopager; SHOW STATUS like 'Hand%';
Query OK, 0 rows affected (0.01 sec)

PAGER set to 'cat > /dev/null'
101 rows in set (0.00 sec)

PAGER set to stdout
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 215   |
| Handler_icp_match          | 215   |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 193   |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
26 rows in set (0.10 sec)

root@db1100.eqiad.wmnet[dewiki]> select @@version;
+-----------------+
| @@version       |
+-----------------+
| 10.1.37-MariaDB |
+-----------------+
1 row in set (0.00 sec)
EvanProdromou subscribed.

We're watching this problem, but it sounds like the MariaDB upgrade made it less important.

Could this be another case of MariaDB getting the optimizer fixed in a new version, as it doesn't happen on 10.1.36 or 10.1.37 for the original query?

Since the recentchanges table has rows older than 30 days continuously deleted, the old query probably isn't representative anymore.

We didn't check at the time, but maybe the problem query was hitting an anomalous stretch of RC entries, like a flood of RC entries from Wikibase (rc_type = 5) or for pages in other namespaces.

It looks like this Wikimedia-production-error task does not have an error associated with it. Is this hitting an execution timeout exception, or a DBPerformance warning?

Looking at Logstash, I find several instances of the following. Would that be the issue described here?

  • Request ID: XIcztApAMEEAAKFH25UAAACT
  • Request Method: GET.
  • Request URL: wikidatawiki - /w/api.php?action=query&list=recentchanges&rcshow=redirect&rcnamespace=0&rclimit=500&format=json
channel:DBPerformance - severity:WARNING
message:
  Expectation (readQueryTime <= 5) by ApiMain::setRequestExpectations not met (actual: 33.72279214859):

query:
  SELECT rc_id,rc_timestamp,rc_namespace,rc_title,rc_cur_id,rc_type,rc_deleted,rc_this_oldid,rc_last_oldid,page_is_redirect FROM `recentchanges` LEFT JOIN `page` ON ((rc_namespace=page_namespace) AND (rc_title=page_title)) WHERE rc_namespace = 'X') AND (pag [TRX..

I don't think that one is exactly the same, as it joins with page and the query above didn't. I think we need a higher-level overview of query performance, grouped by query digest, API entry point, and timing, to get a better understanding of the current state and prioritization. Looking at a 24-hour cycle on enwiki, the worst types of queries are (in worst-to-"best" order):

  • Variations of ApiQueryRevisions::run
  • CategoryMembershipChangeJob::run
  • SpecialRecentChangesLinked::doMainQuery
  • IndexPager::buildQueryInfo (contributions page unfiltered)

There are some instances of SpecialRecentChanges::doMainQuery running for 40+ seconds, but they do not look anything like the above query. I am OK with declining this ticket (much has changed in 3 years) and opening others instead if necessary.

What would you think of creating a #mediawiki-query-performance label to track these over time, so tickets like this don't linger uselessly?

It looks like this Wikimedia-production-error task does not have an error associated with it. Is this hitting an execution timeout exception, or a DBPerformance warning?

Chances are it's hitting one of the execution timeouts now (either MW-level or the DB query killer). I don't know if those limits existed back in 2016 when this was filed.

Looking at Logstash, I find several instances of the following. Would that be the issue described here?

No. Your query is clearly problematic: it can either scan recentchanges, join on page, and filter for the small fraction that have page_is_redirect = 1; or it can find all the rows from page with page_namespace = 0 AND page_is_redirect = 1, scan recentchanges for all entries with a matching namespace+title, and filesort. The original query in this task is not so clear-cut and could probably be fixed with better use of indexes.

Although it does seem that this query is choosing the latter option, while the former would be faster (at least currently). If you want to follow up on that, I'd suggest filing a different task.

SELECT rc_id,rc_timestamp,rc_namespace,rc_title,rc_cur_id,rc_type,rc_deleted,rc_this_oldid,rc_last_oldid,page_is_redirect FROM `recentchanges` LEFT JOIN `page` ON ((rc_namespace=page_namespace) AND (rc_title=page_title)) WHERE rc_namespace = 'X') AND (pag [TRX..

It's not terribly useful to not have the whole query; the ORDER BY and LIMIT are often important for understanding the query's behavior. For that matter, it's also not terribly useful to have values replaced with "X".

At least in this case you included the request URL and it happened to be a GET. From that I can determine that your query is probably

SELECT rc_id,rc_timestamp,rc_namespace,rc_title,rc_cur_id,rc_type,rc_deleted,rc_this_oldid,rc_last_oldid,page_is_redirect FROM `recentchanges` LEFT JOIN `page` ON ((rc_namespace=page_namespace) AND (rc_title=page_title)) WHERE (rc_namespace = '0') AND (page_is_redirect = 1) ORDER BY rc_timestamp DESC, rc_id DESC LIMIT 501;

For the record, I included all the information that Logstash had available for this query. Unfortunately, the channel it came from only has a sanitized and cropped version of the query available.

[mediawiki/core@master] ApiQueryRecentChanges: Make better use of the rc_name_type_patrolled_timestamp index
https://gerrit.wikimedia.org/r/359501

Looks like this is still awaiting code review.

Krinkle triaged this task as Medium priority. Jun 24 2019, 11:23 PM

@daniel @Reedy @Legoktm @jcrespo @Marostegui are all marked as potential reviewers, would any of you be able to pick this up so it can get closed out?

Change 359501 abandoned by Anomie:
ApiQueryRecentChanges: Make better use of the rc_name_type_patrolled_timestamp index

Reason:
As mentioned, there's no way to test this and MariaDB may not be so limited now. Let's just drop this until it occurs again.

https://gerrit.wikimedia.org/r/359501

I'm going to go ahead and just close this. Stuff happened 3 years ago, we didn't think of everything to test at the time, and now we have no way to reproduce it to see if MariaDB is now smarter about things than it was back in 2016.

If it happens again, we can reopen and then we'll have a test case to use with a modern version of MariaDB.

@Anomie I strongly support closing old tasks like this that can no longer be reproduced; too much in the code and infrastructure changes for them to stay relevant in an open status. I believe MediaWiki improvements, MySQL operational changes (e.g. ANALYZE + reboot, SSDs), and MariaDB code improvements may have left many of these obsolete.

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:11 PM