
Missing index on revision_userindex.rev_actor
Closed, Resolved · Public

Description

Querying by rev_actor is very slow on the Toolforge replicas.

SELECT * FROM `enwiki_p`.`revision_userindex`
WHERE rev_actor = 7528
LIMIT 1

(times out)

Meanwhile archive seems to have the necessary indexing:

SELECT * FROM `enwiki_p`.`archive_userindex`
WHERE ar_actor = 7528
LIMIT 1

(0.01 secs)

I assume this was an oversight. We can of course use rev_user, which does have indexing, but this would only work for registered users.

Event Timeline

@Anomie I don't know if you can help with this? It is currently a blocker for migrating my tools to use the actor storage.

Revision has to go through a temp table (revision_actor_temp), which does change things a fair bit. That is intended to be a temporary situation (T215466). Let me see what indexes that table has.

So, apparently, the revision query scans the entire revision table first.

+------+-------------+-----------------------+------+------------------------+---------+---------+------------------------+-----------+--------------------------+
| id   | select_type | table                 | type | possible_keys          | key     | key_len | ref                    | rows      | Extra                    |
+------+-------------+-----------------------+------+------------------------+---------+---------+------------------------+-----------+--------------------------+
|    1 | SIMPLE      | revision              | ALL  | NULL                   | NULL    | NULL    | NULL                   | 771531338 | Using where              |
|    1 | SIMPLE      | revision_comment_temp | ref  | PRIMARY,revcomment_rev | PRIMARY | 4       | enwiki.revision.rev_id |         1 | Using index              |
|    1 | SIMPLE      | revision_actor_temp   | ref  | PRIMARY,revactor_rev   | PRIMARY | 4       | enwiki.revision.rev_id |         1 | Using where; Using index |
+------+-------------+-----------------------+------+------------------------+---------+---------+------------------------+-----------+--------------------------+

Since the other one is a far simpler search, and it uses an index that exists only on the replicas, it isn't a good comparison. That said, I might be able to add an index on the replicas for this. Looking a bit more.

I don't think using coalesce in the view changes anything, btw. The issue is whether the underlying table has the index or not. This is scanning the entire underlying revision table, which is bound to time out no matter what.

The upstream table doesn't seem to have an index on that field in general. That seems to suggest that running a WHERE on it isn't a good idea? However, there is a usertext_timestamp index. The reason it is fast on the archive_userindex table is that it is hitting an index I've added to the replicas named ar_actor_deleted. That index is hit because the view checks the deleted field, which the userindex views use by default. That appears to be the pattern on these userindex views. That all makes sense. I'd normally just follow the pattern and add it, but...
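If I were to follow that pattern, the replica-side index would presumably look something like the sketch below. The column order is only inferred from the index name; the real ar_actor_deleted definition on the replicas may differ.

```sql
-- Hypothetical sketch only: a replica-side index following the
-- *_actor_deleted naming pattern described above. The actual
-- ar_actor_deleted definition may differ.
ALTER TABLE archive
    ADD INDEX ar_actor_deleted (ar_actor, ar_deleted);
```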

That said, it is a tad complicated by the joins. The column in question isn't actually *in* that table; it's only in the view because of the temp table. You could join directly off the temp table instead, which might work better, until the temp table goes away or stops being used. Does that sound correct @Anomie?
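Joining directly off the temp table would look roughly like this sketch, assuming such a view were exposed on the replicas (which may not be the case):

```sql
-- Sketch of joining directly off the temp table, as suggested above.
-- Assumes a revision_actor_temp view were available on the replicas.
SELECT rev_id, rev_timestamp
FROM enwiki_p.revision_actor_temp
JOIN enwiki_p.revision_userindex ON rev_id = revactor_rev
WHERE revactor_actor = 7528
LIMIT 1;
```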

Change 510595 had a related patch set uploaded (by Anomie; owner: Anomie):
[operations/puppet@production] wiki replicas: Remove reference to old user fields

https://gerrit.wikimedia.org/r/510595

The coalesce is indeed causing it to not be able to use the index. In general, the planner can't know how the output of a function depends on its input columns, so it can't use those columns' indexes. It could probably special-case COALESCE(), but it doesn't look like it does.

From Wikimedia production, the query equivalent to the WMCS view looks more or less like

wikiadmin@10.64.16.12(enwiki)> explain SELECT rev_id, rev_deleted, revcomment_comment_id, revactor_actor FROM revision JOIN revision_comment_temp ON(revcomment_rev = rev_id) LEFT JOIN revision_actor_temp ON(revactor_rev = rev_id) WHERE (rev_deleted & 4) = 0 AND COALESCE(revactor_actor,0) = 7528 LIMIT 1;
+------+-------------+-----------------------+------+------------------------+---------+---------+------------------------+-----------+--------------------------+
| id   | select_type | table                 | type | possible_keys          | key     | key_len | ref                    | rows      | Extra                    |
+------+-------------+-----------------------+------+------------------------+---------+---------+------------------------+-----------+--------------------------+
|    1 | SIMPLE      | revision              | ALL  | PRIMARY                | NULL    | NULL    | NULL                   | 783589133 | Using where              |
|    1 | SIMPLE      | revision_actor_temp   | ref  | PRIMARY,revactor_rev   | PRIMARY | 4       | enwiki.revision.rev_id |         1 | Using where; Using index |
|    1 | SIMPLE      | revision_comment_temp | ref  | PRIMARY,revcomment_rev | PRIMARY | 4       | enwiki.revision.rev_id |         1 | Using index              |
+------+-------------+-----------------------+------+------------------------+---------+---------+------------------------+-----------+--------------------------+

If we get rid of the COALESCE(), note how it changes to use the actor_timestamp index and no longer has an "ALL" in the type:

wikiadmin@10.64.16.12(enwiki)> explain SELECT rev_id, rev_deleted, revcomment_comment_id, revactor_actor FROM revision JOIN revision_comment_temp ON(revcomment_rev = rev_id) LEFT JOIN revision_actor_temp ON(revactor_rev = rev_id) WHERE (rev_deleted & 4) = 0 AND revactor_actor = 7528 LIMIT 1;
+------+-------------+-----------------------+--------+--------------------------------------+-----------------+---------+-----------------------------------------+--------+-------------+
| id   | select_type | table                 | type   | possible_keys                        | key             | key_len | ref                                     | rows   | Extra       |
+------+-------------+-----------------------+--------+--------------------------------------+-----------------+---------+-----------------------------------------+--------+-------------+
|    1 | SIMPLE      | revision_actor_temp   | ref    | PRIMARY,revactor_rev,actor_timestamp | actor_timestamp | 8       | const                                   | 229536 | Using index |
|    1 | SIMPLE      | revision              | eq_ref | PRIMARY                              | PRIMARY         | 4       | enwiki.revision_actor_temp.revactor_rev |      1 | Using where |
|    1 | SIMPLE      | revision_comment_temp | ref    | PRIMARY,revcomment_rev               | PRIMARY         | 4       | enwiki.revision_actor_temp.revactor_rev |      1 | Using index |
+------+-------------+-----------------------+--------+--------------------------------------+-----------------+---------+-----------------------------------------+--------+-------------+
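For what it's worth, the same planner limitation is easy to reproduce in SQLite (a sketch, not MariaDB; the table and index here are made-up miniatures mirroring the names above):

```python
import sqlite3

# Demonstration (in SQLite, not MariaDB) of the planner limitation discussed
# above: wrapping an indexed column in COALESCE() prevents index use.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision_actor_temp (revactor_rev INTEGER, revactor_actor INTEGER)"
)
conn.execute("CREATE INDEX actor_idx ON revision_actor_temp (revactor_actor)")

def plan(where: str) -> str:
    """Return the first line of EXPLAIN QUERY PLAN output for a WHERE clause."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT * FROM revision_actor_temp WHERE {where}"
    ).fetchall()
    return rows[0][3]  # human-readable plan detail column

print(plan("revactor_actor = 7528"))               # index lookup (SEARCH)
print(plan("COALESCE(revactor_actor, 0) = 7528"))  # full table scan (SCAN)
```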

> The upstream table doesn't seem to have an index on that field in general. That seems to suggest that running a where on that isn't a good idea?
> [...] The reason it is fast on the archive_userindex table is because it is hitting an index that I've added to the replicas named ar_actor_deleted.

In MediaWiki, the archive table has an index ar_actor_timestamp that would be used for the query. The ar_actor_deleted index would likely be better for the WMCS replica query where the view adds the (ar_deleted & 4) = 0 condition, though.

As for revision, as you said it's more complicated. If we get rid of the coalesce it should work for this query, but then users could in theory get NULL back when reading the column now[1] and 0 eventually in the future. If that's ok, we should probably do that.

For other possible queries we might want to additionally change the JOIN condition to ON( revactor_rev = rev_id AND revactor_page = rev_page AND revactor_timestamp = rev_timestamp) so MariaDB can know it's allowed to propagate accesses to rev_page and/or rev_timestamp to use the indexes on the corresponding columns in revision_actor_temp as well. Although that might fail for rev_timestamp for the same reason that approach failed in an early patchset on gerrit:505814.

[1]: In practice, we've already backfilled everything so they should never actually see NULL.
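Concretely, the rewritten join applied to the example query above would look something like this (a sketch; the deployed view definition may differ):

```sql
-- The view's LEFT JOIN rewritten with the extra equalities suggested above,
-- so the planner knows it may propagate rev_page / rev_timestamp conditions
-- to the corresponding indexed columns of revision_actor_temp.
SELECT rev_id, rev_deleted, revactor_actor
FROM revision
LEFT JOIN revision_actor_temp ON (
    revactor_rev = rev_id
    AND revactor_page = rev_page
    AND revactor_timestamp = rev_timestamp
)
WHERE (rev_deleted & 4) = 0
  AND revactor_actor = 7528
LIMIT 1;
```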

> You could use the temp table and join directly off it, which might work better, until the temp table goes away or stops being used. Does that sound correct @Anomie?

That would be good, but note the temp table isn't currently on the replicas (possibly related to T212972#5171045).

> That would be good, but note the temp table isn't currently on the replicas (possibly related to T212972#5171045).

Do you mean the temp table isn't in the views, or not on the replicas themselves? I don't believe we ever defined a view for the temp tables at all, come to think of it. I can do that if folks want to join off them.

That said, since we've got the backfill, and a lot of the safeguards like that are coming off, I think it'd be worth it to remove the coalesce. I can take a look at the join conditions as well. If I'm quick, I might be able to piggyback such changes on the view updates for Monday.

Aaaand after more checking, the missing temp table was a very old artifact of errors I had in some view definitions, just as you suggested @Anomie. That is confirmed fixed on the replicas I've been able to deploy to, and it will be everywhere as soon as I can get it deployed past some DB maintenance.

Change 510965 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: remove the coalesce guard around revactor_actor

https://gerrit.wikimedia.org/r/510965

Change 510965 merged by Bstorm:
[operations/puppet@production] wiki replicas: remove the coalesce guard around revactor_actor

https://gerrit.wikimedia.org/r/510965

I should be deploying the changes above to all replicas starting tomorrow if all goes well.

This is deployed across the replicas. Please test things and let me know if it is all good.

> This is deployed across the replicas. Please test things and let me know if it is all good.

It works wonderfully! Thank you! :)

My web tool is now fine, the performance is OK. Thank you! :-)

Bstorm claimed this task.

I'll close this one for now.

@Bstorm While it no longer times out, it seems in some cases using rev_actor is still considerably slower than rev_user:

SELECT 'rev_latest' AS `key`, rev_id AS `id`, rev_timestamp AS `timestamp`
FROM `enwiki_p`.`revision_userindex`
WHERE rev_user = 11656865
ORDER BY rev_timestamp DESC LIMIT 1

(0.00 sec)

versus

SELECT 'rev_latest' AS `key`, rev_id AS `id`, rev_timestamp AS `timestamp`
FROM `enwiki_p`.`revision_userindex`
WHERE rev_actor = 21308
ORDER BY rev_timestamp DESC LIMIT 1

(1.62 sec)

Is there anything we can do to improve performance? Maybe there's a better way to query for this same information (get the latest revision by a user)? Also pinging @Anomie in case you have some ideas.

For reference, here are the query plans I see:

Using rev_user:

+-------------+----------+------+----------------+----------------+---------+-------+-------+-----------------------------+
| select_type | table    | type | possible_keys  | key            | key_len | ref   | rows  | Extra                       |
+-------------+----------+------+----------------+----------------+---------+-------+-------+-----------------------------+
| SIMPLE      | revision | ref  | user_timestamp | user_timestamp | 4       | const | 32904 | Using where; Using filesort |
+-------------+----------+------+----------------+----------------+---------+-------+-------+-----------------------------+

Using rev_actor:

+-------------+-----------------------+--------+----------------------------------------+-----------------+---------+-----------------------------------------+-------+----------------------------------------------+
| select_type | table                 | type   | possible_keys                          | key             | key_len | ref                                     | rows  | Extra                                        |
+-------------+-----------------------+--------+----------------------------------------+-----------------+---------+-----------------------------------------+-------+----------------------------------------------+
| SIMPLE      | revision_actor_temp   | ref    | PRIMARY, revactor_rev, actor_timestamp | actor_timestamp | 8       | const                                   | 32792 | Using index; Using temporary; Using filesort |
| SIMPLE      | revision              | eq_ref | PRIMARY                                | PRIMARY         | 4       | enwiki.revision_actor_temp.revactor_rev |     1 | Using where                                  |
+-------------+-----------------------+--------+----------------------------------------+-----------------+---------+-----------------------------------------+-------+----------------------------------------------+

Maybe it's the revision_actor_temp that's slowing it down? As the name suggests, that table won't be there forever, so maybe things will be faster without it.

It is interesting that the rev_user query is so much faster even though it scans more rows.

I did notice that when doing the same thing for logging (grabbing the latest log entry by a user), there is no noticeable difference in performance between log_user and log_actor. There, the query plan for log_actor shows logging being used directly rather than a temp table such as logging_actor_temp, so perhaps it is indeed just revision_actor_temp that's slowing things down for revisions.
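One hypothetical workaround, if the temp table (or a view of it) were queryable on the replicas: hit its actor_timestamp index directly, so the ORDER BY can be satisfied from the index without the join forcing a temporary table and filesort. A sketch:

```sql
-- Hypothetical alternative for "latest revision by a user": read straight
-- off revision_actor_temp's actor_timestamp index, avoiding the join and
-- the filesort seen in the plan above. Assumes the temp table (or a view
-- of it) is exposed on the replicas.
SELECT revactor_rev AS `id`, revactor_timestamp AS `timestamp`
FROM enwiki_p.revision_actor_temp
WHERE revactor_actor = 21308
ORDER BY revactor_timestamp DESC
LIMIT 1;
```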

Change 511910 had a related patch set uploaded (by Anomie; owner: Anomie):
[operations/puppet@production] wiki replicas: Improve index usage for queries against revision_userindex

https://gerrit.wikimedia.org/r/511910

There are more fixes that @Anomie baked into the next patch set, which will be deployed with the field drops. I tried not to be too ambitious in what I deployed for this ticket because I knew the DB maintenance would block quick rollbacks. It is possible this will be improved by some of that.

That said, the temp table is going to make things interesting for a bit, I'm sure.

Ah! Also, that patch might help. I'll get it tested in review, and hopefully the maintenance is over or will be soon.

https://gerrit.wikimedia.org/r/#/q/Iba2db780fb4ef6fbc1d5812d4d3dd5ed44675311 sounds good to me, if it works! Thank you. As Anomie says, it may slow down tools that are still using rev_user and rev_user_text, but they still won't break (yet), so maybe the performance impact will push maintainers to update their tools before the June 3 deadline.

If you could, give us (at least Community Tech) a ping before https://gerrit.wikimedia.org/r/#/q/Iba2db780fb4ef6fbc1d5812d4d3dd5ed44675311 goes live? Then we'll push what we have for the actor migration to prod, and hopefully the slow-query interim period will be at a minimum.

Since we're doing things here again, I should probably re-open it.

Change 511910 merged by Bstorm:
[operations/puppet@production] wiki replicas: Improve index usage for queries against revision_userindex

https://gerrit.wikimedia.org/r/511910

With that merged, I'll coordinate with the DBAs to find a time to deploy it.

@Bstorm Any updates on when the new indexing might be deployed?

Working on getting them out today, actually, if all goes well.

Change 512981 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: depool labsdb1010

https://gerrit.wikimedia.org/r/512981

Change 512981 merged by Bstorm:
[operations/puppet@production] wikireplicas: depool labsdb1010

https://gerrit.wikimedia.org/r/512981

Change 513002 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: depool labsdb1011

https://gerrit.wikimedia.org/r/513002

Change 513002 merged by Bstorm:
[operations/puppet@production] wikireplicas: depool labsdb1011

https://gerrit.wikimedia.org/r/513002

Mentioned in SAL (#wikimedia-operations) [2019-05-28T23:06:23Z] <bstorm_> T221339 depooled labsdb1011 and updated views

Mentioned in SAL (#wikimedia-operations) [2019-05-28T23:17:02Z] <bstorm_> T221339 completed view updates on labsdb1009 without depooling

@Bstorm Thanks! So far everything is going swimmingly. Tools that still filter revisions by timestamp and rev_user/rev_user_text are now really slow and/or timing out, but we knew that was going to happen. Filtering by rev_actor, meanwhile, is quite fast, possibly faster than it was before the actor migration!

Cool! Ok, now I'll close this, and hopefully we'll be pretty on-track to strip out the fields on Monday :)

Change 510595 merged by Bstorm:
[operations/puppet@production] wiki replicas: Remove reference to old user fields

https://gerrit.wikimedia.org/r/510595