The list=alldeletedrevisions has a adrtag parameter to show only revisions marked with given tag. The list=allrevisions should provide an arvtag parameter for the same purpose.
Description
Details
Subject | Repo | Branch | Lines +/- | |
---|---|---|---|---|
Add arvtag parameter to list=allrevisions API queries. | mediawiki/core | master | +11 -0 |
Event Timeline
If I'm not mistaken, this would be a reasonable GCI task. I would be willing to mentor.
Imported as https://codein.withgoogle.com/dashboard/tasks/5928597153906688/ -- but I do not need it published yet
Change 394757 had a related patch set uploaded (by Pppery; owner: Pppery):
[mediawiki/core@master] Add arvtag parameter to list=allrevisions API queries.
@TTO When asked on ticket about peformance profiling, I will normally anwer on phabricator as the editor there is quite bad. These are my findings for it:
SELECT /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_text_id,rev_timestamp,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_comment AS rev_comment_text,NULL AS rev_comment_data,NULL AS rev_comment_cid,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len FROM revision INNER JOIN page ON ((page_id = rev_page)) INNER JOIN change_tag ON ((rev_id=ct_rev_id)) WHERE ct_tag = 'Foo' ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 11
The explain may look fine:
enwiki> EXPLAIN SELECT /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_text_id,rev_timestamp,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_comment AS rev_comment_text,NULL AS rev_comment_data,NULL AS rev_comment_cid,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len FROM revision INNER JOIN page ON ((page_id = rev_page)) INNER JOIN change_tag ON ((rev_id=ct_rev_id)) WHERE ct_tag = 'Foo' ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 11\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: revision type: index possible_keys: PRIMARY,page_timestamp,rev_page_id key: rev_timestamp key_len: 16 ref: NULL rows: 11 Extra: *************************** 2. row *************************** id: 1 select_type: SIMPLE table: page type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: enwiki.revision.rev_page rows: 1 Extra: *************************** 3. row *************************** id: 1 select_type: SIMPLE table: change_tag type: ref possible_keys: ct_rev_id,change_tag_rev_tag,ct_tag,change_tag_tag_id key: ct_rev_id key_len: 262 ref: enwiki.revision.rev_id,const rows: 1 Extra: Using where; Using index 3 rows in set (0.15 sec)
There is no type:ALL (full table scan), but there is an index scan on revision, which is unlikely to finish within the 60 seconds of the query (and it should finish in 1 second or less):
When run:
root@dbstore1002[enwiki]> FLUSH STATUS; SELECT /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_text_id,rev_timestamp,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_comment AS rev_comment_text,NULL AS rev_comment_data,NULL AS rev_comment_cid,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len FROM revision INNER JOIN page ON ((page_id = rev_page)) INNER JOIN change_tag ON ((rev_id=ct_rev_id)) WHERE ct_tag = 'Foo' ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 11; SHOW STATUS like 'Hand%'; Query OK, 0 rows affected (0.05 sec) ^CCtrl-C -- query killed. Continuing normally. ERROR 1317 (70100): Query execution was interrupted +----------------------------+--------+ | Variable_name | Value | +----------------------------+--------+ | Handler_commit | 0 | | Handler_delete | 0 | | Handler_discover | 0 | | Handler_external_lock | 0 | | Handler_icp_attempts | 0 | | Handler_icp_match | 0 | | Handler_mrr_init | 0 | | Handler_mrr_key_refills | 0 | | Handler_mrr_rowid_refills | 0 | | Handler_prepare | 0 | | Handler_read_first | 0 | | Handler_read_key | 119842 | | Handler_read_last | 1 | | Handler_read_next | 0 | | Handler_read_prev | 59942 | | Handler_read_rnd | 0 | | Handler_read_rnd_deleted | 0 | | Handler_read_rnd_next | 0 | | Handler_rollback | 2 | | Handler_savepoint | 0 | | Handler_savepoint_rollback | 0 | | Handler_tmp_update | 0 | | Handler_tmp_write | 0 | | Handler_update | 0 | | Handler_write | 0 | +----------------------------+--------+ 25 rows in set (0.01 sec)
I had to kill the query because it would not finish- this is not something we can run on production- if we give some kind of option to the user of which query to select, and without guarantees that the findings will be uniform. The problem is the WHERE ct_tag = 'Foo' ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 11 combined with the STRAIGHT_JOIN- this makes a full index scan of the revision table backwards- removing the STRAIGHT_JOIN should be beneficial (although I have not analyzed all cases).
Without straight join and the same parameter (needs more testing):
enwiki> EXPLAIN SELECT rev_id,rev_page,rev_text_id,rev_timestamp,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_comment AS rev_comment_text,NULL AS rev_comment_data,NULL AS rev_comment_cid,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len FROM revision INNER JOIN page ON ((page_id = rev_page)) INNER JOIN change_tag ON ((rev_id=ct_rev_id)) WHERE ct_tag = 'Foo' ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 11\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: change_tag type: ref possible_keys: ct_rev_id,change_tag_rev_tag,ct_tag,change_tag_tag_id key: ct_tag key_len: 257 ref: const rows: 1 Extra: Using where; Using index; Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: revision type: eq_ref possible_keys: PRIMARY,page_timestamp,rev_page_id key: PRIMARY key_len: 4 ref: enwiki.change_tag.ct_rev_id rows: 1 Extra: Using where *************************** 3. row *************************** id: 1 select_type: SIMPLE table: page type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: enwiki.revision.rev_page rows: 1 Extra: 3 rows in set (0.00 sec) enwiki> FLUSH STATUS; SELECT rev_id,rev_page,rev_text_id,rev_timestamp,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_comment AS rev_comment_text,NULL AS rev_comment_data,NULL AS rev_comment_cid,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len FROM revision INNER JOIN page ON ((page_id = rev_page)) INNER JOIN change_tag ON ((rev_id=ct_rev_id)) WHERE ct_tag = 'Foo' ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 11; SHOW STATUS like 'Hand%'; Query OK, 0 rows affected (0.00 sec) Empty set (0.00 sec) +----------------------------+-------+ | Variable_name | Value | +----------------------------+-------+ | Handler_commit | 2 | | Handler_delete | 0 | | Handler_discover | 0 | | Handler_external_lock | 0 | | Handler_icp_attempts | 0 | | Handler_icp_match | 0 | | Handler_mrr_init | 0 | | Handler_mrr_key_refills | 0 | | Handler_mrr_rowid_refills | 0 | | Handler_prepare | 0 | | Handler_read_first | 0 | | Handler_read_key | 1 | | Handler_read_last | 0 | | Handler_read_next | 0 | | Handler_read_prev | 0 | | Handler_read_rnd | 0 | | Handler_read_rnd_deleted | 0 | | Handler_read_rnd_next | 1 | | Handler_rollback | 0 | | Handler_savepoint | 0 | | Handler_savepoint_rollback | 0 | | Handler_tmp_update | 0 | | Handler_tmp_write | 0 | | Handler_update | 0 | | Handler_write | 0 | +----------------------------+-------+ 25 rows in set (0.02 sec)
Change 394757 abandoned by Pppery:
Add arvtag parameter to list=allrevisions API queries.