phantom redirects lingering in incategory searches after page moves
Closed, Resolved · Public · 8 Estimated Story Points · BUG REPORT

Description

What happens?: Recently, when I perform the regular and essential task of doing an incategory search of https://en.wikipedia.org/wiki/Category:Living_people to search for draftspace or userspace pages that should not be filed in mainspace categories, the search finds a number of phantom redirects where the page in fact isn't in draftspace at all, but rather has already been moved into mainspace -- but the resulting draftspace redirect doesn't have any categories on it, and if you manually eyeball the individual categories that it's in they don't actually show the draftspace titles as being filed in them

For example, today's run of the search at https://en.wikipedia.org/w/index.php?search=incategory%3A%22Living_people%22&title=Special:Search&profile=advanced&fulltext=1&ns2=1&ns3=1&ns118=1&ns119=1 displays the title https://en.wikipedia.org/w/index.php?title=Draft:Kadek_Dimas_Satria&redirect=no. Note that the draft title doesn't have categories in it, and if you look at any of the categories that are on the target page https://en.wikipedia.org/wiki/Kadek_Dimas_Satria, they do not list the draftspace redirect as being in them -- but if you go back and perform an incategory search on each of those categories to look for draftspace pages, the incategory search does still say that Draft:Kadek Dimas Satria is in each and every one of them.
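For anyone who wants to reproduce the same search outside the web UI, here is a minimal sketch that builds an equivalent Action API request. It assumes the standard `list=search` module with `srsearch` and `srnamespace`; the helper name `incategory_search_url` is made up for illustration, and the namespace numbers are the ones from the search URL above (2, 3, 118, 119).

```python
from urllib.parse import urlencode

# Namespaces taken from the search URL above:
# 2 = User, 3 = User talk, 118 = Draft, 119 = Draft talk.
NAMESPACES = (2, 3, 118, 119)


def incategory_search_url(category, namespaces=NAMESPACES, limit=50):
    """Build an Action API URL for an incategory search limited to some namespaces.

    Hypothetical helper for illustration; it only builds the URL and
    performs no network request.
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f'incategory:"{category}"',
        "srnamespace": "|".join(str(ns) for ns in namespaces),
        "srlimit": limit,
        "format": "json",
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)


print(incategory_search_url("Living people"))
```

The returned JSON would list the same titles as Special:Search, phantom redirects included, since both read from the same search index.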

The issue invariably results from cases where an editor applied categories to the page while it was still in draft, and then moved the page into mainspace after adding the categories. So far, the only solution I have found that works to clear the phantom redirects out of the incategory search is to actually move the page back into draftspace, and wrap the categories in the "draft categories" wrapper; this would finally cause the page to drop from the incategory search, following which I could then move the page back into mainspace again and unwrap the categories in mainspace, and the redirect would not then return to the incategory search again. Nothing else has successfully cleared the redirects from the search: null-editing the draftspace redirect didn't work, deleting and then restoring the draftspace redirect didn't work, adding the redirects to a maintenance holding category didn't work.

I never, ever saw even one case of this ever happening at all before February 2023. A couple of weeks ago, for the first time ever, there was one standalone case of it which looked like an isolated problem at the time, but then after I resolved the problem by redoing the page move it did not recur again until March 1 -- at which point it suddenly became an epidemic, with eighteen phantom redirects turning up so far just in the past three days alone. I had already corrected the seven instances I found on Wednesday and Thursday, but with eleven more of them today, I'm at the end of my patience with it.

This also is not just normal lag in the job queue, as real drafts which are in the category inappropriately, and have the category removed or disabled accordingly, still successfully drop from the search results within less than one minute.

What should have happened instead?: Obviously, draftspace redirects that don't have categories on them should not be showing up in incategory searches of those categories if they aren't actually in the categories. It's absolutely essential that I be able to do a clean incategory search on Living people -- with over one million articles in that category, searching for draft and userpages manually isn't a feasible alternative at all, so I need to be able to do an incategory search on that category without having it polluted by pages that aren't actually in the category. Removing draft and userpages from articlespace categories is an essential maintenance task that cannot be ignored, so I can't just stop scanning Living people for such pages entirely -- and while redoing the page move myself was a viable workaround when there were just one or two isolated instances, it isn't so feasible anymore when there are 10, 20 or 30 phantom redirects to deal with at the same time.

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Event Timeline

This is continuing to happen, with 20 phantom redirects (or, rather, it ''reports'' 20 as the number of pages, but only ''displays'' 17 pages) now in the category and nothing dropping unless I redo the page moves from scratch. This is not a "put up with it" situation; it needs to be resolved.

This is something that might be addressed as part of T317045.

MPhamWMF moved this task from needs triage to Bugs on the Discovery-Search board.

Change 894709 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Properly pass the page id on page moves

https://gerrit.wikimedia.org/r/894709

dcausse raised the priority of this task from Medium to High. (Mar 6 2023, 6:12 PM)

Change 894709 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Properly pass the page id on page moves

https://gerrit.wikimedia.org/r/894709

Change 894677 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@wmf/1.40.0-wmf.25] Properly pass the page id on page moves

https://gerrit.wikimedia.org/r/894677

Change 894677 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@wmf/1.40.0-wmf.25] Properly pass the page id on page moves

https://gerrit.wikimedia.org/r/894677

Mentioned in SAL (#wikimedia-operations) [2023-03-07T08:24:25Z] <dcausse@deploy2002> Started scap: Backport for [[gerrit:894677|Properly pass the page id on page moves (T331127)]]

Mentioned in SAL (#wikimedia-operations) [2023-03-07T08:28:36Z] <dcausse@deploy2002> dcausse: Backport for [[gerrit:894677|Properly pass the page id on page moves (T331127)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-03-07T08:41:00Z] <dcausse@deploy2002> Finished scap: Backport for [[gerrit:894677|Properly pass the page id on page moves (T331127)]] (duration: 16m 34s)

The problem should be resolved: new page moves across namespaces should now properly delete the page in the old namespace. Some phantom redirects created because of this bug will stay until the process that cleans up the index fixes these pages (it can take up to 8 weeks). If you identify annoying ones, please let us know in this ticket so that we can clean them up manually sooner. Sorry for the inconvenience this has caused.

I'll note that there is one new page so far that ended up in the incategory search today for the same reasons, but I'm not immediately doing anything about it because of what you said about how the process fix may take time to work through the database -- and also because I've nominated the mainspace move target for AFD as improperly sourced possible self-promotion anyway -- but I also wanted to ask: despite there only being one page currently in the search, the number of pages is being reported as three by the "results" counter in the top right corner. Would this simply be an artifact of the same problem, which will clean itself up as the fix that was already applied here propagates, or would this be a different problem that has to be looked at separately?

The fix for this ticket was applied on all WMF servers today at 2023-03-07T08:41:00.

  • User:Tuokkarr/sandbox (Daniele_Servadei) was moved today at 2023-03-07T07:04:28 (moved before the fix)

I believe the two invisible results are due to the same problem, but they are removed by the existence check performed when displaying results:

  • Draft:Move/Catherine E. Delahodde moved at 2023-03-07T06:14:26 (moved before the fix)
  • Draft:Move/Jim Connors moved at 2023-03-03T23:09:10Z (moved before the fix as well)

I manually ran the cleanup script on these 3 pages to avoid future confusion. Please let me know if you still encounter this issue in the future.
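To illustrate why the counter can say three while only one result is displayed, here is a toy model. It is a sketch under assumptions, not the actual CirrusSearch or Special:Search code: the function name `render_results` is hypothetical, and it simply assumes the total comes from the index while non-existent pages are filtered out only when results are rendered.

```python
def render_results(index_hits, existing_titles):
    """Return (reported_total, displayed_titles) for a toy search page.

    Assumption for illustration: the total is taken straight from the
    index, while titles whose page no longer exists are dropped only at
    display time.
    """
    displayed = [title for title in index_hits if title in existing_titles]
    return len(index_hits), displayed


# The three stale index entries named in this comment.
hits = [
    "User:Tuokkarr/sandbox",
    "Draft:Move/Catherine E. Delahodde",
    "Draft:Move/Jim Connors",
]
# Assumption: only the sandbox redirect still exists as a page.
total, shown = render_results(hits, {"User:Tuokkarr/sandbox"})
print(total, shown)
```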

Thanks for this, I forgot that I had filed a report here in the past and thought I had always just brought it to VPT.

I have to add that I'm also coming across a few examples of this happening to other pages on the current https://en.wikipedia.org/wiki/Wikipedia:Database_reports/Polluted_categories report; e.g. User:ArielrobinsonWK/sandbox is showing up in all of the categories that are on JCO Global Oncology, and in fact I outright missed the title of the page, and removed the categories without realizing that I was removing them from a mainspace article, only to have to revert myself when I finally realized what I'd done. (Which is one of the main reasons why the search needs to be clean: because I, or somebody else, might make that very mistake.) But I'm not seeing as many examples of it as one might think based on how many of these are in the Living people search. (There are more categories to search than usual, but that's because the report failed to run at all for a couple of weeks.)

Sorry to see this happening again; it is probable that we missed some edge cases when deploying T317045.

dcausse subscribed.

This is continuing, with another cluster of these uncategorized redirects to already-moved pages having shown up today. I really need this to be solved as soon as possible, because the redirects constitute clutter that deeply interferes with getting the job of cleaning up categorized draft or user pages done. There are now 10 of them that I have to work around every time I run the search (as well as the counter at the top of the page claiming 14, meaning that there are also four fully-deleted mystery titles as well) with the problem being that I can't just skip the purple "have visited before" links on the grounds that they're definitely one of these -- sometimes they're still draft or user pages that somebody put back into categories a second time after the first removal, meaning they have to be actively removed again, meaning I have to memorize all of these phantom redirects to know which purple links are these and which purple links aren't. So it's becoming a deep burden, and desperately needs to be fixed, because I can't be expected to infallibly memorize a growing list of these.

I would also note that even with the draft or user pages that actually are in categories, there seems to be a deep, deep lag in those pages actually dropping out of a refreshed search after they've been removed. Unlike these phantom redirects, pages like that do eventually drop out, but only after an hour or so (when they should normally drop out within seconds), and the phantom redirects still just aren't dropping out at all -- even days later, the first batch that I identified days ago are all still there.

Gehel set the point value for this task to 8.Jun 17 2024, 3:46 PM

All of the original pages that were in the category when I first revived this bug report are gone; however, there's now a completely different cluster of phantom redirects still lingering in the category. I really need them to go away, because having to constantly work around them is becoming a burden that's actively interfering with getting the job done.

Change #1051077 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/deployment-charts@master] cirrus-streaming-updater: bump image version

https://gerrit.wikimedia.org/r/1051077

Change #1051077 merged by jenkins-bot:

[operations/deployment-charts@master] cirrus-streaming-updater: bump image version

https://gerrit.wikimedia.org/r/1051077

Moving a page from one namespace to another should now properly clean up the search index. Existing phantom redirects might still be around for a couple of weeks while the automated cleanup process takes care of them. Please let me know if you see new instances of this problem in the future. Sorry for the inconvenience.

It's been a few weeks for old stuff to get refreshed, but we got this report today (permalink).

Is there a query that would identify the phantoms/zombies so that we can check that they are being cleaned out?
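One possible way to check from the outside, sketched under assumptions (this is not a query the search team has confirmed): take the titles returned by the incategory search and compare them with the categories each page actually carries, for example as reported by the Action API's `prop=categories`. The function name `find_phantoms` and the second sample title are hypothetical.

```python
def find_phantoms(search_hits, actual_categories, category):
    """Return search hits that the index places in `category` but the wiki does not.

    search_hits: titles returned by an incategory search.
    actual_categories: mapping of title -> set of category titles that are
        really on the page (e.g. collected via prop=categories).
    """
    wanted = f"Category:{category}"
    return [
        title
        for title in search_hits
        if wanted not in actual_categories.get(title, set())
    ]


hits = ["Draft:Kadek Dimas Satria", "Draft:Example person"]
on_wiki = {
    # The redirect from the task description carries no categories.
    "Draft:Kadek Dimas Satria": set(),
    # Hypothetical real draft that is genuinely (and wrongly) categorized.
    "Draft:Example person": {"Category:Living people"},
}
print(find_phantoms(hits, on_wiki, "Living people"))
```

Anything this returns is a candidate phantom; real drafts that merely need their categories disabled are left out.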

Indeed, that page should have been cleared out within ~2 weeks. I'll take a closer look into why it hasn't been automatically fixed.

Change #1067372 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] sanity: Handle redirect in wrong index

https://gerrit.wikimedia.org/r/1067372

It looks like the reason these weren't being cleared out is that the remediations haven't seen this particular error before: the redirect was written to the wrong index. The check notices that the redirect is there and requests a deletion, but the deletion is sent to the index where the redirect was expected to be instead of the one where it actually is. The attached patch passes along the index that the redirect was found in so that it can be deleted from that index.
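As a toy model of the failure described above (not the actual CirrusSearch code; the function names, index names, and page id are all hypothetical), the difference between deleting from the expected index and deleting from the index where the document was found looks roughly like this:

```python
def delete_from_expected(indices, doc_id, expected_index):
    """Behaviour as described before the patch: delete only where the doc should be."""
    indices[expected_index].discard(doc_id)


def delete_where_found(indices, doc_id):
    """Behaviour as described after the patch: delete from the index holding the doc.

    Returns the name of the index the document was removed from, or None.
    """
    for name, docs in indices.items():
        if doc_id in docs:
            docs.discard(doc_id)
            return name
    return None


# Hypothetical layout: the redirect (page id 101) sits in the wrong index.
indices = {"expected_index": set(), "actual_index": {101}}

delete_from_expected(indices, 101, "expected_index")
still_there = 101 in indices["actual_index"]  # the stray copy survives

found_in = delete_where_found(indices, 101)
print(still_there, found_in, indices)
```

The first call is a silent no-op because there is nothing to delete in the expected index, which would match the redirects never dropping out of the search.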

Change #1067372 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] sanity: Handle redirect in wrong index

https://gerrit.wikimedia.org/r/1067372

This should have been released today; it will still take two weeks for the remediations to work their way through all possible pages.

After having not seen this problem recur for several weeks since the last batch I posted about was cleared, there's been a new isolated case recurring again, with User:Citrivescence/Asher Perlman still not dropping from https://en.wikipedia.org/w/index.php?search=incategory%3A%22Living_people%22&title=Special%3ASearch&profile=advanced&fulltext=1&ns2=1&ns3=1&ns118=1&ns119=1 after being moved to mainspace five days ago. Given the "two weeks to work through all possible pages" comment above, I suppose it may take a while, but I'm just reporting it so that it's known.

Thanks for the info, this is very helpful.
I think we still have issues with this. Looking closer, it appears that MW is not flagging this event as a page move but as a page edit, which misleads CirrusSearch into treating it as a "normal" edit: P69074.
@Ottomata is this expected? Is it possible that page moves between namespaces are not properly identified by the page_change stream?

Please scratch this, this is apparently the event that happened after the move, it appears that the page_move event simply was not emitted...

Some random debug info:

Poking in logstash, I found reqId bbebe293-bf71-43a9-ab67-1c8e12e2453b, which is a request to Special:MovePage for title User:Citrivescence/Asher Perlman, but the only log generated was from StashEdit about an empty cache key. Looking at logs nearby in time from the same server didn't turn up anything interesting.

Correction, it's even weirder. There were more logs generated (same pod, same reqId) for deferreds failing, but they were generated at 07:00, while the edit and the initial log message were at 04:51.

This also doesn't seem to be isolated to this one request. I sampled a couple dozen requests that have EmergencyTimeoutException and found 2 more that show the same behaviour of timing out hours after the first log message. All 3 of these are edit requests (although possibly that's because most requests log nothing, so we don't always have a pre-failure log message).

Additional reqIds: a8ea1282-dc5c-4265-a35d-38374f2731a5 bb7130ad-59a0-4075-939b-0c75c35cb2ce
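For what it's worth, the sampling above could be automated along these lines. This is only a sketch: it assumes log entries have already been exported as (reqId, ISO timestamp) pairs, the function name `late_failures` is hypothetical, and the date in the sample data is a placeholder (only the 04:51 and 07:00 times come from the comments above).

```python
from datetime import datetime, timedelta


def late_failures(records, min_gap=timedelta(hours=1)):
    """Return reqIds whose last log entry is at least `min_gap` after the first.

    records: iterable of (req_id, iso_timestamp) pairs.
    """
    first, last = {}, {}
    for req_id, ts in records:
        t = datetime.fromisoformat(ts)
        first[req_id] = min(first.get(req_id, t), t)
        last[req_id] = max(last.get(req_id, t), t)
    return sorted(r for r in first if last[r] - first[r] >= min_gap)


# Placeholder date; "req-quick" is a made-up request that logs normally.
sample = [
    ("bbebe293-bf71-43a9-ab67-1c8e12e2453b", "2000-01-01T04:51:00"),
    ("bbebe293-bf71-43a9-ab67-1c8e12e2453b", "2000-01-01T07:00:00"),
    ("req-quick", "2000-01-01T04:51:00"),
    ("req-quick", "2000-01-01T04:51:02"),
]
print(late_failures(sample))
```

Requests that log only once never show a gap, so this would undercount for the same reason noted above: most requests log nothing before failing.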