

Three pages are marked as seen in the watchlist when visiting only one of them (French Wikisource)
Closed, Resolved · Public · Bug report

Description

Steps to replicate the issue:
On fr.wikisource, assume that three consecutive pages of the same book are on your watchlist (Foo.djvu/10, Foo.djvu/11 and Foo.djvu/12) and currently appear in bold because someone else modified them recently (CSS class mw-watchlist-line-watched, if I'm correct):

  • visit the page in the middle (Foo.djvu/11)
  • view the watchlist again: all three pages are now marked as seen (they no longer have the mw-watchlist-line-watched class).

What should have happened instead?:
Of course, only the page you visited (Foo.djvu/11) should be marked as seen in the watchlist.

Other users on the French Wikisource have reported the same problem. What matters appears to be the order of the pages in the DjVu book, not the order in which they appear in your watchlist.

Sorry, this is not easy to reproduce. To understand what was happening, I added hundreds of pages to my watchlist, using the Recent Pages page to "predict" which pages might be modified by other users in the near future.

Other information (browser name/version, screenshots, etc.):
Firefox 91.13.0esr
French Wikisource with Vector 2010 and Vector 2022 (I checked both)
Edit: I first reported that I couldn't reproduce it on en.wikisource, but just now I could reproduce it there. So maybe it's not language-specific.

Event Timeline

Restricted Application added a subscriber: Aklapper.
Seudo renamed this task from "Three pages are marked as seen in the watchlist when visiting only one of them" to "Three pages are marked as seen in the watchlist when visiting only one of them (French Wikisource)". (Oct 12 2022, 2:18 PM)

I can confirm the problem reported by Seudo; more importantly, it happens even with two pages: assuming only Foo.djvu/11 and Foo.djvu/12 appear in the watchlist as recently modified, and Foo.djvu/11 is visited, Foo.djvu/12 will also be marked as visited.

This seems to be caused by a performance optimization for browsing the book pages. When you visit any book page on Wikisource, the ProofreadPage extension asks your browser to also fetch the data for the previous and next pages, to speed up navigation if you view those pages next (this is done using <link rel="prefetch" …>). However, from MediaWiki's perspective, this looks exactly the same as you actually viewing those pages, so they are marked as seen on the watchlist. I found some other concerns about this behavior reported at T299124.
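
To make the mechanism concrete, here is a toy Python model (not MediaWiki code; `Watchlist` and `view_page` are hypothetical names) in which any GET request for a watched page marks it as seen. Prefetching both neighbours clears all three bold entries, while prefetching only the next page leaves Foo.djvu/10 unseen.

```python
from dataclasses import dataclass, field


@dataclass
class Watchlist:
    """Toy model: a watched page stays 'unseen' until any GET request for it arrives."""

    unseen: set[str] = field(default_factory=set)

    def handle_get(self, title: str) -> None:
        # The server cannot tell a real page view from a browser prefetch,
        # so every GET for a watched page clears its "unseen" state.
        self.unseen.discard(title)


def view_page(watchlist: Watchlist, book: str, page: int, prefetched: tuple[int, ...]) -> None:
    """Simulate viewing one Page: plus the neighbours the browser fetches in the background."""
    watchlist.handle_get(f"{book}/{page}")
    for offset in prefetched:
        watchlist.handle_get(f"{book}/{page + offset}")


if __name__ == "__main__":
    pages = {"Foo.djvu/10", "Foo.djvu/11", "Foo.djvu/12"}

    # Previous and next page prefetched: all three entries lose their bold marker.
    wl = Watchlist(set(pages))
    view_page(wl, "Foo.djvu", 11, prefetched=(-1, 1))
    print(sorted(wl.unseen))  # []

    # Only the next page prefetched: Foo.djvu/10 stays unseen.
    wl = Watchlist(set(pages))
    view_page(wl, "Foo.djvu", 11, prefetched=(1,))
    print(sorted(wl.unseen))  # ['Foo.djvu/10']
```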

You are probably right: I noticed that, when loading the watchlist very quickly after loading a page, the previous and following pages are not (or not yet) marked as seen. But if I wait a few seconds, they are systematically marked as seen.

matmarex assigned this task to Tpt.
matmarex added a subscriber: Tpt.

This problem should also be resolved by the fix for T299124. The change will be deployed to Wikimedia wikis next week, 25-27 October, per the usual schedule. Thanks for the fix @Tpt!

Seudo reopened this task as Open. (Edited Nov 9 2022, 1:02 PM)

I am reopening this bug report because the bug still occurs, although it has changed slightly: now only the following page is also marked as read.

In the example given in the first message, when you view Foo.djvu/11, both Foo.djvu/11 and Foo.djvu/12 are marked as seen (but Foo.djvu/10 is not marked as seen, which is an improvement).

@Seudo Thanks for the report. I don't see that problem now (and I could reproduce it before), so I have some questions/requests:

  • Are you sure that it's behaving exactly as you described, and it's not some other issue with pages being marked as read incorrectly? (e.g. T320865)
  • Can you try disabling your gadgets, user scripts, etc. the next time you get a chance to test this, and see if the problem still occurs? (it could be a gadget or something also trying to prefetch the pages)
  • If you don't mind, please share the links to the specific pages where you saw the issue, the next time it happens.

Hi @matmarex

If I understand T320865 correctly, it's something different. I am not speaking about different revisions of the same page, but about revisions of different pages (when someone edited a range of subsequent pages in the same book). Also, this behaviour is specific to Wikisource (maybe the French Wikisource?) since it's about the Page: namespace only.

I just deactivated all gadgets and beta features, and the behaviour does not change. The procedure is exactly the one I described in the first message here, except that only two pages are marked as seen.

Another user on the French Wikisource mentioned the same problem here.

The image below shows my watchlist. Both Page:Ramuz - La beauté sur la terre, 1927.djvu/240 and Page:Ramuz - La beauté sur la terre, 1927.djvu/241 are marked as "seen" by MediaWiki, but only 240 is marked as a visited link by the browser (Firefox), since it is the only page I actually visited.

[Screenshot: watchlist.png]

[EDIT 1] I tried again and checked the HTTP requests made by Firefox.

When I load https://fr.wikisource.org/wiki/Page:Ramuz_-_La_beaut%C3%A9_sur_la_terre,_1927.djvu/246, Firefox loads:

  • the 246 and 247 HTML pages (not 245), which are around 46 kB;
  • the 245, 246 and 247 JPG images from Commons (the facsimile, I guess);
  • many other icons and JavaScript files, loaded directly or through load.php.

[EDIT 2] To be even more precise, when I later loaded page 212, page 213 was also loaded by the browser, with the following request headers (note that the referer is page 212):

GET /wiki/Page:Ramuz_-_La_beaut%C3%A9_sur_la_terre,_1927.djvu/213 HTTP/2
Host: fr.wikisource.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0
Accept: */*
Accept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate, br
Referer: https://fr.wikisource.org/wiki/Page:Ramuz_-_La_beaut%C3%A9_sur_la_terre,_1927.djvu/212
X-Moz: prefetch
DNT: 1
Connection: keep-alive
Cookie: [removed for privacy]
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: no-cors
Sec-Fetch-Site: same-origin
TE: trailers

Thanks!

Thank you for the details! That explains it – the X-Moz: prefetch header points to Firefox's link prefetching feature (https://developer.mozilla.org/en-US/docs/Web/HTTP/Link_prefetching_FAQ), triggered by <link rel="next" …> on the page. I couldn't reproduce this, because I had the network.prefetch-next preference disabled (I don't remember ever toggling it… but I've been using and upgrading the same Firefox installation for many years, so maybe I forgot, or maybe it was changed by some update).
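
To check which pages a given page might cause Firefox to prefetch, one can list its `<link>` elements whose `rel` is `next` or `prefetch`, the relation types described in the Firefox prefetching FAQ linked above. This is a minimal sketch using Python's standard `html.parser`; the function and class names are hypothetical, and it ignores HTTP `Link:` response headers, which the FAQ says can also trigger prefetching.

```python
from html.parser import HTMLParser
from typing import Optional

# Relation types that Mozilla's link prefetching FAQ says can trigger a prefetch.
PREFETCH_RELS = {"next", "prefetch"}


class PrefetchLinkFinder(HTMLParser):
    """Collect (rel, href) pairs from <link> elements that may be prefetched."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        # rel is a space-separated list of tokens, e.g. rel="next prefetch".
        for rel in (attr.get("rel") or "").lower().split():
            if rel in PREFETCH_RELS and href:
                self.links.append((rel, href))


def find_prefetch_links(html: str) -> list[tuple[str, str]]:
    finder = PrefetchLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.links


if __name__ == "__main__":
    head = (
        '<link rel="stylesheet" href="/w/load.php?modules=site.styles">'
        '<link rel="next" href="/wiki/Page:Foo.djvu/12">'
        '<link rel="prev" href="/wiki/Page:Foo.djvu/10">'
    )
    print(find_prefetch_links(head))  # [('next', '/wiki/Page:Foo.djvu/12')]
```

Self-closing `<link … />` tags are handled too, because `HTMLParser.handle_startendtag` calls `handle_starttag` by default.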

As a workaround, you could also turn off that preference.

As for fixing it, I see three approaches:

  1. Just remove the <link rel="next" …> (like we removed <link rel="prefetch" …> already in T299124). This would fix this problem, but I'm not sure about other consequences. Browsers might provide keyboard shortcuts for navigating using these links that would stop working. In theory it would also make navigating between the pages slightly slower (only slightly though, I'm not proposing removing the prefetch for the image thumbnails, which is the slowest part, as far as I know), but in practice I'm not sure if this is even working correctly now… I've been playing with the browser dev tools and the prefetched data doesn't seem to be loaded from cache when actually navigating to the next page. The feature was added about 2 years ago (in T230689).
  2. Teach MediaWiki about the X-Moz: prefetch header, so that it won't mark those pages as visited. We'd also need to set some caching headers if the user has the page on their watchlist, to ensure that the browser doesn't load the response directly from cache when really navigating to the next page, so that then the page can be marked as visited. (The last part might already work, not sure.)
  3. Change the whole system of marking pages as visited so it doesn't depend on GET requests, because they are prone to exactly these kinds of problems. This is probably way bigger than this task.
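
Approach 2 could look roughly like the following Python sketch. It is not MediaWiki's actual request handling (which is PHP); `is_prefetch_request` and `handle_page_view` are hypothetical names, `Cache-Control: no-store` is only one possible caching choice, and treating `Purpose` / `Sec-Purpose` as prefetch signals is an assumption beyond the `X-Moz: prefetch` header captured in this task.

```python
from collections.abc import Mapping

# X-Moz is what Firefox sent in the request captured in this task; treating
# Purpose / Sec-Purpose the same way is an assumption for other browsers.
PREFETCH_HEADERS = ("x-moz", "purpose", "sec-purpose")


def is_prefetch_request(headers: Mapping[str, str]) -> bool:
    """Return True if the request headers mark this as a speculative prefetch."""
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    return any("prefetch" in lowered.get(name, "") for name in PREFETCH_HEADERS)


def handle_page_view(headers: Mapping[str, str], page_is_watched: bool) -> tuple[bool, dict[str, str]]:
    """Hypothetical handler: return (mark_as_seen, extra_response_headers)."""
    if not is_prefetch_request(headers):
        # A normal navigation: mark the page as seen, as a GET does today.
        return True, {}
    if page_is_watched:
        # Don't let the browser reuse this prefetched copy, so the real
        # navigation reaches the server and can mark the page as seen then.
        return False, {"Cache-Control": "no-store"}
    return False, {}


if __name__ == "__main__":
    firefox_prefetch = {"Accept": "*/*", "X-Moz": "prefetch"}
    print(handle_page_view(firefox_prefetch, page_is_watched=True))
    # (False, {'Cache-Control': 'no-store'})
```

Header names are compared case-insensitively, since HTTP/2 sends them in lowercase while the captured request shows them capitalized.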

Thanks, @matmarex !

I confirm that setting network.prefetch-next to false solves the problem. Apparently the option is enabled by default, and Chrome also has a prefetching option.

Of course disabling the Firefox preference is not for the average user... Solutions 2 or 3 would probably be preferable, but I understand they might not be easy to implement.

Tgr moved this task from Inbox to Triaged on the Growth-Team board.
Tgr subscribed.
  3. Change the whole system of marking pages as visited so it doesn't depend on GET requests, because they are prone to exactly these kinds of problems. This is probably way bigger than this task.

Making that decision would require a level of product management attention that watchlists are unlikely to get in the near future.
Unless, perhaps, we implement a functionally near-identical replacement, such as marking pages as read via AJAX or a tracking pixel.
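
The AJAX or tracking-pixel idea amounts to keeping page GETs free of side effects and moving the "mark as seen" step to a separate request that only a rendered page sends. Below is a toy Python sketch of that split; it is not how MediaWiki is structured, and `WatchlistState.handle_mark_seen` stands in for a hypothetical endpoint.

```python
from dataclasses import dataclass, field


@dataclass
class WatchlistState:
    """Toy server state: watched titles whose latest change the user hasn't seen."""

    unseen: set[str] = field(default_factory=set)

    def handle_get(self, title: str) -> str:
        # GET stays free of side effects, so a prefetch and a real view
        # are both harmless: neither changes the watchlist.
        return f"<html><body>{title}</body></html>"

    def handle_mark_seen(self, title: str) -> None:
        # Hypothetical endpoint, called by a script (or, as assumed here, a
        # tracking pixel) that only runs once the page is actually rendered.
        self.unseen.discard(title)


if __name__ == "__main__":
    state = WatchlistState({"Foo.djvu/11", "Foo.djvu/12"})
    state.handle_get("Foo.djvu/11")        # the page the user opens
    state.handle_get("Foo.djvu/12")        # background prefetch by the browser
    state.handle_mark_seen("Foo.djvu/11")  # sent only from the rendered page
    print(sorted(state.unseen))  # ['Foo.djvu/12']
```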

  2. Teach MediaWiki about the X-Moz: prefetch header, so that it won't mark those pages as visited. We'd also need to set some caching headers if the user has the page on their watchlist, to ensure that the browser doesn't load the response directly from cache when really navigating to the next page, so that then the page can be marked as visited. (The last part might already work, not sure.)

This seems complex and somewhat pointless. If we mark it as uncacheable, why bother prefetching it? In theory, rel="next" has meaning beyond prefetching; in practice, not really.

  1. Just remove the <link rel="next" …>

Seems like the simplest fix. Chrome apparently already dropped prefetch-on-next because of the same mark-as-read issue, so it would only affect a small fraction of users.

Change 862330 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/ProofreadPage@master] Remove <link rel="next"> (and "prev") to avoid unwanted prefetching

https://gerrit.wikimedia.org/r/862330

Change 862330 merged by jenkins-bot:

[mediawiki/extensions/ProofreadPage@master] Remove <link rel="next"> (and "prev") to avoid unwanted prefetching

https://gerrit.wikimedia.org/r/862330

This change will be deployed to Wikimedia wikis next week, per the usual schedule.