Wikibase Geographic coordinate changes may not propagate to GeoData database table
Open, Needs Triage | Public | BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

  1. Install a fresh wiki with Wikibase (client and repository) and GeoData extensions (without using CirrusSearch, with Scribunto)
  2. Import the Module:WikidataCoord and Template:Coord pages (and the modules they depend on)
  3. Create a Wikibase property with label "coordinate location" with the Geographic coordinates data type
  4. Create a new page with content "{{#invoke: WikidataCoord | main | {{#property: coordinate location }} | display=title }}"
  5. Create a sitelink to a new Wikibase Item for the new page
  6. Add the "coordinate location" property to the Wikibase Item
  7. Run maintenance/runJobs.php --wait to drain the JobQueue
  8. Change the value of the "coordinate location" property in the Wikibase Item
  9. Run maintenance/runJobs.php --wait to drain the JobQueue
  10. Check the coordinates displayed on the new page
  11. Check the latitude and longitude for the new page in the geo_tags table in the database
  12. Repeat steps 4-11 several times

What happens?:
On at least one of the pages, the geo_tags entry is not updated with the new coordinates, even though both the Wikibase Item and the coordinates displayed on the page show the new values.

What should have happened instead?:

All of the geo_tags entries are updated to the new coordinates.

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc:

Installed Software
  MediaWiki: 1.36.1
  PHP: 7.4.22 (fpm-fcgi)
  MariaDB: 10.5.11-MariaDB
  ICU: 67.1
  LuaSandbox: 4.0.2
  Lua: 5.1.5

Installed Skins
  Timeless: 0.9.1 (80cc022) 18:56, 23 July 2021
  Vector: 1.0.0 (b9deb92) 19:04, 23 July 2021

Installed extensions
  Flagged Revisions: – (c63cb4d) 19:16, 22 July 2021
  CodeEditor: – (e587a94) 12:59, 22 July 2021
  WikiEditor: 0.5.3 (cf1d759) 14:23, 23 July 2021
  ParserFunctions: 1.6.0 (d999660) 13:28, 27 May 2021
  Scribunto: – (ac71012) 21:34, 23 July 2021
  TemplateStyles: 1.0 (e548bf1) 04:55, 28 May 2021
  TitleBlacklist: 1.5.0 (67533b2) 05:07, 28 May 2021
  WikibaseClient: 2ddf4cd (with patch)
  WikibaseRepository: 2ddf4cd (with patch)
  WikipartmentsGeocode: 0.1.0*
  GeoData: – (a788098) 04:31, 27 May 2021
  WikipartmentsMessages: 0.0.0*

(versions with an asterisk indicate custom extensions of mine)

Root cause

The root cause is one of two possible things, depending on how you look at it.

  1. The GeoData onLinksUpdateComplete hook uses the parser output to get the coordinates. If the refreshLinks job does not run after the associated htmlCacheUpdate job, GeoData processes old parser output that does not have the new coordinates, so the geo_tags table lags behind the actual Wikibase and page coordinate values.
  2. Jobs with (recursive) dependencies on other jobs can be added to the job queue, but there is no way to record or enforce those dependencies. The ChangeNotification job creates both an htmlCacheUpdate job and a refreshLinks job. The refreshLinks job depends on the htmlCacheUpdate job (and any jobs it creates), but the job queue runner has no way to know that or to enforce an ordering based on it. This allows the jobs to be run out of order.

There are several ways the jobs involved can be run out of order, but the most common two I've seen are:

  1. The ChangeNotification job schedules a recursive htmlCacheUpdate job followed by a refreshLinks job. The random-order processing of the queue (the default) can cause the refreshLinks job type to be pulled from the queue before the htmlCacheUpdate job type.
  2. The ChangeNotification job schedules a recursive htmlCacheUpdate job followed by a refreshLinks job. The job queue is processed in order (either by chance or by configuration). The initial htmlCacheUpdate job uses partitionBacklinkJob() to schedule more jobs to do the actual cache update. These jobs are behind the refreshLinks job in the queue, and since the processing is in order, the refreshLinks job runs before the htmlCacheUpdate job(s) that do the actual invalidation.
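
The two orderings above can be reproduced with a small toy model. This is only a sketch in Python, not MediaWiki code: the state dictionary, the ":recursive" and ":leaf" job labels, and the simulate() function are hypothetical names made up for illustration, and the model assumes that refreshLinks reuses cached parser output unless an htmlCacheUpdate job has already invalidated it.

```python
from collections import deque


def job_type(job):
    # "htmlCacheUpdate:leaf" -> "htmlCacheUpdate"
    return job.split(":")[0]


def simulate(type_priority=None):
    """Run the modeled jobs and return the coordinates written to geo_tags.

    type_priority=None models strict in-order (FIFO) processing. A list of
    job types models a runner that prefers those types in the given order.
    """
    state = {
        "wikibase": "new",      # the Item already holds the changed value
        "parser_cache": "old",  # cached parser output predates the change
        "cache_valid": True,    # nothing has invalidated the cache yet
        "geo_tags": "old",
    }
    queue = deque(["ChangeNotification"])

    def run(job):
        if job == "ChangeNotification":
            queue.append("htmlCacheUpdate:recursive")
            queue.append("refreshLinks")
        elif job == "htmlCacheUpdate:recursive":
            # Stands in for partitionBacklinkJob(): the real work is re-queued.
            queue.append("htmlCacheUpdate:leaf")
        elif job == "htmlCacheUpdate:leaf":
            state["cache_valid"] = False
        elif job == "refreshLinks":
            if not state["cache_valid"]:
                state["parser_cache"] = state["wikibase"]  # modeled reparse
                state["cache_valid"] = True
            # Stands in for GeoData's onLinksUpdateComplete hook.
            state["geo_tags"] = state["parser_cache"]

    while queue:
        if type_priority is None:
            job = queue.popleft()
        else:
            # Pick the first queued job whose type ranks highest.
            job = min(queue, key=lambda j: type_priority.index(job_type(j)))
            queue.remove(job)
        run(job)
    return state["geo_tags"]
```

In this model, simulate() (in-order processing) and simulate(["ChangeNotification", "refreshLinks", "htmlCacheUpdate"]) both leave the old coordinates in geo_tags, while simulate(["ChangeNotification", "htmlCacheUpdate", "refreshLinks"]) writes the new ones.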

Workarounds

  • Remove the ChangeNotification, htmlCacheUpdate, and refreshLinks job classes from the default queue (via $wgJobTypesExcludedFromDefaultQueue), then run them explicitly in order (ChangeNotification, then htmlCacheUpdate, then refreshLinks) via runJobs.php --type. The other Wikibase Client job classes probably need to be removed from the default queue as well.

Note: this workaround has the potential to cause starvation of some of the job classes that are no longer run by default.
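
A minimal sketch of a driver for the workaround, written in Python purely for illustration. It assumes the three job types are already listed in $wgJobTypesExcludedFromDefaultQueue in LocalSettings.php and that a php binary is on the PATH. The names build_commands and run_in_order, and the /srv/mediawiki path in the usage note, are hypothetical. Each command is run without --wait on the assumption that runJobs.php then exits once no jobs of that type remain, including any jobs the type re-queues for itself.

```python
import subprocess

# Order taken from the workaround above.
JOB_TYPES_IN_ORDER = ["ChangeNotification", "htmlCacheUpdate", "refreshLinks"]


def build_commands(mediawiki_root, php="php"):
    """Return one runJobs.php command line per job type, in dependency order."""
    script = mediawiki_root.rstrip("/") + "/maintenance/runJobs.php"
    return [[php, script, "--type", job_type] for job_type in JOB_TYPES_IN_ORDER]


def run_in_order(mediawiki_root):
    """Drain each job type in turn, stopping at the first failure."""
    for command in build_commands(mediawiki_root):
        # check=True raises if a run fails, so a later type never runs
        # ahead of an earlier one that did not finish.
        subprocess.run(command, check=True)
```

Calling run_in_order("/srv/mediawiki") from a cron job or timer would perform one pass. A single pass may still leave work behind if new ChangeNotification jobs arrive while the later types are draining, so the pass would need to be repeated.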