

Handling of multiple location values from wikidata items
Closed, Resolved · Public · 5 Estimated Story Points

Description

There are different situations where the Maps (Kartotherian) rendering code is confronted with duplicates. Previously, it refused to render anything in most of these situations. As a first improvement, we decided to show whatever comes first (see T306543). Further possible improvements:

User provides duplicate Q-ids

This is a (partially) new feature, see T305822: GeoPoints via QID. The user-provided GeoJSON might contain something like "ids": "Q1, Q1", for whatever reason. Maybe the list is generated by a template that forgot to consider duplicates.

Possible interpretations: Showing the exact same information twice is useless, no matter if it's a shape, point, or whatever. The only information is the fact that some id appears multiple times, and how often. While we could render this as a number or a note, it's very unlikely this is of any use. This is different when the GeoJSON asks for the same Q-id with different properties, e.g. different colors.

Proposal:

  1. Make the list unique before doing anything else with it. In other words, make "ids": "Q1, Q1" behave identically to "ids": "Q1".
  2. Check if it's possible to show the same Q-id twice with different properties. Discuss what should happen then.
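Step 1 could look roughly like the following. This is only an illustrative sketch in Python, not the Kartotherian code; the function name `parse_unique_ids` is made up, and it assumes the "ids" value is a comma-separated string with optional whitespace.

```python
def parse_unique_ids(ids: str) -> list:
    """Split a comma-separated "ids" string and drop repeated Q-ids.

    Hypothetical helper: keeps the first occurrence of each id and
    preserves the order in which the ids were given.
    """
    seen = set()
    unique = []
    for raw in ids.split(","):
        qid = raw.strip()
        # Skip empty entries (e.g. from a trailing comma) and repeats.
        if qid and qid not in seen:
            seen.add(qid)
            unique.append(qid)
    return unique


print(parse_unique_ids("Q1, Q1"))  # ['Q1']
```

Keeping the first-seen order means "ids": "Q1, Q1" ends up on the same code path as "ids": "Q1".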

User-provided SPARQL returns same Q-id twice

This is only for the existing feature where a SPARQL query returns nothing but Q-ids, which are then resolved and rendered as shapes.

Proposal:

  1. As long as there is no different information assigned to the duplicate Q-ids (e.g. different colors), drop the duplicates, as above.
  2. Discuss what should happen when the properties are different. It doesn't make much sense to render the same shape twice, with different colors.
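To make the two cases concrete, here is a rough sketch that drops exact repeats and sets aside rows that reuse a Q-id with different properties. The row format (a dict with an "id" key plus arbitrary properties such as "fill") and the name `split_duplicates` are assumptions for illustration only.

```python
def split_duplicates(rows):
    """Return (kept, conflicts) for a list of result rows.

    kept:      the first row seen for every Q-id
    conflicts: later rows that reuse a Q-id with different properties
    Exact repeats of an already kept row are dropped silently.
    """
    first_seen = {}
    kept = []
    conflicts = []
    for row in rows:
        qid = row["id"]
        if qid not in first_seen:
            first_seen[qid] = row
            kept.append(row)
        elif row != first_seen[qid]:
            # Same Q-id, different properties: the case still to be discussed.
            conflicts.append(row)
    return kept, conflicts
```

What to do with the `conflicts` list is exactly the open question from step 2; the sketch only separates the cases.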

User-provided SPARQL returns same coordinate twice

This is only when the query returns coordinates that are then rendered as points.

Proposal:

  1. Drop exact duplicates where all relevant information is identical.
    • Optional: Consider only fields that make a difference on the map. This includes the Q-id, coordinates, title, description and image, but no other fields.
  2. Discuss what should happen when two different Q-ids have the same coordinate. These will be rendered on top of each other. One is probably unreachable for the reader.
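A possible shape for both steps, as a sketch only: rows are assumed to be flat dicts, and the field names in `RELEVANT_FIELDS` (including "lat" and "lon" for the coordinate) are placeholders rather than the names the service actually uses.

```python
# Assumed field names; only these are compared when looking for duplicates.
RELEVANT_FIELDS = ("id", "lat", "lon", "title", "description", "image")


def drop_duplicate_points(rows):
    """Keep the first row for every distinct combination of relevant fields."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(field) for field in RELEVANT_FIELDS)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def overlapping_ids(rows):
    """Map each coordinate shared by more than one Q-id to those Q-ids."""
    by_coord = {}
    for row in rows:
        by_coord.setdefault((row["lat"], row["lon"]), set()).add(row["id"])
    return {coord: ids for coord, ids in by_coord.items() if len(ids) > 1}
```

`overlapping_ids` does not decide anything; it only reports which points would be stacked on top of each other, which is the input needed for the discussion in step 2.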

Item actually contains multiple coordinates

Example: https://www.wikidata.org/wiki/Q833129. While this is rare and often an error (typically a missing "preferred" or "deprecated" flag), it's sometimes valid and needs to be considered. A good example is a lake that is considered a single thing, but is described with multiple shapes.

Proposal:

  1. As a next incremental step, render all points in the same way. Make sure this only happens when the coordinates are actually different.
  2. Discuss if we want to render the points differently, e.g. with an extra note like "Great Sea (multiple places)" or "Great Sea (1 of 2 places)".
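As a sketch of how the two steps might combine: the hypothetical helper below removes repeated coordinates first and only adds the "(1 of 2 places)" style note when more than one distinct coordinate remains. Coordinates are assumed to be (latitude, longitude) tuples.

```python
def label_points(title, coordinates):
    """Return (coordinate, label) pairs for one item.

    Identical coordinates are collapsed first, so the note is only
    added when the item really has several different places.
    """
    distinct = list(dict.fromkeys(coordinates))  # order-preserving dedupe
    total = len(distinct)
    if total == 1:
        return [(distinct[0], title)]
    return [
        (coord, f"{title} ({index} of {total} places)")
        for index, coord in enumerate(distinct, start=1)
    ]


print(label_points("Great Sea", [(1.0, 2.0), (1.0, 2.0), (3.0, 4.0)]))
# [((1.0, 2.0), 'Great Sea (1 of 2 places)'), ((3.0, 4.0), 'Great Sea (2 of 2 places)')]
```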

Event Timeline

The duplicate QID issue might be mostly due to Kartotherian's internal usage of the QID as a key into objects. If we stored entities as a list internally, then we might be able to simply ignore the issues around duplicates and render each point, be they exactly the same or not. That saves us from having to make decisions about the precedence of properties on each row, for example.
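A toy illustration of the difference, written in Python rather than the service's own code and using made-up rows: keying by QID keeps only one row per id, while a plain list keeps every row.

```python
# Two made-up result rows for the same (hypothetical) item.
rows = [
    {"id": "Q1", "lat": 41.6, "lon": 13.4},
    {"id": "Q1", "lat": 41.7, "lon": 13.5},
]

# Keyed by QID: the second row silently overwrites the first.
keyed = {row["id"]: row for row in rows}

# Stored as a list: each row stays independent.
as_list = list(rows)

print(len(keyed))    # 1
print(len(as_list))  # 2
```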

Change 789080 had a related patch set uploaded (by Awight; author: Awight):

[mediawiki/services/kartotherian@master] [WIP] Test for multiple points

https://gerrit.wikimedia.org/r/789080

Change 789107 had a related patch set uploaded (by Awight; author: Awight):

[mediawiki/services/kartotherian@master] [WIP] Treat rows independently rather than keying by QID

https://gerrit.wikimedia.org/r/789107

Change 789127 had a related patch set uploaded (by Awight; author: Awight):

[mediawiki/services/kartotherian@master] Store ids as a Set

https://gerrit.wikimedia.org/r/789127

awight moved this task from Doing to Tech Review on the WMDE-TechWish-Sprint-2022-04-27 board.
awight moved this task from Tech Review to Doing on the WMDE-TechWish-Sprint-2022-04-27 board.

I think there might be some glitches around properties. Pulling this back in to add tests and verify.

awight removed awight as the assignee of this task. May 5 2022, 6:49 AM
awight removed awight as the assignee of this task. May 5 2022, 3:09 PM
awight moved this task from Doing to Tech Review on the WMDE-TechWish-Sprint-2022-04-27 board.

Change 789809 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/services/kartotherian@master] Fix incomplete array length check in _runWikidataQuery

https://gerrit.wikimedia.org/r/789809

Change 789809 abandoned by Thiemo Kreuz (WMDE):

[mediawiki/services/kartotherian@master] Fix incomplete array length check in _runWikidataQuery

Reason:

Obsolete via Id27701c.

https://gerrit.wikimedia.org/r/789809

Change 789127 merged by jenkins-bot:

[mediawiki/services/kartotherian@master] Store ids as a Set

https://gerrit.wikimedia.org/r/789127

Change 789107 merged by jenkins-bot:

[mediawiki/services/kartotherian@master] Treat rows independently rather than keying by QID

https://gerrit.wikimedia.org/r/789107

Change 789080 merged by jenkins-bot:

[mediawiki/services/kartotherian@master] Test for multiple points

https://gerrit.wikimedia.org/r/789080

WMDE-Fisch set the point value for this task to 5. May 12 2022, 12:19 PM

To test this, one could use:

<mapframe latitude="41.671111" longitude="13.487222" zoom="10" width="400" height="400" align="center">
{
  "type": "ExternalData",
  "service": "geopoint",
  "ids": "Q833129"
}
</mapframe>

It's also possible to explicitly trigger the error with a duplicated QID even when the entity doesn't contain multiple points, like:

"ids": "Q833129,Q833129"

My issue during the demo was simply that I didn't have geopoints enabled locally :-) -- Fisch pointed this out and now I'm able to verify the fix locally.

thiemowmde claimed this task.