Possible data sources are:
- Processed, cached OSM dump (this is how geoshape works)
- Processed, cached Wikidata dump (P625). This would be new development.
- SPARQL query by QID.
- wbgetentities API query
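The two live-query options above could look roughly like the following. This is a minimal sketch, not an agreed design: the helper names (`check_qid`, `sparql_coordinate_query`, `wbgetentities_url`, `extract_coordinates`) are hypothetical, and it assumes the public Wikidata endpoints and the standard JSON shape of a `wbgetentities` response. It only builds the requests and parses a response, so it makes no network calls itself.

```python
import re
from urllib.parse import urlencode

# Public Wikidata endpoints (assumed targets; a production setup may use others).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
WIKIDATA_API = "https://www.wikidata.org/w/api.php"

_QID_RE = re.compile(r"Q[1-9][0-9]*")


def check_qid(qid):
    """Return qid unchanged if it looks like a Wikidata item ID, else raise."""
    if not _QID_RE.fullmatch(qid):
        raise ValueError("not a valid QID: %r" % (qid,))
    return qid


def sparql_coordinate_query(qid):
    """Build a SPARQL query for the P625 (coordinate location) values of one item."""
    return "SELECT ?coord WHERE { wd:%s wdt:P625 ?coord }" % check_qid(qid)


def wbgetentities_url(qid):
    """Build a wbgetentities request URL that returns the item's claims as JSON."""
    params = {
        "action": "wbgetentities",
        "ids": check_qid(qid),
        "props": "claims",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def extract_coordinates(response, qid):
    """Return (latitude, longitude) pairs from a decoded wbgetentities response.

    Statement ranks are ignored here; a real implementation may want to
    skip deprecated statements.
    """
    claims = response.get("entities", {}).get(qid, {}).get("claims", {})
    if not isinstance(claims, dict):
        # Defensive: treat any non-mapping value as "no claims".
        return []
    coords = []
    for claim in claims.get("P625", []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "novalue" / "somevalue" snaks carry no coordinates
        value = snak["datavalue"]["value"]
        coords.append((value["latitude"], value["longitude"]))
    return coords
```

The query string would be sent to `WDQS_ENDPOINT` as the `query` parameter, and the URL from `wbgetentities_url` fetched with any HTTP client. Either way, the result would still need caching to avoid a live request per map render.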
Questions:
- What are the advantages and disadvantages of each?
  - Wikidata
    - Pro: can be edited by any wiki account.
    - Pro: contains roughly 9x as many geolocated items (9M) as there are OSM entries linked to a Wikidata item (1M).
    - Con: we'll need a new mechanism to cache or query this source.
  - OSM
    - Pro: high visibility means more eyes to correct errors.
    - Pro: the import mostly exists already.
    - Con: the import is not quite configured for points and would require adjustments.
- Which data set is more complete? (Determine how many coordinates are associated with a QID on OSM vs. how many Wikidata items have coordinates.)
  - 9M geolocated Wikidata items
  - 1M OSM entries linked to a Wikidata item
- Find out how the dataflow (Wikidata -> OSM) currently works for GeoShapes: Do they have the same data? In which direction does the data flow? Does it happen automatically, and if so, on what schedule?
  - Every 12 hours, a job pulls from OSM and imports the data into postgres-postgis.