
Wikidata:WikiProject Authority control: Difference between revisions

 
(16 intermediate revisions by 8 users not shown)
{{see also|Wikidata:WikiProject Authority control/Archive|Wikidata:External identifiers|Wikidata:VIAF}}
This WikiProject aims to improve the quantity and quality of external identifiers present on Wikidata.
Wikidata pays a lot of tribute to authority control, linking to all kinds of datasets and databases with various IDs.
The holy grail of every GLAM worker, the '''Sum of All People, with links to their Works''', is coming about!
 
If you have ideas, please [https://www.wikidata.org/w/index.php?title=Wikidata_talk:WikiProject_Authority_control&action=edit&section=new open a new thread] on the WikiProject's talk page.
But we’re just at the start of a lot of work in that direction. The purpose of this project is to try and coordinate such work.
 
== Overview on external identifiers ==
I know a few things (ViafBot, Mix-n-Match) and I'd like to help with some things, but I don't know what others are doing.
* For a description of the structure of external-id properties, see '''[[Wikidata:External identifiers]]'''
--[[User:Vladimir Alexiev|Vladimir Alexiev]] ([[User talk:Vladimir Alexiev|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 08:08, 27 January 2015 (UTC)
* For a list (obviously incomplete) of databases using Wikidata as authority control, see '''[[Wikidata:Wikidata for authority control]]'''
* For information about the progress of coreferencing on Wikidata in the years up to 2020, see the [[Wikidata:WikiProject Authority control/Archive|'''archive of this page''']]
 
=== Most useful tools and gadgets ===
== Data Sources ==
{{see also|Wikidata:Tools}}
The report [http://vladimiralexiev.github.io/CH-names/README.html#sec-1 Name Data Sources for Semantic Enrichment] shows that when it comes to name data sources, maybe the two that matter are VIAF and Wikidata.
Main tools for coreferencing:
* Their name coverage is fairly orthogonal: VIAF has more name variations and permutations, Wikidata has more translations ([http://vladimiralexiev.github.io/CH-names/cranach-venn.html Venn diagram of names for Cranach]).
* {{Q|28054658}}: catalogs of external ids can be imported and gradually matched to Wikidata items
* VIAF is much bigger: 35M persons/orgs. Wikidata has 2.7M persons and maybe 1M orgs.
* {{Q|29032512}}: allows adding batches of statements to Wikidata items
* Only 0.5M of Wikidata persons/orgs are coreferenced to VIAF, with maybe another 0.5M coreferenced to other datasets, either VIAF-constituent (eg GND) or non-constituent (eg RKDartists). So the coreferenced part between the two is still quite small (30%) and a lot of work remains!
* A lot can be gained by [http://vladimiralexiev.github.io/CH-names/README.html#sec-4 leveraging coreferencing across VIAF and Wikidata]: finding errors in Authority files, finding merge candidates in Wikidata, promulgating identifiers...
* Wikidata has great tools for crowd-sourced coreferencing.
 
Main gadgets for coreferencing:
Please comment!
* [[User:Magnus Manske/mixnmatch gadget.js]]: when you open an item, you see all the Mix'n'match entries automatically matched to the item and you can manually confirm the correct automatches
* [[User:Bargioni/MnM ext2.js]]: a small extension of the previous gadget that makes it easy to remove incorrect automatches (one click on X) and, where appropriate, to mark entries as not applicable to Wikidata (two clicks on X)
 
== VIAF ==
=== RKDArtists Coreferencing ===
For information about the relationship between Wikidata and {{Q|54919}}, see '''[[Wikidata:VIAF]]''' and its subpages
RKDartists is an important Authority that does not yet participate in VIAF. There are already [http://wdq.wmflabs.org/api?q=claim%5B650%5D&noitems=1 21760 RKDartists IDs] on Wikidata. These could be imported to VIAF for free!
 
== AAT ==
=== British Museum Coreferencing ===
The BM has several thesauri that are not co-referenced to anything in the world. I think they'd see it as a major win if the community helps them to co-reference.
 
The {{Q|Q611299}} from the {{Q|Q11203476}} is a crucial multilingual thesaurus in cultural heritage, with 56,537 concepts as of 14 June 2023. See http://vocab.getty.edu/sparql
* {{P|1711}}: 176461 persons, 21511 matched
* [[Wikidata:Property_proposal/British_Museum_place]]: 45883 places
* [[Wikidata:Property_proposal/British_Museum_thesauri]]: 28 more thesauri with 26804 entries
 
This could be followed by importing the 2.5M cultural objects of the BM.
 
=== ULAN Coref Relations ===
 
ULAN does record possible matches and mismatches in their editorial system: [http://getty.ontotext.com/sparql?query=PREFIX+gvp%3A+%3Chttp%3A%2F%2Fvocab.getty.edu%2Fontology%23%3E%0D%0APREFIX+aat%3A+%3Chttp%3A%2F%2Fvocab.getty.edu%2Faat%2F%3E%0D%0APREFIX+xl%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2008%2F05%2Fskos-xl%23%3E%0D%0APREFIX+foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0D%0APREFIX+schema%3A+%3Chttp%3A%2F%2Fschema.org%2F%3E%0D%0Aselect+%3Fx+%3Fx_name+%3Fx_bio+%3Frel+%3Fy+%3Fy_name+%3Fy_bio+%7B%0D%0A++filter+%28%3Frel+in+%28gvp%3Aulan1005_possibly_identified_with%2C+gvp%3Aulan1006_formerly_identified_with%2C%0D%0A+++++++++++++++++++gvp%3Aulan1007_distinguished_from%2C+gvp%3Aulan1008_meaning_-usage_overlaps_with%29%29%0D%0A++%3Fx+%3Frel+%3Fy.%0D%0A++filter+exists+%7B%3Fx+gvp%3AagentTypePreferred%7C%28gvp%3AagentTypePreferred%2Fgvp%3AbroaderExtended%29+aat%3A300025101%7D%0D%0A++%3Fx+gvp%3AprefLabelGVP+%5Bxl%3AliteralForm+%3Fx_name%5D%3B%0D%0A+++++foaf%3Afocus+%5Bgvp%3AbiographyPreferred+%5Bschema%3Adescription+%3Fx_bio%5D%5D.%0D%0A++%3Fy+gvp%3AprefLabelGVP+%5Bxl%3AliteralForm+%3Fy_name%5D%3B%0D%0A+++++foaf%3Afocus+%5Bgvp%3AbiographyPreferred+%5Bschema%3Adescription+%3Fy_bio%5D%5D.%0D%0A%7D&_implicit=false&implicit=true&_equivalent=false&_form=%2Fsparql ULAN Artists Whose Identity May be Associated or Confused With Another] (608 pairs).
 
Looks like this:
{| class="wikitable sortable"
|-
! x !! x_name !! x_bio !! rel !! y !! y_name !! y_bio
|-
| ulan:500071106 || Master of 1515 || Portuguese painter, active 1515 || gvp:ulan1005_possibly_identified_with || ulan:500025279 || Afonso, Jorge || Portuguese painter and court artist, born ca. 1470-1475, died before 1540
|-
| ulan:500042027 || Master of the Madre de Deus Retablo || Portuguese painter, active 16th century || gvp:ulan1005_possibly_identified_with || ulan:500025279 || Afonso, Jorge || Portuguese painter and court artist, born ca. 1470-1475, died before 1540
|-
| ulan:500032055 || Monogrammist A. M. || Spanish artist, active 19th century || gvp:ulan1005_possibly_identified_with || ulan:500038287 || Aguirre, Marcial || Spanish sculptor, 1841-1900
|}
 
Here's to proper coreferencing! --[[User:Vladimir Alexiev|Vladimir Alexiev]] ([[User talk:Vladimir Alexiev|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 18:07, 12 March 2015 (UTC)
 
=== Match Persons not Disambiguation Pages ===
We should match '''persons to persons''', not '''disambiguation pages''' to persons or other disambiguation pages.
 
Wikipedias, GND and RKD all have disambiguation pages (in GND they are called "undifferentiated names").
13 Feb 2015:
* [https://commons.wikimedia.org/wiki/User_talk:Jane023#Don.27t_Coreference_Disambiguation_pages Wrote to Jane]
* Wrote to Magnus: [https://meta.wikimedia.org/wiki/Talk:Mix%27n%27match#Filter_out_Disambiguation_entries_and_Un-notable_Persons Filter out Disambiguation entries and Un-notable Persons]
 
Do you agree with my reasoning:
* Jane said "any match is better than none"
* I countered "A '''correct''' match is better than none"
* the only way to make sure it's correct is to examine more data about the person, which will necessarily lead you to a real person page.
* Look at the ULAN data above: that's good data that gives you some basis for decision. A name alone does not.
--[[User:Vladimir Alexiev|Vladimir Alexiev]] ([[User talk:Vladimir Alexiev|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 19:34, 12 March 2015 (UTC)
: {{support}} {{ping|Vladimir Alexiev|Randykitty|Ghuron}} A GND Tn (Thesaurus name = undifferentiated) is not a stable disambiguation page. '''A Tn is a placeholder.''' It can be deleted, it can be upgraded into a Tp (Thesaurus person), or changed into a redirect. Works connected with a Tn will be checked by the library or archive who owns them and afterwards might be delinked. The database [http://swb.bsz-bw.de/DB=2.104/SET=4/TTL=1/LNG=EN/START_WELCOME?retrace=0 Online GND] (OGND) includes only Tp numbers. --[[User:Kolja21|Kolja21]] ([[User talk:Kolja21|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 00:04, 28 March 2015 (UTC)
 
=== Coreference AAT ===
AAT is a crucial thesaurus in cultural heritage.
* It has 40k concepts, see http://vocab.getty.edu/sparql
select (count(*) as ?c) {
?x a skos:Concept; skos:inScheme aat: }
* Of them only 363 AAT are coreferenced, or under 1%, see http://tools.wmflabs.org/autolist/autolist1.html?q=CLAIM%5B1014%5D
 
Of these, 22,000 (54.4%) are mapped to 21,773 Wikidata items, also as of 11 September 2020. ([https://w.wiki/bsK Live query].)
I think that's BAD. I'm sure going to need that coref for the Europeana Food and Drink Classification Scheme that will be based on Wikidata and AAT:
* [http://www.slideshare.net/valexiev1/europeana-food-and-drink-classification-scheme presentation]
* [http://vladimiralexiev.github.io/pubs/Europeana-Food-and-Drink-Semantic-Demonstrator-Specification-(D3.19).pdf report]
 
[https://tools.wmflabs.org/mix-n-match/#/catalog/48 AAT is actively coreferenced on Mix-n-Match].
Update: the AAT-Wordnet coreference described below is brought into Wikidata. [https://tools.wmflabs.org/mix-n-match/#/catalog/48 AAT is actively coreferenced on Mix-n-Match]: 12985 (32%) matched, 3293 (8%) awaiting confirmation, 1553 (3.8%) confirmed no-matches, and 22543 (55.7%) awaiting matching. So it's way better than 2 years ago. Help coreference this pivot thesaurus that is of immense importance for Cultural Heritage! --[[User:Vladimir Alexiev|Vladimir Alexiev]] ([[User talk:Vladimir Alexiev|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 15:04, 19 September 2017 (UTC)
 
For historical information about the relationship between Wikidata and AAT, see the archived material at [[Wikidata:WikiProject Authority control/Archive#Coreference AAT|'''Coreference AAT''']].
==== Coreference AAT with Mix-n-Match ====
The Wikidata coref tool Mix-n-Match has mostly been used for people until now.
But I hope it can be used for concepts as well.
 
== GND ==
I made an export that includes AAT URL, preferred English label (without qualifier), parents (ascendants to root) and scope note (description). '''Could also add alternative labels, and labels in other languages''' (Dutch, Spanish, Chinese).
<pre>
select ?id (str(?lab) as ?label) ?parents (str(?scopeNote) as ?note) {
?x a gvp:Concept; dc:identifier ?id; gvp:prefLabelGVP/gvp:term ?lab;
gvp:parentString ?parents.
optional {?x skos:scopeNote [dct:language gvp_lang:en; rdf:value ?scopeNote]}
}
</pre>
I saved [http://vocab.getty.edu/sparql.xml?query=PREFIX+skos%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23%3E%0D%0APREFIX+gvp%3A+%3Chttp%3A%2F%2Fvocab.getty.edu%2Fontology%23%3E%0D%0APREFIX+xl%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2008%2F05%2Fskos-xl%23%3E%0D%0APREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0D%0APREFIX+dct%3A+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2F%3E%0D%0APREFIX+gvp_lang%3A+%3Chttp%3A%2F%2Fvocab.getty.edu%2Flanguage%2F%3E%0D%0Aselect+%3Fid+%28str%28%3Flab%29+as+%3Flabel%29+%3Fparents+%28str%28%3FscopeNote%29+as+%3Fnote%29+%7B%0D%0A++%3Fx+a+gvp%3AConcept%3B+dc%3Aidentifier+%3Fid%3B+gvp%3AprefLabelGVP%2Fgvp%3Aterm+%3Flab%3B%0D%0A+++++gvp%3AparentString+%3Fparents.%0D%0A++optional+%7B%3Fx+skos%3AscopeNote+%5Bdct%3Alanguage+gvp_lang%3Aen%3B+rdf%3Avalue+%3FscopeNote%5D%7D%0D%0A%7D&_implicit=false&implicit=true&_equivalent=false&_form=%2Fsparql as XML] then converted to TDV: [https://www.dropbox.com/s/an3ve1lvk9rqthq/aat.rar?dl=1 aat.rar].
rset --results tsv aat.xml > aat.tdv
 
Maintenance lists: [[Wikidata:WikiProject Authority control/Tn]]
Also see https://meta.wikimedia.org/wiki/Talk:Mix%27n%27match#Coreference_AAT !!!
 
== ULAN ==
==== Coreference AAT through BabelNet ====
Mix-n-Match has good automatic matching, but that works for people.
 
The {{Q|Q2494649}}, also from the {{Q|Q11203476}}, is a dataset of entities in the art world, primarily artists but also museums, galleries, organizations, and companies, with 312,079 entries as of 12 September 2020. See http://vocab.getty.edu/sparql
So let's check what other vocabs that are coref to AAT may be coref to Wikidata:
According to [https://vu-nl.academia.edu/MichielHildebrand Michiel Hildebrand]'s famous CH LOD diagram:
[[File:Culture Datacloud.png|thumb|CH LOD, cultural heritage linked open data (thesauri only)]]
 
select (count(*) as ?c) {
* Wordnet. No such prop in Wikidata
?x a skos:Concept; skos:inScheme ulan: }
* I'd guess Wiktionary is coref to Wordnet, but Wikidata has no site links to Wiktionary
* RKD Concepts. There's prop "RKDartists" and "RKDimages" but none for concepts
* Rijksmuseum Concepts. There's "Rijksmonument" but none for concepts
* Joconde: aha! There's {{P|347}}, and it has [http://tools.wmflabs.org/autolist/autolist1.html?q=CLAIM%5B347%5D 2275 instances], so that's better. Joconde is 18% coref to Wikidata but I don't know how much to AAT, maybe I can gain 1k here.
** Looked at the results: nope, Joconde are all paintings, not concepts
* Bibliopolis: never heard of it, and nope
* SVCN: never heard of it, and nope
 
Of these, 79,529 (43.2%) are mapped to 88,415 Wikidata items, also as of 12 September 2020. ([https://w.wiki/bwM Live query].) 45,032 are preliminarily matched based on labels (names) and need to be verified (expect a high percentage of false positives in this group).
Then it dawns on me.
* BabelNet corefs Wordnet and Wikipedia! In fact, it corefs to Wikidata, see http://babelnet.org/stats. (Very useful stat: http://babelnet.org/stats#Numberofpolysemousandmonosemouswords)
* Some months ago they didn't have RDF access. But they do now: http://babelnet.org/download (not download, but API: good enough)
* On the web view of [http://babelnet.org/synset?word=bn:00018522n&details=1&orig=chipotle&lang=EN chipotle] in Sources I see Wikidata
* On the RDF view http://babelnet.org/rdf/page/s00018522n I don't see Wikidata but I see DBpedia, so that's ok
 
[https://tools.wmflabs.org/mix-n-match/#/catalog/27 ULAN is actively coreferenced on Mix-n-Match], but this dataset requires some manual review after import. Items to watch for:
===== AAT-Wordnet coref =====
* Mix'n'Match contains only 183,912 of the ULAN items (those representing humans).
Ok, so off to look for that AAT-Wordnet coref.
* A few ULAN names are formatted LAST NAME, FIRST NAME, and will be imported that way by Mix'n'Match.
- Why yes, it's part of http://semanticweb.cs.vu.nl/europeana/skos/browse/
* Some punctuation in names will be imported with escape characters (//) by Mix'n'Match; these need to be removed.
- I got a file from somewhere that says
* Mix'n'Match may import {{Q|Q36424 }} or "active" dates as birth and death dates; these should be deprecated, ideally with ''<reason for deprecation>'' {{Q|Q80833195}}. The active dates can be correctly added using {{P|1317}} or {{P|2031}} and {{P|2032}}.
<pre>
* ULAN is coreferenced in VIAF.
<aat_wordnet20_mappings>
* ULAN contains values for {{P|P21 }} and {{P|106}}, but these are not imported by Mix'n'Match.
a void:Linkset;
* ULAN contains many alternative names and spellings, which are not captured by Mix'n'Match but can be very helpful for coreferencing to other sources. Adding these as aliases by hand is good!
dcterms:title "AAT-Wordnet 2.0 mappings by Anna Tordai (baseline)" ;
lib:source <http://semanticweb.cs.vu.nl/lod/getty/aat/> ;
void:dataDump <bl_aat_wn.rdf> , <bl_norm_aat_wn.rdf> , <bl_sing_aat_wn.rdf> .
</pre>
(Note: you can get those files from URLs like: http://semanticweb.cs.vu.nl/europeana/api/export_graph?graph=http://semanticweb.cs.vu.nl/lod/getty/aat/bl_sing_aat_wn.rdf&mimetype=text/plain&format=turtle)
 
== RKD artists ==
These are called "baseline" (i.e. mostly literal matches). A quick conversion to Turtle and a line count:
<pre>
$ wc -l bl*
2300 bl_aat_wn.ttl
4369 bl_norm_aat_wn.ttl
4303 bl_sing_aat_wn.ttl
10972 total
</pre>
 
{{Q|Q17299517 }} is a database of artist biographies from the {{Q|Q758610 }}. The database is bilingual (Dutch and English). Of ~422K entries in the database, about 40% are redirects to other items and have been marked as "not applicable" to Wikidata. 92,346 entries are mapped to 88,415 Wikidata items (as of 15 September 2020). 55,389 entries are preliminarily matched based on labels (names) and need to be verified (expect a high percentage of false positives in this group).
Run a query at http://semanticweb.cs.vu.nl/europeana/user/query (specify entailment=None or else!):
<pre>
prefix getty: <http://purl.org/vocabularies/getty/>
prefix aat: <http://purl.org/vocabularies/getty/aat/>
select * {?x skos:inScheme getty:aat; skos:closeMatch ?y}
</pre>
It returns 4592 results (see below for why).
 
[https://tools.wmflabs.org/mix-n-match/#/catalog/13 RKDartists is actively coreferenced on Mix-n-Match]. The structured data in RKDartists is very robust, and once an RKDartist ID has been mapped to a human in Wikidata, a [[User:BotMultichillT|Bot]] will automatically create statements for the available structured data, with references. The statements added by the Bot include labels in some European languages, date and place of birth, date and place of death, occupation, floruit or work period start/end dates, and work locations with start/end times.
===== AAT-Wordnet Overlaps =====
There is significant overlap between the files:
<pre>
$ cat bl* | sort| uniq | wc -l
4596
$ cat bl* |cut -d " " -f 1 | sort| uniq | wc -l
4581
</pre>
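The two shell pipelines above can also be reproduced in Python, which additionally lists the concepts that have more than one match. This is only a minimal sketch: it assumes one space-separated triple per line, as in the <code>bl_*.ttl</code> files, and the function name <code>overlap_stats</code> is made up for illustration. The sample triples are the ones quoted on this page.

```python
from collections import Counter

def overlap_stats(lines):
    """Return (unique triples, unique subjects, subjects with more than one match)."""
    # Equivalent of: cat bl* | sort | uniq
    triples = {line.strip() for line in lines if line.strip()}
    # Equivalent of: cut -d " " -f 1 | sort | uniq
    subjects = Counter(t.split()[0] for t in triples)
    multi = sorted(s for s, n in subjects.items() if n > 1)
    return len(triples), len(subjects), multi

WN20 = "http://www.w3.org/2006/03/wn/wn20/instances/"
sample = [
    f"aat:bleachers skos:closeMatch <{WN20}synset-bleacher-noun-1> .",
    f"aat:bleachers skos:closeMatch <{WN20}synset-bleachers-noun-1> .",
    f"aat:wrought_iron skos:closeMatch <{WN20}synset-wrought_iron-noun-1> .",
    # The same triple again, as it would appear in a second baseline file.
    f"aat:wrought_iron skos:closeMatch <{WN20}synset-wrought_iron-noun-1> .",
]

print(overlap_stats(sample))
```

On the real files, the third element of the result would be the list of concepts that need manual reconciliation.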
 
Note:
The following AAT concepts have 2 matches:
* Items created from RKDartists ID using Mix'n'Match may contain EN descriptions in Dutch; these should be replaced.
<pre>
* The database contains alternate forms and spellings of names, but these are not automatically added as aliases. Adding them manually will help in coreferencing to other datasets such as ULAN.
aat:bleachers
aat:boxcars
aat:cleavers
aat:feudalism
aat:groats
aat:jackstraws
aat:lats
aat:leotards
aat:morocco
aat:ninepins
aat:quoits
aat:shekels
aat:stairs
</pre>
 
== History ==
We need to reconcile them manually, eg
Please add here references, blogs etc on the topic. For news prior to 2019 see the [[Wikidata:WikiProject Authority control/Archive#History|archive]].
<pre>
aat:bleachers skos:closeMatch <http://www.w3.org/2006/03/wn/wn20/instances/synset-bleacher-noun-1> .
aat:bleachers skos:closeMatch <http://www.w3.org/2006/03/wn/wn20/instances/synset-bleachers-noun-1> .
</pre>
The AAT definition is:
* aat:bleachers vp:descriptiveNote "Use for benchlike tiered seating for spectators at, for example, outdoor sporting events, usually without weather or sun protection, affording less advantageous views than grandstands; may also be used for similarly constructed, often telescoping, indoor seating."@en .
* Inspection at [http://wordnetweb.princeton.edu/perl/webwn?s=bleachers&sub=Search+WordNet&o2=1&o0=1&o8=1&o1=1&o7=1&o5=1&o9=&o6=1&o3=1&o4=1&h=0 Wordnet 3.1] shows that the second one is right.
 
Tweet using tag [https://twitter.com/hashtag/coreferencing '''#coreferencing'''].
That's 4.6k matches, or 11% of AAT.
 
===== AAT-Wordnet2 Representation =====
The coref looks like this:
aat:wrought_iron skos:closeMatch <http://www.w3.org/2006/03/wn/wn20/instances/synset-wrought_iron-noun-1> .
And there's another file aat.ttl with rep like:
<pre>
aat:wrought_iron aat:parentPreferred aat:iron_alloy .
aat:wrought_iron vp:id "300011012" .
aat:wrought_iron vp:labelPreferred "wrought iron"@en .
aat:wrought_iron vp:labelNonPreferred "iron, wrought"@en .
aat:wrought_iron vp:labelNonPreferred "wrought-iron"@en .
</pre>
This is quite an old representation. The new representation uses a numeric URL: http://vocab.getty.edu/aat/300011012 (and a bunch more data). So we need to construct a numeric URL.
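Constructing the numeric URL amounts to looking up <code>vp:id</code> for each old local name. A minimal sketch, assuming the old <code>aat.ttl</code> has one triple per line exactly as shown above; the helper names <code>id_map</code> and <code>numeric_url</code> are hypothetical.

```python
import re

AAT_BASE = "http://vocab.getty.edu/aat/"

def id_map(aat_lines):
    """Map old local names (e.g. 'aat:wrought_iron') to numeric IDs via vp:id."""
    pattern = re.compile(r'^(aat:\S+)\s+vp:id\s+"(\d+)"')
    ids = {}
    for line in aat_lines:
        match = pattern.match(line.strip())
        if match:
            ids[match.group(1)] = match.group(2)
    return ids

def numeric_url(local_name, ids):
    """Return the new-style numeric URL, or None when no vp:id is known."""
    aat_id = ids.get(local_name)
    return AAT_BASE + aat_id if aat_id else None

aat_ttl = [
    'aat:wrought_iron aat:parentPreferred aat:iron_alloy .',
    'aat:wrought_iron vp:id "300011012" .',
    'aat:wrought_iron vp:labelPreferred "wrought iron"@en .',
]
ids = id_map(aat_ttl)
print(numeric_url("aat:wrought_iron", ids))
```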
 
===== AATNED-Cornetto Mapping =====
Cornetto is NL Wordnet and AATNED is NL AAT. I got another file saying:
<pre>
<aatned_cornetto_mappings>
a void:Linkset ;
dcterms:title "AATNED-Cornetto mappings by Anna Tordai (baseline)";
lib:source <http://semanticweb.cs.vu.nl/lod/rkd/aatned/> ;
void:dataDump <bl_aatned_cn.rdf.gz> , <bl_norm_aatned_cn.rdf.gz> , <bl_sing_aatned_cn.rdf.gz> .
</pre>
Eg we have this for [http://www.getty.edu/vow/AATFullDisplay?find=300191645&logic=AND&note=&subjectid=300191645 AAT 300191645] "salinity":
<pre>
bl_aatned_cn.ttl: aatned:zoutheid skos:closeMatch cornetto:synset-zoutheid-1-noun .
cornetto-wn20.ttl: cornetto:synset-zoutheid-1-noun cornetto:eqNearSynonym instances:synset-brininess-noun-1 .
cornetto-wn30.ttl: cornetto:synset-zoutheid-1-noun cornetto:eqNearSynonym wn30:synset-brininess-noun-1 .
aatned.ttl: aatned:zoutheid core:notation "300191645" .
</pre>
 
The number of AATNED-Cornetto matches is as follows:
<pre>
> cat bl*|sort|uniq> bl_aatned_all.ttl
> wc -l bl_aatned_all.ttl
6917 bl_aatned_all.ttl
> cat bl_aatned_all.ttl|cut -d " " -f 1 | sort| uniq | wc -l
6857
</pre>
There are more matches than AAT-Wordnet. There are also overlaps: 60 AATNED concepts (0.9%) have two Cornetto matches.
 
We need to merge AATNED-Cornetto with AAT-Wordnet. The correlation is simply by id, eg
<pre>
aatned.ttl: aatned:zwerfkeien core:notation "300011671"
aat.ttl: aat:boulder vp:id "300011671"
</pre>
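Correlating by id is a plain dictionary join. The sketch below assumes the <code>core:notation</code> and <code>vp:id</code> values have already been extracted into two dictionaries; <code>join_by_id</code> is a made-up name, and the values are the ones quoted on this page.

```python
def join_by_id(aatned_ids, aat_ids):
    """Pair each AATNED concept with the AAT concept that has the same numeric id."""
    aat_by_id = {num: name for name, num in aat_ids.items()}
    return {
        ned_name: (num, aat_by_id.get(num))  # None when the old AAT file lacks that id
        for ned_name, num in aatned_ids.items()
    }

aatned_ids = {
    "aatned:zwerfkeien": "300011671",
    "aatned:smeedijzer": "300011012",
    "aatned:zoutheid": "300191645",
}
aat_ids = {
    "aat:boulder": "300011671",
    "aat:wrought_iron": "300011012",
}

joined = join_by_id(aatned_ids, aat_ids)
for ned_name, (num, aat_name) in sorted(joined.items()):
    print(ned_name, num, aat_name)
```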
 
I guess the overlaps between them are quite big, eg for wrought_iron:
<pre>
aatned.ttl: aatned:smeedijzer core:notation "300011012"
bl_aatned_all.ttl: aatned:smeedijzer skos:closeMatch cornetto:synset-smeedijzer-1-noun .
cornetto-wn20.ttl:cornetto:synset-smeedijzer-1-noun cornetto:eqNearSynonym instances:synset-wrought_iron-noun-1 .
cornetto-wn30.ttl:cornetto:synset-smeedijzer-1-noun cornetto:eqNearSynonym wn30:synset-wrought_iron-noun-1 .
</pre>
 
===== DBpedia-Wordnet3 coref =====
The other problem is bigger:
* http://www.w3.org/2006/03/wn/wn20/instances/synset-wrought_iron-noun-1 is Wordnet 2.0 in 9-year old rep, with wn20schema:synsetId "113958999"
* while BabelNet: http://babelnet.org/rdf/s00081730n has
bn:s00081730n skos:exactMatch dbpedia:Wrought_iron, lemon-WordNet:wn30-14802262-n
* that uses a modern LEMON wordnet rep: http://lemon-model.net/lexica/pwn/wn30-14802262-n
* Note: [http://wordnetweb.princeton.edu/perl/webwn?c=8&sub=Change&o2=&o0=1&o8=1&o1=&o7=1&o5=1&o9=&o6=1&o3=1&o4=1&i=-1&h=0&s=wrought+iron Wordnet 3.1 has] IDs like "14826432" and "wrought_iron%1:27:00::"
It doesn't look like Wordnet3 and Wordnet2 share any IDs; we'll deal with that in next section.
 
Let's first do some queries at http://babelnet.org/sparql/ to see what we can see. Look for DBpedia-Wordnet matches:
<pre>
SELECT * WHERE {
?x skos:exactMatch ?y, ?z
filter(strstarts(str(?y),"http://dbpedia.org/resource/"))
filter(strstarts(str(?z),"http://lemon-model.net/lexica/pwn/"))
} LIMIT 30
</pre>
Download: https://www.dropbox.com/s/92gq5r1qm3yytkp/WN3toDBP.csv?dl=1.
 
It has 47607 rows like this (there's a decent chance this will cover the 6k AAT matches):
<pre>
"http://babelnet.org/rdf/s00075206n","http://dbpedia.org/resource/Sundowner_(drink)","http://lemon-model.net/lexica/pwn/wn30-07913081-n"
"http://babelnet.org/rdf/s00039711n","http://dbpedia.org/resource/Sonora_(genus)","http://lemon-model.net/lexica/pwn/wn30-01736256-n"
"http://babelnet.org/rdf/s00070026n","http://dbpedia.org/resource/Sealskin","http://lemon-model.net/lexica/pwn/wn30-04160261-n"
</pre>
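To use these rows for matching, it helps to index them by the WordNet 3.0 URL. A minimal sketch, assuming the CSV has exactly the three quoted columns shown above (BabelNet, DBpedia, WordNet) and that any header row can be recognised because it does not start with "http"; <code>index_by_wordnet</code> is a hypothetical name.

```python
import csv
import io
from collections import defaultdict

def index_by_wordnet(csv_text):
    """Index (BabelNet, DBpedia) pairs by their WordNet 3.0 synset URL."""
    index = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank lines, malformed rows and a possible header row.
        if len(row) != 3 or not row[0].startswith("http"):
            continue
        babelnet, dbpedia, wordnet = row
        index[wordnet].append((babelnet, dbpedia))
    return index

sample_csv = (
    '"http://babelnet.org/rdf/s00075206n","http://dbpedia.org/resource/Sundowner_(drink)",'
    '"http://lemon-model.net/lexica/pwn/wn30-07913081-n"\n'
    '"http://babelnet.org/rdf/s00070026n","http://dbpedia.org/resource/Sealskin",'
    '"http://lemon-model.net/lexica/pwn/wn30-04160261-n"\n'
)
index = index_by_wordnet(sample_csv)
print(index["http://lemon-model.net/lexica/pwn/wn30-04160261-n"])
```

A list is kept per synset because nothing on this page guarantees that a WordNet synset maps to only one DBpedia resource.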
 
===== Wordnet3-Wordnet2 coref =====
Since Wordnet3 and Wordnet2 don't share any IDs, we can try to use the Wordnet2-Wordnet3 coref made by Jacco van Ossenbruggen and Mark van Assem (VU University Amsterdam) in May 2010 with this VOID (manifest):
<pre>
<wn30-wn20-mappings-jacco>
a void:Linkset ;
dcterms:title "synset-level mappings from Wordnet 3.0 to 2.0, created by jacco's code" ;
lib:source <http://purl.org/vocabularies/princeton/wn30/> ;
void:dataDump
<label-child-matches.ttl.gz> ,
<label-childparent-matches.ttl.gz> ,
<label-instance-matches.ttl.gz> ,
<label-meronym-matches.ttl.gz>,
<label-neargloss-matches.ttl.gz> ,
<label-parent-matches.ttl.gz> ,
<label-unique-matches.ttl.gz> ,
<nearlabel-matches.ttl.gz> ,
<glossmatches-m.ttl.gz> .
 
<wn30-wn20-mappings-sense>
a void:Linkset ;
dcterms:title "synset-level mappings from Wordnet 3.0 to 2.0, created by Mark using the Princeton WordSense mappings" ;
lib:source <http://purl.org/vocabularies/princeton/wn30/> ;
void:dataDump
<synset-matches-based-on-multiple-sense-mappings-princeton.ttl.gz> ,
<synset-matches-based-on-single-sense-mappings-princeton.ttl.gz> .
</pre>
It's a complex affair consisting of many steps, but the major step (contributing 87% of all matches) is glossmatches-m.ttl that looks like
wn30:synset-wrought_iron-noun-1 terms:replaces instances:synset-wrought_iron-noun-1 .
 
And looking at wordnet-synset.ttl, we find the required wn30 ID:
wn30:synset-wrought_iron-noun-1 wn20schema:synsetId 114802262 .
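The last hop, from a Wordnet 2.0 synset to the LEMON URL used by BabelNet, can be sketched as follows. It rests on an assumption that this page confirms only for nouns: the 9-digit <code>synsetId</code> is a part-of-speech digit (1 for nouns) followed by the 8-digit offset that appears in the LEMON URL. The mapping of the other digits follows the WordNet Prolog convention and is untested here; all function and variable names are made up.

```python
LEMON_BASE = "http://lemon-model.net/lexica/pwn/"
# Assumption: leading digit encodes part of speech (only "1" = noun is confirmed above).
POS_BY_DIGIT = {"1": "n", "2": "v", "3": "a", "4": "r"}

def lemon_url(synset_id):
    """Turn a 9-digit synsetId such as 114802262 into a LEMON WordNet 3.0 URL."""
    text = str(synset_id)
    if len(text) != 9 or text[0] not in POS_BY_DIGIT:
        raise ValueError(f"unexpected synsetId: {synset_id!r}")
    return f"{LEMON_BASE}wn30-{text[1:]}-{POS_BY_DIGIT[text[0]]}"

# (wn30 synset, wn20 synset) pairs as stated by terms:replaces in glossmatches-m.ttl
replaces = [("wn30:synset-wrought_iron-noun-1", "instances:synset-wrought_iron-noun-1")]
wn20_to_wn30 = {old: new for new, old in replaces}
# synsetId values as found in wordnet-synset.ttl
synset_ids = {"wn30:synset-wrought_iron-noun-1": 114802262}

wn30 = wn20_to_wn30["instances:synset-wrought_iron-noun-1"]
print(lemon_url(synset_ids[wn30]))
```

For wrought iron this yields the same LEMON URL that BabelNet lists in the previous section.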
 
=====AAT-Wikidata Sheets=====
After much querying and manual cleaning (over a day of effort), I made some sheets in [https://drive.google.com/drive/u/1/folders/0B7BFygWDV2_PbVRhRm40Mko5dFE this google folder]:
* [https://docs.google.com/spreadsheets/d/10NjsXxZaWO5BOibRJEM1gq5-2iSi2iyC2EEd_3wgbTM/edit#gid=1325220919 AAT-DBpedia-Babelnet.xlsx]: 3324 potential matches, fairly clean, but need checking by more people
* AAT-DBpedia-Babelnet-80-judged.xlsx: example of correct & incorrect matches
* AAT-Wikidata-25-judged.xlsx: example of correct & incorrect matches on Mix-n-Match
 
{{Ping project|Authority control}}: I need your help!
 
{{Ping project|Visual arts}}: Yours too!
 
* Do some checks (add your initials in column "check")
* Add Q numbers to the sheet
* Merge WD items that already have {{P|1014}} (there are 8477) to the sheet to compare the matches (or remove them from the sheet if you're quite confident)
 
I could post the sheet as QuickStatements, but I think there are still 10% incorrect matches, especially for Styles and Periods (see [[Wikidata talk:WikiProject Visual arts/Item structure/Art movements]]). --[[User:Vladimir Alexiev|Vladimir Alexiev]] ([[User talk:Vladimir Alexiev|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 16:02, 7 March 2017 (UTC)
 
====AAT-LCSH coreferencing====
 
[https://gist.github.com/VladimirAlexiev/f4cbb20df566766c4a0f#file-aat-lcsh-tdv 445 AAT-LCSH coreferences] made by Getty editors.
 
400 of them are on the [http://vocab.getty.edu/sparql?query=select+*+%7B%0D%0A++%3Fx+skos%3AexactMatch%7Cskos%3AcloseMatch+%3Fy.%0D%0A++%3Fx+skos%3AinScheme+aat%3A%0D%0A++filter+not+exists+%7B%3Fy+skos%3AinScheme+aat%3A%7D%7D&_implicit=false&implicit=true&_equivalent=false&_form=%2Fsparql Getty LOD site] (see query below), 45 are newly extracted
<pre>
select * {
?x skos:exactMatch|skos:closeMatch ?y.
?x skos:inScheme aat:
filter not exists {?y skos:inScheme aat:}}
</pre>
 
== Geonames Feature Code ==
{{Ping project|Authority control}} {{Ping project|Companies}} {{Ping project|Cultural heritage}} {{Ping project|Visual arts}}
 
<!-- Hmm, these do not exist {{Ping project|Countries}} {{Ping project|Country subdivision}} -->
 
{{P|2452}} is applied only 33 times (see [https://www.wikidata.org/wiki/Property_talk:P2452 discussion] and [https://www.wikidata.org/wiki/Wikidata:Property_proposal/Archive/43#P2452 archive]), while there are 669 codes on Geonames.
I'll ask Magnus to add the Geonames list to Mix-n-Match.
 
Update Jul 2020: Things are much better at https://www.wikidata.org/wiki/Property:P2452:
* Fully matched 545 82.4%
* Preliminarily matched 7
* Unmatched 109 16.4%
* Total 661
* The actual total at http://www.geonames.org/export/codes.html is 680. --[[User:Vladimir Alexiev|Vladimir Alexiev]] ([[User talk:Vladimir Alexiev|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 12:31, 15 July 2020 (UTC)
 
http://www.geonames.org/ontology/mappings_v3.01.rdf has the following mappings:
<pre>
32 dbo http://dbpedia.org/ontology/
5 frgeo http://rdf.insee.fr/geo/
79 lgdo http://linkedgeodata.org/ontology/
31 schema http://schema.org/
</pre>
Can we use them somehow to push this coreferencing further?
 
:I'm a little wary about importing these wholesale, because Geonames is not a CC0 database. It's one thing to be providing external links to GeoNames, it's another to be importing data.
 
:I did look at these values recently for English places with geonames links that are marked as both {{Q|532}} and {{Q|1115575}} (see eg {{Q|3137539}} for an example), to identify which Geonames link corresponded to which role; but I purposely decided ''not'' to add a {{P|2452}} statement.
 
:It might be useful to be able to map the codes to Q-numbers here, to facilitate sanity checking of existing or proposed co-references. However, even then there are difficulties -- for example, I found that PPLA3 or PPLA4 at Geonames didn't necessarily match to distinctions we would want to make in a {{P|31}} here. [[User:Jheald|Jheald]] ([[User talk:Jheald|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 14:02, 13 March 2017 (UTC)
 
==Supplement Wikidata items with properties from authorities (GND in particular)==
 
Data from DifferentiatedPersons of GND can be used to fill missing properties of the corresponding items, e.g.,
* date of birth/death (directly from gndo:dateOfBirth and gndo:dateOfDeath, for entries following YYYY or YYYY-MM-DD - everything else to be skipped)
* {{p|P1416}} can be obtained by a join of gndo:affiliation to wd organizations (may be sparse currently, but can be repeated later on)
* country ({{p|P17}} or {{p|P27}}??) requires translation from gndo:geographicAreaCode, which refers to a customized code table derived from ISO 3166 (not part of GND) ([http://www.dnb.de/SharedDocs/Downloads/DE/DNB/standardisierung/inhaltserschliessung/laenderCodesAlph.pdf?__blob=publicationFile table (pdf)], [http://www.dnb.de/SharedDocs/Downloads/DE/DNB/standardisierung/inhaltserschliessung/laenderCodesLeitfaden.pdf?__blob=publicationFile rules])
* aliases - require filtering of gndo:variantNameForThePerson, which carry no language tag, re. script and presumed language (would [http://search.cpan.org/~ambs/Lingua-Identify-0.56/lib/Lingua/Identify.pm Lingua::Identify] work here?)
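The date rule in the list above (accept only YYYY or YYYY-MM-DD, skip everything else) could be sketched like this. The function name <code>parse_gnd_date</code> and the input strings are illustrative only; the returned precision uses Wikidata's time precision codes (9 for year, 11 for day).

```python
import re
from datetime import date

_GND_DATE = re.compile(r"^(\d{4})(?:-(\d{2})-(\d{2}))?$")

def parse_gnd_date(value):
    """Return (date string, Wikidata precision) or None if the value must be skipped."""
    match = _GND_DATE.match(value.strip())
    if not match:
        return None  # e.g. free text, ranges, YYYY-MM: skipped
    year, month, day = match.groups()
    if month is None:
        return year, 9  # year precision
    try:
        date(int(year), int(month), int(day))  # reject impossible dates
    except ValueError:
        return None
    return f"{year}-{month}-{day}", 11  # day precision

for raw in ["1841", "1841-05-17", "1900-02-30", "ca. 1470", "1841-05"]:
    print(raw, "->", parse_gnd_date(raw))
```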
For appropriate source statements see [https://www.wikidata.org/wiki/Wikidata:Project_chat#Source_statements_for_items_syntesized_from_authorities_-_recommendations.3F project chat]
-- [[User:Jneubert|Jneubert]] ([[User talk:Jneubert|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 06:35, 21 May 2017 (UTC) (with thanks to [[User:MisterSynergy]] and [[User:ChristianKl]])
 
:I am a big fan of standards, but ISO 3166 covers modern countries; as a consequence it does not give the "nationality" of people who preceded a country. Thanks, [[User:GerardM|GerardM]] ([[User talk:GerardM|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 07:03, 21 May 2017 (UTC)
:NB yes there are some, but at Wikidata we know about many more former countries. [[User:GerardM|GerardM]] ([[User talk:GerardM|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 07:13, 21 May 2017 (UTC)
 
A "sibling" of Mix-n-Match now imports birth/death dates from authority files: https://www.wikidata.org/wiki/User:Magnus_Manske/Mix%27n%27match_date_import.
Also see discussion about this in relation to Getty ULAN: https://groups.google.com/forum/#!topic/gettyvocablod/TkdelW9RP1g --[[User:Vladimir Alexiev|Vladimir Alexiev]] ([[User talk:Vladimir Alexiev|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 09:38, 2 October 2017 (UTC)
 
== Property proposal for applying SKOS mapping relations to "external identifiers" ==
 
In order to be able to map a thesaurus more completely and, more generally, to make Wikidata fit as a linking hub for knowledge organization systems, I've proposed a new property that allows individual links made via properties of type "external identifier" to be qualified as inexact (close/broad/narrow/related) matches.
 
Please feel free to comment at https://www.wikidata.org/wiki/Wikidata:Property_proposal/mapping_relation_type.
 
Cheers, [[User:Jneubert|Jneubert]] ([[User talk:Jneubert|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 12:27, 28 August 2017 (UTC)
 
== Grant proposal soweego ==
{{Ping project|Authority control}}
 
There is a new grant proposal '''soweego''' for authority control. See discussion at https://meta.wikimedia.org/wiki/Grants:Project/Hjfocs/soweego#Endorsements.
 
I've considered it seriously but I think it doesn't address the main problem, see a list of 11 considerations (which can also be read as a sort of programme for next important steps for WD authority control). Please express your opinion there. --[[User:Vladimir Alexiev|Vladimir Alexiev]] ([[User talk:Vladimir Alexiev|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 10:19, 2 October 2017 (UTC)
 
:Where is this discussion with 11 considerations? Thanks, [[User:GerardM|GerardM]] ([[User talk:GerardM|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 12:22, 2 October 2017 (UTC)
 
== Remove Obsolete Getty Vocabularies IDs ==
Getty Vocabs have 8k obsolete subjects (identifiers).
You can find them at http://vocab.getty.edu/sparql with a query like
<pre>
select * {
?old a gvp:ObsoleteSubject; skos:inScheme aat: .
optional {?old dct:isReplacedBy ?new}
}
</pre>
Here are the numbers, as given in the file names (I can put up the files somewhere): obsolete-AAT-2106.tsv, obsolete-TGN-1016.tsv, obsolete-ULAN-5574.tsv.
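
The query above covers AAT only. A minimal sketch of how the same query could be generated for all three vocabularies, and how the result could be turned into an old-to-new ID table, might look as follows. It assumes that the prefixes <code>tgn:</code> and <code>ulan:</code> are predefined at the endpoint in the same way as <code>aat:</code>, and that the results arrive in the standard SPARQL JSON format; the function names are made up for illustration.

```python
VOCABS = ("aat", "tgn", "ulan")


def obsolete_query(vocab):
    """Build the obsolete-subject query for one Getty vocabulary.

    Assumption: the endpoint predefines the prefixes gvp:, skos:, dct:
    and one prefix per vocabulary (aat:, tgn:, ulan:).
    """
    if vocab not in VOCABS:
        raise ValueError(f"unknown vocabulary: {vocab}")
    return (
        "select * {\n"
        f"  ?old a gvp:ObsoleteSubject; skos:inScheme {vocab}: .\n"
        "  optional {?old dct:isReplacedBy ?new}\n"
        "}"
    )


def replacements(bindings):
    """Map each obsolete ID to its replacement ID, or None if there is none.

    `bindings` is the results.bindings list of a SPARQL JSON response.
    The ID is taken to be the last path segment of the subject URI.
    """
    result = {}
    for row in bindings:
        old = row["old"]["value"].rsplit("/", 1)[-1]
        new = row.get("new", {}).get("value")
        result[old] = new.rsplit("/", 1)[-1] if new else None
    return result
```

Entries mapped to <code>None</code> have no replacement, so they cannot simply be swapped for a new value.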
 
{{Ping project|Authority control}}
Any takers to replace any old values of {{p'|AAT ID}}, {{p'|TGN ID}}, {{p'|ULAN ID}} with the new values from the respective files?
 
{{ping|Magnus Manske}} Subjects with rdf:type gvp:ObsoleteSubject should be removed from Mix-n-Match consideration.
:Done, but only changed 5 entries for the ~2100 obsolete ones, the others were either already N/A or matched with an item. --[[User:Magnus Manske|Magnus Manske]] ([[User talk:Magnus Manske|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 15:22, 21 February 2018 (UTC)
:Data should not be removed, but - if necessary - marked as deprecated, with a qualifier giving the reason for deprecation. <span class="vcard"><span class="fn">[[User:Pigsonthewing|Andy Mabbett]]</span> (<span class="nickname">Pigsonthewing</span>); [[User talk:Pigsonthewing|Talk to Andy]]; [[Special:Contributions/Pigsonthewing|Andy's edits]]</span> 11:02, 26 February 2018 (UTC)
 
== Swedish National Library moves towards Linked Data ==
On Monday, June 11, the Swedish National Library is moving towards Linked Data and releasing a new system called LIBRIS XL. As a result:
# We have a new identifier LIBRIS-URI {{P|5587}} in Wikidata
 
An open issue is whether other systems need to be upgraded to support the new identifier. I have found:
# [https://www.wikidata.org/wiki/User:Magnus_Manske/authority_control.js User:Magnus_Manske/authority_control.js]
## tool to read from VIAF.org and populate WD
# [[User:Magnus_Manske/Mix%27n%27match_date_import]]
# [https://github.com/gbv/wdmapper wdmapper]
See also:
* [https://librisbloggen.kb.se/2017/04/11/bibframe-in-libris-xl/ BIBFRAME in Libris XL]
* [https://id.kb.se/doc/model Datamodel] chosen by The National Library of Sweden (KB)
* Presentation from 2017 [http://www.dbc.dk/filer/tekstfiler-pdf-mm./rda-dag/bibframe-in-a-european-perspective Bibframe in an European perspective]
- [[User:Salgo60|Salgo60]] ([[User talk:Salgo60|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 14:07, 10 June 2018 (UTC)
:'''Status update 16 September 2018'''
:A project [https://phabricator.wikimedia.org/tag/wmse-library-data-2018 wmse-library-data-2018] done by {{u|Alicia_Fagerving_(WMSE)}} is converting and adding about 60 000 identifiers to Wikidata....
:Today I sent a question to VIAF asking whether they plan to start supporting {{P|5587}} - [[User:Salgo60|Salgo60]] ([[User talk:Salgo60|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 06:54, 16 September 2018 (UTC)
::'''Status update 28 September 2019'''
::* Still no one has specified how LIBRIS and VIAF should work together; see Phabricator [https://phabricator.wikimedia.org/T223259 T223259]
::* We lack a change process for errors in VIAF and LIBRIS; for an example of such an error, see [[Talk:Q21522286#Summary]]
:: - [[User:Salgo60|Salgo60]] ([[User talk:Salgo60|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 18:32, 28 September 2019 (UTC)
 
==Introducing {{P|4390}} for full SKOS mappings==
 
For knowledge organization systems which do not deal with clearly defined entities (such as {{Q|5}}), you often find inexact relations between the external entity and matching Wikidata items. E.g., the STW concept [http://zbw.eu/stw/descriptor/17176-6 Yugoslavia (until 1990)] is not an exact match to {{Q|36704}}, which is described as "1918–1992 country in Southeastern and Central Europe". For STW's [http://zbw.eu/stw/descriptor/12570-6 Executive selection], {{Q|265558}} is closely related, but categorically different (process vs. instrument). While in the latter case it ''might'' be useful to create an exactly matching item, in the former it clearly would not be.
 
Within the domain of traditional knowledge organization systems, different mapping relations (such as "close match") have been used to cover such situations. The "external id" properties of Wikidata lacked such expressiveness until the {{P|4390}} property was introduced. The property, to be used exclusively as a qualifier on external-id properties, allows relations to be defined more precisely as {{Q|39893449}}, {{Q|39893184}}, {{Q|39894595}}, {{Q|39893967}} or {{Q|39894604}}. These relation types reflect the corresponding [https://www.w3.org/TR/skos-reference/#mapping SKOS mapping properties].
 
Since its introduction in October 2017, the property has [https://query.wikidata.org/#%23Number%20of%20items%2Fstatements%20using%20qualifer%20P4390%20for%20each%20property%0A%23added%202016-02-24%20by%20Jura1%20%0A%0ASELECT%20%3Fproperty%20%3FpropertyLabel%20%3Fitems%20%3Fstatements%0AWHERE%0A{%0A%09{%0A%09%09SELECT%20%3Fproperty%20%28COUNT%28DISTINCT%28%3Fitem%29%29%20as%20%3Fitems%29%20%28COUNT%28%3Fvalue%29%20as%20%3Fstatements%29%0A%09%09WHERE%0A%09%09{%0A%20%20%20%20%09%09%3Fprop%20pq%3AP4390%20%3FqualifierP4390value%20.%0A%20%20%09%09%09hint%3AQuery%20hint%3Aoptimizer%20%22None%22%20.%09%0A%09%09%09%3Fitem%20%3Fp%20%3Fprop%20.%20%0A%09%09%09%3Fproperty%20wikibase%3Aclaim%20%3Fp%20.%20%20%0A%20%20%09%09%09%3Fproperty%20wikibase%3AstatementProperty%20%3Fps%20.%0A%20%20%20%20%09%09%3Fprop%20%3Fps%20%3Fvalue%20.%20%20%20%20%20%20%0A%09%09}%0A%09%09GROUP%20BY%20%3Fproperty%20%0A%09%09ORDER%20BY%20DESC%28%3Fitems%29%20DESC%28%3Fstatements%29%0A%09%09LIMIT%2025%0A%09}%0A%09SERVICE%20wikibase%3Alabel%20{%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%2Cen%22%20%20}%20%20%20%20%0A}%0AORDER%20BY%20DESC%28%3Fitems%29%20DESC%28%3Fstatements%29%0A%0A%0A seen some uptake], particularly in the biomedical field. The ongoing mapping of {{Q|26903352}} to Wikidata is based on qualified relations ([http://zbw.eu/beta/sparql-lab/?endpoint=http://zbw.eu/beta/sparql/stw/query&queryRef=https://api.github.com/repos/zbw/sparql-queries/contents/stw/wikidata_mapping.rq state of the mapping, shown as SKOS]). The effort aims at eventually creating a complete mapping of the STW descriptors.
 
===Workflow for the mapping process===
 
Our - still experimental - workflow for the STW mapping is as follows:
 
# Use Mix-n-match catalogs for each sub-thesaurus ([https://tools.wmflabs.org/mix-n-match/#/catalog/507 #507], [https://tools.wmflabs.org/mix-n-match/#/catalog/1259 #1259]/[https://tools.wmflabs.org/mix-n-match/#/catalog/1260 #1260]) to assign STW descriptor IDs one-by-one to Wikidata items.
#: ''This sounds simple, yet often reveals quality issues. Some of these, such as obvious duplicates in Wikidata, can be resolved immediately. Sometimes an ugly mess shows up, which can only be [https://www.wikidata.org/wiki/Wikidata:Project_chat#How_to_mark_or_report_items_which_seem_to_be_the_same? reported] to be solved by the community later on. Of course, quality flaws in STW might be revealed as well. During this step, it may also be advisable to take notes about other items which are not the closest ones and are not linked via an external-id entry in Wikidata, but may be worth linking from the side of the external vocabulary.''
#* For non-exact relationships, immediately open the newly linked item and manually qualify {{P|3911}}.
#* For STW descriptors which lack a counterpart in Wikidata and would make sense there, add an item semi-automatically, with {{Q|39893449}} to the STW ID and all available information from the thesaurus (more on that later).
# Assign {{Q|39893449}} to the remaining unqualified {{P|3911}} via Quickstatements, with the input produced by a [https://github.com/zbw/sparql-queries/blob/master/bin/make_qs_input.pl script] executing a [http://zbw.eu/beta/sparql-lab/?endpoint=https://query.wikidata.org/bigdata/namespace/wdq/sparql&queryRef=https://api.github.com/repos/zbw/sparql-queries/contents/wikidata/mapping_relation_qualifier_qs.rq SPARQL query]. This step can be executed multiple times during the mapping process for one sub-thesaurus, in order to keep the list of unqualified entries short.
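
As an illustration of step 2, here is a minimal sketch of what the Quickstatements input could look like. It is not the linked script. It assumes the tab-separated Quickstatements V1 syntax, and that re-submitting an already existing statement together with a qualifier attaches the qualifier to that statement; the function name and the item ID in the usage note are placeholders.

```python
EXACT_MATCH = "Q39893449"  # the relation type assigned in step 2 above


def qs_qualifier_lines(pairs, relation=EXACT_MATCH):
    """Turn (item, STW ID) pairs into Quickstatements V1 lines.

    Each line repeats the existing P3911 statement and appends
    the P4390 qualifier with the given relation type.
    """
    lines = []
    for item, stw_id in pairs:
        lines.append("\t".join([item, "P3911", f'"{stw_id}"', "P4390", relation]))
    return "\n".join(lines)
```

For a placeholder pair <code>("Q999999999", "12345-6")</code>, this yields one line with the five tab-separated fields item, P3911, quoted STW ID, P4390 and relation type.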
 
The sequence within the Mix-n-match input file turned out to be crucial for a smooth one-by-one workflow. We sorted the [http://zbw.eu/beta/sparql-lab/?endpoint=http://zbw.eu/beta/sparql/stw/query&queryRef=https://api.github.com/repos/zbw/sparql-queries/contents/stw/stw_mix_n_match.rq generated M-n-m input] by the minimal notation of attached subject categories for the descriptors, and within that alphabetically.
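
The ordering described here can be sketched as a two-part sort key. This is only an illustration, not the linked query: the field names <code>notations</code> and <code>label</code> are hypothetical, and a plain string comparison of notations is assumed to give the intended order.

```python
def mnm_sort_key(descriptor):
    """Sort key: minimal category notation first, then the label.

    Descriptors without any notation get an empty string here and
    therefore sort first (an arbitrary choice for this sketch).
    """
    minimal_notation = min(descriptor["notations"], default="")
    return (minimal_notation, descriptor["label"].casefold())


def sort_for_mnm(descriptors):
    """Return the descriptors in the order used for the M-n-m input."""
    return sorted(descriptors, key=mnm_sort_key)
```

With this key, descriptors that share a subject category end up next to each other, which keeps related concepts together during the one-by-one pass.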
 
===Quality control for mappings using mapping relation types===
 
Maintenance and quality control on a mapping have to take into account that multiple external-id values for a Wikidata item, or one external-id linked to multiple Wikidata items, are possible and may perfectly make sense with inexact mapping relations (e.g., STW's [http://zbw.eu/stw/descriptor/16957-4 Appenzell] is a "broad match" to {{Q|12079}} ''and'' {{Q|12094}}). This is not reflected in the "single value" and "distinct values" constraints. Therefore, we defined a number of [https://www.wikidata.org/wiki/Property_talk:P3911#Reports_for_the_maintenance_of_the_STW_ID_/_Wikidata_mapping QS reports for the STW mapping] to catch anomalies specifically in qualified mappings.
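
To illustrate the kind of check meant here, a minimal sketch of one possible anomaly test follows: an STW ID that is an ''exact'' match to more than one item. It is not one of the linked reports; the function name and the input shape (item, STW ID, relation type) are assumptions made for the example.

```python
from collections import defaultdict

EXACT_MATCH = "Q39893449"
BROAD_MATCH = "Q39894595"  # assumed to be the "broad match" item; verify its label


def exact_match_conflicts(mappings):
    """Find STW IDs that are an exact match to more than one item.

    `mappings` is an iterable of (item, stw_id, relation) tuples.
    Inexact relations are deliberately ignored, since one STW ID may
    legitimately be a broad match to several items.
    """
    exact = defaultdict(set)
    for item, stw_id, relation in mappings:
        if relation == EXACT_MATCH:
            exact[stw_id].add(item)
    return {s: sorted(items) for s, items in exact.items() if len(items) > 1}
```

The Appenzell example, given as <code>("Q12079", "16957-4", BROAD_MATCH)</code> and <code>("Q12094", "16957-4", BROAD_MATCH)</code>, is not reported by this check, whereas the "distinct values" constraint would flag it.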
 
===Feedback welcome===
 
We are interested in exchanging experiences with others who are mapping KOS to Wikidata, possibly with different workflows or other tools. The reports and scripts linked above are meant to be customizable; we'd be happy to receive suggestions for improvement or GitHub pull requests. -- [[User:Jneubert|Jneubert]] ([[User talk:Jneubert|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 15:09, 29 August 2018 (UTC)
 
: {{ping project|KOS}}
 
==Item creation from a thesaurus concept via Quickstatements==
 
During the above-mentioned mapping process for {{Q|26903352}}, we sometimes want to create new items in Wikidata. With the "New item" button in Mix-n-match, only rudimentary information can be transferred to the new item. Therefore we generated a [http://134.245.93.73/beta/tmp/stw_qs_create.html list] of all not-yet-mapped STW descriptors, formatted for Quickstatements input. It includes labels and aliases (skos:prefLabel/skos:altLabel) in all available languages, as well as an instance-of {{Q|29028649}} statement sourced from the STW descriptor, optionally a link to the corresponding GND concept derived from the STW/GND mapping, rarely a description (skos:scopeNote), and of course always a {{P|3911}} link.
 
The workflow is, while working through a mix-n-match list, simply to copy & paste the complete set of QS statements for any missing concept into the QS input window, remove aliases which are not appropriate for Wikidata (such as "oil platform" for "offshore industry"), and run the statements. The list is sorted exactly like the mix-n-match list and recreated every hour, so the same items are on top of both lists, and every case solved by either linking or creating an item disappears automatically from the list. In our experience, this works quite smoothly.
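
For readers unfamiliar with the format, a minimal sketch of such a set of QS statements for one descriptor is given below. It assumes the tab-separated Quickstatements V1 syntax (<code>CREATE</code> followed by <code>LAST</code> lines) and leaves out the source references, the optional GND link and the description; the function name is made up, and the sketch is not the script that generates the list.

```python
def qs_create_block(stw_id, labels, aliases=None, instance_of="Q29028649"):
    """Build Quickstatements V1 lines that create one item for a descriptor.

    `labels` maps a language code to the preferred label,
    `aliases` maps a language code to a list of alternative labels.
    """
    lines = ["CREATE"]
    for lang, label in sorted(labels.items()):
        lines.append(f'LAST\tL{lang}\t"{label}"')
    for lang, names in sorted((aliases or {}).items()):
        for name in names:
            lines.append(f'LAST\tA{lang}\t"{name}"')
    lines.append(f"LAST\tP31\t{instance_of}")
    lines.append(f'LAST\tP3911\t"{stw_id}"')
    return "\n".join(lines)
```

Unsuitable aliases can then be deleted from the generated lines by hand before running them, as described above.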
 
If others want to adapt such a workflow for other vocabularies - thanks to SKOS and LOD standards that shouldn't be too difficult -, here is the [https://github.com/zbw/sparql-queries/blob/master/bin/missing_wd_item_from_stw.pl script] for generating the list, and the [https://github.com/zbw/sparql-queries/blob/master/stw/wikidata_item_candidate.rq query] called with it. -- Feedback, as always, welcome. [[User:Jneubert|Jneubert]] ([[User talk:Jneubert|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 12:11, 5 September 2018 (UTC)
 
== History ==
Please add references, blogs etc. on the topic here.
 
* 2020: ...
https://twitter.com/hashtag/coreferencing: tweets using the tag '''#coreferencing''', involving Getty, British Museum thesauri, some fancy shots...
 
== Useful resources ==
* 201809: [https://www.slideshare.net/jneubert/linking-knowledge-organization-systems-via-wikidata Linking Knowledge Organization Systems via Wikidata] (DCMI conference 2018)
* {{Q|48975668}}, [https://github.com/VladimirAlexiev/CH-names ''Name Data Sources for Semantic Enrichment''] (2015)
* 201711: [http://zbw.eu/labs/en/blog/wikidata-as-authority-linking-hub-connecting-repec-and-gnd-researcher-identifiers Wikidata as authority linking hub: Connecting RePEc and GND researcher identifiers] by [[User:Jneubert|Jneubert]]
* [https://zbw.eu/labs/en/blog/how-to-matching-multilingual-thesaurus-concepts-with-openrefine How-to: Matching multilingual thesaurus concepts with OpenRefine], by [[User:Jneubert|Jneubert]]
* 201611: [https://figshare.com/articles/Wikidata_and_Persistent_Identifiers/4235009 Wikidata and Persistent Identifiers] Presentation by Arthur Smith at [https://www.rd-alliance.org/pidapalooza-9-10-november-2016-reykjavik-iceland PIDapalooza 2016]
* [[Wikidata:WikiProject Authority control/Error reporting procedures]]: the page contains a list of Wikidata users with edit access to external databases
* [https://twitter.com/valexiev1/status/581474362454339585 20150327]: Starting in Apr 2015, VIAF will transition from English Wikipedia coreferencing to Wikidata coreferencing. As a result it will pick up a lot more multilingual labels, 700k persons and 300k organizations that don't occur in English Wikipedia. In [http://vladimiralexiev.github.io/CH-names/README.html#sec-3-2 Name Data Sources for Semantic Enrichment] I argued that VIAF and Wikidata have few names in common: I am glad that this development will quickly bridge the gap. http://outgoing.typepad.com/outgoing/2015/03/moving-to-wikidata.html
* [https://twitter.com/valexiev1/status/580807304028823553 20150325]: [https://nl.wikimedia.org/wiki/GLAM-WIKI_2015/Proposals/Wikidata,_a_target_for_Europeana%E2%80%99s_semantic_strategy%3F Our presentation proposal] with Europeana accepted for [https://nl.wikimedia.org/wiki/GLAM-WIKI Glam-Wiki 2015]
* [https://twitter.com/valexiev1/status/580313495956250625 20150324]: WikiProject Authority control (#wikidata #coreferencing) to be highlighted by [[User:Multichill]] at [https://nl.wikimedia.org/wiki/GLAM-WIKI_2015/Programme/Introductions/Wikidata GlamWiki 2015]
* [https://twitter.com/DavidHaskiya/status/571295264922869760 20150227]: Wikidata as linked data authority for Europeana: Presentation proposal [https://nl.wikimedia.org/wiki/GLAM-WIKI_2015/Proposals/Wikidata,_a_target_for_Europeana%E2%80%99s_semantic_strategy%3F "Wikidata, a target for Europeana's semantic strategy?"] to GLAM-WIKI 2015
* 20150207: ODI Culture Challenge proposal [http://collabfinder.com/project/1173/glam-wiki-on-steroids GLAM-WIKI on Steroids]: not well written, wasn’t successful
* 201502: project announced: [[Wikidata:Project_chat/Archive/2015/02#Wikidata weekly summary #143]]
* 201501: project proposed: [[Wikidata:Project_chat/Archive/2015/01#WikiProject Authority Control?]]
* 201307 [https://wikimania2013.wikimedia.org/wiki/Submissions/Authority_Addicts:_The_New_Frontier_of_Authority_Control_on_Wikidata Authority Addicts: The New Frontier of Authority Control on Wikidata] Wikimania 2013
* [[:meta:Grants:IdeaLab/Countering systemic bias through Wikidata authority control]] (ideas by [[User:Superm401]])
 
== {{capitalize|{{int participants}}}} ==
{{Ping project|Authority control}} Please become members of this project!
 
{{participants}}