Which types of article should not be recommended as links by the link-recommendation algorithm?
For example, we only recommend links to articles that are in the main namespace of the same Wikipedia and that are not redirect pages. Beyond that, several task discussions have specified types of articles that we should not link to, such as:
- disambiguation pages (T261408#6840709)
- dates, e.g. years (T253279)
- dates and centuries (T278864#6970551)
A substantial fraction of these can be filtered by looking up statements on the corresponding Wikidata item, most notably the value of the "instance of" property (P31). For example, we can identify the article Statistics (disambiguation) as a disambiguation page, without parsing the title, from the fact that the corresponding Wikidata item (Q2333935) is an "instance of" the Wikidata item "Wikimedia disambiguation page" (Q4167410).
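The lookup described above can be sketched as follows. This is a minimal illustration, not the code used in the repository: the function name instance_of_ids is hypothetical, and the input is a hand-written excerpt shaped like the standard Wikidata entity JSON (as returned, for instance, by the wbgetentities API module) rather than data fetched live.

```python
def instance_of_ids(entity):
    """Return the set of Q-ids listed as "instance of" (P31) for one entity.

    `entity` is a dict in the Wikidata entity JSON layout (assumed input).
    """
    ids = set()
    for statement in entity.get("claims", {}).get("P31", []):
        snak = statement.get("mainsnak", {})
        # "novalue" and "somevalue" snaks carry no datavalue, so skip them.
        if snak.get("snaktype") != "value":
            continue
        ids.add(snak["datavalue"]["value"]["id"])
    return ids


# Hand-written excerpt for Statistics (disambiguation), Q2333935.
# Only the single P31 statement mentioned in the text is reproduced here.
example_entity = {
    "id": "Q2333935",
    "claims": {
        "P31": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P31",
                    "datavalue": {
                        "type": "wikibase-entityid",
                        "value": {"entity-type": "item", "id": "Q4167410"},
                    },
                }
            }
        ]
    },
}

print(instance_of_ids(example_entity))  # {'Q4167410'}
```

Returning a set rather than a single value reflects that an item can carry several "instance of" statements.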
Currently, we filter an article from the set of candidate links if it is an instance of any of the following items:
- Wikimedia disambiguation page (Q4167410)
- Wikimedia list article (Q13406463)
- year (Q577)
- calendar year (Q3186692)
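As a rough sketch of this filtering step, assuming the "instance of" values of each candidate have already been looked up: the names EXCLUDED_INSTANCE_OF, is_filtered and filter_candidates are hypothetical and do not come from the repository, and the example mapping is made up for illustration (only the disambiguation entry is taken from the text above).

```python
# Items currently used for filtering, as listed above.
EXCLUDED_INSTANCE_OF = {
    "Q4167410",   # Wikimedia disambiguation page
    "Q13406463",  # Wikimedia list article
    "Q577",       # year
    "Q3186692",   # calendar year
}


def is_filtered(instance_of):
    """True if any of the article's instance-of values is in the exclusion set."""
    return not EXCLUDED_INSTANCE_OF.isdisjoint(instance_of)


def filter_candidates(candidates, instance_of_by_title):
    """Keep only candidate titles whose Wikidata item is not excluded.

    Titles without a known instance-of value are kept (a design assumption).
    """
    return [
        title
        for title in candidates
        if not is_filtered(instance_of_by_title.get(title, ()))
    ]


# Illustrative input; the second and third titles are placeholders.
instance_of_by_title = {
    "Statistics (disambiguation)": {"Q4167410"},
    "Some year article": {"Q577"},
}
candidates = ["Statistics (disambiguation)", "Some year article", "Some regular article"]

print(filter_candidates(candidates, instance_of_by_title))
```

Keeping the excluded items in one set means that adding a new type of article to the filter amounts to adding one Q-id.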
Other candidate items for the filter are:
- century (Q578)
- calendar date (Q205892)
- point in time with respect to recurrent timeframe (Q14795564)
- ...
Aim:
- Identify other types of articles we do not want to link to, and therefore want to filter, based on linking conventions or feedback
- Identify the corresponding Wikidata item
- Add the list of Wikidata items to the filter in the repository and retrain the model (code)