In T272109: Assess prevalence of Wikidata infoboxes, we counted how many articles with image matches use Wikidata infoboxes, because we think those articles will need to be excluded from an initial iteration of an "add an image" task. We now want to learn a bit more by counting how many articles have an infobox of any kind. The reason is that when we place an image in an unillustrated article that has an infobox, we want to place it in the infobox, but infoboxes are inconsistent across templates and languages in how they label their "image" and "image caption" slots. For our first iteration, rather than figuring out how to detect the image slot in each infobox, we think we want to suggest illustrations only for articles that have no infobox at all.
Therefore, we want to calculate these numbers:
- Total number of articles in the wiki
- Unillustrated articles in the wiki
- Articles with match from any source (polished): the count of unillustrated articles that have a match from any of the three sources, after the "polishing" steps that remove local images, etc.
- Have no infobox: of the unillustrated articles with a match, the count that have no infobox.
Here is a table with a sample row showing the output that we want:
| wiki | Total number of articles | Unillustrated articles | Articles with match from any source (polished) | Have no infobox |
| ----- | ----- | ----- | ----- | ----- |
| frwiki | 2,000,000 | 1,000,000 | 150,000 | 100,000 |
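Each of the four numbers is a subset of the one before it, so they can be computed in a single pass per wiki. The following is a minimal sketch of that counting logic, not the actual query: the `Article` record and its boolean fields (`is_illustrated`, `has_polished_match`, `has_infobox`) are hypothetical stand-ins for whatever the real data source provides, and how each flag is derived is left open.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Article:
    # Hypothetical record; these field names are assumptions for illustration.
    title: str
    is_illustrated: bool
    has_polished_match: bool  # match from any source, after "polishing"
    has_infobox: bool


def funnel_counts(articles: Iterable[Article]) -> Tuple[int, int, int, int]:
    """Return (total, unillustrated, matched, no_infobox) for one wiki."""
    total = unillustrated = matched = no_infobox = 0
    for article in articles:
        total += 1
        if article.is_illustrated:
            continue  # only unillustrated articles go further
        unillustrated += 1
        if not article.has_polished_match:
            continue  # only articles with a polished match go further
        matched += 1
        if not article.has_infobox:
            no_infobox += 1
    return total, unillustrated, matched, no_infobox


def format_row(wiki: str, counts: Tuple[int, int, int, int]) -> str:
    """Format one wiki's counts as a pipe-delimited row with thousands separators."""
    cells = [wiki] + [f"{n:,}" for n in counts]
    return "| " + " | ".join(cells) + " |"


if __name__ == "__main__":
    sample = [
        Article("A", True, False, True),
        Article("B", False, True, False),
        Article("C", False, True, True),
        Article("D", False, False, False),
    ]
    print(format_row("examplewiki", funnel_counts(sample)))
```

Because each count only considers articles that passed the previous filter, every column should be less than or equal to the one to its left, which is a quick sanity check on the real results.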
The list of wikis for which we want these numbers is:
- enwiki
- arwiki
- kowiki
- cswiki
- viwiki
- frwiki
- fawiki
- ptwiki
- ruwiki
- trwiki
- plwiki
- hewiki
- svwiki
- ukwiki
- huwiki
- hywiki
- srwiki
- euwiki
- arzwiki
- cebwiki
- dewiki
- bnwiki