
Support other sources for usage examples beyond Wikisource
Open, Medium, Public, Feature

Description

For some languages, Wikisource isn't the best corpus to draw usage examples from. It would be great to be able to configure additional sources by language.

One obvious other source that should be easy to specify is Wikipedia in that language. Languages such as Dagbani have no Wikisource, but do have a Wikipedia that could be used to find usage examples.

Other languages may want non-Wikimedia sources. For example, for Hebrew it would make sense to draw usage examples from Project Ben-Yehuda. It would be good to be able to specify that, perhaps with a search URL and a pattern for creating the reference on Wikidata.
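To make the idea concrete, here is a minimal sketch of what a per-language source configuration could look like. Everything in it is an assumption for illustration, not how Luthor is implemented: the `ExampleSource` class, its field names, and the `SOURCES` table are hypothetical, the Project Ben-Yehuda search URL is a placeholder (`example.org`), and using reference URL (P854) as the reference pattern is just one possible choice.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class ExampleSource:
    """One place to search for usage examples (hypothetical structure)."""
    name: str
    search_url: str   # URL template containing a {query} placeholder
    reference: dict   # Wikidata property ID -> value template using {url}

    def search(self, query: str) -> str:
        """Return the search URL for a word or phrase."""
        return self.search_url.format(query=quote(query))

    def build_reference(self, url: str) -> dict:
        """Fill in the reference pattern for a page the example came from."""
        return {prop: tpl.format(url=url) for prop, tpl in self.reference.items()}


def wiki(lang: str, project: str) -> ExampleSource:
    """Build a source for a Wikimedia wiki such as dag.wikipedia.org."""
    return ExampleSource(
        name=f"{lang}.{project}",
        search_url=f"https://{lang}.{project}.org/w/index.php?search={{query}}",
        reference={"P854": "{url}"},  # assumption: cite via reference URL
    )


# Hypothetical per-language overrides; anything not listed uses Wikisource.
SOURCES = {
    "dag": [wiki("dag", "wikipedia")],
    "he": [
        wiki("he", "wikisource"),
        ExampleSource(
            name="Project Ben-Yehuda",
            # Placeholder: the real search URL is not given in this task.
            search_url="https://example.org/search?q={query}",
            reference={"P854": "{url}"},
        ),
    ],
}


def sources_for(lang: str) -> list:
    """Return the configured sources for a language, defaulting to Wikisource."""
    return SOURCES.get(lang, [wiki(lang, "wikisource")])
```

With a table like this, adding a source for a language would be a configuration change rather than a code change, e.g. `sources_for("dag")[0].search("some word")` yields a Dagbani Wikipedia search URL.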

Event Timeline

Ijon triaged this task as Medium priority. Nov 26 2023, 5:27 PM

+1. For Ukrainian, Wikipedia should definitely be supported. While there is a Wikisource in Ukrainian (and a very active one too!), the kinds of texts it contains are often poorly suited to what Luthor is trying to achieve. For example, many texts are entries from public domain encyclopaedias or dictionaries, or literature that uses orthographies different from the current one. Strictly speaking, neither factor disqualifies sentences from being used as examples, but it does make them suboptimal. On the other hand, I am sure that Ukrainian Wikipedia contains many more sentences that are a better fit.

Copyright has to be considered. Wikisource mostly contains public domain works, which are fully compatible with Wikidata's CC0 terms, whereas Wikipedia is currently licensed under CC BY-SA 4.0. That said, it is likely not a factor for single sentences, unless someone abuses the tool to add whole paragraphs as an example (which I am not sure is even technically possible on the Wikidata side). Anyone who wanted to abuse it that way could also do so without Luthor, which makes it even less of a factor, but I think it should be mentioned explicitly.

Interestingly, even Cambridge's website seems to use Wikipedia for example sentences; see for example https://dictionary.cambridge.org/dictionary/english/withal