The English names for several languages either include an LRM mark because they just replicate the native name (where the mark is also incorrect, as in the first three cases below), or simply render the native name (autonym):
- [es-formal] = "español (formal)‎" (no LRM needed even in native Spanish!) – must be: "Spanish (formal)" in English
- [hu-formal] = "magyar (formal)‎" (no LRM needed even in native Hungarian!) – must be: "Hungarian (formal)" in English
- [nl-informal] = "Nederlands (informeel)‎" (no LRM needed even in native Dutch!) – must be: "Dutch (informal)" in English
- [gsw] = "Alemannisch" – should be "Alemannic" in English
- [sty] = "себертатар" – must be "Northern Tatar" in English
- [vo] = "Volapük" – should probably be "Volapuk" in English (without the combining diaeresis)
- [vro] = [fiu-vro] = "Võro" – should probably be "Voro" in English (without the combining tilde)
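The stray marks in the first three entries can be detected mechanically. The following is only a minimal Python sketch, not the code used by MediaWiki or by the test page: the helper names `has_trailing_mark` and `strip_trailing_marks` are hypothetical, and the sample values are the ones reported above, with the trailing LRM written as an explicit escape.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def has_trailing_mark(name: str) -> bool:
    """Return True if the name ends with an LRM or an RLM."""
    return name.endswith((LRM, RLM))


def strip_trailing_marks(name: str) -> str:
    """Remove any run of LRM/RLM characters at the end of the name."""
    return name.rstrip(LRM + RLM)


# Values as reported in the list above (hypothetical test data).
reported = {
    "es-formal": "espa\u00f1ol (formal)\u200e",
    "hu-formal": "magyar (formal)\u200e",
    "nl-informal": "Nederlands (informeel)\u200e",
    "gsw": "Alemannisch",
}

for code, name in reported.items():
    if has_trailing_mark(name):
        cleaned = strip_trailing_marks(name)
        print(f"[{code}] has a trailing mark; cleaned value: {cleaned!r}")
```

This only removes the mark; it does not translate the name, so the three entries would still need their English values.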
The following test page also HTML-encodes the spaces to make sure they are not duplicated in the middle (this is not dramatic and is not signaled as an error):
https://commons.wikimedia.org/wiki/Module_talk:Multilingual_description/sort/testcases
As a general rule, the English names of all languages should be plain ASCII only (of course, this does not apply to other translations or native names)...
This is also checked on the same test page (where you can see the red cells in the last column) using the following basic regular expression:
/^[A-Z][ '()%-/0-9A-Za-z]*['()%-/0-9A-Za-z]$/
The reason for that is that the English names of languages are used in contexts where only ASCII is expected (spaces, parentheses, hyphens, single quotes, and slashes are still possible; applications are generally aware if these ASCII punctuation characters or spaces have to be replaced; decimal digits may occur in the names of some variants, like a year for an orthographic reform, but they generally don't cause problems).
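For anyone who wants to reproduce the check outside the test page, here is a rough Python translation of that expression. It rests on one assumption: that `%-` in the original is a Lua-style escape for a literal hyphen (the test page is a Lua module), which agrees with the punctuation listed above. In Python syntax this becomes `\-`. The function name `is_plain_ascii_name` is hypothetical.

```python
import re

# Python translation of the expression quoted above.
# Assumption: "%-" is a Lua-style escaped hyphen, written "\-" here.
# The ^ and $ anchors are dropped because fullmatch() anchors both ends.
ASCII_NAME = re.compile(r"[A-Z][ '()\-/0-9A-Za-z]*['()\-/0-9A-Za-z]")


def is_plain_ascii_name(name: str) -> bool:
    """Return True if the name starts with A-Z, uses only the allowed
    ASCII characters, and does not end with a space."""
    return ASCII_NAME.fullmatch(name) is not None


candidates = [
    "Spanish (formal)",
    "espa\u00f1ol (formal)\u200e",  # lowercase start, non-ASCII, trailing LRM
    "Alemannisch",
    "Volap\u00fck",
    "V\u00f5ro",
    "Siberian Tatar",
]

for name in candidates:
    status = "ok" if is_plain_ascii_name(name) else "rejected"
    print(f"{name!r}: {status}")
```

Note that "Alemannisch" passes this check because it is plain ASCII, so the expression alone cannot tell an untranslated autonym from a genuine English name.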
Yellow cells on the test page just signal cases where the autonym and the English name are identical (not necessarily an error, but it may indicate a missing translation, either in English or in the native name; some of these cases are OK like "Esperanto", whose autonym is correctly capitalized for that language).
Note that the LRM/RLM marks should not be used at all in any language:
- For the few languages that display two native names in different scripts (when we don't specify the script variant), the solution is to write the LTR name first, then the "/", then the RTL name.
- For correctly formatting lists of languages (showing their autonyms), the solution is to use Bidi isolation (the "bdi" element in HTML, or the equivalent "unicode-bidi: isolate" in CSS) for each item in the multilingual list. LRM/RLM are deprecated (they are not isolates, but deprecated overrides). See T252568.
Isolates are the recommendation in the second version of the UBA (published many years ago), which was made to replace and deprecate all overrides (including the "bdo" HTML element, and the RLM/LRM controls that were the only solution in the first version of the UBA and in HTML4).
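To illustrate the isolation approach for a list of autonyms, here is a small Python sketch. It is not taken from MediaWiki or any existing module: the function names and the sample data are hypothetical. The HTML variant wraps each item in a "bdi" element; the plain-text variant uses the Unicode isolate controls FSI (U+2068) and PDI (U+2069), which can serve where markup is not available.

```python
from html import escape

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def autonym_list_html(autonyms: dict) -> str:
    """Build an HTML list where each autonym is isolated with <bdi>."""
    items = [
        f'<li><bdi lang="{escape(code, quote=True)}">{escape(name)}</bdi></li>'
        for code, name in autonyms.items()
    ]
    return "<ul>" + "".join(items) + "</ul>"


def autonym_list_text(autonyms: dict, separator: str = ", ") -> str:
    """Build a plain-text list where each autonym sits between FSI and PDI."""
    return separator.join(f"{FSI}{name}{PDI}" for name in autonyms.values())


# Hypothetical sample data mixing LTR and RTL autonyms.
sample = {
    "nl": "Nederlands",
    "he": "\u05e2\u05d1\u05e8\u05d9\u05ea",
    "ar": "\u0627\u0644\u0639\u0631\u0628\u064a\u0629",
}

print(autonym_list_html(sample))
print(autonym_list_text(sample))
```

With either variant, no LRM or RLM has to be stored in the language names themselves, because the isolation is applied by the code that renders the list.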
In all cases, any trailing RLM or LRM without any known character after it is wrong: their use should be limited to very specific characters whose weak or strong directionality or mirroring one wants to change, in a context of use within a text with a known language/script (for example, to change the strong directionality of Latin letters or digits in an Arabic text).
Such use of Bidi overrides is very exceptional and only needed inside very specific names (like some brands/trademarks using these characters as if they were normal Arabic letters, or for uncommon notations of numbers when an Arabic text wants to present these numbers with a strong RTL direction, instead of their default LTR direction, opposed to the normal direction of reading; note that even Arabic and Persian digits are LTR, as they are written starting from the most significant digit on the left, followed by the other digits in backward reading order).
A specific context allows borrowing Hebrew letters in Latin texts and treating them as if they were Latin letters with strong LTR direction: LRM is then useful before that Hebrew letter only (it is found in some Latin names borrowing a Hebrew Aleph, but it is not needed for maths, where there is an Aleph mathematical symbol which is already LTR).
The other case for using Bidi overrides is for historic texts in which a script was not using its current modern direction (e.g. boustrophedon, or old Greek and Coptic written RTL). For such cases, "bdo" is still the best solution to embed a full line, and there's still no real need of RLM/LRM for just a single character except to force its mirroring (e.g. an arrow).
Request for patch of LocalNamesEn.php per comment T256649#7160228 below:
- [es-formal] = https://www.wikidata.org/wiki/Q64427343 - "Spanish (formal)"
- [hu-formal] = https://www.wikidata.org/wiki/Q64427347 - "Hungarian (formal)"
- [nl-informal] = https://www.wikidata.org/wiki/Q64427356 - "Dutch (informal)"
- [sty] = https://www.wikidata.org/wiki/Q4418344 - "Siberian Tatar"