Make creating a new Language project easier
Open, Needs Triage, Public

Description

Meta-comments:

  • This task is somewhat similar to T158730, but from the end-user perspective. The end-user here is a person, or a group of people, who want to create a project in a new language, most often a Wikipedia.
  • If I subscribed you to this task, I believe it will interest you. If it doesn't, please feel free to unsubscribe yourself, and accept my apologies.

The current process for creating a wiki in a new language is fully documented at https://meta.wikimedia.org/wiki/Language_proposal_policy , but I'll write something brief and practical here:

  • Make sure you have a standard ISO 639 language code. Don't proceed if you don't.
  • Add a language to translatewiki.net (technically, to UniversalLanguageSelector's langdb and translatewiki.net's LanguageSettings.php)
    • Translate the most-used MediaWiki messages. (About 500.)
  • Add a language to Incubator, by creating a main page at Wp/languagecode (and replace "Wp" with another project code if it's not Wikipedia).
    • Get a lot of people to write a lot of articles. The current threshold for approval is not precisely defined, but a rule of thumb is ~5 people working for three months, and several hundred articles.
  • Get the Language committee to approve the project, if the above things were done.
    • The Language committee assesses the fulfillment of the above points, and asks for approval from a third-party expert who knows the language.
  • If all of the above is done, create the project. Task T158730 and the page https://wikitech.wikimedia.org/wiki/Add_a_wiki describe this long and mostly-manual technical process.

This process could be better.

  • Adding a new language to translatewiki.net usually works well, although there were several complaints of languages that took months to get added. Usually it should take just a couple of days if the ISO code is valid. Perhaps something could be improved in this process.
  • Getting to the import threshold is a bit harder, however:
    • The Most-used messages list is close to the import threshold of ~490 core messages, but doesn't correspond to it directly, because some of the most-used messages come from extensions. This may cause a situation in which a project has all the most-used messages translated and fulfills all the other Language Committee requirements, but doesn't actually have the messages imported from translatewiki.net to the core MediaWiki code repository, so the project in this language will not have proper localization unless somebody verifies that the language was added to Names.php and the import worked. I usually do it for projects that are about to be created, but there's no proper procedure here, and this could be more automatic. See also: T234753: Update list of most often used messages for MediaWiki core at Wikimedia: 2019.
  • The Incubator is hard to use. It should be possible to start a new independent test wiki for each new language instead of putting them all in one site with cumbersome prefixes. For a much more detailed explanation of this, see: T228745: Allow creating an independent "incubator wiki" instead of hosting all new wikis in one Incubator wiki with prefixes
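The mismatch between the most-used list and the import threshold can be sketched in a few lines of Python. This is only an illustration: the `Message` type, the function names, and the sample split (470 core and 30 extension messages) are assumptions made up for the example; only the ~490 threshold comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    key: str
    in_core: bool  # False when the message comes from an extension


def translated_core_count(translated):
    """Count only the translated messages that belong to MediaWiki core."""
    return sum(1 for message in translated if message.in_core)


def meets_import_threshold(translated, threshold=490):
    """Check the core-only count against the (approximate) import threshold."""
    return translated_core_count(translated) >= threshold


# Hypothetical most-used list: 500 messages, 30 of which come from extensions.
most_used = [Message(f"core-{i}", True) for i in range(470)]
most_used += [Message(f"ext-{i}", False) for i in range(30)]

# Every most-used message is translated, yet the core-only count falls short.
print(len(most_used), translated_core_count(most_used), meets_import_threshold(most_used))
```

With these made-up numbers, a language can have the whole most-used list translated and still miss the core threshold, which is the situation an automated check would need to flag.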

This proposal doesn't include any suggestions to change the eligibility criteria that appear in the Language proposal policy. It is only about making the creation process smoother.

The biggest known technical hurdle to implementing this is, again, T158730, but it's certainly not the only one.

Thanks for reading so far. This is a big idea. It will take a long time. This is just the initial exploration. Everybody's thoughts are welcome.

Event Timeline

I have a suggestion related to Incubator: stop limiting the language codes to 3 characters. That way, IETF language code support would be vastly improved.

stop limiting the language codes to 3 characters

The following languages with more than 3 characters already exist in production:

~/dns/templates/helpers$ cut -d\' -f2 langs.tmpl | grep -E '^[a-z-]{4,}'
bat-smg
be-tarask
be-x-old
cbk-zam
fiu-vro
map-bms
minnan
nds-nl
roa-rup
roa-tara
simple
zh-cfr
zh-classical
zh-min-nan
zh-yue
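For anyone who wants to run the same filter outside a shell, the grep pattern above can be mirrored in Python. This is a minimal sketch; the sample list is a hypothetical handful of codes, not the real contents of langs.tmpl.

```python
import re

# Same pattern as the grep above: lowercase letters or hyphens, at least four of them.
LONG_CODE = re.compile(r"^[a-z-]{4,}")


def long_codes(codes):
    """Return the codes that are longer than three characters."""
    return [code for code in codes if LONG_CODE.match(code)]


# Hypothetical sample; the real input would be the codes extracted from langs.tmpl.
sample = ["en", "simple", "zh-yue", "vro", "be-tarask", "nds-nl"]
print(long_codes(sample))
```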

My suggestion is not just for Wikimedia wikis. This is a general need for deployment of various wikis that would like to be more flexible in what is shared and what is not, without necessarily needing a specific domain for each wiki sharing common namespaces (notably "User:" and "User talk:", as well as user preferences for a single registration, possibly even other namespaces like "Template:", "Template talk:", "Module:", "Module talk:", "File:", "File talk:", "Category:", "Category talk:", "Help:", "Help talk:"; with only "Project:" and "Project talk:" being specific, and hosted under their own "interwiki" code).

Namespaces are the basic component to do that, and each namespace can have its own URL rewrite rules and resolution, depending on which namespace it is used from.

So this is a desirable goal for MediaWiki itself. It would also address the question of test/incubator wikis in Wikimedia, or yearly conference wikis, or specific maintenance.

Basically, each single wiki instance needs only 2 namespaces, all other ones (including special namespaces) being sharable on a main instance. And instances do not necessarily need their own database instance (sharing the database instance also allows sharing the SQL admins and privileges for "special" pages, given they are also unified using the same "User (talk):" namespace).

Note: we also need flexibility for how to map translations (also for each namespace from which they are looked up): in namespaces, or pagename prefixes, or in "/suffixed" subpages. This would require improving the setup of the "Translate" tool.

stop limiting the language codes to 3 characters

The following languages with more than 3 characters already exist in production:

~/dns/templates/helpers$ cut -d\' -f2 langs.tmpl | grep -E '^[a-z-]{4,}'
(...)

bat-smg -> aliased to "sgs"
be-tarask -> conforming to BCP 47
be-x-old -> aliased to "be-tarask"
cbk-zam -> should be aliased to ???
fiu-vro -> aliased to "vro"
map-bms -> aliased to "bms"
minnan -> aliased to "nan"
nds-nl -> conforming to BCP 47
roa-rup -> aliased to "rup"
roa-tara -> should be aliased to "it-x-tara"
simple -> should be aliased to "en-x-simple"
zh-cfr -> aliased to "nan"
zh-classical -> aliased to "lzh"
zh-min-nan -> aliased to "nan"
zh-yue -> aliased to "yue"
nrm -> should be first aliased to "nrf", then the "nrm" alias deleted after (mostly) complete migration (and cleanup of Wikidata)
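A lookup table is one simple way to express this kind of aliasing. The sketch below is an illustration only: it copies a subset of the "aliased to" entries from the list above, leaves out the "should be" and "???" entries as well as the ones questioned later in this thread, and the names `LEGACY_ALIASES` and `normalize` are hypothetical, not an existing MediaWiki API.

```python
# Subset of the aliases listed in the comment above (not an authoritative table).
LEGACY_ALIASES = {
    "bat-smg": "sgs",
    "be-x-old": "be-tarask",
    "fiu-vro": "vro",
    "roa-rup": "rup",
    "zh-classical": "lzh",
    "zh-min-nan": "nan",
    "zh-yue": "yue",
}


def normalize(code):
    """Return the alias for a legacy code, or the code unchanged when none is listed."""
    return LEGACY_ALIASES.get(code.lower(), code)


for code in ("zh-yue", "be-tarask", "nds-nl"):
    print(code, "->", normalize(code))
```

Codes that already conform, such as be-tarask or nds-nl, pass through unchanged.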

@Verdy_p:

cbk-zam -> should be aliased to ???

should be renamed back to cbk, see T124657

map-bms -> aliased to "bms"

Huh? Banyumasan = Bilma Kanuri?

simple -> should be aliased to "en-x-simple"

Just en-simple, no need to use "-x-" here.

nrm -> should be first aliased to "nrf", then the "nrm" alias deleted after (mostly) complete migration (and cleanup of Wikidata)

I'm afraid that this is currently being contested at this RFL page.

@Dzahn in addition to your list, there are also those codes that match your criteria existing in our Names.php:

'ady-cyrl' => 'адыгабзэ', # Adyghe
'aeb-arab' => 'تونسي', # Tunisian Arabic (Arabic Script)
'aeb-latn' => 'Tûnsî', # Tunisian Arabic (Latin Script)
'bbc-latn' => 'Batak Toba', # Batak Toba
'crh-latn' => "qırımtatarca (Latin)\u{200E}", # Crimean Tatar (Latin)
'crh-cyrl' => "къырымтатарджа (Кирилл)\u{200E}", # Crimean Tatar (Cyrillic)
'de-at' => 'Österreichisches Deutsch', # Austrian German
'de-ch' => 'Schweizer Hochdeutsch', # Swiss Standard German
'de-formal' => "Deutsch (Sie-Form)\u{200E}", # German - formal address ("Sie")
'en-ca' => 'Canadian English', # Canadian English
'en-gb' => 'British English', # British English
'es-419' => 'español de América Latina', # Spanish for the Latin America and Caribbean region
'es-formal' => "español (formal)\u{200E}", # Spanish formal address
'gan-hans' => "赣语(简体)\u{200E}", # Gan (Simplified Han)
'gan-hant' => "贛語(繁體)\u{200E}", # Gan (Traditional Han)
'gom-deva' => 'गोंयची कोंकणी', # Goan Konkani (Devanagari script)
'gom-latn' => 'Gõychi Konknni', # Goan Konkani (Latin script)
'hif-latn' => 'Fiji Hindi', # Fiji Hindi (latin)
'hu-formal' => "magyar (formal)\u{200E}", # Hungarian formal address
'ike-cans' => 'ᐃᓄᒃᑎᑐᑦ', # Inuktitut, Eastern Canadian (Unified Canadian Aboriginal Syllabics)
'ike-latn' => 'inuktitut', # Inuktitut, Eastern Canadian (Latin script)
'kbd-cyrl' => 'Адыгэбзэ', # Kabardian (Cyrillic)
'kk-arab' => "قازاقشا (تٴوتە)\u{200F}", # Kazakh Arabic
'kk-cyrl' => "қазақша (кирил)\u{200E}", # Kazakh Cyrillic
'kk-latn' => "qazaqşa (latın)\u{200E}", # Kazakh Latin
'kk-cn' => "قازاقشا (جۇنگو)\u{200F}", # Kazakh (China)
'kk-kz' => "қазақша (Қазақстан)\u{200E}", # Kazakh (Kazakhstan)
'kk-tr' => "qazaqşa (Türkïya)\u{200E}", # Kazakh (Turkey)
'ko-kp' => '조선말', # Korean (DPRK), T190324
'ks-arab' => 'کٲشُر', # Kashmiri (Perso-Arabic script)
'ks-deva' => 'कॉशुर', # Kashmiri (Devanagari script)
'ku-latn' => "kurdî (latînî)\u{200E}", # Northern Kurdish (Latin script)
'ku-arab' => "كوردي (عەرەبی)\u{200F}", # Northern Kurdish (Arabic script) (falls back to ckb)
'nl-informal' => "Nederlands (informeel)\u{200E}", # Dutch (informal address ("je"))
'pt-br' => 'português do Brasil', # Brazilian Portuguese
'ruq-cyrl' => 'Влахесте', # Megleno-Romanian (Cyrillic script)
# 'ruq-grek' => 'Βλαεστε', # Megleno-Romanian (Greek script)
'ruq-latn' => 'Vlăheşte', # Megleno-Romanian (Latin script)
'shi-tfng' => 'ⵜⴰⵛⵍⵃⵉⵜ', # Tachelhit (Tifinagh script)
'shi-latn' => 'Tašlḥiyt', # Tachelhit (Latin script)
'shy-latn' => 'tachawit', # Shawiya (Latin script) - T194047
'skr-arab' => 'سرائیکی', # Saraiki (Arabic script)
'sr-ec' => "српски (ћирилица)\u{200E}", # Serbian Cyrillic ekavian
'sr-el' => "srpski (latinica)\u{200E}", # Serbian Latin ekavian
'tg-cyrl' => 'тоҷикӣ', # Tajiki (Cyrllic script) (default)
'tg-latn' => 'tojikī', # Tajiki (Latin script)
'tt-cyrl' => 'татарча', # Tatar (Cyrillic script) (default)
'tt-latn' => 'tatarça', # Tatar (Latin script)
'ug-arab' => 'ئۇيغۇرچە', # Uyghur (Arabic script) (default)
'ug-latn' => 'Uyghurche', # Uyghur (Latin script)
'uz-cyrl' => 'ўзбекча', # Uzbek Cyrillic
'uz-latn' => 'oʻzbekcha', # Uzbek Latin (default)
'zh-cn' => "中文(中国大陆)\u{200E}", # Chinese (PRC)
'zh-hans' => "中文(简体)\u{200E}", # Mandarin Chinese (Simplified Chinese script) (cmn-hans)
'zh-hant' => "中文(繁體)\u{200E}", # Mandarin Chinese (Traditional Chinese script) (cmn-hant)
'zh-hk' => "中文(香港)\u{200E}", # Chinese (Hong Kong)
'zh-mo' => "中文(澳門)\u{200E}", # Chinese (Macau)
'zh-my' => "中文(马来西亚)\u{200E}", # Chinese (Malaysia)
'zh-sg' => "中文(新加坡)\u{200E}", # Chinese (Singapore)
'zh-tw' => "中文(台灣)\u{200E}", # Chinese (Taiwan)

Frankly, do we really need things like "-formal" and "-informal"? They can't be recognized by browsers, as no browser treats either of them as a country code.

Anyway, the

'eml' => 'emiliàn e rumagnòl', # Emiliano-Romagnolo / Sammarinese

Should also be contested because of T36217

@Verdy_p:

cbk-zam -> should be aliased to ???

should be renamed back to cbk, see T124657

map-bms -> aliased to "bms"

Huh? Banyumasan = Bilma Kanuri?

simple -> should be aliased to "en-x-simple"

Just en-simple, no need to use "-x-" here.

Has the "simple" variant been registered in the IANA database for BCP 47? If not, we need the "-x-" because it is a private extension in Wikimedia.

nrm -> should be first aliased to "nrf", then the "nrm" alias deleted after (mostly) complete migration (and cleanup of Wikidata)

I'm afraid that this is currently being contested at this RFL page.

No. This is the same reason why we need to migrate existing "nrm" data to "nrf", so that Narom can finally be assigned the code (that's why the aliasing redirect can only be temporary, for the duration of the migration). There's no contestation here: the request for a new language for Narom is valid and has been pending for too long (it can still be allocated in Incubator for Narom, given that there's no longer any current Norman data in Incubator, except some read-only archives that can be renamed to "nrf" too if one still needs them).

But most of the migration to do will be in Wikipedia and Wiktionary. I think we can leave aside the migration of user talk pages (users will do this cleanup themselves even if their past links are now broken by going to a Narom page or nowhere instead of the past Norman page). In Wikidata, this migration can be easily automated by bots.

Frankly, do we really need things like "-formal" and "-informal"? They can't be recognized by browsers, as no browser treats either of them as a country code.

And they have absolutely no reason to recognize them as "country codes" (actually they are "region subtags" and not restricted to just "country codes", and they also include territory/dependency codes from ISO 3166-1 and continental area codes from UN M.49, but exclude some codes from ISO 3166-1; note that ISO 3166-2 codes are not used at all as region subtags in BCP 47, and that the use of "region codes" is a legacy, deprecated in favor of ISO 639-3 codes for more specific languages already encoded as members of a registered macrolanguage, itself being registered in the IANA database).

Lots of legacy codes have been kept valid in BCP 47, but the extension mechanism has been simplified and formalized so that fallback resolvers will work as intended (fallback mechanisms are not part of ISO 639; they are only specified in BCP 47, which maintains the compatibility that the unstable ISO 639 never preserves, meaning that ISO 639 is always unreliable and should never be used as a "normative reference" for our use but only as an "informative" one, to show where BCP 47 takes some of its sources from; the IANA database is still the only approved normative source of these codes, and everything else is private and should use private extension subtags).

The syntax that BCP 47 parsers recognize for "formal" and "informal" is the one for "registered language variant subtags"; they would be valid and accepted by browsers if they were registered in the IANA database.
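To make the point about subtag syntax concrete, here is a deliberately simplified Python sketch that labels each subtag of a tag by its shape alone. It is an assumption-laden illustration, not a BCP 47 parser: it ignores subtag order, grandfathered tags and extension singletons other than "x", and it does not check whether a subtag is actually registered in the IANA database. The names `SHAPES` and `classify` are hypothetical.

```python
import re

# Shape-only patterns, loosely following the BCP 47 grammar (simplified).
SHAPES = [
    ("extlang", re.compile(r"^[a-z]{3}$")),
    ("script", re.compile(r"^[a-z]{4}$")),
    ("region", re.compile(r"^(?:[a-z]{2}|[0-9]{3})$")),
    ("variant", re.compile(r"^(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3})$")),
]


def classify(tag):
    """Label each subtag by shape; everything after "x" is private use."""
    parts = tag.lower().split("-")
    labels = [(parts[0], "language")]
    private = False
    for part in parts[1:]:
        if private:
            labels.append((part, "private-use"))
        elif part == "x":
            private = True
            labels.append((part, "private-use marker"))
        else:
            kind = next((name for name, rx in SHAPES if rx.match(part)), "unknown")
            labels.append((part, kind))
    return labels


for tag in ("de-formal", "en-x-simple", "es-419", "kk-cyrl"):
    print(tag, classify(tag))
```

In this sketch "formal" has the shape of a variant subtag, which matches the comment above: the syntax is fine, and what is missing is the registration.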

(This discussion of language codes is not really related to the topic of this task. There are better places to have it. Thanks for understanding.)

I have a suggestion related to Incubator: stop limiting the language codes to 3 characters. That way, IETF language code support would be vastly improved.

In general, we are not encouraging that going forward. And you know that, as far as it goes.

The initial discussion in the Incubator wiki mostly supports this idea:
https://incubator.wikimedia.org/w/index.php?title=Incubator:Community_Portal&oldid=4370387#A_proposal_for_a_big_reform_of_the_Incubator

Remaining details to discuss:

  • How vandalism monitoring will work. There is some support for @Urbanecm's initial proposal (a few comments above), but it may need some more details before actually going forward.
  • Detailed steps will have to be discussed: which wikis will be moved out of the current incubator into the new wikis, what to do with the less active Incubator projects, etc.
  • The decision whether to close the Incubator wiki or leave it open is not part of this proposal. For now it stays open.

(Have I forgotten any important points?)

@Amire80, I'd venture that for now, since closing Incubator is not part of the proposal, we also don't need to discuss (yet) what to do with the less active Incubator projects.
In my personal view, there is clear consensus to begin the work to make this happen. Still, while the Incubator community agrees in principle that moving all active test projects into a space like the one we're proposing is a good idea, there is still enough concern about not creating a "Wild West" of new subdomains that we should be focusing on (1) putting the infrastructure in place and (2) deciding on which projects should be moved to test this infrastructure. For the moment, we're still putting new projects in Incubator.
Also, to clarify a couple of other things:

  • At the moment, I don't think we're discussing closing Beta Wikiversity (even in the longer run) and having all new Wikiversity tests start in one of these spaces. Tests nearing approval could be eligible, though. (At the moment, the only one remotely active enough is hewikiversity.)
  • I think a well-developed Wikisource looking to move to a subdomain can also be eligible. But the default for new Wikisources is going to remain Multilingual (Old) Wikisource, for a variety of very good reasons.

Currently @Wolverène opposes this idea, for the following reasons:

  1. It doesn't solve the problem of active and reasonably proposed projects in extinct languages.
  2. It reminds me of the story of the Wp/vot, when the "native speaker" wasn't actually one.
  3. How will it make the work easier? A separate URL may also be hard to understand for some newcomers. If they need to learn the wiki markup in an actual Wikipedia, show them the unfortunately forgotten test.wikipedia.org. And surprisingly, it's easier to delete a problematic project within the framework of the current Incubator, using a bot and without managing domains (if I understand it right).
  4. I've never seen anyone who was really embarrassed about prefixes or categorization. I'm not excluding the possibility that such people exist, but that kind of person probably has difficulties with the wiki markup in general.
  5. I feel stupid because I don't understand what would actually change compared to the current situation. I'm wondering how this reform will help the Langcom make decisions about project openings any faster.

I have no idea how to answer their concerns.

Thanks for forwarding this. Where was this posted?

Liuxinyu posted it, not me. But Wolverène posted it at the end of the discussion in Incubator (before my closing section).

It's a bit weird, I cannot find it.

But never mind - I'll just reply here.

  1. It doesn't solve the problem of active and reasonably proposed projects in extinct languages.

The question of extinct languages is explicitly not a part of this reform discussion, as I had already written in one of my emails on the Langcom mailing list: https://lists.wikimedia.org/pipermail/langcom/2018-July/002162.html

It's not a problem that this proposal is trying to resolve. It's a matter of policy discussion for the Language committee. This reform is more on the technical side of how Incubator wikis are managed.

  2. It reminds me of the story of the Wp/vot, when the "native speaker" wasn't actually one.

This is also a matter of Langcom policy. If there are no native speakers, the incubator wiki is not supposed to be created. If it's created anyway, and no native speakers come along in reasonable time, this wiki should be closed in a fast-track process, as proposed from the beginning.

  3. How will it make the work easier? A separate URL may also be hard to understand for some newcomers.

There's nothing to understand there. A URL is transparent to the editor.

If they need to learn the wiki markup in an actual Wikipedia, show them the unfortunately forgotten test.wikipedia.org.

Wiki markup is not an issue. Prefixes, however, are an issue. See below.

And surprisingly, it's easier to delete a problematic project within the framework of the current Incubator, using a bot and without managing domains (if I understand it right).

Deleting a whole wiki should be easier once the decision is made to delete it. The proposal suggests from the start to make it easy to delete. @Urbanecm, how difficult is it to delete a wiki? This was done with mo.wikipedia.org recently.

  4. I've never seen anyone who was really embarrassed about prefixes or categorization. I'm not excluding the possibility that such people exist, but that kind of person probably has difficulties with the wiki markup in general.

I worked with a lot of people writing in the Incubator: in Adyghe (ady), Dinka (din), Fon (fon), and in several other languages. It's one of the biggest hurdles for people, and it's completely artificial.

You can also see this discussed at this Wikimania presentation:

https://commons.wikimedia.org/wiki/File:Wikipedia_for_Indigenous_Communities.webm (especially after 27 minutes or so)

  5. I feel stupid because I don't understand what would actually change compared to the current situation.

Here are the changes that will benefit the people who read and write in the Incubator:

  1. Many wikis instead of one.
  2. No need to use prefixes.
  3. The possibility to use Wikidata.
  4. The possibility to use Content Translation.
  5. The possibility to search conveniently only in your language.

I'm wondering how this reform will help the Langcom make decisions about project openings any faster.

This is not the intention. The intention is to make it easier to read and write in Incubator projects.

  1. If a new site is to be created for each incubator site, how will WMF turn them into a full site once they become eligible? Last time I heard about it, such redesignation seemed to be very cumbersome, and that's also why wp/yue and wp/nan still haven't been moved to the desired domain names almost a decade after the initial proposal. Will it also take a decade for any new project to get a full site if the proposal is adopted?
  2. Is it going to lengthen the entire wiki creation process and require more bureaucratic processes, as well as more manpower to handle each application? Now it is incubator→Full site; in the proposal it will be incubator→Experimental site→Full site.
  3. Are those goals unachievable by overhauling Incubator itself? It seems that Wikia is now going to change the URLs of its non-English wikis in order to save on SSL certificate costs, changing URLs of the form zh.community.wikia.com to community.wikia.com/zh, and each of these language edition sites remains independent. Is that not achievable in Incubator?
  4. Likewise, is it possible to create such a new experimental site as easily as creating a new wiki site on Wikia?

  1. If a new site is to be created for each incubator site, how will WMF turn them into a full site once they become eligible? Last time I heard about it, such redesignation seemed to be very cumbersome, and that's also why wp/yue and wp/nan still haven't been moved to the desired domain names almost a decade after the initial proposal. Will it also take a decade for any new project to get a full site if the proposal is adopted?

It's a very different kind of redesignation.

Moving zh-yue and zh-min-nan to yue and nan is difficult for reasons that are explained here: T172035: Blockers for Wikimedia wiki domain renaming.

These two domains, as well as several others such as nrm, als, simple, etc., were created many years ago, at a time when there was no Language committee, or the committee was less strict about standard language codes. Now it's much more strict, so it's not supposed to happen with new language codes.

Renaming these non-standard domains is a separate issue and it's not related to the Incubator.

Moving content from an incubator domain to a proper domain is more of a technical issue that should be resolved by engineers when this task is actually executed. Input about this from Ops engineers is already welcome, but at this point I'm still trying to get functional community feedback and not to resolve the technical details.

This is supposed to become easier for everybody: the Incubator site readers, the Incubator site writers, the Language Committee, the engineers who manage domains and wiki installations, the vandalism patrollers, etc. If it makes anything harder for anybody, then it's a no-go, but we are now talking about ideas and not yet about the technical details of the implementation.

  2. Is it going to lengthen the entire wiki creation process and require more bureaucratic processes, as well as more manpower to handle each application? Now it is incubator→Full site; in the proposal it will be incubator→Experimental site→Full site.

No, this is not the intention at all. Eventually new languages are supposed to appear immediately in their own incubator wiki site without going through incubator.wikimedia.org. So the whole thing with prefixes on page titles and importing will be gone, and this means less bureaucracy, not more.

  3. Are those goals unachievable by overhauling Incubator itself? It seems that Wikia is now going to change the URLs of its non-English wikis in order to save on SSL certificate costs, changing URLs of the form zh.community.wikia.com to community.wikia.com/zh, and each of these language edition sites remains independent. Is that not achievable in Incubator?

Wikia uses completely different software and it has a lot of paid engineers working on this, so it's not really relevant.

  4. Likewise, is it possible to create such a new experimental site as easily as creating a new wiki site on Wikia?

Yes, that's kind of the idea: the easiness of creating a new wiki site will be similar. No waiting for Ops people, no running scripts, no configuring databases by hand - all automatic (that's what @Urbanecm is talking about when he says "we must have nice, smart and robust addWiki.php"). However, there are several big differences:

Wikia | Next-generation Incubator
Any language. | Only eligible languages approved by Langcom.
Any topic. | Only Wikipedia, Wiktionary, Wikiquote, Wikibooks, Wikivoyage, Wikinews, and maybe Wikisource and Wikiversity.
Any number of wikis in every language. | One wiki per language.
Any web user can create a new wiki. | Only people with permission granted by the Langcom can create a new wiki.

Getting the WMF to provide easy creation of a wiki for any topic in any language is a curious and valid idea, but it's completely out of the scope of this task.

I think that we had better not discuss anything about renaming domains here, because this isn't Incubator's concern (@C933103, are you still thinking that semi-renaming, i.e. closing old one -> exporting to Incubator -> creating new one -> importing from Incubator, is appropriate?)

@Liuxinyu970226 Not sure why you are asking me this here when you yourself have stated that this seems to be an inappropriate place. Anyway, I think it depends on the projects, and the communities of each project might also have different opinions.

But what sort of impact would it have on e.g. user preference or wiki widget?

And also note that now different Cantonese wikiprojects seem to have different URL prefixes.

@C933103:

Not sure why you are asking me this here when you yourself have stated that this seems to be an inappropriate place. Anyway, I think it depends on the projects, and the communities of each project might also have different opinions.

This is the question I originally wanted to ask you. Why are you asking it back to me?

But what sort of impact would it have on e.g. user preference or wiki widget?

Separate, specific tasks are needed to investigate this question...

And also note that now different Cantonese wikiprojects seem to have different URL prefixes.

Do you even visit Special:SiteMatrix every day? In theory, that special page would have two entry lines for Cantonese. If not, then what's your concern about this? I don't think that remote users are affected by this unless they're SiteMatrix fans.

@Amire80 Is there anything that the extension can do in this task? It seems to me that this task is more of a process issue than a software issue related to the WikimediaIncubator extension.

Nothing immediate, but it's close enough to the topic.

This project is near the end of the ideation phase and it's moving into the implementation planning phase, and some things may have to be done in it. For example, it may be useful to implement functions to analyze activity of projects in the extension, etc.
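As a rough illustration of what such an activity-analysis function might compute, here is a minimal sketch in Python. It is not part of the WikimediaIncubator extension; the `Edit` record, the function names, and the thresholds are all hypothetical. The thresholds loosely follow the rule of thumb from the task description (about 5 people working for three months, several hundred articles), and the exact values of 5 editors, 90 days, and 200 pages are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Edit:
    """Hypothetical record of one edit in a test project."""
    user: str
    page: str
    timestamp: datetime


def summarize_activity(edits, now, window_days=90):
    """Count distinct recent editors and distinct pages ever edited."""
    cutoff = now - timedelta(days=window_days)
    recent = [e for e in edits if cutoff <= e.timestamp <= now]
    return {
        "active_editors": len({e.user for e in recent}),
        "pages": len({e.page for e in edits}),
    }


def meets_rule_of_thumb(summary, min_editors=5, min_pages=200):
    """Assumed thresholds; the real approval criteria are not precisely defined."""
    return (summary["active_editors"] >= min_editors
            and summary["pages"] >= min_pages)
```

A summary like this would only be one input to the Language committee's assessment; it says nothing about content quality.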

I have looked through all the comments, but I still don't see any actions related to the WikimediaIncubator extension. This task is about Wikimedia Incubator, which fits into incubator.wikimedia.org, but it does not define what needs to be done about the extension. Can you perhaps put a list of items that need to be done by the extension in the task description?

Soon, when we start the technical architecture work. (I'm not totally sure who "we" are, I'm working on it.)

Per the last comment: re-add the tag when the technical architecture work has actually started.

I have been working with some small languages, mainly from the Americas, that wanted to figure out how a new Wikipedia is created. The main problem here is that a lot of languages stay in Incubator per saecula saeculorum, because no one knows about Incubator. So you can have, with a lot of work (not everyone has a broadband connection or time to volunteer), a small community working, but if they don't get results, they quickly get tired. Small languages tend to have a constant number of volunteers, but they are always the same people, so if someone doesn't even know they can volunteer in their language... A community can't be created if its members don't know each other BEFORE they start working on Wikipedia.

Simplifying the process, for example by creating a new wiki first and then starting to translate and write content, should make things easier. The press could cover the birth of the new wiki and attract users... but in Incubator this is impossible.

Thank you for pointing to this proposal. Two points I would like to put across.

1) Language localization threshold is too high
I would like to propose that we halve this requirement, mainly because we do not even have a process to "proofread" the translations. We are setting the bar too high here for a small new language.

2) Working on incubator is very hard
Yes, this cannot be overstressed. I have been working on the Ndebele Wikipedia (nr) on Incubator for some time now, and I keep creating articles that I then need to move because of prefixes etc.
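To illustrate the prefix problem, here is a minimal sketch in Python of the title convention, assuming test-wiki pages live under `Wp/languagecode/Title` (the task description gives `Wp/languagecode` for the main page). The helper names are hypothetical and are not part of any Incubator tooling.

```python
def incubator_prefix(project_code, language_code):
    """Build the test-wiki prefix, e.g. 'Wp/nr' for a Wikipedia in 'nr'."""
    return f"{project_code}/{language_code}"


def with_prefix(title, project_code="Wp", language_code="nr"):
    """Return the title under the test-wiki prefix, leaving prefixed titles as-is."""
    prefix = incubator_prefix(project_code, language_code)
    if title == prefix or title.startswith(prefix + "/"):
        return title
    return f"{prefix}/{title}"


def strip_prefix(title, project_code="Wp", language_code="nr"):
    """Return the title as it would appear on an independent wiki."""
    prefix = incubator_prefix(project_code, language_code)
    if title.startswith(prefix + "/"):
        return title[len(prefix) + 1:]
    return title
```

Every article created without the prefix has to be moved to the `with_prefix` form, and every title has to be stripped again when the wiki is eventually exported, which is the overhead described above.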

I would like to propose that we pilot a process where users can request the creation of new projects and are then given a year to grow them. If they fail, then we can close them. I know I could get Ndebele working and growing if it were not on Incubator.

Soon, when we start the technical architecture work. (I'm not totally sure who "we" are, I'm working on it.)

Any update on this, i.e. where to contribute? :)

Any improvement that automates the procedure at https://wikitech.wikimedia.org/wiki/Add_a_wiki is a good step in the direction of implementing this. As the beginning of this task's description says, this task is a version of T158730 that addresses the user-facing aspects of creating new empty wikis, mostly in underresourced "small" languages (they are small only in their presence on the web, but some of them have millions of speakers).

The final step is allowing the creation of Incubators as (almost) independent wikis, and there are some things to figure out and agree upon before this happens, but automating the process is necessary and desirable in any case, so anyone with the relevant knowledge of Ops scripts for wiki creation, or desire to learn them, is welcome to contribute right now.

I develop the Wikidocumentaries project, which navigates Wikimedia content using Wikidata as the linking structure. Each page represents a topic in Wikidata. The Wikipedia article on the page is displayed in the user's language if it exists, but it can also be read in context in another language. This other article links to Content Translation, so that it can be translated into the Wikipedia in the user's language if the article is missing there. With the introduction of language codes for Inari and Skolt Saami, we now have the opportunity to navigate and display the content in those languages, except for the articles that are in the Incubator.

For our use case, being able to display the article from Incubator (using any tricks, if necessary) and using Content Translation to translate into the user's language (possibly to Incubator) are key to serving small languages.
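The lookup order described here could be sketched roughly as follows, in Python. This is an assumption about the desired behaviour, not Wikidocumentaries code: the function name, the dictionaries of available articles, and the fallback list are all hypothetical.

```python
def pick_article(user_lang, wikipedia_articles, incubator_articles,
                 fallback_langs=("en",)):
    """Choose which article to display for one topic.

    wikipedia_articles and incubator_articles map a language code to a
    page title. Returns (source, language, title), or None if nothing exists.
    """
    # 1. Prefer a regular Wikipedia article in the user's language.
    if user_lang in wikipedia_articles:
        return ("wikipedia", user_lang, wikipedia_articles[user_lang])
    # 2. Otherwise try an Incubator test-wiki article in that language.
    if user_lang in incubator_articles:
        return ("incubator", user_lang, incubator_articles[user_lang])
    # 3. Otherwise fall back to another language, which could then be
    #    offered for translation.
    for lang in fallback_langs:
        if lang in wikipedia_articles:
            return ("wikipedia", lang, wikipedia_articles[lang])
    return None
```

For example, with a hypothetical topic that has `{"en": "Example", "fi": "Esimerkki"}` on Wikipedia and `{"sms": "Wp/sms/Example"}` on Incubator, a Skolt Saami (`sms`) reader would get the Incubator page, while a reader of a language with no article at all would get the English one.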

We will have another layer of difficulty later, when we start recording topics in the local Wikibase. These are topics that could be rejected by Wikidata or the Wikipedias. This is why it is a micro history wiki: it welcomes nobodies, margins and minorities. For these topics we create articles locally in the underlying MediaWiki/Wikibase, and we would also like to use translation tools for them and to link between articles in the different locations. (I will support any project that works with intelligent Wikidata-based links/red links in MediaWiki articles.)

https://wikidocumentaries-demo.wmflabs.org/Q1089774?language=en
https://wikidocumentaries-demo.wmflabs.org/Q1089774?language=fi
https://wikidocumentaries-demo.wmflabs.org/Q1089774?language=se Northern Saami, own Wikipedia
https://wikidocumentaries-demo.wmflabs.org/Q1089774?language=sms Skolt Saami, Incubator stub exists

(Excuse the currently unfinished language fallbacks.)

I've split off the discussion about being able to use a language without a wiki project into T223664, separating it from the discussion about creating a new wiki project for that language, because I believe that the two are unnecessarily intertwined.

In my experience, a lot of small and under-resourced language communities would like to contribute to projects like Wikidata or Commons, but do not have the resources (human or otherwise) to translate the UI or start an Incubator project. For instance, they would like to tag and caption photos, or add labels, descriptions, and native names in Wikidata. And they'd like to start as soon as possible once they make the decision.

Yet the process right now does not allow them to do this and it is highly non-transparent, even for experienced users. I feel that this process should be disconnected from having to create an incubator project or translating the UI, streamlined, and documented in a community-friendly way to allow these communities to do exactly what they want with the resources they have.

So please feel free to join us at T223664 for a discussion from the viewpoint of lesser-resourced language community members as end users!

To make the discussion more focused, I split the biggest chunk of this task to a separate task: T228745: Allow creating an independent "incubator wiki" instead of hosting all new wikis in one Incubator wiki with prefixes

I didn't subscribe all the people from this task to that one. If you want, you can subscribe to it.

Perhaps it could be unified with T223664.