User talk:Erutuon

Archived revision by Elvinrust (talk | contribs) as of 22:53, 4 May 2020.

Archives: 2009, 2010, 2011, 2015, 2016, 2017, 2018, 2019, 2020

2 things

Hi :)

Could you help with this?

  1. review and commit my change to TranslationAdder.js removing the balancer buttons and reliance on trans-mid. I have used it daily since the change and have not seen any problems after removing it.
  2. Add me to this list for me to be able to use JWB.--So9q (talk) 10:24, 9 September 2019 (UTC)
Your change to the gadget looks okay, so I'll copy it to the gadget page.
I'm just an interface admin, so I can't edit Wiktionary:AutoWikiBrowser/CheckPage. You'll have to get the attention of a real admin (sysop). — Eru·tuon 18:07, 9 September 2019 (UTC)

Lua memory usage

Hi, I found this via your common.js: User:Erutuon/scripts/simpleTranslations.js. It contains this: {{[[T:t-simple|t-simple]]}} for Latin-script terms with just lang, term, and gender, to reduce Lua memory usage, using [[User:Erutuon/simpleTranslations.js|JavaScript]]

Is this still relevant? If yes, would it not be a good idea to improve the TranslationAdder.js to insert these for da, no, nb, etc.? WDYT?

I saw that some pages have sub-pages /translations to work around the Lua memory issue. Can massive use of t-simple avoid that?--So9q (talk) 10:40, 9 September 2019 (UTC)

No, the translation adder shouldn't use {{t-simple}}. It's just a workaround on pages that are in CAT:E because they are using too much Lua memory. And {{t-simple}} doesn't always reduce memory enough to remove the error messages; that's why there are translation subpages. — Eru·tuon 17:05, 9 September 2019 (UTC)

Community Insights Survey

RMaung (WMF) 14:34, 9 September 2019 (UTC)

Context deprecation and red message

In {{context}}, I restored the version that does not show the long red message. The point of deprecation as opposed to deletion is to make page histories legible. I did that after I noticed in page histories illegibility that I did not expect to be there, and then found the source of the illegibility.

I understand this was an attempt to prevent people from using the template. There is a better way, preserving history legibility: create an edit filter that is going to prevent people from saving an entry that contains a deprecated template. No one created such a filter yet and I don't know why; I fear I do not have enough user rights to edit these filters.

In any case, we have deprecation under control via Category:Pages using deprecated templates, which now contains 4 pages. I am cleaning up the category once in a while, and I remember similar counts. It is very manageable. With the edit filter, it would be even easier. --Dan Polansky (talk)

Not a bad approach to the problem. So much edit history is virtually unusable because of deprecation. DCDuring (talk) 14:47, 12 September 2019 (UTC)
@Dan Polansky: I'm also generally in favor of keeping histories legible, but got a bit carried away so I added the error message. Since you are keeping an eye on the category, it makes sense to remove it. I do like the idea of an edit filter for frequently used deprecated templates, but I'm not an admin either. — Eru·tuon 00:15, 13 September 2019 (UTC)

Administrator?

You do a lot of valuable work with templates and modules. Would you consider becoming an administrator? — SGconlaw (talk) 11:51, 13 September 2019 (UTC)

Good idea. You would have access to more things. We wouldn't make you do more patrolling. DCDuring (talk) 13:10, 13 September 2019 (UTC)
Not that I would mind if we had more people patrolling... —Μετάknowledgediscuss/deeds 16:51, 13 September 2019 (UTC)
I'm grateful for what he does. I run into vandalism that he's undone all the time. Chuck Entz (talk) 22:23, 13 September 2019 (UTC)
I'm surprised Erutuon is not an admin already! —AryamanA (मुझसे बात करेंयोगदान) 18:50, 14 September 2019 (UTC)
He's been offered the position before: see here. 31.173.87.215 18:54, 14 September 2019 (UTC)
I refused before, but I guess I'd be willing now if there's something I could do with the admin tools. Perhaps protecting vandalized modules and templates and moving pages. — Eru·tuon 19:31, 14 September 2019 (UTC)
Great! Let me see if I can figure out how to nominate you. (Unless someone else wants to jump in and do it first ...) — SGconlaw (talk) 19:51, 14 September 2019 (UTC)
@Sgconlaw: Done. Please endorse the nomination. 31.173.83.164 12:15, 15 September 2019 (UTC)
Oh, thanks, 31.173.83.164! Erutuon, you need to indicate your acceptance on the voting page. — SGconlaw (talk) 14:46, 15 September 2019 (UTC)

Erroneous conversion to t-simple

Hi, I just discovered that these entries have been converted by you to t-simple because of the Lua memory bug but in a way that does not show the information about gender.

* Danish: {{t-simple|da|næse|c|langname=Danish|interwiki=1}}

This is correct:

* Danish: {{t-simple|da|næse|g=c|langname=Danish|interwiki=1}}

--So9q (talk) 11:33, 16 September 2019 (UTC)

Ouch. Good catch. I'm going to have to figure out if it's better to make parameter 3 be gender, or convert these to use |g= and change my script. — Eru·tuon 18:24, 16 September 2019 (UTC)
Census of parameters in {{t-simple}} from the latest dump:
  • |1=: 16129
  • |2=: 16129
  • |3=: 3716
  • |4=: 1
  • |alt=: 141
  • |g=: 323
  • |interwiki=: 6342
  • |lang=: 1
  • |langname=: 15341
  • |lit=: 1
  • |sc=: 66
  • |tr=: 317
Since |3= is so common (because of me no doubt), {{t-simple}} now accepts the gender in either |3= or |g=. I also checked and there was only one instance with both |3= and |g=, which I corrected. — Eru·tuon 20:28, 16 September 2019 (UTC)
Nice! Thank you, again, again :)--So9q (talk) 20:56, 16 September 2019 (UTC)

English at top

Concerning this, do you have a link to a policy or vote stating this norm? I found nothing in wt:EL and other style pages I looked at.--So9q (talk) 08:05, 18 September 2019 (UTC)

@So9q: From ELE:
Priority is given to Translingual: this heading includes terms that remain the same in all languages. This includes taxonomic names, symbols for the chemical elements, and abbreviations for international units of measurement; for example Homo sapiens, He (“helium”), and km (“kilometre”). English comes next, because this is the English Wiktionary. After that come other languages in alphabetical order.
Giorgi Eufshi (talk) 10:43, 18 September 2019 (UTC)
OK, that makes sense. --So9q (talk) 11:18, 18 September 2019 (UTC)

Admin

Congratulations! Chuck Entz (talk) 13:01, 30 September 2019 (UTC)

Yeah, you are awesome and admin --Vealhurl (talk) 17:52, 10 October 2019 (UTC)
Indeed, congrats! — SGconlaw (talk) 20:16, 10 October 2019 (UTC)

wikt:majolica n.

Re your reversion, removal of images: The word majolica has been dogged with confusion since it is used for two distinctly different products in different countries in different periods of time. All other dictionaries than Wiktionary define it inaccurately or omit one sense of the word. Hard to believe but true. The two products, the two meanings of majolica, the two majolicas are visibly different. I feel the deleted images assist understanding and warrant an exception to the 'minimal images' rule.
Davidmadelena (talk) 23:10, 15 October 2019 (UTC)

@Davidmadelena: I have no objection to illustrating the two definitions – it's just not clear to me why so many images are needed. Why wouldn't two images, one for each definition, be enough? (This is an honest question – I hadn't heard of majolica before the entry showed up in my possibly incorrect headers cleanup page.) If you could find two images that clearly illustrate the differences in the two techniques, that would be ideal. To allow people to see more images, you can create a page on Wikimedia Commons (see c:Category:Majolica) and link it from the entry using {{commons}}. — Eru·tuon 23:33, 15 October 2019 (UTC)
@Eru: Overnight I had reached the same conclusion: two images to clearly illustrate the difference. Done, and thanks, much better now. Davidmadelena (talk) 10:15, 16 October 2019 (UTC)
I think you mean @Erutuon: :) Eru (talk) 16:36, 16 October 2019 (UTC)
@Davidmadelena, Eru: My confusing signature is to blame... — Eru·tuon 16:39, 16 October 2019 (UTC)

Removing control chars

Some of these should not be removed, but rather replaced with an em dash, e.g. [1]. Equinox 21:25, 18 October 2019 (UTC)

@Equinox: Oh, yeah, that makes sense. I'll go and clean up after myself. — Eru·tuon 21:36, 18 October 2019 (UTC)

Template:t-simple

Regarding diff, I thought the whole point of translation subpages was that they would avoid Lua memory problems without the need for the clumsy {{t-simple}} template. That's why I've been going through them and removing it from them. If you're readding it though, then we're working at cross purposes. —Mahāgaja · talk 09:40, 24 October 2019 (UTC)

@Mahagaja: I switched translations in fire/translations to {{t-simple}} because it was running out of memory. In general I'm in favor of having translation subpages use {{t}}, {{t+}} if they can without running out of memory; if fire/translations can be switched back (maybe I should make a script for this), it should be. — Eru·tuon 16:07, 24 October 2019 (UTC)
Good heavens, you're right: it was running out of memory. That's kind of appalling. But I agree that using {{t-simple}} in that case is unavoidable. —Mahāgaja · talk 19:50, 24 October 2019 (UTC)

Thank you

I'm extremely new to Lua. Having a solid background in JavaScript has helped me transition, but I appreciate the improvements you've offered. I just wanted to tell you that I've been working on a major update to the module script, which I've been editing offline because...Wiktionary's editor isn't as convenient as EditPad for indentation, regular expressions text search and replacement, etc.

Some background information: I know ideally, if I can get more people to help me out with Marshallese maintenance on Wiktionary (and on Wikipedia, where I'm mostly responsible for it there, too), I can't just treat scripts like something I can write and maintain unilaterally. But for now, the script is still very much in flux, not just in the state of code but in the wisdom of coding decisions, etc. For instance, I think I made a huge mistake embedding separate MED vs. Choi vs. Willson IPA symbols, because they don't actually represent different dialects, but merely different published researchers' occasionally conflicting phonological analyses of the language. Honestly, the state of Marshallese linguistics publications can be a bit of a mish-mash of different researchers doing their own things and not always agreeing on conventions, which has led to me occasionally having to get a tad...creative. Lately I've been asking for more peer review on w:Talk:Marshallese language to help improve the occasionally confused and OR-prone state of the article and pronunciation templates, and the scripting I write here is something I hope can eventually be used there as well where appropriate. That effort on Wikipedia, like this script, and the Wiktionary:About Marshallese proposal, are still all very much a work in progress, and for the most part I've had to maintain it all myself, and inadequate peer review means the mistakes I make tend to become the decisive word in how the wikis describe the language, sometimes for years on end until someone (or myself) notices the problem.

So thank you for your help with scripting and setting up some simple test cases, etc. While I'm still improving the script offline, I've made note of your improvements and am trying to add them in the offline editing before I submit and test features of a new update, all while trying not to break currently deployed invocations in the process. - Gilgamesh~enwiki (talk) 08:03, 31 October 2019 (UTC)

Glad that my tinkering was appreciated. I just encountered some module errors due to outdated input in {{mh-ipa-rows}} and provided more informative module errors, and then possibly made the errors useless by removing u from the supported characters. (All the erroring instances had u.) Wiktionary:About Marshallese still needs updating though. — Eru·tuon 17:18, 2 November 2019 (UTC)
Thanks again. And yeah, my bad.
Again, some background, and what motivated me to make such drastic changes today: When writing Marshallese templates on Wikipedia and Wiktionary years ago, I devised an ASCII-based symbol system loosely based on the MED phoneme transcription developed by Byron W. Bender and used in the Marshallese-English Dictionary. But in my effort to simplify it into an ASCII-inputtable system, I changed Bender's a e ẹ i notation to a e o u, since at the time we were treating the vertical vowel system phonemes as underspecified for backness or roundedness—which is true, they are underspecified for that, but at the time we were representing the phonemes using central vowel symbols /a ɜ ɘ ɨ/. But in the most recent discussion at w:Talk:Marshallese language where I asked for review by other editors to improve the quality of the language's representation and to reduce original research, it was agreed that only one of the published linguists had represented the phonemes with central vowel symbols at all, and that was Choi (1992). No one else used his ad hoc system, and it excluded one of the vowels altogether, representing only three. Other published researchers had either phonetically represented the vowels only as allophones, or echoed Bender's half century of Marshallese research using front vowel symbols (instead of central vowel symbols) to represent the underlying phonemes, which meant that the a e o u notation used before had come to make even less logical sense now. We agreed to change the way the article on Wikipedia represents the phonology. Many of those edits are still pending—I've been focusing most of my edits so far on Wiktionary because it will be most affected by these changes. 
Anyway, it was observed that before Bender started using a e ẹ i, he represented them in his earlier works from 1968 and 1969 as a e & i using an ampersand instead of ẹ, and I realized that since those four characters are still ASCII, they're as good as any symbols to represent those phonemes in modules and templates. I changed every instance I could find in the word entries, and I checked Category:E to check for stragglers, but at the time there weren't any, so I thought I'd gotten them all. Obviously, it seems I missed two of them.
But yes, the use of "o" and "u" as symbols is in the process of being retired, and I edited the parse function to no longer recognize them when I thought I'd at least already updated all the examples in the word entries. (I still need to edit Wiktionary:About Marshallese, and the examples in talk pages have the lowest priority at the moment.)
This time, I was quick to incorporate your most recent changes to the module code in my offline editing copy. But I admit...I don't understand the syntax text:gsub or what that snippet of code does. I didn't know Lua at all before a couple of weeks ago, and I've adapted to writing it much more quickly than I imagined possible, but that's thanks to where I've been able to convert my equivalent JavaScript knowledge. Though I understand your error-checking edits were to diagnose straggling "u" as the culprit, I don't fully understand what your added code actually does in regards to error message reporting. Could you please explain it, if possible? When I've gotten errors from the module, I've mainly just been browsing the stack trace and the line numbers of where the error was generated. - Gilgamesh~enwiki (talk) 18:52, 2 November 2019 (UTC)
Thanks for the further explanation.
I learned JavaScript (and C) after learning Lua, so I can try to explain the colon syntax by comparison with JavaScript. In JavaScript, a.method() is a method call and passes an implicit this, equal to a, to the method. In Lua, a:method() is the closest equivalent; it passes a as the first argument to the method. a.method() would call the method with no arguments. The functions in the string library are available when a string value is indexed (via the __index field in the metatable for strings), so if text is a string, text.gsub gives a function equal to string.gsub, and text:gsub(pattern, replacement) is equivalent to string.gsub(text, pattern, replacement), and is analogous to text.replace(regex, replacement) in JavaScript. text.gsub(a, b) would fail to pass text as the first argument to the function, so is equivalent to string.gsub(a, b): a is the string, and b is the Lua pattern. (Lua will throw a runtime error because the replacement value is required: "string/function/table expected".) In JavaScript, it would be sort of similar to do { const replace = text.replace; replace.call(a, b); }.
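A minimal sketch of the colon-versus-dot behaviour described above, using only the standard string library. The sample string and variable names are illustrative assumptions, not taken from the module.

```lua
local text = "banana"

-- Colon syntax: text is passed implicitly as the first argument.
local viaMethod = text:gsub("a", "o")

-- The explicit equivalent through the string library.
local viaLibrary = string.gsub(text, "a", "o")

-- Indexing a string value yields the library function itself.
local sameFunction = (text.gsub == string.gsub)

-- Dot syntax does not pass text, so this is string.gsub("a", "o"):
-- the replacement argument is missing and Lua raises a runtime error.
-- pcall catches the error so the script keeps running.
local ok, err = pcall(function() return text.gsub("a", "o") end)

print(viaMethod, viaLibrary, sameFunction, ok)
```

Both successful calls give the same string, and the dot call fails because the third argument is absent.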
The error messages I added were to avoid the incomprehensible error for indexing of a nil value for map[a][d] and such indexings. If local map = {} and local a = "u", then accessing map[a][anything] will cause the error "attempt to index field '?' (a nil value)" because map[a] is nil (there is no value indexed by a) and nil values can't be indexed in vanilla Lua. So I added a check that will prevent the "indexing of nil" error message, since I like error messages to be somewhat understandable (even though average users can't fix them). The error message might be wrong, since I was writing it quickly, and it's possible the check is no longer needed, if the module ensures that the transcription has correct phonotactics or syntax before that point. — Eru·tuon 20:29, 2 November 2019 (UTC)
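The kind of check described above might look roughly like the following. This is only a sketch: the table contents, the function name lookup, and the wording of the message are hypothetical and are not the module's actual code.

```lua
-- Hypothetical two-level lookup table with placeholder values.
local map = {
	a = { y = "a-y", w = "a-w" },
	e = { y = "e-y", w = "e-w" },
}

local function lookup(symbol, context)
	local row = map[symbol]
	if row == nil then
		-- Fail early with a readable message instead of letting
		-- map[symbol][context] raise "attempt to index ... (a nil value)".
		error("unrecognized symbol '" .. tostring(symbol) .. "'")
	end
	return row[context]
end

local found = lookup("a", "y")
local ok, message = pcall(lookup, "u", "y")
print(found, ok, message)
```

The first call succeeds; the second fails with the custom message (prefixed by the position information that error adds by default).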
Thank you. I didn't even know that calling syntax was possible in Lua, but it looks elegant. I'm tempted to use it more. - Gilgamesh~enwiki (talk) 06:52, 3 November 2019 (UTC)
So, to be clear, arg:func() is syntactic sugar for func(arg), right? And arg:func(a, b, c) is equivalent to func(arg, a, b, c)? - Gilgamesh~enwiki (talk) 07:12, 3 November 2019 (UTC)
Apparently it's not quite that simple... But I'd love to understand it. - Gilgamesh~enwiki (talk) 08:13, 3 November 2019 (UTC)
No, any old local or global variable can't be accessed with method syntax. For arg:func() to work, indexing arg.func (or arg["func"]) has to yield a function. So, setting func as a field in a table with local arg = { func = table.insert } enables it to be used as a method: arg:func("elem"). (The same can be done by setting the metatable for the table: local arg = setmetatable({}, { __index = { func = table.insert } }).)
In the Scribunto variety of Lua, we can only modify the fields or metatables of tables. As mentioned, strings have a metatable that allows using the functions in the string library as methods, but it can't be modified. — Eru·tuon 15:35, 3 November 2019 (UTC)
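To make the point about method lookup concrete, here is a small sketch in plain Lua. The variable names list and other are illustrative assumptions; the field is named insert here purely for readability.

```lua
-- The function must be reachable by indexing the value itself.
local list = { insert = table.insert }
list:insert("elem")   -- same as table.insert(list, "elem")

-- The same effect through a metatable: missing keys fall back to __index,
-- so the table itself carries no "insert" field.
local other = setmetatable({}, { __index = { insert = table.insert } })
other:insert("elem")

print(list[1], other[1], rawget(other, "insert"))
```

In both cases the element ends up at index 1; rawget shows that the second table finds insert only through its metatable.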
I see... - Gilgamesh~enwiki (talk) 23:01, 3 November 2019 (UTC)
Well, I hope I'm making sense. Methods in JavaScript and Lua are pretty similar apart from the this thing and the difference between prototypes and metatables. — Eru·tuon 17:24, 4 November 2019 (UTC)

Also, if you don't mind my asking, are there any thoughts or critiques you could offer on how I structure the module code, the things I'm doing in the functions, etc.? I'm trying not to make my code too convoluted, but I'm also consciously aware I'm exercising some degree of feature creep. And when I realized you were also exporting the internal conversion functions, I changed the export naming convention so that all such functions are prefixed with an underscore to indicate they are internal functions not intended for normal exported use rather than the actual exported functions. - Gilgamesh~enwiki (talk) 19:00, 2 November 2019 (UTC)

In regard to design, it would be simpler (at least conceptually, and for the testcases module) if the transcription-generating functions took a string and yielded a string, rather than an array of strings. Then multiple transcriptions can be handled by applying the functions multiple times. And it would be consistent with {{IPA}} to have the separate inputs in numbered parameters, rather than separate them with commas in a single parameter, and to bracket them separately: for instance, {{mh-ipa-rows|j&ngw&wil|jengwewil}} instead of {{mh-ipa-rows|j&ngw&wil, jengwewil}} yielding /tʲeŋʷewilʲ/, /tʲɛŋʷɛwilʲ/ as the phonemic transcription instead of /tʲeŋʷewilʲ, tʲɛŋʷɛwilʲ/. But this might complicate {{mh-ipa-rows}} or the module, so you should be the one to decide. — Eru·tuon 18:51, 5 November 2019 (UTC)
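The string-in, string-out design suggested above could be sketched as follows. Everything here is hypothetical: toPhonemic is a stand-in with a placeholder transformation, and transcribeAll is an invented helper, neither of which reflects the module's real functions or output.

```lua
-- Hypothetical stand-in: one input string in, one bracketed string out.
-- The uppercase conversion is only a placeholder for the real rules.
local function toPhonemic(input)
	return "/" .. input:upper() .. "/"
end

-- Apply the single-string function once per numbered parameter,
-- so each transcription is bracketed separately.
local function transcribeAll(args)
	local results = {}
	for i, input in ipairs(args) do
		results[i] = toPhonemic(input)
	end
	return table.concat(results, ", ")
end

local output = transcribeAll({ "j&ngw&wil", "jengwewil" })
print(output)
```

With this shape, the test cases only need to compare one input string with one output string, and handling several inputs is left to the caller.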

Okay, so to be clear...calling gsub with tbl is equivalent to function(match) return tbl[match] or match end? I thought if the item wasn't in the table, it might return nil or something, which is why I wrote it as a function that returns the item or match. Also, I noticed you replaced all those substitutions with "("..V..")(ː*)%1". I was honestly not aware it was possible to reference a capture within the same pattern. - Gilgamesh~enwiki (talk) 20:40, 4 November 2019 (UTC)

Yes, that's correct. Similarly, if a function supplied to gsub returns nil for a particular match, no change will be made to that match. For instance, both ("bat"):gsub(".", { ["b"] = "c" }) and ("bat"):gsub(".", function(char) if char == "b" then return "c" end end) return "cat". (Whereas in JavaScript if you do "bat".replace(/./g, function(char) { if (char === "b") { return "c"; } }) you get "cundefinedundefined". Heh.) — Eru·tuon 20:55, 4 November 2019 (UTC)
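The replacement behaviours discussed above can be checked side by side in plain Lua. The last example uses an ASCII colon as a stand-in for the length mark, because the byte-based string library does not treat multibyte characters as single units (the module uses mw.ustring.gsub for that); the replacement string there is an invented illustration.

```lua
local tbl = { b = "c" }

-- Table replacement: keys missing from the table leave the match unchanged.
local withTable = ("bat"):gsub(".", tbl)

-- Function replacement returning nil for most matches: also unchanged.
local withFunction = ("bat"):gsub(".", function(char)
	if char == "b" then return "c" end
end)

-- The explicit fallback gives the same result, so it is not required.
local withFallback = ("bat"):gsub(".", function(char)
	return tbl[char] or char
end)

-- %1 inside a pattern matches the same text as the first capture:
-- here, a vowel, optional length marks, then the same vowel again.
local merged = ("kaat"):gsub("(a)(:*)%1", "%1%2:")

print(withTable, withFunction, withFallback, merged)
```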
I appreciate what you've further done with the testcases, in making tests appear on the main module's page itself. And since I really didn't write any of the testcases script and am not sure what to change without breaking it, I should probably let you know that the MED/Choi/Willson stuff is not coming back. I don't know what I was thinking, putting linguists' conflicting vowel symbols in pronunciation sections as if they were different dialects—that was really unwise of me to begin with. - Gilgamesh~enwiki (talk) 12:10, 5 November 2019 (UTC)
@Gilgamesh~enwiki: In case you haven't noticed, I've made the testcases on Module:mh-pronunc/documentation to compare the outputs of Module:mh-pronunc and Module:mh-pronunc/sandbox. In each of the table cells for which the sandbox module differs, its output is shown below the output of the main module. — Eru·tuon 16:13, 6 November 2019 (UTC)
Yes, I noticed. It helps. Though I still don't quite understand how you're getting that word list programmatically, as it hasn't seemed to have updated since I added new word entries on the wiki. - Gilgamesh~enwiki (talk) 16:36, 6 November 2019 (UTC)
The list of pages and template inputs isn't automatically updated; I generated it from this list of all {{mh-ipa-rows}} templates, which I made two days ago with Pywikibot. I can regenerate it soon if you like. — Eru·tuon 16:41, 6 November 2019 (UTC)
Oh. Okay, that makes sense. - Gilgamesh~enwiki (talk) 17:24, 6 November 2019 (UTC)

Since you've been helping me maintain the module code, I thought I should let you know that I made some major changes to the code structure. I wrote a new local function, gsubBatch, to help reduce boilerplate in the source, since gsub is called a lot and I wanted to streamline it. - Gilgamesh~enwiki (talk) 23:55, 13 November 2019 (UTC)
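The actual gsubBatch is not quoted in this thread, so the following is only a guess at the general shape such a helper might take: a hypothetical function that applies a list of pattern and replacement pairs in order. It uses string.gsub as a stand-in for whichever gsub the module calls.

```lua
-- Hypothetical reconstruction, not the module's code:
-- apply each { pattern, replacement } pair to the text in sequence.
local function gsubBatch(text, substitutions)
	for _, pair in ipairs(substitutions) do
		-- Assigning to one variable keeps only the resulting string
		-- and drops the match count that gsub also returns.
		text = string.gsub(text, pair[1], pair[2])
	end
	return text
end

local result = gsubBatch("banana", {
	{ "a", "o" },
	{ "n", "m" },
})
print(result)
```

A helper like this trades repeated assignment statements for one data table, which is shorter to read but hides which substitution was running when something goes wrong.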

@Gilgamesh~enwiki: I like it. You might want to take a look at this edit applying the useful behavior of the function replacement value. I think it makes the code more readable. — Eru·tuon 21:01, 14 November 2019 (UTC)

My gsubBatch function may not have been as wise as I once thought. Though it makes code more elegant to read, it can actually make it harder to debug, because errors that occur inside anonymous functions don't seem to report their line numbers if they generate an error, which in a long batch makes it harder to determine where the error came from. I may find myself restructuring code again, but if a lot of sequential gsub calls are necessary, I think I'd rather reduce the length of some variable names, because the sheer amount of boilerplate can be awful. - Gilgamesh~enwiki (talk) 00:55, 18 November 2019 (UTC)

@Gilgamesh~enwiki: Hmm, this should be an improvement. However, if you aren't aware, you can click the Lua error to get a backtrace (assuming JavaScript is working). — Eru·tuon 05:20, 18 November 2019 (UTC)
If I adopt a gsubBatch mechanism again, I'll look into it. - Gilgamesh~enwiki (talk) 17:05, 19 November 2019 (UTC)

I just noticed a strange abundance of words in the table spelt "Wiktionary:About Marshallese", with six different phonological forms. :) Also, been adding more words up to moments ago. - Gilgamesh~enwiki (talk) 17:05, 19 November 2019 (UTC)

@Gilgamesh~enwiki: Yeah, I wasn't sure if you had gotten all the new transcriptions, so I ran the Pywikibot script. It prints the contents of the transclusions of {{mh-ipa-rows}} in Wiktionary:About Marshallese as well as in entries; then I have to remove the unwanted titles. I added a list of titles to exclude so that in the future the unwanted titles can be automatically removed. Perhaps alternative spelling entries could just be soft redirects using {{alternative spelling of}}, without any definition or pronunciation (because both of those are the same for all spellings). I changed M̧ajōļ to an alternative spelling entry for M̧ajeļ based on something you said in the Wikipedia discussion, but am not sure about the others. — Eru·tuon 17:18, 19 November 2019 (UTC)
Yeah, the orthography takes a while to get a feel for. I'm still learning new mini-rules about it, especially recently since I started writing that script. Of the examples at the top of my head, where Bender phonemes are otherwise identical...
  • io̧kwe over iakwe or yokwe. io̧kio̧kwe isn't difficult from there.
  • eok over yuk, etc. The Marshallese new orthography, strictly speaking, has no Y.
  • jukwa over juga. The new orthography has no G, either. Just AĀBDEIJKLĻMM̧NŅN̄OO̧ŌPRTUŪW.
  • wōja over oja, and similar examples.
  • Wūjae over Ujae, and similar examples.
  • I'm not 100% sure whether Bok-ak or Bokaak should be considered primary. I'm guessing Bok-ak, because Bokaak unusually spells out an epenthetic vowel that the new orthography largely avoids.
  • Between spaces, hyphens and unspaced unhyphenated compound words, there's really no difference in pronunciation, so just one can be picked from multiple. Multiple words undergo assimilations in uninterrupted speech, and individual morphemes of words can be enunciated as needed. The logic of that is...a work in progress; I'm still trying to reconcile the differences between normal vowels and epenthetic vowels when they neighbor glide consonants {y h w}. Anyway, I'd probably go with unhyphenated words over hyphenated ones, and hyphenated words over spaced words.
  • Note overall that as I've written vowel simplifications into the module, I've largely been following orthographic norms in deciding which surface vowel to express. And I've been trying to leave notes as to "{this} is [that], not [that]", etc.
And thank you again. :) - Gilgamesh~enwiki (talk) 19:49, 19 November 2019 (UTC)
And Jāmo̧ over Jemo̧. - Gilgamesh~enwiki (talk) 22:35, 19 November 2019 (UTC)

Efficiency

I may have significantly increased the module's execution time, which may be extending table load times. I changed it so that forRemainder is actually (pretty much unconditionally) called twice and the duplicate result discarded. This is for careful mode (variable name subject to change), to satisfy inconsistencies between the way Bender (1968) and Willson (2003) described the language, and the more careful pronunciations prescribed by Naan (2014). Basically, in careful mode, the nasal consonant cluster assimilations are avoided, there's a handful more cases where clusters have epenthesis instead of assimilation, and the behavior of epenthetic vowels neighboring glides has changed. I don't necessarily see an inconsistency in including both, since most languages (including English) have words or phrases that differ notably in pronunciation when spoken more rapidly or more slowly, and can change how people perceive the word in their own speech. Compare "ornge" vs. orange, where some people primarily speak it as two syllables, and some (like me) say it as one syllable. - Gilgamesh~enwiki (talk) 20:02, 20 November 2019 (UTC)

Yes, execution time is definitely way up according to the "Lua time usage" measurement (at the bottom of the edit page). According to the profile as I am writing this, 4160 ms (85.2%) of that is mw.ustring.gsub. It's not a very efficient function because it's implemented using PHP regex and calls go over the Lua–PHP boundary. Sometimes the number of calls can be reduced by generalizing the patterns (regexes) and using a function replacement. — Eru·tuon 20:17, 20 November 2019 (UTC)
By the way, I like how the "careful" mode avoids assimilations. Assuming Arņo is a native word, it seems strange for the r to be assimilated into a ņ, when the only reason for the r to be in the spelling is if it is sometimes pronounced. Otherwise, it should be Aņņo. Similarly with Aujtōrōlia, which could be Auttōrōlia, though since it's a loanword and the j might be needed to represent the original s, it's not very strong evidence against assimilation. — Eru·tuon 20:44, 20 November 2019 (UTC)
Youch... So would it actually be more efficient to pass a function substitutor argument than a string substitutor argument? I'm all for increasing the efficiency of the script by whatever practical means available. It is also my very first Lua script.
And yes...Marshallese orthography has always been a strange creature. The new orthography since the 1970s is not purely phonemic, obviously, if you compare it with Bender's phonemes, but is designed so that syllables in isolation are reasonably easy to learn how to pronounce once you learn which sound each letter stands for, and is something foreigners (most of whose languages do not have vertical vowel systems) can more easily learn to pronounce. Native speakers of the language already know words in isolation, and know how to string them together into compound words and sentences, so their orthography can simply string together morphemes and allow epenthesis, sandhi, assimilations, etc. to take their natural course. In this way, it also preserves the morphemic structure and thus more of the etymology of words, in an orthographic approach also preferred in languages like French and Icelandic. Arņo is a compound name of two morphemes: ar "lagoon beach" and ņo "wave". If you simply write the assimilations and write it Aņņo, the etymology is relatively more obscured. What seems to be relatively new to the equation is learning how to pronounce words as they are written in a stable orthography already provided. This means that some consonant clusters that were previously routinely assimilated, may now be enunciated more carefully by people who have learnt to read and write at school. Spellings like kw increasingly are no longer taken as single consonant phonemes, but as sequences of k and w. Two-syllable words like io̧kwe may instead come to be analyzed as three-syllable words because of how they are written. rn is pronounced as two different consonants because it is written that way. I've seen evidence of these trends in the pronunciation guides prescribed by Naan (2014), my discovery of which led me to rethink how to write the Lua module. 
I honestly can't say I know how realistic these "careful" pronunciations are among native Marshallese speakers (some of it may well be more artificial than not), but it certainly seems to be increasingly how Marshallese is taught, at least in a college environment. If only we had more access to more native Marshallese speakers, but internet access is too expensive and unreliable for most of the population. (I'm impressed that the undersea fiberoptic cable connecting Majuro to Guam manages to span the Marianas Trench.) - Gilgamesh~enwiki (talk) 22:09, 20 November 2019 (UTC)
I just noticed you made changes to the script. I haven't fully assessed the changes yet, but I've seen just enough to pique my interest. - Gilgamesh~enwiki (talk) 22:31, 20 November 2019 (UTC)
Yeah, I think a function substitution can be more efficient. The function replacement handling assimilation is slightly faster, if the "Lua time usage" figures for the "before" and "after" versions of the module are accurate. (But sometimes the figures vary unpredictably. Greater differences are less likely to be the result of chance.) It means only one mw.ustring.gsub call to handle all assimilations, and perhaps the overhead of calling a function for every series of two consonants is less than the overhead of multiple calls to mw.ustring.gsub. I think that's plausible because of all that PHP has to do for each mw.ustring.gsub call.
I didn't realize Arņo was a compound (naturally, since I'm pretty ignorant). That does provide an explanation for the spelling, even if there's assimilation. — Eru·tuon 22:35, 20 November 2019 (UTC)Reply
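As a rough illustration of the single-call idea discussed above, here is a minimal sketch that uses only the byte-based string library so that it can run outside Scribunto. The rule table, the consonant set, and the function names are toy assumptions for illustration and are not the rules in Module:mh-pronunc.

```lua
-- Toy assimilation rules (hypothetical; not the module's actual rules).
local RULES = { rn = "nn", rl = "ll" }

-- Single-call version: the function runs once for every pair of consonants,
-- and returning nil leaves that pair unchanged.
local function assimilate(text)
	return (string.gsub(text, "[lmnr][lmnr]", function(pair)
		return RULES[pair]
	end))
end

-- Multi-call version for comparison: one gsub call per rule.
local function assimilate_multi(text)
	for from, to in pairs(RULES) do
		text = string.gsub(text, from, to)
	end
	return text
end
```

Both versions give the same result for these toy rules; the difference is that the first makes one pattern-matching call regardless of how many rules there are.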
Is it all right if I rename the substitutor function's variables? Not just because I generally start non-constant variable names with a lowercase letter, but C2 already exists as a separate higher-scope variable, and using a different variable name may reduce the risk of variable name confusion and make the code more readable.
And s'fine. A lot of common Marshallese morphemes are only two letters long, and there was no Wiktionary Marshallese entry for ar yet anyway. - Gilgamesh~enwiki (talk) 22:42, 20 November 2019 (UTC)Reply
Yeah, the variable name duplication is not a good idea. I noticed it and was displeased. I do prefer somewhat descriptive variable names over "a, b, c, d" though. — Eru·tuon 22:47, 20 November 2019 (UTC)Reply
I tend to think of captures as a, b, c, d as a sequence of captures, and easier on the eyes than letter-numbering them like c1, c2, c3, c4, etc. Anyway, I think I know what you're trying to accomplish. Your code broke some of the (as yet unused) nʷtˠ logic, but what you're doing here looks very, very clever and I think I know how to take it and run with it with other parts of the code. - Gilgamesh~enwiki (talk) 22:58, 20 November 2019 (UTC)Reply
Well, the variable names C1, A1, C2, A2 were abbreviations of "consonant 1", "articulation 1", "consonant 2", "articulation 2" (though that's not completely accurate terminology, since it's more like primary and secondary articulation), so more descriptive than either a, b, c, d or c1, c2, c3, c4. — Eru·tuon 23:03, 20 November 2019 (UTC)Reply
I've thought of it: x, xx, y, yy. It helps that neither X nor Y is in the standard new orthography. And when I realized what you were doing, I rewrote your function. May I demonstrate...? - Gilgamesh~enwiki (talk) 23:36, 20 November 2019 (UTC)Reply
Ahh, that's much more readable! — Eru·tuon 01:20, 21 November 2019 (UTC)Reply
Thanks. :D And I'm not even done yet. You gave me the idea, and I'm running with it. About to try another edit. - Gilgamesh~enwiki (talk) 02:14, 21 November 2019 (UTC)Reply

In response to your question, "Why did the epenthetic vowel disappear between the p and the k in Āneeļļapkaņ?", the pattern is not matching the /pʲkˠ/ when mw.ustring.gsub is called the second time, because /lˠlˠ/ is not changed when mw.ustring.gsub is called the first time, and is matched both times. Here is a technique for cases like this that also allows mw.ustring.gsub to be called only once. (Gah, in the edit summary I meant to say "getting the surrounding consonants with mw.ustring.sub", not "mw.ustring.gsub".) — Eru·tuon 02:54, 21 November 2019 (UTC)Reply
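The linked technique is not reproduced here. As a rough illustration of the general idea (consume only one consonant in the match, and read its neighbour with a sub call so that the neighbour is still available to start the next match), here is a sketch using the byte-based string library. The consonant set, the "@" placeholder for an epenthetic vowel, and the function names are assumptions made for this example only.

```lua
local CONSONANT = "[klmnpr]"

-- Consuming version: each match swallows both consonants, so in a run of
-- three consonants the middle one cannot also begin a second match.
local function epenthesize_consuming(text)
	local pattern = "(" .. CONSONANT .. ")(" .. CONSONANT .. ")"
	return (string.gsub(text, pattern, function(a, b)
		if a ~= b then
			return a .. "@" .. b
		end
	end))
end

-- Peeking version: only the first consonant is matched; the following
-- character is read with string.sub using the captured position.
local function epenthesize(text)
	return (string.gsub(text, "()(" .. CONSONANT .. ")", function(pos, c)
		local nxt = string.sub(text, pos + 1, pos + 1)
		if nxt ~= c and string.find(nxt, "^" .. CONSONANT .. "$") then
			return c .. "@"
		end
	end))
end
```

With the input "alpka", the consuming version only separates the first pair, while the peeking version separates both pairs in a single pass.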

Your solution with the i and j indices was clever. (I renamed them xvi and yvi.) It all...seems to work now. Now let's see if I can rewrite the logic of another expensive regex batch without breaking it too badly.
Oh, and...the table's Rālik vs. Ratak logic seems reversed. When both forms are the same, it shows two table cells. But when the forms differ, it only shows the Rātak form. - Gilgamesh~enwiki (talk) 03:13, 21 November 2019 (UTC)Reply
How much time do you think was shaved off the module's execution, comparing right after I added "careful" mode to when we rewrote this regex batch? - Gilgamesh~enwiki (talk) 03:15, 21 November 2019 (UTC)Reply
Whoops, fixed the logic. Glad you spotted it.
It is apparently somewhat faster; I previewed Module:mh-pronunc/documentation three times with the old version and the new version, and got 5.3 or 5.4 or 7.1 seconds and 4.5 or 4.6 or 3.0 seconds respectively. Significant variation, so it's hard to say just how much faster, but there wasn't overlap. The number of calls to mw.ustring.gsub in Module:mh-pronunc in the generation of the testcases table (counted thus) has been reduced from 228,294 to 156,516.
We should probably be editing Module:mh-pronunc/sandbox to avoid changing transcriptions in entries (and avoid asking the server to update pages).... — Eru·tuon 07:18, 21 November 2019 (UTC)Reply
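The counting method linked above is not shown in this discussion. One simple way to get such a figure, sketched here as an assumption rather than as the method actually used, is to route calls through a wrapper that increments a counter. The example wraps string.gsub so that it runs in plain Lua; the same shape would apply to any other function.

```lua
local call_count = 0

-- Drop-in replacement for string.gsub that records how often it is called.
local function counted_gsub(...)
	call_count = call_count + 1
	return string.gsub(...)
end

-- Returns the number of calls made through the wrapper so far.
local function get_call_count()
	return call_count
end
```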
So, edit sandbox for experimental code, and the main module for stable milestones? Yeah, I can see how that's a good idea. - Gilgamesh~enwiki (talk) 13:36, 21 November 2019 (UTC)Reply

I've been considering an alternative approach to programming the phonetic algorithm. As it currently stands, the regex approach is effective in thoroughly processing the input text, but it has also proven far less efficient than I predicted. Putting more logic into substitutor functions improves the performance somewhat, but in a process where regex replaces matches one by one, it's not as practical for making necessary adjustments to vowels that were already replaced. For example, this existing code:

				-- {yekʷey, yewan} are [ɛɡʷɛ, ɛwɑnʲ], not [ɛ̯ɔɡʷɛ, ɛ̯ɔwɑnʲ]
				text = gsub(text,
					"(ɦʲ@*)([ɔou])(@*.ʷ.?ʷ?@*[æɛeiɑʌɤɯ])", function(a, b, c)
						return a..VOWELS_Y[b]..c
					end)

Unlike other logic that replaces text based on what already exists to the match's left-hand side, this replacement can only be made if the stable value of the vowel on the right is already known. This is how I earlier solved the Ānewātak problem so that its phonetics were properly displayed as [ænʲeːwæːtˠɑk] instead of [ænʲeowæːtˠɑk]. In a more optimized approach, that could be fixed in a second regex pass, but I think I have a better idea—I just don't know beforehand how practical it will be.

Basically, my idea is, instead of relying so much on regex, just parse the input text and represent its data as a doubly linked list of table objects, where each node represents either a consonant or a vowel. Code could loop through the link nodes, make changes in them informed by nodes that come before or after, and can make secondary changes to previous node data as needed. Then, when the linked list is done being manipulated, convert it back to text.

But can this all be done in Lua using only linked lists and logic, more efficiently than batches of regex replacements can do it? - Gilgamesh~enwiki (talk) 18:46, 22 November 2019 (UTC)Reply
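To make the linked-list proposal concrete, here is a minimal sketch under stated assumptions: one node per UTF-8 code point, a single toy rule that changes an earlier node based on a later one, and conversion back to text at the end. The node fields, function names, and the rule itself are hypothetical and say nothing about whether this would outperform the regex batches.

```lua
-- Split UTF-8 text into a doubly linked list with one node per code point.
local function parse(text)
	local head, tail
	for ch in string.gmatch(text, "[%z\1-\127\194-\244][\128-\191]*") do
		local node = { value = ch, prev = tail }
		if tail then
			tail.next = node
		else
			head = node
		end
		tail = node
	end
	return head
end

-- Toy rule (hypothetical): an "r" directly before an "n" becomes "n".
-- The change is made to the previous node once the later node is seen.
local function assimilate_nodes(head)
	local node = head
	while node do
		if node.value == "n" and node.prev and node.prev.value == "r" then
			node.prev.value = "n"
		end
		node = node.next
	end
	return head
end

-- Join the node values back into a string.
local function to_text(head)
	local parts, n = {}, 0
	local node = head
	while node do
		n = n + 1
		parts[n] = node.value
		node = node.next
	end
	return table.concat(parts)
end
```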

I'm not sure, but I think it could end up being faster because the overhead of many mw.ustring.gsub calls is considerable. It could also reduce memory because fewer intermediate strings would be created. But I'm speculating.
I haven't done anything quite like this; the closest thing is the pair of functions make_tokens in Module:grc-utilities and tr in Module:grc-utilities. The former processes Greek characters into "tokens" (sub-sequences, mainly to handle diphthongs and single vowels correctly), and uses objects to represent the characteristics of the Greek characters, and the latter processes the tokens to create a transliteration. Not super elegant, but my version of the tokenization function was much faster than the previous one, probably because it got rid of most of the calls to mw.ustring functions.
Using a doubly linked list is an interesting idea. It could be more elegant, though I can't imagine all the details of how it could work. — Eru·tuon 03:24, 24 November 2019 (UTC)Reply
Well, practically any grc script has to be easier to maintain than the pre-Scribunto version, which I wrote back in the day. That was such a beast... - Gilgamesh~enwiki (talk) 14:37, 24 November 2019 (UTC)Reply
Wait...you said mw.ustring functions were inefficient. Does that include mw.ustring.sub? - Gilgamesh~enwiki (talk) 14:40, 24 November 2019 (UTC)Reply
mw.ustring.sub is noticeably inefficient when there are many calls, for instance when you iterate through strings using for i = 1, mw.ustring.len(str) do local character = mw.ustring.sub(str, i, i) end. In the previous version of the tokenization function, mw.ustring.sub was called up to about three times for every code point in the string. My impression is that that explained most of the inefficiency in the old version of the function, though it's not a great testcase because the old and new versions are so different. The overhead is probably not as noticeable in the function replacement in Module:mh-pronunc though, where it currently has only 2,028 calls, as opposed to 115,872 for mw.ustring.gsub to create the testcases table. (And I guess mw.ustring.gsub probably has greater overhead.) It's not so inefficient that the function should be avoided altogether.
I should say, the module is already efficient enough in entries (it looks like {{mh-ipa-rows}} takes about a twentieth of a second in entries), so don't feel obligated to remodel it for that reason at least. (Not to discourage you from rewriting it if you want to – I do quite a bit of random rewriting of modules for various reasons.) — Eru·tuon 23:08, 24 November 2019 (UTC)Reply
It's not just Wiktionary I have to think about. I want to also be able to migrate the code to Wikipedia. Most WP articles where it would be relevant might need it only once, but that is not true of articles like Kwajalein Atoll, where Marshallese names are provided for all the notable islets: many of them are notable, but most are not notable enough to get separate articles of their own. And some of these islands have two or three separate Marshallese names depending on context. Obviously, being WP, pronunciations aren't embedded in the same format as Template:mh-ipa-rows, and perhaps that means fewer functions called, but toPhonetic would certainly be called multiple times in an article like that. I'd rather not add that much extra load time there. - Gilgamesh~enwiki (talk) 00:12, 25 November 2019 (UTC)Reply
Also, as I've tried to write linked list code, I'm realizing that I'm still creating a beast of a different kind: far fewer mw.ustring calls, but immensely more bloated code. I get the impression that functions like mw.ustring.sub are so expensive because the strings are probably encoded in UTF-8, and the logic required to seek code point indices (or worse, conceivably to convert between UTF-8 and UTF-16 and back) may involve a lot of overhead if called often enough; I'm not sure which of these things, if any, is actually being done. Obviously we're working with a lot of Unicode text and the data needs to be preserved in that format.
I wonder...what if I completely redesign the internal code format (returned by parse and passed to the other internal functions) to use only ASCII surrogates and byte-based string functions for the text-crunching, and then convert them to Unicode forms to represent their final forms? Are there also byte-based functions available for regex that are more efficient? - Gilgamesh~enwiki (talk) 00:12, 25 November 2019 (UTC)Reply
I just had a thought. Many calls to mw.ustring.sub can be expensive, right? But most of the time I only need a single Unicode character. What if I...split a string into an array of characters first, and just reference the array's indices? No dynamic linear behavior involved in retrieving an indexed Unicode code point from a byte string. - Gilgamesh~enwiki (talk) 02:00, 25 November 2019 (UTC)Reply
Hm, yeah, maybe some Wikipedia articles could invoke the module enough to noticeably increase Lua time usage. There are quite a few words in Kwajalein Atoll that could have IPA transcriptions.
I certainly hope mw.ustring.sub doesn't do any conversion between UTF-8 and UTF-16. That would be madness. I found that the implementation of mw.ustring.sub calls mb_substr in PHP, which calls mbfl_substr, but I didn't figure out what it does to UTF-8.
The byte-based functions are the string library functions (the ones that can be called as methods on strings). They are much more efficient because they call directly into C and don't have to deal with UTF-8 or Unicode categories. But using ASCII replacements for the Unicode characters sounds like a bit of a pain; it could make the intermediate forms a bit harder to understand.
Yeah, using an array of characters should be cheaper if you're calling mw.ustring.sub to get multiple characters from the same string. To be super cheap, I would use string.gmatch: function get_character_array(str) local arr, i = {}, 1 for char in string.gmatch(str, "[%z\1-\127\194-\244][\128-\191]*") do arr[i] = char i = i + 1 end return arr end. — Eru·tuon 05:38, 25 November 2019 (UTC)Reply
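For readability, here is the one-line function from the comment above laid out as a block. The pattern matches one UTF-8 lead byte (or ASCII byte) followed by any continuation bytes, so each array element is one code point of one to four bytes. It assumes the input is valid UTF-8.

```lua
-- Split a UTF-8 string into an array with one code point per element,
-- using only the byte-based string library.
local function get_character_array(str)
	local arr, i = {}, 1
	for char in string.gmatch(str, "[%z\1-\127\194-\244][\128-\191]*") do
		arr[i] = char
		i = i + 1
	end
	return arr
end
```

After one call, individual code points can be read by index (for example `arr[3]`) without further pattern matching.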
I'm increasingly wondering if UTF-16 isn't involved under the hood at all. But then, Unicode code point operations on UTF-8 data still means that the functions cannot know in advance which byte index contains which code point index, which means that it has to measure from the start of the string. That means linear behavior, and that isn't much better than converting the whole string to UTF-16.
Anyway, the string-to-character-array code I had in mind was mw.text.split(text, ""), called only once before a major mw.ustring.gsub operation whose substitutor function would have otherwise needed mw.ustring.sub multiple times per match. I hadn't considered your string.gmatch approach before, but it looks interesting. Might there be a way to expand it to work with three- and four-byte UTF-8 code points?
And yeah, trying to find an ASCII-based surrogate code has proven...challenging, to the point I think maybe I won't do it. I tried to design a Unicode-to-ASCII-to-Unicode cipher mostly based on X-SAMPA, but it had its constraints, and a lot of X-SAMPA sequences use two or more ASCII characters where Unicode IPA would only use one code point. It's fortunate I'm pretty knowledgeable in X-SAMPA, which greatly improved since I wrote an offline JS utility (downloadable here) that automatically converts X-SAMPA input to IPA as you type. (I wrote it several years ago, and my coding conventions have certainly improved since then, so don't be too horrified if you view source. If I could write the identical utility today, there would be so many things I'd change. But I digress.) So, to try to come up with a one-code-point-to-one-character cipher, I had to think of ways to simplify some sequences. [æɛeiɑʌɤɯɒɔou] already has a one-to-one conversion with {EeiAV7MQOou, but when writing regex sequences, { would have to become %{, so I could just replace it with a instead. The secondary articulations is where it gets trickier, as the equivalents of [ʲ ˠ ʷ] are ' _G _w. Since I only use [w] as a final phonetic presentation form, I could conceivably just use j G w, but it's again complicated where the X-SAMPA equivalent of [ɦ] is h\. Lots of these little things call for lots of little simplifications, until you get to the point where the internal string /ɦʲænʲeɦʲelˠlˠæpʲkˠænˠ/ (Āneeļļapkaņ) has a pseudo-X-SAMPA appearance of hjanjehjelGlGapjkGanG, and...I end up kinda not wanting to go that route anymore. Regex and the algorithm can already get complex enough without making the internal IPA so much harder to read. - Gilgamesh~enwiki (talk) 16:27, 25 November 2019 (UTC)Reply
Oh, just now realized that your "[%z\1-\127\194-\244][\128-\191]*" does support three- and four-byte code points. - Gilgamesh~enwiki (talk) 16:36, 25 November 2019 (UTC)Reply
Wait, your example code just grows an array by assigning new indices to the end of it? That seems bad to me from a JS background, where an array becomes much more inefficient unless you grow it with array.push(element). You sure that doesn't hurt array storage efficiency on the JIT side? (Or does Scribunto/Lua not use a JIT anyway?) I'd probably find myself writing it with push's Lua equivalent, table.insert. - Gilgamesh~enwiki (talk) 16:41, 25 November 2019 (UTC)Reply
Huh... Okay, then, your approach is better. :) - Gilgamesh~enwiki (talk) 16:44, 25 November 2019 (UTC)Reply
Hm, is it generally safe (and hopefully performs better) to use byte-string-based regex functions on UTF-8 strings in situations where it doesn't have to care how the Unicode code points are encoded? UTF-8 searches, UTF-8 replacements, etc. It seems to me like it would only really get unsafe if you tried to mix non-ASCII characters into single-character regex logic ([xyz] x? x* x+ etc.), as it would test for the byte rather than the codepoint. But stuff like simple substring replacements and multi-character captures (xyz) could be fine even with UTF-8 code points included. - Gilgamesh~enwiki (talk) 17:02, 25 November 2019 (UTC)Reply
table.insert isn't any more efficient than t[i]. As mentioned in the link, it's actually slower because of the two meanings that table.insert has (table.insert(t, val) vs. table.insert(t, i, val)). Scribunto doesn't use LuaJIT. It would probably improve performance to allocate the entire array at once with { nil, nil, nil, ... }, but that requires knowing the number of code points and having a function that can return that many nils.
Yep, those are two cases in which the string library doesn't work with multi-byte characters; also several of the character classes like %s are Unicode-dependent in the mw.ustring library. I wrote a little about this at WT:LUA § Ustring patterns and created Module:User:Erutuon/patterns, which contains a function that tests whether a pattern will match correctly (according to UTF-8 and Unicode semantics) in the string library functions.
I imagine that converting UTF-8 to UTF-16 and back requires memory allocation, so there should be a significant performance penalty if mw.ustring.sub is implemented that way. Certainly indexing UTF-8 by code point is slower than byte indexing, but I imagine with this decoding technique it could be fairly fast. — Eru·tuon
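The two cases discussed above can be shown with the byte-based string library alone. This is a small sketch in which the characters are written as byte escapes so that their encoding is unambiguous; the variable names are chosen for this example.

```lua
local n_lower = "\197\134" -- ņ (U+0146), UTF-8 bytes C5 86
local n_upper = "\197\133" -- Ņ (U+0145), UTF-8 bytes C5 85

-- Safe: a multi-byte character used as a literal substring is matched as a
-- whole byte sequence, so this replaces exactly one ņ.
local safe = string.gsub("Ar" .. n_lower .. "o", n_lower, "n")

-- Unsafe: inside a set, the two bytes of ņ become two separate set members.
-- The first byte of Ņ (C5) is in the set, so it is replaced on its own and
-- the second byte (85) is left behind, producing invalid UTF-8.
local broken = string.gsub(n_upper, "[" .. n_lower .. "]", "x")
```

The same byte-level behaviour applies to quantifiers such as `?`, `*`, and `+` placed after a multi-byte character: they apply only to its last byte.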

I've given the theoretical Unicode-to-ASCII-pseudo-X-SAMPA cipher more thought, and I believe if I were to use it, it would look something like this:

p b t d z k ɡ m n ŋ r l ĭ ī ɣ ɦ ɧ _ ʲ ˠ ʷ æ ɛ e i ï ɑ ʌ ɤ ɯ ɒ ɔ o u ◌̯ ː ◌͡◌
p b t d d k g m n N r l y Y H h H _ j G w a E e i I A V 7 M Q O o u ^ : =

Because, on second thought, hjanjehjelGlGapjkGanG is rather hard to read, but then, so is /ɦʲænʲeɦʲelˠlˠæpʲkˠænˠ/. These are internal formats, not display formats (even the internal IPA is pseudo-IPA), and at least X-SAMPA is well documented enough for a pseudo-X-SAMPA approach to be viable. I'm still working with code ideas offline. - Gilgamesh~enwiki (talk) 21:23, 26 November 2019 (UTC)Reply
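As a sketch of how the ASCII-to-IPA direction of such a cipher could work, the following uses a lookup table as the replacement argument of string.gsub, which is possible because every ASCII symbol is a single byte. Only a subset of the pairs from the table above is included, and the table and function names are assumptions for this example.

```lua
-- Partial ASCII-to-IPA table taken from the pairs listed above.
-- Characters without an entry (p, t, k, n, l, e, i, o, u, ...) pass through.
local TO_IPA = {
	h = "ɦ", j = "ʲ", G = "ˠ", w = "ʷ", N = "ŋ", g = "ɡ",
	a = "æ", E = "ɛ", A = "ɑ", V = "ʌ", ["7"] = "ɤ", M = "ɯ",
	Q = "ɒ", O = "ɔ",
}

-- Convert the internal ASCII form to its IPA form in one gsub call.
-- A missing table entry leaves the matched byte unchanged.
local function to_ipa(ascii)
	return (string.gsub(ascii, ".", TO_IPA))
end
```

For example, `to_ipa("hjanjehjelGlGapjkGanG")` gives the internal IPA string quoted earlier for Āneeļļapkaņ.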

I've tried a variety of coding approaches, and I'm realizing there may be no real substitute for batches of regex. Regexp can be written fairly concisely, and the more bloated code becomes, the harder it is to read. And after multiple attempted rewrites, I've found that I've stopped writing comments to reduce mental gear-shifting. Well-written code doesn't need many comments anyway. I just want to write something that balances readability with efficiency. Fortunately, I've had decent success with the pseudo-X-SAMPA approach in concept, and I can minimize the use of UTF-8 regex functions and rely more on faster functions like string.gsub. (At least I hope it's faster...) - Gilgamesh~enwiki (talk) 08:16, 2 December 2019 (UTC)Reply

This revision does seem to be noticeably more efficient than this: about 1.7 seconds versus 2.7 or so. Since some of that is the less efficient Module:mh-pronunc, I guess the sandbox module takes 1.7 - (2.7 / 2) = 0.35, or roughly 0.4 seconds. But there is a tradeoff between efficiency and readability. 20:34, 2 December 2019 (UTC)
I wonder...how are Lua's regular expressions functions implemented? string.gsub, string.find, etc. I cringe to think that the engine has to compile a new regex edifice every time the regex code is passed to one of these functions. I hope they are at least being cached between calls, either in an internal hashtable or attached to the internalized pattern strings themselves. - Gilgamesh~enwiki (talk) 02:08, 3 December 2019 (UTC)Reply
Since Lua patterns are so much simpler than proper regular expressions, they're just interpreted. You can see the pattern-interpreting function used by all of the string-library pattern-matching functions, except string.find when the plain flag is set, here. — Eru·tuon 04:15, 3 December 2019 (UTC)Reply
I see... I hadn't considered that. Keeping it simple means implementing it simple. - Gilgamesh~enwiki (talk) 04:27, 3 December 2019 (UTC)Reply

I finished writing the new draft and ironing out the bugs, and replaced the non-sandbox version with it. How does the performance compare now with the previous version? - Gilgamesh~enwiki (talk) 21:32, 5 December 2019 (UTC)Reply

Wow! Considerably faster for the whole testcases table: less than half a second. — Eru·tuon 22:52, 5 December 2019 (UTC)Reply
Seems like a winner, then. And the code is readable? The pseudo-X-SAMPA isn't too much trouble? I had to deviate significantly for some symbols, like c J h H y Y a I @ which do not represent their conventional X-SAMPA counterparts, for the sake of being more regex-pattern-friendly and single-character-friendly. The way I use them, c is actually [t͡s], J is [d͡z], h and H are transitional representations of unsurfaced and surfaced glides, y is {yi'y} ([i̯]), Y is {'yiy} ([iː]), a is [æ] ({ isn't as readably regex-friendly), I is a dotless [ı] that is friendlier to IPA tie bars, and @ is the diacritic [◌̆]. Otherwise (unless I've forgotten any), the symbols are the same as their X-SAMPA counterparts (or _-notated forms thereof), which are mostly the same as their IPA counterparts when they are plain Latin lowercase letters. The system works well. (Right now, in edit preview, it complains that [ı] is invalid IPA, but the choice is really just to keep the tie bar from hovering so much higher than over other pairs of vowels when [i] is present: [u͡i] vs. [u͡ı]. If it proves problematic, it can be reverted to [i]. I just wanted to polish the presentation a bit, which makes a difference with certain IPA typefaces like Gentium and certain browsers like Firefox.) - Gilgamesh~enwiki (talk) 01:35, 6 December 2019 (UTC)Reply
It looks pretty readable to me, since I'm familiar with a fair amount of X-SAMPA.
An alternative to using the dotless i would be to use  ͜ (U+035C COMBINING DOUBLE BREVE BELOW) if either of the two vowels is i: [u͜i]. I prefer that because the dotless i confuses me: it looks somewhat like ɪ, and I think I'm used to seeing the dot when there's a tie bar. The equals sign could be converted to the tie character above or below before the rest of the ASCII characters at the end. — Eru·tuon 04:40, 6 December 2019 (UTC)Reply
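A minimal sketch of that suggestion, assuming the internal ASCII form in which "=" marks a tie between two single-byte vowel symbols and "i" is the vowel that should get the tie below. The constant and function names are hypothetical. The conversion has to run before the remaining ASCII symbols are converted, because it inspects the ASCII neighbours of each "=".

```lua
local TIE_ABOVE = "\205\161" -- U+0361 COMBINING DOUBLE INVERTED BREVE
local TIE_BELOW = "\205\156" -- U+035C COMBINING DOUBLE BREVE BELOW

-- Replace each "=" with a tie below if either neighbouring vowel is "i",
-- and with a tie above otherwise. The position capture lets the function
-- look at the neighbours without consuming them.
local function convert_ties(text)
	return (string.gsub(text, "()=", function(pos)
		local before = string.sub(text, pos - 1, pos - 1)
		local after = string.sub(text, pos + 1, pos + 1)
		if before == "i" or after == "i" then
			return TIE_BELOW
		end
		return TIE_ABOVE
	end))
end
```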
That is a very good point. I think I'll do what you suggest. - Gilgamesh~enwiki (talk) 04:49, 6 December 2019 (UTC)Reply
You know, it has been my conventional wisdom for decades that regular expressions are one of the slowest devices in scripting, and that practically any other conventional means of parsing text is preferable for speed. But that isn't always true, is it? At least, not in Lua. In some cases, string.gsub actually seems faster than trying to do the same thing procedurally, even if you try to do it all with arrays of one-character strings. These calls are actually a lot faster than I gave them credit for—I knew they would be faster than mw.ustring.gsub, but not that they might actually be faster than my attempts to do the same thing procedurally. I suppose it also helps that, this time, I eliminated most throwaway lookup tables, and instead generate them only once and cache them.
All that said...I still kinda hate Lua. Too many thens and nots and not enough curly braces, and arrays starting at 1 instead of 0 is consistently maddening. I miss JavaScript. Would love to write modules in modern JS. - Gilgamesh~enwiki (talk) 05:09, 6 December 2019 (UTC)Reply

I made a small change that could significantly improve performance, at least for some regex replacements, but I don't know how well. The change is:

local function string_gsub2(text, pattern, subst)
	local result = text
	result = string.gsub(result, pattern, subst)
	-- If it didn't change the first time, it won't change the second time.
	if result ~= text then
		result = string.gsub(result, pattern, subst)
	end
	return result
end

Still looking for small ways I can improve efficiency. - Gilgamesh~enwiki (talk) 19:44, 21 January 2020 (UTC)Reply

toMOD

I wrote a simple new function, toMOD, that I need tested, perhaps with a new column in the table. It converts standard orthographic spelling to the format used by the Marshallese-English Online Dictionary, converting ĻļM̧m̧ŅņN̄n̄O̧o̧ to ḶḷṂṃṆṇÑñỌọ. This has potential applications in Marshallese reference templating, where a word in standard orthographic spelling can be automatically converted to MOD's spelling so that references can link directly to dictionary entry anchors on that site without us needing to directly embed a differently-spelt word in the external link. No such template has been written yet. It may be a good idea for each row of the "term" column and a potential MOD column to share a table cell where the forms have identical spelling. And, in any event, the separate MOD spelling should probably not link to a Wiktionary entry with that spelling, as it is and always was a non-standard alteration to Marshallese orthography which is largely limited to the MOD, Naan and associated media intended for offline distribution to available computers in the Marshall Islands. I imagine that, if the standard orthography were considered friendlier to older Windows and Mac computers and their available font rendering, MOD and Naan would be using the standard orthography out of the box, but for the time being they are what they are. - Gilgamesh~enwiki (talk) 07:44, 10 December 2019 (UTC)Reply
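The actual toMOD code is not shown in this discussion. A conversion of this kind could be sketched as below, using plain byte-string replacements, which are safe here because each source letter is replaced as a whole literal sequence. The function name to_mod and the table name are hypothetical, and the sketch assumes that m̧ and o̧ are written with U+0327 COMBINING CEDILLA and n̄ with U+0304 COMBINING MACRON; text encoded differently would need additional pairs.

```lua
-- Pairs of { standard orthography, MOD spelling }, written as UTF-8 byte
-- escapes so the exact code points are unambiguous.
local MOD_PAIRS = {
	{ "\196\187",  "\225\184\182" }, -- Ļ (U+013B)      -> Ḷ (U+1E36)
	{ "\196\188",  "\225\184\183" }, -- ļ (U+013C)      -> ḷ (U+1E37)
	{ "M\204\167", "\225\185\130" }, -- M̧ (M + U+0327) -> Ṃ (U+1E42)
	{ "m\204\167", "\225\185\131" }, -- m̧ (m + U+0327) -> ṃ (U+1E43)
	{ "\197\133",  "\225\185\134" }, -- Ņ (U+0145)      -> Ṇ (U+1E46)
	{ "\197\134",  "\225\185\135" }, -- ņ (U+0146)      -> ṇ (U+1E47)
	{ "N\204\132", "\195\145" },     -- N̄ (N + U+0304) -> Ñ (U+00D1)
	{ "n\204\132", "\195\177" },     -- n̄ (n + U+0304) -> ñ (U+00F1)
	{ "O\204\167", "\225\187\140" }, -- O̧ (O + U+0327) -> Ọ (U+1ECC)
	{ "o\204\167", "\225\187\141" }, -- o̧ (o + U+0327) -> ọ (U+1ECD)
}

-- Convert standard orthography to the MOD spelling, one pair at a time.
local function to_mod(text)
	for _, pair in ipairs(MOD_PAIRS) do
		text = string.gsub(text, pair[1], pair[2])
	end
	return text
end
```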

That is a useful function to have. I think it would be useful to display the MOD spelling in the entry, unlinked. That would allow people to search for the MOD spelling (ḷọñ) and find the entry (ļo̧n̄), provided there's no entry for a homograph of the MOD spelling. — Eru·tuon 22:09, 10 December 2019 (UTC)Reply
I thought most modern browsers allow Ctrl-F text searches that recognize letters and ignore diacritics. Right now I press Ctrl-F and type unmarked "lon" and it finds both of those words you just mentioned. However, just displaying the MOD spelling in the entry might be doable...might need some new templates. But I think I've been hesitant to dive into new Marshallese entry templating design too soon when there are still so many aspects of the language's grammar I don't fully understand. For instance, all Marshallese adjectives are verbs, and beyond suspecting that adjectives are stative verbs (equivalent to English "to be <adjective>"), I don't know what else that actually means. Yet for now, a Marshallese entry template doesn't have to be complicated—it can just redirect to the standard entry template, but display the MOD spelling as an alternate where they differ.
By the way, I've not yet figured out how to display actual wiki markup using Scribunto/Lua. Everything I print out seems to be the same as the contents of <nowiki></nowiki>. If I knew how to write scripts that generate more complex wiki markup output, I might be able to migrate more of the functionality of {{mh-ipa-rows}} to a template.
It also occurs to me that Module:mh-pronunc is getting big, at over 30K now. Conventional wisdom suggests splitting it up into multiple scripts that can be imported into each other as needed, but then a multi-file project isn't as simple to mirror at Wikipedia. (A copy exists at wikipedia:Module:mh-pronunc, and its comment at the top links back here.) So maybe, the most portable, reusable portions could be maintained as one script, and more site-specific applications can be separate scripts that can stay on this wiki. For instance, mh-ipa-rows is useful at Wiktionary but not so much at Wikipedia. - Gilgamesh~enwiki (talk) 03:04, 11 December 2019 (UTC)Reply
Oh, by search I mean the search engine for Wiktionary. Right now ļo̧n̄ is the 17th result in the search for ḷọñ, but if it is displayed in one of the templates, it should be higher in the results. I was thinking the MOD spelling could be displayed in the pronunciation template, but that isn't quite appropriate, and anyway alternative spelling entries probably need a MOD spelling, but might not have a pronunciation template. Probably the template that displays the MOD spelling should be placed in the Alternative forms section.
I've maintained a sort-of mirrored version of a set of Wiktionary modules on Wikipedia (Module:Unicode data), but the Wikipedia and Wiktionary versions have drifted apart in some ways; it's tedious copying the source code. It might be easier with a Pywikibot script, but I can't edit the Wikipedia module anymore because it's been template-protected. — Eru·tuon 04:05, 11 December 2019 (UTC)Reply
I didn't realize that's what you meant. I put it in (newly-created and under-featured) {{mh-head}} for now. At least the MOD spelling is being displayed, though. And I don't think it would be the best idea to put the MOD spelling in an alternative forms section, because it may prompt a naive third-party editor to turn the unlinked term into a linked term and create a word entry. My concern is that it may motivate an unnecessary duplication of many entries with the non-standard orthographic variants. It also doesn't help that some sources for the language write Marshallese words without any diacritics, and it seems dan was created from one of these sources as an unknowing duplicate of dān. - Gilgamesh~enwiki (talk) 08:05, 11 December 2019 (UTC)Reply

If I may ask, could you please update the table? I was updating it manually, but then I added so many new entries that I got behind. Most of the new entries are words that start with ri-—demonyms, mainly. - Gilgamesh~enwiki (talk) 05:08, 15 December 2019 (UTC)Reply

Done. And finally the script is fully automatic: it reads the "excluded titles" list and updates the list of template input without me copy-pasting anything. — Eru·tuon 09:59, 15 December 2019 (UTC)Reply
Thank you. What do you think of the state of the script and entries now? It's still only a tiny selection of the language, but I've been trying to steadily add more words. I'll also try to add words of phonological interest that help continue to refine the script. - Gilgamesh~enwiki (talk) 11:01, 15 December 2019 (UTC)Reply

Overhauling Template:mh-head

Marshallese doesn't have all the complex noun cases of an agglutinative language, but it does have some inflected forms, and {{mh-head}} would seem to be the appropriate place to list these. I have an idea of what I want to accomplish, but it may require some additional Scribunto/Lua API I'm not that familiar with, since I think template-only logic would become unnecessarily bloated. I was wondering if you could help me write such a template and backing script. I need to figure out how vanilla {{head}} creates its inflection list and handles the appropriate automatic categories with language-sensitive sorting keys, and how I can extend or replicate that in a script, with possibilities like default inflected forms, more than one of the same kind of inflected form, etc. I can conceptualize what I want to achieve, but API-wise I'm in over my head. - Gilgamesh~enwiki (talk) 02:14, 24 December 2019 (UTC)Reply

I think I found some resources to start with, chiefly Module:headword. - Gilgamesh~enwiki (talk) 18:02, 24 December 2019 (UTC)Reply

Yeah, the language-specific headword-line modules call full_headword in Module:headword and if necessary format_categories in Module:utilities to format extra categories that don't begin with the language name. In the Marshallese module there could be a main function that generates the MOD spelling and it can call one of the pos_functions to handle part-of-speech-specific stuff. I'm not sure what is a good module to base the Marshallese one on though. Much of Module:eo-headword is probably understandable because the morphology is simple at least. — Eru·tuon 19:52, 24 December 2019 (UTC)Reply
Now that I understand the technical aspects better of implementing the template, I realize I still need a better understanding of the grammar, so I'll put it off for the time being. After all, I'm sure there may be all sorts of unforeseen errors in the Wiktionary entries that could be remedied with a better understanding of both Marshallese grammar and the MOD entry structure. - Gilgamesh~enwiki (talk) 05:04, 25 December 2019 (UTC)Reply

Distributive verbs

I think sometimes I forget just how much technical work you do here at Wiktionary, beyond just helping me with a Marshallese module. I created a new category, Category:Marshallese distributive verbs, but {{auto cat}} shows this category is not supported. What would be involved in creating new grammar categories? - Gilgamesh~enwiki (talk) 13:45, 14 January 2020 (UTC)Reply

Some brief background: Marshallese distributive verbs basically modify a noun or verb with the rough inflected meaning of "there are a lot of [something]s." This particular grammatical form is demonstrated extensively in example sentences throughout the Marshallese-English Online Dictionary. - Gilgamesh~enwiki (talk) 13:53, 14 January 2020 (UTC)Reply

The "distributive verbs" category should only be added to the category system (Module:category tree/poscatboiler/data/lemmas probably) if it's going to be used in other languages and the meaning is roughly the same for all of them – meaning if there are distributive verbs in another language with a different meaning, that doesn't allow us to have a single description for every language's distributive verbs category. At least to start with, it can have manual content. — Eru·tuon 23:38, 15 January 2020 (UTC)Reply
That seems logical. Since I'm not specifically aware of distributive verbs being in any other language, I couldn't guarantee they would mean the same thing in those languages. As it is, Marshallese already uses at least a few relatively exotic grammatical forms that only one or a few other languages use—for instance, besides Category:Marshallese noun construct forms, there's only Category:Hebrew noun construct forms as subcategories of Category:Noun construct forms by language. Then there's also adjective verbs, which I initially categorized as Category:Marshallese adjectives, but then wondered if they shouldn't be better in Category:Marshallese stative verbs (there are no adjectives that are not verbs), when in reality these grammatical categories don't always easily fit in the existing conventional hierarchy, and I'm not proficient enough in the language myself to make confident decisions about their placement, and I fear I may be introducing errors that might have to be fixed in bulk at a later date. - Gilgamesh~enwiki (talk) 06:28, 16 January 2020 (UTC)Reply

@Erutuon Wow, you are a busy bee. I think I have even greater respect for what you do here than I did even just 24 hours ago. As much as I would appreciate your continued feedback in my ongoing endeavors, I can still wait. - Gilgamesh~enwiki (talk) 23:28, 15 January 2020 (UTC)Reply

Bug

@Erutuon There's a bug in the module's debug table, most noticeable with words whose Bender spellings start with "yiy" and a vowel. In line with references explaining how Marshallese words can be enunciated phoneme by phoneme, I'm testing an experimental enunciate-mode, where short prosodic breaks [|] are inserted in the middle of consonant clusters. The problem is...the International Phonetic Alphabet specifies these as pipe characters |. I already tried hard-coding {{!}} in the module output, but it only looks like {{!}}. So now I'm using a normal pipe character, but there's a bug in the way the module's debug table displays it. What's only displaying æ.e.kʷwɤtʲ] should actually be displaying [i | æ.e.kʷwɤtʲ] - Gilgamesh~enwiki (talk) 19:03, 16 January 2020 (UTC)Reply

@Gilgamesh~enwiki: Fixed, in the testcases module, by escaping the pipes. They are part of template syntax, and in this case the stuff before the pipe was being treated as attributes for the table cell. — Eru·tuon 19:14, 16 January 2020 (UTC)Reply
Thank you. :) - Gilgamesh~enwiki (talk) 19:25, 16 January 2020 (UTC)Reply
Just FYI: it's unnecessary to ping someone on their talk page, because they already get a notification just from someone else editing their talk page. Chuck Entz (talk) 04:11, 17 January 2020 (UTC)Reply
Ahh, good to know. - Gilgamesh~enwiki (talk) 06:37, 21 January 2020 (UTC)Reply
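The pipe problem discussed in this thread can be sketched as follows. This is only an illustration in Python, not the code of the testcases module (which is Lua); the function names are hypothetical, and the use of the character reference `&#124;` is an assumption about one common way to escape a pipe, not a record of the fix that was actually applied.

```python
def escape_pipes(cell_text: str) -> str:
    """Replace literal pipes so wikitable syntax does not read them as delimiters."""
    # Assumption: an HTML character reference is an acceptable escape here.
    return cell_text.replace("|", "&#124;")


def table_row(cells) -> str:
    """Build one wikitable row; cells on the same line are separated by '||'."""
    return "| " + " || ".join(escape_pipes(cell) for cell in cells)


# The IPA string is the one quoted in the thread; "example" is a placeholder.
row = table_row(["example", "[i | æ.e.kʷwɤtʲ]"])
```

Without the escape, everything before the single pipe inside the second cell would be parsed as attributes of that cell, which is the symptom described above.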

Ratak and Rālik specific word categories

How do I set this up? So things work in {{lb}}, and so forth. I know similar categories exist for Category:Indian English, Category:New Zealand English, etc. The Ratak Chain and Rālik Chain dialects of Marshallese are mutually intelligible, and differ mainly by some regular variations in pronunciation reflex, and some vocabulary differences. But many of the different forms are often still written differently depending on dialect. For instance, m̧m̧an "good" is the common stem, em̧m̧an is the Rālik reflex, and m̧ōm̧an is the Ratak reflex, but in both dialects the prothetic vowel vanishes if the stem takes a bare vowel prefix: rūm̧m̧an (ri- + m̧m̧an) means "good person." I want to start making articles for the stem forms, and have their dialect reflex entries (by spelling) automatically categorized through {{lb|mh|Ratak}}, {{lb|mh|Ralik}}/{{lb|mh|Rālik}}, etc. I should add that I don't know if the dialects themselves have supplemental language codes, the same way Tosk Albanian is "als" (Albanian, South) and Gheg Albanian is "aln" (Albanian, North).

I'm not sure what to name the categories, though—"Rālik Marshallese"? "Rālik dialect Marshallese"? "Rālik Chain Marshallese"? I'm not sure what the most stable nomenclature would be. In the Marshallese-English Online Dictionary, they're also frequently just called "Dial. W" and "Dial E.", since Rālik ("sunset") is the western chain and Ratak ("sunrise") is the eastern chain, but the two dialects' native isogloss line still runs between the two chains themselves.

I should probably additionally add...I'm not 100% sure that I know what I'm doing. It's one thing to know how templating and scripting languages work (which I increasingly know), and another thing entirely to know how existing templates and scripts are set up so I extend them for specific editing needs. - Gilgamesh~enwiki (talk) 01:14, 20 January 2020 (UTC)Reply

@Gilgamesh~enwiki: Categories for most language varieties are added to entries via Module:labels/data/subvarieties. You can add definitions for the labels {{lb|mh|Ratak}} and {{lb|mh|Ralik}} there, with categories and linked display text if desired. Personally, I like the shorter category name: "Rālik Marshallese". The category page can explain what it means. It looks like there aren't ISO codes for Rālik and Ratak, but if they might be referred to in etymologies (for instance, {{der|en|<code for ralik>|word}}), then they could be given Wiktionary codes in Module:etymology languages/data too. — Eru·tuon 19:34, 20 January 2020 (UTC)Reply
Thank you, I'll check out the subvarieties. And if nothing else, "mh-ralik" and "mh-ratak" may suffice as ad hoc language codes if ever needed. - Gilgamesh~enwiki (talk) 19:41, 20 January 2020 (UTC)Reply

Enunciated columns in the debug table

In addition to the previous section I just wrote, I was wondering...do we risk the module timing out if we add additional enunciated columns to it? Seeing that enunciated mode has since been fully deployed to articles wherever a consonant cluster exists in the phonemic form, acting on previously unread documents that Austronesier and I discussed at wikipedia:Talk:Marshallese language—see kajin M̧ajeļ for a good example of how normal phonetic and enunciated IPA can differ. And it's not just the absence of consonant assimilations or epenthetic vowels, but also some different vowel reflexes simply as a consequence of the last vowel before a consonant cluster being the last vowel of its prosodic fragment and the first vowel after a consonant cluster being the first vowel of its prosodic fragment—see eakeak, tuen̄ and utut to see what I mean. (Incidentally, you may be pleased to see that Arņo now shows two different consonants when enunciated.)

As for how the added columns would work, enunciated forms would only differ between dialects if their normal phonetic forms already differ (because of the limits in the differences between dialect reflexes), so I'm thinking something like: phonetic (Rālik), enunciated (Rālik), phonetic (Ratak), enunciated (Ratak), with each dialect's phonetic and enunciated columns merging if they're the same, and all four columns merging if all four forms are the same.
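The merging rule proposed in the paragraph above can be sketched like this. It is a hypothetical Python illustration (the real module is Lua), and the function name and the `(text, colspan)` return shape are assumptions made only for the example.

```python
def merge_columns(ralik_phonetic, ralik_enunciated, ratak_phonetic, ratak_enunciated):
    """Return a list of (text, colspan) cells for the four pronunciation columns."""
    forms = [ralik_phonetic, ralik_enunciated, ratak_phonetic, ratak_enunciated]

    # All four forms identical: a single cell spanning every column.
    if len(set(forms)) == 1:
        return [(forms[0], 4)]

    cells = []
    for phonetic, enunciated in (
        (ralik_phonetic, ralik_enunciated),
        (ratak_phonetic, ratak_enunciated),
    ):
        if phonetic == enunciated:
            # Within one dialect the two modes agree: merge that pair.
            cells.append((phonetic, 2))
        else:
            cells.append((phonetic, 1))
            cells.append((enunciated, 1))
    return cells
```

For example, `merge_columns("a", "a", "a", "a")` yields one cell with colspan 4, while `merge_columns("a", "b", "c", "c")` yields two single cells for Rālik and one merged cell for Ratak.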

If we'd be taxing our Scribunto/Lua allowances too much for the one table, I could instead set it to show enunciated mode in the sandboxed version as a temporary visual aid during relevant discussions, but still there are now effectively four different phonetic modes to debug. - Gilgamesh~enwiki (talk) 16:11, 20 January 2020 (UTC)Reply

@Gilgamesh~enwiki: At the moment there's no risk of the testcases timing out, even if they take twice as much Lua processing time as they do now, because it's still under a second, and they've got a limit of ten seconds. The page does take a bit long to parse now though: the "real time" can be as much as 2 seconds (not quite as long as for Wiktionary:List of languages: ~6 seconds).
I'll take a look at how to handle the enunciated mode. I do like it using spaces; it looks quite intuitive to me. — Eru·tuon 09:26, 21 January 2020 (UTC)Reply
Well, it looks like if the table starts to balloon that big, we may have to start excluding other words that simply won't get displayed. Perhaps some of the least bug-prone words with the least complicated logic involved, like for instance those with invariable /ʲVʲ, ˠVˠ, ʷVʷ/ vowels and no clusters, like jeen and ļan̄. But for now, nothing needs to be removed and it may never get to that point. And I suppose there's still a chance I could improve the module's efficiency in other areas. - Gilgamesh~enwiki (talk) 13:14, 21 January 2020 (UTC)Reply

Voicing of fricatives in Old English pronunciation transcriptions

I saw that you've recently edited a bunch of Old English entries to replace /z/ with /s/, leaving the comment that [z] is an allophone of /s/ in Old English. That is arguably true, but I think the removal of /z/ from Old English transcriptions brings up a few more issues that ought to be addressed. First, the reason I say the allophonic status of [z] is "arguable" is because there are in fact some contexts where the use of a voiced vs. a voiceless fricative may not be completely predictable from the phonological context. See "Phonemically Contrastive Fricatives in Old English?", by Donka Minkova, for a description of some of the relevant evidence and references to prior literature that discusses the topic (Minkova does support the interpretation that the voiced and voiceless fricatives were allophones in Old English). The other issue, more important in my opinion, is a matter of consistency: two other voiced fricatives, [v] and [ð], are commonly analyzed as allophones of /f/ and /θ/. So a transcription like "/ˈt͡ʃiyvese/" for ciefese seems fairly problematic: if we decide to use /s/ here, I think it would be better to also use /f/, giving /ˈt͡ʃiyfese/. And in fact, considering that the allophonic realization of voiceless fricative phonemes as voiced fricatives doesn't come naturally to modern English speakers, and that (as mentioned above), the distribution of the voiced and voiceless allophones in Old English is somewhat complicated, I think it would be worthwhile to include a phonetic transcription using [v] and [z] in addition to a phonemic transcription with /f/ and /s/ for words like this.--Urszag (talk) 21:36, 31 October 2019 (UTC)Reply

@Urszag: Sorry, I really did make a mess with my edits. I will search for phonemic transcriptions with /ð/ and /v/ and correct them as well.
It would be easier to just generate Old English transcriptions with Module:ang-pronunciation, which I started but never completed. I agree that there should be phonetic transcriptions for words in which /f s θ/ are voiced. Words with hard allophones of /j/, like eċġ, whose phonemic transcription is /ejj/, would also benefit from phonetic transcriptions (assuming that the "hard" and "soft" pronunciations of ġ are indeed allophones) because the change from /j/ to [d͡ʒ] is a bit surprising. — Eru·tuon 14:46, 1 November 2019 (UTC)Reply
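As a rough illustration of deriving a phonetic transcription from a phonemic one, here is a deliberately naive Python sketch. It is not Module:ang-pronunciation, and its one rule (voice /f θ s/ only when directly between two vowel symbols) is a simplifying assumption; as noted above, the real distribution is more complicated, and length marks, sonorants and stress are ignored here.

```python
# Assumption: a small, incomplete vowel inventory, enough for the example only.
VOWELS = set("aeiouyæɑø")
VOICED = {"f": "v", "θ": "ð", "s": "z"}


def voice_fricatives(phonemic: str) -> str:
    """Voice f, θ, s when the symbols on both sides are vowels (naive rule)."""
    chars = list(phonemic)
    for i in range(1, len(chars) - 1):
        if (
            chars[i] in VOICED
            and chars[i - 1] in VOWELS
            and chars[i + 1] in VOWELS
        ):
            chars[i] = VOICED[chars[i]]
    return "".join(chars)
```

Under this simplified rule, the phonemic /ˈt͡ʃiyfese/ mentioned above would come out as [ˈt͡ʃiyveze], with both fricatives voiced.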

Review of NEC rewrite

WDYT about the result? Should I move the function processor() and function setup_click_keyup() out of the setup_infl()?--So9q (talk) 19:17, 4 November 2019 (UTC)Reply

I'm still very confused by the script, but it looks much improved. I have some cleanup ideas. It's probably a good idea to add a nec- prefix to the NEC parameters in the URL, to avoid collisions, and it's traditional to use hyphens in class names rather than underscores. I've made the script use mw.util.getParamValue instead of a custom function.
I loaded the scripts, and some of the translation links are colored; but clicking the links doesn't show the NEC. Maybe I broke User:So9q/new-entry-creator.js when I edited it? — Eru·tuon 20:12, 4 November 2019 (UTC)Reply
I just tested and it still works for me clicking translation links. Although for now CreateTranslation.js only supports fetching the first PoS. There is also a bug with lang=code not being set.--So9q (talk) 16:30, 6 November 2019 (UTC)Reply
Oh, it's working now for me too. That's odd. — Eru·tuon 17:07, 6 November 2019 (UTC)Reply
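The suggestion above to give the NEC URL parameters a nec- prefix can be sketched as follows. This is a hypothetical Python illustration of the namespacing idea only; the script itself is JavaScript and uses mw.util.getParamValue, and the parameter names `lang` and `pos` as well as the example URL are assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlparse

PREFIX = "nec-"


def build_query(params: dict) -> str:
    """Encode the script's parameters under the prefix to avoid collisions."""
    return urlencode({PREFIX + key: value for key, value in params.items()})


def read_params(url: str) -> dict:
    """Return only the prefixed parameters, with the prefix stripped."""
    query = parse_qs(urlparse(url).query)
    return {
        key[len(PREFIX):]: values[0]
        for key, values in query.items()
        if key.startswith(PREFIX)
    }


url = "https://en.wiktionary.org/w/index.php?title=hus&action=edit&" + build_query(
    {"lang": "da", "pos": "noun"}
)
```

Because only keys beginning with the prefix are read back, unrelated parameters such as `title` and `action` cannot be mistaken for the script's own.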

Adding aliases to Module:family tree

You've done a lot of work on this. Now that we have aliases for etymology languages, I'd like to display them, either in the family tree or in an info box, similar to what we have with {{langcatboiler}}. Maybe we should have {{etym lang cat}} for etymology language categories; currently these categories, when they exist, aren't standardized in name or contents. Benwing2 (talk) 05:40, 15 November 2019 (UTC)Reply

@Benwing2: I've thought of creating a template for etymology language categories, but I got hung up over an unresolved issue. At the moment, many etymology language categories just have a category for the canonical name (Category:Attic Greek), though there is also Category:Kölsch Central Franconian corresponding to Kölsch (ksh). Entries are added to the categories using {{lb}} and {{tlb}}. Ideally lemmas and non-lemma forms would be in different categories, but I didn't know how to do that. It would be weird to have to specify lemmas or non-lemma forms in {{tlb}}, like having {{tlb|grc|Epic Greek lemmas}} or {{tlb|grc|Epic Greek non-lemma forms}} display as "(Epic)" but add different categories, and I didn't know how to accommodate that in Module:labels and couldn't think of another good way to add the categories. So I never came up with any kind of action plan. Maybe this issue doesn't have to be solved right away though. — Eru·tuon 19:52, 15 November 2019 (UTC)Reply
One possibility is to allow etymology languages in {{head}}, which knows about the POS and hence whether it's a lemma or not. The only other way I can think of without having the POS or lemma status marked explicitly in {{tlb}} is for {{tlb}} to look through the page text, which is expensive and likely error-prone. Benwing2 (talk) 18:11, 16 November 2019 (UTC)Reply

Χαῖρε! On 21st century Wiktionary we shouldn't perpetuate the biases of 19th century Englishmen; Doric is real Ancient Greek! Not a subdialect of Attic...

Χαῖρε, hello, nice to (virtually) meet you...
With regard to recent edits on ἅρπα I wasn't sure where to post this, I was just responding specifically vis-à-vis the Doric Greek morphology of ἅρπα but ran long touching on the broader subject of Greek dialects and their inclusion on Wiktionary, so I'll post this full comment on your talk page too...

Inqvisitor (talk) 08:24, 16 November 2019 (UTC)Reply

Hi, it looks like your post in WT:RFVN is substantially the same. In future, please post in just one place. You can bring my attention to the post by including a link to my user page (Erutuon). That will send me a notification. — Eru·tuon 09:04, 16 November 2019 (UTC)Reply

On the reversal of my edit on the article on ışık

You reverted my edit on the page ışık. Why is that? The declension adds nothing to the article (the nominative declension is the word itself and the accusative declension is already given in the {{tr-noun}} template: "ışık (definite accusative ışığı, plural ışıklar)"). In my opinion, the templates {{tr-infl-noun-c}} and {{tr-infl-noun-v}} shouldn't be used anywhere on Wiktionary as they provide no information that {{tr-noun}} doesn't already provide and only bloat the site. --Fytcha (talk) 18:16, 6 December 2019 (UTC)Reply

@Fytcha: There are a lot more forms in the table than just the definite accusative and the plural (ışık, ışığı, ışıklar, ışıkları, ışığa, ışıklara, ışıkta, ışıklarda, ışıktan, ışıklardan, ışığın, ışıkların, ışığım, ışıklarım, ışığımız, ışıklarımız, ışığınız, ışıklarınız), but they are hidden by default. You've got to click two "more" buttons on the right side of the table to see them. — Eru·tuon 18:22, 6 December 2019 (UTC)Reply

Another Rustacean :)

I noticed that you are working in Rust. It has become my favourite language recently, although for Wiktionary bot work I still use Python. —Rua (mew) 11:01, 9 December 2019 (UTC)Reply

I've become quite fond of it as well, and now often miss features like return values from blocks and match blocks when programming in Lua. — Eru·tuon 19:36, 9 December 2019 (UTC)Reply
@Rua, Erutuon: I'm interested in things you dislike about Rust. I looked at it a while ago, and there was a lack of libs for doing standard stuff (talking to a database etc.), but that's probably changed in the meantime. - Jberkel 00:26, 10 December 2019 (UTC)Reply
Yeah, the development is going pretty fast. Not just the language itself, but library infrastructure as well. —Rua (mew) 10:14, 10 December 2019 (UTC)Reply

If you ever have time

I hate to bother you all the time. If you ever have time, could you check el:Module:sarritest The only person in el.wikt who knew Lua is now a 'vanished' user. sarri.greek (talk) 00:00, 11 December 2019 (UTC)Reply
Thank you so much! sarri.greek (talk) 18:48, 11 December 2019 (UTC)Reply

@sarri.greek: Let me know if you need any more help or further explanation. — Eru·tuon 18:51, 11 December 2019 (UTC)Reply
The basic ideas of Lua, I cannot grasp. I have tried all kinds of combinations of the words 'local', 'frame', but I cannot make the collective function.main work. It is just an exercise, it is not important.
One general question, if I may: When we have a module which produces declensions automatically like el:Module:κλίση/el/ουσιαστικό, is it better/preferable to do all the paradigms IN the Module? Or create wikitext Templates with the parameters for the endings? They are so many! and the Module page becomes so long! sarri.greek (talk) 16:08, 13 December 2019 (UTC)Reply
It turns out I had reversed the logic for getting args. That's not uncommon with me.
Do you mean separate templates for each declension? I suppose either way works, but I like to be able to edit all the paradigms at once and compare them, so having them in a single module helps. For Ancient Greek, the module is Module:grc-decl/decl/staticdata/paradigms. If each is in a separate template, then there are more pages to edit. — Eru·tuon 19:04, 13 December 2019 (UTC)Reply
Thank you SO much. For the many pages of paradigmata: I was worried about what is best for ...errr... you call some actions 'expensive' or bad, or not good. I will study the examples you have shown me. sarri.greek (talk) 19:09, 13 December 2019 (UTC)Reply
Ahh, I see. I'm not sure which is least expensive in memory and Lua processing time. — Eru·tuon 19:20, 13 December 2019 (UTC)Reply

Req

Hi Erutuon. Can you run a bot to do this:

moving translations with ku code and Latin script to kmr code and Northern Kurdish dialect
moving translations with ku code and Arabic script to ckb code and Central Kurdish dialect

also this:

changing translations with ku code and Latin script to kmr code
changing translations with ku code and Arabic script to ckb code

also we shouldn't allow ppl to add translations with ku code; they should use Kurdish dialects codes (kmr, ckb, ...) instead of using ku code directly. Thanks.--Calak (talk) 16:50, 13 December 2019 (UTC)Reply

Hmm, I know how to identify scripts, but don't have a method to modify translations yet. I can at least make a list to start with. — Eru·tuon 08:13, 14 December 2019 (UTC)Reply
Oh, no! You don't need to modify translations, you should change "ku" code to "ckb" or "kmr" per its script.--Calak (talk) 11:15, 14 December 2019 (UTC)Reply
Right, by modifying translations I mean moving translations from "Kurdish" to "Northern Kurdish" etc. while using the correct format (the first diff). For that, it would be nice to have a method that would move translation x from language a to language b and format everything correctly. It seems complicated though. Perhaps someone else has worked this out already. But I might be able to change language codes easily (the second diff). — Eru·tuon 22:25, 14 December 2019 (UTC)Reply
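The simpler of the two requests (changing the ku code to kmr or ckb according to script) can be sketched as follows. This is a hypothetical Python illustration, not a bot that was actually run; detecting the script from Unicode character names is an assumption about one workable approach, and terms in any other script or in mixed scripts are left unchanged.

```python
import unicodedata


def detect_script(term: str):
    """Return 'Latin' or 'Arabic' if all letters belong to that script, else None."""
    scripts = set()
    for ch in term:
        if not ch.isalpha():
            continue  # skip spaces, punctuation and combining marks
        name = unicodedata.name(ch, "")
        if name.startswith("LATIN"):
            scripts.add("Latin")
        elif name.startswith("ARABIC"):
            scripts.add("Arabic")
        else:
            scripts.add("other")
    return scripts.pop() if len(scripts) == 1 and "other" not in scripts else None


def replacement_code(lang_code: str, term: str) -> str:
    """Map ku to kmr (Latin script) or ckb (Arabic script); keep anything else."""
    if lang_code != "ku":
        return lang_code
    return {"Latin": "kmr", "Arabic": "ckb"}.get(detect_script(term), lang_code)
```

Moving the translation line under a different language heading, the harder first request, is not covered by this sketch.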
OK. How about to prevent people from using ku code in translations? Can you add a code (in TranslationAdder gadget) to do this?--Calak (talk) 16:19, 15 December 2019 (UTC)Reply
@Calak: Hmm, perhaps the TranslationAdder could suggest inserting the translation under ckb, kmr, or sdh instead of ku? I might be able to figure out how to do that but I've mostly stayed away from that gadget because its code confuses me. — Eru·tuon 09:14, 17 December 2019 (UTC)Reply

It is OK Erutuon. I will be thankful if you can apply any one of them.--Calak (talk) 07:12, 21 December 2019 (UTC)Reply

Reverted Edit

Hello, it is not an "odd alternative pronunciation". Several million people pronounce it that way, whereas the mispronunciation of "decade" has about five variants on the site for about 10 speakers. ABAlphaBeta (talk) 08:39, 17 December 2019 (UTC)Reply

@ABAlphaBeta: I'm sorry for my hasty reversion. I've restored the alternative pronunciation that you probably meant (as User:Mellohi! pointed out to me), but moved it into {{fr-IPA}}: {{fr-IPA|écuidistant|équidistant}}. I know very little about the fine details of French pronunciation and you may be right. Words with équi- (or ultimately derived from aequus) are transcribed with either /e.kɥi/ or /e.ki/ on Wiktionary, and while the soundfiles of équidistant on the French Wiktionary and on Forvo have /e.kɥi/, perhaps some people pronounce it with /e.ki/ like équilibre and other words because it may be as confusing for French speakers as it is for foreigners like me. — Eru·tuon 09:10, 17 December 2019 (UTC)Reply

Deletion reasons

Hi. In October, you added "Incorrect title: a mixture of Latin- and Cyrillic-script characters". Do you think this could be merged into the existing "Bad entry title"? How do they differ? Equinox 08:05, 20 December 2019 (UTC)Reply

@Equinox: Well, it's certainly a subtype, but I prefer to be clear since it's not always easy to see what's wrong with the title. I was thinking maybe something like "mixed script" or "incorrect lookalike characters" would work as well. At the time there was a backlog of these titles, and I was getting tired of re-entering the deletion reason since the "content: ..." bit prevented the input box history from working. But perhaps it won't be needed now that there's this abuse filter. It displays a message showing which characters are in which script, which seems to enable editors to create the entry at the right title, so there aren't any new badly titled entries to delete. — Eru·tuon 08:33, 20 December 2019 (UTC)Reply
Yeah, went and removed it. — Eru·tuon 08:56, 20 December 2019 (UTC)Reply
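The kind of check discussed in this thread (finding titles that mix Latin and Cyrillic characters, and reporting which character is in which script) can be sketched as follows. This is a hypothetical Python illustration and not the abuse filter's actual rule; taking the script from the first word of the Unicode character name is an assumption that works for these two scripts.

```python
import unicodedata


def scripts_by_character(title: str):
    """Pair each letter of the title with the first word of its Unicode name."""
    pairs = []
    for ch in title:
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            pairs.append((ch, name.split(" ")[0]))
    return pairs


def is_mixed_latin_cyrillic(title: str) -> bool:
    """True if the title contains both Latin and Cyrillic letters."""
    scripts = {script for _, script in scripts_by_character(title)}
    return {"LATIN", "CYRILLIC"} <= scripts
```

For a title such as "c\u0430t" (Latin c, Cyrillic а, Latin t), `scripts_by_character` reports the middle letter as CYRILLIC, which is the information an editor needs to recreate the entry at the right title.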

Help needed at simple.wikt

Hi Erutuon, can you help me with the Lua Module:number list on simple.wikt? Minorax (talk) 05:10, 29 December 2019 (UTC)Reply

@Minorax: Sure... I did fix one problem that caused a module error. — Eru·tuon 05:30, 29 December 2019 (UTC)Reply
So that was the problem, forgot about that. Thank you :) Minorax (talk) 05:37, 29 December 2019 (UTC)Reply
And since simple.wikt only contains English words, Module:number list/data/en isn't really needed as a subset of the module, is it possible to merge it into the main module? Minorax (talk) 05:41, 29 December 2019 (UTC)Reply
It's possible, but I wouldn't recommend it. Putting data in the main module adds many lines, making it harder to edit, and if you want to keep the Simple Wiktionary module in sync with the English Wiktionary module, it will be harder to copy code. — Eru·tuon 05:51, 29 December 2019 (UTC)Reply
Alright :) Minorax (talk) 05:52, 29 December 2019 (UTC)Reply

Wiktionary:Todo/multiword Spanish lemmas with a hyphen

Hey. After your excellent page Wiktionary:Todo/multiword Spanish lemmas not idiom or proverb, I'd like a request of all Spanish entries with a hyphen. There shouldn't be many entries on the list, as Spanish doesn't use them so much. Thanks in advance, anyhow. --ReloadtheMatrix (talk) 19:35, 1 January 2020 (UTC)Reply

@ReloadtheMatrix: Done because I have files of all entry names for all languages. Whoops, that was wrong, it's supposed to be lemmas. Fixed. — Eru·tuon 19:43, 1 January 2020 (UTC)Reply
Awesome. You rule. Any chance of having the prefixes and suffixes removed? --ReloadtheMatrix (talk) 19:49, 1 January 2020 (UTC)Reply
Done. — Eru·tuon 19:56, 1 January 2020 (UTC)Reply

Gah, the search engine includes results for redirects, which is why dimensional was in the list (-dimensional redirects to it). [Edit: Anyway, fixed.] — Eru·tuon 20:34, 1 January 2020 (UTC)Reply
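The filtering steps from this thread (lemmas containing a hyphen, minus prefixes and suffixes, minus hits that only match through a redirect) can be sketched as follows. This is a hypothetical Python illustration; the `(title, is_redirect)` input shape and the sample titles are assumptions, not the actual files or query that were used.

```python
def hyphenated_lemmas(entries):
    """Keep non-redirect titles with an internal hyphen.

    `entries` is an iterable of (title, is_redirect) pairs.
    """
    selected = []
    for title, is_redirect in entries:
        if is_redirect:
            continue  # test the page's own title, never a redirect pointing at it
        if "-" not in title:
            continue
        if title.startswith("-") or title.endswith("-"):
            continue  # suffixes and prefixes
        selected.append(title)
    return selected


# Made-up sample data for illustration only.
sample = [
    ("franco-italiano", False),
    ("-mente", False),
    ("anti-", False),
    ("dimensional", False),
    ("-dimensional", True),
]
```

Checking each page's own title, rather than trusting search hits, is what keeps a page like dimensional out of the list when only a redirect to it contains a hyphen.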

Adding a pronunciation table for Albanian

Hello,

I'd like to ask you whether you could add a pronunciation table for Albanian with the same structure as the Ancient Greek pronunciation table. I could also provide you with the content for doing so. Apart from that, I'd like to know how links may be added to a template without having to place linking brackets around every term encompassed by it. HeliosX (talk) 17:14, 2 January 2020 (UTC)Reply

@HeliosX: What Ancient Greek pronunciation table are you referring to? And what sort of template are you talking about? — Eru·tuon 19:36, 3 January 2020 (UTC)Reply
I meant the current Ancient Greek pronunciation table that requires the letters to be entered in the page and, for example, this template. There should be, for instance, a link to "ali" and the noun ending in "-ã" or "-i" even though the terms are separated through "ale" in the declension table. It is not sure whether "ali" and the noun ending in "-e" should be linked because the usage of [e] or [i] in positions that allow both is usually somewhat similar and phonologically coherent. HeliosX (talk) 20:00, 3 January 2020 (UTC)Reply
@HeliosX: Do you mean {{grc-IPA}}? — Eru·tuon 20:07, 3 January 2020 (UTC)Reply
Yes, I meant this one. HeliosX (talk) 20:08, 3 January 2020 (UTC)Reply
Ahh, I see. I was confused because "table" made me think of Appendix:Greek pronunciation. I could probably make a pronunciation template for Albanian. I'm not very familiar with Albanian, so I would have to use any information that you can provide, and w:Help:IPA/Albanian and w:Albanian phonology.
I still don't understand the problem with {{rup-noun-f-ã}}. I also don't understand why there are so many forms in each cell in the table. Does every noun of this type have two indefinite plurals, one in -i and one in -e? — Eru·tuon 20:18, 3 January 2020 (UTC)Reply
Thank you for any possible aid with this. I'd have to divide the information about Albanian phonology as far as I'm concerned and as I've gotten to know into three IPA tables.
Firstly, the terms of Standard Albanian, which is mostly the same as Tosk, should have three major IPA rows. The first row would be Tosk and its phonemes are all given in the second phonology overview that you were referring to. However, the vowel [ə] is only pronounced when being stressed, in the first syllable of a word or if the word ends with a consonant after the vowel [ə]. The pronunciation due to position in the first syllable applies as well to any terms that are derived so that it is realized always in asnjë and asgjë. Contrastingly, only in the accusative forms atë, këtë, dikë, çdokë, askë and kurrkë and some terms beginning with "atë-" it may be pronounced even in the Tosk rendition of Standard Albanian. Also, the letter "r" is realized either as [ɽ] or often [ɹ] whereas [ɾ] probably does not occur. Hence, there could be a first pronunciation only with [ɽ] and, in the same row, a second pronunciation solely with [ɹ] in addition to denoting that, furthermore, [ɽ] and [ɹ] can be intermixed in a single word. Another matter concerns itself with "ë" that might also be pronounced as [ʉ] but that should only be noted next to the IPA row. The establishment and attribution of this phoneme is also a bit insecure but I've taken note of it.
The orthography of Albanian is based on the Korçan dialect of Tosk and, despite not having very many speakers at all, it should be included in the second row because it provides an explanation of the orthography. The vowel [ə] is pronounced everywhere but its speakers may not do so frequently in consideration of having learnt the general phonology of Standard Albanian, omitting these vowels in quite many positions. Nowadays, it seems that "r" is only realized as [ɹ].
Even though Gheg frequently may have its own variants for Standard Albanian vocabulary and grammar, its speakers also employ Standard Albanian and would pronounce it differently. Making only the distinction to the pronunciation of the latter in Tosk, the letter "r" has got the phoneme [ɾ], the affricates [t͡ʃ] and [d͡ʒ] can be extended to "gj" and "q", allowing two variants to be placed into the same IPA row.
In words that do not belong to Standard Albanian but only to Tosk, a second IPA table with the realization in its own dialect includes [c] and [ɟ] for the letters "q" and "gj" apart from the affricates [c͡ç] and [ɟ͡ʝ] in a single row. Those words don't have any pronunciation in Gheg but as well in the Korçan dialect.
In words of Gheg Albanian according to its own pronunciation, not including the other dialects, the information about vowels from this article can be continued for the third IPA table. Nevertheless, "ë" is still realized as [ə] unless the orthography shows that it has been altered. It can be denoted that it may also be pronounced as [ʌ] like in another dialect of Tosk but it does not have to be written into the pronunciation itself. Additionally to the Gheg pronunciation characteristics already entailed by the first IPA table, the consonantal clusters "nd" and "mb" can be pronounced as [nd] or [ⁿd] and [mb] or [ᵐb] and, written differently but derived from those, "n" and "m" as [n] or [nˠ] and [m] or [mˠ]. In order to differentiate the instances of "n" and "m" only as variants for "nd" and "mb" it should perhaps be recognized whether the term is a variant, referring to the template usage, of a term that has "nd" or "mb" in the position of "n" and "m". The characters "q" and "gj" are not realized as [c͡ç] and [ɟ͡ʝ], but, in extension of their other possible pronunciation, also as [t͡ɕ] and [d͡ʑ]. The consonant [h] is sometimes weakened in particular when not being word-initial and, apparently, [l] can be palatalized into [lʲ] at least before [ə], [ɔ] and possibly [o].
In Aromanian, both indefinite plurals may be formed. I would need links for each form that is phonetically close so that those would be, giving just an example, ali featã, ali feati and ale featã, ale feate even though there is written only "ali, ale featã, feati, feate" in the declension table. HeliosX (talk) 18:29, 6 January 2020 (UTC)Reply
I don't know how to display "ali, ale featã, feati, feate" but link to ali featã, ali feati, ale featã, ale feate. Which words would link to which entries? "ali" could link to either ali featã or ali feati, and "featã" could link to either ali featã or ale featã.
In general if these are just phrases like "to the girl", we would not give them their own entries, and each word would be linked separately – ali, ale featã, feati, feate – as in the declension table in θεός (theós) where the forms of the definite article (ho) are linked separately from the forms of θεός (theós). That removes the problem of how to show individual words but link to phrases. But I am just guessing that ali and ale mean "to the", because they don't have entries yet. (I also don't know what the different final vowels mean.) — Eru·tuon 07:28, 6 January 2020 (UTC)Reply
They can hardly be used without a, ali or ale. I found a form without any separate particle of the genitive and dative cases, for example in "Soarili, cã s-avea disprãs di surorili-a lui, dzenili di munti, iu chindurea cathi tahina, di li-adutsea ghiumi-mplini di lunjin, ta si-sh speal fatsa di liatsa noptsljei" in "Lunjina dit sinduchi" by the Aromanian writer Dini Trandu, but the author evidently employs those particles, and a might simply have been blended together with the definite form of liatsã, as this would otherwise have resulted in "liatsa-a noptsljei" according to the author's orthography. The vowels [e] and [i] can both be used, and I think that the latter has perhaps been influenced by Greek phonology and grammar, with "-i" as a frequent ending of feminine declensions. However, they could be regarded in the same way as the Albanian article used along with the genitive exoclitics, which are not included in the entries that are linked to. HeliosX (talk) 23:42, 6 January 2020 (UTC)Reply
Well, even if the genitive or dative case form doesn't occur without these other words, we don't give entries to phrases unless their meaning is not sum-of-parts as explained in WT:SOP – for example if the meaning of ali featã is not a combination of the meaning of ali and the meaning of featã. — Eru·tuon 22:06, 6 January 2020 (UTC)Reply
Having reconsidered this in comparison to the linkings in the Albanian declension tables, I'd agree that these particles don't have to be linked. HeliosX (talk) 16:07, 8 January 2020 (UTC)Reply
Well, I should clarify – I suggested that a, ale, ali should be linked separately from the noun forms, like the Ancient Greek definite articles in the declension table of θεός (theós): ali, ale featã, feati, feate. Regarding the Albanian pronunciation template, I will try to get to it eventually. I have some other projects that I'm working on at the moment. — Eru·tuon 08:31, 9 January 2020 (UTC)Reply

Toilbot unusual edit

[2] DTLHS (talk) 00:32, 5 January 2020 (UTC)Reply

Thanks. My regex to match a PoS header followed by a headword-line template wasn't good enough. — Eru·tuon 00:45, 5 January 2020 (UTC)Reply
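
For readers wondering what such a pattern involves, here is a minimal sketch in Python. It is not ToilBot's actual code: the list of part-of-speech headers, the pattern, and the function name are all assumptions made for illustration. It also shows one way such a regex can fall short: it does not handle templates that contain nested templates.

```python
import re

# Hypothetical, incomplete list of part-of-speech headers (illustration only).
POS_HEADERS = ("Noun", "Verb", "Adjective", "Adverb", "Proper noun")

# A level 3-5 header such as ===Noun===, optional blank lines, and then a line
# that begins with a simple template such as {{en-noun}} or {{head|en|noun}}.
# Limitation: templates with nested {{...}} inside them are not matched.
POS_THEN_HEADWORD = re.compile(
    r"^(={3,5})\s*(" + "|".join(map(re.escape, POS_HEADERS)) + r")\s*\1[ \t]*\n"
    r"(?:[ \t]*\n)*"
    r"(\{\{[^{}|\n]+(?:\|[^{}\n]*)?\}\})",
    re.MULTILINE,
)

def find_headword_lines(wikitext):
    """Return (header, template) pairs where a PoS header is directly followed by a template."""
    return [(m.group(2), m.group(3)) for m in POS_THEN_HEADWORD.finditer(wikitext)]
```

A header followed by anything other than a template (an image line, for instance) is simply not reported, which is the kind of case a bot would need to skip rather than edit.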

Changing all derivations from Proto-Albanian

Hello,

maybe you could use a tool for multiple edits, if such a tool has been devised, or a bot account to change all these instances of derivation from Proto-Albanian to inheritance. HeliosX (talk) 16:35, 6 January 2020 (UTC)Reply

Barnstar!

The da Vinci Barnstar
For helping us create a smart template in the Further reading of Hungarian-language entries. Thank you so much. Adam78 (talk) 22:39, 13 January 2020 (UTC)Reply

update

Hey. Can you update User:Erutuon/abbreviation headers at the next dump please? I estimate it will be around 28% the size of the current page. --Yesyesandmaybe (talk) 10:45, 18 January 2020 (UTC)Reply

@Yesyesandmaybe: Yep! It's in the script that updates the other header pages. — Eru·tuon 19:23, 18 January 2020 (UTC)Reply

Module errors from edits to documentation submodules

Please check CAT:E. Chuck Entz (talk) 17:39, 20 January 2020 (UTC)Reply

@Chuck Entz: Fixed. Thanks. I wish I'd caught it earlier. — Eru·tuon 19:15, 20 January 2020 (UTC)Reply
Well, at least the pages with the errors aren't where a lot of people would see them. It's not a big deal, but the sooner something like this is fixed, the better. Glad I could help. Chuck Entz (talk) 19:22, 20 January 2020 (UTC)Reply

Etymology at epigone.

Hello, Erutuon. I wonder if you will take a moment to visit the English language epigone page when you are able, and check on what I suspect might be an error in the etymology given there. I believe the statement within the Etymology there, that ἐπίγονος comes "from ἐπιγίγνομαι" to be incorrect, as it suggests that γόνος is derived from γίγνομαι. Rather, I think that γόνος, as did γένος, entered Ancient Greek more directly as a lemma from earlier IE sources, instead of being derived from γίγνομαι (please note the Etymology at γόνος, wherein that is indicated, and wherein γόνος is indicated to be merely the equivalent of γίγνομαι + -ος). This is much the same in Latin, wherein the noun genus cannot be said to be a derivative of the verb gigno, but rather, that it is a related word with both deriving from separate IE lexemes. It seems to make more sense to me that the noun ἐπίγονος should be derived as is shown on its page, rather than from ἐπιγίγνομαι. As for myself, I am loath to change any existing etymologies, as I am really not that learned in linguistic history, and so would like to have your more experienced eyes on this (I believe it was Victar who rightly "slapped me down" on an earlier foray of mine into the IE realm). I thought that, instead of just including an etymology template on the page, I might rather just bring it to the attention of someone who probably can assess the etymology properly. Thanks. — This unsigned comment was added by 68.112.86.146 (talk) at 19:33, 20 January 2020 (UTC).Reply

Redirect problem

[3] DTLHS (talk) 16:19, 22 January 2020 (UTC)Reply

@DTLHS: Thanks. I'll exclude redirects and look for the other redirects that my bot messed up. — Eru·tuon 19:38, 22 January 2020 (UTC)Reply

Requested edits

You reverted my edit on Wiktionary:Requested entries because there is no page for Yogotti, but I was told that Wiktionary:Requested entries was the place you request new words. WikitionaryGuy (talk) 23:26, 22 January 2020 (UTC)Reply

@WikitionaryGuy: You must have been misinformed. Wiktionary:Requested entries links to the pages where you post requests. In this case, if Yogotti is an English word, you would post it in Wiktionary:Requested entries (English). — Eru·tuon 00:35, 23 January 2020 (UTC)Reply

ToilBot "Normalizing" Vandalism

Is there any way you could have your bot avoid normalizing entries that have been edited too recently? I keep finding cases where someone vandalizes an entry and ToilBot tidies it before any patrollers can get to it- thus blocking it from the rollback tool. The only way around that is undoing via the edit history, which is slower and much less convenient. Chuck Entz (talk) 04:19, 23 January 2020 (UTC)Reply

@Chuck Entz: Sure. That's pretty annoying. I'll work on a way to skip pages that have been edited within a certain number of hours before I run the script on a large number of pages again. — Eru·tuon 04:41, 23 January 2020 (UTC)Reply
@Chuck Entz: Update: now the script finds pages whose most recent edit is in Recent Changes, and it starts from the oldest edits in Recent Changes and stops at edits from 12 hours ago, if it gets that far. I might change the start date because the oldest edits in Recent Changes are from 1 month ago, and some pages are probably edited more often than that. But do you think 12 hours is enough time? — Eru·tuon 19:18, 20 March 2020 (UTC)Reply
I would be more comfortable with 24 hours, but there are others who do more rollbacks than I do- @SemperBlotto, @Surjection and @Robbie SWE, to start with. Chuck Entz (talk) 20:33, 20 March 2020 (UTC)Reply
24 hours would probably be enough for me. — surjection?23:05, 20 March 2020 (UTC)Reply
Okay, I've changed it to 24 hours margin for vandal-fighting. — Eru·tuon 23:48, 20 March 2020 (UTC)Reply
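
The margin described in this thread can be sketched roughly as follows. This is only an illustration in Python, not the bot's real script: the function name and the `last_edits` mapping (page title to time of latest edit) are hypothetical, and how the timestamps are obtained from Recent Changes is left out.

```python
from datetime import datetime, timedelta, timezone

# Margin agreed on in the discussion above.
SAFETY_MARGIN = timedelta(hours=24)

def pages_safe_to_edit(last_edits, now=None, margin=SAFETY_MARGIN):
    """Return titles whose most recent edit is older than the margin, oldest first.

    last_edits: hypothetical mapping of page title -> datetime of its latest edit (UTC).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - margin
    # Keep only pages that patrollers have had at least `margin` to review.
    old_enough = [(ts, title) for title, ts in last_edits.items() if ts <= cutoff]
    return [title for ts, title in sorted(old_enough)]
```

Processing from the oldest edit forward and stopping at the cutoff means a page vandalized an hour ago is left alone, so the rollback tool still works on it.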

Esperanto ordinal numbers

I see you worked on Module:eo-headword and also applied protection to the page. Could you help me at Wiktionary:Grease_pit/2020/January#Esperanto_ordinal_numbers? I can't edit the page myself. Robin van der Vliet (talk) (contribs) 15:41, 24 January 2020 (UTC)Reply

ἵημι problem

Hi! In ἵημι the "Aorist: εἵμην" misses the first three persons of the indicative, although in the wikitext they are present; could you please check why they don't appear? Thank you very much! --Epìdosis (talk) 12:06, 31 January 2020 (UTC)Reply

@Epìdosis: I see the forms missing in both the header and inside the table. That's because the singular uses first-aorist forms, ἧκᾰ, ἧκᾰς, ἧκε(ν), which are shown in a different table because of the limitations of {{grc-conj}}. And so {{grc-conj}} shows the first-person singular indicative middle εἵμην (heímēn) in the header. — Eru·tuon 21:57, 31 January 2020 (UTC)Reply

Oops, my error! Thank you very much, --Epìdosis (talk) 21:59, 31 January 2020 (UTC)Reply

Update 2

Hey. Can you gimme another update of User:Erutuon/abbreviation headers at the next dump? I reckon about 60% of the terms have since then been corrected (at least in the Abbreviations subpage anyway), and I find myself visiting pages I've already corrected. TIA --AcpoKrane (talk) 11:58, 18 February 2020 (UTC)Reply

@AcpoKrane: Yep, I'll update it when the right dump files come out, as usual. — Eru·tuon 23:50, 22 February 2020 (UTC)Reply
Done. Just realized I forgot to do it after the last dump (2020-02-01). — Eru·tuon 23:48, 23 February 2020 (UTC)Reply

Nesting in translations

Hi,

Do you know, which module contains the nesting? So that if you add, e.g. a Kurdish translation, you can add "Kurdish/Kurmanji" in the "Nesting"? --Anatoli T. (обсудить/вклад) 02:33, 26 February 2020 (UTC)Reply

@Atitarev: Yes, that's in MediaWiki:Gadget-TranslationAdder-Data.js, under var nesting = {. — Eru·tuon 23:06, 26 February 2020 (UTC)Reply
Thanks but it's not obvious to me how language code "ku" allows nesting "Kurdish/Kurmanji". I'd like to fix Eastern Mari ("chm") as "Mari/Eastern Mari", add a Mongolian nesting "Mongolian/Uyghurjin". --Anatoli T. (обсудить/вклад) 00:05, 27 February 2020 (UTC)Reply
@Atitarev: Right, MediaWiki:Gadget-TranslationAdder-Data.js only controls nesting that is automatically generated by the TranslationAdder gadget; by editing source code manually, anyone can nest any language any way they want, and that's where the Kurdish/Kurmanji nesting for ku comes from. I think "Mongolian/Uyghurjin" requires a different mechanism, which may not exist, because the nesting table in MediaWiki:Gadget-TranslationAdder-Data.js is by language code; it doesn't describe any sub-nestings for writing systems. I'm guessing that the "Serbo-Croatian: Cyrillic: ... Roman: ..." that is in quite a few translations sections was added manually, not by the gadget. — Eru·tuon 00:36, 27 February 2020 (UTC)Reply
Thanks. Any language will allow "language name"/Cyrillic or "language name"/Roman. I have fixed the "Eastern Mari" nesting and it seems I can just use Mongolian/Uyghurjin or Mongolian/Cyrillic if there is no Mongolian translation present. --Anatoli T. (обсудить/вклад) 04:49, 27 February 2020 (UTC)Reply
Okay. I don't see any way to do "Mongolian/Uyghurjin" in the translation adder (and wouldn't be able to add that capability), but if that's not necessary, great. — Eru·tuon 19:06, 27 February 2020 (UTC)Reply

Wiktionary:Todo/multiword Spanish lemmas not idiom or proverb update

Hey E. Can you rerun Wiktionary:Todo/multiword Spanish lemmas not idiom or proverb after the next dump? I linked, over the space of 4 and a bit months, all of the decent entries in there. What I'm looking for exactly is all NEW multiword entries made since the original list, so after making it, would you be able to remove all entries which appear in the original list? Only then will I be able to say that my quest has been completed. Thanks in advance --AcpoKrane (talk) 09:00, 27 February 2020 (UTC)Reply

@AcpoKrane: I just used a bot script, so no need to wait. This should be it. — Eru·tuon 23:10, 27 February 2020 (UTC)Reply
That's just beautiful. --AcpoKrane (talk) 11:41, 28 February 2020 (UTC)Reply
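
The request in this thread (list the multiword entries, then drop everything that was already in the original list) amounts to an order-preserving set difference. A minimal Python sketch, with a hypothetical function name and placeholder titles rather than real data:

```python
def new_entries(current_titles, original_titles):
    """Return titles in the current list that were not in the original list, keeping order."""
    seen = set(original_titles)  # set lookup keeps this fast for long lists
    return [title for title in current_titles if title not in seen]

# Placeholder titles for illustration only.
original = ["entry one", "entry two"]
current = ["entry one", "entry three", "entry two", "entry four"]
print(new_entries(current, original))
```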

Day to Days

How do I change the descendant trees?

https://en.wiktionary.org/wiki/Reconstruction:Proto-Germanic/dagōs Personisgaming (talk) 15:48, 18 March 2020 (UTC)Reply

Sure, you got it!

Nobody else edits as fast around here (except Equinox, of course). Anyway, if I get blocked before I'm done, would you mind adding {{audio|en|LL-Q1860 (eng)-Vealhurl-{{subst:PAGENAME}}.wav ‎|Audio (UK)}} in the Pronunciation section to all of these words that I recorded today? That would allow me to do other stuff, like, Spanish idioms or nominating people for adminship. --Gorgehater (talk) 22:30, 27 March 2020 (UTC)Reply

Templatehoard

Do you still update it? I'd like to generate some new wanted entry lists. – Jberkel 09:36, 5 April 2020 (UTC)Reply

@Jberkel: Updated. (I need to figure out how to streamline the process; it's kind of tedious running all the commands.) I tried running the wanted entry script after the 2020-03-01 dump came out, but the first command failed. — Eru·tuon 23:41, 5 April 2020 (UTC)Reply
Thanks! Maybe use a simple Makefile to automate the commands? I'll take a look, sometimes there are resource-related problems; unlike Rust, Java needs a lot of memory :) – Jberkel
Ok, all regenerated. It was a silly bug in the CBOR deserialization. – Jberkel 22:29, 7 April 2020 (UTC)Reply
@Jberkel: I made a Makefile and it's now much easier to generate the template dump and entry index: just a single command for each. — Eru·tuon 21:48, 23 April 2020 (UTC)Reply
Cool, I'll regenerate the pages. – Jberkel 14:15, 25 April 2020 (UTC)Reply

User:ToilBot worsened paadje

Why did User:ToilBot worsen my contribution [4]? If you don't mind, I would like to revert it. There may be many cases where "usage case" is incorrectly used, but this wasn't one of them. It was just one sentence, that should be a hint for your bot to not touch it. --85.148.244.121 06:04, 11 April 2020 (UTC)Reply

Do not revert it. We have standardised headers, which allows us to keep track of the millions of pages on the wiki. Think of it this way: in an idealised, complete entry, there may be many relevant usage notes, or there may only be one, but all usage notes will be under the header 'Usage notes'. —Μετάknowledgediscuss/deeds 06:08, 11 April 2020 (UTC)Reply
That's OK and why I asked it, but are we still allowed to call us "the English-language Wiktionary" [5] if we refuse to speak English and even have bots to remove English from content which uses it? In an idealised English-language wiktionary, we would be writing English (and that still has plural and singular, if that changes the undeclined word would probably win). On the other hand, I don't even speak standard English very well (Sassenach for Alba); for me, it's OK, I just asked. --85.148.244.121 07:45, 11 April 2020 (UTC)Reply
Well, "Usage notes" looks like English to me – certainly not Klingon at least. It does strictly speaking violate the rules of grammatical agreement in paadje, but Wiktionary can do what it wants because there's no Académie Anglaise to punish it for crimes against English grammar. More seriously, it would be a headache to try to make the headers agree in number with the contents of the sections, and it would make entries a bit less machine-readable, so Wiktionary has chosen one grammatical number for each header ("Usage notes" in plural, "Pronunciation" in singular) and I enforce it. This is the current convention, and changing it now might cause various bots and tools to break. — Eru·tuon 08:35, 11 April 2020 (UTC)Reply

Update to {{en-conj-simple}}

If you have time, I was wondering if you would see if {{en-conj-simple}} could be tweaked so that the archaic second person singular present tense (for example, walkest) and archaic third person singular present tense (walketh) forms could be made into links that, if clicked on, would create the inflections in an accelerated manner, in the way that it works with {{en-verb}}. There might have to be a warning somewhere that editors should check whether these verb forms are attestable. This isn't urgent. — SGconlaw (talk) 16:52, 12 April 2020 (UTC)Reply

@Sgconlaw: I've added the second-person singular past-tense form (-edst) and made the table unconditionally link the forms, because up till now they were only linked if the target page existed; linking to nonexistent pages is a requirement for adding acceleration. I think I'll add acceleration to all the forms, not just -eth and -est, as none of them have it yet. — Eru·tuon 20:07, 20 April 2020 (UTC)Reply
Thanks. I had no idea the -edst form existed. The format looks odd, though (what’s the significance of the two columns in the “past tense” section?) – perhaps it should match the present tense column? — SGconlaw (talk) 20:10, 20 April 2020 (UTC)Reply
The past-tense columns were basically "modern" and "Elizabethan", but I've changed it to the format of the present-tense column. — Eru·tuon 20:45, 20 April 2020 (UTC)Reply
@Sgconlaw: Okay, finished the process. Added new acceleration protocols to Module:accel/en for the archaic forms. Let me know if you notice any problems. — Eru·tuon 21:07, 20 April 2020 (UTC)Reply
Is the acceleration working? I clicked on the links cherishest and cherishedst (note: not saying these words exist) in the sample on the documentation page, and they just led to blank pages. — SGconlaw (talk) 04:22, 21 April 2020 (UTC)Reply
@Sgconlaw: Those links work for me. Do you have the acceleration gadget enabled in your preferences (search for "accelerated creation links" on the page)? — Eru·tuon 05:13, 21 April 2020 (UTC)Reply
Accelerated links in {{en-verb}} work for me. Didn’t know I had to do something extra for these – will check. — SGconlaw (talk) 05:18, 21 April 2020 (UTC)Reply
Okay, {{en-conj-simple}} should work if {{en-verb}} does. Oh, the problem is that acceleration doesn't work in the template namespace. Try clicking the links in the conjugation table in cherish instead. — Eru·tuon 05:30, 21 April 2020 (UTC)Reply
Ah, that was the issue. Yes, it's working fine! Thanks again. — SGconlaw (talk) 07:57, 21 April 2020 (UTC)Reply

Should this extra pipe be removed? — SGconlaw (talk) 19:11, 21 April 2020 (UTC)Reply

@Sgconlaw: Huh. That definition should be {{en-archaic second-person singular past of|translate}}. Aha, here was the problem. — Eru·tuon 19:19, 21 April 2020 (UTC)Reply
SGconlaw (talk) 04:33, 22 April 2020 (UTC)Reply

Chaucer quotes in English section

Hey. Could I get a list of all Chaucer quotations in the English (but not Middle English) section of an entry? It's because they shouldn't be there, they should be in Middle English. You could put it at Wiktionary:Todo/English Chaucer. Thanks in advance --Vitoscots (talk) 17:25, 20 April 2020 (UTC)Reply

@Vitoscots: You already saw, but I made the page. It includes things besides quotes, but it has excerpts so you don't have to waste your time visiting the entry. — Eru·tuon 23:45, 20 April 2020 (UTC)Reply
Love ya, Eru! --Vitoscots (talk) 00:17, 21 April 2020 (UTC)Reply

Chaucer list for Shakey and Milly

Hey. Can we get a list of undated Milton and Shakespeare quotes? I guess looking for

We already have those wheels, in two forms:
  1. Search for, eg, 'hastemplate:"rfdate" insource:/rfdatek\|en\|Chaucer/'
  2. Use categories like Category:Requests for date/Chaucer.
HTH. DCDuring (talk) 15:00, 21 April 2020 (UTC)Reply
@DCDuring: Yeah, that works for {{rfdatek}}, which has the author in the template, but {{rfdate}} doesn't (though you can find examples of {{rfdate}} applied to Shakespeare, for instance, among the search results for hastemplate:"rfdate" Shakespeare); for instance in drug:
#* {{rfdate|en}} {{w|William Shakespeare}}, ''{{w|Timon of Athens}}''
#*: Hadst thou, like us from our first swath, proceeded / The sweet degrees that this brief world affords / To such as may the passive '''drugs''' of it / Freely command, thou wouldst have plunged thyself / In general riot {{...}}
Also I think Wonderfool has an enthusiasm for lists. They help keep him motivated because he can check things off and write down how much work is left.
@Vitoscots: I'll see what I can do. It's more complex than the previous Chaucer request. Gotta figure out what the format typically is. — Eru·tuon 17:49, 21 April 2020 (UTC)Reply
Yeah, you gotta keep your volunteers motivated, boss. --Vitoscots (talk) 17:51, 21 April 2020 (UTC)Reply
Either alternative technique yields lists from which the completed items disappear, which provides even more motivation. And what about my motivation, having added nearly ten thousand instances of {{rfdatek}} and {{rfdate}} only now to have WF reject my handiwork? DCDuring (talk) 18:33, 21 April 2020 (UTC)Reply
I didn't reject your handiwork, DCD. I was attacking your rfdefs with my steely knife. --Vitoscots (talk) 19:40, 21 April 2020 (UTC)Reply
I don't get it. Seems like my making a list is a good way to make your work come to fruition (with dates finally being added)! — Eru·tuon 21:19, 21 April 2020 (UTC)Reply
Your Chaucer list was less selective than "my" lists, so you must have essentially ignored the presence or absence of the rfdate and rfdatek templates and the resulting categories. DCDuring (talk) 23:41, 21 April 2020 (UTC)Reply
@DCDuring: Ahh, I see what you mean now. I thought you were talking about the Milton and Shakespeare lists. The purpose of the Chaucer list is to catch Chaucer quotes that need to be moved from the (Modern) English to the Middle English entry, so yeah, {{rfdate}} and {{rfdatek}} aren't involved. (There were false positives because I just searched for "Chaucer" in English sections without trying to figure out if it was the author of a quote, or if it was the Chaucer as opposed to another Chaucer.) But the Milton and Shakespeare lists are only occurrences in conjunction with {{rfdate}} so they are making use of your work inserting {{rfdate}}. — Eru·tuon 00:39, 22 April 2020 (UTC)Reply
I see. All of the Chaucer quotes now in English should be in Middle English. We had long been accommodating an excellent contributor who thought Middle English quotes, even of alternative forms, should appear in the entry of the English descendant. Having the dates should make it especially obvious. BTW, it would be nice to locate each quote in the manuscript fragment it was found in. I think there are four of them, but I haven't seen dates for the fragments. BTW, you have seen how many authors there are with rfdatek and rfquotek categories, right? DCDuring (talk) 00:49, 22 April 2020 (UTC)Reply
Who was the excellent contributor, out of interest? --Vitoscots (talk) 00:03, 23 April 2020 (UTC)Reply
@DCDuring: Yes, seems like a tremendous number. Maybe it would be useful to print the templates in some kind of list format, showing the definition under which {{rfquotek}} was placed, and the quote that {{rfdatek}} was placed on (somewhat like WT:Todo/Undated Milton and WT:Todo/Undated Shakespeare). Then people could more quickly look over the requests to find ones they can fill, without having to visit hundreds of pages and look over the text of them. The list could be put on a Toolforge site, though then it would be harder to give editors the satisfaction of crossing out the requests they had filled. — Eru·tuon 04:25, 24 April 2020 (UTC)Reply
To make it easier yet, you could include a link to a search for the quote on Google Books (and Wikisource and Gutenberg?). The searcher might still have to shorten the search string to find the original wording of the quote, but the job would often be very easy indeed. I had thought about that while adding all the templates, but I just wanted to get the ball rolling. DCDuring (talk) 04:58, 24 April 2020 (UTC)Reply
Okay, did the Shakespeare and Milton {{rfdate}} bit. I included the quotes in the list to make your job easier. — Eru·tuon 21:19, 21 April 2020 (UTC)Reply
Nice. It didn't take long to clean all those up. --Vitoscots (talk) 22:09, 24 April 2020 (UTC)Reply
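
As a rough illustration of the {{rfdate}} case discussed in this thread, where the author is not a template parameter and has to be found in the text after the template, here is a Python sketch. It assumes the layout quoted above from drug (a `#*` line with `{{rfdate|en}}`, optionally followed by a `#*:` line with the quotation); the pattern and function name are hypothetical, and real entries vary in format, so this would miss some cases.

```python
import re

# A "#*" line carrying {{rfdate|en}}, optionally followed by a "#*:" quotation line.
RFDATE_QUOTE = re.compile(
    r"^#+\*[ \t]*\{\{rfdate\|en\}\}(?P<source>[^\n]*)"
    r"(?:\n#+\*:[ \t]*(?P<text>[^\n]*))?",
    re.MULTILINE,
)

def undated_quotes_by(wikitext, author):
    """Return (source line, quotation text) pairs for undated quotes mentioning the author."""
    results = []
    for m in RFDATE_QUOTE.finditer(wikitext):
        if author in m.group("source"):
            results.append((m.group("source").strip(), m.group("text") or ""))
    return results
```

Returning the quotation text along with the source line is what lets a list page show an excerpt, so that editors do not have to open each entry.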

Re: Category timestamp

Re your question on #wikimedia-tech, just checking: you are aware that the timestamp is not supposed to reflect when a category was added to a page, aren't you? See mw:Manual:Categorylinks_table#cl_timestamp. Timestamps are often updated en masse after some template change, for instance. That said, some SQL queries can often shed light on what's going on. Nemo 10:12, 22 April 2020 (UTC)Reply

@Nemo bis: Thanks, I wasn't aware that dynamicpagelist used cl_timestamp for category additions. That explains why the list is sometimes random. — Eru·tuon 17:34, 22 April 2020 (UTC)Reply

Please

keep things under control from now on. I'm taking a long break --Vitoscots (talk) 19:33, 26 April 2020 (UTC)Reply

Most deleted pages

Was just reading this Special:Diff/47099083/59225779 and wondering if this can be queried – which pages have been deleted many times but do currently exist? Do we have that data? – Jberkel 19:03, 27 April 2020 (UTC)Reply

@Jberkel: Well, here is the "times deleted leaderboard", with a column indicating which titles actually exist. User talk:Equinox is at the top of the currently existing titles because Equinox doesn't believe in archiving his talk page... sigh... — Eru·tuon 19:38, 27 April 2020 (UTC)Reply
Great, thanks! lots of NSFW type words as expected but some interesting ones as well. – Jberkel 20:30, 27 April 2020 (UTC)Reply

WT:Todo/Undated Bible

Hi. Good work with WT:Todo/Undated Milton, by the way. Could we get something similar for WT:Todo/Undated Bible? It has been noticed that undated Bible quotations are all over this fricking website. For some reason, DCDuring (talkcontribs) never tagged them with {{rfdatek}} so they don't show up in the categories. --Elvinrust (talk) 22:43, 4 May 2020 (UTC)Reply

Quarrying

Also, it would be fun to see the league table for the most one-sided thank relationship (an unrequited-love list, if you will), like this where we can all see that I'm so meta even this acronym (talkcontribs) is totally stalking JohnC5 (talkcontribs) (951 thanks), but with a part where JohnC5's number of thanks to ISMETA (341) is deducted from that total. --Elvinrust (talk) 22:49, 4 May 2020 (UTC)Reply

Which makes me think of another fun list - most affectionate thank-couples (combined thank-totals...John and ISMETA will win that, hands down) --Elvinrust (talk) 22:51, 4 May 2020 (UTC)Reply
Ooh, and a list of the most aggressive relationships, coz I'm gonna make a documentary about it called Users who Revert Other Users. --Elvinrust (talk) 22:53, 4 May 2020 (UTC)Reply
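
The two thank tables proposed above come down to simple arithmetic on pairwise counts: net thanks (given minus received back) for the one-sided list, and the sum of both directions for the couples list. A minimal Python sketch, assuming the thank log is available as a list of (thanker, recipient) pairs; the function name and that input format are hypothetical, and the only figures used are the two quoted in this thread.

```python
from collections import Counter

def thank_tables(thank_log):
    """Build two rankings from (thanker, recipient) pairs.

    Returns (one_sided, couples):
      one_sided: (thanker, recipient, net), net = thanks given minus thanks received back
      couples:   (user_a, user_b, combined) for each unordered pair of users
    Both lists are sorted from largest to smallest.
    """
    given = Counter(thank_log)
    # Counter returns 0 for missing keys without inserting them, so this lookup is safe.
    one_sided = sorted(
        ((a, b, n - given[(b, a)]) for (a, b), n in given.items() if n > given[(b, a)]),
        key=lambda row: row[2],
        reverse=True,
    )
    combined = Counter()
    for (a, b), n in given.items():
        combined[tuple(sorted((a, b)))] += n
    couples = [(a, b, n) for (a, b), n in combined.most_common()]
    return one_sided, couples

# Figures from the thread: 951 thanks one way, 341 the other.
log = [("ISMETA", "JohnC5")] * 951 + [("JohnC5", "ISMETA")] * 341
one_sided, couples = thank_tables(log)
print(one_sided)  # [('ISMETA', 'JohnC5', 610)]
print(couples)    # [('ISMETA', 'JohnC5', 1292)]
```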