--[===[

MODULE "MCHKLNGCODE" (check language code)

"eo.wiktionary.org/wiki/Modulo:mchklngcode" <!--2024-Aug-27-->
"id.wiktionary.org/wiki/Modul:mchklngcode"

Purpose: checks the validity of 1 or 2 parameters that are supposed to
         contain a language code, in 2 stages: first by testing whether
         they are obviously invalid, and if they are not, by testing
         whether they are known

Utilo: kontrolas validecon de 1 aux 2 parametroj kiuj enhavu
       lingvokodon en 2 pasxoj: testante cxu ili estas evidente
       nevalidaj, kaj se tio ne veras cxu ili estas konataj

Manfaat: mengontrol validitas 1 atau 2 parameter yang seharusnya
         berisi kode bahasa ...

Syfte: kontrollerar giltighet av 1 eller 2 parametrar som ska innehaalla
       spraakkod ...

Used by templates / Uzata far sxablonoj:
* deveno3 elpropra Lingvo t

Required submodules / Bezonataj submoduloj / Submodul yang diperlukan:
* "loaddata-tbllingvoj" T76 in turn requiring template "tbllingvoj" (EO)
* "loaddata-tblbahasa" T76 in turn requiring template "tblbahasa" (ID)

This module is special in that it takes parameters from two sources: those
sent to itself (own frame) and those sent to the caller (caller's frame).
The parameters this module needs differ from the parameters submitted to
the calling template. Self-test is still possible.

!!! BEWARE a control string is used only if exactly one of xx= and yy=
!!! is supplied and has the correct length, otherwise it is silently ignored

!!! BEWARE the option to allow a digit in middle position via xx= has been removed

Incoming: - 1 or 2 anonymous parameters
            - 1 or 2 parameters forwarded from the caller using "{{{1}}}"
              or "{{{ling}}}" or similarly, wall "|" is not needed,
              maybe "{{{ling|eo}}}" for an optional parameter,
              conversely "{{{ling|}}}" is bad
          - 1 optional named parameter
            * "yy=" control string with 8 char:s and 7 values, (one tristate
              letter, 6 boolean digits "0" or "1", and one separator "-",
              pattern ".1-11111")
              * (pos 0) desired type of result b t k
              * (pos 1) do check 2 codes 0 1
              * (pos 2) separator "-"
              * (pos 3) allow "-" 0 1
              * (pos 4) allow "??" 0 1
              * (pos 5) allow long codes such as "zh-min-nan" 0 1
              * (pos 6) allow digit in middle position of 3-letter codes 0 1
              * (pos 7) skip test against ban table 0 1
            * "xx=" control string with 5 char:s (one tristate letter,
              3 boolean digits "0" or "1", and one fourstate digit)             !!!FIXME!!! deprecated
              - tri-state letter : desired type of result (default is "b"):
                - "b" -- boolean (0 evil -- 1 tolerable) for conditional logic
                         in classic templates ("evil" is invalid,
                         "tolerable" is unknown or known)
                - "t" -- tristate (0 invalid -- 1 unknown -- 2 known)
                - "k" -- category (2 categories without EOL between them
                         if applicable, or empty string if known, see below)
              - boolean: do check 2 codes (by default only 1 code is checked)
              - fourstate digit: allow "-" or "??" (default "0")
                - "0" -- do not allow any
                - "1" -- allow "-"
                - "2" -- allow "??"
                - "3" -- allow both
              - boolean: allow digit THIS IS REMOVED NOW
              - boolean: do NOT disallow some common bad codes ("epo", "por",
                         ...) by ban table (default false ie do disallow)
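
The layout of the "yy=" control string described above can be illustrated
with a minimal sketch. The function "parse_yy" and its field names are
hypothetical and NOT part of this module. The sketch simply returns "nil"
for any malformed string, whereas the module itself distinguishes between
a wrong length (silently ignored) and an invalid digit (rejected).

```lua
-- Hypothetical helper, NOT part of this module: decode a "yy=" control
-- string such as "t1-10100" into named fields, or return nil if malformed.
local function parse_yy (stryy)
  if type(stryy) ~= 'string' then
    return nil
  end
  -- pos 0: one of b t k, pos 1: 0 or 1, pos 2: "-", pos 3...7: 0 or 1
  local mode, two, d3, d4, d5, d6, d7 =
    string.match(stryy, '^([btk])([01])%-([01])([01])([01])([01])([01])$')
  if mode == nil then
    return nil
  end
  return {
    mode       = mode,          -- desired type of result
    checktwo   = (two == '1'),  -- do check 2 codes
    allowdash  = (d3 == '1'),   -- allow "-"
    allowqq    = (d4 == '1'),   -- allow "??"
    allowlong  = (d5 == '1'),   -- allow long codes
    allowdigit = (d6 == '1'),   -- allow digit in middle position
    skipban    = (d7 == '1'),   -- skip test against ban table
  }
end

local opts = parse_yy('t1-10100')
print(opts.mode, opts.checktwo, opts.allowdash, opts.allowlong)
print(parse_yy('b0-00200'))  -- nil, because "2" is not a boolean digit
```

Anchoring the pattern at both ends makes the length of 8 char:s implicit,
so no separate length test is needed in this sketch.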

Parameters accepted from both own and caller's frame:
          * 2 named optional hidden parameters
            * "detxt=true" (dec-encode AKA nowiki-encode the output and
              that way make the category insertions on error visible,
              any other value is ignored)
            * "nocat=true" (suppress categorization in "k" mode,
              any other value is ignored, also ignored if "detxt=true"
              since "detxt=" overrides "nocat=") !!!FIXME!!! deprecated

Returned: - "b" : string "1" if the parameters/codes are tolerable (known or
                  unknown), string "0" if the parameters/codes are evil
                  (obviously invalid) or if this module itself is misused
          - "t" : string "2" if the parameters/codes are known (both known),
                  string "1" if the parameters/codes are unknown but not
                  obviously invalid, string "0" if the parameters/codes
                  are invalid (at least one is obviously invalid) or if
                  this module itself is misused
          - "k" : up to 3 categories per tested code (of 2 possible types)
                  without EOL between them, or empty string if the
                  parameters are accepted, no junk categories if this
                  module itself is misused

For "b" and "t" the principle is "the worst result counts", but for "k"
the 2 triples of categories are based on separate evaluations of the 2 codes.
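
The principle "the worst result counts" can be sketched as taking the
minimum of the two tri-state values (0 invalid, 1 unknown, 2 known) and
then mapping it to the requested output type. The function "combine" below
is a hypothetical illustration, not the module's own code.

```lua
-- Hypothetical sketch: states are 0 invalid, 1 unknown, 2 known.
-- If only one code is tested, pass 2 for the second state.
local function combine (numst3, numst4, strmode)
  local numworst = math.min(numst3, numst4)
  if strmode == 't' then
    return tostring(numworst)  -- tristate: "0" or "1" or "2"
  end
  -- mode "b": "0" evil, "1" tolerable (unknown or known)
  if numworst == 0 then
    return '0'
  end
  return '1'
end

print(combine(2, 1, 't'))
print(combine(2, 0, 'b'))
```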

To pass the check for obviously invalid codes, a code:
- must be 2 or 3 ASCII char:s long, and consist only of lowercase letters
  (optionally, a digit in the middle position or long codes can be allowed)
- must not be on the ban list (this check can optionally be deactivated)
- optionally the string "-" or "??" (but not "???") can be allowed at this
  stage (but still cannot be accepted later as "known")

La kontrolo pri evidente nevalida kode postulas
por redoni rezulton "tolerebla":
- longo estu 2 aux 3 ASCII signoj, kaj enhavu nur minusklajn literojn
  (opcie, cifero en la meza pozicio aux longaj kodoj povas esti permesitaj)
- ne trovigxu sur la forbara listo (cxi tiu kontrolo povas opcie
  esti senaktivigita)
- opcie signocxeno "-" aux "??" (sed ne "???") povas esti permesita en cxi tiu
  pasxo (sed dauxre ne povas esti akceptita pli tarde kiel "konata")
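
For the default settings only (no long codes, no digit, no special codes
"-" or "??"), the rules above can be approximated with a Lua pattern. This
is a simplified sketch: the function "passes_basic" is hypothetical, and
the ban table below is only a short excerpt of the real list.

```lua
-- Hypothetical sketch of the default "obviously invalid" check.
-- Excerpt only, the real ban list in this module is longer.
local banned = { epo = true, por = true, fra = true }

local function passes_basic (strcode)
  if type(strcode) ~= 'string' then
    return false
  end
  -- exactly 2 or 3 ASCII lowercase letters
  if not string.match(strcode, '^[a-z][a-z][a-z]?$') then
    return false
  end
  -- must not be on the ban list
  return not banned[strcode]
end

print(passes_basic('eo'), passes_basic('haw'))
print(passes_basic('De'), passes_basic('taja'), passes_basic('por'))
```

The explicit range "a-z" is used instead of the class "%l" so that the
result does not depend on the locale.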

Note that the operation modes "b" or "t" on the one hand and "k" on the other
are separate and it is NOT possible to merge them. The result from "b" is fed
into "#ifeq" and any categories would be ignored. Thus this module will
usually be called several times from one template, even with the same language
code.

Note that the result in boolean mode "b" is either "1" ("accepted") or "0"
("bad") once this module has run successfully. But there is a third
possibility, "module failed to run", for example because it was not found or
timed out. The conditional logic in the calling template must be aware of this.

In the category mode "k" the format of the categories is:
* obviously invalid:
  * "[[Kategorio:Evidente nevalida lingvokodo]]"
  * "[[Kategorio:Evidente nevalida lingvokodo nome (Deutsch)]]"
  * "[[Kategorio:Evidente nevalida lingvokodo loke (deveno3)]]"
or
* unknown (unsupported by given wiki at the moment)
  * "[[Kategorio:Nekonata lingvokodo]]"
  * "[[Kategorio:Nekonata lingvokodo nome (haw)]]"
  * "[[Kategorio:Nekonata lingvokodo loke (deveno3)]]"
The reported detail string is sanitized for both incoming langcode
and peeked template name:
* replaced with "e-m-p-t-y" if empty
* otherwise truncated to max 14 octet:s and unsafe char:s are
  replaced with dot:s (safe are "0"..."9" "A"..."Z" "a"..."z"
  "!" "," "-")
The code does not have to be sanitized if it is only "unknown",
but must be if it is "obviously invalid", we sanitize always.
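
The sanitizing rules above can be sketched compactly with "string.sub" and
"string.gsub". The function "sanitize" is a hypothetical illustration, it
works on octets and therefore turns every octet of a non-ASCII char into
a dot.

```lua
-- Hypothetical sketch of the sanitizing rules for the detail string.
local function sanitize (strraw)
  if strraw == '' then
    return 'e-m-p-t-y'
  end
  local strcut = string.sub(strraw, 1, 14)  -- max 14 octet:s
  -- everything except 0...9 A...Z a...z "!" "," "-" becomes a dot
  local strsafe = string.gsub(strcut, '[^0-9A-Za-z!,%-]', '.')
  return strsafe
end

print(sanitize(''))                  -- e-m-p-t-y
print(sanitize('Deutsch Sprache!'))  -- Deutsch.Sprach
```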

The name of the caller ie parent ie previous page in the calling chain
(presumably a template) is peeked automatically and only the core (without
namespace prefix) is taken. Note that this is NOT the same as "{{PAGENAME}}"
returning the very last page in calling chain (usually in NS ZERO).
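
Removing the namespace prefix from the peeked title can be sketched with
a single pattern match. The function "pagename_core" and the sample title
"Template:deveno3" are hypothetical, the real prefix depends on the wiki.

```lua
-- Hypothetical sketch: keep only the part after the first colon.
-- Returns empty string if there is no colon, or if the colon is the
-- first or the last char.
local function pagename_core (strtitle)
  if type(strtitle) ~= 'string' then
    return ''
  end
  local strcore = string.match(strtitle, '^[^:]+:(.+)$')
  return strcore or ''
end

print(pagename_core('Template:deveno3'))  -- deveno3
```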

If two codes are tested then two separate triples of categories can be
created of same type or of different types (one invalid and one unknown).

This module makes it possible to mostly separate cases of "obviously invalid
language code" (for example "" (empty), "...", "Deutsch", "De", "FR", "taja", ...)
from "unknown language code" (for example "haw" that is valid according to
"ISO 639-3:2007" but might be missing from the list of languages on a given wiki)

{{hr3}} <!-------------------------------->

* #T00 (no params, evil)
* expected result: "0" (evil)
* actual result: "{{#invoke:mchklngcode|ek}}"

::* #T01 ("eo", default binary output, only 1 code is tested)
::* expected result: "1" (tolerable)
::* actual result: "{{#invoke:mchklngcode|ek|eo}}"

* #T02 ("eo|crap", default binary output, only 1 code is tested)
* expected result: "1" (tolerable)
* actual result: "{{#invoke:mchklngcode|ek|eo|crap}}"

::* #T03 ("eo|sv|id", 3 anon params, evil)
::* expected result: "0" (evil)
::* actual result: "{{#invoke:mchklngcode|ek|eo|sv|id}}"

* #T04 ("eo|xx=b0000", all 5 defaults explicitly confirmed, binary output)
* expected result: "1" (tolerable)
* actual result: "{{#invoke:mchklngcode|ek|eo|xx=b0000}}"

* #T04 ("eo|yy=b0-00000", all 8 defaults explicitly confirmed, binary output)
* expected result: "1" (tolerable)
* actual result: "{{#invoke:mchklngcode|ek|eo|yy=b0-00000}}"

::* #T05 ("eo|xx=b00000", parameter too long)
::* expected result: "1" (bad, parameter "xx=" ignored)
::* actual result: "{{#invoke:mchklngcode|ek|eo|xx=b00000}}"

::* #T05 ("eo|yy=b0-000000", parameter too long)
::* expected result: "1" (bad, parameter "yy=" ignored)
::* actual result: "{{#invoke:mchklngcode|ek|eo|yy=b0-000000}}"

* #T06 ("eo|xx=b2000", invalid digit "2" in boolean position)
* expected result: "0" (bad, parameter "xx=" rejected)
* actual result: "{{#invoke:mchklngcode|ek|eo|xx=b2000}}"

* #T06 ("eo|yy=b0-00200", invalid digit "2" in boolean position)
* expected result: "0" (bad, parameter "yy=" rejected)
* actual result: "{{#invoke:mchklngcode|ek|eo|yy=b0-00200}}"

::* #T07 ("eo|crap|xx=b1000", both codes are tested)
::* expected result: "0" (bad, latter code is invalid)
::* actual result: "{{#invoke:mchklngcode|ek|eo|crap|xx=b1000}}"

::* #T07 ("eo|crap|yy=b1-00000", both codes are tested)
::* expected result: "0" (bad, latter code is invalid)
::* actual result: "{{#invoke:mchklngcode|ek|eo|crap|yy=b1-00000}}"

{{hr3}} <!-------------------------------->

* #T10 ("eo|haw|xx=b1000", both codes are tested)
* expected result: "1" (good)
* actual result: "{{#invoke:mchklngcode|ek|eo|haw|xx=b1000}}"

::* #T11 ("eo|??|xx=b1000", both codes are tested, "??" prohibited)
::* expected result: "0" (bad)
::* actual result: "{{#invoke:mchklngcode|ek|eo|??|xx=b1000}}"

* #T12 ("eo|??|xx=b1200", both codes are tested, "??" allowed)
* expected result: "1" (good)
* actual result: "{{#invoke:mchklngcode|ek|eo|??|xx=b1200}}"

::* #T13 ("por|xx=b0000", binary output, "por" expl prohibited)
::* expected result: "0" (evil)
::* actual result: "{{#invoke:mchklngcode|ek|por|xx=b0000}}"

* #T14 ("por|xx=b0001", binary output, "por" allowed)
* expected result: "1" (tolerable)
* actual result: "{{#invoke:mchklngcode|ek|por|xx=b0001}}"

::* #T15 ("eo|z|xx=b1101", both codes are tested, right "z" is bad)
::* expected result: "0" (evil)
::* actual result: "{{#invoke:mchklngcode|ek|eo|z|xx=b1101}}"

* #T16 ("z|eo|xx=b1101", both codes are tested, left "z" is bad)
* expected result: "0" (evil)
* actual result: "{{#invoke:mchklngcode|ek|z|eo|xx=b1101}}"

::* #T17 ("epo|eo|xx=b1101", both codes are tested, "epo" allowed)
::* expected result: "1" (tolerable)
::* actual result: "{{#invoke:mchklngcode|ek|epo|eo|xx=b1101}}"

{{hr3}} <!-------------------------------->

* #T20 ("id||xx=b1101", both codes are tested, empty param is bad)
* expected result: "0" (bad)
* actual result: "{{#invoke:mchklngcode|ek|id||xx=b1101}}"

::* #T21 ("id||xx=b0101", only one code is tested, empty param is bad but ignored)
::* expected result: "1" (good)
::* actual result: "{{#invoke:mchklngcode|ek|id||xx=b0101}}"

* #T22 ("|id|xx=b0101", only one code is tested, empty early param is bad)
* expected result: "0" (bad)
* actual result: "{{#invoke:mchklngcode|ek||id|xx=b0101}}"

::* #T23 ("t8i|xx=b0000", digits prohibited as default)
::* expected result: "0" (bad)
::* actual result: "{{#invoke:mchklngcode|ek|t8i|xx=b0000}}"

* #T24 ("t8i|xx=b0010", digits permitted)
* expected result: "1" (good)
* actual result: "{{#invoke:mchklngcode|ek|t8i|xx=b0010}}"

{{hr3}} <!-------------------------------->

* #T30 ("grc|xx=t0000", tristate)
* expected result: "2" (good and known)
* actual result: "{{#invoke:mchklngcode|ek|grc|xx=t0000}}"

::* #T31 ("t8i|xx=t0010", tristate, digits permitted)
::* expected result: "2" (good and known) or "1" (valid but unknown)
::* actual result: "{{#invoke:mchklngcode|ek|t8i|xx=t0010}}"

* #T32 ("??|xx=t0200", tristate, "??" is allowed)
* expected result: "1" (valid but unknown)
* actual result: "{{#invoke:mchklngcode|ek|??|xx=t0200}}"

::* #T33 ("???|xx=t0200", tristate, "??" is allowed but "???" is NOT)
::* expected result: "0" (obviously invalid)
::* actual result: "{{#invoke:mchklngcode|ek|???|xx=t0200}}"

* #T34 ("fra|xx=t0000", tristate, this code is expl banned)
* expected result: "0" (obviously invalid)
* actual result: "{{#invoke:mchklngcode|ek|fra|xx=t0000}}"

::* #T35 ("fra|xx=t0001", tristate, this code is expl banned but we do not care)
::* expected result: "1" (valid but unknown)
::* actual result: "{{#invoke:mchklngcode|ek|fra|xx=t0001}}"

{{hr3}} <!-------------------------------->

* #T40 ("f3i|xx=t0000", tristate, digits prohibited by default)
* expected result: "0" (obviously invalid)
* actual result: "{{#invoke:mchklngcode|ek|f3i|xx=t0000}}"

::* #T41 ("f3i|xx=t0010", tristate, digits permitted)
::* expected result: "1" (valid but unknown)
::* actual result: "{{#invoke:mchklngcode|ek|f3i|xx=t0010}}"

* #T42 ("fi3|xx=t0010", tristate, digits permitted but only in middle position)
* expected result: "0" (obviously invalid)
* actual result: "{{#invoke:mchklngcode|ek|fi3|xx=t0010}}"

* #T43 ("3fi|xx=t0010", tristate, digits permitted but only in middle position)
* expected result: "0" (obviously invalid)
* actual result: "{{#invoke:mchklngcode|ek|3fi|xx=t0010}}"

{{hr3}} <!-------------------------------->

* #T50 ("grc|xx=k0000", 4 defaults explicitly confirmed, category mode)
* expected result: "" (empty string, good)
* actual result: "{{#invoke:mchklngcode|ek|grc|xx=k0000}}"

* #T51 ("fri|xx=k0000|detxt=true", 4 defaults explicitly confirmed, category mode)
* expected result: N/A (valid but unknown, categories)
* actual result: "{{#invoke:mchklngcode|ek|fri|xx=k0000|detxt=true}}"

* #T52 ("fori|xx=k0000|detxt=true", 4 defaults explicitly confirmed, category mode)
* expected result: N/A (obviously invalid, categories)
* actual result: "{{#invoke:mchklngcode|ek|fori|xx=k0000|detxt=true}}"

<pre>
* #T53 ("fri|xx=k0000", 4 defaults explicitly confirmed, category mode)
* expected result: N/A (valid but unknown, categories)
* actual result: "{{#invoke:mchklngcode|ek|fri|xx=k0000}}"

* #T54 ("fori|xx=k0000", 4 defaults explicitly confirmed, category mode)
* expected result: N/A (obviously invalid, categories)
* actual result: "{{#invoke:mchklngcode|ek|fori|xx=k0000}}"
</pre>

* note that tests #T20 ... #T22 use empty parameters
* note that tests #T53 and #T54 cannot be executed on the docs subpage

{{hr3}} <!-------------------------------->

]===]

local exporttable = {}

------------------------------------------------------------------------

---- CONSTANTS [O] ----

------------------------------------------------------------------------

-- uncommentable EO vs ID constant strings (core site-related features, "constrpriv" NOT needed)

      local constringvoj = "Modulo:loaddata-tbllingvoj"  -- EO
        -- local constringvoj = "Modul:loaddata-tblbahasa"    -- ID

      local constrneva = "Kategorio:Evidente nevalida lingvokodo"          -- EO -- no brackets ("[[","]]") here
        -- local constrneva = "Kategori:Kode bahasa jelas-jelas tidak valid"  -- ID -- no brackets ("[[","]]") here

      local constrneko = "Kategorio:Nekonata lingvokodo"                   -- EO -- no brackets ("[[","]]") here
        -- local constrneko = "Kategori:Kode bahasa tidak diketahui"          -- ID -- no brackets ("[[","]]") here

-- constant table -- ban list -- add obviously invalid access codes (2-letter or 3-letter) only

  -- length of the list is NOT stored anywhere, the processing stops
  -- when type "nil" is encountered, used by "lfivalidatelnkoadv" only

  -- controversial codes (sh sr hr), (zh cmn)
  -- "en.wiktionary.org/wiki/Wiktionary:Language_treatment" excluded languages
  -- "en.wikipedia.org/wiki/Spurious_languages"
  -- "iso639-3.sil.org/code/art" only valid in ISO 639-2
  -- "iso639-3.sil.org/code/gem" only valid in ISO 639-2 and 639-5, "collective"
  -- "iso639-3.sil.org/code/zxx" "No linguistic content"

  local contabisbanned = {}
  contabisbanned = {'by','dc','ll','jp','art','deu','eng','epo','fra','gem','ger','ido','lat','por','rus','spa','swe','tup','zxx'} -- 1...19

  -- emergency brake (6 binary digits: nevagene,nevanome,nevaloke,nekogene,nekonome,nekoloke)

  local constrfilter = "111111"  -- change one or several digits to ZERO to prevent categorization

------------------------------------------------------------------------

---- SPECIAL STUFF OUTSIDE MAIN [B] ----

------------------------------------------------------------------------

---- SPECIAL VAR:S ----

local qldingvoj = {}     -- type "table" and nested
local qbooguard = false  -- only for the guard test, pass to other var ASAP

---- GUARD AGAINST INTERNAL ERROR AND IMPORT ONE VIA LOADDATA ----

qbooguard = (type(constringvoj)~='string') or (type(constrneva)~='string') or (type(constrneko)~='string')
if (not qbooguard) then
  qbooguard = (constringvoj=='') or (constrneva=='') or (constrneko=='')
end--if
if (not qbooguard) then
  qldingvoj = mw.loadData(constringvoj) -- can crash here
  qbooguard = (type(qldingvoj)~='table') -- seems to be always false
end--if

------------------------------------------------------------------------

---- LOW LEVEL STRING FUNCTIONS [G] ----

------------------------------------------------------------------------

-- Local function LFGSTRINGRANGE

local function lfgstringrange (varvictim, nummini, nummaxi)
  local nummylengthofstr = 0
  local booveryvalid = false -- preASSume guilt
  if (type(varvictim)=='string') then
    nummylengthofstr = string.len(varvictim)
    booveryvalid = ((nummylengthofstr>=nummini) and (nummylengthofstr<=nummaxi))
  end--if
  return booveryvalid
end--function lfgstringrange

------------------------------------------------------------------------

-- test whether char is an ASCII digit "0"..."9", return boolean

local function lfgtestnum (numkaad)
  local boodigit = false
  boodigit = ((numkaad>=48) and (numkaad<=57))
  return boodigit
end--function lfgtestnum

------------------------------------------------------------------------

-- test whether char is an ASCII uppercase letter, return boolean

local function lfgtestuc (numkode)
  local booupperc = false
  booupperc = ((numkode>=65) and (numkode<=90))
  return booupperc
end--function lfgtestuc

------------------------------------------------------------------------

-- test whether char is an ASCII lowercase letter, return boolean

local function lfgtestlc (numcode)
  local boolowerc = false
  boolowerc = ((numcode>=97) and (numcode<=122))
  return boolowerc
end--function lfgtestlc

------------------------------------------------------------------------

-- Local function LFGIS62SAFE

-- Test whether incoming ASCII char is very safe (0...9 A...Z a...z).

-- Depends on functions :
-- [G] lfgtestnum lfgtestuc lfgtestlc

local function lfgis62safe (numcxair)
  local booguud = false
  booguud = lfgtestnum (numcxair) or lfgtestuc (numcxair) or lfgtestlc (numcxair)
  return booguud
end--function lfgis62safe

------------------------------------------------------------------------

---- HIGH LEVEL STRING FUNCTIONS [I] ----

------------------------------------------------------------------------

-- Local function LFIFIXUNSAFE

-- Fix dangerous string (obviously invalid language code or whatever) so that
-- it can at least be reported (used inside name of a tracking category).

-- Input  : * strbahaya

-- Output : * strfixed

-- Depends on functions :
-- [G] lfgtestnum lfgtestuc lfgtestlc lfgis62safe

-- # empty string replaced with "e-m-p-t-y"
-- # truncated to 14 octet:s if longer
-- # unsafe char:s are replaced with dot:s (safe are only "0"..."9" and
--   "A"..."Z" and "a"..."z" and "!" and "," and "-" and maybe ".")

local function lfifixunsafe (strbahaya)
  local strfixed = ""
  local numlencx = 0
  local numcxaar = 0
  local numuindex = 1 -- ONE-based
  local boogood = false
  if (strbahaya=="") then
    strfixed = "e-m-p-t-y"
  else
    numlencx = math.min (string.len (strbahaya), 14)
    while true do
      if (numuindex>numlencx) then
        break
      end--if
      numcxaar = string.byte (strbahaya,numuindex,numuindex)
      boogood = lfgis62safe (numcxaar) -- 0...9 A...Z a...z
      if (numcxaar==33) then
        boogood = true -- "!"
      end--if
      if ((numcxaar>=44) and (numcxaar<=46)) then
        boogood = true -- ",-" -- FYI: 46 is the dot "."
      end--if
      if (not boogood) then
        numcxaar = 46 -- replace by dot "."
      end--if
      strfixed = strfixed .. string.char (numcxaar)
      numuindex = numuindex + 1
    end--while
  end--if
  return strfixed
end--function lfifixunsafe

------------------------------------------------------------------------

-- Local function LFIDECENCODMIN

-- Minimally encode char:s to prevent parsing. Our cool module has brewed
-- something with "[["..."]]" but we want to see plain text for debugging
-- purposes. This is the most dumb version that dec-encodes all ASCII and
-- does not expect esoteric ASCII values or broken UTF8 stream.

-- Input  : * strkrampdang -- string, empty tolerable, but type "nil" is NOT

-- Output : * strmincod -- string, empty in worst case

local function lfidecencodmin (strkrampdang)
  local strmincod = ''
  local numstrlen = 0
  local numpeekinx = 1 -- ONE-based index
  local numchmiar = 0
  numstrlen = string.len (strkrampdang)
  while true do
    if (numpeekinx>numstrlen) then
      break
    end--if
    numchmiar = string.byte (strkrampdang,numpeekinx,numpeekinx)
    numpeekinx = numpeekinx + 1
    if (numchmiar>127) then
      strmincod = strmincod .. string.char (numchmiar) -- pass UTF8
    else
      strmincod = strmincod .. '&#' .. tostring (numchmiar) .. ';' -- encode ASCII
    end--if
  end--while
  return strmincod
end--function lfidecencodmin

------------------------------------------------------------------------

-- Local function LFIVALIDATELNKOADV

-- Advanced test whether a string (intended to be a langcode) is valid
-- containing only 2 or 3 lowercase letters, or 2...10 char:s and with some
-- dashes, or maybe a digit in middle position or maybe instead equals to "-"
-- or "??" and maybe additionally is not included on the ban list.

-- Input  : * strqooq -- string (empty is useless and returns
--                       "false" ie "bad" but cannot cause any major harm)
--          * booyesdsh -- "true" to allow special code dash "-"
--          * booyesqst -- "true" to allow special code doublequest "??"
--          * booloonkg -- "true" to allow long codes such as "zh-min-nan"
--          * boodigit -- "true" to allow digit in middle position
--          * boonoban -- (inverted) "true" to skip test against ban table

-- Output : * booisvaladv -- true if string is valid

-- Depends on functions :
-- [G] lfgtestnum lfgtestlc

-- Depends on constants :
-- * table "contabisbanned"

-- Incoming empty string is safe but type "nil" is NOT.

-- Digit is tolerable only ("and" applies):
-- * if boodigit is "true"
-- * if length is 3 char:s
-- * in middle position

-- Dashes are tolerable (except in special code "-") only ("and" applies):
-- * if length is at least 4 char:s (if this is permitted at all)
-- * in inner positions
-- * NOT adjacent
-- * maximally TWO totally
-- There may be maximally 3 adjacent letters, this makes at least ONE dash
-- obligatory for length 4...7, and TWO dashes for length 8...10.

local function lfivalidatelnkoadv (strqooq, booyesdsh, booyesqst, booloonkg, boodigit, boonoban)

  local varomongkosong = 0 -- for check against the ban list
  local numchiiar = 0
  local numukurran = 0
  local numindeex = 0 -- ZERO-based -- two loops
  local numadjlet = 0 -- number of adjacent letters (max 3)
  local numadjdsh = 0 -- number of adjacent dashes (max 1)
  local numtotdsh = 0 -- total number of dashes (max 2)
  local booislclc = false
  local booisdigi = false
  local booisdash = false
  local booisvaladv = true -- preASSume innocence -- later final verdict here

  while true do -- fake (outer) loop

    if (strqooq=='-') then
      booisvaladv = booyesdsh
      break -- to join mark -- good or bad
    end--if
    if (strqooq=='??') then
      booisvaladv = booyesqst
      break -- to join mark -- good or bad
    end--if
    numukurran = string.len (strqooq)
    if ((numukurran<2) or (numukurran>10)) then
      booisvaladv = false
      break -- to join mark -- evil
    end--if
    if (not booloonkg and (numukurran>3)) then
      booisvaladv = false
      break -- to join mark -- evil
    end--if

    numindeex = 0
    while true do -- inner genuine loop over char:s
      if (numindeex>=numukurran) then
        break -- done -- good
      end--if
      numchiiar = string.byte (strqooq,(numindeex+1),(numindeex+1))
      booisdash = (numchiiar==45)
      booisdigi = lfgtestnum(numchiiar)
      booislclc = lfgtestlc(numchiiar)
      if (not (booislclc or booisdigi or booisdash)) then
        booisvaladv = false
        break -- to join mark -- inherently bad char
      end--if
      if (booislclc) then
        numadjlet = numadjlet + 1
      else
        numadjlet = 0
      end--if
      if (booisdigi and ((numukurran~=3) or (numindeex~=1) or (not boodigit))) then
        booisvaladv = false
        break -- to join mark -- illegal digit
      end--if
      if (booisdash) then
        if ((numukurran<4) or (numindeex==0) or ((numindeex+1)==numukurran)) then
          booisvaladv = false
          break -- to join mark -- illegal dash
        end--if
        numadjdsh = numadjdsh + 1
        numtotdsh = numtotdsh + 1 -- total
      else
        numadjdsh = 0 -- do NOT zeroize the total !!!
      end--if
      if ((numadjlet>3) or (numadjdsh>1) or (numtotdsh>2)) then
        booisvaladv = false
        break -- to join mark -- evil
      end--if
      numindeex = numindeex + 1 -- ZERO-based
    end--while -- inner genuine loop over char:s

    if (not boonoban) then -- if "yesban" then
      numindeex = 0
      while true do -- lower inner genuine loop
        varomongkosong = contabisbanned[numindeex+1] -- number of elem unknown
        if (type(varomongkosong)~='string') then
          break -- abort inner loop (then outer fake loop) due to end of table
        end--if
        numukurran = string.len (varomongkosong)
        if ((numukurran<2) or (numukurran>3)) then
          break -- abort inner loop (then outer fake loop) due to faulty table
        end--if
        if (strqooq==varomongkosong) then
          booisvaladv = false
          break -- abort inner loop (then outer fake loop) due to violation
        end--if
        numindeex = numindeex + 1 -- ZERO-based
      end--while -- lower inner genuine loop
    end--if (not boonoban) then

    break -- finally to join mark
  end--while -- fake loop -- join mark

  return booisvaladv

end--function lfivalidatelnkoadv

------------------------------------------------------------------------

---- HIGH LEVEL FUNCTIONS [H] ----

------------------------------------------------------------------------

-- Local function LFBREW3KAT

-- Brew 3 categories from the bad langcode (generic and specific nome
-- and specific loke).

-- Input  : * strkatbase (kategory base name with namespace prefix)
--          * strspecnome (the bad langcode already sanitized)
--          * strspecloke (the caller name already sanitized)
--          * booxgene, booxnome, booxloke

-- Output : * strtigakucing (can be empty)

local function lfbrew3kat (strkatbase, strspecnome, strspecloke, booxgene, booxnome, booxloke)
  local strtigakucing = ''
  if (booxgene) then
    strtigakucing = '[[' .. strkatbase .. ']]'
  end--if
  if (booxnome) then
    strtigakucing = strtigakucing .. '[[' .. strkatbase .. ' nome (' .. strspecnome .. ')]]'
  end--if
  if (booxloke) then
    strtigakucing = strtigakucing .. '[[' .. strkatbase .. ' loke (' .. strspecloke .. ')]]'
  end--if
  return strtigakucing
end--function lfbrew3kat

------------------------------------------------------------------------

---- VARIABLES [R] ----

------------------------------------------------------------------------

function exporttable.ek (arxframent)

  -- general unknown type

  local vartmp = 0     -- variable without type

  -- special type "args" AKA "arx"

  local arxourown = 0  -- metaized "args" from our own "frame"
  local arxcaller = 0  -- metaized "args" from caller's "frame"

  -- general "tab"

  local tablg76yleft = {}

  -- general "str"

  local strkodo3   = ""  -- code (obligatory)
  local strkodo4   = ""  -- code (optional)
  local strpncalco = ""  -- pagename core of the caller
  local strxx      = ""  -- DEPRECATED 5 char:s
  local stryy      = ""  -- new 8 char:s
  local strret     = ""  -- output string
  local strtmp     = ""  -- temp copy of the caller's full pagename

  -- general "num"

  local numbtkmo = 98   -- operation mode / type of result: "b" or "t" or "k"

  local num012st3k = 2  -- tri-state sta "strkodo3"
  local num012st4k = 2  -- tri-state sta "strkodo4" (remains 2 if only 1 code)
  local num012stzz = 2  -- tri-state combo = min (num012st3k,num012st4k)

  local numlong  = 0    -- temp
  local numchar  = 0    -- temp
  local num2statcode = 0  -- temp: element [2] of the loaded data table

  local numbull   = 0   -- temp for peeking caller
  local numposcol = 0   -- temp for peeking caller ONE-based position of colon

  -- general "boo"

  local boochktwo = false

  local boodashgd = false  -- allow "-"
  local boodblqgd = false  -- allow "??"
  local boolonggd = false  -- allow long codes such as "zh-min-nan"
  local boodigigd = false  -- allow digit in middle position
  local booskipbt = false  -- (inverted) skip test against ban table

  local boonocat  = false  -- from "nocat=true"
  local boodetxt  = false  -- from "detxt=true"

  local boointer  = false  -- "true" on internal error (blocks categorization)

  local boodoccek = false  -- temp: do the check at all (maybe can be skipped)

  local boonevagene = true  -- @ for "constrfilter", default is "true"
  local boonevanome = true  -- @
  local boonevaloke = true  -- @
  local boonekogene = true  -- @
  local boonekonome = true  -- @
  local boonekoloke = true  -- @

------------------------------------------------------------------------

---- MAIN [Z] ----

------------------------------------------------------------------------

  ---- GUARD AGAINST INTERNAL ERROR ----

  boointer = qbooguard

  ---- PICK ONE SUBTABLE ----

  while true do -- fake loop

    if (boointer) then
      break -- to join mark
    end--if
    num2statcode = qldingvoj[2] -- risk of type "nil"
    if (num2statcode~=0) then
      boointer = true -- #E02 malica
      break -- to join mark
    end--if
    tablg76yleft = qldingvoj['T76']
    if (type(tablg76yleft)~='table') then -- important check
      boointer = true -- #E02 malica
      break -- to join mark
    end--if

    break -- finally to join mark
  end--while -- fake loop -- join mark

  ---- SEIZE CALLER'S NAME FROM MW (ONLY CORE NEEDED, NO PREFIX) ----

  -- assigns "strpncalco" (pagename core) at least one char

  -- a possible failure here is NOT fatal

  vartmp = arxframent:getParent():getTitle()
  numposcol = 0
  strpncalco = ''

  if (type(vartmp)=="string") then
    strtmp = vartmp
    numbull = string.len (strtmp)
    if (numbull>2) then
      vartmp = string.find (strtmp, ':', 1, true) -- plain text search
      if (vartmp~=nil) then -- "not found" is NOT valid
        numposcol = vartmp -- ONE-based position
        if ((numposcol==1) or (numposcol==numbull)) then
          numposcol = 0 -- invalid position of colon
        end--if
      end--if
    end--if (numbull>2) then
  end--if

  if (numposcol~=0) then
    strpncalco = string.sub (strtmp,(numposcol+1),numbull) -- remove prefix
  end--if
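
  -- Worked example of the prefix removal above (hypothetical title, not
  -- taken from any real caller):
  --   title "Template:example" has 16 octets and its colon at position 9,
  --   thus "strpncalco" becomes string.sub (strtmp,10,16) ie "example"
  -- Titles of at most 2 octets, titles without colon, and titles with
  -- the colon at the first or last position leave "strpncalco" empty ""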

  ---- GET THE ARX:ES ----

  if (not boointer) then
    arxourown = arxframent.args -- "args" from our own "frame"
    arxcaller = arxframent:getParent().args -- "args" from caller's "frame"
  end--if

  ---- SEIZE 1 OR 2 ANONYMOUS PARAMETERS AND "XX=" "YY=" SENT BY CALLER TO US ----

  while true do -- fake loop

    if (boointer) then
      break -- to join mark
    end--if

    if (arxourown[3]) then
      boointer = true -- internal error -- was preassigned to "false"
      break -- to join mark -- 3 anon params are not appreciated
    end--if

    vartmp = arxourown[1] -- can be "nil"
    if (type(vartmp)=="string") then
      strkodo3 = vartmp -- an empty string is deliberately not checked here
    end--if

    vartmp = arxourown[2] -- can be "nil"
    if (type(vartmp)=="string") then
      strkodo4 = vartmp -- an empty string is deliberately not checked here
    end--if

    vartmp = arxourown['xx'] -- can be "nil" -- optional named  !!!FIXME!!! deprecated
    if (lfgstringrange(vartmp,5,5)) then
      strxx = vartmp
    end--if

    vartmp = arxourown['yy'] -- can be "nil" -- optional named
    if (lfgstringrange(vartmp,8,8)) then
      stryy = vartmp
    end--if

    break -- finally to join mark
  end--while -- fake loop -- join mark
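
  -- Layout of the 2 control strings as read by the block below (derived
  -- from the code only, the example value is hypothetical):
  --   "xx=" (5 char:s, deprecated)
  --     1 : result mode "b" or "t" or "k"
  --     2 : "1" check 2 codes, else "0"
  --     3 : "0"..."3" where "1" or "3" allow "-" and "2" or "3" allow "??"
  --     4 : not examined
  --     5 : "1" skip extra ban test, else "0"
  --   "yy=" (8 char:s)
  --     1 : result mode "b" or "t" or "k"
  --     2 : "1" check 2 codes, else "0"
  --     3 : not examined
  --     4 : "1" allow "-", else "0"
  --     5 : "1" allow "??", else "0"
  --     6 : "1" allow long codes, else "0"
  --     7 : "1" allow digit in middle position, else "0"
  --     8 : "1" skip extra ban test, else "0"
  --   any other value at an examined position sets "boointer" to "true"
  --   example "yy=t1010000" requests tristate result, checks 2 codes and
  --   allows "-" only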


  while true do -- fake loop  !!!FIXME!!! use LFIVALIUMDCTLSTR

    if (boointer) then
      break -- to join mark
    end--if

    if ((strxx~='') and (stryy=='')) then -- !!!FIXME!!! DEPRECATED, and "allow digit" is already removed

      numchar = string.byte (strxx,1,1) -- enum "b" (default) or "t" "k"
      if (numchar==116) then
        numbtkmo = 116 -- requested "t" -- was preassigned to 98 ie "b"
      else
        if (numchar==107) then
          numbtkmo = 107 -- requested "k" -- was preassigned to 98 ie "b"
        else
          if (numchar~=98) then
            boointer = true -- internal error -- was preassigned to "false"
            break -- to join mark
          end--if
        end--if
      end--if (numchar==116) else

      numchar = string.byte (strxx,2,2)
      if (numchar==49) then
        boochktwo = true -- was preassigned to "false" -- check 2 codes
      else
        if (numchar~=48) then
          boointer = true -- internal error -- was preassigned to "false"
          break -- to join mark
        end--if
      end--if (numchar==49) else

      numchar = string.byte (strxx,3,3) -- fourstate "0" ... "3"
      if ((numchar<48) or (numchar>51)) then
        boointer = true -- internal error -- was preassigned to "false"
        break -- to join mark
      end--if
      boodashgd = ((numchar==49) or (numchar==51)) -- allow "-"
      boodblqgd = ((numchar==50) or (numchar==51)) -- allow "??"

      numchar = string.byte (strxx,5,5)
      if (numchar==49) then
        booskipbt = true -- was preassigned to "false" -- skip extra ban test
      else
        if (numchar~=48) then
          boointer = true -- internal error -- was preassigned to "false"
          break -- to join mark
        end--if
      end--if (numchar==49) else

    end--if

    if ((strxx=='') and (stryy~='')) then

      numchar = string.byte (stryy,1,1) -- enum "b" (default) or "t" "k"
      if (numchar==116) then
        numbtkmo = 116 -- requested "t" -- was preassigned to 98 ie "b"
      else
        if (numchar==107) then
          numbtkmo = 107 -- requested "k" -- was preassigned to 98 ie "b"
        else
          if (numchar~=98) then
            boointer = true -- internal error -- was preassigned to "false"
            break -- to join mark
          end--if
        end--if
      end--if (numchar==116) else

      numchar = string.byte (stryy,2,2) -- boolean
      if (numchar==49) then
        boochktwo = true -- was preassigned to "false" -- check 2 codes
      else
        if (numchar~=48) then
          boointer = true -- internal error -- was preassigned to "false"
          break -- to join mark
        end--if
      end--if (numchar==49) else

      numchar = string.byte (stryy,4,4) -- boolean
      if ((numchar<48) or (numchar>49)) then
        boointer = true -- internal error -- was preassigned to "false"
        break -- to join mark
      end--if
      boodashgd = (numchar==49) -- allow "-"

      numchar = string.byte (stryy,5,5) -- boolean
      if ((numchar<48) or (numchar>49)) then
        boointer = true -- internal error -- was preassigned to "false"
        break -- to join mark
      end--if
      boodblqgd = (numchar==49) -- allow "??"

      numchar = string.byte (stryy,6,6) -- boolean
      if (numchar==49) then
        boolonggd = true -- was preassigned to "false" -- allow long codes
      else
        if (numchar~=48) then
          boointer = true -- internal error -- was preassigned to "false"
          break -- to join mark
        end--if
      end--if (numchar==49) else

      numchar = string.byte (stryy,7,7) -- boolean
      if (numchar==49) then
        boodigigd = true -- was preassigned to "false" -- allow digit
      else
        if (numchar~=48) then
          boointer = true -- internal error -- was preassigned to "false"
          break -- to join mark
        end--if
      end--if (numchar==49) else

      numchar = string.byte (stryy,8,8) -- boolean
      if (numchar==49) then
        booskipbt = true -- was preassigned to "false" -- skip extra ban test
      else
        if (numchar~=48) then
          boointer = true -- internal error -- was preassigned to "false"
          break -- to join mark
        end--if
      end--if (numchar==49) else

    end--if

    break -- finally to join mark
  end--while -- fake loop -- join mark

  ---- SEIZE 2 OPTIONAL NAMED PARAM SENT BY SOMEONE TO US OR TO CALLER ----

  -- "detxt" overrides "nocat"

  if (not boointer) then
    vartmp = arxourown['detxt'] -- can be "nil"
    boodetxt = (vartmp=='true')
    vartmp = arxcaller['detxt'] -- can be "nil"
    if (type(vartmp)=='string') then -- override only if text given
      boodetxt = (vartmp=='true')
    end--if
  end--if

  if ((boointer==false) and (boodetxt==false)) then
    vartmp = arxourown['nocat'] -- can be "nil"
    boonocat = (vartmp=='true')
    vartmp = arxcaller['nocat'] -- can be "nil"
    if (type(vartmp)=='string') then -- override only if text given
      boonocat = (vartmp=='true')
    end--if
  end--if

  ---- CARRY OUT THE HARD WORK -- TEST FOR OBVIOUS INVALIDITY ----

  -- this hard work is NOT needed if:
  -- # we already have an internal error
  -- or
  -- # result mode is "k" category and we have "nocat=true"

  boodoccek = true -- presume
  if (boointer) then
    boodoccek = false
  end--if
  if ((numbtkmo==107) and boonocat) then -- "k" and "nocat=true"
    boodoccek = false
  end--if

  if (boodoccek) then
    if (not lfivalidatelnkoadv(strkodo3,boodashgd,boodblqgd,boolonggd,boodigigd,booskipbt)) then
      num012st3k = 0
    end--if
  end--if

  if (boodoccek and boochktwo) then
    if (not lfivalidatelnkoadv(strkodo4,boodashgd,boodblqgd,boolonggd,boodigigd,booskipbt)) then
      num012st4k = 0
    end--if
  end--if

  ---- CHECK WHETHER THE CODES ARE KNOWN IE SUPPORTED ----

  -- this hard work is NOT needed if:
  -- # we already have an internal error
  -- or
  -- # result mode is "b" binary (then we do not distinguish "1" from "2")
  -- or
  -- # result mode is "k" category and we have "nocat=true"

  boodoccek = true
  if (boointer) then
    boodoccek = false
  end--if
  if (numbtkmo==98) then -- "b" boolean / binary mode
    boodoccek = false
  end--if
  if ((numbtkmo==107) and boonocat) then -- "k" and "nocat=true"
    boodoccek = false
  end--if

  if ((num012st3k==2) and boodoccek) then -- 2 means known
    if (type(tablg76yleft[strkodo3])~='string') then
      num012st3k = 1 -- degrade to 1 unknown
    end--if
  end--if

  if ((num012st4k==2) and boodoccek and boochktwo) then -- 2 means known
    if (type(tablg76yleft[strkodo4])~='string') then
      num012st4k = 1 -- degrade to 1 unknown
    end--if
  end--if

  ---- BREW MIN ----

  -- we have 2 separate tristate results "num012st3k" and "num012st4k"

  -- "num012st4k" was preassigned to 2 and remains 2 all the time if
  -- only one code is tested

  -- an internal error ie "boointer" = "true" results in "num012stzz"
  -- assigned to ZERO

  -- combo result num012stzz = min (num012st3k,num012st4k)

  if (boointer) then
    num012stzz = 0 -- internal error forces ZERO
  else
    num012stzz = math.min (num012st3k, num012st4k)
  end--if

  ---- ASSIGN "STRRET" TO BOOLEAN OR TRISTATE ----

  -- possible modes in "numbtkmo" are 98 "b" (default) or 116 "t" or 107 "k"

  if (numbtkmo==116) then -- "t" tristate
    strret = string.char(num012stzz+48) -- number 0...2 to ASCII digit "0"..."2"
  end--if
  if (numbtkmo==98) then -- "b" boolean
    strret = "1" -- presume innocence -- report "tolerable" (was 1 or 2)
    if (num012stzz==0) then
      strret = "0" -- report "evil" (0)
    end--if
  end--if
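
  -- Worked example of the 2 result modes above (values are illustrative):
  --   num012st3k  num012st4k  num012stzz  "t" gives  "b" gives
  --       2           2           2          "2"        "1"
  --       2           1           1          "1"        --
  --       0           2           0          "0"        "0"
  -- The middle row cannot occur in "b" mode since the test for known
  -- codes is skipped there, leaving only 0 or 2. An internal error gives
  -- "0" in both modes.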

  ---- BREW CATEGORIES ----

  -- here we process the "k" mode

  -- we use the 2 separate tristate results "num012st3k" and "num012st4k"
  -- 0 invalid -- 1 unknown -- 2 known

  -- the strings are "constrneva" and "constrneko" and include the category
  -- prefix and the name, but not "[[","]]" nor space before details
  -- ("nome", "loke", "(", ")")

  -- for obviously invalid codes we use "constrneva" and
  -- include the bad string (sanitized)

  -- for unknown codes we use "constrneko" and include the bad string

  -- categorization can be blocked by several conditions:
  -- # totally by "num012stzz" = 2 (code is or both codes are
  --   known, no need to whine)
  -- # totally by "nocat=true"
  -- # totally by "boointer=true" (junk categories on
  --   internal error are NOT appreciated)
  -- # selectively by "num012st.." = 2 (code is known, no need to whine)
  -- # selectively by ZERO:s in "constrfilter", length must be
  --   6 char:s, default for those 6 boolean values is true

  -- "strret" was preassigned to empty "" and is so
  -- far untouched in the "k" mode

  if ((num012stzz~=2) and (numbtkmo==107) and (boointer==false) and (boonocat==false)) then

    if (string.len(constrfilter)==6) then
      if (string.byte(constrfilter,1,1)==48) then
        boonevagene = false
      end--if
      if (string.byte(constrfilter,2,2)==48) then
        boonevanome = false
      end--if
      if (string.byte(constrfilter,3,3)==48) then
        boonevaloke = false
      end--if
      if (string.byte(constrfilter,4,4)==48) then
        boonekogene = false
      end--if
      if (string.byte(constrfilter,5,5)==48) then
        boonekonome = false
      end--if
      if (string.byte(constrfilter,6,6)==48) then
        boonekoloke = false
      end--if
    end--if
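
    -- example (hypothetical value): "constrfilter" = "110111" clears only
    -- "boonevaloke", the other 5 flags keep their default "true"; a string
    -- of any length other than 6 leaves all 6 flags "true"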

    strkodo3 = lfifixunsafe (strkodo3) -- sanitize language code (redundant for num012st3k==1)
    strpncalco = lfifixunsafe (strpncalco) -- sanitize caller

    if (num012st3k==0) then -- 0 means obviously invalid, each code is reported separately
      strret = strret .. lfbrew3kat (constrneva, strkodo3, strpncalco, boonevagene, boonevanome, boonevaloke)
    end--if
    if (num012st3k==1) then
      strret = strret .. lfbrew3kat (constrneko, strkodo3, strpncalco, boonekogene, boonekonome, boonekoloke)
    end--if

    if (boochktwo) then
      strkodo4 = lfifixunsafe (strkodo4) -- sanitize language code (redundant for num012st4k==1)
      if (num012st4k==0) then -- 0 means obviously invalid, each code is reported separately
        strret = strret .. lfbrew3kat (constrneva, strkodo4, strpncalco, boonevagene, boonevanome, boonevaloke)
      end--if
      if (num012st4k==1) then
        strret = strret .. lfbrew3kat (constrneko, strkodo4, strpncalco, boonekogene, boonekonome, boonekoloke)
      end--if
    end--if

    if (boodetxt) then
      strret = lfidecencodmin (strret)
    end--if

  end--if ((num012stzz~=2) ... (boonocat==false)) then

  ---- RETURN THE JUNK STRING ----

  return strret -- can vary depending on result type, even be empty

end--function

  ---- RETURN THE JUNK LUA TABLE ----

return exporttable