[go: up one dir, main page]

Page MenuHomePhabricator

Relax restrictions on toolforge envvar names
Open, Needs TriagePublicFeature

Description

Feature summary (what you would like to be able to do and where):
toolforge envvar names are currently limited to the regex ^[A-Z_][A-Z_0-9]{3,}$. Please make it more permissive – ideally as much as the underlying Kubernetes API allows (see below).

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):
I’m working on migrating the QuickCategories tool to read its configuration from environment variables rather than config.yaml, with the goal of porting it to the build service, so I can ultimately run the background runner as a continuous job with a health check and thus hopefully fix T374152. To read the config from the environment, I am using Flask’s config.from_prefixed_env(), which supports reading nested objects from variables with double underscores in their name. For example, the current config.yaml snippet

EDITGROUPS:
    commons.wikimedia.org:
        url: "https://editgroups-commons.toolforge.org/b/QC/{0}/"
        since: 2021-09-14T00:00:00Z

would correspond to the environment variables

TOOL_EDITGROUPS__commons.wikimedia.org__url='https://editgroups-commons.toolforge.org/b/QC/{0}/'
TOOL_EDITGROUPS__commons.wikimedia.org__since='2021-09-14T00:00:00Z'

As far as I can tell from the Kubernetes documentation, this should be supported in a secret:

The keys of data and stringData must consist of alphanumeric characters, -, _ or ..

However, toolforge envvar enforces additional restrictions on the keys, which makes the above variable names unavailable.

tools.quickcategories@tools-bastion-13:~/www/python/src$ python3 -c 'import yaml; print(yaml.safe_load(open("config.yaml"))["EDITGROUPS"]["commons.wikimedia.org"]["url"])' | toolforge envvars create TOOL_EDITGROUPS__commons.wikimedia.org__url
Error: {"message":"request body has an error: doesn't match schema #/components/schemas/Envvar: Error at \"/name\": string doesn't match the regular expression \"^[A-Z_][A-Z_0-9]{3,}$\""}

I don’t know why it does this; the documentation currently claims that “the name for the envvar has to be all caps, just like a bash environment variable” (emphasis added), but this is simply not true: bash environment variables can have any case.

Benefits (why should this be implemented?):
Tools would be able to use environment variables with fewer arbitrary restrictions.

Event Timeline

I looked a bit through the Git history but didn’t find a lot more explanation for the pattern.

The last task mentions that “We introduced a bit stricter check for the envvar names”, but doesn’t explain why.

Mentioned in SAL (#wikimedia-cloud) [2024-09-15T19:25:52Z] <wmbot~lucaswerkmeister@tools-bastion-13> deployed 377aab02c6 (lowercase nested keys to work around T374780)

Lowercase might be ok to include yep, though . and - are not valid bash variable characters, so that would not be possible (even though they are valid k8s secrets, as we expose the secrets as environment variables in the shell, not as mounted files).

I think that the wiki text was just a bad wording (it has to be all caps, and also all the bash envvar restrictions, just reworded it).

Do we have to be bound by Bash’s syntax limitations? . and - work just fine between env and Flask / Python, even if there is a Bash in between:

$ env 'TOOL_a.b-c=d' bash -c python3
Python 3.12.6 (main, Sep  8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import flask
>>> config = flask.Config('')
>>> config.from_prefixed_env('TOOL')
True
>>> config['a.b-c']
'd'

So I was hoping that, as long as the environment variables aren’t set via Bash (but rather via e.g. setenv() in k8s’ code), those characters would work.