Request increased quota for urbanecmbot Toolforge tool
Closed, Resolved, Public

Description

Tool Name: urbanecmbot
Quota increase requested: I'm not sure how many resources each job consumes; if it's 0.5 CPU per job, then since I have 10 jobs that failed to start, something like +5 CPU? I'm open to suggestions from Toolforge admins though :).
Reason: urbanecmbot runs dozens of different tasks for Czech Wikipedia. Some jobs have to run fairly frequently (every few minutes), and according to toolforge-jobs, ~10 of the jobs were "Unable to start, out of quota for cpu".

Event Timeline

Logs:

tools.urbanecmbot@tools-sgebastion-10 ~/11bots/cswiki/userbots/patrolTrusted
$ toolforge-jobs list
Job name:                  Job type:               Status:
-------------------------  ----------------------  -------------------------------------
test-py39-sleep            normal                  Running for 13m14s
add-preklad-ct             schedule: 14 12 * * *   Waiting for scheduled time
afd-announcer              schedule: */5 * * * *   Unable to start, out of quota for cpu
archivebot                 schedule: 14 22 * * 1   Waiting for scheduled time
autoprotect-daily          schedule: 5 12 * * *    Waiting for scheduled time
autoprotect-weekly         schedule: 10 12 * * 0   Waiting for scheduled time
badprotecttemplates        schedule: 26 3 * * *    Waiting for scheduled time
clean-sandbox              schedule: */10 * * * *  Unable to start, out of quota for cpu
did-youknow                schedule: 18 23 * * 1   Waiting for scheduled time
edit-patrol-sorter         schedule: 13 14 * * *   Waiting for scheduled time
empty-course-pages         schedule: 50 23 * * *   Waiting for scheduled time
empty-talkpages            schedule: 47 23 * * *   Waiting for scheduled time
export-wd                  schedule: 54 9 * * *    Waiting for scheduled time
fa-deadlink                schedule: 40 14 * * *   Waiting for scheduled time
mark-socks                 schedule: 23 * * * *    Waiting for scheduled time
mark-students              schedule: 23 * * * *    Waiting for scheduled time
most-linked-disambigs      schedule: 42 16 * * *   Waiting for scheduled time
most-linked-redirs         schedule: 42 16 1 * *   Waiting for scheduled time
neklavesove-znaky-bot      schedule: 17 9 * * *    Waiting for scheduled time
new-articles-portals       schedule: 14 12 * * *   Waiting for scheduled time
nnc-announcer              schedule: */5 * * * *   Unable to start, out of quota for cpu
orphan                     schedule: 50 16 * * *   Waiting for scheduled time
par-announcer              schedule: */5 * * * *   Unable to start, out of quota for cpu
patrol-autopatrolled       schedule: 13 * * * *    Waiting for scheduled time
patrol-autopatrolled-meta  schedule: 13 * * * *    Waiting for scheduled time
patrol-dashboard           schedule: */5 * * * *   Unable to start, out of quota for cpu
patrol-fountain            schedule: */5 * * * *   Unable to start, out of quota for cpu
patrol-undo                schedule: */3 * * * *   Unable to start, out of quota for cpu
purge-konec-mazani         schedule: 13 * * * *    Waiting for scheduled time
purge-merch-end            schedule: 13 * * * *    Waiting for scheduled time
relikty                    schedule: 59 8 * * *    Waiting for scheduled time
seniori-articles           schedule: 57 13 * * *   Waiting for scheduled time
spory-announcer            schedule: */5 * * * *   Unable to start, out of quota for cpu
standardization            schedule: 54 23 * * *   Waiting for scheduled time
trusted-patrol-get-users   schedule: */5 * * * *   Unable to start, out of quota for cpu
ukoly-add-priority         schedule: 54 14 * * 1   Waiting for scheduled time
ukoly-bez-podstranky       schedule: 30 0 * * 1    Waiting for scheduled time
ukoly-resolved             schedule: 0 0 * * 1     Waiting for scheduled time
wiki-speedy-delete         schedule: 0 */2 * * *   Waiting for scheduled time
wikidata-coor-import       schedule: 6 3 * * *     Waiting for scheduled time
wikidata-label-import      schedule: 3 4 * * *     Waiting for scheduled time
zamky-hrady-bot            schedule: 17 8 * * *    Waiting for scheduled time
zok-announcer              schedule: */5 * * * *   Unable to start, out of quota for cpu
zops-announcer             schedule: */5 * * * *   Unable to start, out of quota for cpu
patrol-after-patrol        continuous              Fails to start
patrol-sandbox             continuous              Running
patrol-trusted             continuous              Running
tools.urbanecmbot@tools-sgebastion-10 ~/11bots/cswiki/userbots/patrolTrusted
$

kubectl get events log: P48403.

How many jobs do you want to run at the same time?

That's a question I'm not sure how to answer exactly. I currently have 12 jobs that run every few minutes (every 3, 5, or 10 minutes, depending on the job), which I should probably convert to continuous jobs at some point, plus three continuous jobs.

In addition, one or two daily or hourly jobs may be scheduled to run while the continuous and frequently running jobs are running. So I'd estimate something around 16 concurrent jobs?
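To see which of the scheduled jobs start in the same minute, the cron schedules from the `toolforge-jobs list` output can be checked with a small matcher. This is a minimal sketch, not Toolforge's scheduler: `field_matches`, `firing_jobs` and `SCHEDULES` are hypothetical names, only the minute and hour fields are evaluated, and only the `*`, `*/N` and plain-number forms that appear in the list are supported. It counts coinciding start times only; it does not model how long each job runs, and the three continuous jobs come on top of whatever it reports.

```python
# Hypothetical helper: report which cron-scheduled jobs start at a given time.
# Simplification: only the minute and hour fields are evaluated; the
# day-of-month, month and day-of-week fields are ignored.


def field_matches(field: str, value: int) -> bool:
    """Match one cron field in the "*", "*/N" or plain-integer form."""
    if field == "*":
        return True
    if field.startswith("*/"):
        # "*/N" fires whenever the value is a multiple of N (0, N, 2N, ...).
        return value % int(field[2:]) == 0
    return value == int(field)


def firing_jobs(schedules: dict[str, str], hour: int, minute: int) -> list[str]:
    """Return the sorted names of jobs whose minute and hour fields match."""
    fired = []
    for name, expr in schedules.items():
        minute_field, hour_field = expr.split()[:2]
        if field_matches(minute_field, minute) and field_matches(hour_field, hour):
            fired.append(name)
    return sorted(fired)


# A subset of the schedules from the toolforge-jobs list output.
SCHEDULES = {
    "afd-announcer": "*/5 * * * *",
    "clean-sandbox": "*/10 * * * *",
    "patrol-undo": "*/3 * * * *",
    "mark-socks": "23 * * * *",
    "patrol-autopatrolled": "13 * * * *",
    "wiki-speedy-delete": "0 */2 * * *",
}

if __name__ == "__main__":
    # At 12:00 the three frequent jobs and the two-hourly job start together:
    # ['afd-announcer', 'clean-sandbox', 'patrol-undo', 'wiki-speedy-delete']
    print(firing_jobs(SCHEDULES, 12, 0))
```

Checking a few different minutes this way gives a rough feel for how many scheduled starts can pile up at once, which is the basis for the estimate above.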

In any case, here is my current jobs.yaml for reference: https://github.com/wikimedia/labs-tools-urbanecmbot/blob/master/jobs.yaml.

OK, the defaults are:

JOB_DEFAULT_MEMORY = "512Mi"
JOB_DEFAULT_CPU = "500m"

So for 16 jobs, I think you want limits of 8Gi RAM and 8 CPUs (16 × 512Mi and 16 × 500m), and half of that for requests. Does that sound good?
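The sizing above is plain multiplication of the per-job defaults. The sketch below spells it out, assuming every job uses the default `512Mi` memory and `500m` CPU and that requests are set to half of the limits, as proposed. `quota_for` and `ceil_div` are hypothetical helper names, and rounding up to whole Gi and whole CPUs is an assumption made here for job counts that don't divide evenly.

```python
# Hypothetical helper: turn a concurrent-job count into quota values, assuming
# every job uses the Toolforge defaults quoted above and that requests are
# set to half of the limits. Values are rounded up to whole Gi and whole CPUs.

JOB_DEFAULT_MEMORY_MI = 512  # JOB_DEFAULT_MEMORY = "512Mi"
JOB_DEFAULT_CPU_MILLI = 500  # JOB_DEFAULT_CPU = "500m"


def ceil_div(a: int, b: int) -> int:
    """Integer division that rounds up."""
    return -(-a // b)


def quota_for(jobs: int) -> dict[str, str]:
    limit_memory_mi = jobs * JOB_DEFAULT_MEMORY_MI
    limit_cpu_milli = jobs * JOB_DEFAULT_CPU_MILLI
    return {
        "limits.cpu": str(ceil_div(limit_cpu_milli, 1000)),
        "limits.memory": f"{ceil_div(limit_memory_mi, 1024)}Gi",
        "requests.cpu": str(ceil_div(limit_cpu_milli, 2 * 1000)),
        "requests.memory": f"{ceil_div(limit_memory_mi, 2 * 1024)}Gi",
    }


if __name__ == "__main__":
    # {'limits.cpu': '8', 'limits.memory': '8Gi', 'requests.cpu': '4', 'requests.memory': '4Gi'}
    print(quota_for(16))
```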

Sounds like a good starting point to me! I'll definitely let you all know if I run into issues with the increased quota.

taavi claimed this task.

Done. Please let me know if you have any issues.

That's already the current quota; maybe someone got to it before me?

dcaro@tools-sgebastion-10:~$ kubectl sudo get resourcequota tool-urbanecmbot -n tool-urbanecmbot -o json | jq '.status.hard'
{
  "configmaps": "10",
  "count/cronjobs.batch": "50",
  "count/deployments.apps": "3",
  "count/jobs.batch": "15",
  "limits.cpu": "8",
  "limits.memory": "8Gi",
  "persistentvolumeclaims": "3",
  "pods": "16",
  "replicationcontrollers": "1",
  "requests.cpu": "4",
  "requests.memory": "4Gi",
  "secrets": "10",
  "services": "1",
  "services.nodeports": "0"
}
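To confirm that the quota shown above matches the proposal, the `status.hard` quantities can be parsed and divided by the per-job defaults. This is a sketch with hypothetical names (`parse_quantity`, `jobs_that_fit`, `HARD`); it understands only the quantity forms that appear in this output, not the full Kubernetes quantity grammar, and it assumes every job uses the default limits of `500m` CPU and `512Mi` memory.

```python
# Hypothetical check: how many jobs with the default limits fit under each
# of the hard quota values shown above. Only the quantity forms present in
# that output are parsed: plain numbers, "m" (milli) and "Mi"/"Gi".
import json
from fractions import Fraction

_SUFFIXES = {"Gi": 1024**3, "Mi": 1024**2, "m": Fraction(1, 1000)}


def parse_quantity(text: str) -> Fraction:
    """Convert a (simple) Kubernetes quantity string to an exact number."""
    for suffix, factor in _SUFFIXES.items():
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * factor
    return Fraction(text)


def jobs_that_fit(hard: dict[str, str], cpu_per_job: str, mem_per_job: str) -> dict[str, int]:
    """Number of default-sized jobs each limit allows (floor division)."""
    return {
        "limits.cpu": parse_quantity(hard["limits.cpu"]) // parse_quantity(cpu_per_job),
        "limits.memory": parse_quantity(hard["limits.memory"])
        // parse_quantity(mem_per_job),
        "pods": int(hard["pods"]),
    }


# Excerpt of the .status.hard output above.
HARD = json.loads('{"limits.cpu": "8", "limits.memory": "8Gi", "pods": "16"}')

if __name__ == "__main__":
    # {'limits.cpu': 16, 'limits.memory': 16, 'pods': 16}
    print(jobs_that_fit(HARD, "500m", "512Mi"))
```

The check ignores the object-count limits in the same output, such as `count/jobs.batch` and `count/deployments.apps`.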