Tool Name: urbanecmbot
Quota increase requested: I'm not sure how many resources each job consumes. If it's 0.5 CPU per job, then with 10 jobs failing to start, something like +5 CPU? I'm open to suggestions from Toolforge admins though :).
Reason: urbanecmbot runs dozens of different tasks for Czech Wikipedia. Some jobs have to run fairly frequently (every few minutes), and according to toolforge-jobs, ~10 of the jobs are "Unable to start, out of quota for cpu".
Description
Status | Subtype | Assigned | Task
---|---|---|---
Declined | | Urbanecm | T320108 Migrate urbanecmbot from Toolforge GridEngine to Toolforge Kubernetes
Resolved | | taavi | T337183 Request increased quota for urbanecmbot Toolforge tool
Event Timeline
Logs:
tools.urbanecmbot@tools-sgebastion-10 ~/11bots/cswiki/userbots/patrolTrusted $ toolforge-jobs list
Job name:                  Job type:               Status:
-------------------------  ----------------------  -------------------------------------
test-py39-sleep            normal                  Running for 13m14s
add-preklad-ct             schedule: 14 12 * * *   Waiting for scheduled time
afd-announcer              schedule: */5 * * * *   Unable to start, out of quota for cpu
archivebot                 schedule: 14 22 * * 1   Waiting for scheduled time
autoprotect-daily          schedule: 5 12 * * *    Waiting for scheduled time
autoprotect-weekly         schedule: 10 12 * * 0   Waiting for scheduled time
badprotecttemplates        schedule: 26 3 * * *    Waiting for scheduled time
clean-sandbox              schedule: */10 * * * *  Unable to start, out of quota for cpu
did-youknow                schedule: 18 23 * * 1   Waiting for scheduled time
edit-patrol-sorter         schedule: 13 14 * * *   Waiting for scheduled time
empty-course-pages         schedule: 50 23 * * *   Waiting for scheduled time
empty-talkpages            schedule: 47 23 * * *   Waiting for scheduled time
export-wd                  schedule: 54 9 * * *    Waiting for scheduled time
fa-deadlink                schedule: 40 14 * * *   Waiting for scheduled time
mark-socks                 schedule: 23 * * * *    Waiting for scheduled time
mark-students              schedule: 23 * * * *    Waiting for scheduled time
most-linked-disambigs      schedule: 42 16 * * *   Waiting for scheduled time
most-linked-redirs         schedule: 42 16 1 * *   Waiting for scheduled time
neklavesove-znaky-bot      schedule: 17 9 * * *    Waiting for scheduled time
new-articles-portals       schedule: 14 12 * * *   Waiting for scheduled time
nnc-announcer              schedule: */5 * * * *   Unable to start, out of quota for cpu
orphan                     schedule: 50 16 * * *   Waiting for scheduled time
par-announcer              schedule: */5 * * * *   Unable to start, out of quota for cpu
patrol-autopatrolled       schedule: 13 * * * *    Waiting for scheduled time
patrol-autopatrolled-meta  schedule: 13 * * * *    Waiting for scheduled time
patrol-dashboard           schedule: */5 * * * *   Unable to start, out of quota for cpu
patrol-fountain            schedule: */5 * * * *   Unable to start, out of quota for cpu
patrol-undo                schedule: */3 * * * *   Unable to start, out of quota for cpu
purge-konec-mazani         schedule: 13 * * * *    Waiting for scheduled time
purge-merch-end            schedule: 13 * * * *    Waiting for scheduled time
relikty                    schedule: 59 8 * * *    Waiting for scheduled time
seniori-articles           schedule: 57 13 * * *   Waiting for scheduled time
spory-announcer            schedule: */5 * * * *   Unable to start, out of quota for cpu
standardization            schedule: 54 23 * * *   Waiting for scheduled time
trusted-patrol-get-users   schedule: */5 * * * *   Unable to start, out of quota for cpu
ukoly-add-priority         schedule: 54 14 * * 1   Waiting for scheduled time
ukoly-bez-podstranky       schedule: 30 0 * * 1    Waiting for scheduled time
ukoly-resolved             schedule: 0 0 * * 1     Waiting for scheduled time
wiki-speedy-delete         schedule: 0 */2 * * *   Waiting for scheduled time
wikidata-coor-import       schedule: 6 3 * * *     Waiting for scheduled time
wikidata-label-import      schedule: 3 4 * * *     Waiting for scheduled time
zamky-hrady-bot            schedule: 17 8 * * *    Waiting for scheduled time
zok-announcer              schedule: */5 * * * *   Unable to start, out of quota for cpu
zops-announcer             schedule: */5 * * * *   Unable to start, out of quota for cpu
patrol-after-patrol        continuous              Fails to start
patrol-sandbox             continuous              Running
patrol-trusted             continuous              Running
tools.urbanecmbot@tools-sgebastion-10 ~/11bots/cswiki/userbots/patrolTrusted $
kubectl get events log: P48403.
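For anyone wanting to tally a listing like the one above instead of counting by eye, here is a minimal Python sketch. It assumes the three columns are separated by runs of two or more spaces, which is a guess about the plain-text layout rather than a documented format; `SAMPLE` is a shortened excerpt of the log above, and `count_statuses` is a hypothetical helper name.

```python
import re
from collections import Counter

# Shortened excerpt of the `toolforge-jobs list` output above.
# Assumption: columns are separated by at least two spaces.
SAMPLE = """\
Job name:       Job type:               Status:
--------------  ----------------------  -------------------------------------
afd-announcer   schedule: */5 * * * *   Unable to start, out of quota for cpu
archivebot      schedule: 14 22 * * 1   Waiting for scheduled time
patrol-undo     schedule: */3 * * * *   Unable to start, out of quota for cpu
patrol-trusted  continuous              Running
"""


def count_statuses(listing: str) -> Counter:
    """Count jobs per status, skipping the header and separator rows."""
    counts: Counter = Counter()
    for line in listing.splitlines():
        columns = re.split(r"\s{2,}", line.strip())
        if len(columns) != 3:
            continue  # blank line, shell prompt, or unexpected layout
        if columns[0].startswith(("Job name", "-")):
            continue  # header or dashed separator
        counts[columns[2]] += 1
    return counts


if __name__ == "__main__":
    for status, number in count_statuses(SAMPLE).most_common():
        print(f"{number:3d}  {status}")
```

Cron expressions only contain single spaces, so they stay inside the middle column under this splitting rule.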
That's a question I'm not sure how to answer exactly. I currently have 12 jobs that run every few minutes (every 3, 5 or 10 minutes, depending on the job), which I should likely convert to continuous jobs at some point, plus three continuous jobs.
In addition to that, one or two daily/hourly jobs may be scheduled to run while the continuous and frequently running jobs are active. So, I'd estimate something around 16 concurrent jobs?
In any case: Here is my current jobs.yaml for reference: https://github.com/wikimedia/labs-tools-urbanecmbot/blob/master/jobs.yaml.
Ok, the defaults are this:
JOB_DEFAULT_MEMORY = "512Mi"
JOB_DEFAULT_CPU = "500m"
So for 16 jobs I think you want 8Gi RAM and 8 CPUs (and half of that for requests), does that sound good?
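The arithmetic behind that suggestion, as a rough Python sketch: it multiplies the two defaults quoted above by the number of concurrent jobs and halves the result for requests, as proposed in this thread. The function name `estimate_quota` and the output key names are illustrative only, not part of any Toolforge tooling.

```python
# Defaults quoted above, expressed as plain integers.
JOB_DEFAULT_MEMORY_MIB = 512      # "512Mi"
JOB_DEFAULT_CPU_MILLICORES = 500  # "500m"


def format_cpu(millicores: int) -> str:
    """Render millicores as whole cores when possible, e.g. 8000 -> "8"."""
    if millicores % 1000 == 0:
        return str(millicores // 1000)
    return f"{millicores}m"


def format_memory(mib: int) -> str:
    """Render MiB as GiB when possible, e.g. 8192 -> "8Gi"."""
    if mib % 1024 == 0:
        return f"{mib // 1024}Gi"
    return f"{mib}Mi"


def estimate_quota(concurrent_jobs: int) -> dict:
    """Estimate limits for N default-sized jobs; requests are half the limits."""
    cpu = concurrent_jobs * JOB_DEFAULT_CPU_MILLICORES
    memory = concurrent_jobs * JOB_DEFAULT_MEMORY_MIB
    return {
        "limits.cpu": format_cpu(cpu),
        "limits.memory": format_memory(memory),
        "requests.cpu": format_cpu(cpu // 2),
        "requests.memory": format_memory(memory // 2),
    }


if __name__ == "__main__":
    print(estimate_quota(16))
```

For 16 jobs this works out to 16 × 500m = 8 CPUs and 16 × 512Mi = 8Gi for limits, with 4 CPUs and 4Gi for requests.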
Sounds like a good starting point to me! I'll definitely let you all know if I run into issues with the increased quota.
Mentioned in SAL (#wikimedia-cloud) [2023-05-31T09:35:26Z] <taavi> bump quotas per T337183
This is the current quota; maybe someone got to it before me?
dcaro@tools-sgebastion-10:~$ kubectl sudo get resourcequota tool-urbanecmbot -n tool-urbanecmbot -o json | jq '.status.hard'
{
  "configmaps": "10",
  "count/cronjobs.batch": "50",
  "count/deployments.apps": "3",
  "count/jobs.batch": "15",
  "limits.cpu": "8",
  "limits.memory": "8Gi",
  "persistentvolumeclaims": "3",
  "pods": "16",
  "replicationcontrollers": "1",
  "requests.cpu": "4",
  "requests.memory": "4Gi",
  "secrets": "10",
  "services": "1",
  "services.nodeports": "0"
}
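As a sanity check on those numbers, here is a hedged Python sketch that works out how many default-sized jobs fit under that quota. It assumes every job counts its full default (500m CPU, 512Mi memory) against `limits.*` and occupies one pod, and it only handles the quantity suffixes that appear in this thread (`m`, `Mi`, `Gi`); `max_default_jobs` is a hypothetical helper, not a Toolforge command.

```python
import json

# Relevant subset of the `.status.hard` output above.
HARD_QUOTA = json.loads("""
{
  "limits.cpu": "8",
  "limits.memory": "8Gi",
  "pods": "16"
}
""")


def cpu_to_millicores(quantity: str) -> int:
    """Convert "500m" or "8" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_to_mib(quantity: str) -> int:
    """Convert "512Mi" or "8Gi" to MiB; other suffixes are not handled here."""
    if quantity.endswith("Gi"):
        return int(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity!r}")


def max_default_jobs(hard: dict, job_cpu: str = "500m", job_memory: str = "512Mi") -> int:
    """Largest number of default-sized jobs allowed by CPU, memory and pod count."""
    by_cpu = cpu_to_millicores(hard["limits.cpu"]) // cpu_to_millicores(job_cpu)
    by_memory = memory_to_mib(hard["limits.memory"]) // memory_to_mib(job_memory)
    by_pods = int(hard["pods"])
    return min(by_cpu, by_memory, by_pods)


if __name__ == "__main__":
    print(max_default_jobs(HARD_QUOTA))
```

Under those assumptions all three constraints (CPU, memory, pods) allow 16 jobs, which matches the estimate discussed earlier in the thread.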