Migrate smallem from Toolforge GridEngine to Toolforge Kubernetes
Closed, Resolved, Public

Description

Kindly migrate your tool (https://grid-deprecation.toolforge.org/t/smallem) from Toolforge GridEngine to Toolforge Kubernetes.

Toolforge GridEngine is being deprecated.
See: https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/

Please note that a volunteer may perform this migration if this has not been done after some time.
If you have already migrated this tool, kindly mark this as resolved.

If you would rather shut down this tool, kindly do so and mark this as resolved.

Useful Resources:
Migrating Jobs from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework#Grid_Engine_migration
Migrating Web Services from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Move_a_grid_engine_webservice
Rebuilding a virtualenv (Python users)
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Rebuild_virtualenv_for_python_users

Event Timeline

My apologies if this ticket comes as a surprise to you. In order to ensure WMCS can provide a stable, secure and supported platform, it’s important we migrate away from GridEngine. I want to assure you that while it is WMCS’s intention to shut down GridEngine as outlined in the blog post https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/, a shutdown date for GridEngine has not yet been set. The goal is to migrate as many tools as possible onto Kubernetes and to make the transition as smooth as possible for everyone. Once the majority of tools have migrated, it will be more appropriate to discuss a shutdown date. See T314664: [infra] Decommission the Grid Engine infrastructure.

As noted in https://techblog.wikimedia.org/2022/03/16/toolforge-gridengine-debian-10-buster-migration/, some use cases are already supported by Kubernetes and should be migrated. If your tool can migrate, please plan a migration. Reach out if you need help or find you are blocked by missing features. Most of all, WMCS is here to support you.

However, it’s possible your tool needs a mixed runtime environment or some other feature that isn't yet present in the Toolforge Jobs Framework (https://techblog.wikimedia.org/2022/03/18/toolforge-jobs-framework/). We’d love to hear about this or any other blocking issues so we can work with you once a migration path is ready. Thanks for your hard work as volunteers and for your help in this migration!

This is a reminder that the tool for which this ticket is created is still running on the Grid.
The grid is deprecated and all remaining tools need to migrate to Toolforge Kubernetes.

We've sent several emails to maintainers as we continue to make the move away from the Grid.
Many of the issues that have held users back from moving away from the Grid have been addressed in the latest updates to the Build Service.
See: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Changelog

You might find the following resources helpful in migrating your tool:

  1. https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service#Migrating_an_existing_tool
  2. https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service#Tutorials_for_popular_languages

Don't hesitate to reach out to us using this ticket or via any of our support channels.

If you have already migrated this tool, kindly mark this ticket as 'resolved'.
To do this, click on the 'Add Action' dropdown above the comment text box, select 'Change Status', then 'Resolved'.
Click 'Submit'.

Thank you!

I'm the maintainer of this tool. I'm currently trying to migrate my jobs to Kubernetes, but I'm encountering technical difficulties. I'm trying to solve them in this discussion, but any extra help or guidance would be appreciated. I'm just commenting to let everyone know where they can reach me, since the tool has been listed as unreachable.

Klein claimed this task.

All of Smallem's tasks have been successfully migrated to Toolforge Kubernetes.

Unfortunately I found out that only one of four tasks has been migrated successfully.

@aborrero, @dcaro, we talked on the support channel and you asked me to open a ticket about my issue, so I have reopened this one. As I said there, I tried running my 4 tasks again on a daily schedule: 1 completed successfully, 2 failed, and 1, the main one, never started (its status still shows it waiting to start).

Any help would be appreciated. I can give more info if needed.

As of this writing, the one task that "never started" has started and is working fine, so I believe 2 are successful and 2 are failing.

When did you create the tasks? They seem quite new (a bit more than 4h)

I can see that all of them are set to @daily, which picks a random time of day and repeats every day. The trigger times picked when they were created are:

tools.smallem@tools-sgebastion-10:~$ kubectl get cronjob
NAME                         SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
smallem-transclusion-la.sh   19 14 * * *   False     0        <none>          4h20m
smallem-transclusion.sh      39 14 * * *   False     0        <none>          4h20m
smallem-wp.sh                27 3 * * *    False     0        <none>          4h20m
smallem-wq.sh                25 13 * * *   False     0        49m             4h20m

So two should trigger in the next hour (times are UTC), and one will trigger tomorrow at 3am. I'm watching the ones that should trigger soon.
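As a quick check of the timing above, here is a minimal Python sketch (the helper names are hypothetical) that reads the `kubectl get cronjob` output pasted in this ticket and computes each job's next trigger. It only handles plain `M H * * *` schedules like the ones shown here, and it assumes a current time of roughly 14:14 UTC on 2024-02-14, inferred from `smallem-wq.sh` having last run at 13:25 UTC, 49 minutes earlier.

```python
import re
from datetime import datetime, timedelta, timezone

# Output of `kubectl get cronjob` as pasted in this ticket.
KUBECTL_OUTPUT = """\
NAME                         SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
smallem-transclusion-la.sh   19 14 * * *   False     0        <none>          4h20m
smallem-transclusion.sh      39 14 * * *   False     0        <none>          4h20m
smallem-wp.sh                27 3 * * *    False     0        <none>          4h20m
smallem-wq.sh                25 13 * * *   False     0        49m             4h20m
"""


def parse_schedules(table: str) -> dict[str, str]:
    """Map job name -> cron schedule; columns are separated by 2+ spaces."""
    rows = table.strip().splitlines()[1:]  # skip the header row
    schedules = {}
    for row in rows:
        name, schedule, *_ = re.split(r"\s{2,}", row.strip())
        schedules[name] = schedule
    return schedules


def next_daily_run(schedule: str, now: datetime) -> datetime:
    """Next trigger time for a daily 'M H * * *' schedule (UTC)."""
    minute, hour, *rest = schedule.split()
    if rest != ["*", "*", "*"]:
        raise ValueError(f"only 'M H * * *' schedules are handled: {schedule!r}")
    candidate = now.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # already passed today, so it fires tomorrow
    return candidate


# Assumed "now": 49 minutes after smallem-wq.sh last ran at 13:25 UTC.
NOW = datetime(2024, 2, 14, 14, 14, tzinfo=timezone.utc)

for name, schedule in sorted(parse_schedules(KUBECTL_OUTPUT).items(),
                             key=lambda item: next_daily_run(item[1], NOW)):
    print(f"{name:28} {next_daily_run(schedule, NOW):%Y-%m-%d %H:%M} UTC")
```

Run as-is, it lists `smallem-transclusion-la.sh` at 14:19 and `smallem-transclusion.sh` at 14:39 on the same day, then `smallem-wp.sh` at 03:27 and `smallem-wq.sh` at 13:25 the next day, in line with the estimate above.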

We could also add that info to the jobs CLI:

tools.smallem@tools-sgebastion-10:~$ toolforge jobs list -o long
/usr/bin/toolforge-jobs:15: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import load_entry_point
Job name:                   Command:                                        Job type:         Image:      File log:    Output log:                                           Error log:                                            Emails:    Resources:    Mounts:    Retry:    Status:
--------------------------  ----------------------------------------------  ----------------  ----------  -----------  ----------------------------------------------------  ----------------------------------------------------  ---------  ------------  ---------  --------  ----------------------------------------
smallem-transclusion-la.sh  pyvenv/bin/python ./smallem-transclusion-la.sh  schedule: @daily  python3.11  yes          /data/project/smallem/smallem-transclusion-la.sh.out  /data/project/smallem/smallem-transclusion-la.sh.err  all        default       all        no        Waiting for scheduled time
smallem-transclusion.sh     pyvenv/bin/python ./smallem-transclusion.sh     schedule: @daily  python3.11  yes          /data/project/smallem/smallem-transclusion.sh.out     /data/project/smallem/smallem-transclusion.sh.err     all        default       all        no        Waiting for scheduled time
smallem-wp.sh               ./smallem-wp.sh                                 schedule: @daily  python3.11  yes          /data/project/smallem/smallem-wp.sh.out               /data/project/smallem/smallem-wp.sh.err               all        default       all        no        Waiting for scheduled time
smallem-wq.sh               ./smallem-wq.sh                                 schedule: @daily  python3.11  yes          /data/project/smallem/smallem-wq.sh.out               /data/project/smallem/smallem-wq.sh.err               all        default       all        no        Last schedule time: 2024-02-14T13:25:00Z

When did you create the tasks? They seem quite new (a bit more than 4h)

Fairly recently, because I changed the command in the YAML file for some of them and reloaded it.

So far, that change looks to have fixed the problem with the two failing jobs: I got emails notifying me that they have now completed successfully (and for one of them I saw the on-wiki edits on my bot's contributions page). That may mean there are no more problems to solve, but I'm waiting for the last task to restart normally before concluding that.

Normally the tasks run monthly, but I set them to run daily for testing purposes. (Actually, the last task takes more than a day to finish, so a daily run doesn't make sense for it: it will just be restarted partway through, without ever reaching completion.)
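The point about the long-running task boils down to comparing its runtime with the shortest gap between two runs. A minimal Python sketch, assuming the standard cron meaning of `@daily` and `@monthly`; the function name and the 30-hour runtime are hypothetical. What Kubernetes actually does when a run is still active at the next trigger depends on the CronJob's `concurrencyPolicy` (`Allow`, `Forbid` or `Replace`), which this sketch does not model; it only flags the overlap.

```python
from datetime import timedelta

# Shortest possible gap between two consecutive runs of each shorthand.
# A @monthly job fires once per calendar month on a fixed day, so two runs
# are at least 28 days apart (February in a non-leap year).
MIN_GAP = {
    "@daily": timedelta(days=1),
    "@monthly": timedelta(days=28),
}


def next_run_may_overlap(schedule: str, expected_runtime: timedelta) -> bool:
    """True if a run can outlast the gap before the next one is due."""
    return expected_runtime > MIN_GAP[schedule]


long_task = timedelta(days=1, hours=6)  # hypothetical: "more than 1 day"
print(next_run_may_overlap("@daily", long_task))    # True
print(next_run_may_overlap("@monthly", long_task))  # False
```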

That sounds good :)

I'm hopeful!

As I'm writing this, all 4 tasks have started and 3 of them have completed successfully (the last one takes more than a day to complete). The last step would be to try a monthly run, but meanwhile I'm closing this ticket as resolved (and I'm hoping I won't need to reopen it in a month).

Thank you for your swift interest in helping!