I recently looked at how long it takes for a patch to pass CI, and noticed that it takes somewhere between 44 and 46 minutes for a WikibaseCirrusSearch patch to get through quibble-vendor-mysql-hhvm-docker. I think this is far too long to wait to validate a patch. This cost is paid each time a patch set is submitted (which sometimes takes several iterations, until all bugs, review comments and phpcs complaints are fixed), and waiting this long for each validation is IMHO inefficient. We should look into reducing these times.
Description
Related Objects
- Mentioned In
  - T225730: Reduce runtime of MW shared gate Jenkins jobs to 5 min
- Mentioned Here
  - T222023: wmf-quibble-vendor-mysql-hhvm-docker job sometime take 40+ minutes to run
  - T223971: Old cloudvirt (with Intel Xeon) are half the speed of newer ones (Intel Sky Lake)
  - T221434: Ensure we're testing appropriately and not over-testing across Wikimedia-deployed code
Event Timeline
See also: T221434, which this might be a dupe of.
The general issue is that running our tests takes a long time for a number of reasons (e.g. no clear "integration" vs "unit" test delineation is uniformly enforced), and we don't want that.
Hey @Smalyshev, sorry for the delayed reply to this request. Besides what Greg mentioned (we run every single test from all the dependent extensions), there is another, infrastructure-related issue.
At the end of April, I noticed that wmf-quibble-vendor-mysql-hhvm-docker (which is a different job and a different set of repositories) was taking 40 minutes. I filed and closed T222023 myself, assuming it was a faulty WMCS instance, which I deleted.
Later, Kunal noticed that the Jenkins jobs that generate MediaWiki code coverage would sometimes time out after 4 hours, when they usually run in two hours. The TL;DR is that the oldest WMCS servers have poor CPU performance for some reason (T223971). I have disabled the Jenkins instance running on those hosts.
I suspect the slow WikibaseCirrusSearch runs are related.
For the future: when a patch set is sent, the default is to run the HHVM-based job; the PHP 7.0 - 7.2 jobs then run on Code-Review +2. Surely we should nowadays default to 7.2, which gives faster feedback, and move HHVM to run only on Code-Review +2.
I also think it's probably better to run regular patch sets on 7.2 and run HHVM only on submit, especially as we're migrating to 7.x in production. Would that speed things up, or is the slowness because of the old VMs?
OK, the main task is quibble-vendor-mysql-hhvm-docker; with recent changes, this is currently taking about 25 minutes in WikibaseCirrusSearch, which isn't great, but it's not as bad as when this was filed.
I'm not sure what improvements we can make ahead of the removal of HHVM from production (when this job will be removed entirely). We could switch the default "test" job from HHVM to PHP72 (we'd keep the HHVM job in the "submit" pipeline)?
OK, I am again getting multiple builds taking 50+ minutes for Wikibase, e.g.:
https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php73-docker/2752/
https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-docker/16913/
https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php70-docker/27655/
Overall, the job takes over 2 hours. And it needs to be done twice to submit a patch (and another two times if it needs to be backported), so to do an urgent fix I'd need to wait several hours just for CI (and God help us if there's a bug in the patch and it needs to be amended). I don't think this is a good situation.
Recent example from https://gerrit.wikimedia.org/r/581007:
[Mar 23 13:05] Patch Set 2: Code-Review+2
[Mar 23 13:27 (22min later)] Gate pipeline build succeeded. Change has been successfully merged by jenkins-bot.
- quibble-vendor-mysql-php72-noselenium-docker SUCCESS in 21m 41s
- quibble-vendor-mysql-php73-noselenium-docker SUCCESS in 17m 18s
- quibble-vendor-mysql-php74-noselenium-docker SUCCESS in 17m 28s
- mwgate-node10-docker SUCCESS in 40s
- quibble-vendor-selenium-docker SUCCESS in 13m 41s
- mwext-php72-phan-docker SUCCESS in 1m 17s
- mwext-php72-phan-seccheck-docker SUCCESS in 1m 25s
- wmf-quibble-vendor-mysql-php72-docker SUCCESS in 12m 42s
- wmf-quibble-selenium-php72-docker SUCCESS in 10m 47s
This looks much better, and it is only a few minutes slower than the standard gate jobs for wmf extensions (12-15min currently vs 17-22min).
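To make the numbers in the gate result above easier to compare, here is a rough Python sketch that parses the per-job durations quoted for change 581007 and reports the slowest job alongside the summed job time. It assumes the gate jobs run in parallel, which is consistent with the merge landing 22 minutes after the Code-Review+2, so the wait is bounded by the slowest job rather than by the sum. The helper names `to_seconds` and `summarize` are made up for illustration and are not part of any CI tooling.

```python
import re

# Per-job durations copied from the gate result quoted above (change 581007).
GATE_JOBS = {
    "quibble-vendor-mysql-php72-noselenium-docker": "21m 41s",
    "quibble-vendor-mysql-php73-noselenium-docker": "17m 18s",
    "quibble-vendor-mysql-php74-noselenium-docker": "17m 28s",
    "mwgate-node10-docker": "40s",
    "quibble-vendor-selenium-docker": "13m 41s",
    "mwext-php72-phan-docker": "1m 17s",
    "mwext-php72-phan-seccheck-docker": "1m 25s",
    "wmf-quibble-vendor-mysql-php72-docker": "12m 42s",
    "wmf-quibble-selenium-php72-docker": "10m 47s",
}


def to_seconds(text):
    """Convert a duration such as '21m 41s' or '40s' to seconds (hypothetical helper)."""
    match = re.fullmatch(r"(?:(\d+)m)?\s*(?:(\d+)s)?", text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return minutes * 60 + seconds


def summarize(jobs):
    """Return (slowest job name, its duration in seconds, summed duration in seconds)."""
    durations = {name: to_seconds(value) for name, value in jobs.items()}
    slowest = max(durations, key=durations.get)
    return slowest, durations[slowest], sum(durations.values())


if __name__ == "__main__":
    name, slowest_s, total_s = summarize(GATE_JOBS)
    print(f"slowest job: {name} ({slowest_s // 60}m {slowest_s % 60}s)")
    print(f"sum of all jobs: {total_s // 60}m {total_s % 60}s")
```

Under that parallel-execution assumption, shaving time off the quick jobs (phan, node10) would not shorten the wait; only speeding up the slowest quibble job would.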