Isolate the observed mysqli/mysqlnd memory leak in the PHP source, and propose a fix and/or workaround.
Description
Event Timeline
As discussed at T296098, mw1414 is serving no requests but has high memory usage. Analysis of /proc/<pid>/maps shows that most of the memory usage is in the sbrk() segment which is only used by malloc(), not emalloc(), opcache or APCu. Dumping the segment from a random process and looking at random parts of it showed field name strings:
[2240][tstarling@mw1414:~]$ tail -c+200m heap.dump | head -c16384 | hexdump -C
00000000  80 07 00 00 00 00 00 00  00 70 61 67 65 5f 69 64  |.........page_id|
00000010  00 65 73 70 61 63 65 00  6d 31 00 00 00 00 00 00  |.espace.m1......|
00000020  00 01 00 00 00 06 01 00  00 20 66 ee fa 68 e7 1a  |......... f..h..|
00000030  80 08 00 00 00 00 00 00  00 70 61 67 65 5f 6c 65  |.........page_le|
00000040  6e 00 20 01 14 05 00 00  00 41 00 00 00 00 00 00  |n. ......A......|
00000050  00 01 00 00 00 06 01 00  00 6c 4e 83 cc 4f 9c b3  |.........lN..O..|
00000060  e3 12 00 00 00 00 00 00  00 70 61 67 65 5f 63 6f  |.........page_co|
Also, the rate of increase of memory usage seems to correlate with the DB query rate.
MediaWiki uses MySQLi, and MySQLi uses mysqlnd. The mysqlnd documentation claims it uses emalloc(); however, for persistent connections malloc() is used, and mysqli has:
/* We create always persistent, as if the user want to connecto to p:somehost, we can't convert the handle then */
if (!(mysql->mysql = mysqlnd_init(MYSQLND_CLIENT_KNOWS_RSET_COPY_DATA, TRUE)))
This is called from mysqli_init(), but the library doesn't know whether a persistent connection will be required until mysqli_real_connect() is called, so it just makes all connections persistent. That's a design error.
This explains why malloc() is called. We don't use persistent connections, so a workaround would be to just patch this out. Then results would be freed when PHP frees the request-local allocator.
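For context, a minimal userland sketch of the distinction that mysqli cannot see at init time: persistence is only requested via a "p:" host prefix at connect time, so a plain connection still gets a handle that was created with the persistent flag set. Hostname and credentials below are placeholders.

<?php
// Illustration only: a persistent mysqli connection is requested by prefixing
// the host with "p:" at connect time; mysqli_init() has already run by then.
$persistent = mysqli_connect( 'p:db1001.example.org', 'user', 'pass', 'wiki' );
// MediaWiki opens plain (non-persistent) connections like this one, yet the
// underlying mysqlnd handle was still created with the persistent flag, so
// result memory comes from malloc() rather than emalloc().
$normal = mysqli_connect( 'db1001.example.org', 'user', 'pass', 'wiki' );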
I will try to figure out how the results get leaked, but I haven't got a reproduction procedure yet. It's not as simple as just calling $dbr->query(), since I can do that in a tight loop in eval.php without any memory usage increase.
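A sketch of the kind of loop described above, run inside MediaWiki's eval.php (the query text and the wfGetDB() call are illustrative; $dbr is assumed to be a replica IDatabase handle):

// Run inside eval.php; a loop like this showed no memory growth.
$dbr = wfGetDB( DB_REPLICA );
for ( $i = 0; $i < 100000; $i++ ) {
	$dbr->query( 'SELECT page_id, page_len FROM page LIMIT 1', __METHOD__ );
}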
I think the things being leaked are zend_string objects allocated by mysqlnd_wireprotocol.c line 1380. In the core dump they have a reference count of 1 but no reference to them exists (the subtraction of 0x18 below finds the start of the zend_string struct, since on 64-bit builds the character data begins 24 bytes into it):
(gdb) x/s 0x56391af11008
0x56391af11008: "page_len"
(gdb) print/x 0x56391af11008-0x18
$14 = 0x56391af10ff0
(gdb) print *(zend_string*)$14
$15 = {gc = {refcount = 1, u = {v = {type = 6 '\006', flags = 1 '\001', gc_info = 0}, type_info = 262}}, h = 9230944824321795616, len = 8, val = "p"}
(gdb) find/g 0x563908311000, 0x563920761000, 0x56391af10ff0
Pattern not found.
Normally the reference count should be reduced to zero by php_mysqlnd_free_field_metadata() so that the string is freed.
It's a fairly unusual situation to have reference-counted persistent strings exposed to userspace. A workaround might be to have MysqliResultWrapper clone the strings.
Mentioned in SAL (#wikimedia-operations) [2021-12-14T16:43:48Z] <Amir1> rolling restart of php-fpm on all mediawiki hosts (T297517 T297667)
So for now, I ran a rolling restart to mitigate the issue:
sudo cumin -b 5 -s 30 'A:mw or A:parsoid' 'restart-php7.2-fpm'
We probably need to run this once in a while until the underlying issue is resolved.
Reduced test case:
<?php
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
class C {
	public static $row;
}
$conn = mysqli_init();
$conn->real_connect($argv[1], $argv[2], $argv[3], $argv[4]);
C::$row = $conn->query('SELECT 1 as asdfg')->fetch_object();
var_dump( C::$row );
With PHP 7.2 and valgrind this shows a leak from mysqlnd_wireprotocol.c:1380.
PHP's "fast shutdown" does not call the free_storage handler for objects with the standard free_obj handler. The theory is that such objects only contain memory allocated with emalloc() which can be freed in bulk. This assumption is broken by mysqli which puts reference-counted persistent strings into a stdClass with a standard free_obj handler.
The original reduced test case does not appear to reproduce the leak in PHP 7.4, so I'm now looking at which PHP versions are affected.
The fix was a7305eb539596e175bd6c3ae9a20953358c5d677, included in PHP 7.3, which made mysqlnd use the request-local allocator for result data even for persistent connections.
Mentioned in SAL (#wikimedia-operations) [2021-12-15T11:35:39Z] <_joe_> uploading php 7.2 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf4 to buster T297667
Mentioned in SAL (#wikimedia-operations) [2021-12-15T11:36:51Z] <_joe_> upgrading php7.2 on mw1414, T297667
Mentioned in SAL (#wikimedia-operations) [2021-12-15T11:39:57Z] <_joe_> repooling mw1414 T297667
Update: I rebuilt php 7.2 with Tim's backport, published it to our apt repository, and upgraded mw1414. Then I repooled it and restarted php-fpm on mw1415 about 10 minutes later to provide a baseline comparison of memory usage. Tomorrow I'll revisit the situation.
There are 20-30 segfaults per hour, so there's probably a bug with my backport. I'll see if I can figure it out. For now, don't upgrade the rest of the cluster.
I collected 37 segfault backtraces with systemd-coredump. All indicated that the problem is the lack of Dmitry's followup commit 5f47ce9127ec9123ba2359bb05cf180c8a4177b6. So I added that to my mysqlnd-leak-7.2-backport branch as 5f5ec42235cc15eacd86abe55ebb33bfb677a2ce. I'll do some more local testing but hopefully it will work now.
Memory usage has been flat for 9 hours.
Mentioned in SAL (#wikimedia-operations) [2021-12-16T00:19:27Z] <legoktm> uploaded 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf5 to buster-wikimedia for T297667
Mentioned in SAL (#wikimedia-operations) [2021-12-16T00:22:55Z] <legoktm> upgraded php7.2 on mw1414 for mysqlnd memory leak fix part 2 (T297667)
No new segfaults since I did the upgrade, and beforehand they were pretty regular:
legoktm@mw1414:/var/log$ grep segfault syslog
Dec 16 00:00:32 mw1414 kernel: [8063741.038851] php-fpm7.2[43416]: segfault at 7fd328000000 ip 00007fd4ee7059ef sp 00007ffded396d50 error 4 in libc-2.28.so[7fd4ee6a3000+148000]
Dec 16 00:02:53 mw1414 kernel: [8063882.857769] php-fpm7.2[20116]: segfault at 7fd328000000 ip 00007fd4ee7059ef sp 00007ffded396d50 error 4 in libc-2.28.so[7fd4ee6a3000+148000]
Dec 16 00:02:59 mw1414 kernel: [8063888.010164] php-fpm7.2[34102]: segfault at 7fd320000000 ip 00007fd4ee7059ef sp 00007ffded396d50 error 4 in libc-2.28.so[7fd4ee6a3000+148000]
Dec 16 00:12:45 mw1414 kernel: [8064474.651297] php-fpm7.2[23280]: segfault at 7fd328000000 ip 00007fd4ee7059ef sp 00007ffded396d50 error 4 in libc-2.28.so[7fd4ee6a3000+148000]
Dec 16 00:13:47 mw1414 kernel: [8064536.588552] php-fpm7.2[47657]: segfault at 7fd328000000 ip 00007fd4ee7059ef sp 00007ffded396d50 error 4 in libc-2.28.so[7fd4ee6a3000+148000]
Dec 16 00:15:18 mw1414 kernel: [8064627.425579] php-fpm7.2[19648]: segfault at 7fd328000000 ip 00007fd4ee7059ef sp 00007ffded396d50 error 4 in libc-2.28.so[7fd4ee6a3000+148000]
Dec 16 00:18:06 mw1414 kernel: [8064795.661468] php-fpm7.2[34073]: segfault at 7fd320000000 ip 00007fd4ee7059ef sp 00007ffded396d50 error 4 in libc-2.28.so[7fd4ee6a3000+148000]
Dec 16 00:20:19 mw1414 kernel: [8064928.327696] php-fpm7.2[16256]: segfault at 7fd328000000 ip 00007fd4ee7059ef sp 00007ffded396d50 error 4 in libc-2.28.so[7fd4ee6a3000+148000]
I can confirm no new segfaults since then (and, if any do happen, we should first check whether they always occur at the same address, since PHP has a sizeable number of segfaults of its own anyway).
If memory usage stays flat, I'll install this new version on all canaries today.
Interestingly, after the second patch, I think the memory usage is steadily increasing again.
So we'll have to check after a few more days how things go compared to mw1415.
Yeah, it turns out segfaulting once every couple of hours keeps a lid on memory usage. I did a linear regression of the data from 2021-12-16 04:00 to 21:50. mw1414 is leaking 122 MB/hour, and mw1415 is leaking 600 MB/hour. Most likely there is a second smaller memory leak, and the task as described is resolved. I would suggest rolling out the patch. This is my last day before vacation. I can do another core dump analysis in the new year.
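For the record, the leak-rate figures above come from an ordinary least-squares fit of memory samples against time; a rough sketch of that arithmetic (the sample values below are made up for illustration, only the method matches what was done):

<?php
// Least-squares slope of (hours since start => MB used), i.e. MB leaked per hour.
function leakRatePerHour( array $samples ): float {
	$n = count( $samples );
	$sumX = 0.0; $sumY = 0.0; $sumXY = 0.0; $sumXX = 0.0;
	foreach ( $samples as $x => $y ) {
		$sumX += $x;
		$sumY += $y;
		$sumXY += $x * $y;
		$sumXX += $x * $x;
	}
	return ( $n * $sumXY - $sumX * $sumY ) / ( $n * $sumXX - $sumX * $sumX );
}

// Hypothetical hourly RSS readings in MB; prints roughly 121 MB/hour.
var_dump( leakRatePerHour( [ 0 => 10000, 1 => 10120, 2 => 10250, 3 => 10360 ] ) );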
Yes, I agree: the lack of memory increase was clearly an effect of workers regularly crashing. And I also agree we can resolve this task as soon as I roll out to all appservers on Tuesday. In practice, with the new leak rate we can go weeks before memory starvation sets in, and by that point either opcache limits or APCu fragmentation will have triggered a restart of php-fpm anyway.
I don't think we need to solve every memleak in php 7.2; we should save our resources for the memleaks we'll find in 7.4 next quarter.
I just upgraded all the canary appservers and will proceed with the rest of the canaries now (which include parsoid and the API clusters). If after inspection tomorrow I don't see major issues, I'll proceed with the rest of the clusters.
Mentioned in SAL (#wikimedia-operations) [2021-12-21T14:57:30Z] <_joe_> upgrading php 7.2 everywhere, T297667