[go: up one dir, main page]

Page MenuHomePhabricator

PHP Warning: curl_multi_remove_handle(): supplied resource is not a valid cURL Multi Handle resource
Closed, ResolvedPublicBUG REPORT

Description

What is the problem?

In the beta logs over the last hour or so, I am seeing hundreds of errors of the form:

2021-08-11 14:11:25 [de7d9353e2adce7cd2141614] deployment-mwmaint01 enwiki 1.37.0-alpha error ERROR: [de7d9353e2adce7cd2141614] [no req]   PHP Warning: curl_multi_remove_handle(): supplied resource is not a valid cURL Multi Handle resource {"exception_url":"[no req]","reqId":"de7d9353e2adce7cd2141614","caught_by":"mwe_handler"} 
[Exception ErrorException] (/srv/mediawiki/php-master/includes/libs/http/MultiHttpClient.php:257) PHP Warning: curl_multi_remove_handle(): supplied resource is not a valid cURL Multi Handle resource
  #0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
  #1 /srv/mediawiki/php-master/includes/libs/http/MultiHttpClient.php(257): curl_multi_remove_handle(resource (closed), resource)
  #2 /srv/mediawiki/php-master/includes/libs/http/MultiHttpClient.php(189): MultiHttpClient->runMultiCurl(array, array)
  #3 /srv/mediawiki/php-master/extensions/EventBus/includes/EventBus.php(244): MultiHttpClient->runMulti(array, array)
  #4 /srv/mediawiki/php-master/extensions/EventBus/includes/Adapters/Monolog/EventBusMonologHandler.php(63): MediaWiki\Extension\EventBus\EventBus->send(array)
  #5 /srv/mediawiki/php-master/includes/deferred/MWCallableUpdate.php(38): MediaWiki\Extension\EventBus\Adapters\Monolog\EventBusMonologHandler->MediaWiki\Extension\EventBus\Adapters\Monolog\{closure}()
  #6 /srv/mediawiki/php-master/includes/deferred/DeferredUpdates.php(515): MWCallableUpdate->doUpdate()
  #7 /srv/mediawiki/php-master/includes/deferred/DeferredUpdates.php(391): DeferredUpdates::attemptUpdate(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti)
  #8 /srv/mediawiki/php-master/includes/deferred/DeferredUpdates.php(221): DeferredUpdates::run(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti, Monolog\Logger, BufferingStatsdDataFactory, string)
  #9 /srv/mediawiki/php-master/includes/deferred/DeferredUpdatesScope.php(267): DeferredUpdates::{closure}(MWCallableUpdate, integer)
  #10 /srv/mediawiki/php-master/includes/deferred/DeferredUpdatesScope.php(196): DeferredUpdatesScope->processStageQueue(integer, integer, Closure)
  #11 /srv/mediawiki/php-master/includes/deferred/DeferredUpdates.php(242): DeferredUpdatesScope->processUpdates(integer, Closure)
  #12 /srv/mediawiki/php-master/includes/deferred/DeferredUpdates.php(290): DeferredUpdates::doUpdates(string, integer)
  #13 /srv/mediawiki/php-master/includes/deferred/DeferredUpdates.php(129): DeferredUpdates::tryOpportunisticExecute(string)
  #14 /srv/mediawiki/php-master/includes/deferred/DeferredUpdates.php(146): DeferredUpdates::addUpdate(MWCallableUpdate, integer)
  #15 /srv/mediawiki/php-master/extensions/EventBus/includes/Adapters/Monolog/EventBusMonologHandler.php(64): DeferredUpdates::addCallableUpdate(Closure)
  #16 /srv/mediawiki/php-master/vendor/monolog/monolog/src/Monolog/Handler/AbstractProcessingHandler.php(42): MediaWiki\Extension\EventBus\Adapters\Monolog\EventBusMonologHandler->write(array)
  #17 /srv/mediawiki/php-master/vendor/monolog/monolog/src/Monolog/Handler/Handler.php(27): Monolog\Handler\AbstractProcessingHandler->handle(array)
  #18 /srv/mediawiki/php-master/vendor/monolog/monolog/src/Monolog/Handler/BufferHandler.php(93): Monolog\Handler\Handler->handleBatch(array)
  #19 /srv/mediawiki/php-master/vendor/monolog/monolog/src/Monolog/Handler/BufferHandler.php(109): Monolog\Handler\BufferHandler->flush()
  #20 /srv/mediawiki/php-master/vendor/monolog/monolog/src/Monolog/Handler/GroupHandler.php(110): Monolog\Handler\BufferHandler->close()
  #21 /srv/mediawiki/php-master/vendor/monolog/monolog/src/Monolog/Handler/Handler.php(41): Monolog\Handler\GroupHandler->close()
  #22 [internal function]: Monolog\Handler\Handler->__destruct()
  #23 {main}

Event Timeline

ori triaged this task as High priority.Aug 10 2022, 8:37 PM

This is spamming error.log in production.

I'm not too up to date on how MW Monolog stuff works, but here's what I know.

There are two monolog channels configured in InitialiseSettings.php, 'api-request' and 'cirrussearch-request', that set 'eventbus' => 'debug'.

For api-request, ApiMain's logRequest constructs a structured and unstructured version of the api request log and then logs the structured one to the 'api-request' channel. The same should be true of cirrussearch-request, in the CirrusSearch extension.

Then, in logging.php, a $wmfMonoLogConfig eventbus handler is configured to send to the eventgate-analytics EventService. This 'eventgate-analytics' is a key into the wgEventServices config, for which values like the url where to actually send events is a key in ProductionServices.php or LabsServices.php.

In prod, eventgate-analytics is http://localhost:6004, which IIRC is an envoy proxy (sidecar?) that routes local requests to the eventgate-analytics.discovery.wmnet service in production. In beta, it is configured explicitly as 'http://deployment-eventgate-4.deployment-prep.eqiad1.wikimedia.cloud:8192'.

As far as I can tell, there hasn't been any significant change in event throughput for api-request or cirrussearch-request in the last 3 months, so it looks like most of the time this is all working.

This is a warning though? Can you drop a logstash link to one of these errors? I just tried to find it myself and totally failed due to lack of logstash-fu.

@Ottomata Please see /srv/mw-log/error.log on mwlog1002:

hoo@mwlog1002:/srv/mw-log$ grep -c curl_multi_remove_handle error.log
232326

I have been trying to figure out why logspam has been running so slowly lately. I discovered that there are a ton (879,681 hits today) of PHP Warning: curl_multi_remove_handle(): supplied resource is not a valid cURL Multi Handle resource warnings in /srv/mw-log/error.log for mwmaint1002. They look like this:

2023-09-02 18:22:23.400717 [de8b82d35cf22966cf5333b9] mwmaint1002 idwiki 1.41.0-wmf.24 error ERROR: [de8b82d35cf22966cf5333b9] [no req]   PHP Warning: curl_multi_info_read(): supplied resource is not a valid cURL Multi Handle resource {"exception_url":"[no req]","reqId":"de8b82d35cf22966cf5333b9","caught_by":"mwe_handler"}
[Exception ErrorException] (/srv/mediawiki/php-1.41.0-wmf.24/includes/libs/http/MultiHttpClient.php:272) PHP Warning: curl_multi_info_read(): supplied resource is not a valid cURL Multi Handle resource
  #0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
  #1 /srv/mediawiki/php-1.41.0-wmf.24/includes/libs/http/MultiHttpClient.php(272): curl_multi_info_read(resource (closed))
  #2 /srv/mediawiki/php-1.41.0-wmf.24/includes/libs/http/MultiHttpClient.php(217): MultiHttpClient->runMultiCurl(array, array)
  #3 /srv/mediawiki/php-1.41.0-wmf.24/extensions/EventBus/includes/EventBus.php(243): MultiHttpClient->runMulti(array, array)
  #4 /srv/mediawiki/php-1.41.0-wmf.24/extensions/EventBus/includes/Adapters/Monolog/EventBusMonologHandler.php(63): MediaWiki\Extension\EventBus\EventBus->send(array)
  #5 /srv/mediawiki/php-1.41.0-wmf.24/includes/deferred/MWCallableUpdate.php(38): MediaWiki\Extension\EventBus\Adapters\Monolog\EventBusMonologHandler->MediaWiki\Extension\EventBus\Adapters\Monolog\{closure}()
  #6 /srv/mediawiki/php-1.41.0-wmf.24/includes/deferred/DeferredUpdates.php(498): MWCallableUpdate->doUpdate()
  #7 /srv/mediawiki/php-1.41.0-wmf.24/includes/deferred/DeferredUpdates.php(192): DeferredUpdates::attemptUpdate(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti)
  #8 /srv/mediawiki/php-1.41.0-wmf.24/includes/deferred/DeferredUpdates.php(285): DeferredUpdates::run(MWCallableUpdate, string)
  #9 /srv/mediawiki/php-1.41.0-wmf.24/includes/deferred/DeferredUpdatesScope.php(267): DeferredUpdates::{closure}(MWCallableUpdate, integer)
  #10 /srv/mediawiki/php-1.41.0-wmf.24/includes/deferred/DeferredUpdatesScope.php(196): DeferredUpdatesScope->processStageQueue(integer, integer, Closure)
  #11 /srv/mediawiki/php-1.41.0-wmf.24/includes/deferred/DeferredUpdates.php(306): DeferredUpdatesScope->processUpdates(integer, Closure)
  #12 /srv/mediawiki/php-1.41.0-wmf.24/includes/deferred/DeferredUpdates.php(372): DeferredUpdates::doUpdates(integer)
  #13 /srv/mediawiki/php-1.41.0-wmf.24/includes/deferred/DeferredUpdates.php(142): DeferredUpdates::tryOpportunisticExecute()
  #14 /srv/mediawiki/php-1.41.0-wmf.24/includes/deferred/DeferredUpdates.php(159): DeferredUpdates::addUpdate(MWCallableUpdate, integer)
  #15 /srv/mediawiki/php-1.41.0-wmf.24/extensions/EventBus/includes/Adapters/Monolog/EventBusMonologHandler.php(64): DeferredUpdates::addCallableUpdate(Closure)
  #16 /srv/mediawiki/php-1.41.0-wmf.24/vendor/monolog/monolog/src/Monolog/Handler/AbstractProcessingHandler.php(42): MediaWiki\Extension\EventBus\Adapters\Monolog\EventBusMonologHandler->write(array)
  #17 /srv/mediawiki/php-1.41.0-wmf.24/vendor/monolog/monolog/src/Monolog/Handler/Handler.php(27): Monolog\Handler\AbstractProcessingHandler->handle(array)
  #18 /srv/mediawiki/php-1.41.0-wmf.24/vendor/monolog/monolog/src/Monolog/Handler/BufferHandler.php(93): Monolog\Handler\Handler->handleBatch(array)
  #19 /srv/mediawiki/php-1.41.0-wmf.24/vendor/monolog/monolog/src/Monolog/Handler/BufferHandler.php(109): Monolog\Handler\BufferHandler->flush()
  #20 /srv/mediawiki/php-1.41.0-wmf.24/vendor/monolog/monolog/src/Monolog/Handler/GroupHandler.php(110): Monolog\Handler\BufferHandler->close()
  #21 /srv/mediawiki/php-1.41.0-wmf.24/vendor/monolog/monolog/src/Monolog/Handler/Handler.php(41): Monolog\Handler\GroupHandler->close()
  #22 [internal function]: Monolog\Handler\Handler->__destruct()
  #23 {main}

Impact:
This makes programs (such as logspam) that read /srv/mw-log/error.log take a lot longer to operate. A one-hour portion of /srv/mw-log/error.log is now about 800MB of data.

Extra problem:
I cannot find a way to get visibility into these log messages in https://logstash.wikimedia.org/. Assistance would be appreciated. @colewhite
can you provide advice?

Change 957894 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] MultiHttpClient: Unset $this->cmh after closing it

https://gerrit.wikimedia.org/r/957894

^ A shot in the dark, but worth a try, I think…

During request shutdown, zend_objects_store_call_destructors() calls destructors in ascending order of object ID. It doesn't attempt to follow the reference tree. So it's unsafe to do anything complex or non-local at this stage, since indeed, destructors of properties may be called before the destructor of the parent object.

I was able to reproduce this locally with configuration code like

$wgMWLoggerDefaultSpi['args'][0]['handlers']['eventbus-buffered'] = [
	'class' => \Monolog\Handler\WhatFailureGroupHandler::class,
	'args' => [
		static function () {
			return [
				new \MediaWiki\Logger\Monolog\BufferHandler(
					new EventBusMonologHandler( 'log' )
				)
			];
		}
	]
];

$wgExtensionFunctions[] = function () {
	EventBus::getInstance( 'log' );
	LoggerFactory::getInstance( 'eventbus' )->warning( 'test' );
	EventBus::getInstance( 'log' )->send( 'test' );
};

$wgEventServices['log'] = [
	'url' => 'http://localhost/fake-eventbus'
];

It's not necessary to install anything special at http://localhost/fake-eventbus

It's a bit fiddly but by dumping the spl_object_id() of the various objects involved, you can arrange their initialisation order such that destruction in order of ID will trigger the bug.

It was necessary to use PHP 7.4 since PHP 8.1 has the CurlMultiHandle class. Also, I had to use CLI mode since it depends on tryOpportunisticExecute().

EventBus logs are supposed to be flushed only once, very early in request shutdown from a DeferredUpdate. BufferHandler has an empty __destruct() which is supposed to prevent a second flush on shutdown. This is defeated in production by wrapping the BufferHandler in a GroupHandler. GroupHandler extends the Handler base class, which has __destruct() which calls close(), which calls flush(). That's why flush() is called twice.

Primarily it's a monolog bug. Monolog should not be flushing logs from destructors.

It could also be fixed in Monolog by ignoring the second call to close(). I proposed that at https://github.com/Seldaek/monolog/pull/1837

The MultiHttpClient patch seems prudent but may not be sufficient.

Change 957894 merged by jenkins-bot:

[mediawiki/core@master] MultiHttpClient: Unset $this->cmh after closing it

https://gerrit.wikimedia.org/r/957894

Change 958408 had a related patch set uploaded (by Reedy; author: Lucas Werkmeister (WMDE)):

[mediawiki/core@REL1_40] MultiHttpClient: Unset $this->cmh after closing it

https://gerrit.wikimedia.org/r/958408

Change 958409 had a related patch set uploaded (by Reedy; author: Lucas Werkmeister (WMDE)):

[mediawiki/core@REL1_39] MultiHttpClient: Unset $this->cmh after closing it

https://gerrit.wikimedia.org/r/958409

Change 958409 merged by jenkins-bot:

[mediawiki/core@REL1_39] MultiHttpClient: Unset $this->cmh after closing it

https://gerrit.wikimedia.org/r/958409

Change 958408 merged by jenkins-bot:

[mediawiki/core@REL1_40] MultiHttpClient: Unset $this->cmh after closing it

https://gerrit.wikimedia.org/r/958408

Extra problem:
I cannot find a way to get visibility into these log messages in https://logstash.wikimedia.org/. Assistance would be appreciated. @colewhite can you provide advice?

I'd venture to guess that these logs match an rsyslog rule that specifically forwards these logs to mwlog, but bypasses writing them locally or to logstash. A guess as to what rule here.

Hm, apparently the warnings haven’t exactly gone down in production yet:

lucaswerkmeister-wmde@mwlog2002:~$ zgrep -ci 'curl.multi' /srv/mw-log/error.log /srv/mw-log/archive/error.log-202309*
/srv/mw-log/error.log:0
/srv/mw-log/archive/error.log-20230901.gz:17250629
/srv/mw-log/archive/error.log-20230902.gz:17352686
/srv/mw-log/archive/error.log-20230903.gz:17304773
/srv/mw-log/archive/error.log-20230904.gz:17288342
/srv/mw-log/archive/error.log-20230905.gz:17257363
/srv/mw-log/archive/error.log-20230906.gz:17298615
/srv/mw-log/archive/error.log-20230907.gz:17386129
/srv/mw-log/archive/error.log-20230908.gz:17382321
/srv/mw-log/archive/error.log-20230909.gz:17279528
/srv/mw-log/archive/error.log-20230910.gz:17342442
/srv/mw-log/archive/error.log-20230911.gz:17287858
/srv/mw-log/archive/error.log-20230912.gz:17550887
/srv/mw-log/archive/error.log-20230913.gz:17589079
/srv/mw-log/archive/error.log-20230914.gz:17628869
/srv/mw-log/archive/error.log-20230915.gz:17667626
/srv/mw-log/archive/error.log-20230916.gz:17567593
/srv/mw-log/archive/error.log-20230917.gz:17521113
/srv/mw-log/archive/error.log-20230918.gz:17577762
/srv/mw-log/archive/error.log-20230919.gz:17500330
/srv/mw-log/archive/error.log-20230920.gz:20897931
/srv/mw-log/archive/error.log-20230921.gz:23602008
/srv/mw-log/archive/error.log-20230922:17617891

There were significantly more warnings yesterday and two days ago, and the 2023-09-22 number is probably only lower (in the same ballpark as last week) because the day isn’t done yet.

However, those warnings apparently come from wmf.26, not wmf.27, even though wmf.27 ought to be fully rolled out today:

lucaswerkmeister-wmde@mwlog2002:~$ grep 'wmf\... ' /srv/mw-log/archive/error.log-20230922 | head
2023-09-21 00:27:22.840260 [e69c4e40dad57c17da905027] mwmaint2002 lawiki 1.41.0-wmf.26 error ERROR: [e69c4e40dad57c17da905027] [no req]   PHP Warning: curl_multi_setopt(): supplied resource is not a valid cURL Multi Handle resource {"exception_url":"[no req]","reqId":"e69c4e40dad57c17da905027","caught_by":"mwe_handler"} 
2023-09-21 00:27:22.840717 [e69c4e40dad57c17da905027] mwmaint2002 lawiki 1.41.0-wmf.26 error ERROR: [e69c4e40dad57c17da905027] [no req]   PHP Warning: curl_multi_add_handle(): supplied resource is not a valid cURL Multi Handle resource {"exception_url":"[no req]","reqId":"e69c4e40dad57c17da905027","caught_by":"mwe_handler"} 
2023-09-21 00:27:22.840953 [e69c4e40dad57c17da905027] mwmaint2002 lawiki 1.41.0-wmf.26 error ERROR: [e69c4e40dad57c17da905027] [no req]   PHP Warning: curl_multi_exec(): supplied resource is not a valid cURL Multi Handle resource {"exception_url":"[no req]","reqId":"e69c4e40dad57c17da905027","caught_by":"mwe_handler"} 
2023-09-21 00:27:22.841160 [e69c4e40dad57c17da905027] mwmaint2002 lawiki 1.41.0-wmf.26 error ERROR: [e69c4e40dad57c17da905027] [no req]   PHP Warning: curl_multi_info_read(): supplied resource is not a valid cURL Multi Handle resource {"exception_url":"[no req]","reqId":"e69c4e40dad57c17da905027","caught_by":"mwe_handler"} 
2023-09-21 00:27:22.841361 [e69c4e40dad57c17da905027] mwmaint2002 lawiki 1.41.0-wmf.26 error ERROR: [e69c4e40dad57c17da905027] [no req]   PHP Warning: curl_multi_remove_handle(): supplied resource is not a valid cURL Multi Handle resource {"exception_url":"[no req]","reqId":"e69c4e40dad57c17da905027","caught_by":"mwe_handler"} 
2023-09-21 00:27:22.841835 [e69c4e40dad57c17da905027] mwmaint2002 lawiki 1.41.0-wmf.26 error ERROR: [e69c4e40dad57c17da905027] [no req]   PHP Warning: curl_multi_setopt(): supplied resource is not a valid cURL Multi Handle resource {"exception_url":"[no req]","reqId":"e69c4e40dad57c17da905027","caught_by":"mwe_handler"} 
2023-09-21 00:27:22.842065 [e69c4e40dad57c17da905027] mwmaint2002 lawiki 1.41.0-wmf.26 error ERROR: [e69c4e40dad57c17da905027] [no req]   PHP Warning: curl_multi_add_handle(): supplied resource is not a valid cURL Multi Handle resource {"exception_url":"[no req]","reqId":"e69c4e40dad57c17da905027","caught_by":"mwe_handler"} 
2023-09-21 00:27:22.842262 [e69c4e40dad57c17da905027] mwmaint2002 lawiki 1.41.0-wmf.26 error ERROR: [e69c4e40dad57c17da905027] [no req]   PHP Warning: curl_multi_exec(): supplied resource is not a valid cURL Multi Handle resource {"exception_url":"[no req]","reqId":"e69c4e40dad57c17da905027","caught_by":"mwe_handler"} 
2023-09-21 00:27:22.842451 [e69c4e40dad57c17da905027] mwmaint2002 lawiki 1.41.0-wmf.26 error ERROR: [e69c4e40dad57c17da905027] [no req]   PHP Warning: curl_multi_info_read(): supplied resource is not a valid cURL Multi Handle resource {"exception_url":"[no req]","reqId":"e69c4e40dad57c17da905027","caught_by":"mwe_handler"} 
2023-09-21 00:27:22.842638 [e69c4e40dad57c17da905027] mwmaint2002 lawiki 1.41.0-wmf.26 error ERROR: [e69c4e40dad57c17da905027] [no req]   PHP Warning: curl_multi_remove_handle(): supplied resource is not a valid cURL Multi Handle resource {"exception_url":"[no req]","reqId":"e69c4e40dad57c17da905027","caught_by":"mwe_handler"} 

I assume this is a longer-running maintenance script (you can see mwmaint2002 in those messages). So I guess we should check back in a few more days whether the errors have gone away or not…

The messages went away \o/ \o/ \o/

lucaswerkmeister-wmde@mwlog2002:~$ zgrep -ci 'curl.multi' /srv/mw-log/{archive/error.log-2023092{2..5}*,error.log}
/srv/mw-log/archive/error.log-20230922.gz:17617891
/srv/mw-log/archive/error.log-20230923.gz:0
/srv/mw-log/archive/error.log-20230924.gz:0
/srv/mw-log/archive/error.log-20230925:0
/srv/mw-log/error.log:0

(In Beta, /srv/mw-log/error.log’s last entry dates from 2023-09-07, even though the file was supposedly modified just an hour ago according to stat; I don’t really understand it.)

Yay! Thanks for taking care of this. logspam is working much better now.

@tstarling do you think we should leave this open for the Monolog PR or is it okay to close?

tstarling claimed this task.