[go: up one dir, main page]

Page MenuHomePhabricator

PHP Warning: Erroneous data format for unserializing 'Wikimedia\Rdbms\MySQLPrimaryPos'
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   PHP Warning: Erroneous data format for unserializing 'Wikimedia\Rdbms\MySQLPrimaryPos'
exception.trace
from /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/objectcache/MemcachedPeclBagOStuff.php(196)
#0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/objectcache/MemcachedPeclBagOStuff.php(196): Memcached->get(string)
#2 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/objectcache/MediumSpecificBagOStuff.php(117): MemcachedPeclBagOStuff->doGet(string, integer)
#3 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/ChronologyProtector.php(485): MediumSpecificBagOStuff->get(string)
#4 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/ChronologyProtector.php(405): Wikimedia\Rdbms\ChronologyProtector->lazyStartup()
#5 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/ChronologyProtector.php(266): Wikimedia\Rdbms\ChronologyProtector->getStartupSessionPositions()
#6 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/lbfactory/LBFactory.php(771): Wikimedia\Rdbms\ChronologyProtector->applySessionReplicationPosition(Wikimedia\Rdbms\LoadBalancer)
#7 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1452): Wikimedia\Rdbms\LBFactory->Wikimedia\Rdbms\{closure}(Wikimedia\Rdbms\LoadBalancer)
#8 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/loadbalancer/LoadBalancer.php(561): Wikimedia\Rdbms\LoadBalancer->lazyLoadReplicationPositions()
#9 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/loadbalancer/LoadBalancer.php(514): Wikimedia\Rdbms\LoadBalancer->getReaderIndex(string, string)
#10 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/loadbalancer/LoadBalancer.php(939): Wikimedia\Rdbms\LoadBalancer->getConnectionIndex(integer, array, string)
#11 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/database/DBConnRef.php(95): Wikimedia\Rdbms\LoadBalancer->getConnectionInternal(integer, array, string, integer)
#12 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/database/DBConnRef.php(101): Wikimedia\Rdbms\DBConnRef->ensureConnection()
#13 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/database/DBConnRef.php(351): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#14 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/querybuilder/SelectQueryBuilder.php(686): Wikimedia\Rdbms\DBConnRef->selectRow(array, array, array, string, array, array)
#15 /srv/mediawiki/php-1.39.0-wmf.26/includes/user/UserSelectQueryBuilder.php(191): Wikimedia\Rdbms\SelectQueryBuilder->fetchRow()
#16 /srv/mediawiki/php-1.39.0-wmf.26/includes/user/ActorStore.php(237): MediaWiki\User\UserSelectQueryBuilder->fetchUserIdentity()
#17 /srv/mediawiki/php-1.39.0-wmf.26/includes/user/User.php(937): MediaWiki\User\ActorStore->getUserIdentityByName(string, integer)
#18 /srv/mediawiki/php-1.39.0-wmf.26/extensions/CentralAuth/includes/User/CentralAuthUser.php(2165): User::idFromName(string)
#19 /srv/mediawiki/php-1.39.0-wmf.26/extensions/CentralAuth/includes/User/CentralAuthUser.php(2245): MediaWiki\Extension\CentralAuth\User\CentralAuthUser->canAuthenticate()
#20 /srv/mediawiki/php-1.39.0-wmf.26/extensions/CentralAuth/includes/session/CentralAuthSessionProvider.php(212): MediaWiki\Extension\CentralAuth\User\CentralAuthUser->authenticateWithToken(string)
#21 /srv/mediawiki/php-1.39.0-wmf.26/includes/session/SessionManager.php(537): CentralAuthSessionProvider->provideSessionInfo(WebRequest)
#22 /srv/mediawiki/php-1.39.0-wmf.26/includes/session/SessionManager.php(243): MediaWiki\Session\SessionManager->getSessionInfoForRequest(WebRequest)
#23 /srv/mediawiki/php-1.39.0-wmf.26/includes/WebRequest.php(843): MediaWiki\Session\SessionManager->getSessionForRequest(WebRequest)
#24 /srv/mediawiki/php-1.39.0-wmf.26/includes/session/SessionManager.php(164): WebRequest->getSession()
#25 /srv/mediawiki/php-1.39.0-wmf.26/includes/Setup.php(448): MediaWiki\Session\SessionManager::getGlobalSession()
#26 /srv/mediawiki/php-1.39.0-wmf.26/includes/WebStart.php(86): require_once(string)
#27 /srv/mediawiki/php-1.39.0-wmf.26/api.php(43): require(string)
#28 /srv/mediawiki/w/api.php(3): require(string)
#29 {main}

Immediately followed by PHP Warning: Memcached::get(): could not unserialize value

from /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/objectcache/MemcachedPeclBagOStuff.php(196)
#0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/objectcache/MemcachedPeclBagOStuff.php(196): Memcached->get(string)
#2 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/objectcache/MediumSpecificBagOStuff.php(117): MemcachedPeclBagOStuff->doGet(string, integer)
#3 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/ChronologyProtector.php(485): MediumSpecificBagOStuff->get(string)
#4 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/ChronologyProtector.php(405): Wikimedia\Rdbms\ChronologyProtector->lazyStartup()
#5 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/ChronologyProtector.php(266): Wikimedia\Rdbms\ChronologyProtector->getStartupSessionPositions()
#6 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/lbfactory/LBFactory.php(771): Wikimedia\Rdbms\ChronologyProtector->applySessionReplicationPosition(Wikimedia\Rdbms\LoadBalancer)
#7 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1452): Wikimedia\Rdbms\LBFactory->Wikimedia\Rdbms\{closure}(Wikimedia\Rdbms\LoadBalancer)
#8 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/loadbalancer/LoadBalancer.php(561): Wikimedia\Rdbms\LoadBalancer->lazyLoadReplicationPositions()
#9 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/loadbalancer/LoadBalancer.php(514): Wikimedia\Rdbms\LoadBalancer->getReaderIndex(string, string)
#10 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/loadbalancer/LoadBalancer.php(939): Wikimedia\Rdbms\LoadBalancer->getConnectionIndex(integer, array, string)
#11 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/database/DBConnRef.php(95): Wikimedia\Rdbms\LoadBalancer->getConnectionInternal(integer, array, string, integer)
#12 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/database/DBConnRef.php(101): Wikimedia\Rdbms\DBConnRef->ensureConnection()
#13 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/database/DBConnRef.php(351): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#14 /srv/mediawiki/php-1.39.0-wmf.26/includes/libs/rdbms/querybuilder/SelectQueryBuilder.php(686): Wikimedia\Rdbms\DBConnRef->selectRow(array, array, array, string, array, array)
#15 /srv/mediawiki/php-1.39.0-wmf.26/includes/user/UserSelectQueryBuilder.php(191): Wikimedia\Rdbms\SelectQueryBuilder->fetchRow()
#16 /srv/mediawiki/php-1.39.0-wmf.26/includes/user/ActorStore.php(237): MediaWiki\User\UserSelectQueryBuilder->fetchUserIdentity()
#17 /srv/mediawiki/php-1.39.0-wmf.26/includes/user/User.php(937): MediaWiki\User\ActorStore->getUserIdentityByName(string, integer)
#18 /srv/mediawiki/php-1.39.0-wmf.26/extensions/CentralAuth/includes/User/CentralAuthUser.php(2165): User::idFromName(string)
#19 /srv/mediawiki/php-1.39.0-wmf.26/extensions/CentralAuth/includes/User/CentralAuthUser.php(2245): MediaWiki\Extension\CentralAuth\User\CentralAuthUser->canAuthenticate()
#20 /srv/mediawiki/php-1.39.0-wmf.26/extensions/CentralAuth/includes/session/CentralAuthSessionProvider.php(212): MediaWiki\Extension\CentralAuth\User\CentralAuthUser->authenticateWithToken(string)
#21 /srv/mediawiki/php-1.39.0-wmf.26/includes/session/SessionManager.php(537): CentralAuthSessionProvider->provideSessionInfo(WebRequest)
#22 /srv/mediawiki/php-1.39.0-wmf.26/includes/session/SessionManager.php(243): MediaWiki\Session\SessionManager->getSessionInfoForRequest(WebRequest)
#23 /srv/mediawiki/php-1.39.0-wmf.26/includes/WebRequest.php(843): MediaWiki\Session\SessionManager->getSessionForRequest(WebRequest)
#24 /srv/mediawiki/php-1.39.0-wmf.26/includes/session/SessionManager.php(164): WebRequest->getSession()
#25 /srv/mediawiki/php-1.39.0-wmf.26/includes/Setup.php(448): MediaWiki\Session\SessionManager::getGlobalSession()
#26 /srv/mediawiki/php-1.39.0-wmf.26/includes/WebStart.php(86): require_once(string)
#27 /srv/mediawiki/php-1.39.0-wmf.26/api.php(43): require(string)
#28 /srv/mediawiki/w/api.php(3): require(string)
#29 {main}
Impact

Central authentication might be broken?

Notes

30k such messages since August 29th 13:20 UTC. The earliest were at 8:30, some others during the morning but since 13:20 it is a constant stream. Messages matching PHP Warning: Erroneous data format for unserializing 'Wikimedia\Rdbms\MySQLPrimaryPos'since 7:00 UTC:

mwerrors.png (453×790 px, 21 KB)

From the https://grafana.wikimedia.org/d/000000102/production-logging dashboard:

mwlogs_per_channels.png (435×908 px, 36 KB)

production sal
13:21 	<taavi@deploy1002> 	Synchronized php-1.39.0-wmf.26/extensions/SecurePoll/: T316150 (duration: 03m 44s)
13:14 	<oblivian@deploy1002> 	Synchronized wmf-config/InitialiseSettings.php: Moving 1% of users to php 7.4 (duration: 04m 18s)
13:12 	<vgutierrez> 	Increase roll-out of query-sorting to 75% - T314868

Event Timeline

hashar triaged this task as Unbreak Now! priority.Aug 29 2022, 9:07 PM
hashar created this task.

Does not seem specific to CentralAuth in any way, probably affects all requests with a ChronologyProtector cookie. It's a PHP warning and the sites aren't down so the impact is probably people getting replica connections in situations where they should be getting primary connections, and not seeing their own edits and such.

The immediate cause is rOMWC612de54b6c2b: Move 1% of traffic to php 7.4 presumably.

Search for Erroneous data format for unserializing: https://logstash.wikimedia.org/goto/00f365a405d48837ed009e886ffec4e9
(30K MySQLPrimaryPos, 4 Message)
There's a giant spike around 15h UTC, and a few dozen to few hundred errors per minute after that. All on PHP 7.2 which apparently can't always read back the serialization format used by PHP 7.4.

A reproduction of the serialization issue by @Krinkle and @Zabe: https://3v4l.org/GmmmL

Change 827600 had a related patch set uploaded (by Zabe; author: Zabe):

[operations/mediawiki-config@master] Stop all PHP 7.4 user traffic

https://gerrit.wikimedia.org/r/827600

Change 827600 merged by jenkins-bot:

[operations/mediawiki-config@master] Stop all PHP 7.4 user traffic

https://gerrit.wikimedia.org/r/827600

From @Krinkle on rOMWC3fe1920493e0: Stop all PHP 7.4 user traffic:

I believe this would not drain traffic immediately as the cookie is valid for 7 days, and will only be removed by the JS to inform the 2nd next pageview after the one that removes the cookie. This means there will be a long tail of returning visitors over the next few days that have yet to be re-sampled after this config change. This should probably be via some Puppet mechanism instead.

So still kinda Unbreak Now.

The comment by @Krinkle is incorrect, the traffic gets drained up pretty quickly (see https://logstash.wikimedia.org/goto/db7753562f930806440d6fbd95c7b7ab for the disappearing warning), as it's always historically been, as the cookie is reevalued on every page view.

Having said that, I'd expect this to be treated as an UBN! bug if we want to migrate to php 7.4.

Would it make sense to add the PHP version to the cache key if the object being stored implements Serializable? I guess there could be more than just MySQLPrimaryPos. Maybe we could change the key for PHP 7.4 only, leaving the unversioned key for 7.2. Another approach could be switching to JSON serialization, but I guess it'd take longer and I'm not sure if it's applicable in this specific case.

Would it make sense to add the PHP version to the cache key if the object being stored implements Serializable? I guess there could be more than just MySQLPrimaryPos. Maybe we could change the key for PHP 7.4 only, leaving the unversioned key for 7.2. Another approach could be switching to JSON serialization, but I guess it'd take longer and I'm not sure if it's applicable in this specific case.

Please just don't use PHP serialization. Change it to JSON, per the current policy.

I think I can patch PHP 7.4 so that its serialize() is backwards compatible. Just like so, effectively a partial revert of nikic's implementation patch. A patched PHP binary would be able to read the new format, but would not write it. So we would migrate fully to the patched PHP 7.4, then when we are sure there will be no need to revert back to PHP 7.2, we would revert the patch.

I will test it tomorrow, I don't think this is the kind of UBN worth losing sleep over.

Please just don't use PHP serialization.

That may well be a good long-term solution.

Change 828058 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] ChronologyProtector: Use JSON to store position objects

https://gerrit.wikimedia.org/r/828058

Going by the discussion here and affected version (1.39.0-wmf.26), it seems this is not a train blocker for 1.39.0-wmf.27 despite its UBN! status. Removing as a blocker for now.

Code search for __serialize shows the following implementations in deployed code:

  • ApiMessageTrait
  • Message
  • GenericArrayObject
  • HashRing
  • MapCacheLRU
  • MySQLPrimaryPos
  • Site
  • Various classes in ramsey/uuid
  • RunningStat\PSquare
  • Wikibase\Lib\Changes\{EntityDiffChangedAspects, RepoRevisionIdentifier}
  • Wikibase\DataModel\Snak\{ReferenceList, SnakList}
  • Wikibase\MediaInfo\DataModel\MediaInfoId

If I understand the bug correctly, if any of these classes are serialized with PHP 7.4, they will fail to unserialize on PHP 7.2, so it's not enough to just patch MySQLPrimaryPos.

My PHP patch is tested now. With php72-serialize.php having the code from T316601#8196601:

$ sapi/cli/php /srv/mw/core/junk/php72-serialize.php 
C:4:"Test":27:{a:1:{s:3:"foo";s:4:"Test";}}
Test
Test

This shows my patched serialize() producing PHP 7.2 compatible output but correctly interpreting the new format. That's with the updated patch 7bc2f4d194e2136 at the head of my bc-serialize-7.4 branch. It compiles with -Werror and should pass make test.

We could also just get rid of all the __serialize methods. It's a noop on PHP 7.2 and won't make serialization behavior worse than it already is on 7.4. They could be reintroduced once the cluster is fully upgraded.

If the 1.39 release happens in between, that would make the upgrade path for third parties a bit less painful, too (although I imagine most third parties can afford to just reset all caches).

Code search for __serialize shows the following implementations in deployed code:

  • ApiMessageTrait
  • Message
  • GenericArrayObject
  • HashRing
  • MapCacheLRU
  • MySQLPrimaryPos
  • Site
  • Various classes in ramsey/uuid
  • RunningStat\PSquare
  • Wikibase\Lib\Changes\{EntityDiffChangedAspects, RepoRevisionIdentifier}
  • Wikibase\DataModel\Snak\{ReferenceList, SnakList}
  • Wikibase\MediaInfo\DataModel\MediaInfoId

If I understand the bug correctly, if any of these classes are serialized with PHP 7.4, they will fail to unserialize on PHP 7.2, so it's not enough to just patch MySQLPrimaryPos.

My PHP patch is tested now. With php72-serialize.php having the code from T316601#8196601:

$ sapi/cli/php /srv/mw/core/junk/php72-serialize.php 
C:4:"Test":27:{a:1:{s:3:"foo";s:4:"Test";}}
Test
Test

This shows my patched serialize() producing PHP 7.2 compatible output but correctly interpreting the new format. That's with the updated patch 7bc2f4d194e2136 at the head of my bc-serialize-7.4 branch. It compiles with -Werror and should pass make test.

The patch is small enough, and simple enough, that I feel confident applying it across the fleet. We'll rebuild the packages and deploy them ASAP.

While I appreciate the efforts to get rid of __serialize, I don't think that's the safe approach as far as the transition is concerned.

Please just don't use PHP serialization.

That may well be a good long-term solution.

Maybe it's time to start emitting deprecation warnings when we see non-trivial objects in a BagOStuff... Traversing big structures takes time, but typically the structures aren't that big. Also, we could disable it in prod, and only turn it on in dev settings.

I think I can patch PHP 7.4 so that its serialize() is backwards compatible. Just like so, effectively a partial revert of nikic's implementation patch. A patched PHP binary would be able to read the new format, but would not write it. So we would migrate fully to the patched PHP 7.4, then when we are sure there will be no need to revert back to PHP 7.2, we would revert the patch.

I admire the beauty and easiness of this solution. That is quite nice, thank you Tim!

Whenever the patched packages are available I will rebuild the CI images to use them.

More or less related, the issue remembers me of issues we had in the past with serialization/unserialization of ParserOutput (T264397) which I think was due to the class changing between MediaWiki versions. Would it be possible to add some tests in mediawiki/core to check our patched php works as appropriate, and possibly only running them if phpversion() has the back compatibility suffix?

We could also just get rid of all the __serialize methods. It's a noop on PHP 7.2 and won't make serialization behavior worse than it already is on 7.4. They could be reintroduced once the cluster is fully upgraded.

That sounds good to me – I believe in Wikibase we decided to defer the implementation of the new serialization methods until production was fully on PHP 7.4, to reduce the amount of changes happening at once (so I’m surprised to see a few Wikibase classes in T316601#8201209).

Removing __serialize will cause deprecation warnings in php 8.1 setups. I don't think I like the idea of going this step backward in php 8.1 compatibility.

And this also seems to be the reason why it was introduced in some Wikibase code: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseMediaInfo/+/769402

Removing __serialize will cause deprecation warnings in php 8.1 setups. I don't think I like the idea of going this step backward in php 8.1 compatibility.

MediaWiki doesn’t even support PHP 8.0 yet. I firmly believe we should fix this after we’ve fully moved away from PHP 7.2.

And this also seems to be the reason why it was introduced in some Wikibase code: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseMediaInfo/+/769402

I found my earlier comment – for Wikibase itself, I argued against adding __serialize() back in March, at least for any methods that affect hashes which are supposed to be stable. So I think that WikibaseMediaInfo change is actually fairly catastrophic, and we might need to revert it… I’ll try it out.

So I think that WikibaseMediaInfo change is actually fairly catastrophic, and we might need to revert it… I’ll try it out.

It’s not, because there is no MediaInfo data type, so MediaInfo IDs never show up in snaks. Phew.

(Sorry for getting a bit off-topic here.)

Mentioned in SAL (#wikimedia-operations) [2022-08-31T10:42:18Z] <_joe_> updating php 7.4 on mwdebug1002 to test the new patched packages T316601

Joe changed the task status from Open to In Progress.Aug 31 2022, 10:47 AM
Joe claimed this task.

Just to clarify, this task is about a production issue.

We're going to use @tstarling's patch to make php 7.4 write serialized content in a format php 7.2 can read for the duration of the transition.

I'd ask to please move any further discussion on serialization in mediawiki to a separate, independent task.

Mentioned in SAL (#wikimedia-operations) [2022-08-31T11:00:43Z] <_joe_> updating php 7.4 packages in wikimedia/bustrer T316601

I have rebuild the CI images, docker-registry.wikimedia.org/releng/quibble-buster-php74 has the latest package 1:7.4.30-3+0~20220627.69+debian10~1.gbpf2b381+wmf1+buster3

PHP 7.4.30 (cli) (built: Aug 31 2022 09:45:57) ( NTS )

@hashar the latest package is +buster3.1

I have brought back 1% of the user traffic to php 7.4 now, let's see if errors spike up again, otherwise I'll resolve the task.

@hashar the latest package is +buster3.1

I am not sure I how I got my comment wrong earlier. Rechecking this morning and the CI image indeed has buster3.1:

$ docker run --rm -it --entrypoint=bash docker-registry.discovery.wmnet/releng/quibble-buster-php74:1.4.6
nobody@60752d04aa72:/workspace$ apt-cache policy php7.4-cli
php7.4-cli:
  Installed: 1:7.4.30-3+0~20220627.69+debian10~1.gbpf2b381+wmf1+buster3.1
  Candidate: 1:7.4.30-3+0~20220627.69+debian10~1.gbpf2b381+wmf1+buster3.1
  Version table:

We're back to 1% of traffic and no issue is showing up, I'll resolve the task and move to 5% now.

For information: after I have deployed our patched php7.4 to CI, a Wikibase test started failing. The test asserts against serialize() output based on version_compare( phpversion(), '7.4', '>=' ) ) but since our 7.4 yields the previous serialization format the assertion fails. We talked about it and will mark the test skipped until we switch to the un patched 7.4. T243590#8207874

Change 828058 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Use JSON to store ChronologyProtector position data

https://gerrit.wikimedia.org/r/828058

Change 831563 had a related patch set uploaded (by Ladsgroup; author: Daniel Kinzler):

[mediawiki/core@wmf/1.39.0-wmf.28] rdbms: Use JSON to store ChronologyProtector position data

https://gerrit.wikimedia.org/r/831563

Change 831563 abandoned by Ladsgroup:

[mediawiki/core@wmf/1.39.0-wmf.28] rdbms: Use JSON to store ChronologyProtector position data

Reason:

https://gerrit.wikimedia.org/r/831563

I think I can patch PHP 7.4 so that its serialize() is backwards compatible. Just like so, effectively a partial revert of nikic's implementation patch. A patched PHP binary would be able to read the new format, but would not write it. So we would migrate fully to the patched PHP 7.4, then when we are sure there will be no need to revert back to PHP 7.2, we would revert the patch.

I created a task to track reverting the patch (which we should be able to do soon, I believe): T318918