[go: up one dir, main page]

Page MenuHomePhabricator

Decide how to use use StatsFactory inside Parsoid
Open, HighPublic

Description

While working on https://phabricator.wikimedia.org/T354908 we identified that because of the way MW depends on parsoid as an extension there is no easy way to use the Stats lib without having a circular dependency.

  • Parsoid imports Wikimedia\Stats from core
  • Core imports Parsoid

Since it looks like a generic enough library that could be used by other extensions too it might be useful to have it packaged as an external library.

Event Timeline

StatsFactory in particular is the thing we want to use, and it ends up depending on all the metric classes implicitly through its method signatures.

Krinkle renamed this task from Extract mediawiki-core:libs/Stats to a standalone composer library to Use StatsFactory inside Parsoid.Jul 17 2024, 1:36 PM
Krinkle claimed this task.
Krinkle renamed this task from Use StatsFactory inside Parsoid to Decide how to use use StatsFactory inside Parsoid.Jul 17 2024, 1:38 PM
Krinkle triaged this task as High priority.Jul 17 2024, 1:41 PM

Since it looks like a generic enough library that could be used by other extensions too it might be useful to have it packaged as an external library.

Extensions can use core libraries already.

Extract mediawiki-core:libs/Stats to a standalone composer library.

I don't think we're ready to do this just yet. Doing so would increase on-going costs and update work.

Stats can in practice be reduced to one or two void methods that take scalar values only, e.g. increment(metric, labels) and timing(metric, labels). I suggest injecting these as optional callables in Parsoid. Would that work?

That would work as a short-term stop gap, but it will limit the sorts of statistics we can gather as part of KR 5.3.1. So it would really be nice if the library could be split off before our KR 5.3.1 work gets seriously underway. We don't want to be in a position of having to duplicate the entire interface of libs/stats in an ad-hoc way in Parsoid's SiteConfig.

This is basically trading off "on-going costs and update work" between the platform team and the content transform team, and my conversation with Birgit indicated that platform has a bit of spare capacity right now for this (while CTT is severely short-handed at present).

(Parsoid is a library, not an extension; and libraries can't use core. Core isn't even packaged with composer.)

Can you be more specific about what you'd like to be able to capture?

Well, we haven't done the KR 5.3.1 work yet, so I don't know for sure. But currently we are exposing just the two methods that you mention above (counter, timing) as methods of SiteConfig. If we need to use additional capabilities of statslib (totals, sampling, etc) we'll have to keep adding them adhoc to SiteConfig, creating additional delay and technical debt. Everything in libs is "supposed" to be a standalone library; I'm just asking whether that technical debt can be retired sooner rather than later, to avoid incurring additional technical debt on the Parsoid side.

My understanding is that 99% of MediaWiki core components and extensions exclusively use increment(metric, labels) and timing(metric, labels). I can't think off-hand of code where we need something beyond that. Without a concrete example I don't think we should proactively invest in increasing scope and expanding long-term costs when we're likely to not need or make use of it.

If a third method were needed, I think the one-off cost of adding a signature in a class is much preferred organizationally compared to a continuous cost of maintaining, publishing, and supporting a separate library, and the various increase costs and decreased agility that that would introduce. This is why we haven't moved Rdbms, BagOStuff, and the other libs either. This is imho seen as an intentional design and worthwhile trade-off, not as debt.

Can I help audit metrics to see what methods you need? That way, there doesn't have to be a delay later on.

Well, the WE5.3.1 is in active progress, so the set of statistics I want to gather is going to change. But so far I have a patch (see https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1059404/1/includes/parser/Parsoid/ParsoidParser.php) that does the statistics in core right now as a workaround but uses the ::getCounter() method to return a CounterMetric subclass, and then calls setLabel multiple times on that object, then finally calls CounterMetric::incrementBy.

Yes, we /could/ keep adding methods to Parsoid's SiteConfig interface. In this case to do this from Parsoid we'd need to extend https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1053267/8/includes/parser/Parsoid/Config/SiteConfig.php with:

public function incrementCounter( string $name, array $labels, float $amount = 1 ) {
		$component = $this->getStatsPrefix( true );
		$metric = $this->statsFactory->withComponent( $component )->getCounter( $name );
		foreach ( $labels as $labelKey => $labelValue ) {
			$metric->setLabel( $labelKey, $labelValue );
		}
		$metric->incrementBy( $amount );
	}

But it takes two weeks every time we do this because the interfaces need to be added to core first and deployed, and only after that's done can the interface be added to the Parsoid side and deployed.

And then of course, when/if you *do* moved the stats to a library, we need to go back and rewrite all that code again.

I was hoping we could avoid all that by just quickly moving this to a composer package.

Change #1059421 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Parsoid SiteConfig: allow stat counters incremented by arbitrary values

https://gerrit.wikimedia.org/r/1059421

Change #1059421 merged by jenkins-bot:

[mediawiki/core@master] Parsoid SiteConfig: allow stat counters incremented by arbitrary values

https://gerrit.wikimedia.org/r/1059421