What:
CentralAuth is the system in charge of creating MediaWiki accounts on the (public) Wikimedia Foundation wikis. Users can create an account, log in, and have that same account across the different language editions of Wikipedia, as well as on sister projects such as Wikidata, Commons, Meta-Wiki, Wiktionary, and mediawiki.org.
Its internal auto-login mechanism is essential to cross-wiki tooling, such as:
- viewing an article in a different language (technically a different wiki) while remaining logged in there, and thus being able to receive notifications and make contributions.
- making contributions to Wikidata without leaving the Wikipedia article.
- uploading files to Commons from within VisualEditor and MobileFrontend.
- following links to feedback forms on meta.wikimedia.org or mediawiki.org while remaining logged in.
- board votes, steward elections, etc. via vote.wikimedia.org (SecurePoll).
- etc.
Motivation:
This system has effectively been without a manager or tech lead for about ten years, and issues are piling up. This comes with many kinds of costs and risks:
- Regularly at high risk of breakage. Its code is among the oldest and least maintained. Because it is so different from our current principles, it is at constant risk of breakage. E.g. if you change something in core, or in an extension far removed from it, there's a good chance that even when no other extension could reasonably be affected, CentralAuth will be. This leads to train blockers, which tend to take days to be noticed by anyone, given that no team (officially) looks after CentralAuth.
- Death by a thousand cuts affecting diversity, accessibility and performance.
- The world around us is in constant flux. Without regular upkeep, something that was once considered accessible and localised won't stay that way for long. Devices change. Expectations change. Even without that external churn, changes inevitably get made to CentralAuth to unbreak other things, and those changes are made without review or consultation from design, accessibility, or language resources. For example, T237765 has been open and re-reported by users many times over. This means we are excluding contributors before they even begin.
- Blocks progress elsewhere. Best practices are not followed. Plans for holistic improvements and upkeep go nowhere. Small regressions add up and are not acted on. Given that CentralAuth is involved in virtually all page views, edits and other backend requests, its suboptimal performance affects backend requests for all other features as well. It also makes it difficult or impossible to tighten our performance budgets, because CentralAuth violates them by default. Examples:
- T231961: DBPerformance warning: "Expectation masterConns <= 0 not met" from CentralAuth special pages
- T150506: Avoid lazyImportLocalNames() master writes on GET requests (Run a script to backfill them once for all)
- T106276: Certain Memcached keys are retrieved multiple times for a single page view
- T68828: CentralAuth: Audit autologin procedure for performance and code quality
- T130935: CentralAutoLogin delays fully load time
- Negligence. For several months, users have noticed, mentioned and reported warnings in the Chrome browser console about login.wikimedia.org cookies. Nobody acted on this because it isn't anyone's job to. Now we have T252236 and little time left to figure out what to do.
Are there known security issues: Yes. I think the very definition of a Wikimedia-related security issue is almost synonymous with CentralAuth. It's what decides who's who and what they can do. It's in charge of founder rights (e.g. Jimbo), stewards, WMF staff, etc.
The full list of ongoing and recent security issues is too long to include here, but here are a few:
- Risk of breach: T112359, T197150, …
- Risk of leaking private data or shocking data: T112320, T226212, T201568, …
- Vulnerabilities that could facilitate attacks against WMF elsewhere: T244682, …
- Medium/Low?: T234371, T237274, …
Production outages or incidents: Yes. The most recent one I could find is T226840, in which some logged-in users could not view most articles because an HTTP header bug broke their login session.
Does it have sufficient hardware resources for now and the near future (to take into account expected usage growth)?
TBD
Is it a frequent cause of monitoring alerts that need action, and are they addressed timely and appropriately?
Yes, it is regularly involved in production errors that raise alert levels. To my knowledge, these are not currently reported, investigated or addressed (aside from by CPT/RelEng during train deployments).
When it was first deployed to Wikimedia production
Usage statistics based on audience(s) served
TBD
Changes committed in last 1, 3, 6, and 12 months
TBD
Number of developers who committed code in the last 1, 3, 6, and 12 months
TBD
Number and age of open patches
TBD
Number and age of open bugs
TBD
Number of known dependencies?
TBD
Is there a replacement/alternative for the feature? Is there a plan for a replacement? No.
Submitter's recommendation (what do you propose be done?)
For an existing team to take ownership.