Temporary accounts: Automatically resolve temporary account names to IP addresses on displaying (auto reveal feature)
Open, Stalled, Needs Triage, Public

Description

Motivation

In certain situations, temporary accounts can change far more often than communities are used to seeing with IP editors (in the worst case, on every edit). During such high-speed account-hopping vandalism, patrollers would find it useful to temporarily enable an "auto-reveal mode" in which the MediaWiki interface automatically resolves temporary accounts into IP addresses wherever they are encountered. This lets them quickly identify IPs that belong to the same range/area and determine the best way to block bad actors.
Having T358852: [Epic] Display temporary account contributions on Special:Contributions for IP addresses and IP ranges implemented together would also make it possible to query for (IP-wise) similar temporary accounts.

Access

Since this is more sensitive than one-by-one reveal, we need to limit this permission to sysops and above.

Product Spec
  • Implement a permission that allows users listed under the Access section above to enable an auto-reveal mode in which all temporary account IPs will be automatically resolved
  • The permission will be temporary. A dropdown will let the user select how long this mode will be active for when they turn it on.
  • UI elements will indicate to the user when they are in this mode. When the permission is about to expire, inform the user and let them extend it if needed.
  • Logging:
    • Log when a user enables the mode.
    • Log each time an IP is revealed (consistent with how we log case-by-case reveals T325658)
    • Log when a user disables the mode or the permission expires
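
The spec above can be sketched as a small state machine. This is only an illustration under assumptions, not the extension's actual design: the class name `AutoRevealMode`, the five-minute warning window, and the log tuple layout are all hypothetical, and expiry is recorded lazily the first time it is noticed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class AutoRevealMode:
    """Hypothetical model of a time-limited auto-reveal mode for one user."""
    user: str
    expires_at: Optional[datetime] = None
    # Each log row is (timestamp, action, target); the layout is an assumption.
    log: List[Tuple[datetime, str, str]] = field(default_factory=list)

    def enable(self, now: datetime, duration: timedelta) -> None:
        # Also used to extend a mode that is about to expire.
        self.expires_at = now + duration
        self.log.append((now, "enable", self.user))

    def disable(self, now: datetime) -> None:
        if self.expires_at is not None:
            self.expires_at = None
            self.log.append((now, "disable", self.user))

    def is_active(self, now: datetime) -> bool:
        # Record the expiry the first time it is observed.
        if self.expires_at is not None and now >= self.expires_at:
            self.log.append((self.expires_at, "expire", self.user))
            self.expires_at = None
        return self.expires_at is not None

    def expiring_soon(self, now: datetime,
                      warn_window: timedelta = timedelta(minutes=5)) -> bool:
        # Signal for the UI to offer an extension.
        return self.is_active(now) and self.expires_at - now <= warn_window

    def reveal(self, now: datetime, temp_account: str) -> bool:
        # Reveal (and log) only while the mode is active.
        if not self.is_active(now):
            return False
        self.log.append((now, "reveal", temp_account))
        return True
```

In this sketch, the duration chosen in the dropdown would be passed as `duration`, and the UI would poll `expiring_soon` to decide when to show the extension prompt.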
Design

TBD

Event Timeline

Adding @Niharika, who was looking into something similar in T346809: Bulk Reveal IP addresses. (I don't think that idea was exactly the same, since it refers to revealing IP addresses "in one step", rather than in zero steps which this task is suggesting.)

I'm wondering how logging will work.

What we currently log

Currently, we're logging when a user actually views IP addresses, and which temp user they view them for:

image.png (195×924 px, 85 KB)

This would be difficult to maintain if we do this task. If we logged which temp users' IP addresses were shown to users who see them automatically, it (1) would create a lot of logs, including about users that weren't actually being investigated but merely appeared on a page; and (2) could violate the privacy of the stewards, whose page visits unrelated to patroller work could possibly be reconstructed from the log.

What the policy says

However, the Wikimedia Access to Temporary Account IP Addresses Policy states:

To ensure accountability, a log is kept of which users have access to temporary account IP addresses.

...which doesn't seem to imply that each instance needs to be logged: just when access was enabled/disabled for those users who need to accept a preference (checkuser-temporary-account) and when users become members, or stop being members, of groups that have IP reveal enabled automatically (checkuser-temporary-account-no-preference).

An idea that originates from a recent-ish meeting with @Tchanders is to introduce a temporary "IP addresses visible" mode, which would resolve all temporary accounts to an IP and could be enabled from time to time (for a specific reason). Conceptually, this would be similar to Phabricator's high-security mode, which removes the need to enter an MFA token every time a sensitive action is taken. If this mode exists, it could help with the logging. It would be more similar to T346809 (except it would still be multipage, just not unlimited over time).

Thanks @Urbanecm. We're discussing product work for this with @Niharika and @Madalina.

kostajh renamed this task from Temporary accounts: Automatically resolve temporary account names to IP addresses on displaying to Temporary accounts: Automatically resolve temporary account names to IP addresses on displaying (auto reveal feature). Aug 28 2024, 2:22 PM

From a DBA perspective, are there concerns about the volume of additions to the logging table for this feature?

The feature will add an entry to the logging table whenever a temporary account is seen in various user interfaces (recent changes; watchlist; history pages; etc) when a privileged user has a preference enabled. So e.g. if a user with the ability to auto-reveal IP addresses for temporary accounts is on RecentChanges set to show the last 500 changes, and all 500 changes are made by different temporary accounts, then we would log 500 rows to the logging table indicating that the privileged user revealed those IPs.

Some additional notes:

  • We'd debounce logging the "reveal" if the temporary account's IP was already revealed in the last 24 hours.
  • The log entry is unique per privileged user and temporary account, so e.g. if users A, B and C have the right to auto-reveal temp accounts, and each of those users visits a page with "temporary account ~2024-1", then there will be three log entries.
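
The two notes above could be modeled roughly as follows. This is a minimal sketch, not the real logging code: `RevealLogger`, `record_reveal`, and the in-memory dictionary are hypothetical stand-ins for the logging table, and measuring the 24-hour window from the last logged row for the same (privileged user, temporary account) pair is an assumption.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Debounce window from the notes above.
DEBOUNCE = timedelta(hours=24)


class RevealLogger:
    """Hypothetical in-memory stand-in for reveal rows in the logging table."""

    def __init__(self) -> None:
        # Last logged time per (privileged user, temporary account) pair.
        self._last_logged: Dict[Tuple[str, str], datetime] = {}
        self.rows: List[Tuple[datetime, str, str]] = []

    def record_reveal(self, viewer: str, temp_account: str,
                      now: datetime) -> bool:
        """Add a row unless this pair was logged within DEBOUNCE; return True if added."""
        key = (viewer, temp_account)
        last = self._last_logged.get(key)
        if last is not None and now - last < DEBOUNCE:
            return False  # already logged recently, skip the write
        self._last_logged[key] = now
        self.rows.append((now, viewer, temp_account))
        return True
```

Under these assumptions, a RecentChanges view listing 500 distinct temporary accounts writes 500 rows on user A's first visit, none when A reloads an hour later, and another 500 when user B views the same page.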

So, the theoretical maximum would be (# of temporary accounts that haven't been logged in the last 24 hours) × (# of users with the ability to auto-reveal accounts), which will be a large number on bigger wikis. In practice, my guess is that we might see something in the neighborhood of thousands to tens of thousands of rows added each day, but it's hard to know without seeing what the feature usage looks like. (We could also feature flag this so that if the volume of log entries is overwhelming, we can quickly shut it off.)
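
Written out, the bound is simply a product. The symbols are labels introduced here for illustration: $T$ is the number of temporary accounts visible in the interface that have not yet been logged for a given user within the 24-hour window, $U$ is the number of users able to auto-reveal, and $R_{\text{day}}$ is the number of rows added per day.

```latex
R_{\text{day}} \le T \times U
```

The examples already given fit this form: one user viewing 500 distinct temporary accounts gives $500 \times 1 = 500$ rows, and three users viewing one temporary account gives $1 \times 3 = 3$ rows.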

Does the above concern DBAs from the point of view of adding too many rows to the logging table?

In terms of impact on the databases, I think having a realistic estimate would be useful. I think we can deploy the feature and then decide what to do if our measurements turn out to be too big. For example, one possible solution would be to drop any logs after X years or something like that.

OTOH, from an auditing point of view, logging a large number of IP reveals defeats the point of logging in the first place. But this is more of a product issue.

In terms of impact on the databases, I think having a realistic estimate would be useful. I think we can deploy the feature and then decide what to do if our measurements turn out to be too big. For example, one possible solution would be to drop any logs after X years or something like that.

I'm not sure it's possible to have a realistic estimate until we see how users make use of this feature. We will definitely know more as part of the pilot wiki deployment stages.

OTOH, from an auditing point of view, logging a large number of IP reveals defeats the point of logging in the first place. But this is more of a product issue.

Yeah, this is a tough one. I suspect that for auditing, you'd probably want to query an aggregate of all the individual reveals. It's also difficult to foresee what misuse of this feature might look like. But a single log entry per reveal seems safe in terms of preserving the ability to audit whatever pattern of abuse/misuse we might identify later on.