Abusefilter throttle did not work at dewiki
Closed, Resolved · Public

Assigned To: Daimona
Authored By: Luke081515, Dec 17 2019, 2:12 PM

Description

(Asked before at wikimedia-tech; was redirected to Phabricator now.)

At dewiki we have this filter (https://de.wikipedia.org/wiki/Spezial:Missbrauchsfilter/290?uselang=en) to enforce arbcom actions. It matches every edit a user makes on a page and throttles them if they make too many edits in a day.
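For illustration, here is a minimal sketch (in PHP, not the actual AbuseFilter code) of the rule the filter is meant to enforce: at most 5 edits per user per page within a rolling 24-hour window. The function name and the idea of scanning edit timestamps are purely illustrative; the extension actually keeps a counter in the object cache, which is what the rest of this task is about.

```
<?php
// Minimal sketch of the rule filter 290 enforces (illustrative only, not the
// AbuseFilter implementation): at most $limit edits to one page per $period.

function exceedsThrottle( array $editTimestamps, int $now, int $limit = 5, int $period = 86400 ): bool {
    $recent = array_filter(
        $editTimestamps,
        static fn ( int $ts ) => $ts > $now - $period
    );
    // The new edit would be number count($recent) + 1 within the window.
    return count( $recent ) >= $limit;
}

$now = time();
// Five earlier edits within the last 24 hours:
$edits = [ $now - 3600, $now - 7200, $now - 10800, $now - 14400, $now - 18000 ];
var_dump( exceedsThrottle( $edits, $now ) ); // bool(true): the 6th edit would be throttled
```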

Normally this works fine; however, it did not today:

When I view the test results for the filter, I get this:

grafik.png (screenshot of the filter's test results, 748 KB)

As you can see, it matched every edit by the user "Brainswiffer"; he exceeded the limit of 5 edits per 24 hours but was not throttled. Why?

Event Timeline

Just writing this down for reference, as it was annoying to figure out: I think the memcached key in question is dewiki:abusefilter:throttle:290:user,page:ce12284db0992734fd982f4a3ddd1f5f2b5c5408 (aka dewiki:abusefilter:throttle:290:user,page:sha1( "404212:Wikipedia Diskussion:Schiedsgericht/Anfragen/Diskussionskomplexim Umfeld \"Anetta Kahane\"" ) )
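For reference, the key can be reassembled like this (a sketch only; the identifier string is copied verbatim from the comment above, and the leading number is presumably a user or page ID):

```
<?php
// Reassembling the cache key quoted above. The identifier string is copied
// verbatim from the comment; nothing about its layout is assumed beyond that.
$identifier = "404212:Wikipedia Diskussion:Schiedsgericht/Anfragen/Diskussionskomplexim Umfeld \"Anetta Kahane\"";
echo 'dewiki:abusefilter:throttle:290:user,page:' . sha1( $identifier ) . "\n";
// Expected, per the comment above:
// dewiki:abusefilter:throttle:290:user,page:ce12284db0992734fd982f4a3ddd1f5f2b5c5408
```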

To summarize what's been done:

  • The obvious suspect is a rogue cache key.
  • Specifically, a key that may have been evicted before its expiry.
  • Another idea was that a key expired before reaching 5, and the new key also didn't reach five, although they totaled more than 5 edits on Dec 17. However, this doesn't seem to be the case: an edit was prevented at 6:58 (hence the key was recreated at 8:00), but there have been more than 5 edits in the following 24 hours.

To understand exactly what happened, we'd need to see the debug logs (channel:AbuseFilter) related to that key (as was done at T210709#4811680). Grepping for the key posted by @Bawolff above would be the best choice; however, since we don't know whether that key is correct, it could also suffice to grep for dewiki:abusefilter:throttle:290:user,page:.

Yesterday, @Reedy and @Bawolff tried to do that, but I had to leave suddenly and I don't know whether there's been any progress.

As a side note, it's vital to investigate soon, because if the logs get garbage-collected, we'll have no way to understand what went wrong.

Logs (of the mwlog1001 variety) last 90 days. That said, even if we do confirm the key went missing, I'm not sure what anyone would be able to do about it.

And AFAIK (to state the obvious a bit), cached items aren't guaranteed to still exist up to $expiry, are they? They can be evicted for $reasons (as dictated by the config of whatever cache backend we're using) and shouldn't be assumed to still be there...

Thanks, I thought the logs had a shorter lifespan.

Yes, sure. In fact, my main goal here is to make sure that this was just due to memcached evicting the key too soon. Of course, we could make this system more robust, but this task can be closed as soon as we know for sure that nothing is broken in the AF code.
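To make the suspected failure mode concrete, here is a rough sketch (again not the actual AbuseFilter code; the Memcached-style client and function name are only assumptions) of how a counter that lives solely in cache silently restarts when the backend drops the key early:

```
<?php
// Rough sketch of the suspected failure mode (not the actual AbuseFilter code).
// Assumes a Memcached-style client with get()/add()/increment(); the cache
// cannot tell "expired" apart from "evicted", so either way the counter
// silently restarts at 1 and the limit may never be reached.

function recordThrottleMatch( Memcached $cache, string $key, int $limit, int $period ): bool {
    $count = $cache->get( $key );
    if ( $count === false ) {
        // Key missing: either the 24 h window really ended, or the backend
        // evicted the key early. A fresh window starts in both cases.
        $cache->add( $key, 1, $period );
        return false;
    }
    $cache->increment( $key );
    // The filter's other actions only fire once the limit is exceeded.
    return $count + 1 > $limit;
}
```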

Hello, it's "me myself". The filter sometimes "blocks" me, but sometimes not. Yesterday (17 Dec) at about 11:25 CET it showed a block message in the browser, but this is not in the filter's log. After that I repeated saving, and everything was saved without a message. Today it seems that I can work without a limit, but I will not test it again; I will rely on the filter! There are no special circumstances here, apart from being on a very fast line :-)
Does the filter have no data when it decides?

It seems not to be limited to me; the user Kurator71 also made too many edits today.
I often work with 3 devices (iPhone, iPad, PC) and switch between them several times a day; might this also be relevant?

No, this shouldn't make any difference.

Now, I believe this is happening too often. Of course, it's still possible that memcached is evicting keys too quickly, but I wonder whether we could at least put a stopgap in place.

For the short term, we only need to check the debug logs buried on mwlog to see if it's really memcached's fault. For the long term, it depends... T52894 is for making throttle work without caching enabled, but I wonder whether there would be an even better alternative.

Happened again today: 5 edits were allowed, 7 were made, and none of them was blocked:

grafik.png (screenshot, 844 KB)

Thanks to @Urbanecm, we dug through the logs. First of all, Brainswiffer's key is dewiki:abusefilter:throttle:290:user,page:4cab7fa217da080aea32c3fb8145f66d0af59989. We've checked all the logs for 2020, and here is the cleaned result:

2020-01-03 05:57:19 Got value 0 for throttle key
2020-01-03 05:57:19 Added throttle key with value 1

2020-01-03 07:51:52 Got value 1 for throttle key
2020-01-03 07:51:52 Incremented throttle key

2020-01-03 12:00:42 Got value 2 for throttle key
2020-01-03 12:00:42 Incremented throttle key

2020-01-03 13:54:00 Got value 0 for throttle key
2020-01-03 13:54:00 Added throttle key with value 1

2020-01-03 14:04:52 Got value 1 for throttle key
2020-01-03 14:04:52 Incremented throttle key

2020-01-03 14:23:16 Got value 2 for throttle key
2020-01-03 14:23:16 Incremented throttle key

2020-01-03 14:43:27 Got value 3 for throttle key
2020-01-03 14:43:27 Incremented throttle key

2020-01-04 07:24:08 Got value 0 for throttle key
2020-01-04 07:24:08 Added throttle key with value 1

2020-01-04 16:09:09 Got value 0 for throttle key
2020-01-04 16:09:09 Added throttle key with value 1

2020-01-05 07:20:48 Got value 0 for throttle key
2020-01-05 07:20:48 Added throttle key with value 1
(derived from P10041).

This tells us that yes, the key is being evicted for some reason. In fact, if you look at the top of the paste, a new key was created at 5:57:19 and incremented twice, but at 13:54:00 the key was missing. That's 8 hours later, hence the key was deleted in the meantime, before its expiry. The only possible explanation for this is memcached auto-evicting the key, as nothing in the AF code tries to delete those keys.

If this is confirmed, then I don't think we can do much, aside from T52894.

My final words: this was caused by the eviction of the key. Note that the backend used for throttle keys is Redis, not memcached. I believe there's nothing we can do here. The best outcome would be to find a way to make throttle work without caching (but how?), or to open a follow-up task to investigate whether it's possible to increase the persistence and hence reduce the chance of eviction.

Daimona claimed this task.

Filters with "throttle" enabled will only generate an AbuseLog entry once the throttle limit is reached. For filter 1158 that you linked, it means that edits 1-24 will not generate an AbuseLog entry. This has always been the case, and I'm not sure whether it's intentional (e.g. because until the limit is reached the filter wasn't really "triggered", or because it could spam the logs) or not. The relevant task is T196995.

As for this task, it was almost certainly caused by a caching issue unrelated to AbuseFilter, which is not deterministic and non-trivial to fix, so I'm just going to close it as resolved.

[offtopic] Do note that currently, the filter is set to throttle by username plus IP address. That means users using highly dynamic ranges might never make it to the throttle.

Right, that's also a valid point (I just understood that the question wasn't really about the number of entries). I'm not sure if I can query the DB to see which IPs that editor used, which in turn is needed for checking the throttle key.

I did that already, and at least one of the WPWP users does use a lot of IPs (dozens of them).
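To illustrate the point about dynamic ranges: if the throttle groups include the IP, every new IP yields a different key and therefore a fresh counter. The key format below mirrors the one discussed earlier in this task, but the wiki, user name, IPs and exact identifier layout are made up for the example:

```
<?php
// Why throttling by "user,ip" never accumulates for editors on dynamic ranges:
// every new IP produces a different cache key, hence a fresh counter.
// Key format mirrors the one quoted earlier in this task; the wiki, user name,
// IPs and identifier layout are made up for the example.

function throttleKey( string $wiki, int $filterId, string $groups, string $identifier ): string {
    return "$wiki:abusefilter:throttle:$filterId:$groups:" . sha1( $identifier );
}

$keyA = throttleKey( 'examplewiki', 1158, 'user,ip', 'ExampleUser:198.51.100.7' );
$keyB = throttleKey( 'examplewiki', 1158, 'user,ip', 'ExampleUser:198.51.100.8' );
var_dump( $keyA === $keyB ); // bool(false): each IP gets its own counter
```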

Also, now that we're talking about the issue here anyway, I'm linking https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AbuseFilter/+/715339 here as a patch to make user create separate throttle counts for individual IPs, which could be helpful for cases like this one. Of course, that behavior might be intentional -- feel free to abandon if that's so.