spamassassin broken for VRTS
Open, High · Public · BUG REPORT

Assigned To

None

Authored By

	Krd
	Wed, Nov 20, 5:53 PM

Description

VRTS appears to be affected by this: https://lwn.net/Articles/987566/

It currently heavily annoys all VRTS users and takes VRTS admins hours (sic) of manual sorting each day, so please unbreak this as soon as at all possible.

SpamAssassin bug: https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8278

Details

	Subject	Repo	Branch	Lines +/-
	vrts: Disable more VALIDITY RBL checks from Spamassassin	operations/puppet	production	+14 -12
	vrts: Block bondedsender RBL check from spamassassin on vrts	operations/puppet	production	+20 -9

Event Timeline

Krd created this task. Wed, Nov 20, 5:53 PM

Restricted Application added a subscriber: Aklapper. Wed, Nov 20, 5:53 PM

Reedy added a project: Mail. Wed, Nov 20, 7:54 PM

Restricted Application added a project: Infrastructure-Foundations. Wed, Nov 20, 7:54 PM

Adding a couple of people that are better situated than yours truly to look into this, as well as a bit more information.

Looking into the vrts1003 logs, I see the following:

Nov 20 20:29:06 vrts1003 spamd[611534]: spamd: clean message (-2.7/3.5) for nonexistent:65534 in 0.4 seconds, 82866 bytes.
Nov 20 20:29:05 vrts1003 spamd[611534]: spamd: checking message for nonexistent:65534
Nov 20 20:29:05 vrts1003 spamd[611534]: spamd: still running as root: user not specified with -u, not found, or set to root, falling back to nobody
Nov 20 20:29:05 vrts1003 spamd[611534]: spamd: handle_user (getpwnam) unable to find user: 'nonexistent'

This is probably not causing an issue, but it should be fixed.

However, I also see this pattern:

akosiaris@vrts1003:/etc/spamassassin$ journalctl --since=-1w -u spamd.service | awk '/VALIDITY/ {print $8}' | sort | uniq -c | sort -rn
   4778 .
   1295 Y
      1 after
akosiaris@vrts1003:/etc/spamassassin$ journalctl --since=-2w --until=-1w -u spamd.service | awk '/VALIDITY/ {print $8}' | sort | uniq -c | sort -rn
   1720 .
    602 Y
akosiaris@vrts1003:/etc/spamassassin$ journalctl --since=-3w --until=-2w -u spamd.service | awk '/VALIDITY/ {print $8}' | sort | uniq -c | sort -rn
    204 Y
     35 .
akosiaris@vrts1003:/etc/spamassassin$ journalctl --since=-4w --until=-3w -u spamd.service | awk '/VALIDITY/ {print $8}' | sort | uniq -c | sort -rn
    207 Y
     41 .
akosiaris@vrts1003:/etc/spamassassin$ journalctl --since=-5w --until=-4w -u spamd.service | awk '/VALIDITY/ {print $8}' | sort | uniq -c | sort -rn
    343 Y
     52 .
      2 after
akosiaris@vrts1003:/etc/spamassassin$ journalctl --since=-6w --until=-5w -u spamd.service | awk '/VALIDITY/ {print $8}' | sort | uniq -c | sort -rn
    234 Y
     66 .
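The six commands above differ only in their time window, so they can be folded into a loop. The following is a minimal sketch, not what was actually run: the function names `count_validity_flags` and `weekly_validity_report` are made up for illustration, and it assumes (as the output above suggests) that field 8 of a spamd result line is the verdict flag, `Y` for spam and `.` for clean.

```shell
#!/bin/sh
# Count the 8th field (assumed to be the spamd verdict flag: "Y" = spam,
# "." = clean) on every stdin line that mentions VALIDITY.
# Prints "<count> <flag>" lines, highest count first.
count_validity_flags() {
    awk '/VALIDITY/ { count[$8]++ }
         END { for (flag in count) print count[flag], flag }' | sort -rn
}

# Print one breakdown per week, going back $1 weeks (default 6).
# Uses the same journalctl options as the commands in this comment.
weekly_validity_report() {
    weeks=${1:-6}
    i=0
    while [ "$i" -lt "$weeks" ]; do
        echo "window: -$((i + 1))w .. -${i}w"
        if [ "$i" -eq 0 ]; then
            # Most recent week: no upper bound needed.
            journalctl --since="-1w" -u spamd.service
        else
            journalctl --since="-$((i + 1))w" --until="-${i}w" -u spamd.service
        fi | count_validity_flags
        i=$((i + 1))
    done
}
```

Calling `weekly_validity_report 6` on a VRTS host should give the same counts as above, one window at a time. The counting is kept in its own function so it can also be fed from a saved log file.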

So in the last 2 weeks, the overall amount of email that triggered the VALIDITY-related rules apparently increased considerably, and so did the share of that email that was let through. This is consistent with what @Krd is reporting and could be explained by what Jonathan Corbet outlined in the linked LWN article.
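To put numbers on that shift, the share of VALIDITY-tagged messages that were let through can be computed from the weekly counts. This is a rough sketch: `clean_share` is a hypothetical helper, it assumes `.` marks a message that was let through and `Y` one flagged as spam, and it ignores the stray `after` lines.

```shell
#!/bin/sh
# clean_share CLEAN SPAM
# Percentage of VALIDITY-tagged messages that were let through,
# where CLEAN is the "." count and SPAM is the "Y" count for one week.
clean_share() {
    awk -v clean="$1" -v spam="$2" \
        'BEGIN { printf "%.1f\n", 100 * clean / (clean + spam) }'
}

clean_share 4778 1295   # most recent week:  78.7
clean_share 1720 602    # 1 to 2 weeks ago:  74.1
clean_share 35 204      # 2 to 3 weeks ago:  14.6
clean_share 41 207      # 3 to 4 weeks ago:  16.5
clean_share 52 343      # 4 to 5 weeks ago:  13.2
clean_share 66 234      # 5 to 6 weeks ago:  22.0
```

Under those assumptions, the let-through share went from roughly 13-22% in the older weeks to roughly 74-79% in the two most recent ones.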

Unfortunately, we don't appear to have the VRTS hosts under https://grafana.wikimedia.org/d/000000451/mail?orgId=1&refresh=5m, so we can't also verify this visually. We probably want another subtask to figure out what's needed to add them.

LSobanski added a project: collaboration-services. Thu, Nov 21, 8:06 AM

LSobanski updated the task description. Thu, Nov 21, 10:36 AM

Looking at https://knowledge.validity.com/s/articles/Accessing-Validity-reputation-data-through-DNS?language=en_US we could fix this by signing up on my.validity.com and registering the IPs of the VRTS nodes. This seems like the cleanest interim fix to unbreak this.

Following that, we could assess whether this spamassassin rule by itself is useful enough to justify keeping an external dependency for the VRTS service.

Change #1093905 had a related patch set uploaded (by EoghanGaffney; author: EoghanGaffney):

[operations/puppet@production] vrts: Block bondedsender RBL check from spamassassin on vrts

https://gerrit.wikimedia.org/r/1093905

gerritbot added a project: Patch-For-Review. Thu, Nov 21, 12:51 PM

In T380396#10343202, @MoritzMuehlenhoff wrote:

Looking at https://knowledge.validity.com/s/articles/Accessing-Validity-reputation-data-through-DNS?language=en_US we could fix this by signing up on my.validity.com and registering the IPs of the VRTS nodes. This seems like the cleanest interim fix to unbreak this.

Following that, we could assess whether this spamassassin rule by itself is useful enough to justify keeping an external dependency for the VRTS service.

Oh, that's nice. We should also look more into the logs to see if this is indeed what is happening. A quick look yesterday didn't reveal anything, but I probably missed something.

VRTS nodes btw have private addresses, so we'll probably need to use the proxies (I think they use the web proxies and not the URL downloaders, but that's an implementation detail).

Change #1093905 merged by EoghanGaffney:

[operations/puppet@production] vrts: Block bondedsender RBL check from spamassassin on vrts

https://gerrit.wikimedia.org/r/1093905

Maintenance_bot removed a project: Patch-For-Review. Thu, Nov 21, 7:30 PM

I've put in a change to disable this specific check. We're also going to look at whether we can sign up for an account with them to get a higher usage limit.

@Krd I'd be interested to hear if the amount of spam going through decreases over the next day or so!

LSobanski triaged this task as High priority. Fri, Nov 22, 8:29 AM

LSobanski moved this task from Incoming to Work in Progress on the collaboration-services board.

I have the impression that it has changed something, but I still see a lot of RCVD_IN_VALIDITY_SAFE hits. Example: https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketZoom;TicketID=13409476

Change #1094488 had a related patch set uploaded (by EoghanGaffney; author: EoghanGaffney):

[operations/puppet@production] vrts: Disable more VALIDITY RBL checks from Spamassassin

https://gerrit.wikimedia.org/r/1094488

gerritbot added a project: Patch-For-Review. Fri, Nov 22, 4:58 PM

I see what's going on. There are three checks provided by VALIDITY that are configured by default: RCVD_IN_VALIDITY_CERTIFIED, RCVD_IN_VALIDITY_SAFE, and RCVD_IN_VALIDITY_RPBL. We disabled the first one but not the other two, which is why we saw a small but measurable decrease in messages falsely marked clean, and why we're still seeing the other VALIDITY checks in spam messages. These can be seen configured here. I also checked that they were responding to all queries with the same exceeded error message.

I've put up the change above to disable the other two and am merging it now.
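For readers without access to the Puppet change, one generic way to switch off a SpamAssassin rule is to set its score to 0 in a local configuration file. The sketch below is only an illustration of that approach and may not match what the merged patches do. The helper name `write_validity_overrides` and the output file name are hypothetical.

```shell
#!/bin/sh
# Write a SpamAssassin config fragment to the file named in $1 that sets
# the score of the three VALIDITY rules named in this task to 0.
# In SpamAssassin, a rule with a score of 0 is not run.
write_validity_overrides() {
    target=$1
    for rule in RCVD_IN_VALIDITY_CERTIFIED \
                RCVD_IN_VALIDITY_SAFE \
                RCVD_IN_VALIDITY_RPBL; do
        printf 'score %s 0\n' "$rule"
    done > "$target"
}

# Example (hypothetical file name):
#   write_validity_overrides ./validity-disable.cf
# After placing the fragment in the SpamAssassin configuration directory,
# `spamassassin --lint` can be used to check that the config still parses.
```

The generated file contains one `score <RULE> 0` line per rule, so it can be reviewed by eye before being deployed.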

Change #1094488 merged by EoghanGaffney: