audit all SSL certificates expiry on ops tracking gcal
Closed, Resolved, Public

Description

The wmfusercontent.org certificate expired recently, and its expiry should have been on our tracking calendar.

I've created this task to track auditing the existing certificates and ensuring all are on the calendar.

Event Timeline

RobH claimed this task.
RobH set the priority of this task to Needs Triage.
RobH updated the task description.
RobH added a project: acl*sre-team.
RobH subscribed.
RobH renamed this task from audit all ssh certificates expiry on ops tracking gcal to audit all ssl certificates expiry on ops tracking gcal. (Sep 14 2015, 4:07 PM)
RobH set Security to None.
Krenair renamed this task from audit all ssl certificates expiry on ops tracking gcal to audit all SSL certificates expiry on ops tracking gcal. (Sep 14 2015, 4:07 PM)
Krenair subscribed.

This is changing scope to a checklist for all SSL certificate purchases and how to review and audit them.

Since we've had a second cert expire on us unexpectedly now in a span of a few days, I went ahead and audited the expiries on all of the cert files stored in puppet's files/ssl/ directory. Nothing else is coming up imminently. There's a batch that will expire in November, but it's the prod SNI certs we're not currently using and don't currently plan to renew.
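An audit like this can be scripted. The following is only a minimal sketch, not the procedure that was actually used here: it assumes the `openssl` command-line tool is installed and that the files are PEM-encoded `.crt` certificates in a flat directory. The helper names and the 90-day window are made up for illustration.

```python
import pathlib
import ssl
import subprocess
import time


def parse_enddate(openssl_output):
    """Turn a line like 'notAfter=Sep 14 16:07:00 2016 GMT' into epoch seconds."""
    _, _, stamp = openssl_output.strip().partition("=")
    return ssl.cert_time_to_seconds(stamp)


def cert_expiry(path):
    """Ask the openssl CLI for the notAfter date of one PEM certificate."""
    result = subprocess.run(
        ["openssl", "x509", "-noout", "-enddate", "-in", str(path)],
        check=True, capture_output=True, text=True,
    )
    return parse_enddate(result.stdout)


def expiring_within(expiries, days, now=None):
    """Return (name, days_left) pairs for certs expiring within `days`, soonest first."""
    now = time.time() if now is None else now
    soon = [(name, int((ts - now) // 86400))
            for name, ts in expiries.items()
            if ts - now <= days * 86400]
    return sorted(soon, key=lambda pair: pair[1])


def audit(directory, days=90):
    """Check every .crt file in `directory` (assumed layout: flat, PEM-encoded)."""
    files = sorted(pathlib.Path(directory).glob("*.crt"))
    expiries = {p.name: cert_expiry(p) for p in files}
    return expiring_within(expiries, days)
```

Run from a puppet checkout, `audit("files/ssl")` would list anything due within 90 days. Certificates that have already expired show up with a negative `days_left`.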

Wasn't there an Icinga check that tested that certificates were good for another x days?

We only have that Icinga check on the primary unified cert, which covers the production endpoints for:

  • wikipedia.org
  • mediawiki.org
  • wikibooks.org
  • wikidata.org
  • wikimediafoundation.org
  • wikimedia.org
  • wikinews.org
  • wikiquote.org
  • wikisource.org
  • wikiversity.org
  • wikivoyage.org
  • wiktionary.org

... and all of their mobile subdomains and whatnot. It's a pretty verbose check, validates functional SSL for all of the SAN domains, checks the cert expiry, etc.
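For anyone unfamiliar with what such a check involves, here is a rough sketch of the same idea using only the Python standard library. It is not the actual Icinga check, whose implementation is not shown in this task; the function names are hypothetical.

```python
import socket
import ssl
import time


def fetch_peer_cert(hostname, port=443, timeout=10):
    """Complete a verified TLS handshake and return the peer cert as a dict."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()


def san_dns_names(cert):
    """List the DNS entries from the certificate's subjectAltName field."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def days_until_expiry(cert, now=None):
    """Whole days until the certificate's notAfter date (negative once expired)."""
    now = time.time() if now is None else now
    return int((ssl.cert_time_to_seconds(cert["notAfter"]) - now) // 86400)
```

A handshake against a host with an expired or mismatched certificate raises `ssl.SSLError` inside `fetch_peer_cert`, so "functional SSL" and "expiry" are covered by the same connection attempt. Wildcard SAN entries would need a concrete hostname substituted before connecting.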

But we don't have any kind of checking in place for the various other misc certs we own that are deployed for smaller or one-off services, or deployed to third parties (or, in some cases that are rare today but will matter later, not deployed at all but still critical). Just looking at puppet's files/ssl/ today, that list is something like:

archiva.wikimedia.org.crt
blog.wikimedia.org.crt
dumps.wikimedia.org.crt
ecc-star.wmfusercontent.org.crt
eventdonations.wikimedia.org.crt
ganglia.wikimedia.org.crt
gerrit.wikimedia.org.crt
icinga.wikimedia.org.crt
labvirt-star.eqiad.wmnet.crt
ldap-codfw.wikimedia.org.crt
ldap-eqiad.wikimedia.org.crt
ldap-mirror.wikimedia.org.crt
librenms.wikimedia.org.crt
lists.wikimedia.org.crt
policy.wikimedia.org.crt
rt.wikimedia.org.crt
star.planet.wikimedia.org.crt
star.wmflabs.org.crt
star.wmfusercontent.org.crt
stream.wikimedia.org.crt
tendril.wikimedia.org.crt
ticket.wikimedia.org.crt
toolserver.org.crt
virt-star.eqiad.wmnet.crt
wikitech.wikimedia.org.crt

Of those, I can see direct expiry checks in our Icinga config only for:

lists.wikimedia.org
ticket.wikimedia.org
ldap-codfw.wikimedia.org
ldap-eqiad.wikimedia.org
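To make the gap concrete, the two lists above can be diffed. This is just a sketch that matches on the file name with `.crt` stripped; that is an approximation, since the `star.*` files are wildcard certs rather than hostnames.

```python
CERT_FILES = [
    "archiva.wikimedia.org.crt", "blog.wikimedia.org.crt",
    "dumps.wikimedia.org.crt", "ecc-star.wmfusercontent.org.crt",
    "eventdonations.wikimedia.org.crt", "ganglia.wikimedia.org.crt",
    "gerrit.wikimedia.org.crt", "icinga.wikimedia.org.crt",
    "labvirt-star.eqiad.wmnet.crt", "ldap-codfw.wikimedia.org.crt",
    "ldap-eqiad.wikimedia.org.crt", "ldap-mirror.wikimedia.org.crt",
    "librenms.wikimedia.org.crt", "lists.wikimedia.org.crt",
    "policy.wikimedia.org.crt", "rt.wikimedia.org.crt",
    "star.planet.wikimedia.org.crt", "star.wmflabs.org.crt",
    "star.wmfusercontent.org.crt", "stream.wikimedia.org.crt",
    "tendril.wikimedia.org.crt", "ticket.wikimedia.org.crt",
    "toolserver.org.crt", "virt-star.eqiad.wmnet.crt",
    "wikitech.wikimedia.org.crt",
]

CHECKED_HOSTS = {
    "lists.wikimedia.org", "ticket.wikimedia.org",
    "ldap-codfw.wikimedia.org", "ldap-eqiad.wikimedia.org",
}


def name_from_file(filename):
    """Strip a trailing '.crt' so file names can be compared with check names."""
    return filename[:-len(".crt")] if filename.endswith(".crt") else filename


def unchecked(cert_files, checked_hosts):
    """Names of certificates that have no matching expiry check."""
    names = (name_from_file(f) for f in cert_files)
    return sorted(n for n in names if n not in checked_hosts)
```

With the lists as given in this task, `unchecked(CERT_FILES, CHECKED_HOSTS)` leaves 21 of the 25 certificates without a direct expiry check.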

Related but slightly tangential to this: we also have other private material that's bound to expire (e.g. the puppet CA, GPG keyrings for apt repos, certs for Cassandra server/client auth). I was thinking we could extend the checks by reading that material locally from private.git and alerting accordingly. Thoughts?
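Whatever reads the expiry date, the alerting half could follow the usual Nagios/Icinga plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN). A minimal sketch follows; the 30-day and 7-day thresholds and the function name are assumptions for illustration, not anything agreed in this task.

```python
# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def expiry_status(name, days_left, warn_days=30, crit_days=7):
    """Map days until expiry to a (plugin exit code, status message) pair."""
    if days_left is None:
        return UNKNOWN, f"UNKNOWN - could not read expiry date for {name}"
    if days_left < 0:
        return CRITICAL, f"CRITICAL - {name} expired {-days_left} days ago"
    if days_left <= crit_days:
        return CRITICAL, f"CRITICAL - {name} expires in {days_left} days"
    if days_left <= warn_days:
        return WARNING, f"WARNING - {name} expires in {days_left} days"
    return OK, f"OK - {name} expires in {days_left} days"
```

A real plugin would print the message and call `sys.exit(code)` so that Icinga picks up the state.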

For the first steps, I've created https://docs.google.com/a/wikimedia.org/spreadsheets/d/1yT5rvoEEUHhNeJAQRVamr8ECqN3TLsMaO8N_At4Ki3I/edit?usp=sharing

This lists all the one-off certificates mentioned earlier in this task. I'll be using it to check against the ops tracking calendar for these renewals, adding the SSL renewals to those entries as needed.

The Google sheet has been updated with the most recent mail purchases, and all info has been added to the Google calendar for expiry tracking. Each entry has a 4-week notification email set to go to both myself and the SSL renewal alias (intentionally not listed in this task).
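For reference, the reminder date for each entry is simply the expiry date minus four weeks. A tiny sketch, with a hypothetical function name and a made-up example date:

```python
from datetime import date, timedelta


def reminder_date(expiry, weeks_before=4):
    """Date on which the renewal reminder should fire for a given expiry date."""
    return expiry - timedelta(weeks=weeks_before)
```

For example, a certificate expiring on 2016-09-14 would get its reminder on 2016-08-17.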

So while this is 99% complete, we should add checks in Icinga for the various hosts, so I'll be changing the overall subject of the task.

Actually, we'll just create a new task and link.