Milestone for Data Platform SRE work
Details
Sep 16 2024
All airflow instances have been restarted and are now using the new deb package, with the types-retry dependency made available.
Change #1073173 merged by Brouberol:
[operations/puppet@production] Install airflow-dags 2.9.3-py3.10-20240916 by default on all instances
Change #1073172 merged by Brouberol:
[operations/puppet@production] Upgrade the airflow-dags deb in airflow-analytics
Change #1073171 merged by Brouberol:
[operations/puppet@production] Upgrade the airflow-dags deb in airflow-wmde
Change #1073170 merged by Brouberol:
[operations/puppet@production] Upgrade the airflow-dags deb in airflow-search
Change #1073169 merged by Brouberol:
[operations/puppet@production] Upgrade the airflow-dags deb in airflow-research
Change #1073168 merged by Brouberol:
[operations/puppet@production] Upgrade the airflow-dags deb in airflow-platform-eng
Change #1073167 merged by Brouberol:
[operations/puppet@production] Upgrade the airflow-dags deb in airflow-analytics-product
root@apt1002:~# wget https://gitlab.wikimedia.org/api/v4/projects/93/packages/generic/airflow/2.9.3/airflow-2.9.3_amd64.deb
--2024-09-16 10:47:06-- https://gitlab.wikimedia.org/api/v4/projects/93/packages/generic/airflow/2.9.3/airflow-2.9.3_amd64.deb
Resolving gitlab.wikimedia.org (gitlab.wikimedia.org)... 2620:0:860:1:208:80:153:8, 208.80.153.8
Connecting to gitlab.wikimedia.org (gitlab.wikimedia.org)|2620:0:860:1:208:80:153:8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 579378076 (553M) [application/octet-stream]
Saving to: ‘airflow-2.9.3_amd64.deb’
Change #1073166 merged by Brouberol:
[operations/puppet@production] Upgrade the airflow-dags deb in airflow-analytics-test
Change #1073173 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] Install airflow-dags 2.9.3-py3.10-20240916 by default on all instances
Change #1073172 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] Upgrade the airflow-dags deb in airflow-analytics
Change #1073171 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] Upgrade the airflow-dags deb in airflow-wmde
Change #1073170 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] Upgrade the airflow-dags deb in airflow-search
Change #1073169 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] Upgrade the airflow-dags deb in airflow-research
Change #1073168 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] Upgrade the airflow-dags deb in airflow-platform-eng
Change #1073167 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] Upgrade the airflow-dags deb in airflow-analytics-product
Change #1073166 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] Upgrade the airflow-dags deb in airflow-analytics-test
Everything seems to be working fine
brouberol@an-test-client1002:~$ wget https://gitlab.wikimedia.org/api/v4/projects/93/packages/generic/airflow/2.9.3/airflow-2.9.3_amd64.deb
--2024-09-16 08:48:53-- https://gitlab.wikimedia.org/api/v4/projects/93/packages/generic/airflow/2.9.3/airflow-2.9.3_amd64.deb
Resolving gitlab.wikimedia.org (gitlab.wikimedia.org)... 2620:0:860:1:208:80:153:8, 208.80.153.8
Connecting to gitlab.wikimedia.org (gitlab.wikimedia.org)|2620:0:860:1:208:80:153:8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 579378076 (553M) [application/octet-stream]
Saving to: ‘airflow-2.9.3_amd64.deb’
A new airflow-dags including the library has been published: https://gitlab.wikimedia.org/api/v4/projects/93/packages/generic/airflow/2.9.3/airflow-2.9.3_amd64.deb
Sep 10 2024
Sep 9 2024
Aug 16 2024
@CDanis: I'm closing this as it seems completed on our side. Please re-open and ping me on Slack if you need more from us.
Aug 15 2024
Mentioned in SAL (#wikimedia-operations) [2024-08-15T22:42:57Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards
Mentioned in SAL (#wikimedia-operations) [2024-08-15T21:54:27Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards
Mentioned in SAL (#wikimedia-operations) [2024-08-15T21:53:55Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards
Mentioned in SAL (#wikimedia-operations) [2024-08-15T21:53:15Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards
Change #1063067 merged by Ryan Kemper:
[operations/puppet@production] wdqs graph-split: data xfer needs python3-snappy
Change #1063067 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):
[operations/puppet@production] wdqs graph-split: data xfer needs python3-snappy
Wikipedia suffered another incident yesterday at around the same time (2100 UTC), but WDQS did not fall over, so the two events are probably unrelated. We should still investigate what happened with WDQS, but it does not appear to be tied to the larger incident. I'll update this task to reflect this.
Change #1062048 merged by Stevemunene:
[operations/dns@master] dns: provision airflow-test-k8s temp domain
That got me a step further! New error after I add my passphrase is