

Data-Platform-SRE (2024.07.29 - 2024.08.16) · Milestone
Archived · Public

Members (6)

Watchers

  • This project does not have any watchers.

Details

Description

Milestone for Data Platform SRE work

Recent Activity

Sep 16 2024

fnegri closed T365424: Upgrade clouddb* hosts to Bookworm, a subtask of T368518: decommission clouddb1021, as Resolved.
Sep 16 2024, 4:09 PM · SRE, DC-Ops, ops-eqiad, Data-Platform-SRE (2024.07.29 - 2024.08.16), decommission-hardware
brouberol closed T374396: Retry package added needs the types-retry 0.9.9.4 typing stub as Resolved.

All airflow instances have been restarted and are now using the new deb package, with the types-retry dependency made available.

Sep 16 2024, 1:33 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
brouberol closed T374396: Retry package added needs the types-retry 0.9.9.4 typing stub, a subtask of T372279: Add Retry package to airflow conda environment, as Resolved.
Sep 16 2024, 1:30 PM · Data-Platform-SRE (2024.07.29 - 2024.08.16), Data-Engineering (Q1 2024 July 1st - September 30th)
gerritbot added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

Change #1073173 merged by Brouberol:

[operations/puppet@production] Install airflow-dags 2.9.3-py3.10-20240916 by default on all instances

https://gerrit.wikimedia.org/r/1073173

Sep 16 2024, 1:25 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
gerritbot added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

Change #1073172 merged by Brouberol:

[operations/puppet@production] Upgrade the airflow-dags deb in airflow-analytics

https://gerrit.wikimedia.org/r/1073172

Sep 16 2024, 1:13 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
gerritbot added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

Change #1073171 merged by Brouberol:

[operations/puppet@production] Upgrade the airflow-dags deb in airflow-wmde

https://gerrit.wikimedia.org/r/1073171

Sep 16 2024, 1:04 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
gerritbot added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

Change #1073170 merged by Brouberol:

[operations/puppet@production] Upgrade the airflow-dags deb in airflow-search

https://gerrit.wikimedia.org/r/1073170

Sep 16 2024, 12:58 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
gerritbot added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

Change #1073169 merged by Brouberol:

[operations/puppet@production] Upgrade the airflow-dags deb in airflow-research

https://gerrit.wikimedia.org/r/1073169

Sep 16 2024, 12:50 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
gerritbot added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

Change #1073168 merged by Brouberol:

[operations/puppet@production] Upgrade the airflow-dags deb in airflow-platform-eng

https://gerrit.wikimedia.org/r/1073168

Sep 16 2024, 12:40 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
gerritbot added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

Change #1073167 merged by Brouberol:

[operations/puppet@production] Upgrade the airflow-dags deb in airflow-analytics-product

https://gerrit.wikimedia.org/r/1073167

Sep 16 2024, 12:30 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
brouberol added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.
root@apt1002:~# wget https://gitlab.wikimedia.org/api/v4/projects/93/packages/generic/airflow/2.9.3/airflow-2.9.3_amd64.deb
--2024-09-16 10:47:06--  https://gitlab.wikimedia.org/api/v4/projects/93/packages/generic/airflow/2.9.3/airflow-2.9.3_amd64.deb
Resolving gitlab.wikimedia.org (gitlab.wikimedia.org)... 2620:0:860:1:208:80:153:8, 208.80.153.8
Connecting to gitlab.wikimedia.org (gitlab.wikimedia.org)|2620:0:860:1:208:80:153:8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 579378076 (553M) [application/octet-stream]
Saving to: ‘airflow-2.9.3_amd64.deb’
Sep 16 2024, 10:51 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
gerritbot added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

Change #1073166 merged by Brouberol:

[operations/puppet@production] Upgrade the airflow-dags deb in airflow-analytics-test

https://gerrit.wikimedia.org/r/1073166

Sep 16 2024, 10:36 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
gerritbot added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

Change #1073173 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] Install airflow-dags 2.9.3-py3.10-20240916 by default on all instances

https://gerrit.wikimedia.org/r/1073173

Sep 16 2024, 10:22 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
gerritbot added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

Change #1073172 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] Upgrade the airflow-dags deb in airflow-analytics

https://gerrit.wikimedia.org/r/1073172

Sep 16 2024, 10:22 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
gerritbot added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

Change #1073171 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] Upgrade the airflow-dags deb in airflow-wmde

https://gerrit.wikimedia.org/r/1073171

Sep 16 2024, 10:22 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
gerritbot added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

Change #1073170 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] Upgrade the airflow-dags deb in airflow-search

https://gerrit.wikimedia.org/r/1073170

Sep 16 2024, 10:22 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
gerritbot added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

Change #1073169 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] Upgrade the airflow-dags deb in airflow-research

https://gerrit.wikimedia.org/r/1073169

Sep 16 2024, 10:22 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
gerritbot added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

Change #1073168 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] Upgrade the airflow-dags deb in airflow-platform-eng

https://gerrit.wikimedia.org/r/1073168

Sep 16 2024, 10:22 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
gerritbot added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

Change #1073167 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] Upgrade the airflow-dags deb in airflow-analytics-product

https://gerrit.wikimedia.org/r/1073167

Sep 16 2024, 10:22 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
gerritbot added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

Change #1073166 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] Upgrade the airflow-dags deb in airflow-analytics-test

https://gerrit.wikimedia.org/r/1073166

Sep 16 2024, 10:22 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
brouberol added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

Everything seems to be working fine

Screenshot 2024-09-16 at 11.05.42.png (1×1 px, 217 KB)

Sep 16 2024, 9:08 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
brouberol added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.
brouberol@an-test-client1002:~$ wget https://gitlab.wikimedia.org/api/v4/projects/93/packages/generic/airflow/2.9.3/airflow-2.9.3_amd64.deb
--2024-09-16 08:48:53--  https://gitlab.wikimedia.org/api/v4/projects/93/packages/generic/airflow/2.9.3/airflow-2.9.3_amd64.deb
Resolving gitlab.wikimedia.org (gitlab.wikimedia.org)... 2620:0:860:1:208:80:153:8, 208.80.153.8
Connecting to gitlab.wikimedia.org (gitlab.wikimedia.org)|2620:0:860:1:208:80:153:8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 579378076 (553M) [application/octet-stream]
Saving to: ‘airflow-2.9.3_amd64.deb’
Sep 16 2024, 8:57 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
brouberol added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

A new airflow-dags including the library has been published: https://gitlab.wikimedia.org/api/v4/projects/93/packages/generic/airflow/2.9.3/airflow-2.9.3_amd64.deb

Sep 16 2024, 8:51 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
CodeReviewBot added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

brouberol merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/828

Sep 16 2024, 8:20 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)
CodeReviewBot added a project to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub: Patch-For-Review.

brouberol opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/828

Sep 16 2024, 7:02 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)

Sep 10 2024

CodeReviewBot added a comment to T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.

brouberol updated https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/823

Sep 10 2024, 1:20 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)

Sep 9 2024

Snwachukwu created T374396: Retry package added needs the types-retry 0.9.9.4 typing stub.
Sep 9 2024, 8:34 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Data-Engineering (Q1 2024 July 1st - September 30th)

Aug 16 2024

Gehel edited parent tasks for T368757: Create a git-sync container image to be used with airflow, added: T368033: Design a suitable DAG deployment method; removed: T364387: Adapt Airflow auth and DAG deployment method.
Aug 16 2024, 3:04 PM · Data-Platform-SRE (2024.07.29 - 2024.08.16)
Gehel archived Data-Platform-SRE (2024.07.29 - 2024.08.16).
Aug 16 2024, 9:40 AM
BTullis moved T370203: Install Matomo Custom Reports Plugin for wikimediafoundation.org from In Progress to Needs Review on the Data-Platform-SRE (2024.07.29 - 2024.08.16) board.
Aug 16 2024, 9:31 AM · Software-Licensing, Data-Platform-SRE (2024.08.17 - 2024.09.06)
Gehel closed T371124: request for new matomo site: trace.wikimedia.org/ as Resolved.

@CDanis: I'm closing this as it seems completed from our side. Please re-open and ping me on Slack if you need more from us.

Aug 16 2024, 9:23 AM · Data-Platform-SRE (2024.07.29 - 2024.08.16), Data-Engineering
Gehel created T372620: Report on the initial growthbook installation PoC.
Aug 16 2024, 8:59 AM · Data-Platform-SRE (2024.08.17 - 2024.09.06), Data Products

Aug 15 2024

Stashbot added a comment to T370754: Import WDQS subgraphs to production nodes.

Mentioned in SAL (#wikimedia-operations) [2024-08-15T22:42:57Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards

Aug 15 2024, 10:43 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Discovery-Search (Current work), Wikidata
bking renamed T371061: Update CirrusSearch dashboards to use new metrics/refresh dashboards from "Update CirrusSearch dashboards to use new metrics" to "Update CirrusSearch dashboards to use new metrics/refresh dashboards".
Aug 15 2024, 10:07 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29), Discovery-Search, CirrusSearch
bking updated the task description for T372442: Investigate why 70% of WDQS EQIAD hosts became unresponsive within a few minutes of each other.
Aug 15 2024, 10:03 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Wikidata, Wikidata-Query-Service
bking claimed T372442: Investigate why 70% of WDQS EQIAD hosts became unresponsive within a few minutes of each other.
Aug 15 2024, 10:02 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Wikidata, Wikidata-Query-Service
bking renamed T372442: Investigate why 70% of WDQS EQIAD hosts became unresponsive within a few minutes of each other from "Determine if WDQS was affected by wikipedia editing outage/consider protections in similiar future scenarios" to "Investigate why 70% of WDQS EQIAD hosts became unresponsive within a few minutes of each other".
Aug 15 2024, 10:02 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Wikidata, Wikidata-Query-Service
Stashbot added a comment to T370754: Import WDQS subgraphs to production nodes.

Mentioned in SAL (#wikimedia-operations) [2024-08-15T21:54:27Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards

Aug 15 2024, 9:54 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Discovery-Search (Current work), Wikidata
Stashbot added a comment to T370754: Import WDQS subgraphs to production nodes.

Mentioned in SAL (#wikimedia-operations) [2024-08-15T21:53:55Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards

Aug 15 2024, 9:54 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Discovery-Search (Current work), Wikidata
Stashbot added a comment to T370754: Import WDQS subgraphs to production nodes.

Mentioned in SAL (#wikimedia-operations) [2024-08-15T21:53:15Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards

Aug 15 2024, 9:53 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Discovery-Search (Current work), Wikidata
gerritbot added a comment to T364077: Adapt the wdqs data-transfer cookbook to operate with federated subgraphs.

Change #1063067 merged by Ryan Kemper:

[operations/puppet@production] wdqs graph-split: data xfer needs python3-snappy

https://gerrit.wikimedia.org/r/1063067

Aug 15 2024, 9:48 PM · Data-Platform-SRE (2024.09.28 - 2024.10.18), Discovery-Search (Current work), Wikidata
gerritbot added a comment to T364077: Adapt the wdqs data-transfer cookbook to operate with federated subgraphs.

Change #1063067 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] wdqs graph-split: data xfer needs python3-snappy

https://gerrit.wikimedia.org/r/1063067

Aug 15 2024, 9:42 PM · Data-Platform-SRE (2024.09.28 - 2024.10.18), Discovery-Search (Current work), Wikidata
bking added a comment to T372442: Investigate why 70% of WDQS EQIAD hosts became unresponsive within a few minutes of each other.

Wikipedia suffered another incident yesterday at around the same time (2100 UTC), but WDQS did not fall over, so the two events are probably unrelated. We should still investigate what happened with WDQS. I'll update this task to reflect this.

Aug 15 2024, 9:02 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Wikidata, Wikidata-Query-Service
Stevemunene claimed T369582: Enable prometheus metrics on the cephosd cluster.
Aug 15 2024, 11:12 AM · Ceph, Data-Platform-SRE (2024.09.28 - 2024.10.18)
Maintenance_bot removed a project from T368760: Configure airflow webserver under Kubernetes to use OIDC authentication: Patch-For-Review.
Aug 15 2024, 10:30 AM · Data-Platform-SRE (2024.08.17 - 2024.09.06)
Maintenance_bot removed a project from T371208: Create remaining airflow DNS records to be used for access and oidc: Patch-For-Review.
Aug 15 2024, 10:30 AM · Data-Platform-SRE (2024.09.28 - 2024.10.18)
gerritbot added a comment to T371208: Create remaining airflow DNS records to be used for access and oidc.

Change #1062048 merged by Stevemunene:

[operations/dns@master] dns: provision airflow-test-k8s temp domain

https://gerrit.wikimedia.org/r/1062048

Aug 15 2024, 9:35 AM · Data-Platform-SRE (2024.09.28 - 2024.10.18)
gerritbot added a comment to T368760: Configure airflow webserver under Kubernetes to use OIDC authentication.

Change #1062048 merged by Stevemunene:

[operations/dns@master] dns: provision airflow-test-k8s temp domain

https://gerrit.wikimedia.org/r/1062048

Aug 15 2024, 9:35 AM · Data-Platform-SRE (2024.08.17 - 2024.09.06)
Ifrahkhanyaree_WMDE added a comment to T371894: Requesting Kerberos access for ifrahkhanyaree.

That got me a step further! New error after I add my passphrase is

Aug 15 2024, 8:22 AM · Patch-For-Review, Data-Platform-SRE (2024.08.17 - 2024.09.06), Data-Engineering
Stevemunene claimed T369583: Configure availability and health monitoring for the cephosd cluster.
Aug 15 2024, 8:21 AM · Data-Platform-SRE (2024.09.28 - 2024.10.18)