The current data-transfer cookbook assumes that a single graph is served from all wdqs nodes; this will no longer be the case once the graph is split.
Most of the script should operate the same way, but a few important configuration bits may need to vary:
Mainly the Kafka topic the updater consumes from, which will vary depending on which subgraph the machine is serving. This information might be available from Puppet and could be exposed via a config file readable by the cookbook.
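As a sketch of what that lookup could look like (the file path, format, and key names below are assumptions; no such Puppet-provisioned file exists yet):

```python
import json

# Hypothetical location and schema; the real file would be provisioned by Puppet.
SUBGRAPH_CONFIG = '/etc/wdqs/subgraph.json'

def read_subgraph_config(path: str = SUBGRAPH_CONFIG) -> tuple[str, str]:
    """Return the (subgraph, kafka_topic) pair this host is configured to serve."""
    with open(path) as config_file:
        config = json.load(config_file)
    return config['subgraph'], config['kafka_topic']
```

The cookbook would read this on both the source and the target before doing anything destructive.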
The cookbook should also make sure not to transfer the data of subgraph A onto a machine configured to serve subgraph B.
We should also explore and document a procedure for switching a machine that serves subgraph A over to serving subgraph B.
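A minimal outline of what such a switch could involve; every helper here is a hypothetical placeholder for a step, not an existing cookbook or API:

```python
# Hypothetical placeholders sketching the order of operations only.
def stop_updater(host: str) -> None: ...
def set_subgraph_role(host: str, subgraph: str) -> None: ...
def pick_donor(subgraph: str) -> str: ...
def transfer_data(source: str, target: str) -> None: ...
def topic_for(subgraph: str) -> str: ...
def reset_kafka_offsets(host: str, topic: str) -> None: ...
def start_updater(host: str) -> None: ...

def switch_subgraph(host: str, new_subgraph: str) -> None:
    """Outline: repoint a host from one subgraph to another."""
    stop_updater(host)                     # stop consuming the old topic
    set_subgraph_role(host, new_subgraph)  # flip the assignment in Puppet
    transfer_data(pick_donor(new_subgraph), host)  # reuse the data-transfer cookbook
    reset_kafka_offsets(host, topic_for(new_subgraph))  # consume the new topic
    start_updater(host)
```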
AC:
The wdqs data-transfer cookbook can operate in a federated subgraphs setup:
- Correctly sets Kafka consumer offsets based on the instance (graph) type
- In addition to transferring the actual data, also transfers metadata about the instance (graph) type; the data-reload cookbook should likewise set this metadata. We'll use the already existing data-reloaded flag file for this: previously the mere presence of the file told the wdqs updater that it could perform work, but now we'll also include the instance type in that same file (see the sketch after this list).
- Checks that the source and target are hosting the same type of journal (main, scholarly, wcqs, ...)
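A sketch of how the extended flag file and the journal guard could look; the path, JSON schema, and function names are assumptions, only the flag file's presence semantics come from the current behaviour:

```python
import json
from pathlib import Path

# Hypothetical path for the existing data-reloaded flag file; today its mere
# presence tells the updater it may run, the JSON body would be new.
DATA_LOADED_FLAG = Path('/srv/wdqs/data_loaded')

def write_data_loaded_flag(instance_type: str, flag: Path = DATA_LOADED_FLAG) -> None:
    """Keep the presence semantics, but also record the instance (graph) type."""
    flag.write_text(json.dumps({'instance_type': instance_type}))

def read_instance_type(flag: Path = DATA_LOADED_FLAG) -> str:
    return json.loads(flag.read_text())['instance_type']

def assert_same_journal(source_type: str, target_type: str) -> None:
    """Abort the transfer early if source and target host different journals."""
    if source_type != target_type:
        raise RuntimeError(
            f'refusing to transfer {source_type!r} data onto a {target_type!r} host')
```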