The current data-transfer cookbook assumes that a single graph is served from all wdqs nodes; this will no longer be the case once the graph is split.
Most of the script should operate the same way, but a few important configuration bits may need to vary:
Mainly the Kafka topic the updater consumes from, which will vary depending on which subgraph the machine is serving. This information might be available from Puppet and could be exposed via a config file readable by the cookbook.
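As a sketch of what that lookup could look like (the file path, format, and key names below are assumptions; no such Puppet-provisioned file exists yet):

```python
import json

# Hypothetical location and schema; the real file would be provisioned by Puppet.
SUBGRAPH_CONFIG = '/etc/wdqs/subgraph.json'

def read_subgraph_config(path: str = SUBGRAPH_CONFIG) -> tuple[str, str]:
    """Return the (subgraph, kafka_topic) pair this host is configured to serve."""
    with open(path) as config_file:
        config = json.load(config_file)
    return config['subgraph'], config['kafka_topic']
```

The cookbook would read this on both the source and the target before doing anything destructive.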
The cookbook should also make sure not to transfer the data of subgraph A onto a machine configured to serve subgraph B.
We should also explore and document a procedure for switching a machine that serves subgraph A over to serving subgraph B.
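A minimal outline of what such a switch could involve; every helper here is a hypothetical placeholder for a step, not an existing cookbook or API:

```python
# Hypothetical placeholders sketching the order of operations only.
def stop_updater(host: str) -> None: ...
def set_subgraph_role(host: str, subgraph: str) -> None: ...
def pick_donor(subgraph: str) -> str: ...
def transfer_data(source: str, target: str) -> None: ...
def topic_for(subgraph: str) -> str: ...
def reset_kafka_offsets(host: str, topic: str) -> None: ...
def start_updater(host: str) -> None: ...

def switch_subgraph(host: str, new_subgraph: str) -> None:
    """Outline: repoint a host from one subgraph to another."""
    stop_updater(host)                     # stop consuming the old topic
    set_subgraph_role(host, new_subgraph)  # flip the assignment in Puppet
    transfer_data(pick_donor(new_subgraph), host)  # reuse the data-transfer cookbook
    reset_kafka_offsets(host, topic_for(new_subgraph))  # consume the new topic
    start_updater(host)
```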
AC:
The wdqs data-transfer cookbook can operate in a federated subgraphs setup:
- Correctly sets Kafka consumer offsets based on the instance (graph) type
- In addition to transferring the actual data, also transfers metadata about the instance (graph) type; the data-reload cookbook should likewise set this metadata. We'll use the already existing data-reloaded flag file for this: previously the mere presence of the file told the wdqs updater that it could perform work, but now we'll also include the instance type in that same file (see the sketch after this list).
- Checks that the source and target are hosting the same type of journal (main, scholarly, wcqs, ...)
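A sketch of how the extended flag file and the journal guard could look; the path, JSON schema, and function names are assumptions, only the flag file's presence semantics come from the current behaviour:

```python
import json
from pathlib import Path

# Hypothetical path for the existing data-reloaded flag file; today its mere
# presence tells the updater it may run, the JSON body would be new.
DATA_LOADED_FLAG = Path('/srv/wdqs/data_loaded')

def write_data_loaded_flag(instance_type: str, flag: Path = DATA_LOADED_FLAG) -> None:
    """Keep the presence semantics, but also record the instance (graph) type."""
    flag.write_text(json.dumps({'instance_type': instance_type}))

def read_instance_type(flag: Path = DATA_LOADED_FLAG) -> str:
    return json.loads(flag.read_text())['instance_type']

def assert_same_journal(source_type: str, target_type: str) -> None:
    """Abort the transfer early if source and target host different journals."""
    if source_type != target_type:
        raise RuntimeError(
            f'refusing to transfer {source_type!r} data onto a {target_type!r} host')
```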