In our work for the Add Link structured task project (https://wikitech.wikimedia.org/wiki/Add_Link), we will be moving DB tables from the staging database on the stats1008 server to a production MySQL instance on the misc cluster.
We need guidance on the process for moving the data from the stats1008 machine to that production instance.
- The https://gerrit.wikimedia.org/r/plugins/gitiles/research/mwaddlink/ repository contains a script for generating datasets (run-pipeline.sh). The script creates SQLite files, imports them as MySQL tables into the staging database on stats1008, and generates checksums for both sets of files. The files are then copied to a location on stats1008 from which they are published to datasets.wikimedia.org.
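The checksum step in the pipeline above could be sketched roughly as follows. This is a minimal illustration, not the actual run-pipeline.sh logic (which is a shell script); the filenames and the chunk size are assumptions, and the output format assumed here is the conventional `sha256sum`-style `<hash>  <name>` layout:

```python
import hashlib
from pathlib import Path

def compute_checksums(paths):
    """Return a mapping of filename -> SHA-256 hex digest.

    Reads each file in chunks so large SQLite files / MySQL dumps
    do not have to fit in memory.
    """
    checksums = {}
    for path in map(Path, paths):
        digest = hashlib.sha256()
        with path.open("rb") as f:
            # 1 MiB chunks; iter() stops when read() returns b""
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        checksums[path.name] = digest.hexdigest()
    return checksums

def write_checksum_file(checksums, out_path):
    """Write checksums in `<hash>  <name>` lines (the layout
    sha256sum produces), suitable for publishing alongside the
    dataset files."""
    with open(out_path, "w") as f:
        for name, digest in sorted(checksums.items()):
            f.write(f"{digest}  {name}\n")
```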
- The refreshLinkRecommendations.php script, which runs hourly on mwmaint, should query datasets.wikimedia.org and compare the latest checksums there against those stored in the local wiki database. If they do not match, it should download the MySQL tables and import them into the addlink database on the misc cluster, then proceed with the rest of its work of refreshing the data in the local wiki cache tables. This is tracked in T272419.
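The compare-and-decide step could look something like the sketch below. This is illustrative only: refreshLinkRecommendations.php is a PHP maintenance script, so this Python is just the logic, and it assumes the published checksum file uses a `sha256sum`-style `<hash>  <name>` layout and that the locally stored checksums are available as a name-to-digest mapping:

```python
def parse_checksum_file(text):
    """Parse `<hash>  <name>` lines (sha256sum layout) into a dict.

    Assumes one entry per line; blank lines are skipped.
    """
    checksums = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(None, 1)
        checksums[name] = digest
    return checksums

def files_needing_refresh(remote, stored):
    """Return dataset names whose remote checksum differs from, or is
    missing in, the locally stored checksums -- i.e. the tables that
    should be downloaded and re-imported."""
    return sorted(
        name for name, digest in remote.items()
        if stored.get(name) != digest
    )
```

With this split, an unchanged checksum means the hourly run skips the download/import entirely and goes straight on to refreshing the local wiki cache tables.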