Context:
I met with @Kelson from Kiwix in Paris in February 2023, and we discussed moving the MWOffliner/Kiwix systems from hitting the public Wikimedia APIs to Wikimedia Enterprise, which batches Parsoid responses into dumps. We agreed in principle that this was a good idea and something to explore further.
What Wikimedia Enterprise has that is valuable here:
Daily dumps of all of the "text-based" language projects, containing the Parsoid HTML (among other things). We also have them publicly available here, refreshed every two weeks. See the docs for the projects and namespaces covered by our APIs today.
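
To make the discussion concrete, here is a rough sketch of how a consumer such as MWOffliner might read Parsoid HTML out of one of these dumps. It is not a tested integration. It assumes the dump is a gzip-compressed tar archive of newline-delimited JSON files, and that each record carries `name`, `namespace.identifier` and `article_body.html` fields; please check those names against the docs. `iter_articles` is a made-up helper name.

```python
import json
import tarfile
from typing import BinaryIO, Iterator, Tuple


def iter_articles(dump: BinaryIO, namespace: int = 0) -> Iterator[Tuple[str, str]]:
    """Yield (title, Parsoid HTML) pairs for one namespace of a dump archive."""
    # "r|gz" reads the archive as a forward-only stream, so a large dump
    # does not need to be seekable or loaded into memory.
    with tarfile.open(fileobj=dump, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            lines = archive.extractfile(member)
            if lines is None:
                continue
            for raw in lines:
                if not raw.strip():
                    continue  # skip blank lines between records
                article = json.loads(raw)
                # Assumed field names; verify against the published schema.
                if article.get("namespace", {}).get("identifier") != namespace:
                    continue
                html = article.get("article_body", {}).get("html")
                if html is not None:
                    yield article["name"], html
```

Streaming the archive this way would avoid one API request per article, which is the main appeal of moving off the public APIs.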
Next Steps
- I am heading on leave for the next few weeks and would like to bring in more technical folks from the Wikimedia Enterprise side to help run a proper process. I am cc'ing @HShaikh and @Protsack.stephan from the Wikimedia engineering team. They will follow up.
- @Kelson, if you could look through our docs, which I think answer the questions in this issue on GitHub about the namespaces we offer, and document anything else you might need, that could kick off the conversation.
- Use this ticket (or I am happy to do a call, etc.) to communicate what might be missing. I am happy to take a look at whether there are reasonable things we can add to make this work.
Other notes
- The conversation also lives here, as an issue on the MWOffliner source code.