Tags: wikimedia/ws-export
Check for imageinfo rather than missing or invalid Because shared-repo images are reported as missing, do not check for that status. And to make things simpler, use the same handling for invalid ones as well, and just check to see if any imageinfo metadata has been returned (that's all we're interested in, after all). This is a follow-up to GH #512. Bug: T372956
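The check described in this commit can be illustrated with a minimal Python sketch. It is not the tool's own PHP code: the function name `first_imageinfo` and the sample page objects are hypothetical, and the only assumption is that a page object carries an `imageinfo` list when image metadata is available.

```python
from typing import Any, Mapping, Optional


def first_imageinfo(page: Mapping[str, Any]) -> Optional[Mapping[str, Any]]:
    """Return the first imageinfo entry of a page object, or None if absent.

    The `missing` and `invalid` flags are deliberately ignored: the only
    question is whether any image metadata came back.
    """
    infos = page.get("imageinfo") or []
    return infos[0] if infos else None


# Hypothetical sample: a file on a shared repository, flagged as missing
# on the local wiki but still carrying usable metadata.
shared = {
    "title": "File:Cover.jpg",
    "missing": True,
    "imageinfo": [{"url": "https://upload.example.org/Cover.jpg"}],
}

# Hypothetical sample: a file that does not exist anywhere.
absent = {"title": "File:Nope.jpg", "missing": True}
```

With this approach the shared-repo file is accepted even though it is flagged `missing`, while the nonexistent file is rejected because no metadata was returned.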
Return early for cover images that are not found BookProvider::getCover() already returned early when no pages were returned, so this switches to API formatversion 2 and also checks for the `missing` or `invalid` parameter in the first/only returned page. Bug: T370257
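As a rough Python illustration of the early return described here (not the actual `BookProvider::getCover()` code): with `formatversion=2`, the MediaWiki API returns `query.pages` as a list and reports `missing` and `invalid` as booleans. The function name `usable_cover_page` and the sample responses are hypothetical.

```python
from typing import Any, Mapping, Optional


def usable_cover_page(response: Mapping[str, Any]) -> Optional[Mapping[str, Any]]:
    """Return the first page of a formatversion=2 response, or None.

    None is returned when no pages came back, or when the first/only page
    is flagged as missing or invalid.
    """
    pages = response.get("query", {}).get("pages", [])
    if not pages:
        return None
    page = pages[0]
    if page.get("missing") or page.get("invalid"):
        return None
    return page


# Hypothetical responses for illustration only.
found = {"query": {"pages": [{"pageid": 1, "title": "File:Cover.jpg"}]}}
not_found = {"query": {"pages": [{"title": "File:Nope.jpg", "missing": True}]}}
bad_title = {"query": {"pages": [{"title": "File:<>", "invalid": True}]}}
empty = {"query": {"pages": []}}
```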
Handle unqualified image src attrs Add the domain name to image src values that don't have one, to support WikiHiero. Also move the protocol-relative handling out of the PageParser so that it can go along with the new domain name adding in one place. Bug: T354242
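A minimal Python sketch of the idea, assuming that protocol-relative and domain-less `src` values can both be resolved against the wiki's own origin. The helper name `qualify_src` and the example paths are hypothetical; the tool itself is written in PHP and may handle this differently.

```python
from urllib.parse import urljoin


def qualify_src(src: str, domain: str, scheme: str = "https") -> str:
    """Resolve an image src against a wiki's origin.

    Handles three cases in one place:
    - fully qualified URLs are returned unchanged,
    - protocol-relative URLs (//host/path) gain the scheme,
    - domain-less paths (/path) gain both the scheme and the domain.
    """
    return urljoin(f"{scheme}://{domain}/", src)


# Illustrative inputs only; the paths are made up for this example.
relative = qualify_src("/w/extensions/wikihiero/img/hiero_A1.png", "en.wikisource.org")
protocol_relative = qualify_src("//upload.wikimedia.org/a/b.png", "en.wikisource.org")
absolute = qualify_src("https://example.org/c.png", "en.wikisource.org")
```

Resolving every value through the same function keeps the protocol-relative and missing-domain cases together, which mirrors the consolidation the commit describes.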
Throw exception for rejected page requests (#511) This returns to the previous behaviour of throwing a WsExportException when a wiki page is not found. Also changes the error message to call it a 'page' rather than a 'book', because it could apply to a subpage or the root-level page, and it isn't referring to the whole book. Bug: T280329
Limit the concurrency when fetching pages For a production test case, BookProvider::getPages() was opening 3492 concurrent connections to the Wikimedia servers. That exceeds the default maximum file descriptor limit, and there is a risk that the tool will be blocked by WMF sysadmins. So, limit the concurrency to 10 connections.
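The general technique of capping concurrent requests can be sketched in Python with a semaphore. This is an illustration of the pattern only, not the mechanism used in `BookProvider::getPages()`; the names `fetch_all` and `fetch`, and the default limit parameter, are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def fetch_all(
    titles: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    limit: int = 10,
) -> List[str]:
    """Fetch every title, allowing at most `limit` requests in flight.

    `fetch` is a caller-supplied coroutine function (hypothetical here)
    that retrieves a single page. Results keep the order of `titles`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded(title: str) -> str:
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await fetch(title)

    return await asyncio.gather(*(bounded(title) for title in titles))
```

All tasks are still created up front, but only `limit` of them hold a connection at any moment, so the number of open file descriptors stays bounded regardless of how many pages the book contains.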