MediaWiki is the wiki engine behind Wikipedia, all Wikimedia projects and thousands of other websites. It is cutting-edge free software for building feature-rich websites that anybody can edit. MediaWiki-hosted content can be made available for offline use through the Collection extension (written in PHP). The Collection extension makes it easy to create collections (selections) of articles, so-called books; here is how it works on the English Wikipedia. Once created, books can be exported to PDF. The PDF export backend itself is not provided by the Collection extension; it is handled by a JavaScript-based solution called OCG. OCG is a Node.js daemon that can transform a book definition into a PDF, and it should be able to do the same for the ZIM format. The ZIM format stores web pages (with images, videos, etc.) in a single, highly compressed file; these pages can then be read anywhere with a reader such as Kiwix. A stub solution has already been written, and mwoffliner is already functional. This task is mostly about merging these two pieces of code.
Description
Details
- Reference: bz71660
Status | Assigned | Task
---|---|---
Declined | None | T73660 Add ZIM format support to OCG
Invalid | Kelson | T96235 Create a Debian package of zimwriterfs
Declined | None | T113736 Include stylesheets and javascript in `mw-ocg-bundler`
Resolved | Arlolra | T69540 Produce/preserve the metadata about additional ResourceLoader modules required by extension tags
Resolved | marcoil | T73490 Parsoid should set the prop parameter when calling API action=expandtemplates
Resolved | marcoil | T86902 Improve Parsoid's loading of CSS modules using ResourceLoader
Invalid | Minervaxox | T115907 Outreachy proposal for T73660: Add ZIM format support to OCG
Declined | None | T114788 OCG should download resourceLoader js/css dependencies
Declined | Adishaporwal | T116482 Outreachy proposal for Add ZIM format support to OCG
Event Timeline
Here's what we have:
- Skeleton code to parse CLI options, unpack bundles, interface with OCG in https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-zim_renderer
- It is missing the code which actually transforms the HTML for local viewing, but this part can be borrowed from http://sourceforge.net/p/kiwix/other/ci/master/tree/mwoffliner/
- After it writes the standalone HTML tree on disk, you would invoke zimwriterfs to actually create the ZIM file (but see T96235: Create a Debian package of zimwriterfs).
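The last step above, invoking zimwriterfs on the standalone HTML tree, can be sketched as follows. This is a minimal Python illustration, not the zim_renderer's code (which is Node.js); the helper names `build_zimwriterfs_command` and `write_zim` are hypothetical, and the metadata options are taken from the zimwriterfs usage text as I recall it. Option names have changed between zimwriterfs releases, so check `zimwriterfs --help` for the installed version.

```python
import subprocess
from pathlib import Path


def build_zimwriterfs_command(html_dir, zim_path, *, welcome, favicon,
                              language, title, description, creator,
                              publisher):
    """Build the argv for a zimwriterfs call (hypothetical helper).

    Assumption: zimwriterfs accepts these long metadata options followed by
    the HTML directory and the output ZIM path; verify with --help.
    """
    return [
        "zimwriterfs",
        f"--welcome={welcome}",          # entry page, relative to html_dir
        f"--favicon={favicon}",          # icon, relative to html_dir
        f"--language={language}",        # assumed ISO 639-3 code, e.g. "eng"
        f"--title={title}",
        f"--description={description}",
        f"--creator={creator}",
        f"--publisher={publisher}",
        str(html_dir),
        str(zim_path),
    ]


def write_zim(html_dir, zim_path, **metadata):
    """Run zimwriterfs on an already-rendered HTML tree (hypothetical helper)."""
    cmd = build_zimwriterfs_command(Path(html_dir), Path(zim_path), **metadata)
    # check=True raises CalledProcessError if zimwriterfs exits non-zero.
    subprocess.run(cmd, check=True)
```

For example, `write_zim("book_html", "book.zim", welcome="index.html", favicon="favicon.png", language="eng", title="My book", description="A test book", creator="Wikipedia", publisher="Kiwix")` would pack the directory into `book.zim`. Keeping argument construction separate from execution makes the command easy to log and test without the binary installed.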
IIRC, the last time I looked at the code, some tweaks to https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-bundler might be needed as well. I already added one in 55aa6bea33e29053b76b2043d2c96bcb2f4f1964, since the zimwriter backend needs to rewrite redirects. I believe there were other minor issues involving stylesheets and the like; for example, the Parsoid DOM includes a stylesheet URL, but we don't actually fetch it in the bundler. (In this case a better solution would be to use the API to query the actual style modules needed, instead of just stashing the result of ResourceLoader; see T69540: Produce/preserve the metadata about additional ResourceLoader modules required by extension tags.) I'm happy to do the mw-ocg-bundler side of this work; just create Phabricator tickets for specific items and link them here as blockers.
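Querying the API for the style modules a page needs, rather than stashing ResourceLoader output, might look roughly like this. It is a minimal Python sketch, assuming MediaWiki's `action=parse` with `prop=modules` (whose response lists `modules` and `modulestyles`) and ResourceLoader's `load.php` with `only=styles`; the helper names are hypothetical, and the real bundler (mw-ocg-bundler) is Node.js. The network request itself is left out so the logic stays testable.

```python
from urllib.parse import urlencode


def parse_modules_params(title):
    """Query parameters asking action=parse for ResourceLoader module metadata."""
    return {
        "action": "parse",
        "page": title,
        "prop": "modules",
        "format": "json",
        "formatversion": "2",
    }


def style_modules(parse_response):
    """Collect module names whose styles a bundler would fetch.

    Assumption: styles may come from both 'modulestyles' and 'modules',
    so this returns the sorted, de-duplicated union of the two lists.
    """
    parse = parse_response.get("parse", {})
    return sorted(set(parse.get("modulestyles", [])) | set(parse.get("modules", [])))


def load_php_styles_url(wiki_base, modules):
    """URL asking ResourceLoader's load.php for only the CSS of the given modules."""
    query = urlencode({"modules": "|".join(modules), "only": "styles"})
    return f"{wiki_base}/load.php?{query}"
```

A bundler would send `parse_modules_params(title)` to the wiki's `api.php`, pass the decoded JSON to `style_modules`, and then fetch `load_php_styles_url("https://en.wikipedia.org/w", names)` to store the CSS alongside the book.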
If you want to feature this project idea at https://www.mediawiki.org/wiki/Outreachy/Round_11, please edit the description to add the mentors, required skills, and microtasks for candidates. Thank you!
@cscott, would you be interested in / would time allow mentoring this as an internship project for Outreachy?
I am moving this to Outreachy-Round-11, as the project description has two mentors and microtasks and looks ready for the 11th edition of Outreachy (Dec 2015 to Mar 2016). Potential candidates should start by submitting their proposals as blockers of this task by November 2.
Feel free to revert this if the task has issues that might block its completion in this Outreachy round.
I would like to work on this project as part of Outreachy Round 12 / GSoC 2016. I am fairly good at PHP and know some Node.js. I have read about the ZIM format and OCG. Over the next couple of days I intend to look at the existing stub implementation and, in parallel, solve the microtasks. @cscott, would you be willing to mentor this project?
This project has been discussed quite a bit. @cscott, would you be willing to mentor it for Outreachy-13 (Dec 6 to March 6)?
Hi all,
I am a software engineering student and I am quite new to Wikimedia.
While browsing the possible projects, I read through this one and it seems very interesting. I would like to take on this project during GSoC '17. @cscott, if you agree, I can go ahead and look at the project in depth. Thanks
@Aklapper @Eugene233 On our side this is still quite important, even though we cannot focus on it due to a lack of resources. I have posted a comment along those lines here: https://phabricator.wikimedia.org/T146757#2959943. That said, unlike OCG, the electron-renderer effort seems narrowly focused and seems to offer little opportunity for reuse with other formats.
Electron will never support ZIM, AFAIK. I think OCG is still the only option for actual *offline* collection creation.
@cscott Thx for confirming my feeling.
"mwoffliner" is now available as an npm module, so it can be used directly and easily in OCG.
We are currently fixing the problem of mocking ResourceLoader for offline use in mwoffliner, and we are also switching to the mobile layout. This should be finished in a few weeks.
Then it would be smart to move away from calling the zimwriterfs binary and use node-libzim directly. Once that is done, it should be relatively easy to bring ZIM export to OCG.
The "New Reader" and "global reach" teams are quite supportive of this feature AFAIK, and it is important to the Kiwix project too. It looks like we just need to gather enough supporters to secure dev resources to "finish the job".
I'm a dev resource and willing to work on this task but I cannot work out whether it's currently parked. I am new to this community and would appreciate being steered in the right direction. @cscott can you help?
@Inveteratransmog: Hi and welcome! :) Wikimedia plans to replace OCG on its servers and OCG might get archived. Hence I would not recommend spending time on this task. I'm not sure about the exact state of ZIM plans - the Kiwix folks or the WMF Readers team might know best? :-/
As already announced in Tech News, OfflineContentGenerator (OCG) will not be used anymore after October 1st, 2017 on Wikimedia sites. OCG will be replaced by Electron. You can read more on mediawiki.org.
Is this still a valid task, and should it still be tagged Possible-Tech-Projects given @Aklapper's comment above?