The following is a proposal, pending the outcome of a consultation with the Wikimedia communities: https://www.mediawiki.org/wiki/Reading/Web/PDF_Rendering
= Summary
We would like to create a plan that results in a single service for rendering PDFs. OCG has not aged well, and rather than support two solutions, the Foundation needs to focus on one, which will allow for a better user experience and simpler development and maintenance.
== Goal Visibility
This represents a response to community interest in an improved PDF solution, demonstrated in the wishlist of Wikimedia Deutschland's TCB team (T135643), and necessitated by their work. Investing in PDFs is strategically aligned with WMF's [[ https://meta.wikimedia.org/wiki/New_Readers | "New Readers" project ]]. The New Readers project is Product's [[ https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2016-2017/Final#Program_3:_Increase_our_global_reach_by_increasing_readership | program #3, Goal 3 on the 2016-2017 annual plan ]], and it has identified offline access as one of three barriers to Wikipedia access that it aims to address.
= Rationale
As documented in the comment by @cscott (https://en.wikipedia.org/wiki/Wikipedia_talk:Offline_Content_Generator) and in various other places, OCG was built quickly to replace code from an outside organization, PediaPress, and has had scaling and architectural issues.
The OCG service does the following:
# converts wikitext pages to LaTeX-formatted PDF and plain text. In the past, it has also supported ZIM, EPUB and possibly more
# as part of that conversion, applies an attractive layout where print.css does not provide one
# when integrated with the [[ https://www.mediawiki.org/wiki/Extension:Collection | Collection extension ]], collates articles selected by a user into books and creates a table of contents
OCG is currently not well supported by the WMF, and difficulties with LaTeX have disabled table rendering in PDFs. LaTeX is a fairly brittle framework that is not well suited to our incredibly flexible content types. Furthermore, bugs in OCG or the [[ https://www.mediawiki.org/wiki/Extension:Collection | Collection extension ]] have greatly diminished the third use of OCG (creating books).
There was significant desire from the community for a LaTeX alternative for single-article PDF rendering (captured in T135643), and we are providing this via a new service called Electron. Some of the decision making around Electron is captured in T134205. Development and implementation of this service is currently underway. The WMF’s Operations team has strong reservations about running both services at once, given the heavy overlap in functionality, particularly when one of them is not well supported. They have kindly, and rightly, asked us to transfer the remaining features from OCG to Electron and sunset OCG.
= Success Metrics
- Readers and contributors do not lose essential or popular functionality
- Operations only has one service to support
= External Dependencies
This will take effort from Wikimedia Deutschland TCB and the following WMF teams:
- reading infrastructure
- reading web
- community liaisons
- services
- operations
= Unknowns
- we have a solid overview of what readers' needs are around PDF creation, but lack nuance and edge cases
= EPIC Plan
# Stage 1, in parallel, Dec - March, 2017
- turn on the Electron alternative to OCG to allow tables in PDFs, per community wishlist #9 (T135643)
- improve print CSS so that default PDFs are more attractive (T135022#2672465)
- measure user preference for new vs. old PDFs (T150326)
- introduce community to the implications of proposal and ask for feedback (T146757)
# Stage 2, April - May, 2017
- replicate collation of articles into a single PDF within "book creator" using Electron, restoring the core missing functionality (pending)
- identify OCG uses that we have missed via community consultation (T146757)
# Stage 3, May - Jul, 2017
- act on above results
- communicate sunsetting (an announcement following the consultation in the earlier stage)
# Stage 4, August 2017
- retire OCG service
== Probable drawbacks
- currently there are no plans to continue to support the two-column layout favored by LaTeX
- currently there are no plans to continue to support plain-text conversion, or EPUB or ZIM (the latter two are currently not supported by OCG)
== Metrics Implementation
- Current usage of APIs is <1% of pageviews, which we consider significant
- measure user preference for new vs. old PDFs (T150326)
- Current usage of book creator is very limited
== Updated timeline
**August - September 2017**
- Complete building article concatenation: https://phabricator.wikimedia.org/T171838
- Create post-processing library for table of contents and page numbers: https://phabricator.wikimedia.org/T171960
- Add an option in Special:Book to download PDFs generated by ElectronPdfService: https://phabricator.wikimedia.org/T173018
- Update styles for books
- Deploy new book renderer side-by-side with OCG: https://phabricator.wikimedia.org/T171833
-- In this stage, we will allow 1-2 weeks to ensure that the new book renderer functions well and that no major issues are observed.
- We will also inform the community that we will be replacing OCG with the new renderer, initially without full table of contents and page number functionality: https://phabricator.wikimedia.org/T174147
- Remove OCG as an option in Special:Book to download PDFs (https://phabricator.wikimedia.org/T174148)
- Sunset OCG service
- [Spike - 8 hours] How should the PDF post-processing script be exposed for use by Extension:Collection (https://phabricator.wikimedia.org/T171965)
**October, 2017**
- Expose PDF post-processing scripts (https://phabricator.wikimedia.org/T173579)
- Use PDF post-processing library to generate final PDF (https://phabricator.wikimedia.org/T173015)
- Deploy post-processing library
## Delivery Estimate
September 30th is our deadline for turning off OCG