Hi folks,
A change from our legacy PDF export infrastructure to a more maintainable system has been a long time coming. Thanks to the work of C. Scott Ananian, Matt Walker, Max Semenik, Brad Jorsch, and the Parsoid/Services teams, we're close to disabling the old PDF rendering and enabling the new as default.
The old "mwlib" (ReportLab-based) PDF service will be disabled on September 29 [1], and the new "ocg" (Parsoid/XeLaTeX-based) PDF service will be enabled on the same date. The output looks very different (two-column LaTeX-generated output), and there are a few more customization options (visible when you use the "Create a book" feature). This renderer has much improved rendering of non-Latin scripts and fixes many issues of the old PDF service.
There are still a number of bugs to work through, which we will do as a low priority. Most noticeably, tables are a mess - and since we have such a large variety of them, that's a pretty long tail. This is not a high-priority project for us. We're looking for co-maintainers, and we have some ideas for how to improve the architecture to make it easy for the community to optimize for different outputs -- if you're interested in joining the development effort, let us know via the new services list ( https://lists.wikimedia.org/mailman/listinfo/services ), wikitech-l or on #mediawiki-services on irc.freenode.net.
Please report any bugs against the "OCG" product in Bugzilla. https://bugzilla.wikimedia.org/enter_bug.cgi?product=OCG
If you want to test sooner, before deployment, you can add this one-liner to your common.js/global.js; it points the "Download as PDF" link in the sidebar to the new renderer:
$('#coll-download-as-rl a').each(function() { this.href = this.href.replace(/([&?]writer=)rl(&|$)/g, "$1rdf2latex$2") });
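To see what that regex does without touching your user scripts, here is a minimal sketch that applies the same replacement to a plain string. The function name rewriteWriterParam and the example.org URLs are made up for illustration; only the regex and the "rl" / "rdf2latex" writer names come from the one-liner above.

```javascript
// Hypothetical helper (not part of the Collection extension): applies the
// same regex as the one-liner above to a URL string instead of a DOM link.
function rewriteWriterParam(href) {
  // Match "writer=rl" only when it is a whole query parameter, i.e. preceded
  // by "?" or "&" and followed by "&" or the end of the string.
  return href.replace(/([&?]writer=)rl(&|$)/g, "$1rdf2latex$2");
}

// Made-up URLs, purely for illustration.
console.log(rewriteWriterParam("https://example.org/w/index.php?bookcmd=render_article&writer=rl"));
console.log(rewriteWriterParam("https://example.org/w/index.php?writer=rl&arttitle=Foo"));
console.log(rewriteWriterParam("https://example.org/w/index.php?writer=rlx"));
```

The last call should leave the URL untouched, since "rlx" is not the "rl" writer; that is what the trailing (&|$) group guards against.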
As part of this change, we will disable ZIM and EPUB export for the time being. If you're interested in working on ZIM or EPUB support for the new offline content generator, or other export formats, please let us know via the above channels.
Thanks, Erik
[1] See https://wikitech.wikimedia.org/wiki/Deployments for the planned deployment window
The new PDF export is indeed ready for wide testing now; I'm working on it at https://etherpad.wikimedia.org/p/BugTriage-mwlib
Erik Moeller, 25/09/2014 04:01:
> As part of this change, we will disable ZIM and EPUB export for the time being. If you're interested in working on ZIM or EPUB support for the new offline content generator, or other export formats, please let us know via the above channels.
This is... extremely sad. I've tried for years to get some volunteers to help with Collection's ePub export (even just by debugging), but it's extremely hard when they have no control over what gets integrated nor over the fate of the feature: they just feel it's not their thing. I watched with frustration as people spent time on WSexport (a custom, fragile tool); I guess they were right, and I a fool. One thing, however, I don't understand. The ePub and ZIM generation causes negligible load; was PediaPress asked whether they'd be able to run that service on another server of their own? OK, WMF must close Tampa and eventually wants to run export in house; but until there is a replacement, it makes most sense to keep continuity, and we need some default service to include in the extension for third-party users.
Nemo
On Thu, Sep 25, 2014 at 6:36 AM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
> I've tried for years to get some volunteers to help with Collection's ePub export (even just by debugging), but it's extremely hard when they have no control on what gets integrated nor on the fate of the feature: they just feel it's not their thing.
We've felt the same way. ;-) PediaPress did a great job building this infrastructure on our behalf, including even their own parser in Python long before Parsoid. But ... other than the PP folks, very few people are familiar with the internals of the mwlib pipeline, and to the extent that we are, we've badly wanted to change/improve it to bring it in line with recent developments (e.g. make it properly maintainable from an operations standpoint, actually leverage Parsoid, etc.).
This has changed now -- OfflineContentGenerator has a modern architecture, in line with our plans for other services, and supported (at a low priority) by WMF. Consequently, we really want the Collection extension and the ocg project to become more accessible as an open source project, and we're serious about supporting that. So if folks want to work on stuff like EPUB in a serious way, we'll help. But we'll refrain from band-aid solutions that may actually make people feel like it's not something where help is wanted/needed.
ePUB and ZIM are likely to return. The new architecture makes it quite easy to add "html-like" backends, and both epub and zim are similar in that sense. I hacked on an initial zim backend at https://github.com/cscott/mw-ocg-zimwriter during Wikimania with Emmanuel. But I am rather overcommitted at WMF (in addition to my "real job" at the Parsoid project, where I'm in theory working on language converter issues this quarter, I'm also creating/maintaining the OCG service and lending some real-time collaboration help to the Visual Editor team). Hence Erik's help-wanted. --scott
wikitech-ambassadors@lists.wikimedia.org