[Wikitech-l] Status of the new PDF Renderer

18 Jan 2014


      All,
We've just finished our second sprint on the new PDF renderer. A
significant chunk of renderer development time this cycle was on non latin
script support, as well as puppetization and packaging for deployment. We
have a work in progress pipeline up and running in labs which I encourage
everyone to go try and break. You can use the following featured articles
just to see what our current output is:
    * http://ocg-collection-alpha.wmflabs.org/index.php/Alexis_Bachelot
    *
http://ocg-collection-alpha.wmflabs.org/index.php/Atlantis:_The_Lost_Empire
Some other articles imported on that test wiki:
    * http://ur1.ca/gg0bw
Please note that some of these will fail due to known issues noted below.
You can render any page in the new renderer by clicking the sidebar link
"Download as WMF PDF"; if you "Download as PDF" you'll be using the old
renderer (useful for comparison.) Additionally, you can create full books
via Special:Book -- our renderer is "RDF to Latex (PDF)" and the old
renderer is "e-book (PDF)". You can also try out the "RDF to Text (TXT)"
renderer, but that's not on the critical path. As of right now we do not
have a bugzilla project entry so reply to this email, or email me directly
-- we'll need one of: the name of the page, the name of the collection, or
the collection_id parameter from the URL to debug.
There are some code bits that we know are still missing that we will have
to address in the coming weeks or in another sprint.
    * Attribution for images and text. The APIs are done, but we still need
to massage that information into the document.
    * Message translation -- right now all internal messages are in English
which is not so helpful to non English speakers.
    * Things using the <cite> tag and the Cite extension are not currently
supported (meaning you won't get nice references.)
    * Tables may not render at all, or may break the renderer.
    * Caching needs to be greatly improved.
Looking longer term into deployment on wiki, my plans right now are to get
this into beta labs for general testing and connect test.wikipedia.org up
to our QA hardware for load testing. The major blocker there is acceptance
of the Node.JS 0.10, and TexLive 2012 packages into reprap, our internal
aptitude package source. This is not quite as easy as it sounds, we already
use TexLive 2009 in production for the Math extension and we must apply
thorough tests to ensure we do not introduce any regressions when we update
to the 2012 package. I'm not sure what actual dates for those migrations /
testing will be because it greatly depends on when Ops has time. In the
meantime, our existing PDF cluster based on mwlib will continue to serve
our offline needs. Once our solution is deployed and tested, mwlib
(pdf[1-3]) will be retired here at the WMF and print on demand services
will be provided directly by PediaPress servers.
For the technically curious; we're approximately following the parsoid
deployment model -- using trebuchet to push out a source repository
(services/ocg-collection) that has the configuration and node dependencies
built on tin along with git submodules containing the actual service code.
It may not look like it on the surface, but we've come a long way and it
wouldn't have been possible without the (probably exasperated) help from
Jeff Green, Faidon, and Ori. Also big thanks to Brad and Max for their
work, and Gabriel for some head thunking. C. Scott and I are not quite off
the hook yet, as indicated by the list above, but hopefully soon enough
we'll be enjoying the cake and cookies from another new product launch.
(And yes, even if you're remote if I promised you cookies as bribes I'll
ship them to you :p)
~Matt Walker

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

[Wikitech-l] Status of the new PDF Renderer