All,
We've just finished our second sprint on the new PDF renderer. A significant chunk of renderer development time this cycle went to non-Latin script support, as well as puppetization and packaging for deployment. We have a work-in-progress pipeline up and running in labs, which I encourage everyone to go try and break. You can use the following featured articles to see what our current output looks like:
* http://ocg-collection-alpha.wmflabs.org/index.php/Alexis_Bachelot
* http://ocg-collection-alpha.wmflabs.org/index.php/Atlantis:_The_Lost_Empire
Some other articles imported on that test wiki:
* http://ur1.ca/gg0bw
Please note that some of these will fail due to known issues noted below.
You can render any page in the new renderer by clicking the sidebar link "Download as WMF PDF"; if you click "Download as PDF" you'll be using the old renderer (useful for comparison). Additionally, you can create full books via Special:Book -- our renderer is "RDF to Latex (PDF)" and the old renderer is "e-book (PDF)". You can also try out the "RDF to Text (TXT)" renderer, but that's not on the critical path. As of right now we do not have a Bugzilla project entry, so please reply to this email or email me directly -- to debug, we'll need one of: the name of the page, the name of the collection, or the collection_id parameter from the URL.
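If you'd rather not pick the collection_id out of the address bar by hand, here is a minimal sketch for pulling it from a copied URL. It assumes the id appears as an ordinary query-string parameter named collection_id; the example URL below is made up for illustration, and the helper name is hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def extract_collection_id(url):
    """Return the collection_id query parameter from a URL, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("collection_id")
    return values[0] if values else None


# Hypothetical URL shape, for illustration only; the real query string may differ.
example = (
    "http://ocg-collection-alpha.wmflabs.org/index.php"
    "?title=Special:Book&bookcmd=rendering&collection_id=abc123"
)
print(extract_collection_id(example))
```

Pasting the printed value into your reply alongside the page or collection name should be enough for us to find the job.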
There are some code bits that we know are still missing and that we will have to address in the coming weeks or in another sprint:
* Attribution for images and text. The APIs are done, but we still need to massage that information into the document.
* Message translation -- right now all internal messages are in English, which is not so helpful to non-English speakers.
* Things using the <cite> tag and the Cite extension are not currently supported (meaning you won't get nice references).
* Tables may not render at all, or may break the renderer.
* Caching needs to be greatly improved.
Looking longer term at deployment on wiki, my plans right now are to get this into beta labs for general testing and to connect test.wikipedia.org up to our QA hardware for load testing. The major blocker there is acceptance of the Node.js 0.10 and TeX Live 2012 packages into reprap, our internal apt package source. This is not quite as easy as it sounds: we already use TeX Live 2009 in production for the Math extension, and we must test thoroughly to ensure we do not introduce any regressions when we update to the 2012 package. I'm not sure what the actual dates for those migrations and tests will be, because that depends heavily on when Ops has time. In the meantime, our existing PDF cluster based on mwlib will continue to serve our offline needs. Once our solution is deployed and tested, mwlib (pdf[1-3]) will be retired here at the WMF, and print-on-demand services will be provided directly by PediaPress servers.
For the technically curious: we're approximately following the Parsoid deployment model -- using Trebuchet to push out a source repository (services/ocg-collection) that has the configuration and node dependencies built on tin, along with git submodules containing the actual service code.
It may not look like it on the surface, but we've come a long way, and it wouldn't have been possible without the (probably exasperated) help from Jeff Green, Faidon, and Ori. Also, big thanks to Brad and Max for their work, and to Gabriel for some head thunking. C. Scott and I are not quite off the hook yet, as indicated by the list above, but hopefully soon enough we'll be enjoying the cake and cookies from another new product launch. (And yes, even if you're remote, if I promised you cookies as bribes I'll ship them to you :p)
~Matt Walker