Is there a particular reason why OutputPage::addModuleScripts,
OutputPage::addModuleStyles, ParserOutput::addModuleScripts, and
ParserOutput::addModuleStyles do not resolve module dependencies? Or
is this just something nobody ever bothered to implement?
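
For readers unfamiliar with the term, "resolving module dependencies" means expanding a requested module into itself plus everything it transitively depends on, in load order. The following is only a minimal Python sketch of that idea, not ResourceLoader's actual code; the registry and its module names are hypothetical.

```python
def resolve_dependencies(requested, registry):
    """Return the requested modules plus all transitive dependencies,
    ordered so that each module comes after the modules it depends on."""
    ordered = []
    visiting = set()  # modules on the current recursion path
    done = set()      # modules already emitted

    def visit(name):
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"circular dependency involving {name!r}")
        if name not in registry:
            raise KeyError(f"unknown module {name!r}")
        visiting.add(name)
        for dep in registry[name]:
            visit(dep)
        visiting.discard(name)
        done.add(name)
        ordered.append(name)

    for name in requested:
        visit(name)
    return ordered


# Hypothetical registry: module name -> list of direct dependencies.
REGISTRY = {
    "ext.example.gallery": ["ext.example.base", "ext.example.icons"],
    "ext.example.base": ["ext.example.util"],
    "ext.example.icons": [],
    "ext.example.util": [],
}

if __name__ == "__main__":
    print(resolve_dependencies(["ext.example.gallery"], REGISTRY))
```

A method that skips this step would load only "ext.example.gallery" itself and leave its three dependencies out.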
If you were following our planning process this past spring/summer, you
probably heard that we had planned to deploy both OAuth and OpenID by the
end of 2013.
The good news is that we were able to complete our OAuth deployment (see
Dan Garry's blog post on the subject). The bad news is that we weren't
able to get OpenID complete enough for deployment, so we've decided to put
OpenID on hold to make room for some of our other work. Thomas Gries has
done some great work on this, and has been very responsive to our input
(we're a finicky bunch!). We fully anticipate being able to deploy this
sometime in 2014, but we don't yet have an exact plan for making this
happen.
We've communicated this through other channels, but hadn't broadcast it
here, so we're a bit overdue for an update.
The Auth Systems page now has the latest information on what we were able
to do, and will have more on our future plans as we make them: <
 "OAuth now available on Wikimedia wikis" Dan Garry: <
For those who are not aware, the WMF is currently attempting to replace the
backend renderer for the Collection extension (mwlib). This is the renderer
that creates the PDFs for the 'Download to PDF' sidebar link and creates
books (downloadable in multiple formats and printable via PediaPress), using
Special:Book. We're taking the data centre migration as our cue to replace
mwlib for several reasons, high among them the desire to use Parsoid
to do the parsing from wikitext into something usable by an external tool
-- mwlib currently does this conversion internally. This should allow us to
solve several other long-standing mwlib issues with respect to the
rendering of non-Latin languages.
Last week we started work on the new parser, which we're calling the
'Collection Offline Content Generator' or OCG-C. Today I can say that where
we are is promising but by no means complete. We as yet only have basic
support for rendering articles, and a lot of complex articles are failing
to render. For the curious, we have an alpha product and a public
coordination / documentation page  -- you can also join us in
In broad strokes, our solution is an LVS-fronted Node.js backend cluster
with a Redis job queue. Bundling (content gathering from the wiki) and
rendering are two distinct processes with an intermediate file in
between. Any renderer should be able to pick the intermediate file up and
produce output. We will store bundle files and generated documents
under a short timeout in Swift, and have a somewhat longer frontend cache
period in Varnish for the final documents. Deployments will be happening
via Trebuchet, and Node dependencies are stored in a separate git
repository -- much like Parsoid and eventually Mathoid.
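
To make the bundle/render split concrete, here is a rough Python sketch of the two stages with a file in between. It is an illustration only: the real service is Node.js with a Redis queue, whereas this uses an in-memory queue, a JSON file as a stand-in for the actual bundle format, and hypothetical function names throughout.

```python
import json
import queue
import tempfile
from pathlib import Path


def bundle(job_id, titles, fetch, workdir):
    """Stage 1: gather content for each title and write one bundle file."""
    data = {"articles": [{"title": t, "content": fetch(t)} for t in titles]}
    path = Path(workdir) / f"{job_id}.bundle.json"
    path.write_text(json.dumps(data), encoding="utf-8")
    return path


def render_text(bundle_path):
    """Stage 2: a toy renderer that reads only the bundle file."""
    data = json.loads(Path(bundle_path).read_text(encoding="utf-8"))
    parts = [f"{a['title']}\n{a['content']}" for a in data["articles"]]
    return "\n\n".join(parts)


def run_jobs(jobs, fetch, workdir):
    """Drain the job queue, running bundle then render for each job."""
    results = []
    while True:
        try:
            job_id, titles = jobs.get_nowait()
        except queue.Empty:
            return results
        results.append(render_text(bundle(job_id, titles, fetch, workdir)))


def fake_fetch(title):
    # Stand-in for fetching parsed article content from the wiki.
    return f"(content of {title})"


if __name__ == "__main__":
    jobs = queue.Queue()  # in-memory stand-in for the Redis job queue
    jobs.put(("job1", ["Foo", "Bar"]))
    with tempfile.TemporaryDirectory() as workdir:
        for document in run_jobs(jobs, fake_fetch, workdir):
            print(document)
```

Because the renderer never talks to the wiki, a different renderer can be swapped in as long as it understands the bundle file.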
The Foundation is still partnering with PediaPress to provide print on
demand books. However, bundling and rendering will in future be performed
on their servers.
The team will continue to work on this project over the coming weeks. Big
mileposts in no particular order are table support, puppetization into beta
labs, load testing, and multilingual support. Our plan is to have something
that the community can reliably beta test soon, with final deployment into
production probably happening in early January. Decommissioning of the
old servers is expected to happen by late January, so that's our hard
deadline to wrap things up.
Big thanks to Max, Scott, Brad & Jeff for all their help so far, and to
Faidon, Ryan and other ops team members for their support.
If you'd like to help, ping me on IRC, and you'll continue to find us on
~ Matt Walker
 More detail available at
 The format is almost exactly the same as the format mwlib uses, just
with RDF instead of HTML
 Right now the alpha solution only has a LaTeX renderer, but we have
plans for a native HTML renderer (both for PDF and epub) and the ZIM
community has been in contact with us about their RDF to ZIM renderer.
 Mathoid is the LaTeX math renderer that Gabriel wrote which will run on
the same servers as this service. Both fall under this nebulous category
of Node-based 'Offline Content Generators'.
 I'm being hazy here because we have other duties to other teams again.
Tuesdays until Jan 1 are my dedicated days for working on this project, and
I return to it full time come Jan 1. Erik will reach out to
organize a follow-up sprint.
On Wed, Nov 27, 2013 at 9:52 AM, Bjoern Hassler <bjohas+mw(a)gmail.com> wrote:
> could I check whether this new process would pick up formatting inserted via
> css styles, e.g. attached to a <span> or <div>?
> On our mediawiki (http://www.oer4schools.org) we use a handful of different
> css styles to provide boxes for different types of text (such as facilitator
> notes or background reading). With the PediaPress tools this didn't render
> at all (because the wiki text was parsed directly), but also <blockquote>
> and table background colors did not render nicely, leaving us very few
> options for highlighting blocks of text. (See here for an example of two
> types of boxed text:
The current plan is for the LaTeX renderer *NOT* to pick up CSS
styles, in general. The LaTeX renderer will be a 'semantic renderer'
-- it will normalize the formatting to make it conform to house style.
It will be tuned to the needs of the Wikipedias. It knows about
certain CSS classes and Templates, but is not particularly extensible
...which is why it won't be the only backend! We also expect to have
an "HTML" renderer, which will apply CSS styles to the Parsoid output
and render to PDF via PhantomJS (aka WebKit).
This gives you two options, "faithful" and "beautiful". In my
experience so far, the LaTeX output, when it works, produces superior
output -- the typesetting is better, the ligatures and non-Latin
support ought to be superior, the justification is nicer, and math
rendering should be stellar. We also use a two column layout and
normalize figure sizes to match the column widths, which helps
maintain a clean appearance. However, as you have noted, the LaTeX
renderer isn't particularly extensible, and there are cases where we
need to preserve the author's styling even at the cost of somewhat
less 'clean' output. Some articles can't easily be shoehorned into
our 'house style'. The Parsoid->HTML->webkit->PDF render path should
be a good solution in these cases, even if (for instance) the
paragraph justification and page splitting aren't quite as pretty.
(Browser technology continues to improve; one day it may be possible
to make the HTML->PDF pipeline just as pretty. So the "faithful"
approach is also our "forward-looking" renderer.)
Our architecture allows multiple 'backends' to be plugged in, so it is
possible there could be other options as well. I hope to refactor the
LaTeX backend at some point, for instance, to make it more extensible
so that you could in theory add special 'tweaks' for your wiki's
"house style". I could also add a CSS engine so that the LaTeX
backend could pick up certain CSS styles -- like table background
color, for instance.
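
As a rough illustration of what pluggable backends could look like, here is a minimal Python sketch. It is not the project's code: the registry, the backend names, and the article dictionary are all hypothetical, and the two renderers only mimic the "beautiful" (normalize to house style) and "faithful" (keep author styling) behaviours.

```python
RENDERERS = {}  # backend name -> renderer function


def register(name):
    """Decorator that plugs a renderer into the registry under a name."""
    def decorator(func):
        RENDERERS[name] = func
        return func
    return decorator


@register("beautiful")
def render_semantic(article):
    # Ignores author styling and normalizes to a house style.
    return f"[house style] {article['text']}"


@register("faithful")
def render_styled(article):
    # Preserves whatever styling the author attached.
    style = article.get("style")
    prefix = f"[{style}] " if style else ""
    return prefix + article["text"]


def render(article, backend):
    """Look up the named backend and render the article with it."""
    try:
        renderer = RENDERERS[backend]
    except KeyError:
        raise ValueError(f"no such backend: {backend!r}") from None
    return renderer(article)


if __name__ == "__main__":
    boxed = {"text": "Facilitator notes", "style": "background: yellow"}
    print(render(boxed, "beautiful"))
    print(render(boxed, "faithful"))
```

Adding another backend then means registering one more function, without touching the callers.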
It's all a work in progress, of course! But the
"faithful"/"beautiful" split is the principle we're working with.
tl;dr - How do we add 3rd party libs to core: composer, git submodules, or
copying the code?
So I have a question to discuss concerning MW core that I was hoping to get
some feedback on: what is our policy on including third-party libraries in
core?
To clarify, by policy I don't mean what factors do we take into account
when deciding to include a library (although feel free to weigh in on that
if you want to say something), but rather how one would go about doing it.
Here are the possibilities:
1) Use Composer to install dependencies
2) Use git submodules to store a reference to the repository
3) Copy the code and add a note somewhere of where it came from
(If I am missing an option, please enlighten me.)
My opinion on the matter is that option 1 is probably the best, primarily
because Composer was designed specifically for this purpose, and it is
widely used and is unlikely to randomly disappear in the near future. Also,
it makes the incorporation of these libraries trivial, since the autoloader
will be automatically registered using Composer. However, the method is not
without fault. A recent patch to core actually removed our composer.json
file, in hopes of allowing MediaWiki sysadmins to make their own custom
composer.json file so that extensions could be installed that way. Which is
more important: better maintenance of core dependencies, or allowing easier
extension installation? I don't know; that's for us to decide. I'm a bit
conflicted on the matter because I really do want to make extension
installation and management easier, but at the same time making sure the
core itself is easy to use should probably be a higher priority.
The next option is much like Composer in that you have a
reference to some external code that will be downloaded when told to do so
by the user. However, it's different from Composer in a number of ways: 1)
when packaging tarballs, the code has to be downloaded anyway since
submodules are git-specific, and 2) we have to manage the autoloader
manually. Not too bad. If we decide the Composer option is not viable, I
think this would be a good alternative.
I don't like the final option at all, but it seems to be our current
approach. It's basically the same thing as git submodules except rather
than having a clear reference to where the code came from and where we can
update it, we have to add a README or something explaining it.
Also, just to clarify, this is not an out-of-the-blue request for comment.
I am currently considering whether we might want to replace our
HttpFunctions file with the third-party Guzzle library, since the latter is
very stable, much more functional, and a lot easier to use. However,
this is out-of-scope for the discussion, so if you have an opinion on
whether doing this is a good/bad idea, please start another thread.
Stevens Institute of Technology, Class of 2016
Major in Computer Science