On Wed, Nov 27, 2013 at 9:52 AM, Bjoern Hassler <bjohas+mw@gmail.com> wrote:
Could I check whether this new process would pick up formatting applied via CSS styles, e.g. attached to a <span> or <div>?
On our MediaWiki (http://www.oer4schools.org) we use a handful of different CSS styles to provide boxes for different types of text (such as facilitator notes or background reading). With the PediaPress tools these didn't render at all (because the wiki text was parsed directly), and <blockquote> and table background colors did not render nicely either, leaving us very few options for highlighting blocks of text. (See here for an example of two types of boxed text: http://orbit.educ.cam.ac.uk/wiki/OER4Schools/ICTs_in_interactive_teaching.)
The current plan is for the LaTeX renderer *NOT* to pick up CSS styles, in general. The LaTeX renderer will be a 'semantic renderer' -- it will normalize the formatting to make it conform to a house style. It will be tuned to the needs of the Wikipedias. It knows about certain CSS classes and templates, but is not particularly extensible...
...which is why it won't be the only backend! We also expect to have an "HTML" renderer, which will apply CSS styles to the Parsoid output and render to PDF via PhantomJS (aka WebKit).
This gives you two options, "faithful" and "beautiful". In my experience so far, the LaTeX renderer, when it works, produces superior output -- the typesetting is better, the ligatures and non-Latin support ought to be superior, the justification is nicer, and math rendering should be stellar. We also use a two-column layout and normalize figure sizes to match the column widths, which helps maintain a clean appearance. However, as you have noted, the LaTeX renderer isn't particularly extensible, and there are cases where we need to preserve the author's styling even at the cost of somewhat less 'clean' output. Some articles can't easily be shoehorned into our 'house style'. The Parsoid->HTML->WebKit->PDF render path should be a good solution in these cases, even if (for instance) the paragraph justification and page splitting aren't quite as pretty. (Browser technology continues to improve; one day it may be possible to make the HTML->PDF pipeline just as pretty. So the "faithful" approach is also our "forward-looking" renderer.)
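To make the "semantic" idea a bit more concrete, here is a toy sketch in Python of how such a renderer might treat CSS classes. It is an illustration only, not the actual renderer code: the class names, the environment names, and the render_block function are all made up for this example.

```python
# Hypothetical mapping from CSS classes the renderer "knows about" to
# house-style LaTeX environments. None of these names come from the real code.
KNOWN_CLASSES = {
    "infobox": "wikiinfobox",
    "quotebox": "quote",
}


def render_block(css_class, text):
    """Render one block of text the 'semantic' way.

    A known class is normalized to a fixed house-style environment.
    Any other class keeps its text but loses its custom styling.
    (LaTeX escaping of the text is omitted to keep the sketch short.)
    """
    env = KNOWN_CLASSES.get(css_class)
    if env is None:
        # Unknown class, e.g. a wiki-specific "facilitator-note" box:
        # the content survives, the box does not.
        return text
    return "\\begin{%s}\n%s\n\\end{%s}" % (env, text, env)
```

A wiki-specific box class falls through to plain text here, which is the kind of case where the HTML path would be the better fit.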
Our architecture allows multiple 'backends' to be plugged in, so it is possible there could be other options as well. I hope to refactor the LaTeX backend at some point, for instance, to make it more extensible so that you could in theory add special 'tweaks' for your wiki's "house style". I could also add a CSS engine so that the LaTeX backend could pick up certain CSS styles -- like table background color, for instance.
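Roughly, the pluggable-backend idea could look like the sketch below. This is Python for illustration only and does not reflect the real code; the registry, the backend names, and the function signatures are all assumptions, and the backends return a description string where a real one would produce a PDF.

```python
# Hypothetical registry of render backends, keyed by a short name.
BACKENDS = {}


def register_backend(name):
    """Decorator that adds a render function to the registry under `name`."""
    def decorator(func):
        BACKENDS[name] = func
        return func
    return decorator


@register_backend("beautiful")
def latex_backend(document):
    # Stand-in for the LaTeX path: normalize to house style, then typeset.
    return "Parsoid -> LaTeX -> PDF: " + document["title"]


@register_backend("faithful")
def html_backend(document):
    # Stand-in for the HTML path: apply the wiki's CSS, then print via WebKit.
    return "Parsoid -> HTML -> WebKit -> PDF: " + document["title"]


def render(document, backend="beautiful"):
    """Dispatch to the chosen backend, failing clearly on an unknown name."""
    try:
        renderer = BACKENDS[backend]
    except KeyError:
        raise ValueError("unknown backend: %r" % (backend,))
    return renderer(document)
```

With this shape, a wiki with its own house style would register one more function under a new name without touching the existing backends.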
It's all a work in progress, of course! But the "faithful"/"beautiful" split is the principle we're working with. --scott