Hello, I did some investigation into how to compile MediaWiki to LaTeX. In this email I will discuss only the problems caused by the fact that MediaWiki uses Unicode, and how to use Unicode with LaTeX.
1) First, Unicode uses the same codepoint for different glyphs in Chinese, Japanese, and Korean (Han unification). Wikipedia has special templates to work around this problem, but there are many cases where these templates are not used, so this is essentially an unsolvable problem. In LaTeX all the needed glyphs are available, but if you only have the codepoint you cannot know which one to choose.
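To illustrate point 1, here is a minimal sketch for xelatex or lualatex. The font names are assumptions (the Noto CJK fonts would have to be installed, and any pair of language-specific CJK fonts should do), and U+76F4 is only one commonly cited example of a unified codepoint whose usual glyph differs between Japanese and Chinese.

```latex
% Compile with xelatex or lualatex; both read UTF-8 input directly.
\documentclass{article}
\usepackage{fontspec}
% Assumed fonts, chosen for illustration only.
\newfontfamily\jafont{Noto Serif CJK JP}
\newfontfamily\zhfont{Noto Serif CJK SC}
\begin{document}
% The same codepoint, U+76F4, typeset with two different fonts.
Japanese form: {\jafont 直}

Chinese form: {\zhfont 直}
\end{document}
```

The source file contains the identical character twice. Only the explicit font switch selects the glyph, and that is exactly the information missing when the wiki text does not use the language templates.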
2) There are currently three good LaTeX compilers: pdflatex, xelatex, and lualatex. I think it is hard to choose one, because each of them has a significant disadvantage. One point to understand here is microtype. It applies tiny adjustments to glyphs to get better margins and better line breaking, which is very often done in professionally printed books, but only the pdflatex and lualatex compilers can do it; xelatex cannot.
pdflatex basically cannot handle Unicode. I made it handle Unicode by hacking the CJK package, but this requires a specially hacked font. That is legal under the GPL, but it is still a hack and will surely never make it into Debian. I had a long discussion with the developer of the CJK package, and essentially we did not find any way to make pdflatex handle Unicode in a way acceptable to Debian.
The remaining compiler is lualatex. The version in the current Ubuntu release does not allow changing fonts, but the version in current Debian testing does. There, however, it consumes a little more than 1 GB of RAM when changing fonts. Other users report this as well, and it does not seem to be a memory leak.
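As a rough illustration of the microtype point, the following sketch is meant for lualatex. The protrusion and expansion options are the package's switches for margin adjustment and font expansion; the sample text is made up, and the default font is used so that no particular font installation is assumed.

```latex
% Compile with lualatex.
\documentclass{article}
\usepackage{fontspec}   % Unicode font loading under lualatex
% Enable character protrusion (margins) and font expansion (line breaking).
\usepackage[protrusion=true,expansion=true]{microtype}
\begin{document}
This paragraph only exists to give the line breaker something to do.
With microtype loaded, hyphens and punctuation at the end of a line may
hang slightly into the margin, and glyph widths may be stretched or
shrunk by a very small amount to avoid bad breaks.
\end{document}
```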
So what choices are there:
1) A weird hack -> pdflatex
2) No microtype -> xelatex
3) 1 GB memory consumption and Debian testing -> lualatex
If you can decide on one of these options, I will work towards an official Debian package that does that. I personally prefer lualatex.
Yours,
Dirk