I'm trying to render an image which uses characters from all of the languages supported by WP. Is there a single font deployed on production servers that includes all scripts?
The Autonym font includes characters for all the languages supported by MediaWiki, but only a small subset: https://www.mediawiki.org/wiki/Universal_Language_Selector/AutonymFont
Ryan Kaldari
On Wed, Jul 30, 2014 at 7:48 AM, C. Scott Ananian cananian@wikimedia.org wrote:
In general, "one font to rule them all" is highly discouraged/impractical as a means to achieve reasonable results in a variety of world languages. Indic fonts, for example, typically contain complex shaping engines in bytecode -- it's just not practical to try to write one engine for everything. All of the "one font with wide coverage" attempts that I have seen look okay for Latin languages, but fail for the rest of the world.
<rant>...which is typically the opinion inadvertently expressed by these "characters from every language" projects anyway. Without real knowledge of the rest of the world's languages and scripts, we get something that shows that the creator valued the rest of the world only for "looking exotic", and was not interested in true understanding.</rant>
Most modern font systems have a mechanism to merge multiple fonts under one virtual name as needed in order to get good coverage. So you don't need to find a single font to rule them all. --scott
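As a rough illustration of that merging mechanism, here is a minimal Python sketch of a per-character fallback chain. The font names and the coverage table are hypothetical placeholders, not fonts deployed anywhere; a real font system reads coverage from the font files themselves.

```python
# Hypothetical coverage table: font name -> inclusive codepoint ranges.
# These names and ranges are illustrative assumptions only.
COVERAGE = {
    "LatinFont": [(0x0000, 0x024F)],
    "DevanagariFont": [(0x0900, 0x097F)],
    "HanFont": [(0x4E00, 0x9FFF)],
}

# Order of preference when several fonts cover the same character.
FALLBACK_CHAIN = ["LatinFont", "DevanagariFont", "HanFont"]


def covers(font, ch):
    """Return True if the (hypothetical) font has a glyph for ch."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in COVERAGE[font])


def pick_font(ch, chain=FALLBACK_CHAIN):
    """Return the first font in the chain that covers ch, or None."""
    for font in chain:
        if covers(font, ch):
            return font
    return None


def split_runs(text, chain=FALLBACK_CHAIN):
    """Split text into (font, substring) runs using the fallback chain."""
    runs = []
    for ch in text:
        font = pick_font(ch, chain)
        if runs and runs[-1][0] == font:
            # Same font as the previous character: extend the current run.
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs
```

Calling `split_runs` on a mixed string yields one run per stretch of characters served by the same font. Note that this sketch selects purely by codepoint and knows nothing about the language of the text.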
ps. "Font synthesis" systems actually have a big problem in that parts of the Unicode character space are shared by different languages with different rules for shaping, ligatures, etc. So you really need to explicitly annotate the language and then choose a font specific to that *language*, not rely simply on codepoint. (Unfortunately, much of the "foreign language" content in Wikipedia, i.e. short texts which are not in the main language of the wiki, is not explicitly annotated with language information.)
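To make the language-versus-codepoint point concrete, here is a small Python sketch that picks a font from an explicit language tag rather than from the characters themselves. The table contents and the default font are assumptions chosen for illustration, not a description of what any Wikimedia service does.

```python
# Illustrative language-tag -> font table (keys lowercased).
# The specific pairings are assumptions for this example.
LANG_FONTS = {
    "ja": "Noto Sans CJK JP",
    "zh-hans": "Noto Sans CJK SC",
    "hi": "Noto Sans Devanagari",
}

# Assumed fallback when no language annotation is available.
DEFAULT_FONT = "DejaVu Sans"


def font_for_lang(lang, table=LANG_FONTS, default=DEFAULT_FONT):
    """Pick a font from a language tag such as 'zh-Hans-CN'.

    Tries the full tag first, then drops trailing subtags one at a
    time ('zh-hans-cn' -> 'zh-hans' -> 'zh') before giving up.
    """
    if not lang:
        return default
    parts = lang.lower().split("-")
    while parts:
        tag = "-".join(parts)
        if tag in table:
            return table[tag]
        parts.pop()
    return default


# U+76F4 is one codepoint shared by Chinese and Japanese, so the
# codepoint alone cannot say which font is appropriate.
shared = "\u76f4"
font_ja = font_for_lang("ja")
font_zh = font_for_lang("zh-Hans-CN")
font_unknown = font_for_lang(None)
```

The same character ends up with a different font depending on the annotation, and an unannotated span can only get the default, which is exactly the gap described above.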
pps. for those actually interested in getting the details of world writing systems correct, I could use some help with the new OCG PDF rendering backend, which just went live in production yesterday. It uses XeLaTeX, which actually does pay careful attention to Indic shaping and ligatures, etc, but it is not a "modern system" as described above in terms of synthesizing coverage from multiple fonts. Patches would be helpful to make better guesses about the native language of "foreign language" spans, which would then ensure an appropriate font was used.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l