On Tue, Jun 23, 2015 at 10:48 AM, Ori Livneh ori@wikimedia.org wrote:
Hello,
Over the course of the next two days, a major update to the SyntaxHighlight_GeSHi extension will be rolled out to Wikimedia wikis. The change swaps geshi, the unmaintained PHP library which performs the lexical analysis and output formatting of code, for another library, called Pygments.
The roll-out will remove support for 31 languages while adding support for several hundred languages not previously supported, including Dart, Rust, Julia, APL, Mathematica, SNOBOL, Puppet, Dylan, Racket, Swift, and many others. See https://people.wikimedia.org/~ori/geshi_changes.txt for a full list.
A very welcome list of additions!
The languages that will lose support are mostly obscure, with the notable exception of ALGOL68, Oz, and MMIX.
There are a few more exceptions, such as bnf, dot, pcre, email, and bibtex.
Hmm, algol support was recently added to Pygments.
https://bitbucket.org/birkenfeld/pygments-main/issue/1090/update-to-m2-lexer...
Perhaps it needs to be backported into their stable branch, and a new minor release pushed out for use on Wikimedia?
Since it is short, here is the full list of languages being de-supported.
6502acme 68000devpac algol68 arm avisynth bibtex bnf cil dot e email euphoria gml ldif lolcode mirc mmix mpasm oz parigp pcre pic16 pli q robots sas teraterm typoscript unicon whois xpp
Of the English Wikipedia articles about those languages that I have looked at so far, all use <source> with the appropriate language set, so they will all regress to plain monospaced text.
Have you identified how many times each of these has been used on Wikimedia projects? Perhaps that might help us identify the priority of languages needing to be added to Pygments (and which ones are entirely useless).
Some of them already have bugs raised against Pygments, e.g. https://bitbucket.org/birkenfeld/pygments-main/issue/1024/add-dot-lexer-grap...
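For the usage question above, a rough count could be pulled from page wikitext along these lines. This is only a sketch: it assumes the wikitext is already available as a string (for example from a dump), that the language is set via a lang attribute on <source> or <syntaxhighlight> tags, and the function name count_dropped_languages is made up for illustration.

```python
import re
from collections import Counter

# The 31 languages losing support, copied from the list above.
DROPPED = set(
    "6502acme 68000devpac algol68 arm avisynth bibtex bnf cil dot e email "
    "euphoria gml ldif lolcode mirc mmix mpasm oz parigp pcre pic16 pli q "
    "robots sas teraterm typoscript unicon whois xpp".split()
)

# Assumption: languages are given as lang="x", lang='x' or lang=x on
# <source> or <syntaxhighlight> tags.
TAG_RE = re.compile(
    r'<(?:source|syntaxhighlight)\b[^>]*?\blang\s*=\s*["\']?([\w+-]+)',
    re.IGNORECASE,
)


def count_dropped_languages(wikitext):
    """Count tags in wikitext whose lang is one of the dropped languages."""
    counts = Counter()
    for match in TAG_RE.finditer(wikitext):
        lang = match.group(1).lower()
        if lang in DROPPED:
            counts[lang] += 1
    return counts
```

Summing the counters over all pages and sorting with most_common() would give a priority order for lexers worth adding to Pygments.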
Lastly, the way the extension handles unfamiliar languages will change. Previously, if the specified language was not supported by the extension, instead of a code block, the extension would print an error message. From now on, it will simply output a plain, unhighlighted block of monospaced code.
Ugh, is there a way to configure Pygments to have fallbacks for languages which are substantially based on another? For example, xpp is basically Java, and it looked quite good when I tested it on betalabs. I am sure that Pygments has some lexer close to 'email', as they do support an 'http session' language.
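To make the fallback idea concrete, here is a minimal sketch of the lookup I have in mind. It is not how the extension or Pygments works today; the table, the function name resolve_language and the "text" placeholder for plain output are all assumptions for illustration, and only the xpp to java pairing comes from my test above.

```python
# Hypothetical fallback table: dropped language -> closest supported lexer.
# Only xpp -> java is taken from this thread; other pairings would need
# to be checked by hand before being added.
FALLBACKS = {"xpp": "java"}

# Assumed name for "no highlighting, plain monospaced block".
PLAIN_TEXT = "text"


def resolve_language(requested, supported, fallbacks=FALLBACKS):
    """Pick the lexer name to use for a requested language.

    Order: exact match, then a configured fallback, then plain text.
    """
    name = requested.strip().lower()
    if name in supported:
        return name
    base = fallbacks.get(name)
    if base in supported:
        return base
    return PLAIN_TEXT
```

With supported = {"java", "python"}, a request for xpp resolves to java, while whois has no entry in the table and ends up as plain text, which matches the new default behaviour described above.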