Brian wrote:
Thanks for your answers.
ParserFunctions are my specific example of how the current development process is very, very broken and out of touch with the community. According to Jimbo's user page (bolding his): "*Any changes to the software must be gradual and reversible.* We need to make sure that any changes contribute positively to the community, as ultimately determined by everybody in Wikipedia, in full consultation with the community consensus."
I believe that the introduction of ParserFunctions to MediaWiki was not done with community consensus and has led to an extremely fast devolution in wiki syntax. Further, the usability of Wikipedia has declined at a rate proportional to the adoption of parser functions.
The evolution of templates, and then ParserFunctions, was led by community demand and was widely encouraged by the community. I was concerned about the usability implications of ParserFunctions, but the community demonstrated its intent to ignore any usability concerns by implementing complex templates, very similar to the ones seen today, using the parameter default mechanism alone. Resistance to this trend seemed very weak.
The decline of usability in the template namespace has been driven by technically-minded editors who are proud of their ability to make use of an arcane and cryptic syntax to produce ever more complex feats of text processing. This is an editorial issue and I cannot accept responsibility for it.
However, I am aware that I enabled this process, by implementing the few simple features that they needed. I regret my role in it. That's one of the reasons why I've been resisting the constant community pressure to enable StringFunctions, which I believe will lead to compiler-like functionality implemented in the template namespace. Instead, I've been trying to steer development in the direction of a readable embedded programming language.
If you want a wiki with infoboxes (and I suppose I do, since I wrote one of them in the pre-template era using an Excel VBA macro), then I suppose we need some form of template feature. The problem with present-day parser functions is that they are terribly ugly, excessively punctuated, dense to the point of unreadability, and offer very limited commenting and self-documentation.
I believe that the solution to this problem lies in borrowing concepts from software engineering, such as variables, functions, minimally parenthesized programming languages, libraries, objects, etc. I know that many template programmers cannot program in a traditional programming language, but I have a feeling they could if they wanted to. I certainly find PHP programming much easier than template programming, after a few years of familiarity with both.
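To illustrate the direction I have in mind, here is a purely hypothetical sketch (this is not an existing MediaWiki feature, just an illustration): the sort of logic that today gets buried in nested #if/#expr calls, written instead as a small function with named variables and an ordinary comment.

    <?php
    // Hypothetical illustration only, not an existing MediaWiki feature: the
    // kind of infobox logic usually written with nested {{#if:}}/{{#expr:}},
    // expressed as a small readable function with named variables.
    function populationDensityRow( $population, $areaKm2 ) {
        if ( $population === null || $areaKm2 === null || $areaKm2 <= 0 ) {
            return '';                            // omit the row entirely
        }
        $density = $population / $areaKm2;        // people per square kilometre
        return '| Density || ' . number_format( $density, 1 ) . "/km²\n";
    }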
I'm also aware that most (non-template) Wikipedia editors have no desire to learn how to program, and do not believe that it should be necessary in the course of editing articles. I think that with enough development time, a suitable platform in MediaWiki could connect these two types of editors. For example, there could be an easy-to-use form-based template invocation generator, with forms written by the same technically-minded editors who write arcane templates today. Citations could be inserted into articles by invoking a popup box and entering text into clearly labelled form fields.
From another post: We do not even have a parser. I am sure you know that MediaWiki does not actually parse. It is 5000 lines' worth of regexes, for the most part.
"Parser" is a convenient and short name for it.
I've reviewed all of the regexes, and I stand by the vast majority of them. The PCRE regular expression module is a versatile text scanning language, which is compiled to bytecode and executed in a VM, very much like PHP. It just so happens that for most text processing tasks where there is a choice between PHP and PCRE, PCRE is faster. In certain special cases, it's possible to gain extra performance by using primitive text scanning functions like strpos() which are implemented in C. Where this is possible, I have done so. But if you want to, say, find the first match from a list of strings in a single subject, searching from a given offset, then the fastest way to do it in standard PHP is a regex with the /S modifier.
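To make that concrete, here is a minimal sketch of the two approaches to that particular task (this is not the actual MediaWiki code, just an illustration of the trade-off):

    <?php
    // Minimal sketch, not the MediaWiki source: find the position of the first
    // match from a list of literal strings in $subject, starting at $offset.

    // Approach 1: one strpos() call per needle, then take the minimum.
    // Each call scans the subject separately, so the cost grows with the
    // number of needles.
    function firstMatchStrpos( $subject, array $needles, $offset ) {
        $best = false;
        foreach ( $needles as $needle ) {
            $pos = strpos( $subject, $needle, $offset );
            if ( $pos !== false && ( $best === false || $pos < $best ) ) {
                $best = $pos;
            }
        }
        return $best;
    }

    // Approach 2: a single alternation with the /S (study) modifier; one
    // compiled pattern scans the subject and returns the leftmost match.
    function firstMatchRegex( $subject, array $needles, $offset ) {
        $quoted = array();
        foreach ( $needles as $needle ) {
            $quoted[] = preg_quote( $needle, '/' );
        }
        $pattern = '/' . implode( '|', $quoted ) . '/S';
        if ( preg_match( $pattern, $subject, $m, PREG_OFFSET_CAPTURE, $offset ) ) {
            return $m[0][1];    // byte offset of the first match
        }
        return false;
    }

Both return the same offset; the second keeps all the scanning inside the compiled PCRE pattern, which is where the speed advantage comes from.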
In two cases, I found the available algorithms accessible from standard PHP to be inconveniently slow, so I wrote the FSS and wikidiff2 extensions in C and C++ respectively.
Perhaps, like so many computer science graduates, you are enamoured with the taxonomy of formal grammars and the parsers that go with them. There are a number of problems with these traditional solutions.
Firstly, there are theoretical problems. The concept of a regular grammar is not versatile enough to describe languages such as XML, and not descriptive enough to allow unambiguous parse tree production from a language like wikitext. It's trivial to invent irregular grammars which can nonetheless be processed in linear time. My aims for wikitext, namely that it be easy for humans to write but fast to convert to HTML, do not coincide well with the taxonomy of formal grammars.
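To illustrate the linear-time point with a toy example (deliberately not wikitext): the language of balanced brace pairs is not regular, since recognising it requires unbounded counting, yet a single counter decides membership in one pass.

    <?php
    // Toy example, not wikitext: balanced braces form a non-regular language,
    // but one integer counter recognises it in a single linear scan.
    function isBalanced( $text ) {
        $depth = 0;
        $len = strlen( $text );
        for ( $i = 0; $i < $len; $i++ ) {
            if ( $text[$i] === '{' ) {
                $depth++;
            } elseif ( $text[$i] === '}' ) {
                $depth--;
                if ( $depth < 0 ) {
                    return false;   // a closing brace with no matching opener
                }
            }
        }
        return $depth === 0;        // every opener must have been closed
    }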
Secondly, there are practical problems. Past projects attempting to parse wikitext using flex/bison or similar schemes have failed to achieve the performance of the present parser, which is surprising because I didn't think I was setting the bar very high. You can bet that if I ever rewrote it in C++ myself, it would be much faster. The PHP compiler community is currently migrating away from LALR towards re2c, a regex-based lexer generator, mostly for performance reasons.
Thirdly, there is the fact that certain phases of MediaWiki's parser are already very similar to the textbook parsers and can be analysed in those terms. The main difference is that our parser is better optimised. For example, the preprocessor acts like a recursive descent parser, but with a non-recursive frontend (using an internal stack), a caching phase, and a parse tree expansion phase with special-case recursive-to-iterative transformations to minimise stack depth.
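For a rough idea of what I mean by a non-recursive frontend, here is a heavily simplified toy (not the real Preprocessor code): it builds a tree of nested {{ ... }} regions by pushing and popping an explicit stack instead of making recursive calls.

    <?php
    class ToyNode {
        public $children = array();   // literal strings or nested ToyNode objects
    }

    // Heavily simplified toy, not the real Preprocessor: build a tree of nested
    // {{ ... }} regions with an explicit stack instead of recursion.
    function toyPreprocess( $text ) {
        $root = new ToyNode();
        $stack = array( $root );              // innermost open node is last
        $literal = '';
        $len = strlen( $text );
        for ( $i = 0; $i < $len; ) {
            $top = $stack[ count( $stack ) - 1 ];
            if ( substr( $text, $i, 2 ) === '{{' ) {
                if ( $literal !== '' ) {
                    $top->children[] = $literal;   // flush accumulated text
                    $literal = '';
                }
                $node = new ToyNode();
                $top->children[] = $node;
                $stack[] = $node;                  // push: descend without recursing
                $i += 2;
            } elseif ( substr( $text, $i, 2 ) === '}}' && count( $stack ) > 1 ) {
                if ( $literal !== '' ) {
                    $top->children[] = $literal;
                    $literal = '';
                }
                array_pop( $stack );               // pop: back to the enclosing node
                $i += 2;
            } else {
                $literal .= $text[$i];             // ordinary character
                $i++;
            }
        }
        if ( $literal !== '' ) {
            $stack[ count( $stack ) - 1 ]->children[] = $literal;
        }
        return $root;                              // unclosed braces simply stay open
    }

The real thing also has the caching and expansion phases mentioned above; the point here is only the stack in place of recursion.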
Yet another post:
I don't believe a computer scientist would have a huge problem writing a proper parser. Are any of the core developers computer scientists?
Frankly, as an ex-physicist, I don't find the field of computer science particularly impressive, either in terms of academic rigour or practical applications. I think my time would be best spent working as a software engineer for a cause that I believe in, rather than going back to university and studying another socially-disconnected field.
-- Tim Starling