[Foundation-l] Why is the software out of reach of the community?

Gazimoff gazimoff at o2.co.uk
Tue Jan 13 18:29:14 UTC 2009


As a qualified software engineer, I'm inclined to agree with you on a  
number of points. I apologize for my brevity, but I am somewhat  
restricted by the use of a mobile device.

After downloading, installing and maintaining a low-traffic MediaWiki
setup, I think the experience can be improved dramatically. It's clear
that a great deal of work has gone into improving processing speed and
adding functionality, but in terms of a polished experience it doesn't
match publishing tools like phpBB or WordPress. Whether this is due to
versatility requirements, or to a lack of focus on polish as a design
requirement, I'm not sure.

On the subject of templates, they can be very much a black art.
Templates are seldom commented to describe their function, they import
other templates without making it clear, and they depend on extensions
without saying so. Coupled with this, there's no peer review of
templates as there is with articles. If a template performs the
required function, it is accepted and reused regardless of how clear or
efficient the underlying code is. While subjecting every high-use
template to a formal code review is a somewhat silly idea, I think that
you have three challenges on your hands.

The first is to make template writing more accessible, through the use  
of easily digestible starter documentation leading on to more complex  

The second is to encourage good use of templates, both through code
comments and through supplementary documentation subpages. Note that
the two are different: one tells you how it works while the other tells
you how to use it.

The third is to examine template parsing itself, with a view to  
revisiting the language and perhaps performing a refresh if appropriate.

Hope all this helps.


Sent from my iPhone

On 13 Jan 2009, at 16:13, Tim Starling <tstarling at wikimedia.org> wrote:

> Brian wrote:
>> Thanks for your answers.
>> ParserFunctions are my specific example of how the current
>> development process is very, very broken, and out of touch with the
>> community. According to Jimbo's user page (his bolded): "*Any changes
>> to the software must be gradual and reversible.* We need to make sure
>> that any changes contribute positively to the community, as
>> ultimately determined by everybody in Wikipedia, in full consultation
>> with the community consensus."
>> I believe that the introduction of ParserFunctions to MediaWiki was
>> not done with community consensus and has led to an extremely fast
>> devolution in wiki syntax. Further, the usability of Wikipedia has
>> declined at a rate proportional to the adoption of parser functions.
>
> The evolution of templates, and then ParserFunctions, was led by
> community demand and was widely encouraged by the community. I was
> concerned about the usability implications of ParserFunctions, but the
> community demonstrated its intent to ignore any usability concerns by
> implementing complex templates, very similar to the ones seen today,
> using the parameter default mechanism alone. Resistance to this trend
> seemed very weak.
>
> The decline of usability in the template namespace has been driven by
> technically-minded editors who are proud of their ability to make use
> of an arcane and cryptic syntax to produce ever more complex feats of
> text processing. This is an editorial issue and I cannot accept
> responsibility for it.
>
> However, I am aware that I enabled this process, by implementing the
> few simple features that they needed. I regret my role in it. That's
> one of the reasons why I've been resisting the constant community
> pressure to enable StringFunctions, which I believe will lead to
> compiler-like functionality implemented in the template namespace.
> Instead, I've been trying to steer development in the direction of a
> readable embedded programming language.
>
> If you want a wiki with infoboxes (and I suppose I do since I wrote
> one of them in the pre-template era using an Excel VBA macro), then I
> suppose we need some form of template feature. The problem with
> present-day parser functions is that they are terribly ugly,
> excessively punctuated, dense to the point of unreadability, with very
> limited commenting and self-documentation.
>
> I believe that the solution to this problem lies in borrowing concepts
> from software engineering, such as variables, functions, minimally
> parenthesized programming languages, libraries, objects, etc. I know
> that many template programmers cannot program in a traditional
> programming language, but I have a feeling they could if they wanted
> to. I certainly find PHP programming much easier than template
> programming, after a few years of familiarity with both.
>
> I'm also aware that most (non-template) Wikipedia editors have no
> desire to learn how to program, and do not believe that it should be
> necessary in the course of editing articles. I think that with enough
> development time, a suitable platform in MediaWiki could connect these
> two types of editors. For example there could be an easy-to-use
> form-based template invocation generator, with forms written by the
> same technically minded editors who write arcane templates today.
> Citations could be inserted into articles by invoking a popup box and
> entering text into clearly labelled form fields.
>
> From another post:
>> We do not even have a parser. I am sure you know that MediaWiki does
>> not actually parse. It is 5000 lines worth of regexes, for the most
>> part.
> "Parser" is a convenient and short name for it.
>
> I've reviewed all of the regexes, and I stand by the vast majority of
> them. The PCRE regular expression module is a versatile text scanning
> language, which is compiled to bytecode and executed in a VM, very
> much like PHP. It just so happens that for most text processing tasks
> where there is a choice between PHP or PCRE, PCRE is faster. In
> certain special cases, it's possible to gain extra performance by
> using primitive text scanning functions like strpos() which are
> implemented in C. Where this is possible, I have done so. But if you
> want to, say, find the first match from a list of strings in a single
> subject, searching from a given offset, then the fastest way to do it
> in standard PHP is a regex with the /S modifier.
>
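
To make the "first match from a list of strings" case concrete, here is
a minimal sketch in Python rather than PHP. It is an illustration only,
not MediaWiki code: the function name first_match and the sample text
are made up for this example, and Python's re module has no direct
equivalent of PCRE's /S (study) modifier, so the pattern is simply
compiled once.

```python
import re

def first_match(needles, subject, offset=0):
    """Return (position, needle) for the earliest match at or after
    offset, or None if no needle occurs. Hypothetical helper."""
    # Escape each needle so it is matched literally. Longer needles are
    # listed first so the longest alternative wins at a given position.
    ordered = sorted(needles, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in ordered))
    # search() accepts a start position, which plays the role of the
    # "given offset" in the description above.
    m = pattern.search(subject, offset)
    return (m.start(), m.group(0)) if m else None

text = "see [[Link]] and {{tpl}} and [[Other]]"
print(first_match(["{{", "[["], text))     # search from the beginning
print(first_match(["{{", "[["], text, 5))  # search from offset 5
```

A single alternation lets the regex engine scan the subject once,
instead of calling a find function once per needle and comparing the
positions afterwards.
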
> In two cases, I found the available algorithms accessible from
> standard PHP to be inconveniently slow, so I wrote the FSS and
> wikidiff2 extensions in C and C++ respectively.
>
> Perhaps, like so many computer science graduates, you are enamored
> with the taxonomy of formal grammars and the parsers that go with
> them. There are a number of problems with these traditional solutions.
>
> Firstly, there are theoretical problems. The concept of a regular
> grammar is not versatile enough to describe languages such as XML, and
> not descriptive enough to allow unambiguous parse tree production from
> a language like wikitext. It's trivial to invent irregular grammars
> which can be nonetheless processed in linear time. My aims for
> wikitext, namely that it be easy for humans to write but fast to
> convert to HTML, do not coincide well with the taxonomy of formal
> grammars.
>
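
One textbook illustration of the linear-time point (an example chosen
here, not one taken from the post): the language of correctly nested
braces is not regular, yet a single left-to-right pass with a counter
recognises it in time proportional to the input length. The function
name is_balanced is hypothetical.

```python
def is_balanced(text, open_ch="{", close_ch="}"):
    """Check that every close_ch has an earlier unmatched open_ch and
    that nothing is left open at the end. One pass, O(len(text))."""
    depth = 0
    for ch in text:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth < 0:
                # A closer appeared before its opener.
                return False
    return depth == 0

print(is_balanced("{{a}}{b}"))
print(is_balanced("{{a}"))
```
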
> Secondly, there are practical problems. Past projects attempting to
> parse wikitext using flex/bison or similar schemes have failed to
> achieve the performance of the present parser, which is surprising
> because I didn't think I was setting the bar very high. You can bet
> that if I ever rewrote it in C++ myself, it would be much faster. The
> PHP compiler community is currently migrating away from LALR towards a
> regex-based parser called re2c, mostly for performance reasons.
>
> Thirdly, there is the fact that certain phases of MediaWiki's parser
> are already very similar to the textbook parsers and can be analysed
> in those terms. The main difference is that our parser is better
> optimised. For example, the preprocessor acts like a recursive descent
> parser, but with a non-recursive frontend (using an internal stack), a
> caching phase, and a parse tree expansion phase with special-case
> recursive to iterative transformations to minimise stack depth.
>
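
The general idea of a non-recursive frontend with an internal stack can
be sketched as follows. This is a toy in Python that only handles
nested {{...}} pairs. It is not the MediaWiki preprocessor and does not
attempt its caching or expansion phases; the name parse_braces and the
tree shape (strings for literal text, lists for brace pairs) are
assumptions made for the example.

```python
def parse_braces(text):
    """Parse nested {{...}} into a tree without using recursion.

    Returns a list of nodes. A node is either a str (literal text) or
    a list (the contents of one {{...}} pair).
    """
    root = []
    stack = [root]  # explicit stack stands in for the call stack
    i = start = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "{{":
            if i > start:
                stack[-1].append(text[start:i])
            node = []
            stack[-1].append(node)
            stack.append(node)  # descend into the new node
            i += 2
            start = i
        elif pair == "}}" and len(stack) > 1:
            if i > start:
                stack[-1].append(text[start:i])
            stack.pop()  # return to the enclosing node
            i += 2
            start = i
        else:
            # Ordinary character, or an unmatched "}}" kept as literal.
            i += 1
    if start < len(text):
        stack[-1].append(text[start:])
    return root

print(parse_braces("a{{b{{c}}d}}e"))
```

Because nesting depth is held in an ordinary list rather than in
function calls, deeply nested input cannot exhaust the call stack while
the tree is being built.
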
> Yet another post:
>> I don't believe a computer scientist would have a huge problem
>> writing a proper parser. Are any of the core developers computer
>> scientists?
> Frankly, as an ex-physicist, I don't find the field of computer
> science particularly impressive, either in terms of academic rigour or
> practical applications. I think my time would be best spent working as
> a software engineer for a cause that I believe in, rather than going
> back to university and studying another socially-disconnected field.
>
> -- Tim Starling
