Tim,
As a qualified software engineer, I'm inclined to agree with you on a number of points. I apologize for my brevity, but I am somewhat restricted by the use of a mobile device.
After downloading, installing and maintaining a low-traffic MediaWiki setup, I think the experience could be improved dramatically. It's clear that a great deal of work has gone into improving processing speed and adding functionality, but in terms of polish it doesn't match publishing tools like phpBB or WordPress. Whether this is due to versatility requirements or a lack of focus on polish as a design goal, I'm not sure.
On the subject of templates, they can very much be a black art. Templates are seldom commented to describe their function, they import other templates without making it obvious, and they depend on extensions in ways that may not be apparent. Coupled with this, there's no peer review of templates as there is with articles. If a template performs the required function, it is accepted and reused regardless of how clear or efficient the underlying code is. While suggesting that every high-use template be subjected to a formal code review would be somewhat silly, I think you have three challenges on your hands.
The first is to make template writing more accessible, through easily digestible starter documentation leading on to more complex examples.
The second is to encourage good documentation of templates, both through code comments and supplementary documentation subpages. Note that the two are different - one tells you how it works, while the other tells you how to use it.
The third is to examine template parsing itself, with a view to revisiting the language and perhaps performing a refresh if appropriate.
Hope all this helps.
Gazimoff
Sent from my iPhone
On 13 Jan 2009, at 16:13, Tim Starling tstarling@wikimedia.org wrote:
Brian wrote:
Thanks for your answers.
ParserFunctions are my specific example of how the current development process is very, very broken, and out of touch with the community. According to Jimbo's user page (bolding his): "*Any changes to the software must be gradual and reversible.* We need to make sure that any changes contribute positively to the community, as ultimately determined by everybody in Wikipedia, in full consultation with the community consensus."
I believe that the introduction of ParserFunctions to MediaWiki was not done with community consensus and has led to an extremely fast devolution in wiki syntax. Further, the usability of Wikipedia has declined at a rate proportional to the adoption of parser functions.
The evolution of templates, and then ParserFunctions, was led by community demand and was widely encouraged by the community. I was concerned about the usability implications of ParserFunctions, but the community demonstrated its intent to ignore any usability concerns by implementing complex templates, very similar to the ones seen today, using the parameter default mechanism alone. Resistance to this trend seemed very weak.
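To make that concrete (a rough PHP analogue I'm adding purely for illustration, with the template syntax shown only in the comments): a parameter default like {{{image|}}} can itself contain further defaults, so defaults alone behave like a crude if/else.

    <?php
    // Rough PHP analogue of the parameter default mechanism. In a template,
    // {{{image|}}} expands to the "image" parameter if it was supplied and to
    // the empty string otherwise; because the fallback can itself contain more
    // {{{...|...}}} expressions, defaults alone act as a crude conditional.
    function expandWithDefault(array $params, string $name, string $default): string {
        return array_key_exists($name, $params) ? $params[$name] : $default;
    }

    // Emit an infobox row only when an image parameter was actually passed.
    $params = ['image' => 'Example.jpg'];
    $image  = expandWithDefault($params, 'image', '');
    echo $image !== '' ? "| image = $image\n" : '';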
The decline of usability in the template namespace has been driven by technically-minded editors who are proud of their ability to make use of an arcane and cryptic syntax to produce ever more complex feats of text processing. This is an editorial issue and I cannot accept responsibility for it.
However, I am aware that I enabled this process, by implementing the few simple features that they needed. I regret my role in it. That's one of the reasons why I've been resisting the constant community pressure to enable StringFunctions, which I believe will lead to compiler-like functionality implemented in the template namespace. Instead, I've been trying to steer development in the direction of a readable embedded programming language.
If you want a wiki with infoboxes (and I suppose I do since I wrote one of them in the pre-template era using an Excel VBA macro), then I suppose we need some form of template feature. The problem with present-day parser functions is that they are terribly ugly, excessively punctuated, dense to the point of unreadability, with very limited commenting and self-documentation.
I believe that the solution to this problem lies in borrowing concepts from software engineering, such as variables, functions, minimally parenthesized programming languages, libraries, objects, etc. I know that many template programmers cannot program in a traditional programming language, but I have a feeling they could if they wanted to. I certainly find PHP programming much easier than template programming, after a few years of familiarity with both.
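As a sketch of the kind of contrast I mean (the wikitext line is a made-up example, and the PHP is not a proposal for the actual embedded language):

    <?php
    // A made-up external-link formatter, first as nested ParserFunctions
    // (dense and heavily punctuated), then as an ordinary named function
    // with room for comments:
    //
    //   {{#if:{{{url|}}}|[{{{url}}} {{{title|{{{url}}}}}}]|{{{title|}}}}}
    //
    function formatLink(?string $url, ?string $title): string {
        if ($url === null || $url === '') {
            return $title ?? '';              // no URL: plain title, or nothing
        }
        $label = ($title !== null && $title !== '') ? $title : $url;
        return "[$url $label]";               // external-link markup
    }

    echo formatLink('https://example.org', 'Example'), "\n";  // [https://example.org Example]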
I'm also aware that most (non-template) Wikipedia editors have no desire to learn how to program, and do not believe that it should be necessary in the course of editing articles. I think that with enough development time, a suitable platform in MediaWiki could connect these two types of editors. For example there could be an easy-to-use form-based template invocation generator, with forms written by the same technically minded editors who write arcane templates today. Citations could be inserted into articles by invoking a popup box and entering text into clearly labelled form fields.
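A very rough sketch of how that might hang together (none of these names exist in MediaWiki; they are purely illustrative):

    <?php
    // Hypothetical sketch: a technically minded editor writes a parameter
    // description once, ordinary editors only ever see labelled form fields,
    // and the software writes the template invocation for them.
    $citeWebSpec = [
        'template' => 'cite web',
        'fields'   => [
            'url'        => 'Web address',
            'title'      => 'Page title',
            'accessdate' => 'Date you accessed the page',
        ],
    ];

    // Render labelled form fields from the spec.
    function renderForm(array $spec): string {
        $html = "<form>\n";
        foreach ($spec['fields'] as $name => $label) {
            $html .= '  <label>' . htmlspecialchars($label) . " <input name=\"$name\"></label>\n";
        }
        return $html . "</form>\n";
    }

    // Turn submitted values back into an ordinary template invocation.
    function buildInvocation(array $spec, array $values): string {
        $out = '{{' . $spec['template'];
        foreach (array_keys($spec['fields']) as $name) {
            if (!empty($values[$name])) {
                $out .= " | $name = " . $values[$name];
            }
        }
        return $out . '}}';
    }

    echo renderForm($citeWebSpec);
    echo buildInvocation($citeWebSpec, [
        'url'   => 'https://example.org',
        'title' => 'An example page',
    ]), "\n";
    // {{cite web | url = https://example.org | title = An example page}}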
From another post:
We do not even have a parser. I am sure you know that MediaWiki does not actually parse. It is 5000 lines worth of regexes, for the most part.
"Parser" is a convenient and short name for it.
I've reviewed all of the regexes, and I stand by the vast majority of them. The PCRE regular expression module is a versatile text scanning language, which is compiled to bytecode and executed in a VM, very much like PHP. It just so happens that for most text processing tasks where there is a choice between PHP or PCRE, PCRE is faster. In certain special cases, it's possible to gain extra performance by using primitive text scanning functions like strpos() which are implemented in C. Where this is possible, I have done so. But if you want to, say, find the first match from a list of strings in a single subject, searching from a given offset, then the fastest way to do it in standard PHP is a regex with the /S modifier.
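For the sake of illustration, here is the shape of the two approaches in plain PHP (the strings and names are mine, not the actual parser code):

    <?php
    // Single needle, single scan: the C-level strpos() is hard to beat.
    $subject = str_repeat('filler text ', 1000) . '{{Infobox person';
    $offset  = 500;
    $pos = strpos($subject, '{{', $offset);
    echo "strpos found '{{' at " . var_export($pos, true) . "\n";

    // First match from a *list* of strings, from a given offset: a single
    // PCRE alternation with the /S (study) modifier does it in one pass,
    // which is usually faster than looping over separate strpos() calls.
    if (preg_match('/\{\{|\[\[|<nowiki>/S', $subject, $m, PREG_OFFSET_CAPTURE, $offset)) {
        list($match, $matchPos) = $m[0];
        echo "regex found '$match' at byte $matchPos\n";
    }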
In two cases, I found the available algorithms accessible from standard PHP to be inconveniently slow, so I wrote the FSS and wikidiff2 extensions in C and C++ respectively.
Perhaps, like so many computer science graduates, you are enamored with the taxonomy of formal grammars and the parsers that go with them. There are a number of problems with these traditional solutions.
Firstly, there are theoretical problems. The concept of a regular grammar is not versatile enough to describe languages such as XML, and not descriptive enough to allow unambiguous parse tree production from a language like wikitext. It's trivial to invent irregular grammars which can nonetheless be processed in linear time. My aims for wikitext, namely that it be easy for humans to write but fast to convert to HTML, do not coincide well with the taxonomy of formal grammars.
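A toy example of that last point (nothing to do with the real parser): the textbook non-regular language a^n b^n needs unbounded counting, so no regular grammar describes it, yet one left-to-right pass decides membership.

    <?php
    // Toy example: { a^n b^n : n >= 0 } is not regular, yet one counter and
    // one linear pass decide membership in O(n). Grammar class and
    // processing cost are separate questions.
    function isAnBn(string $s): bool {
        $n = strlen($s);
        $count = 0;
        $i = 0;
        while ($i < $n && $s[$i] === 'a') { $count++; $i++; }   // count the a's
        while ($i < $n && $s[$i] === 'b') { $count--; $i++; }   // cancel with b's
        return $i === $n && $count === 0;                       // nothing left over
    }

    var_dump(isAnBn('aaabbb'));   // bool(true)
    var_dump(isAnBn('aabbb'));    // bool(false)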
Secondly, there are practical problems. Past projects attempting to parse wikitext using flex/bison or similar schemes have failed to achieve the performance of the present parser, which is surprising because I didn't think I was setting the bar very high. You can bet that if I ever rewrote it in C++ myself, it would be much faster. The PHP compiler community is currently migrating from flex to re2c, a regex-based lexer generator, mostly for performance reasons.
Thirdly, there is the fact that certain phases of MediaWiki's parser are already very similar to the textbook parsers and can be analysed in those terms. The main difference is that our parser is better optimised. For example, the preprocessor acts like a recursive descent parser, but with a non-recursive frontend (using an internal stack), a caching phase, and a parse tree expansion phase with special-case recursive to iterative transformations to minimise stack depth.
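As a toy illustration of the explicit-stack idea (this shows only the general technique, not the real preprocessor):

    <?php
    // Where a recursive descent parser would call itself for each nested
    // {{ ... }}, the frontend keeps its own stack of open constructs and
    // loops, so deep nesting never deepens the PHP call stack.
    function findDoubleBracePairs(string $text): array {
        $stack = [];   // byte offsets of currently open '{{'
        $pairs = [];   // [openOffset, closeOffset] for each matched pair
        $len = strlen($text);
        for ($i = 0; $i + 1 < $len; $i++) {
            $two = substr($text, $i, 2);
            if ($two === '{{') {
                $stack[] = $i;                      // what a recursive call would remember
                $i++;
            } elseif ($two === '}}' && $stack) {
                $pairs[] = [array_pop($stack), $i];
                $i++;
            }
        }
        return $pairs;                              // unclosed '{{' are simply dropped
    }

    print_r(findDoubleBracePairs('a {{b {{c}} d}} e'));
    // The inner pair comes out first, then the outer one, just as depth-first
    // recursion would report them.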
Yet another post:
I don't believe a computer scientist would have a huge problem writing a proper parser. Are any of the core developers computer scientists?
Frankly, as an ex-physicist, I don't find the field of computer science particularly impressive, either in terms of academic rigour or practical applications. I think my time would be best spent working as a software engineer for a cause that I believe in, rather than going back to university and studying another socially-disconnected field.
-- Tim Starling