I created a Yahoo group for people interested in continuing the discussion on "Community vs. centralized development", as well as in up-to-date wiki backups. Please join if you want to help keep the Wikimedia Foundation part of the community, or if you just like chatting about it! Here is the group link:
"In this group, topics are discussed related to the Wikimedia Foundation's relationship to its community of volunteer developers and users, as well as the distribution of wiki backups and image backups. The two main goals of the group are to ensure that Wikimedia Foundation development is community-centered, and to have up-to-date, full-history Wikimedia Foundation wikis and wiki images freely available for download."
For various reasons, I'm now using the Opera 10.10 browser on Linux.
With the new Vector skin, trying to open an edit form takes 4 or 5
seconds. This is not because the servers are slow -- the whole thing
is quite fast in Firefox -- but in Opera, time elapses while executing
the JavaScript that rearranges the left menu. During this time, I can
sometimes set the cursor in the textarea, but two seconds later the
focus is gone, and I have to set the cursor again before I can start
to edit. It was not this slow under the old Monobook skin.
I know I can go back to Monobook, so don't tell me that. I just wanted
to report this. Perhaps there's some part of Vector that can be
optimized.
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
Over the last couple of years, MediaWiki development has moved from
being almost entirely volunteer-based to having a large contingent of
paid developers. A lot of people have noted that this has led to a
lot of work being done without much community involvement. Just for a
basic statistic, in July, I estimate that about 90% of
non-localization commits to extensions/UsabilityInitiative/ were by
paid employees. (I use "employee" loosely in this post, to include
all paid staff, such as contractors.) By contrast, about 25%
(ballpark figure) of non-localization commits to phase3/ were by paid
employees, and the number of volunteer commits to phase3/ was much
higher than the total number of commits to UsabilityInitiative, so
this isn't just a matter of community members not doing as much work.
I've commented on this a few times before, but never at length. I
think there's widespread confusion about what the problem even is,
never mind how to solve it, so I'm writing this to set out at least my
own views on the topic. Since my shorter remarks in other places
tended to be misunderstood, I'll start at the beginning and go into
considerable detail, which means this post will probably end up pretty
long. I should say in advance that I'm discussing institutional
problems here, not anything specific to individuals or projects, and
no one should feel slighted if I pick them as an example. If you
aren't really interested, start skimming. ;)
Let me begin with definitions. I will draw a basic distinction
between community development and centralized development. I'll start
with two motivating examples.
Firefox is developed by a community. Everything involved in the
project and its development is open. Most of the work is done by
employees of Mozilla, and all important decisions are made by
employees of Mozilla, but anyone on the Internet can view what's
happening and get involved. Bugs you open might get ignored forever,
and you might have to poke people a bunch to get patches reviewed, and
you might have to tolerate a considerable amount of bluntness and
follow other people's marching orders if you want to contribute
anything. But in principle, any random person in the world can make
largely the same contributions as a Mozilla employee.
Internet Explorer is developed by a centralized team. They have blogs
where they sometimes share detailed info about their development
process and reasoning. They very carefully read all user feedback
left in the comments. They have a bug tracker where anyone can file
bugs, and they guarantee that they'll look at and attempt to reproduce
every single bug filed in a timely fashion. But although they pay
close attention to feedback, giving feedback is the only way you can
really participate without getting hired by Microsoft. You can't
write any code, or have a voice in discussions at all comparable to an
IE team member.
These examples illustrate some important things:
* Community development does not mean democracy. Even in a totally
community-oriented project, all decisions might ultimately be made by
a small group of individuals. (For instance, in the case of the Linux
kernel, one person.)
* Community development does not mean community members do most of the
work. From what I've heard, employees of Mozilla write most of
Firefox's code, but it's still completely community-oriented.
* Listening to feedback is not the same as actually involving the
community. Even a totally closed project can be extremely attentive
to feedback. In fact, it's common for community projects to be *less*
receptive to feedback, taking a "we'll listen to you when you write
the code" attitude.
Keeping these in mind, I'll characterize a perfectly community-based
development process like this: your say in the project is proportional
to your contributions, and nothing prevents you from contributing as
much as your time and ability allow. If you happen to be paid, it
doesn't give you any additional say -- you just happen to be able to
spend more time contributing. The decision-making process is open and
transparent, and arguments are weighed on the basis of their merits
and the speaker's history of contributions. This is of course not
fully attainable in practice, but one can see how close or far a
project is from the ideal.
Centralized and community development processes both have advantages
and disadvantages. Some of the advantages of centralized development
(as relevant to open-source projects) are:
* Paid employees don't have to spend time reviewing code from a lot of
people who will only ever contribute a few patches, so they don't
duplicate effort teaching everyone their project's coding conventions,
or even educating them on basic things like XSS.
* Because discussion can be private and everyone is more likely to be
in similar time zones, it's possible to rely heavily on face-to-face
or voice communication, which a lot of people are more comfortable
with and which is a lot more efficient.
* Since there are many fewer developers, they can socialize and get to
know each other, reducing conflict and argument.
* Full-time developers don't have to try coordinating with volunteers
who may only be available at odd times or who may disappear randomly.
In short, centralized development allows employees' time to be spent
more on actual coding, and less on communication. It's (at least
superficially) more efficient. On the other hand, community
development has advantages as well:
* You get work done for free. If it's easier for volunteers to make a
meaningful difference, you'll get many more volunteers. Once they're
up to speed, you don't have to watch over them much more than you
would an employee, but you get their work for free.
* You can hire community developers. You already know how good they
are and they don't need to be brought up to speed with your codebase,
saving you a lot of money and trouble compared to advertising for
outside hires.
* Your software becomes more versatile, because volunteers will work
on aspects that interest them even if they aren't in the interest of
the controlling organization. This gets you more users and more
contributors.
Although there are superficial efficiency advantages to centralizing
development, experience indicates that community-based development can
be much more cost-effective in practice. Projects like Mozilla and
Apache (and for that matter Wikimedia until recently) make software
that's very competitive with centrally-developed competitors at a
fraction of the cost.
On top of that, of course, the idea of centralized development is
contrary to Wikimedia's ideals. Just as the Board is trying to pursue
individual donations over corporate sponsorship, it fits with
Wikimedia's goals and structure to have as community-oriented a
structure as possible. Projects like Mozilla make it clear that this
is attainable and productive.
Returning to the concept of community development, let's look at two
key things: actual coding, and decision-making. In community-based
development, anyone who's willing to write good code can get it
submitted and included into the product. Someone with a greater
history of contributions will be able to get their code included more
easily, but only because the development community is willing to trust
them more. They get by with less review, and the review is more
readily given because of a greater expectation that it will be
productive. Similarly, when it comes to decision-making, anyone has
an equal opportunity to try convincing the decision-makers (who might
be only one or a few people) of their point of view. In the end, the
decision is made by appointed decision-makers, but with great
deference toward the opinions of other established contributors.
From my perspective as a volunteer developer since 2006
(notwithstanding a few hours of contracting just now), Wikimedia has
been failing badly on both of these issues for months, at least.
There's a giant code review backlog, so very little code of the last
several months gets synced -- except code by employees. Some
employees apparently have shell access for the sole purpose of syncing
their own code without going through the normal review process. No
volunteer has been given such access, to my knowledge -- indeed, AFAIK
it's been years since any non-employee has been given shell access at
all. This is a bright line that deprives volunteers of any semblance
of parity with staff.
Communication is a serious problem as well. I can't pin this one down
so well, because I simply have no idea how employees are
communicating, but I can observe that there's a ton of code being
written with no discussion on #mediawiki or wikitech-l or any other
MediaWiki development forum I know of. There are a lot of paid
developers who I've never seen in either #mediawiki or wikitech-l. I
infer that they must be communicating somehow, unless they all have a
policy of committing code without speaking to anyone about it.
A lot of employees are in the same office, so I guess there's
face-to-face communication going on. There's a secret staff IRC
channel, and a staff-only mailing list or list alias or something
(which I know about because a staff member complained about it in the
secret staff IRC channel), and I think I've heard rumor of
teleconferences. There have also been various nominally public
fora that only particular groups of employees use much in practice,
like the Usability wiki and IRC channel (the latter now kind of
discontinued but not really). I don't know, but it doesn't matter in
the end. What it amounts to is that volunteers are often completely
cut out of planning and design.
That's what leads to things like the interlanguage link revert. Some
people said that maybe that could have been phrased better, or
something. But the revert wasn't the problem; it was a symptom of the
problem. The problem was that the design was decided on somewhere
that volunteers couldn't or wouldn't participate. Of course you
revert something that contradicts an agreed-upon design -- the problem
is that the agreed-upon design was only agreed upon by a small group
of employees. How are volunteers supposed to contribute in that
environment, if they don't know what tune they're supposed to be
playing?
The interlanguage link issue perfectly highlights the mentality
problem we have right now. (I'm not picking on the Usability
Initiative particularly here, by the way -- it just provides the most
ready examples because it's the largest.) You just need to look at
this e-mail: <http://lists.wikimedia.org/pipermail/foundation-l/2010-June/058936.html>
It begins "The Usability team discussed this issue at length this
afternoon." The Usability Team is a separate body from the community,
which holds its discussions separately. "We listened closely to the
feedback and have come up with solution which we hope will work for
everyone." Listening to feedback, not discussing the merits of the
issue with peers.
What should have happened in that case is that each individual
Usability Team member who saw the complaint should have posted their
own individual, unrehearsed thoughts as an individual. What actually
happened was a quintessentially centralized response: secret internal
discussion followed by an official position statement. That is not
the way that you treat peers. It's how you treat customers or clients.
I've seen this mentality again and again over the last year or two.
One time I was discussing a design issue with a Wikimedia employee in
#mediawiki, and after a brief discussion, he said (paraphrased)
"Sorry, I need to get back to work." Apparently it's only "work" when
you're talking to other employees.
There's a clear line at every step between employees and volunteer
developers. This is not the way to attract or keep a healthy volunteer
development community.
The solution is not to increase communication between staff and
volunteers. It's to make the distinction as irrelevant as possible to
actual development. They're all developers, and some happen to get
paid. Specific changes I would propose include:
* Consider what to do about code review. This is pretty much the
hardest problem on this list, which is why I don't propose a specific
solution here, but there has to be a better solution than "assume a
bunch of employees are trusted enough to sync their own code, force
everyone else to wait months for central review".
* Stop concentrating tech employees in San Francisco. Either have
most of them work from home, or perhaps establish other small offices
so that they're split up. The point is, make them rely on
telecommunication, because if you put people in the same office
they'll talk a lot face-to-face, and volunteers simply cannot
participate. The purpose of putting people together in an office is
so that they work together as a team, and this is exactly what you do
*not* want, because volunteers cannot be part of that team. This is
the second-hardest problem, or maybe the hardest, and I can't give a
full solution for it either. I'd suggest checking with Mozilla about
how they do it, because I know they do have offices, but they're a
perfect example of community-oriented development.
* Explicitly encourage all paid developers to do everything in public
and to treat volunteer developers as they would paid ones. I'm not
saying this should be enforced in any particular manner, but it should
be clearly stated so that everyone knows how things are intended to
work.
* Shut down the secret staff IRC channel. Development discussion can
take place in #mediawiki, ops in #wikimedia-tech, other stuff in
#wikimedia or whatever. If users interfere with ops' discussions
sometimes in #wikimedia-tech during outages or such, set all sysadmins
+v and set the channel +m as necessary. That's worked in the past.
* Shut down #wikimedia-dev (formerly #wikipedia_usability, kind of).
The explicit purpose of the channel is to allow development discussion
with less noise, but "noise" here means community involvement. In
community development, you do get a lot more discussion, but that's
not something you should try avoiding. In general, use existing
discussion fora wherever possible, and if you do fragment them, make
sure you don't have too much of a staff-volunteer split in who uses
which fora.
* Don't conduct teleconferences about development, ever. Even if
volunteers are invited (are they?), time zones and non-MediaWiki
obligations make all synchronous communication much harder for
volunteers to participate in. Rely primarily on mailing lists, and
secondarily on publicly-logged IRC channels (where at least it's easy
to read backscroll).
* Stop using private e-mail for development, at least to any
significant extent. If there are any internal development mailing
lists or aliases or whatever used for development, retire them.
I don't know how seriously these suggestions will be taken in practice
by the powers that be, but I hope I've made a detailed and cogent
enough case to make at least some impact.
Tim Starling wrote:
> As for fundraising, the work is uninspiring, and I don't think we've
> ever managed to get volunteers interested in it regardless of how open
> we've been.
I must take exception to that because I did a lot of work last year on
several aspects of fundraising, including button design, some of which
(e.g. the proposed button with Jimbo's face on it) was never A/B
tested, even after the A/B test harness had been developed. I was never
told why there was no A/B test of that button. It seems like I had to
ask over and over before anyone even did any A/B tests in the first
place. Frankly, my efforts to help with fundraising are more
inspiring than a lot of the other things I try to do to help, but
inspiration is generally orthogonal to frustration. However, I know
one of my responsibilities as a volunteer is to keep asking until things
get done. Furthermore, how do you expect effective help with
fundraising when the fundraising mailing list and archives are closed?
Danese Cooper wrote:
> 1. Eliminate single points of failure / bottlenecks....
I am glad that is the top priority, because there are clearly failures
and bottlenecks in external code review, production of image bundle
dumps, auctioning search failover links to wealthy search engine
donors, steps to make Wikinews an independent, funded, and respected
bona fide news organization, and general bugzilla queue software
maintenance.
About eight months ago I was told that fundraising this year will
allow donors to pick an optional earmark for their funds. Is that
still the plan?
Donors should be allowed to optionally mark their donations for
projects including (1) the review of externally submitted code, (2)
the production of image bundles along with the dumps, (3) auctioning
the order of appearance of several search failover gadget links to
external search engines (such as users were able to use before they
were rendered unusable by the usability project) to wealthy search
engine donors, (4) a way to pay people who work on the bugzilla queue
(e.g. through http://odesk.com or the like) without having to set up
lengthy contracts, and (5) a way to pay for Wikinews journalism
awards, travel expenses, reporters, fact checkers, photographers,
camera and recording equipment, and proofreaders, etc.
Are there any reasons not to allow donors to earmark categories? I am
not saying that those are the only earmarks which should be offered,
but I am certain that at least those five should be included.
What are other problems which might be solved by donor earmarks?
There are ten rejected GSoC projects which I feel strongly about
because they were scored positively by the mentors but rejected
because of the number of slots requested. Could those be funded by
donor earmarks?
I was wondering if there are any plans to provide incremental dumps (ie.
the diff between each dump and the previous one) at
download.wikimedia.org. It seems to me that such diffs would help save
bandwidth because mirrors could stay up to date by downloading the diffs
and applying them rather than downloading the whole dump each time.
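To sketch the idea (purely illustrative -- this is not an existing Wikimedia tool): the mirror keeps the previous dump, downloads only an edit script describing what changed, and reconstructs the new dump locally. In Python terms, using the stdlib difflib:

```python
import difflib

def make_delta(old_lines, new_lines):
    """Compute a compact edit script from the old dump to the new one."""
    sm = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))       # reuse lines the mirror already has
        else:
            delta.append(("insert", new_lines[j1:j2]))  # ship only the changed lines
    return delta

def apply_delta(old_lines, delta):
    """Reconstruct the new dump from the old dump plus the edit script."""
    out = []
    for op in delta:
        if op[0] == "copy":
            _, i1, i2 = op
            out.extend(old_lines[i1:i2])
        else:
            out.extend(op[1])
    return out

# Toy example: two revisions of a page in an XML dump.
old = ["<page>", "<title>A</title>", "<text>v1</text>", "</page>"]
new = ["<page>", "<title>A</title>", "<text>v2</text>", "</page>"]
delta = make_delta(old, new)
assert apply_delta(old, delta) == new
```

A production scheme would more likely use a binary delta format such as xdelta, or rsync's batch mode, rather than line-based opcodes, but the bandwidth argument is the same: only the changed portions cross the wire.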
I am new here, so I hope that this mailing list is the correct place
for this kind of suggestion. If I am wrong, please tell me.
On Fri, Sep 3, 2010 at 12:19 PM, Aryeh Gregor
> On Fri, Sep 3, 2010 at 2:44 PM, Chad <innocentkiller(a)gmail.com> wrote:
>> Given those criteria, I think that the following have "full support"
>> in MediaWiki:
>> * MySQL
>> * SQLite
>> * PostgreSQL
> In practice, though, SQLite and PostgreSQL are more likely to break
> than MySQL, right? If so, we should make this clear in the installer
> UI. Or are they really about as well-supported as MySQL these days,
> minus a moderate lag in schema updates for pgsql?
> Ideally, we could run test suites by default on all available DBs
> instead of just on the one the wiki currently uses. In particular,
> SQLite currently uses the same schema as MySQL and is available in PHP
> by default, plus it doesn't require any setup (providing admin login,
> etc.), so it would be great if we could run SQLite tests right now
> whenever people run tests. It would be great if people could set up
> pgsql to automatically run too, but that would require manual setup.
> This is the kind of thing automated tests are really helpful for.
> (But that's kind of tangential.)
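The zero-setup property described above is exactly what makes SQLite attractive as a default test target: an in-memory database needs no server, no admin login, and no cleanup. MediaWiki's tests are PHP, but the idea can be illustrated in Python, whose stdlib also bundles SQLite (the table here is a simplified stand-in, not the real MediaWiki schema):

```python
import sqlite3

# An in-memory database: no server, no admin credentials, no files to clean up.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_title TEXT NOT NULL
)""")  # simplified stand-in for a wiki schema

conn.execute("INSERT INTO page (page_title) VALUES (?)", ("Main_Page",))
rows = conn.execute("SELECT page_title FROM page").fetchall()
assert rows == [("Main_Page",)]
conn.close()
```

The same zero-configuration property is what would let a default test run exercise the SQLite backend on any stock install, with pgsql as an opt-in extra.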
Out of curiosity - what regression test suites are in use to QA MW builds?
-george william herbert
I was working on this in branches/phpunit-restructure
I've merged it into head as of
Essentially, it's just organizing our PHPUnit tests the way the rest of
the world does, as a mirrored directory structure to the units under
test. Also, we need to avoid like the plague writing too many crazy
custom phpunit invocation scripts, as was done previously, because it
locks our tests down to a really specific version of phpunit, which is
thankfully under constant development.
In short, use things as they are meant to be used... We probably
reinvent too many wheels around here.
People have been wondering a lot about our database drivers recently.
Two days ago I was asked specifically which ones we support. I think
the subject needs clarifying, especially in light of the new installer and
associated update refactoring. First and foremost, people need to
remember that MediaWiki is written with MySQL in mind. It's the
primary target, and can always be expected to work. That being said,
I think that supporting other DBMSes is great and I'm glad we do.
Earlier today on IRC, I outlined what I consider to be the DBMSes we
support and rough criteria for why I think so. "Full support" means
that the schema and DatabaseBase subclass should be fully
functional. Patches should be written when schema changes occur
so people can stay up to date. "Partial support" means that there is a
functional DatabaseBase subclass and working schema. There may
be some edge cases it doesn't support. Updaters probably aren't
written. "Experimental" is anything less. Typically a half-implemented
DatabaseBase subclass exists.
Given those criteria, I think that the following have "full support":
* MySQL
* SQLite
* PostgreSQL
And "partial support":
* Oracle (works, but lacks updates)
When the new installer ships (hopefully) in 1.17, it will contain
support for MySQL, SQLite and PostgreSQL. I'm in favor of
adding Oracle in as well, as long as it's clearly labelled as
still a work in progress.
As far as the "experimental" group goes, I see no harm in leaving
them in SVN. They are still mostly in development (some more
active than others) and keeping the various subclasses around
won't hurt anything. Once support gets a little more solid, then
we can look to adding them to the installer (once they're done,
it should just be a 1-line addition to Installer::$dbTypes, plus
some extra i18n).
Ryan Kaldari wrote:
> ... [we're] in the process of hooking up Open Web Analytics
It's great to see that donor logs are going into a database instead
of just a text file, but multiple regression in SQL is absurdly
difficult because of the limitations of SQL, so I still recommend R,
in particular: http://cran.r-project.org/web/packages/RMySQL/RMySQL.pdf
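To make the "regression in SQL is hard" point concrete: ordinary least squares with several predictors is one line in R but painful in plain SQL. A minimal pure-Python sketch of OLS via the normal equations, on made-up donation data (the variable names are invented for illustration):

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X)b = X'y.
    X: rows with a leading 1 for the intercept; y: responses."""
    n = len(X[0])
    # Build X'X and X'y.
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(n)]
    # Solve by Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[piv] = xtx[piv], xtx[col]
        xty[col], xty[piv] = xty[piv], xty[col]
        for r in range(col + 1, n):
            f = xtx[r][col] / xtx[col][col]
            xty[r] -= f * xty[col]
            for c in range(col, n):
                xtx[r][c] -= f * xtx[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        beta[i] = (xty[i] - sum(xtx[i][j] * beta[j]
                                for j in range(i + 1, n))) / xtx[i][i]
    return beta

# Made-up data generated as donation = 5 + 2*banner + 0.5*num_options,
# so the fit should recover those coefficients exactly.
X = [[1, b, k] for b in (0, 1) for k in (0, 10, 20)]
y = [5 + 2 * b + 0.5 * k for _, b, k in X]
beta = ols(X, y)
assert all(abs(a - e) < 1e-9 for a, e in zip(beta, [5, 2, 0.5]))
```

In R the same fit is a single `lm(donation ~ banner + num_options)` call, which is why pushing the raw logs out of SQL and into a statistics package is the path of least resistance.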
I will ask Arthur Richards for data coding formats.
I predict that multiple response checkboxes will do better than the
more constraining radio buttons, but there is no reason that they
should not be measured as any other independent variable. It is
probably a lot more important to measure the number of earmarks
offered: 0-26. There is plenty of reason to believe that showing 26
options will have a slight advantage over 25, but I can't see the test
results from the Red Cross (they measure the things which increase
donations of blood much more carefully than money, at least in their
publications that I've been able to find.) Don't forget the control
case where no donor selections are offered. Optimization requires
measurement, and it is easy to measure offering a lot of options up
front.
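Measuring whether, say, 26 options outperforms 25 is a standard two-proportion comparison. A minimal sketch, with purely illustrative numbers, of the z-test one would run on such an A/B split:

```python
from math import erf, sqrt

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)           # pooled conversion rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # standard error under H0
    z = (p_a - p_b) / se
    # Normal CDF via erf; two-sided p-value.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Illustrative numbers only: 360/10000 donors with variant A
# versus 300/10000 with variant B.
z, p = two_proportion_z(360, 10000, 300, 10000)
assert p < 0.05  # at this sample size the difference would be significant
```

The control case with no donor selections at all is just a third arm of the same test.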
Do you think that variations on the disclaimer should also be tried?
I think there is reason to believe something terse might result in
more donations, e.g.: "These options are advisory only." and/or "The
Wikimedia Foundation reserves the right to override donor selections,
cancel any project, and use any funds for any purpose." and/or "All
donations are discretionary, these options are offered for polling
purposes only." or some combination. What does Mike Godwin think a
good set of disclaimers to test might be?
I conflated the proposed stimulus list down to 25 non-default items
and enumerated them with letters of the alphabet so that everyone
would understand that it is feasible to test additional proposals as
well. I have not yet surveyed the Village Pumps or mailing lists for
additional stimulatory ideas, but I hope people who have, or who see
anything missing, will suggest at least five more. Translations would
be great, too.
(default) Use my donation where the need is greatest.
A. Auction the order of search failover links to search engine companies.
B. Broaden demographics of active editors.
C. Compensate people who submit improvements to the extent that they
are necessary and sufficient.
D. Display most popular related articles.
E. Enhance automation of project tasks.
F. Enhance site performance in underserved geographic regions.
G. Enhance visualizations of projects and their editing activity.
H. Establish journalism awards, expense accounts and compensation for
independent Wikinews reporters, fact checkers, photographers, and
proofreaders.
I. Establish secure off-site backup copies.
J. Establish simple Wikipedias for beginning readers in languages
other than English.
K. Improve math formula rendering.
L. Increase the number of active editors.
M. Increase the number of articles, images, and files.
N. Increase the number of unique readers.
O. Make it easier for people to add recorded audio pronunciations.
P. Obtain expert article assessments.
Q. Obtain reader quality assessments.
R. Perform external code reviews.
S. Perform independent usability testing.
T. Produce regular snapshots and archives.
U. Retain more active editors.
V. Strengthen Wikimedia Foundation financial stability.
W. Support a thriving research community.
X. Support an easier format to write quiz questions.
Y. Support more reliable server uptime.
Z. Support offline editing.
> What do you mean by "opening"?
> enwiki pages-meta-history is hard due to its size, not because Ariel
> or Tomasz are more stupid than any volunteer.
> I trust them to do it at least as well as a volunteer would.
> Of course, if you can perform better I'm all for giving you a
> shell to
> fix it, and the scripts are there for improvements as well.
I wasn't aware that the dump scripts were publicly available. Where can they be downloaded from, or are they part of MediaWiki?
> What do you need exactly about the images? Which image dumps do you
> want? Do you have enough terabytes to store them?
> Dumps/Access has been given by request in the past to that data.
> If it's not there it's because:
> a) Those dumps would take a lot of space.
I don't think that is a valid reason; thumbnail dumps of all the images from enwiki would probably be a smaller file than the current enwiki pages-meta-history bz2 file.
> b) Nobody feels particularly interested in them.
I disagree; there has been a lot of interest in having image dumps available for download. There was a discussion on this recently on the xmldatadumps list, which basically concluded that subsets of images (e.g. enwiki thumbnails) would be useful. There are wiki pages dedicated to the topic of how to download images, precisely because there are no image dumps available. Is the Wikimedia Foundation interested in hosting image dumps again? If so, maybe we can start a discussion on how to write the script and which image dumps to start with.
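If such a script gets written, one building block is MediaWiki's hashed upload directory layout, where a file's path is derived from the MD5 of its name. A minimal Python sketch (the hash-based path scheme is the standard MediaWiki one; everything else here is illustrative, and a real bundling script would need to handle redirects, missing thumbnails, and non-image files):

```python
import hashlib

def thumb_path(filename, width):
    """Relative path of a thumbnail under MediaWiki's hashed upload layout."""
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Files live under <first hex digit>/<first two hex digits>/<name>/
    return f"thumb/{digest[0]}/{digest[:2]}/{name}/{width}px-{name}"

path = thumb_path("Example image.jpg", 120)
parts = path.split("/")
assert parts[3] == "Example_image.jpg"
assert parts[4] == "120px-Example_image.jpg"
assert parts[2].startswith(parts[1])  # "ab" directory sits under "a"
```

A thumbnail bundler could walk the image table of a dump, compute these paths, and fetch or pack only the width it wants, which is what keeps such a bundle far smaller than a full original-resolution image dump.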