I created a Yahoo group for people interested in continuing the discussion on "Community vs. centralized development", as well as in up-to-date wiki backups. Please join if you want to help keep the Wikimedia Foundation part of the community, or if you just like chatting about it! Here is the group link:
"In this group, topics are discussed related to the Wikimedia Foundation's relationship to its community of volunteer developers and users, as well as the distribution of wiki backups and image backups. The two main goals of the group are to ensure that Wikimedia Foundation development is community-centered, and to have up-to-date, full-history Wikimedia Foundation wikis and wiki images freely available for download."
For various reasons, I'm now using the Opera 10.10 browser on Linux.
With the new Vector skin, trying to open an edit form takes 4 or 5
seconds. This is not because the servers are slow -- the whole thing
is quite fast in Firefox -- but in Opera, time elapses while executing
the JavaScript that rearranges the left menu. During this time, I can
sometimes set the cursor in the textarea, but two seconds later the
focus is gone, and I have to set the cursor again before I can start
to edit. It was not this slow under the old Monobook skin.
I know I can go back to Monobook, so don't tell me that. I just wanted
to report this. Perhaps there's some part of Vector that can be
optimized.
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
Over the last couple of years, MediaWiki development has moved from
being almost entirely volunteer-based to having a large contingent of
paid developers. A lot of people have noted that this has led to a
lot of work being done without much community involvement. Just for a
basic statistic, in July, I estimate that about 90% of
non-localization commits to extensions/UsabilityInitiative/ were by
paid employees. (I use "employee" loosely in this post, to include
all paid staff, such as contractors.) By contrast, about 25%
(ballpark figure) of non-localization commits to phase3/ were by paid
employees, and the number of volunteer commits to phase3/ was much
higher than the total number of commits to UsabilityInitiative, so
this isn't just a matter of community members not doing as much work.
I've commented on this a few times before, but never at length. I
think there's widespread confusion about what the problem even is,
never mind how to solve it, so I'm writing this to set out at least my
own views on the topic. Since my shorter remarks in other places
tended to be misunderstood, I'll start at the beginning and go into
considerable detail, which means this post will probably end up pretty
long. I should say in advance that I'm discussing institutional
problems here, not anything specific to individuals or projects, and
no one should feel slighted if I pick them as an example. If you
aren't really interested, start skimming. ;)
Let me begin with definitions. I will draw a basic distinction
between community development and centralized development. I'll start
with two motivating examples.
Firefox is developed by a community. Everything involved in the
project and its development is open. Most of the work is done by
employees of Mozilla, and all important decisions are made by
employees of Mozilla, but anyone on the Internet can view what's
happening and get involved. Bugs you open might get ignored forever,
and you might have to poke people a bunch to get patches reviewed, and
you might have to tolerate a considerable amount of bluntness and
follow other people's marching orders if you want to contribute
anything. But in principle, any random person in the world can make
largely the same contributions as a Mozilla employee.
Internet Explorer is developed by a centralized team. They have blogs
where they sometimes share detailed info about their development
process and reasoning. They very carefully read all user feedback
left in the comments. They have a bug tracker where anyone can file
bugs, and they guarantee that they'll look at and attempt to reproduce
every single bug filed in a timely fashion. But although they pay
close attention to feedback, giving feedback is the only way you can
really participate without getting hired by Microsoft. You can't
write any code, or have a voice in discussions at all comparable to an
IE team member.
These examples illustrate some important things:
* Community development does not mean democracy. Even in a totally
community-oriented project, all decisions might ultimately be made by
a small group of individuals. (For instance, in the case of the Linux
kernel, one person.)
* Community development does not mean community members do most of the
work. From what I've heard, employees of Mozilla write most of
Firefox's code, but it's still completely community-oriented.
* Listening to feedback is not the same as actually involving the
community. Even a totally closed project can be extremely attentive
to feedback. In fact, it's common for community projects to be *less*
receptive to feedback, taking a "we'll listen to you when you write
the code" attitude.
Keeping these in mind, I'll characterize a perfectly community-based
development process like this: your say in the project is proportional
to your contributions, and nothing prevents you from contributing as
much as your time and ability allow. If you happen to be paid, it
doesn't give you any additional say -- you just happen to be able to
spend more time contributing. The decision-making process is open and
transparent, and arguments are weighed on the basis of their merits
and the speaker's history of contributions. This is of course not
fully attainable in practice, but one can see how close or far a
project is from the ideal.
Centralized and community development processes both have advantages
and disadvantages. Some of the advantages of centralized development
(as relevant to open-source projects) are:
* Paid employees don't have to spend time reviewing code from a lot of
people who will only ever contribute a few patches, so they don't
duplicate effort teaching everyone their project's coding conventions,
or even educating them on basic things like XSS.
* Because discussion can be private and everyone is more likely to be
in similar time zones, it's possible to rely heavily on face-to-face
or voice communication, which a lot of people are more comfortable
with and which is a lot more efficient.
* Since there are many fewer developers, they can socialize and get to
know each other, reducing conflict and argument.
* Full-time developers don't have to try coordinating with volunteers
who may only be available at odd times or who may disappear randomly.
In short, centralized development allows employees' time to be spent
more on actual coding, and less on communication. It's (at least
superficially) more efficient. On the other hand, community
development has advantages as well:
* You get work done for free. If it's easier for volunteers to make a
meaningful difference, you'll get many more volunteers. Once they're
up to speed, you don't have to watch over them much more than you
would an employee, but you get their work for free.
* You can hire community developers. You already know how good they
are and they don't need to be brought up to speed with your codebase,
saving you a lot of money and trouble compared to advertising for
outside hires.
* Your software becomes more versatile, because volunteers will work
on aspects that interest them even if they aren't in the interest of
the controlling organization. This gets you more users and more
contributors.
Although there are superficial efficiency advantages to centralizing
development, experience indicates that community-based development can
be much more cost-effective in practice. Projects like Mozilla and
Apache (and for that matter Wikimedia until recently) make software
that's very competitive with centrally-developed competitors at a
fraction of the cost.
On top of that, of course, the idea of centralized development is
contrary to Wikimedia's ideals. Just as the Board is trying to pursue
individual donations over corporate sponsorship, it fits with
Wikimedia's goals and structure to have as community-oriented a
structure as possible. Projects like Mozilla make it clear that this
is attainable and productive.
Returning to the concept of community development, let's look at two
key things: actual coding, and decision-making. In community-based
development, anyone who's willing to write good code can get it
submitted and included into the product. Someone with a greater
history of contributions will be able to get their code included more
easily, but only because the development community is willing to trust
them more. They get by with less review, and the review is more
readily given because of a greater expectation that it will be
productive. Similarly, when it comes to decision-making, anyone has
an equal opportunity to try convincing the decision-makers (who might
be only one or a few people) of their point of view. In the end, the
decision is made by appointed decision-makers, but with great
deference toward the opinions of other established contributors.
From my perspective as a volunteer developer since 2006
(notwithstanding a few hours of contracting just now), Wikimedia has
been failing badly on both of these issues for months, at least.
There's a giant code review backlog, so very little code of the last
several months gets synced -- except code by employees. Some
employees apparently have shell access for the sole purpose of syncing
their own code without going through the normal review process. No
volunteer has been given such access, to my knowledge -- indeed, AFAIK
it's been years since any non-employee has been given shell access at
all. This is a bright line that deprives volunteers of any semblance
of parity with staff.
Communication is a serious problem as well. I can't pin this one down
so well, because I simply have no idea how employees are
communicating, but I can observe that there's a ton of code being
written with no discussion on #mediawiki or wikitech-l or any other
MediaWiki development forum I know of. There are a lot of paid
developers who I've never seen in either #mediawiki or wikitech-l. I
infer that they must be communicating somehow, unless they all have a
policy of committing code without speaking to anyone about it.
A lot of employees are in the same office, so I guess there's
face-to-face communication going on. There's a secret staff IRC
channel, and a staff-only mailing list or list alias or something
(which I know about because a staff member complained about it in the
secret staff IRC channel), and I think I've heard rumor of
teleconferences. There have also been various nominally public
fora that only particular groups of employees use much in practice,
like the Usability wiki and IRC channel (the latter now kind of
discontinued but not really). I don't know, but it doesn't matter in
the end. What it amounts to is that volunteers are often completely
cut out of planning and design.
That's what leads to things like the interlanguage link revert. Some
people said that maybe that could have been phrased better, or
something. But the revert wasn't the problem; it was a symptom of the
problem. The problem was that the design was decided on somewhere
that volunteers couldn't or wouldn't participate. Of course you
revert something that contradicts an agreed-upon design -- the problem
is that the agreed-upon design was only agreed upon by a small group
of employees. How are volunteers supposed to contribute in that
environment, if they don't know what tune they're supposed to be
playing?
The interlanguage link issue perfectly highlights the mentality
problem we have right now. (I'm not picking on the Usability
Initiative particularly here, by the way -- it just provides the most
ready examples because it's the largest.) You just need to look at
this e-mail: <http://lists.wikimedia.org/pipermail/foundation-l/2010-June/058936.html>
It begins "The Usability team discussed this issue at length this
afternoon." The Usability Team is a separate body from the community,
which holds its discussions separately. "We listened closely to the
feedback and have come up with solution which we hope will work for
everyone." Listening to feedback, not discussing the merits of the
issue with peers.
What should have happened in that case is that each individual
Usability Team member who saw the complaint should have posted their
own individual, unrehearsed thoughts as an individual. What actually
happened was a quintessentially centralized response: secret internal
discussion followed by an official position statement. That is not
the way that you treat peers. It's how you treat customers or clients.
I've seen this mentality again and again over the last year or two.
One time I was discussing a design issue with a Wikimedia employee in
#mediawiki, and after a brief discussion, he said (paraphrased)
"Sorry, I need to get back to work." Apparently it's only "work" when
you're talking to other employees.
There's a clear line at every step between employees and volunteer
developers. This is not the way to attract or keep a healthy volunteer
development community.
The solution is not to increase communication between staff and
volunteers. It's to make the distinction as irrelevant as possible to
actual development. They're all developers, and some happen to get
paid. Specific changes I would propose include:
* Consider what to do about code review. This is pretty much the
hardest problem on this list, which is why I don't propose a specific
solution here, but there has to be a better solution than "assume a
bunch of employees are trusted enough to sync their own code, force
everyone else to wait months for central review".
* Stop concentrating tech employees in San Francisco. Either have
most of them work from home, or perhaps establish other small offices
so that they're split up. The point is, make them rely on
telecommunication, because if you put people in the same office
they'll talk a lot face-to-face, and volunteers simply cannot
participate. The purpose of putting people together in an office is
so that they work together as a team, and this is exactly what you do
*not* want, because volunteers cannot be part of that team. This is
the second-hardest problem, or maybe the hardest, and I can't give a
full solution for it either. I'd suggest checking with Mozilla about
how they do it, because I know they do have offices, but they're a
perfect example of community-oriented development.
* Explicitly encourage all paid developers to do everything in public
and to treat volunteer developers as they would paid ones. I'm not
saying this should be enforced in any particular manner, but it should
be clearly stated so that everyone knows how things are intended to
work.
* Shut down the secret staff IRC channel. Development discussion can
take place in #mediawiki, ops in #wikimedia-tech, other stuff in
#wikimedia or whatever. If users interfere with ops' discussions
sometimes in #wikimedia-tech during outages or such, set all sysadmins
+v and set the channel +m as necessary. That's worked in the past.
* Shut down #wikimedia-dev (formerly #wikipedia_usability, kind of).
The explicit purpose of the channel is to allow development discussion
with less noise, but "noise" here means community involvement. In
community development, you do get a lot more discussion, but that's
not something you should try avoiding. In general, use existing
discussion fora wherever possible, and if you do fragment them, make
sure you don't have too much of a staff-volunteer split in who uses
which fora.
* Don't conduct teleconferences about development, ever. Even if
volunteers are invited (are they?), time zones and non-MediaWiki
obligations make all synchronous communication much harder for
volunteers to participate in. Rely primarily on mailing lists, and
secondarily on publicly-logged IRC channels (where at least it's easy
to read backscroll).
* Stop using private e-mail for development, at least to any
significant extent. If there are any internal development mailing
lists or aliases or whatever used for development, retire them.
I don't know how seriously these suggestions will be taken in practice
by the powers that be, but I hope I've made a detailed and cogent
enough case to make at least some impact.
Tim Starling wrote:
> As for fundraising, the work is uninspiring, and I don't think we've
> ever managed to get volunteers interested in it regardless of how open
> we've been.
I must take exception to that because I did a lot of work last year on
several aspects of fundraising, including button design, some of which
(e.g. the proposed button with Jimbo's face on it) was never A/B
tested, even after the A/B test harness had been developed. I was never
told why there was no A/B test of that button. It seems like I had to
ask over and over before anyone even did any A/B tests in the first
place. Frankly, my efforts to help with fundraising are more
inspiring than a lot of the other things I try to do to help, but
inspiration is generally orthogonal to frustration. However, I know
one of my responsibilities as a volunteer is to keep asking until things
get done. Furthermore, how do you expect effective help with
fundraising when the fundraising mailing list and archives are closed?
Danese Cooper wrote:
> 1. Eliminate single points of failure / bottlenecks....
I am glad that is the top priority, because there are clearly failures
and bottlenecks in external code review, production of image bundle
dumps, auctioning search failover links to wealthy search engine
donors, steps to make Wikinews an independent, funded, and respected
bona fide news organization, and general bugzilla queue software
maintenance.
About eight months ago I was told that fundraising this year will
allow donors to pick an optional earmark for their funds. Is that
still the plan?
Donors should be allowed to optionally mark their donations for
projects including (1) the review of externally submitted code, (2)
the production of image bundles along with the dumps, (3) auctioning
the order of appearance of several search failover gadget links to
external search engines (such as users were able to use before they
were rendered unusable by the usability project) to wealthy search
engine donors, (4) a way to pay people who work on the bugzilla queue
(e.g. through http://odesk.com or the like) without having to set up
lengthy contracts, and (5) a way to pay for Wikinews journalism
awards, travel expenses, reporters, fact checkers, photographers,
camera and recording equipment, and proofreaders, etc.
Are there any reasons not to allow donors to earmark categories? I am
not saying that those are the only earmarks which should be offered,
but I am certain that at least those five should be included.
What are other problems which might be solved by donor earmarks?
There are ten rejected GSoC projects which I feel strongly about
because they were scored positively by the mentors but rejected
because of the number of slots requested. Could those be funded by
donor earmarks?
I was wondering if there are any plans to provide incremental dumps (ie.
the diff between each dump and the previous one) at
download.wikimedia.org. It seems to me that such diffs would help save
bandwidth because mirrors could stay up to date by downloading the diffs
and applying them rather than downloading the whole dump each time.
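To sketch the idea (purely illustrative -- this is not an existing Wikimedia tool): the mirror keeps the previous dump, downloads only an edit script describing what changed, and reconstructs the new dump locally. In Python terms, using the stdlib difflib:

```python
import difflib

def make_delta(old_lines, new_lines):
    """Compute a compact edit script from the old dump to the new one."""
    sm = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))       # reuse lines the mirror already has
        else:
            delta.append(("insert", new_lines[j1:j2]))  # ship only the changed lines
    return delta

def apply_delta(old_lines, delta):
    """Reconstruct the new dump from the old dump plus the edit script."""
    out = []
    for op in delta:
        if op[0] == "copy":
            _, i1, i2 = op
            out.extend(old_lines[i1:i2])
        else:
            out.extend(op[1])
    return out

# Toy example: two revisions of a page in an XML dump.
old = ["<page>", "<title>A</title>", "<text>v1</text>", "</page>"]
new = ["<page>", "<title>A</title>", "<text>v2</text>", "</page>"]
delta = make_delta(old, new)
assert apply_delta(old, delta) == new
```

A production scheme would more likely use a binary delta format such as xdelta, or rsync's batch mode, rather than line-based opcodes, but the bandwidth argument is the same: only the changed portions cross the wire.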
I am new here, so I hope that this mailing list is the correct place
for this kind of suggestion. If I am wrong, please tell me.
On Fri, Sep 3, 2010 at 12:19 PM, Aryeh Gregor
> On Fri, Sep 3, 2010 at 2:44 PM, Chad <innocentkiller(a)gmail.com> wrote:
>> Given those criteria, I think that the following have "full support"
>> in MediaWiki:
>> * MySQL
>> * SQLite
>> * PostgreSQL
> In practice, though, SQLite and PostgreSQL are more likely to break
> than MySQL, right? If so, we should make this clear in the installer
> UI. Or are they really about as well-supported as MySQL these days,
> minus a moderate lag in schema updates for pgsql?
> Ideally, we could run test suites by default on all available DBs
> instead of just on the one the wiki currently uses. In particular,
> SQLite currently uses the same schema as MySQL and is available in PHP
> by default, plus it doesn't require any setup (providing admin login,
> etc.), so it would be great if we could run SQLite tests right now
> whenever people run tests. It would be great if people could set up
> pgsql to automatically run too, but that would require manual setup.
> This is the kind of thing automated tests are really helpful for.
> (But that's kind of tangential.)
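The zero-setup property described above is exactly what makes SQLite attractive as a default test target: an in-memory database needs no server, no admin login, and no cleanup. MediaWiki's tests are PHP, but the idea can be illustrated in Python, whose stdlib also bundles SQLite (the table here is a simplified stand-in, not the real MediaWiki schema):

```python
import sqlite3

# An in-memory database: no server, no admin credentials, no files to clean up.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_title TEXT NOT NULL
)""")  # simplified stand-in for a wiki schema

conn.execute("INSERT INTO page (page_title) VALUES (?)", ("Main_Page",))
rows = conn.execute("SELECT page_title FROM page").fetchall()
assert rows == [("Main_Page",)]
conn.close()
```

The same zero-configuration property is what would let a default test run exercise the SQLite backend on any stock install, with pgsql as an opt-in extra.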
Out of curiosity - what regression test suites are in use to QA MW builds?
-george william herbert
I was working on this in branches/phpunit-restructure
I've merged it into head as of
Essentially, it's just organizing our PHPUnit tests the way the rest of
the world does, as a mirrored directory structure to the units under
test. Also, we need to avoid like the plague writing too many crazy
custom phpunit invocation scripts, as was done previously, because it
locks our tests down to a really specific version of phpunit, which is
thankfully under constant development.
In short, use things as they are meant to be used... We probably
reinvent too many wheels around here.
People have been wondering a lot about our database drivers recently.
Two days ago I was asked specifically which ones we support. I think
the subject needs clarifying, especially in light of the new installer and
associated update refactoring. First and foremost, people need to
remember that MediaWiki is written with MySQL in mind. It's the
primary target, and can always be expected to work. That being said,
I think that supporting other DBMSes is great and I'm glad we do.
Earlier today on IRC, I outlined what I consider to be the DBMSes we
support and rough criteria for why I think so. "Full support" means
that the schema and DatabaseBase subclass should be fully
functional. Patches should be written when schema changes occur
so people can stay up to date. "Partial support" means that there is a
functional DatabaseBase subclass and working schema. There may
be some edge cases it doesn't support. Updaters probably aren't
written. "Experimental" is anything less. Typically a half-implemented
DatabaseBase subclass exists.
Given those criteria, I think that the following have "full support":
* MySQL
* SQLite
* PostgreSQL
And "partial support":
* Oracle (works, but lacks updates)
When the new installer ships (hopefully) in 1.17, it will contain
support for MySQL, SQLite and PostgreSQL. I'm in favor of
adding Oracle in as well, as long as it's clearly labelled as
still a work in progress.
As far as the "experimental" group goes, I see no harm in leaving
them in SVN. They are still mostly in development (some more
active than others) and keeping the various subclasses around
won't hurt anything. Once support gets a little more solid, then
we can look to adding them to the installer (once they're done,
it should just be a 1-line addition to Installer::$dbTypes, plus
some extra i18n).
Ryan Kaldari wrote:
> ... [we're] in the process of hooking up Open Web Analytics
It's great to see that donor logs are going into a database instead
of just a text file, but multiple regression in SQL is absurdly
difficult because of the limitations of SQL, so I still recommend R,
in particular: http://cran.r-project.org/web/packages/RMySQL/RMySQL.pdf
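To make the "regression in SQL is hard" point concrete: ordinary least squares with several predictors is one line in R but painful in plain SQL. A minimal pure-Python sketch of OLS via the normal equations, on made-up donation data (the variable names are invented for illustration):

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X)b = X'y.
    X: rows with a leading 1 for the intercept; y: responses."""
    n = len(X[0])
    # Build X'X and X'y.
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(n)]
    # Solve by Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[piv] = xtx[piv], xtx[col]
        xty[col], xty[piv] = xty[piv], xty[col]
        for r in range(col + 1, n):
            f = xtx[r][col] / xtx[col][col]
            xty[r] -= f * xty[col]
            for c in range(col, n):
                xtx[r][c] -= f * xtx[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        beta[i] = (xty[i] - sum(xtx[i][j] * beta[j]
                                for j in range(i + 1, n))) / xtx[i][i]
    return beta

# Made-up data generated as donation = 5 + 2*banner + 0.5*num_options,
# so the fit should recover those coefficients exactly.
X = [[1, b, k] for b in (0, 1) for k in (0, 10, 20)]
y = [5 + 2 * b + 0.5 * k for _, b, k in X]
beta = ols(X, y)
assert all(abs(a - e) < 1e-9 for a, e in zip(beta, [5, 2, 0.5]))
```

In R the same fit is a single `lm(donation ~ banner + num_options)` call, which is why pushing the raw logs out of SQL and into a statistics package is the path of least resistance.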
I will ask Arthur Richards for data coding formats.
I predict that multiple response checkboxes will do better than the
more constraining radio buttons, but there is no reason that they
should not be measured as any other independent variable. It is
probably a lot more important to measure the number of earmarks
offered: 0-26. There is plenty of reason to believe that showing 26
options will have a slight advantage over 25, but I can't see the test
results from the Red Cross (they measure the things which increase
donations of blood much more carefully than money, at least in their
publications that I've been able to find.) Don't forget the control
case where no donor selections are offered. Optimization requires
measurement, and it is easy to measure offering a lot of options up
front.
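Measuring whether, say, 26 options outperforms 25 is a standard two-proportion comparison. A minimal sketch, with purely illustrative numbers, of the z-test one would run on such an A/B split:

```python
from math import erf, sqrt

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)           # pooled conversion rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # standard error under H0
    z = (p_a - p_b) / se
    # Normal CDF via erf; two-sided p-value.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Illustrative numbers only: 360/10000 donors with variant A
# versus 300/10000 with variant B.
z, p = two_proportion_z(360, 10000, 300, 10000)
assert p < 0.05  # at this sample size the difference would be significant
```

The control case with no donor selections at all is just a third arm of the same test.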
Do you think that variations on the disclaimer should also be tried?
I think there is reason to believe something terse might result in
more donations, e.g.: "These options are advisory only." and/or "The
Wikimedia Foundation reserves the right to override donor selections,
cancel any project, and use any funds for any purpose." and/or "All
donations are discretionary, these options are offered for polling
purposes only." or some combination. What does Mike Godwin think a
good set of disclaimers to test might be?
I conflated the proposed stimulus list down to 25 non-default items
and enumerated them with letters of the alphabet so that everyone
would understand that it is feasible to test additional proposals as
well. I have not yet surveyed the Village Pumps or mailing lists for
additional stimulatory ideas, but I hope people who have, or who see
anything missing, will suggest at least five more. Translations would
be great, too.
(default) Use my donation where the need is greatest.
A. Auction the order of search failover links to search engine companies.
B. Broaden demographics of active editors.
C. Compensate people who submit improvements to the extent that they
are necessary and sufficient.
D. Display most popular related articles.
E. Enhance automation of project tasks.
F. Enhance site performance in underserved geographic regions.
G. Enhance visualizations of projects and their editing activity.
H. Establish journalism awards, expense accounts and compensation for
independent Wikinews reporters, fact checkers, photographers, and
proofreaders.
I. Establish secure off-site backup copies.
J. Establish simple Wikipedias for beginning readers in languages
other than English.
K. Improve math formula rendering.
L. Increase the number of active editors.
M. Increase the number of articles, images, and files.
N. Increase the number of unique readers.
O. Make it easier for people to add recorded audio pronunciations.
P. Obtain expert article assessments.
Q. Obtain reader quality assessments.
R. Perform external code reviews.
S. Perform independent usability testing.
T. Produce regular snapshots and archives.
U. Retain more active editors.
V. Strengthen Wikimedia Foundation financial stability.
W. Support a thriving research community.
X. Support an easier format to write quiz questions.
Y. Support more reliable server uptime.
Z. Support offline editing.
> What do you mean by "opening"?
> enwiki pages-meta-history is hard due to its size, not because Ariel
> or Tomasz are more stupid than any volunteer.
> I trust them to do it at least as well as a volunteer would.
> Of course, if you can perform better I'm all for giving you a
> shell to
> fix it, and the scripts are there for improvements as well.
I wasn't aware that the dump scripts were publicly available. Where can they be downloaded from, or are they part of MediaWiki?
> What do you need exactly about the images? Which image dumps do you
> want? Do you have enough terabytes to store them?
> Dumps/Access has been given by request in the past to that data.
> If it's not there it's because:
> a) Those dumps would take a lot of space.
I don't think that is a valid reason; thumbnail dumps of all the images from enwiki would probably be a smaller file than the current enwiki pages-meta-history bz2 file.
> b) Nobody feels particularly interested in them.
I disagree; there has been a lot of interest in having image dumps available for download. There was a discussion on this recently on the xmldatadumps list, which basically concluded that subsets of images (e.g. enwiki thumbnails) would be useful. There are wiki pages dedicated to the topic of how to download images, precisely because there are no image dumps available. Is the Wikimedia Foundation interested in hosting image dumps again? If so, maybe we can start a discussion on how to write the script and which image dumps to start with.
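If such a script gets written, one building block is MediaWiki's hashed upload directory layout, where a file's path is derived from the MD5 of its name. A minimal Python sketch (the hash-based path scheme is the standard MediaWiki one; everything else here is illustrative, and a real bundling script would need to handle redirects, missing thumbnails, and non-image files):

```python
import hashlib

def thumb_path(filename, width):
    """Relative path of a thumbnail under MediaWiki's hashed upload layout."""
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Files live under <first hex digit>/<first two hex digits>/<name>/
    return f"thumb/{digest[0]}/{digest[:2]}/{name}/{width}px-{name}"

path = thumb_path("Example image.jpg", 120)
parts = path.split("/")
assert parts[3] == "Example_image.jpg"
assert parts[4] == "120px-Example_image.jpg"
assert parts[2].startswith(parts[1])  # "ab" directory sits under "a"
```

A thumbnail bundler could walk the image table of a dump, compute these paths, and fetch or pack only the width it wants, which is what keeps such a bundle far smaller than a full original-resolution image dump.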