V2.0?

Magnus Manske

27 Sep 2004

5:51 a.m.

Yesterday, Erik Moeller mentioned that it might be best to hold off with the wikidata development, and instead do that in a "quantum leap" MediaWiki 2.0 version.

Which got me thinking: Should we start this (at least, plan it)?

Quite a few concepts and ideas have been proposed that seem hard to do with the current line of development. Examples:
* WikiData
* cur/old integration
* stable revision number for both cur/old
* single login
* XML parser
* use of wikicommons
* centralized interwiki link management
* stable/editable version management
* SVG support (editable SVG source?)

These are just the ones I came up with in a minute. There are, no doubt, more. Sure, some of them *can* be integrated into 1.3/1.4, but taken together, they might call for a radical break.

Which leads to the question: Fork or rewrite?

Seriously: If the database structure in 2.0 differs greatly from 1.4 (which is to be expected), a rewrite of the core parts is in order.
* The parser will be rewritten anyway as XML-interpreting.
* We can probably keep the skin system.
* Special pages will need a (partial) rewrite, depending on the new DB structure.
* Cache/squid/... can probably stay as they are.

DATABASE REWRITE PROPOSAL

I vaguely remember there's one on meta, but I came up with this last night (don't ask;-), so here goes nothing:

* Object list table. An object is a page (article, talk, etc.), an image/media/binary file, or data; extensible with future types.
* A table for each object type, which holds the actual data: Article text and user comment, revision number etc.

The object table only contains an ID and name (+namespace) for the object, and an ID number for the actual object in its table. So:
* OBJ_ID, OBJ_TITLE, OBJ_NAMESPACE identifies the object
* OBJ_TYPE (0 for page, 1 for image, 2 for data...)
* OBJ_DATA_REVISION identifies the current object data *in its table*

An article has
* ARTICLE_ID (matches OBJ_ID)
* ARTICLE_REVISION (both cur and old; OBJ_DATA_REVISION has the latest ARTICLE_REVISION)
* the text of that revision, the user id, text and comment, and all the other goodies

An image table would have
* IMAGE_ID, IMAGE_REVISION
* filename of the stored image, or reference to an external image (commons), with a local description

Similar for data etc. (maybe even users?)
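
As a rough illustration of the layout described above, here is a minimal sketch using Python's sqlite3 module. Only the OBJ_*, ARTICLE_* and IMAGE_* field names come from the proposal; the table names, column types, remaining columns, and the two helper functions are assumptions made for the example.

```python
import sqlite3

# Hypothetical schema: field names follow the proposal where it gives them,
# everything else (types, extra columns, table names) is assumed.
SCHEMA = """
CREATE TABLE object (
    obj_id            INTEGER PRIMARY KEY,
    obj_title         TEXT NOT NULL,
    obj_namespace     INTEGER NOT NULL,
    obj_type          INTEGER NOT NULL,   -- 0 = page, 1 = image, 2 = data
    obj_data_revision INTEGER NOT NULL,   -- latest revision in the type table
    UNIQUE (obj_namespace, obj_title)
);
CREATE TABLE article (
    article_id       INTEGER NOT NULL REFERENCES object(obj_id),
    article_revision INTEGER NOT NULL,
    article_text     TEXT NOT NULL,
    article_user     INTEGER NOT NULL,
    article_comment  TEXT NOT NULL,
    PRIMARY KEY (article_id, article_revision)
);
CREATE TABLE image (
    image_id          INTEGER NOT NULL REFERENCES object(obj_id),
    image_revision    INTEGER NOT NULL,
    image_location    TEXT NOT NULL,      -- local filename or external reference
    image_description TEXT NOT NULL,
    PRIMARY KEY (image_id, image_revision)
);
"""


def save_article_revision(db, obj_id, text, user, comment):
    """Append a new revision and point the object row at it."""
    row = db.execute(
        "SELECT COALESCE(MAX(article_revision), 0) + 1 "
        "FROM article WHERE article_id = ?",
        (obj_id,),
    ).fetchone()
    revision = row[0]
    db.execute(
        "INSERT INTO article VALUES (?, ?, ?, ?, ?)",
        (obj_id, revision, text, user, comment),
    )
    db.execute(
        "UPDATE object SET obj_data_revision = ? WHERE obj_id = ?",
        (revision, obj_id),
    )
    return revision


def current_article_text(db, namespace, title):
    """Resolve title -> object row -> current revision in the article table."""
    row = db.execute(
        "SELECT a.article_text FROM object o "
        "JOIN article a ON a.article_id = o.obj_id "
        "AND a.article_revision = o.obj_data_revision "
        "WHERE o.obj_namespace = ? AND o.obj_title = ? AND o.obj_type = 0",
        (namespace, title),
    ).fetchone()
    return row[0] if row else None


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO object VALUES (1, 'Example', 0, 0, 0)")
save_article_revision(db, 1, "first draft", 42, "created")
save_article_revision(db, 1, "second draft", 42, "expanded")
print(current_article_text(db, 0, "Example"))
```

Old and current revisions live in the same article table here, so "cur/old integration" amounts to the object row pointing at one revision number.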

A table for changes would thus store an OBJ_ID. Recent Changes can then look up what that object is, and then look up the changes in the appropriate table. As a result, we'd get
* a "universal" interface for everything we store in the wiki
* a (relatively) small table with all objects, which should mean faster access times, that only references the actual data (in the appropriate table)
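
The two-step lookup for Recent Changes could be sketched as follows, again with Python's sqlite3. This is only an illustration: the change table layout (including its revision column), the TYPE_TABLES mapping, and the function name are assumptions, not part of the proposal.

```python
import sqlite3

# Hypothetical mapping from OBJ_TYPE to (table, id column, revision column,
# summary column). Adding a new object type would mean adding one entry here.
TYPE_TABLES = {
    0: ("article", "article_id", "article_revision", "article_comment"),
    1: ("image", "image_id", "image_revision", "image_description"),
}

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE object (obj_id INTEGER PRIMARY KEY, obj_title TEXT, obj_type INTEGER);
CREATE TABLE article (article_id INTEGER, article_revision INTEGER, article_comment TEXT);
CREATE TABLE image (image_id INTEGER, image_revision INTEGER, image_description TEXT);
CREATE TABLE change (change_id INTEGER PRIMARY KEY, obj_id INTEGER, revision INTEGER);

INSERT INTO object VALUES (1, 'Example', 0), (2, 'Logo.png', 1);
INSERT INTO article VALUES (1, 1, 'created'), (1, 2, 'typo fix');
INSERT INTO image VALUES (2, 1, 'uploaded logo');
INSERT INTO change VALUES (1, 1, 1), (2, 2, 1), (3, 1, 2);
""")


def recent_changes(db, limit=50):
    """Newest changes first: find the object, then ask its own table for details."""
    rows = db.execute(
        "SELECT c.obj_id, c.revision, o.obj_title, o.obj_type "
        "FROM change c JOIN object o ON o.obj_id = c.obj_id "
        "ORDER BY c.change_id DESC LIMIT ?",
        (limit,),
    ).fetchall()
    result = []
    for obj_id, revision, title, obj_type in rows:
        table, id_col, rev_col, summary_col = TYPE_TABLES[obj_type]
        # Table and column names come from the fixed mapping above,
        # never from user input, so formatting them into the query is safe.
        detail = db.execute(
            f"SELECT {summary_col} FROM {table} "
            f"WHERE {id_col} = ? AND {rev_col} = ?",
            (obj_id, revision),
        ).fetchone()
        result.append((title, table, revision, detail[0] if detail else None))
    return result


for entry in recent_changes(db):
    print(entry)
```

The cost of the "universal" interface shows up in the loop: each change needs a second query against a type-specific table, which is the trade-off against keeping the object table small.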

Now you see why I think "rewrite" for this one. I also strongly believe we should put *every* database access into the database class, encapsulating it from the rest of the software. Had we done this in 1.4, basically only a rewrite of the database class(es) would be in order for the above proposal.
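
To make the encapsulation point concrete, here is a minimal sketch in Python. The class, its methods, and the page_text table are all hypothetical and are not MediaWiki's actual database class; the point is only that the caller never sees SQL or table names.

```python
import sqlite3


class WikiDatabase:
    """Hypothetical facade: the only place that knows table and column names."""

    def __init__(self, connection):
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS page_text "
            "(title TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def save_page(self, title, body):
        self._db.execute(
            "INSERT OR REPLACE INTO page_text (title, body) VALUES (?, ?)",
            (title, body),
        )

    def load_page(self, title):
        row = self._db.execute(
            "SELECT body FROM page_text WHERE title = ?", (title,)
        ).fetchone()
        return row[0] if row else None


def render_page(store, title):
    """Caller code: depends on the facade's methods, not on the schema."""
    body = store.load_page(title)
    if body is None:
        return f"== {title} ==\n(no such page)"
    return f"== {title} ==\n{body}"


store = WikiDatabase(sqlite3.connect(":memory:"))
store.save_page("Sandbox", "Hello")
print(render_page(store, "Sandbox"))
```

If the storage moved to an object table plus per-type tables, only the bodies of save_page and load_page would change; render_page would stay as it is.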

Enough shocking you for now, Magnus

erik_moeller＠gmx.de

27 Sep

3:58 p.m.

Magnus-

...

WikiData

cur/old integration

stable revision number for both cur/old

single login

XML parser

use of wikicommons

centralized interwiki link management

stable/editable version management

and peer review mechanisms.

I've been thinking about the general problem of workflow, fact-markup etc. I'll have some more details soon.

...

SVG support (editable SVG source?)

* proper multi-language support within a single DB
* completely new way of looking at discussion pages

The fact that we are using a mailing list to discuss this is a sign that our dogfood is not quite good enough to eat yet in the department of discussions. In fact, one Wikipedia WikiProject even created a separate phpBB-type online forum as a replacement for individual article talk pages.

Talk pages are confusing and do not offer most of the functionality that we have come to take for granted from other online forums, usenet, mailing lists etc.:
- different view modes for a thread
- sorting
- reply button
- quoting
- comments are signed for me
- watching individual threads for replies
- isolated viewing/linking of individual comments/threads
- automatic archiving
- search by author, subject etc.
- group or aggregate related discussions easily
- ...

On the other hand, of course talk pages have the full power of wikis - they integrate into RC, they allow wiki syntax, refactoring, diffs etc. So I've been thinking in the last few weeks about a good way to combine the advantages of talk pages and discussion forums into a truly revolutionary new system.

This system, which I have dubbed "LiquidThreads" for now, requires each comment to be stored separately in the database. It will not just *allow* refactoring, it will in fact *require* it - the archiving process depends on old discussions being summarized. It will be possible to attach open wiki pages to a thread which can then be used to create a summary for one or several threads; a thread will be automatically archived after a given time only if such a summary exists. For easy refactoring into summaries, it will be possible to generate a "flat" wikitext view of a thread.
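
The archiving rule and the flat view described above could be sketched like this in Python. The class and field names, the 30-day default, measuring the "given time" from the last activity in the thread, and the colon-indented output are all assumptions made for illustration; no specification exists yet.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Comment:
    """One separately stored comment, with its replies nested below it."""
    author: str
    text: str
    posted: datetime
    replies: List["Comment"] = field(default_factory=list)


@dataclass
class Thread:
    subject: str
    root: Comment
    summary_page: Optional[str] = None  # title of the attached wiki page, if any


def last_activity(comment):
    """Timestamp of the newest comment in this subtree."""
    return max([comment.posted] + [last_activity(r) for r in comment.replies])


def should_archive(thread, now, max_idle=timedelta(days=30)):
    """Archive only summarized threads that have been idle long enough."""
    if thread.summary_page is None:
        return False
    return now - last_activity(thread.root) >= max_idle


def flat_wikitext(comment, depth=0):
    """Flatten the tree into talk-page style lines, one colon per reply level."""
    lines = [f"{':' * depth}{comment.text} --{comment.author}"]
    for reply in comment.replies:
        lines.extend(flat_wikitext(reply, depth + 1))
    return lines


reply = Comment("Erik", "Let's plan it first.", datetime(2004, 9, 28))
root = Comment("Magnus", "Fork or rewrite?", datetime(2004, 9, 27), [reply])
thread = Thread("V2.0?", root)
print("\n".join(flat_wikitext(thread.root)))
print(should_archive(thread, datetime(2004, 11, 1)))
```

Because an unsummarized thread never qualifies, the rule is what makes refactoring a requirement rather than an option.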

The system will make it easy to create new "channels" of discussions and to "attach" channels to individual article pages - multiple channels per article, or multiple articles per channel. Discussions can be moved easily so that we can get a decent workflow on general discussion channels like Wikipedia:Village pump.
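
A minimal in-memory sketch of the channel idea, in Python, assuming a many-to-many attachment between channels and articles and exactly one home channel per thread. ChannelIndex and its method names are hypothetical, and a real implementation would presumably keep this in database tables.

```python
from collections import defaultdict


class ChannelIndex:
    """Hypothetical index of channels, attached articles, and threads."""

    def __init__(self):
        self._articles = defaultdict(set)  # channel -> attached article titles
        self._threads = defaultdict(set)   # channel -> thread ids
        self._home = {}                    # thread id -> current channel

    def attach(self, channel, article):
        """Attach a channel to an article; both sides may have many partners."""
        self._articles[channel].add(article)

    def post(self, thread_id, channel):
        self._threads[channel].add(thread_id)
        self._home[thread_id] = channel

    def move(self, thread_id, new_channel):
        """Move a discussion, e.g. from a general channel to a specific one."""
        old_channel = self._home[thread_id]
        self._threads[old_channel].discard(thread_id)
        self.post(thread_id, new_channel)

    def threads_for(self, article):
        """All threads in any channel attached to the given article."""
        found = set()
        for channel, articles in self._articles.items():
            if article in articles:
                found |= self._threads[channel]
        return sorted(found)


index = ChannelIndex()
index.attach("Physics talk", "Quantum mechanics")
index.attach("Physics talk", "Relativity")
index.attach("Village pump", "Relativity")
index.post(1, "Village pump")
index.post(2, "Physics talk")
print(index.threads_for("Relativity"))
index.move(1, "Physics talk")
print(index.threads_for("Quantum mechanics"))
```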

It will make it possible to immediately see recent comments on a Wikipedia article or a Wikidata entry without going to a separate "talk page", if you want to. You can get a "You have new messages" type notification for *any* reply, no matter where it appears. Discussions of interest to you can be followed on a single page. Yet the system will provide the whole flexibility of the traditional wiki talk system. It will allow setting individual comment permissions so that others can edit your comments. Each comment will have a history, diffs etc. And perhaps functionality for polls can be added, too.

OK, I know, someone has to code it, too ;-). I've created some mock-ups (yeah, I really like mock-ups) and will very soon start writing the specifications for this system, which I believe definitely needs to be taken into account when designing the new database scheme. I'm fully aware that this will probably only get implemented if I code it myself.

So I hope we don't hurry too much, and at least spend a month or two laying out the requirements for the 2.0 database. Design flaws are always difficult to fix later. I want this design to be *good*, better than anything else out there. I want us to take wiki to the next level.

Regards,

Erik

Tim Starling

6:58 p.m.

New subject: The difficulty of collaborative design in a volunteer environment

Magnus Manske wrote:

...

Yesterday, Erik Moeller mentioned that it might be best to hold off with the wikidata development, and instead do that in a "quantum leap" MediaWiki 2.0 version.

Which got me thinking: Should we start this (at least, plan it)?

Forgive me for being jaded from past experience with MediaWiki development, but I care very little for plans. We have unbelievable numbers of plans piled up on meta and on the mailing list. The people who propose them generally overestimate the available developer workload, and fail to consider an important part of developer motivation.

In my experience, volunteer developers don't like being told what to do. Part of the fun of coding comes from creative expression -- the joy that comes from having an idea, and carrying it through until you see it realised. By publishing plans, you destroy that motivation for anyone else who had the same idea, or might have had it in the future.

What developers do when faced with this problem is they attempt to ignore all plans which have gone before. They invent their own, and fool themselves into thinking that their idea is truly creative and new. This allows them to regain the motivation that was lost.

It's not surprising then that every MediaWiki developer has their own plans for database schema redesign. I'm not going to comment on their similarity for fear of negating the reason for their existence.

This curious aspect of volunteer psychology makes collaborative design very difficult. Fortunately there is an alternative which allows us to avoid the most obvious planning mistakes, and that is by the use of an oracle.

It works like this. When you have an idea which is complex and prone to errors, you don't publish it, you privately discuss it with senior developers. These developers have been active with the project for a long time. They have had many ideas of their own which they haven't had time to code, they have read many public proposals, and they have had many private planning discussions. Hence they know a great deal about possible design directions.

The oracle responds to an idea carefully, either saying that it sounds like a good idea, or explaining why it wouldn't work. They avoid bursting the developer's carefully constructed bubble by giving them mailing list references to identical proposals. The oracle should accept ideas which are slightly different to the way they would have done it -- those differences represent the creative expression which we are trying to preserve.

There are a couple of problems with this scheme. One is that project knowledge is concentrated in the senior developers, so their loss is a great loss to the project. The other is that it discourages public discussion, and so limits the pool of collective wisdom. But there is no other way. Many developers simply will not code something that has been designed by someone else.

In our project, this role of oracle is filled by Brion, and I'd like to think to a lesser extent myself. The oracle is a model; design knowledge is never truly concentrated in a single person. We have a number of knowledgeable people on the project. And even the most senior developers need to discuss their own ideas among themselves.

Magnus' plan sounds fine.

-- Tim Starling

Kurt Jansson

9:02 p.m.

Tim Starling wrote:

...

What developers do when faced with this problem is they attempt to ignore all plans which have gone before. They invent their own, and fool themselves into thinking that their idea is truly creative and new. This allows them to regain the motivation that was lost.

BTW, I've seen this mechanism work in Wikipedia many times. :-)

Kurt

erik_moeller＠gmx.de

9:48 p.m.

Tim-

...

Forgive me for being jaded from past experience with MediaWiki development, but I care very little for plans. We have unbelievable numbers of plans piled up on meta and on the mailing list. The people who propose them generally overestimate the available developer workload, and fail to consider an important part of developer motivation.

There is a lot of truth to this, and this is always something I worry about when I write a proposal - but it's a catch-22 situation, and I consider the potential benefits of publishing larger than the potential downsides of not doing it.

I also believe there's a better solution to the problem of volunteer motivation than the one you suggest, and that is the building of group identity from the start. This is what WikiProjects on Wikipedia do, for example.

What we need to create, I believe, is workgroups for different project areas:
- extension work group
- template work group
- 2.0 work group
etc.

These groups would try to develop truly collaborative strategies and utilize all available knowledge, with all members working together on equal footing, regardless of the nature of their contributions. Positive group energy can generate a lot of motivation in itself. When people feel they are part of one group from the start, every member of that group can get a sense of accomplishment for what the group is doing.

The one problem that remains is the problem of respect, not just within a work group but within the larger MediaWiki community. If people don't respect each other, they can't work together well, and they won't *enjoy* working with each other. There are many strategies which try to address that problem, such as building reputation-based currencies where any highly respected member in the community can use that respect to "buy" other people's time. However, based on my experiences on Wikipedia, I tend to come to the conclusion that the solutions are entirely social rather than technical.

We need to build a culture of shared values, where these values tie us together and allow us to work towards a common goal. Think about it - *any* community needs shared values to be successful.

Our shared core value on Wikipedia is NPOV, for example. I can work together with people of completely different political persuasions because they all believe in building a neutral encyclopedia. Only the ones who do not believe in that, we try to convert, or eventually ostracize.

What is our shared value in MediaWiki development? As long as we do not have anything that unites us, the things that divide us will always come into focus and take up our energy and resources. Clashes between egos will prevent people from working together who could otherwise achieve true greatness. This to me seems to be a larger cause of lack of cooperation than anything else.

We share many of the same goals, but we have never bothered to put this into explicit terms. The technology we are building has enormous potential for positive change in society. If we all want this positive change to happen, if we all want to make a difference through what we are doing, then we have *no other choice* but to work together as best as we can.

We don't have to start a religion. But building a group identity based on mutual respect and shared values would be useful, I think. How about you?

</soapbox>

Regards,

Erik

Delirium

29 Sep

1:39 a.m.

Erik Moeller wrote:

...

I also believe there's a better solution to the problem of volunteer motivation than the one you suggest, and that is the building of group identity from the start. This is what WikiProjects on Wikipedia do, for example.

It's a separate issue, but I think a solution to some cases of developer motivation would be to have good documentation and modular code so people can work on parts that interest them. I know I personally have had ideas I'd be willing to implement in the past, but my brief digging led me to the conclusion that I'd basically have to read through and understand the entire MediaWiki codebase before I could usefully add anything to it, and at that point I decided that while I was interested, I wasn't *that* interested. A bunch of the people proposing various extensions and whatnot on meta seem to be in the same boat---they could code up bits and pieces, but don't know how to hook into the larger codebase.

Of course, this is another "plan with no implementation" --- writing the documentation and cleaning up the code would require me to read and understand it all first, which is exactly what I wish to avoid, as I have other projects I'm supposed to be working on full-time. =]

-Mark

Brion Vibber

1:47 a.m.

On Sep 29, 2004, at 1:39 AM, Delirium wrote:

...

It's a separate issue, but I think a solution to some cases of developer motivation would be to have good documentation and modular code so people can work on parts that interest them.

That's something we're hoping to improve. :) Among other things, we're currently in the process of adding phpdoc comments to the code from which internal API documentation is generated.

One of the things I've been trying to do as I touch various modules is to separate the 'work' and 'user interface' components to make things more modular and improve code reuse. (One tiny example: Special:Undelete calls into the Special:Log code to pull out and format log fragments relating to the selected page.)
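
The kind of split described above might look like the following Python sketch. It is not MediaWiki code (which is PHP), and every function name and the in-memory LOG list are invented for illustration: one function does the "work" of selecting log entries, one formats them, and two different "pages" reuse both.

```python
# Hypothetical in-memory log standing in for a log table.
LOG = [
    {"action": "delete", "page": "Foo", "user": "Alice"},
    {"action": "protect", "page": "Foo", "user": "Bob"},
    {"action": "delete", "page": "Bar", "user": "Carol"},
]


def log_entries_for(log, page):
    """'Work' component: select the entries relating to one page."""
    return [entry for entry in log if entry["page"] == page]


def format_log_fragment(entries):
    """'User interface' component: turn entries into display lines."""
    return "\n".join(
        f"* {e['user']} performed '{e['action']}' on [[{e['page']}]]"
        for e in entries
    )


def log_page(log, page):
    """A log page: shows every entry for the page."""
    return f"Log for {page}\n" + format_log_fragment(log_entries_for(log, page))


def undelete_page(log, page):
    """An undelete page: reuses both helpers, but only for deletions."""
    deletions = [e for e in log_entries_for(log, page) if e["action"] == "delete"]
    return f"Restore {page}?\n" + format_log_fragment(deletions)


print(log_page(LOG, "Foo"))
print(undelete_page(LOG, "Foo"))
```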

-- brion vibber (brion @ pobox.com)

Timwi

1 Oct

7:30 a.m.

Tim Starling wrote:

...

In my experience, volunteer developers don't like being told what to do.

I half-disagree.

In my experience, volunteer developers are discouraged when being told to do something that is difficult, or something they are not particularly interested in.

However, I think being asked to do something can be an extremely efficient motivator, if the developer that is being asked personally feels there is potential for success (i.e. if they're interested, if it's easy to do, etc.). If you're asked to do something, then you know that once you've finished it someone will appreciate your work.

I'm saying this because on LiveJournal developers used to be very motivated and encouraged even though they've always been told what to do (not just by Brad, but also by fellow volunteers attempting to represent Brad's interests). Fixing bugs or writing smallish features in LiveJournal was easy because their software is well organised, modular, structured, and easy to get into. LiveJournal volunteer development has come to a halt not because people were told what to do, but exactly because they are no longer told: instead, suggestions and even finished patches are ignored and left uncommented-on, developers feel unappreciated, become frustrated and leave.

According to this, being told what to do is discouraging on MediaWiki only because there are few things that are easy or interesting.

...

Part of the fun of coding comes from creative expression -- the joy that comes from having an idea, and carrying it through until you see it realised. By publishing plans, you destroy that motivation for anyone else who had the same idea, or might have had it in the future.

I don't think like that. Some people might, but I don't.

Yes, part of the fun of coding comes from creative expression -- but for some people, it might not be the biggest part. For some people, I'm sure the biggest part is the social recognition that results from success (being able to say "I made this!" and hearing other people reply "Well done!").

Both personalities like to publish their plans because, like a finished implementation, a plan is also (1) a form of creative expression, and (2) something that can harvest social recognition ("Nice plan!" etc.).

On the contrary, if someone else publishes a plan, there are three possible ways in which I can react. Either I think the plan is bad and I think I can come up with something better; or I think the plan is good and I might only have minor suggestions for improvement or perhaps not even that; or there is something important about the plan that I don't understand (e.g. why it's supposed to be better) or that I can't judge (e.g. its advantages and disadvantages). Unfortunately, the third is the most common; many very competent developers' weakest skill is communication, and often a plan includes just the bare implementation idea, not the information on why it's better and what its advantages are.

...

What developers do when faced with this problem is they attempt to ignore all plans which have gone before. They invent their own, and fool themselves into thinking that their idea is truly creative and new. This allows them to regain the motivation that was lost.

No, I don't think I fool myself into thinking that my ideas are truly creative and new, but I fool myself into thinking that my ideas are good ones. I explicitly do not pass judgment over plans I read that fall into the third category above, but subconsciously it feels like the ideas I understand best (which, inevitably, are my own) are somehow better.

...

This curious aspect of volunteer psychology makes collaborative design very difficult. Fortunately there is an alternative which allows us to avoid the most obvious planning mistakes, and that is by the use of an oracle. It works like this. [...]

Your idea would essentially create a hierarchy. Those that you call "senior developers" would be regarded as having a superior social status, and would seem like they have more say in the development of the software.

This isn't necessarily bad with regard to the software; every largish project needs some management and guidance. But whether this is good in the social sense, I'm not sure.

...

In our project, this role of oracle is filled by Brion, and I'd like to think to a lesser extent myself.

Unfortunately, that's not very many people. I have continuous trouble contacting either of you for a conversation. I would have been more than happy to discuss my database schema redesign ideas (or any other ideas, for that matter) with you (or anyone, for that matter), but it never works out. You're either too busy, or have to leave too soon, or you're simply not interested enough. I am not trying to point fingers here; don't get me wrong. This isn't a complaint. This is just where your oracle idea is slightly flawed. We either need more of those oracle people (which defeats the purpose of your idea, because then within the oracle group you have the same issues again), or we need an oracle member that is very much dedicated to listening to and evaluating plans and ideas, and not occupying themselves too much with coding. Such a person is unlikely to be found.

Greetings, Timwi

wikitech-l@lists.wikimedia.org