[Foundation-l] Wikimedia and Environment

jamesmikedupont at googlemail.com jamesmikedupont at googlemail.com
Sun Dec 13 12:38:18 UTC 2009


Let me sum this up. The basic optimization is this:
You don't need to transfer every revision of that new article to all
users at all times.
The central server could just say: this is the last revision that
has been released by the editors responsible for it; there are 100
edits in process, and you can get involved by going to this page here
(hosted on a server someplace else). There is no need to transfer
those 100 edits to all the users on the web, and they are not
interesting to everyone.
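
As a concrete illustration, the central server could hand out something
like the following pointer instead of the edit stream itself. This is only
a sketch; the field names, URLs and the helper function are hypothetical.

released_pointer = {
    "title": "Example_Article",
    "released_revision": 123456789,   # last revision signed off by its editors
    "released_html_url": "https://static.example.org/Example_Article/123456789",
    "pending_edits": 100,             # edits in progress somewhere else
    "editing_host": "https://edits.example.org/Example_Article",
}

def render_reader_view(pointer):
    """Readers only fetch the released revision; contributors follow the link."""
    print("Showing revision %s of %s" % (pointer["released_revision"], pointer["title"]))
    print("%s edits in process; get involved at %s"
          % (pointer["pending_edits"], pointer["editing_host"]))

render_reader_view(released_pointer)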


On Sun, Dec 13, 2009 at 12:10 PM, Domas Mituzas <midom.lists at gmail.com> wrote:
>> 4. The other questions are, does it make sense to have such a
>> centralized client server architecture? We have been talking about
>> using a distributed vcs for mediawiki.
>
> Lunatics without any idea of stuff being done inside the engine talk about distribution. Let them!

I hope you are serious here.
Let's take a look at what the engine does: it allows editing of text,
it renders the text, and it serves the text. The wiki from Ward
Cunningham is a Perl script of the most basic form. There is not much
magic involved. Of course you need search tools, version histories and
such, and there are places for optimizing all of those processes.

It is not lunacy; it is a fact that such work can be done, and is
done, without a central server in many places.

Just look, for example, at how people edit code in an open source
software project using git. It is distributed, and it works.

There are already wikis based on git available, and other
peer-to-peer networks such as Tor or Freenet could be used as well.
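
To make that concrete, the edit cycle a git-based wiki implies could look
roughly like this. The repository URL, file layout and branch name are
placeholders, not any real service:

import subprocess

def run(*cmd, **kwargs):
    subprocess.run(cmd, check=True, **kwargs)

repo_url = "https://git.example.org/topics/renewable-energy.git"  # placeholder
workdir = "renewable-energy"

run("git", "clone", repo_url, workdir)

# Pages live as plain wikitext files; editing is just changing a file.
with open(workdir + "/Solar_power.wiki", "a") as page:
    page.write("\n== New section ==\nA local edit, made offline if need be.\n")

run("git", "add", "Solar_power.wiki", cwd=workdir)
run("git", "commit", "-m", "Expand Solar_power article", cwd=workdir)
run("git", "push", "origin", "master", cwd=workdir)  # only the result leaves this machine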

You could split up the editing of Wikipedia articles across a network
of git servers around the globe, while the rendering and distribution
of the resulting data would remain the job of the WMF.

Now, resolving conflicts is pretty simple in the case of git:
everyone has a copy and can do what they want with it. If you like
someone else's version, you pull it.

In terms of Wikipedia having only one viewpoint, the NPOV that is
reflected by the current revision at any one point in time, that
version would be the one pushed from its editors' repositories. It is
imaginable that you would have one senior editor for each topic, with
their own repository of pages, who pulls in versions from many
people.
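
A sketch of that senior-editor model, assuming the contributor remote
names and repository URLs below (all invented for illustration):

import subprocess

def git(*args, cwd="renewable-energy"):
    subprocess.run(("git",) + args, cwd=cwd, check=True)

# Contributors whose work the senior editor follows (invented URLs).
contributors = {
    "alice": "https://git.example.org/alice/renewable-energy.git",
    "bob": "https://git.example.org/bob/renewable-energy.git",
}

for name, url in contributors.items():
    # No check=True here: re-adding an existing remote is harmless.
    subprocess.run(("git", "remote", "add", name, url), cwd="renewable-energy")
    git("fetch", name)

# The senior editor reviews the fetched branches and merges the version
# they accept; that merge becomes the candidate "released" revision.
git("merge", "--no-ff", "-m", "Accept Alice's revision of Solar_power", "alice/master")
git("push", "origin", "master")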

>> 7. Now, back to the optimization. Lets say you were able to optimize
>> the program. We would identify the major cpu burners and optimize them
>> out. That does not solve the problem. Because I would think that the
>> php program is only a small part of the entire issue. The fact that
>> the data is flowing in a certain wasteful way is the cause of the
>> waste, not the program itself. Even if it would be much more efficient
>> and moving around data that is not needed, the data is not needed.
>
> We can have new kind of Wikipedia. The one where we serve blank pages, and people imagine content in it. We've done that with moderate success quite often.

Please, let's be serious here!
I am talking about the fact that not all people need all the
centralised services at all times.

>
>> So if you have 10 people collaborating on a topic, only the results of
>> that work will be checked into the central server. the decentralized
>> communication would be between fewer parties and reduce the resources
>> used.
>
> Except that you still need tracker to handle all that, and resolve conflicts, as still, there're
> no good methods of resolving conflicts with small number of untrusted entities.

A tracker to manage which server is used by which group of editors
can be pretty efficient. Essentially it is a form of DNS: a tracker
need only show you the current repositories that are registered for a
certain topic.
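
Something along these lines, with invented topic names and URLs, is
really all the lookup has to do:

# The whole registry: topic -> repositories currently registered for it.
TRACKER = {
    "renewable-energy": [
        "https://git.example.org/groups/energy-team.git",
        "https://git.example.de/mirrors/renewable-energy.git",
    ],
    "ancient-history": [
        "https://git.example.org/groups/history-team.git",
    ],
}

def resolve(topic):
    """Like a DNS lookup: where is this topic being edited right now?"""
    return TRACKER.get(topic, [])

print(resolve("renewable-energy"))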

Resolving conflicts is important, but you only need so many people for that.

The entire community does not get involved in all the conflicts.
There are only a certain number of people who are deeply involved in
any one section of Wikipedia at any given time.

Imagine that you had, let's say, 1000 conference rooms available for
discussion and collaboration, spread around the world, and the results
of those rooms were fed back into Wikipedia. These rooms or servers
would be for processing the edits and conflicts of any given set of
pages.

My idea is that you don't need a huge server to resolve conflicts.
Many pages don't have many conflicts; there are certain areas which
need constant arbitration, of course. You could even split the groups
into different viewpoints, so that the arbitration team only
deals with the output of two teams (pro and contra).
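
A rough sketch of that two-team arbitration, assuming the arbiters'
repository already has remotes named "pro" and "contra" fetched (the
names and repository path are made up):

import subprocess

def git(*args):
    # No check=True: a merge conflict is an expected, human-handled outcome.
    return subprocess.run(("git",) + args, cwd="contested-topic")

git("checkout", "release")
git("merge", "pro/master")                  # one team's agreed version
result = git("merge", "contra/master")      # the other team's version
if result.returncode != 0:
    # Only here do the few arbiters step in and resolve by hand; the rest
    # of the network never has to see this negotiation traffic.
    print("Merge conflict: the arbitration team resolves and commits the result.")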

Even on a highly contested page, the number of editors is not
unlimited.


