[Foundation-l] Wikimedia and Environment

jamesmikedupont at googlemail.com
Sun Dec 13 09:52:53 UTC 2009


On Sun, Dec 13, 2009 at 10:30 AM, Nikola Smolenski <smolensk at eunet.rs> wrote:
> On Saturday 12 December 2009 17:41:44 jamesmikedupont at googlemail.com wrote:
>> On Sat, Dec 12, 2009 at 5:32 PM, Teofilo <teofilowiki at gmail.com> wrote:
>> > Do we have an idea of the energy consumption related to the online
>> > access to a Wikipedia article? Some people say that a few minutes
>> > long search on a search engine costs as much energy as boiling water
>> > for a cup of tea: is that story true in the case of Wikipedia (4)?
>>
>> My 2 cents: this PHP is cooking more cups of tea than an optimized
>> program written in C.
>
> But think of all the coffee developers would have to cook while coding and
> optimizing in C!

But that is a one-off expense. That is why we programmers can earn a
living: we can work on many projects. Besides, we drink coffee while
playing UrbanTerror anyway.

1. PHP is very hard to optimize.
2. MediaWiki has a pretty nonstandard wikitext syntax. The best parser
I have seen is the Python implementation of the wikibook parser. But
given that each plugin can change the syntax as it pleases, parsing
will only get more complex (see the sketch after this list).
3. Even Python is easier to optimize than PHP.
4. The other question is: does it make sense to have such a
centralized client-server architecture at all? We have been talking
about using a distributed VCS for MediaWiki.
5. Of course, even if MediaWiki were fully distributed it would still
cost CPU; the cost would just be spread out. Every edit that has to be
copied causes work to be done, and in a distributed system the total
work may even be greater.
6. I have also been wondering who the beneficiary of all these
millions spent on bandwidth is; where does that money go, anyway? What
about building a Wikipedia network and having the people who want to
access it pay, instead of us paying to give it away? With those
millions you could buy a lot of routers and cables.
7. Now, back to optimization. Let's say you were able to optimize the
program: we would identify the major CPU burners and optimize them
away. That still does not solve the problem, because I think the PHP
code is only a small part of the whole issue. The waste comes from the
fact that the data flows in such a wasteful way, not from the program
itself. Even if the program were much more efficient at moving around
data that is not needed, that data would still not be needed.
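
To make point 2 concrete, here is a rough Python sketch I put together
of rendering a tiny wikitext subset (bold, italics, internal links).
It is only an illustration: the function name and the supported subset
are made up for this mail and are not the wikibook parser or the real
MediaWiki grammar.

import re

# Toy wikitext renderer: bold, italics and internal links only.
# The real MediaWiki grammar is far larger, and extensions can
# change it, which is exactly what makes a clean parser so hard.

def render(wikitext):
    html = wikitext
    # handle '''bold''' before ''italics'' so the markers do not clash
    html = re.sub(r"'''(.+?)'''", r"<b>\1</b>", html)
    html = re.sub(r"''(.+?)''", r"<i>\1</i>", html)

    def link(match):
        target, _, label = match.group(1).partition("|")
        href = "/wiki/" + target.replace(" ", "_")
        return '<a href="%s">%s</a>' % (href, label or target)

    # [[Target]] or [[Target|label]]
    html = re.sub(r"\[\[(.+?)\]\]", link, html)
    return html

print(render("'''Tea''' needs ''energy'', see [[Energy|this article]]."))

Even this toy version shows the problem: the bold and italic markers
overlap, and every extension tag would need its own rule.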

In an optimal world this would eventually lead to updates not being
broadcast at all. Not all changes have to go through a central server.
Let's say there is one editor who pulls the changes from the others
and produces a public version; only that editor would need to hold all
the data for that topic. I think you could optimize Wikipedia along
the lines of data travelling only to the people who need it (editors
versus viewers). The first step would be a way to route edits to
special interest groups, creating smaller virtual subnetworks of the
editors' machines working together in a direct peer-to-peer network
(see the routing sketch below).
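
A rough sketch of what such routing could look like; the EditRouter
class, the topics and the peer names are all invented here for
illustration.

from collections import defaultdict

# Illustrative only: send an edit to the peers subscribed to its
# topic instead of broadcasting every edit through a central server.

class EditRouter:
    def __init__(self):
        self.groups = defaultdict(set)   # topic -> set of peer ids

    def subscribe(self, peer, topic):
        self.groups[topic].add(peer)

    def route(self, topic, edit):
        # only the editors interested in this topic receive the edit;
        # viewers and unrelated editors never see the traffic
        return {peer: edit for peer in self.groups[topic]}

router = EditRouter()
router.subscribe("editor-a", "Climate")
router.subscribe("editor-b", "Climate")
router.subscribe("editor-c", "Chemistry")
print(router.route("Climate", "rewrite intro paragraph"))

Only the subscribed editors see the edit traffic; readers would simply
fetch the published version.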

So if you have 10 people collaborating on a topic, only the result of
that work gets checked into the central server. The decentralized
communication happens between fewer parties and uses fewer resources
(a toy example follows below).
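
And a toy example of that workflow; the merge function and the sample
edits are invented here, and a real setup would lean on a distributed
VCS for proper conflict handling.

def merge(base, edits):
    # naive merge: apply each (old, new) replacement in order; a real
    # system would need real conflict handling, which is exactly what
    # a distributed VCS already provides
    text = base
    for old, new in edits:
        text = text.replace(old, new)
    return text

base = "Some say a search costs as much energy as boiling water for tea."
local_edits = [
    ("boiling water", "boiling the water"),
    ("a search", "an online search"),
]
# only this single merged revision travels to the central server
print(merge(base, local_edits))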

See also:
http://strategy.wikimedia.org/wiki/Proposal:A_MediaWiki_Parser_in_C


mike


