Daniel Mayer wrote:
It is important for our developers to have complete access to the servers that Wikimedia projects run on. Otherwise outages will be significantly longer, and we would not have any way to fix things ourselves when it is the server that is the problem. Read-only mirrors are a very real possibility, but in order to get that to work we need somebody to code the functionality. Jimbo supports the idea of read-only mirrors, but somebody has to code it to make it happen in a near-real-time and seamless way. Any volunteers?
Me, me! :)
I've started working on DB replication. We should be able to set up a remote slave server once I'm done. The MySQL manual says you can even have slave servers connecting over a modem, just dialling up occasionally to get the latest updates. So having one in a different city or something shouldn't be a problem. There might be a few UI issues to sort out. I guess we'd have to set up a method of mirroring images too.
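Roughly speaking, the setup would look something like this. This is only a sketch; the hostnames, account names and log positions below are made up for illustration, not anything actually configured yet:

    # On the master (my.cnf): give it a server id and turn on the binary
    # log, which records every update so slaves can replay it.
    [mysqld]
    server-id = 1
    log-bin

    # On each slave (my.cnf): just a distinct server id.
    [mysqld]
    server-id = 2

Then on the slave, after loading a dump of the master's data, something along these lines:

    -- Point the slave at the master and start replicating.
    -- MASTER_LOG_FILE/MASTER_LOG_POS would come from SHOW MASTER STATUS.
    CHANGE MASTER TO
        MASTER_HOST='pliny.wikipedia.org',
        MASTER_USER='repl',
        MASTER_PASSWORD='secret',
        MASTER_LOG_FILE='pliny-bin.001',
        MASTER_LOG_POS=4;
    START SLAVE;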
Sorry for my newbie question, but there's something I don't understand. Many web sites around the world have only one DB server and many load-balancing web servers; Slashdot, for example, IIRC. OK, it must be a quad-processor machine... but it's very fast!
I'm not really sure that pliny is the bottleneck.
The Larousse server is a PIII 866; it's a bit underpowered for what we want to do, I think. I hope the next upgrade will improve Larousse, and then we can move other wikis to it. Then we'll see if Pliny is really the bottleneck.
My 2 cents
On Mon, 2003-09-22 at 00:03, Constans, Camille (C.C.) wrote:
Sorry for my newbie question, but there's something I don't understand. Many web sites around the world have only one DB server and many load-balancing web servers; Slashdot, for example, IIRC. OK, it must be a quad-processor machine... but it's very fast!
I'm not really sure that pliny is the bottleneck.
At the moment, for the most part it's not -- because we've turned off a lot of features that are heavy on the database. Searching, 'wanted pages', 'orphans', etc.
Of course, it's further loaded down by serving web pages for everything but the English-language site, since larousse is much too loaded by that to take the rest on.
The Larousse server is a PIII 866; it's a bit underpowered for what we want to do, I think. I hope the next upgrade will improve Larousse, and then we can move other wikis to it. Then we'll see if Pliny is really the bottleneck.
The current upgrade plan is to upgrade both pliny and larousse to similar levels (dual Athlon 2800s, iirc), and hopefully get the web load better handled. This should happen in the next few days. Hopefully we'll get warning before the downtime, which hopefully will be short. :)
Then there is talk of next getting a big honkin' server for the database, which will leave pliny free as a second web server.
Database replication meanwhile can be handy for several things:
* Live backup server! If the master database server dies, but we've got a replicated server keeping up constantly, we can switch over to it fairly quickly and stay online.
* Performance. If we get all those slow things turned back on, the load on the database will increase. Relatively slow check-a-hojillion-pages operations could be run from a slave server without affecting performance for everyday reading & writing.
* Mirroring? An offsite server with its own copy of the database being updated live could provide a more (or perhaps fully-) functional mirror, offloading traffic from the main server and somewhat reducing response times (for read-only operations at least) for people far from North America.
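To give an idea of what that read/write split would mean for the code, something along these lines could work. This is purely a sketch; the function name and hostnames are invented for illustration, not what the wiki code actually does today:

    <?php
    // Invented hostnames, purely for illustration.
    $wgMasterHost = 'pliny.internal';
    $wgSlaveHost  = 'dbslave.internal';

    // Writes (and reads that must see the very latest data) go to the
    // master; everything else can be answered by the replicated slave.
    function wfGetConnection( $forWriting ) {
        global $wgMasterHost, $wgSlaveHost;
        $host = $forWriting ? $wgMasterHost : $wgSlaveHost;
        return mysql_connect( $host, 'wikiuser', 'wikipass' );
    }

    $readDb  = wfGetConnection( false );  // e.g. rendering an article
    $writeDb = wfGetConnection( true );   // e.g. saving an edit
    ?>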
-- brion vibber (brion @ pobox.com)
Constans, Camille (C.C.) wrote:
Sorry for my newbie question, but there's something I don't understand. Many web sites around the world have only one DB server and many load-balancing web servers; Slashdot, for example, IIRC. OK, it must be a quad-processor machine... but it's very fast!
I'm not really sure that pliny is the bottleneck.
The Larousse server is a PIII 866; it's a bit underpowered for what we want to do, I think. I hope the next upgrade will improve Larousse, and then we can move other wikis to it. Then we'll see if Pliny is really the bottleneck.
My 2 cents
I was suggesting making a read-only mirror by mirroring the database rather than just the HTML. Every time a page is updated on the main DB server, an update would automatically be sent to the mirror. It's a very simple method, which meets Daniel's requirements of short latency. The idea would be that the mirror would serve web pages by getting articles from its local copy of the database, rather than from the other side of the world or country or whatever.
We could even set up a full read-write server on the other side of the world, and redirect users to a different domain name as they arrive, based on their location. Users could specify their preferred mirror in their user preferences. We could even make larousse the default for logged-in users, that way most edits go to a web server which is close to the master DB.
-- Tim Starling.
Tim Starling wrote:
We could even set up a full read-write server on the other side of the world, and redirect users to a different domain name as they arrive, based on their location. Users could specify their preferred mirror in their user preferences. We could even make larousse the default for logged-in users, that way most edits go to a web server which is close to the master DB.
Actually, it's not really mirroring then. What would you call it? A distributed cluster?
-- Tim Starling.
We could even set up a full read-write server on the other side of the world, and redirect users to a different domain name as they arrive, based on their location.
Brilliant idea given the small number of writes (a few per minute) and massive number of reads. Put me on the list of folks with computing/bandwidth willing to donate to this effort.
Andrew Lih
University of Hong Kong
Email: alih@hku.hk | Web: http://jmsc.hku.hk/
On Mon, 22 Sep 2003, Tim Starling wrote:
I was suggesting making a read-only mirror by mirroring the database rather than just the HTML. Every time a page is updated on the main DB server, an update would automatically be sent to the mirror. It's a very simple method, which meets Daniel's requirements of short latency. The idea would be that the mirror would serve web pages by getting articles from its local copy of the database, rather than from the other side of the world or country or whatever.
We could even set up a full read-write server on the other side of the world, and redirect users to a different domain name as they arrive, based on their location. Users could specify their preferred mirror in their user preferences. We could even make larousse the default for logged-in users, that way most edits go to a web server which is close to the master DB.
Would it not be better to keep the mirrors read-only, and have them redirect to the master for write-access? To have writing in several places causes significant overhead in avoiding edit conflicts and such.
Andre Engels
Andre Engels wrote:
On Mon, 22 Sep 2003, Tim Starling wrote:
logged-in users, that way most edits go to a web server which is close to the master DB.
Would it not be better to keep the mirrors read-only, and have them redirect to the master for write-access? To have writing in several places causes significant overhead in avoiding edit conflicts and such.
There is a lot of hypothesis and discussion here. Have you considered that there are some 40-60 page views for every single edit? What about using some real statistics instead of guessing? (Just my hypothesis.)
Plus the tech discussion should be on wikitech-l, not wikipedia-l.
http://www.wikipedia.org/wiki/Special:Statistics reports 40 views per edit, as an average since July 2002. More recently, the English Wikipedia has received:
Month             Edits per month   Page views   views/edit
---------------   ---------------   ----------   ----------
July, 2003             212K             9.9M          46
Aug, 2003              248K            13.0M          52
Sept 1-24, 2003        227K            13.9M          61
As a comparison, the fast-responding susning.nu wiki sees 100 page views per edit. A faster Wikipedia would receive more page views, probably 30M per month. The number of edits per month would also increase, but perhaps not as much.
The Wikipedia statistics are spread out over too many places, and none of these pages are wiki-editable, so I cannot add cross-reference links.
- Webalizer graphs, http://www.wikipedia.org/stats/
- Article count, http://www.wikipedia.org/wiki/Special:Statistics
- Erik Zachte's edit count, http://www.wikipedia.org/wikistats/
  (older version at http://members.chello.nl/epzachte/Wikipedia/Statistics/EN/Sitemap.htm)
Lars Aronsson wrote:
Andre Engels wrote:
On Mon, 22 Sep 2003, Tim Starling wrote:
logged-in users, that way most edits go to a web server which is close to the master DB.
Would it not be better to keep the mirrors read-only, and have them redirect to the master for write-access? To have writing in several places causes significant overhead in avoiding edit conflicts and such.
I've decided that Andre may be right about this. Editors should be redirected to larousse when they log in, and anonymous users should be redirected when they click edit. If people were allowed to log in to the mirrors, you'd have to use HTTP redirects in order to log them in to larousse when they started editing.
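Concretely, I'm picturing something like this on the mirrors. It's only a sketch; the cookie name and the master URL are placeholders, and the real logic would live inside the wiki code rather than bolted on top:

    <?php
    // Placeholder value, for illustration only.
    $wgMasterUrl = 'http://www.wikipedia.org';

    $action   = isset( $_GET['action'] ) ? $_GET['action'] : 'view';
    $title    = isset( $_GET['title'] ) ? $_GET['title'] : '';
    $loggedIn = isset( $_COOKIE['wikiUserID'] );   // assumed cookie name

    if ( $loggedIn || $action == 'edit' || $action == 'submit' ) {
        // Editors and logged-in users get bounced to larousse, where the
        // master database lives, so all writes happen in one place.
        header( 'Location: ' . $wgMasterUrl . '/wiki.phtml?title=' .
            urlencode( $title ) . '&action=' . urlencode( $action ) );
        exit;
    }
    // Otherwise, serve the page from the local read-only copy.
    ?>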
There is a lot of hypothesis and discussion here. Have you considered that there are some 40-60 page views for every single edit? What about using some real statistics instead of guessing? (Just my hypothesis.)
Statistics are pretty useless without profiling/benchmarking. We know that there are many more views than edits, but we don't know the load on the server for each. Edits are far more expensive than views, for a number of reasons.
Would it be possible to generate some profiling data for the live wiki? Say, turning on $wgProfiling for one in every thousand requests to wiki.phtml?
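I'm imagining something as simple as this near the top of wiki.phtml (just a sketch of the sampling, not of how the profiler itself works):

    <?php
    // Turn profiling on for roughly one request in a thousand, so the
    // overhead stays negligible on the live site.
    $wgProfiling = ( mt_rand( 1, 1000 ) == 1 );

    if ( $wgProfiling ) {
        // Note which request got sampled so the profile can be matched up.
        error_log( 'Profiling sampled request: ' . $_SERVER['REQUEST_URI'] );
    }
    ?>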
-- Tim Starling
Tim Starling wrote:
Would it be possible to generate some profiling data for the live wiki? Say, turning on $wgProfiling for one in every thousand requests to wiki.phtml?
You only need to profile the slow requests. That's what I do on susning.nu. It works for me.
On Wed, 24 Sep 2003, Lars Aronsson wrote:
Andre Engels wrote:
On Mon, 22 Sep 2003, Tim Starling wrote:
logged-in users, that way most edits go to a web server which is close to the master DB.
Would it not be better to keep the mirrors read-only, and have them redirect to the master for write-access? To have writing in several places causes significant overhead in avoiding edit conflicts and such.
There is a lot of hypothesis and discussion here. Have you considered that there are some 40-60 page views for every single edit? What about using some real statistics instead of guessing? (Just my hypothesis.)
And what does that have to do with my point? Are you saying that the overhead does not matter because it will only occur in a small percentage of cases? Then I will answer that redirecting people elsewhere for editing does not matter either, because it is just as small a percentage of cases.
I have no idea what 'real statistics' could either strengthen or weaken the point I am making. There are no 'real statistics' about the amount of time it costs to make edits on all machines as opposed to doing it all on one; there has been no Wikipedia implementation of either.
Andre Engels