Andre Engels wrote:
On Mon, 22 Sep 2003, Tim Starling wrote:
logged-in users, that way most edits go to a web server which is close to the master DB.
Would it not be better to keep the mirrors read-only, and have them redirect to the master for write-access? To have writing in several places causes significant overhead in avoiding edit conflicts and such.
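[A minimal sketch of the redirect-writes-to-the-master idea described above, as it might look on a PHP mirror frontend. The master hostname, the list of write actions, and the parameter handling are all made up for illustration; this is not actual MediaWiki code.]

<?php
// Sketch only: a read-only mirror forwards write actions to the master
// server instead of handling them locally. Hostname and action names
// are hypothetical.
$masterHost   = 'master.wikipedia.org';   // hypothetical master web server
$writeActions = array('edit', 'submit', 'delete', 'move');

$action = isset($_GET['action']) ? $_GET['action'] : 'view';

if (in_array($action, $writeActions)) {
    // Send the browser to the same URL on the master, which owns the
    // writable database; reads keep being served from the local replica.
    header('Location: http://' . $masterHost . $_SERVER['REQUEST_URI']);
    exit;
}
// ...otherwise render the page from the local, read-only copy.
?>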
There is a lot of hypothesis and discussion here. Have you considered that there are some 40-60 page views for every single edit? What about using some real statistics instead of guessing? (Just my hypothesis.)
Plus the tech discussion should be on wikitech-l, not wikipedia-l.
http://www.wikipedia.org/wiki/Special:Statistics reports 40 views per edit, as an average since July 2002. More recently, the English Wikipedia has received:
Month            Edits per month  Page views  views/edit
---------------  ---------------  ----------  ----------
July, 2003       212K             9.9M        46
Aug, 2003        248K             13.0M       52
Sept 1-24, 2003  227K             13.9M       61
As a comparison, the fast-responding susning.nu wiki sees 100 page views per edit. A faster Wikipedia would receive more page views, probably 30M per month. The number of edits per month would also increase, but perhaps not as much.
The Wikipedia statistics are spread out over too many places, and none of these pages are wiki-editable, so I cannot add cross-reference links.
- Webalizer graphs, http://www.wikipedia.org/stats/
- Article count, http://www.wikipedia.org/wiki/Special:Statistics
- Erik Zachte's edit count, http://www.wikipedia.org/wikistats/
  (older version at http://members.chello.nl/epzachte/Wikipedia/Statistics/EN/Sitemap.htm)
On Wed, 24 Sep 2003, Lars Aronsson wrote:
Andre Engels wrote:
On Mon, 22 Sep 2003, Tim Starling wrote:
logged-in users, that way most edits go to a web server which is close to the master DB.
Would it not be better to keep the mirrors read-only, and have them redirect to the master for write-access? To have writing in several places causes significant overhead in avoiding edit conflicts and such.
There is a lot of hypothesis and discussion here. Have you considered that there are some 40-60 page views for every single edit? What about using some real statistics instead of guessing? (Just my hypothesis.)
And what does that have to do with my point? Are you saying that the overhead does not matter because it will only occur in a small percentage of cases? Then I will answer that redirecting people elsewhere for editing does not matter either, because it is just as small a percentage of cases.
I have no idea what 'real statistics' could either strengthen or weaken the point I am making. There are no 'real statistics' about the amount of time it costs to make edits on all machines as opposed to doing them all on one. There has been no Wikipedia implementation of either.
Andre Engels
The Afrikaans wikipedia seems to be down: http://af.wikipedia.org/wiki.cgi returns
Internal Server Error The server encountered an internal error or misconfiguration and was unable to complete your request. Please contact the server administrator, lee@piclab.com and inform them of the time the error occurred, and anything you might have done that may have caused the error.
More information about this error may be available in the server error log.
--------------------------------------------------------------------------------
Apache/1.3.27 Server at af.wikipedia.org Port 80
This doesn't seem to affect any of the other languages.
regards, Ian Gilfillan
On Fri, 2003-09-26 at 09:44, Ian Gilfillan wrote:
The Afrikaans wikipedia seems to be down: http://af.wikipedia.org/wiki.cgi returns
Internal Server Error The server encountered an internal error or misconfiguration and was unable to complete your request. Please contact the server administrator, lee@piclab.com and inform them of the time the error occurred, and anything you might have done that may have caused the error.
Error log is not terribly helpful:
[Fri Sep 26 16:39:31 2003] [error] [client 64.163.244.225] Premature end of script headers: /home/usemod/wiki-af/work-http/wiki.cgi
On quick examination, it seems to be doing this only on the HomePage. I can go to other pages:
http://af.wikipedia.org/wiki.cgi?Recent_Changes
http://af.wikipedia.org/wiki.cgi?Ghana
etc
I managed to get in to edit the homepage: http://af.wikipedia.org/wiki.cgi?action=edit&id=HomePage
added a few dots at the end, saved it, it's fine now. Took the dots back off, saved it, still fine.
I dunno... weeeeeird.
-- brion vibber (brion @ pobox.com)
Andre Engels wrote:
I have no idea what 'real statistics' could either strengthen or weaken the point I am making. There are no 'real statistics' about the amount of time it costs to make edits on all machines as opposed to doing them all on one. There has been no Wikipedia implementation of either.
I have not taken my time to contribute to the source code, so I'll keep my opinions on the details to myself. But you are right that there are no detailed statistics today, and I think that this is sad.
I wish the best for and give my deepest respect to those who take their time to work on the source code and hardware. My spontaneous reaction, however, is that compartmentalization of different operations onto different servers is a mistake. It can certainly work, but it also risks making the system more complex and vulnerable than necessary.
The architecture that I would advocate is a single MySQL backend (possibly with a cold or hot standby) and multiple parallel PHP+Apache frontends that are all equal and balance the load among them. This should be combined with profiling of any request (both complete HTTP requests and individual SQL statements) that takes longer than a set limit of wallclock time. The latter would provide the performance statistics that we are missing today.
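[As a rough illustration of the wallclock profiling meant here, something along these lines could run on each PHP frontend. The threshold and log path are made up for the example, and this is only a sketch, not a proposed patch.]

<?php
// Sketch of per-request wallclock profiling: log any request that takes
// longer than a set limit. Threshold and log file are illustrative.
$profileThreshold = 1.0;                              // seconds
$profileLog       = '/var/log/wiki-slow-requests.log';
$requestStart     = microtime(true);

function logIfSlow($requestStart, $profileThreshold, $profileLog) {
    $elapsed = microtime(true) - $requestStart;
    if ($elapsed > $profileThreshold) {
        $uri  = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '-';
        $line = sprintf("%s %.3fs %s\n", date('r'), $elapsed, $uri);
        error_log($line, 3, $profileLog);             // append to the log file
    }
}

// Run the check when the request finishes, whatever the code path.
register_shutdown_function('logIfSlow', $requestStart, $profileThreshold, $profileLog);
?>

[For the individual SQL statements, MySQL's slow query log (the long_query_time setting) already records queries that exceed a given number of seconds, which covers the database side of the same idea.]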
Susning.nu doesn't have any hardware of its own. It runs as a Perl FCGI script on a web hotel. The web hotel serves a couple of thousand websites on a cluster consisting of a single MySQL backend server and multiple parallel Apache frontend servers that balance the load among them. Susning.nu only has 1.8 million page views per month, and is one of this web hotel's most heavily accessed websites, so it's not a huge web hotel, but they are friendly and skilled and they run Linux. To the questions "how to design the hardware", "how to run backups", and "how to balance the load", my best answer is "I don't - my web hotel does that for me - for $50 per month". All I do is to profile and optimize the Perl script and SQL statements, so the site runs fast.
Lars Aronsson wrote:
The architecture that I would advocate is a single MySQL backend (possibly with a cold or hot standby) and multiple parallel PHP+Apache frontends, that are all equal and balance the load among them.
Yes! This is precisely the architecture we are pursuing.
--Jimbo