Today, Friday, the front page of the English Wikipedia has been fast.
Another page (I monitor http://www.wikipedia.com/wiki/Sweden) was slow
for one period of 30 minutes (09:30-10:00 am GMT) and another period
of two hours (11:40-13:50 GMT). Some other URLs on the international
Wikipedias were also affected at the same time. This might be due to
maintenance or work being done on the scripts.
Subtract 7 hours from GMT to get the server's local time zone
(PDT = GMT -0700).
Apart from these two limited intervals, every URL that I monitor has
been fast all day, including the recent changes pages.
I'm very happy with this, and hope Brion and Jimmy (and who else?)
will soon get the talk namespace links back without hurting
performance. (But hey, never make big fixes five minutes before you
leave for the weekend! Better just leave it as is if you have to go.)
And now for some more relaxed Friday reading, actually related to
performance problems. (The following analysis might be politically
slanted. Don't take it too seriously.) The Swedish parliament
elections are coming up in September, so the political parties are
starting up their campaigns. The problem is there are no big issues
to fight about. The four non-socialist parties have unusually boring
candidates (Dukakis style), and everybody expects the current
social-democratic government to win. The single issue that seems to
be coming up is the national sick leave insurance, which is paid by
tax money, and far over budget. This is linked to the fact that
"burn-out" is now an accepted medical diagnosis for which you are
allowed to take a long sick leave on the tax payers' expense. You
would expect such welfare excesses to be on the social democrat
agenda, and that non-socialists would urge for tax cuts and a balanced
budget. However, the current s-d govt has been doing a great job
balancing the budget, and they will now have to deal with cutting back
this overgenerous sick leave compensation without hurting their
voters' feelings. Tough job. The Christian-democratic party's
candidate has already hurt a lot of feelings by claiming that "some"
of those receiving compensation are "cheating the system". That might
be true, but accusing "some" (who? me?) is obviously not the way to
attract voters. This issue now has media attention and some
interesting example cases are reported.
Like this one: Attorneys in Swedish district courts have been
right-sized in the past years, as part of balancing the budget. This
means that as soon as one gets sick, the rest get too much to do,
leading to stress and burn-out, which leads to more sick leaves.
Think of the court cases as HTTP requests arriving to Wikipedia.
There are some processes/attorneys there to handle the cases, but for
some reason one process gets blocked and cannot work. This leaves
more work for the remaining workers, but they are probably waiting for
the first process to get finished and unlock the resources (database
records?) that it is using. If processes are allowed to go to sleep
waiting for each other, the work will pile up. It will never end.
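The pile-up described above can be sketched as a toy queue simulation (the numbers are made up purely for illustration):

```python
def simulate(arrivals_per_tick, workers, blocked, ticks):
    """Each tick, new cases/requests arrive and every unblocked
    worker finishes one; anything left over queues up."""
    backlog = 0
    for _ in range(ticks):
        backlog += arrivals_per_tick
        backlog = max(0, backlog - (workers - blocked))
    return backlog

# Four workers keep up with four arrivals per tick...
print(simulate(4, 4, 0, 10))  # 0
# ...but block just one of them and the backlog grows every tick.
print(simulate(4, 4, 1, 10))  # 10
```

The point is that a system running at full capacity has no slack: losing a single worker, even temporarily, turns a stable queue into one that grows without bound.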
So, what is the solution? Throwing more attorneys at the problem?
Maybe, but more likely the work processes should be redesigned and
simplified. That allows the available attorneys to finish up a case
and take on the next one. Some of their tasks are more important than
others, but the performance or throughput of the system depends on
cutting away or redesigning the most time-consuming tasks. The high
degree of sick leave is an indicator of system design flaws (albeit an
indirect one), and thus not altogether bad.
In the same way, a high "load average" (as reported by the "uptime" or
"top" commands) is one indicator that the Wikipedia system is flawed.
The load average in a UNIX system is the number of processes that are
ready to run, waiting for the CPU to become available. Unfortunately,
most of them are just waiting to see if their wanted resource has
become available. If this is not the case (e.g. database record still
locked), they will go back to the end of the line, waiting again. Do
you remember those bread shop waiting lines in Soviet Russia?
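For reference, the load averages sit at the end of the `uptime` output; here is a small parser over a sample line (the sample figures are invented for illustration):

```python
# Hypothetical `uptime` output; real values come from the running system.
sample = " 14:02:11 up 12 days, 3:44, 2 users, load average: 8.15, 6.90, 5.32"

def one_minute_load(uptime_line):
    # The 1-, 5- and 15-minute averages follow the "load average:" label,
    # comma-separated; the first figure is the 1-minute average.
    tail = uptime_line.split("load average:")[1]
    return float(tail.split(",")[0])

print(one_minute_load(sample))  # 8.15
```

A 1-minute average well above the number of CPUs, sustained, is exactly the "queue of waiting processes" symptom described above.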
Training new attorneys is in itself a time-consuming task, which
should be avoided if possible. Instead of paying sick leave (for how
long?) to the already trained attorneys, a "cure" for "burn-out"
should be found that can bring them back to work, thus relieving the
overload from their colleagues and saving tax payers' money at the same time.
I have no idea how a "cure" for burn-out can be found, but I think it
is a necessary political trick, and thus will happen. It will not
hurt voters' feelings, and it is my guess that the people who can
achieve this will work for the winners of the election.
This might be the weakest analogy in history, but I think we should
treat the Wikipedia processes with the same dignity and respect that
the Swedish voters would expect. After all, they're supposed to work
for us. The processes feel self-fulfillment when they can finish
their job on time, and get distressed when they get locked up. Any
uncalled for delay will only result in more work piling up. That is a
flaw in the system design that has to be fixed, and we cannot go
around claiming that "some" of the workers are trying to cheat the
system. That will only lead to us losing their confidence.
Lars Aronsson (lars(a)aronsson.se)
Teknikringen 1e, SE-583 30 Linuxköping, Sweden
Any opinions from wikitech-l about this?
----- Forwarded message from Tomasz Wegrzanowski <taw(a)users.sourceforge.net> -----
From: Tomasz Wegrzanowski <taw(a)users.sourceforge.net>
Date: Tue, 30 Jul 2002 04:08:53 +0200
To: Jimmy Wales <jwales(a)bomis.com>
Subject: Re: [Wikitech-l] rsync for mirroring
On Mon, Jul 29, 2002 at 03:40:12PM -0700, Jimmy Wales wrote:
> What do we need to do on our end?
This is just one of many ways of doing it.
You should probably play with logs, connection limits,
running with lower permissions and stuff like that later,
but it should work without this.
Obviously you should add all the Wikipedias to the list (I have only 3 here)
and use the correct paths.
1. install rsync
2. ensure that /etc/services contains this line (if not, either add
this line or write the port in /etc/inetd.conf numerically):
rsync 873/tcp # rsync
3. create /etc/rsyncd.conf containing something like that:
read only = yes

[pl]
path = /home/taw/local/tmp/wiki-pl/
comment = Polish Wikipedia

[de]
path = /home/taw/local/tmp/wiki-de/
comment = German Wikipedia

[eo]
path = /home/taw/local/tmp/wiki-eo/
comment = Esperanto Wikipedia
4. put the following line in /etc/inetd.conf:
rsync stream tcp nowait root /usr/bin/rsync rsyncd --daemon
5. restart inetd
Now to check:
$ rsync localhost::
pl Polish Wikipedia
de German Wikipedia
eo Esperanto Wikipedia
$ rsync localhost::pl
drwxrwxr-x 288 2002/01/09 22:22:50 .
-rw-rw-r-- 19 2001/09/26 16:23:32 .htaccess
drwxrwxr-x 112 2001/10/02 13:07:29 RCS
-rw-rw-r-- 302 2000/07/18 20:40:13 hos.png
-rw-rw-r-- 235 2001/09/26 17:49:14 index.html
drwxrwxr-x 72 2001/10/18 01:37:30 lib-http
-rwxrwxr-x 67 2001/09/26 18:05:02 showtr
drwxrwxr-x 72 2002/01/12 20:26:36 temp
-rwxrw-r-- 1160 2001/04/08 18:34:10 umtrans.pl
drwxrwxr-x 592 2001/11/24 09:52:33 wiki
----- End forwarded message -----
Supporting old, horrible browsers.
I have some current stats suggesting that IE 4.x and Netscape 4.x now
represent 2% and 3% of users, respectively.
Here's what I propose to do with Cologne Blue.
1. Browser-sniffing: detect these, and only these, broken browsers, at the server.
2. For these browsers alone, generate an XHTML page using tables for
layout, and minimal CSS for typography and colours.
3. For all other browsers, generate an XHTML page using CSS for layout,
typography, and colours. (Oh, and a table for the header, but that's all.)
The results:
1. Modern standards-compliant browsers will show the site as intended.
2. Very old browsers, text-only browsers, web spiders, and accessibility
systems will show the site in the best backwards-compatible rendition
possible, as the site will use nice old-fashioned HTML codes inside all
the fancy layout stuff, ignoring the CSS completely.
3. The brain-dead browsers listed above will show a reasonable rendition
of the site, for as long as it takes for their market share to fall near zero.
I think that 90% of the layout code can be re-used in a single skin
file, provided that I can get an indication of whether the browser is
one of the broken ones.
Does this seem like a reasonable approach? And does anyone have any
GPL'd user-agent parsing PHP code?
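A sketch of the sniffing rule in step 1, written in Python for brevity (the real skin code would be PHP, and these patterns are my assumptions, not an actual implementation):

```python
import re

# Assumed patterns for the two "broken" families named above.
# Note: IE spoofs "Mozilla/4.0 (compatible; MSIE ...)", so the Netscape 4.x
# test must exclude that form or every IE version would match too.
BROKEN_UA = re.compile(
    r"MSIE 4\."                          # IE 4.x
    r"|^Mozilla/4\.(?!0 \(compatible)"   # Netscape 4.x
)

def is_broken(user_agent):
    """True if this user-agent should get the tables-for-layout page."""
    return bool(BROKEN_UA.search(user_agent))

print(is_broken("Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)"))  # True
print(is_broken("Mozilla/4.76 [en] (X11; U; Linux 2.4)"))            # True
print(is_broken("Mozilla/5.0 (X11; U; Linux) Gecko"))                # False
```

Everything else falls through to the CSS-layout page, which is the safe default for spiders, text browsers, and anything unrecognised.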
On Mon, Jul 29, 2002 at 02:50:20PM +0200, Lars Aronsson wrote:
> On Mon, 29 Jul 2002, Tomasz Wegrzanowski wrote:
> > Could you make international wikis available via rsync ?
> An exciting idea. Would you even try Unison for this?
> Is there any benefit to using Unison over rsync for this kind of
> uni-directional application?
It doesn't seem to have any advantage and is less popular,
so I'd rather choose rsync.
This is my latest, and I hope final, mock-up rendition of Cologne Blue
that I intend to offer to replace the existing implementation of Cologne Blue.
I enclose a mock-up page: this represents the final look intended in
standard-compliant CSS browsers like Mozilla and IE 6, but will probably
not work properly in non-CSS-aware browsers yet. There's still lots to
do, but the only way to find out is not to build a better mock-up, but
to write the working code.
I realise that this design is not perfect, but it is probably
nicer-looking than the existing implementation, and I'm reading the code
for the new software to see how to make the changes in an evolutionary
way that will work with cross-browser support.
I'm also considering the idea of doing a version of this with tables
alone for old browsers that ignore or munge CSS.
If I do the CSS right, the CSS version should work OK for browsers like
Can anyone help me with how to go about contributing code, and where and
how to test it? The first thing I'll need to do is just to clone an
existing style, and call it something like "Cologne Beta", prior to
changing it step-wise into real code.
Once I have something up and running, then we can start voting for
features. As all code will of course be GPL, if you dislike it enough,
you'll be able to change it yourself.
> This made me think: Would it make sense to make a formal BNF
> grammar for the Wikipedia text format, so a LALR(1) parser could
> be made for it? Would that make any sense at all with PHP, or
> just be too hard to code and inflexible?
I'd love to have a formal grammar of some kind (I think regexps
would be fine), and I agree with Jan that a totally wiki-specific
syntax would be far better than our current mish-mash of HTML and
wiki markup. But I'm not sure if it's not already too late to
revisit those decisions.
But if it isn't, I'll be happy to discuss what a syntax might look like.
There's a discussion on wikipedia-l about the exact syntax of an URL and
whether a punctuation mark at the end of the URL should be considered part
of the URL or not.
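One way to settle the trailing-punctuation question in code is to keep final punctuation out of the match with a lookahead; this regex is only an illustration, not the parser Wikipedia actually uses:

```python
import re

# Illustrative rule: match a URL, but any run of punctuation sitting
# immediately before whitespace or end-of-text stays outside the link.
URL = re.compile(r'https?://[^\s<>"]+?(?=[.,;:!?)]*(?:\s|$))')

def find_urls(text):
    return URL.findall(text)

print(find_urls("See http://www.wikipedia.com/wiki/Sweden."))
# ['http://www.wikipedia.com/wiki/Sweden']
```

The choice embedded here is that a full stop ending the sentence belongs to the prose, not the URL, while dots inside the path are kept; other conventions are equally defensible, which is exactly why a formal grammar would help.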
This made me think: Would it make sense to make a formal BNF grammar for
the Wikipedia text format, so a LALR(1) parser could be made for it?
Would that make any sense at all with PHP, or just be too hard to code
and inflexible?
Only ten years ago, people would use C and YACC to solve problems like
this, and regexp-based parsing was considered just too slow.
Lars Aronsson (lars(a)aronsson.se)