Today, Friday, the front page of the English Wikipedia has been fast all day.
Another page I monitor (http://www.wikipedia.com/wiki/Sweden) was slow for one period of 30 minutes (09:30-10:00 GMT) and for another period of just over two hours (11:40-13:50 GMT). Some other URLs on the international Wikipedias were affected at the same time. This might have been due to maintenance or to work being done on the scripts.
Subtract 7 hours from GMT to get the server's local time (PDT = GMT-0700).
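As a quick check of the arithmetic, the two slow intervals can be converted to server time with Python's standard `datetime` module. This is only an illustration: it applies the fixed -7 hour offset stated above and ignores any daylight-saving rules.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset as stated in the text: PDT = GMT-0700 (no DST rules applied).
PDT = timezone(timedelta(hours=-7), "PDT")

def gmt_to_pdt(hhmm: str) -> str:
    """Convert an 'HH:MM' clock time in GMT to the server's local time (PDT)."""
    # strptime needs a date; an arbitrary year is supplied, only the clock time matters.
    t = datetime.strptime(hhmm, "%H:%M").replace(year=2000, tzinfo=timezone.utc)
    return t.astimezone(PDT).strftime("%H:%M")

for start, end in [("09:30", "10:00"), ("11:40", "13:50")]:
    print(f"{start}-{end} GMT = {gmt_to_pdt(start)}-{gmt_to_pdt(end)} PDT")
```

So the slow periods fell in the early morning on the server's clock.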
Apart from these two limited intervals, every URL that I monitor has been fast all day, including the Recent Changes pages.
I'm very happy with this, and hope Brion and Jimmy (and who else?) will soon get the talk namespace links back without hurting performance. (But hey, never make big fixes five minutes before you leave for the weekend! Better just leave it as is if you have to go.)
And now for some more relaxed Friday reading, actually related to performance problems. (The following analysis might be politically slanted. Don't take it too seriously.) The Swedish parliamentary elections are coming up in September, so the political parties are starting their campaigns. The problem is that there are no big issues to fight about. The four non-socialist parties have unusually boring candidates (Dukakis style), and everybody expects the current social-democratic government to win. The single issue that seems to be emerging is the national sick-leave insurance, which is paid for with tax money and is far over budget. This is linked to the fact that "burn-out" is now an accepted medical diagnosis for which you are allowed to take long sick leave at the taxpayers' expense. You would expect such welfare excesses to be on the social-democratic agenda, and the non-socialists to push for tax cuts and a balanced budget. However, the current social-democratic government has done a great job of balancing the budget, and it will now have to cut back this overgenerous sick-leave compensation without hurting its voters' feelings. Tough job. The Christian Democratic candidate has already hurt a lot of feelings by claiming that "some" of those receiving compensation are "cheating the system". That might be true, but accusing "some" (who? me?) is obviously not the way to attract voters. The issue now has media attention, and some interesting example cases are being reported.
Like this one: attorneys in Swedish district courts have been right-sized in recent years, as part of balancing the budget. This means that as soon as one of them gets sick, the rest are overloaded, leading to stress and burn-out, which leads to more sick leave.
Think of the court cases as HTTP requests arriving at Wikipedia. There are some processes (attorneys) to handle the requests, but for some reason one process gets blocked and cannot work. This leaves more work for the remaining workers, but they are probably waiting for the first process to finish and release the resources (database records?) that it holds. If processes are allowed to sleep while waiting for each other, the work will pile up, and it will never end.
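The pile-up can be illustrated with a toy model. This is a hypothetical simplification, not a description of Wikipedia's actual servers: requests arrive at a fixed rate per tick, each unblocked worker finishes one request per tick, and optionally `spread` more workers per tick get stuck waiting for the blocked one.

```python
def backlog(arrivals: int, workers: int, blocked: int, ticks: int, spread: int = 0) -> list[int]:
    """Queue length after each tick in a toy worker-pool model (hypothetical)."""
    queue = 0
    history = []
    for _ in range(ticks):
        active = workers - blocked          # blocked workers finish nothing
        queue += arrivals                   # new requests join the line
        queue -= min(queue, active)         # each active worker clears one request
        history.append(queue)
        blocked = min(workers, blocked + spread)  # more workers wait on the first one
    return history

print(backlog(arrivals=4, workers=4, blocked=0, ticks=5))            # [0, 0, 0, 0, 0]
print(backlog(arrivals=4, workers=4, blocked=1, ticks=5))            # [1, 2, 3, 4, 5]
print(backlog(arrivals=4, workers=4, blocked=1, ticks=5, spread=1))  # [1, 3, 6, 10, 14]
```

When the arrival rate matches total capacity, a single blocked worker is enough to make the queue grow without bound; if the blockage spreads to the other workers, the growth accelerates.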
So, what is the solution? Throwing more attorneys at the problem? Maybe, but more likely the work processes should be redesigned and simplified, so that the available attorneys can finish a case and take on the next one. Some of their tasks are more important than others, but the performance, or throughput, of the system depends on eliminating or redesigning the most time-consuming tasks. The high rate of sick leave is an indicator of system design flaws (albeit an one), and thus not altogether bad.
In the same way, a high "load average" (as reported by the "uptime" or "top" commands) is one indicator that the Wikipedia system is flawed. The load average on a UNIX system is the number of processes that are ready to run and waiting for the CPU to become available, averaged over the last 1, 5 and 15 minutes. Unfortunately, most of these processes are just checking whether the resource they want has become available. If it has not (e.g. a database record is still locked), they go back to the end of the line and wait again. Do you remember the bread-shop queues in Soviet Russia?
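For the curious, here is roughly how Linux computes those three numbers: about every 5 seconds the kernel samples the number of runnable (and, on Linux, uninterruptible) tasks and folds it into an exponentially damped moving average, one per 1-, 5- and 15-minute window. The sketch below uses floating point instead of the kernel's fixed-point arithmetic, and the run-queue samples are made up for illustration.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples, roughly as in the Linux kernel

def update_load(load: float, runnable: int, window: float = 60.0) -> float:
    """Fold one run-queue sample into an exponentially damped moving average.

    `window` is the averaging period in seconds: 60, 300 or 900 for the
    1-, 5- and 15-minute load averages.
    """
    decay = math.exp(-SAMPLE_INTERVAL / window)
    return load * decay + runnable * (1.0 - decay)

# Start idle, then keep 8 processes runnable for five minutes (60 samples).
load1 = load5 = 0.0
for _ in range(60):
    load1 = update_load(load1, 8, window=60.0)
    load5 = update_load(load5, 8, window=300.0)
print(round(load1, 2), round(load5, 2))  # 7.95 5.06
```

The 1-minute figure reacts quickly while the 5-minute figure lags behind, which is why a brief burst of lock-waiting shows up in the first number well before the others. On a live UNIX system, Python's `os.getloadavg()` returns the kernel's own three values.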
Training new attorneys is in itself a time-consuming task, which should be avoided if possible. Instead of paying sick leave (for how long?) to already trained attorneys, we should find a "cure" for "burn-out" that can bring them back to work, relieving their overloaded colleagues and saving taxpayers' money at the same time.
I have no idea how a "cure" for burn-out can be found, but I think it is a necessary political trick, and thus it will happen. It will not hurt voters' feelings, and my guess is that the people who can achieve it will work for the winners of the election.
This might be the weakest analogy in history, but I think we should treat the Wikipedia processes with the same dignity and respect that Swedish voters would expect. After all, they're supposed to work for us. The processes feel fulfilled when they can finish their job on time, and get distressed when they get locked up. Any uncalled-for delay will only result in more work piling up. That is a flaw in the system design that has to be fixed; we cannot go around claiming that "some" of the workers are trying to cheat the system. That will only lead to us losing their confidence.
Lee,
Could you please remove the user 'Jan hidders' from the test Wikipedia? There seem to be some strange side effects:
- I cannot stay logged out.
- I cannot edit my personal page. (I get an edit conflict, and after saving I see the edited result, but nothing is reported in Recent Changes, and the next time I visit the page nothing has changed.)
-- Jan Hidders
I've been thinking a little about writing a formal syntax for the Wikipedia contents, and now have some questions. If I understand the code correctly, the parts between <nowiki> tags are left completely alone, no matter what HTML we find between them (JavaScript, illegal HTML or whatever). My questions are: (1) is that what we want? and (2) if not, should the formal grammar describe what the code is doing now or what it should be doing?
As for (1), I would like us to guarantee that our output is always valid, well-formed HTML 4 without any scripting. As for (2), my first inclination would be to let the grammar describe what the code is doing at the moment, since that is probably needed anyway, and then we can start discussing what it actually should be doing.
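As a possible starting point for describing the current behavior, the top level of such a grammar could be as simple as `page ::= ( wiki-text | "<nowiki>" raw-text "</nowiki>" )*`, where raw-text is passed through untouched. The Python sketch below splits a page along those lines. The regular expression and segment labels are assumptions for illustration, not the parser's actual code, and an unterminated `<nowiki>` is simply left as ordinary wiki text.

```python
import re

# Hypothetical token pattern: the shortest <nowiki>...</nowiki> span.
# The capturing group makes re.split keep the raw bodies in its result.
NOWIKI = re.compile(r"<nowiki>(.*?)</nowiki>", re.DOTALL)

def split_nowiki(text: str) -> list[tuple[str, str]]:
    """Split page text into ('wiki', ...) and ('nowiki', ...) segments, in order."""
    parts = NOWIKI.split(text)
    segments = []
    # re.split alternates: outside, inside, outside, ..., outside.
    for i, part in enumerate(parts):
        kind = "nowiki" if i % 2 else "wiki"
        if part or kind == "nowiki":  # drop empty wiki gaps, keep empty nowiki bodies
            segments.append((kind, part))
    return segments

print(split_nowiki("Write <nowiki>''italic''</nowiki> to get ''italic''."))
```

Only the 'wiki' segments would then go through markup interpretation; whether the 'nowiki' segments also need HTML safeguarding is exactly question (1).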
-- Jan Hidders
Are the <nowiki> tags still needed in the script? I'm asking this because:
1. I'm having trouble getting them into the formal syntax.
2. They are a bit of a security risk because they allow users to put things like JavaScript on a page.
-- Jan Hidders
Jan.Hidders wrote:
Are the <nowiki> tags still needed in the script? I'm asking this because
- I'm having trouble getting them into the formal syntax
Of course they're still needed! How else are we supposed to include wiki markup as text in a wiki page in a way that's not overly burdensome (i.e., without having to use numeric character entities instead of the special wiki symbols)?
- They are a bit of a security risk because they allow users to get things
like JavaScript on a page.
If that's the case, that's a serious bug. <nowiki> should mean no *wiki* markup interpretation, not no *HTML* safeguarding.
-- brion vibber (brion @ pobox.com)
On Tue, Jul 30, 2002 at 02:38:07AM -0800, Brion VIBBER wrote:
Jan.Hidders wrote:
Are the <nowiki> tags still needed in the script? I'm asking this because
- I'm having trouble getting them into the formal syntax
Of course they're still needed! How else are we supposed to include wiki markup as text in a wiki page in a way that's not overly burdensome (i.e., without having to use numeric character entities instead of the special wiki symbols)?
So we only need them for the FAQs? :-) But I see your point.
- They are a bit of a security risk because they allow users to get things
like JavaScript on a page.
If that's the case, that's a serious bug. <nowiki> should mean no *wiki* markup interpretation, not no *HTML* safeguarding.
Yup, I tried it in my sandbox; look at the bottom:
http://www.wikipedia.com/wiki/User:Jan_Hidders/Sandbox
At the moment I don't understand Lee's code well enough to say whether there is any HTML safeguarding going on in the <nowiki> parts, but as far as I can tell there isn't.
But this can be remedied fairly easily: just replace all the <'s and >'s in the <nowiki> parts with their corresponding entities. That's even correct in some sense, because we consider HTML to be part of the wiki markup. :-/
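A minimal sketch of that fix in Python. The regular expression and function names are hypothetical, not Lee's code: each `<nowiki>` section is replaced by its body with `<` and `>` turned into entities, so the browser shows the text literally instead of running it. It assumes the escaped text is then kept away from the later wiki-markup passes, as the `<nowiki>` body is today.

```python
import re

# Hypothetical pattern for a <nowiki> section (case-insensitive, may span lines).
NOWIKI = re.compile(r"<nowiki>(.*?)</nowiki>", re.DOTALL | re.IGNORECASE)

def escape_nowiki_body(body: str) -> str:
    """Replace < and > with their HTML entities, as proposed above."""
    return body.replace("<", "&lt;").replace(">", "&gt;")

def safeguard_nowiki(text: str) -> str:
    """Return text with every <nowiki> section replaced by its escaped body."""
    return NOWIKI.sub(lambda m: escape_nowiki_body(m.group(1)), text)

print(safeguard_nowiki("x <nowiki><script>alert('hi')</script></nowiki> y"))
# prints: x &lt;script&gt;alert('hi')&lt;/script&gt; y
```

One open design question is whether to escape `&` as well: doing so (for example with Python's `html.escape(body, quote=False)`) displays entities such as `&eacute;` literally, while escaping only `<` and `>` still lets them render.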
Lee, should I make a bug report of this?
-- Jan Hidders
I'm not sure I'm the right person to raise this question, but I wondered what the current thinking is on adapting the code for other character sets. If I recall correctly, we are now assuming UTF-8, right? What exactly does that mean, btw? That we changed the MySQL character tables for those above 7F? Anything else?
I'm asking this because I know that some writers on the German Wikipedia asked for an easy way to type accents. So I wondered if we should add a special language-dependent pre-parse function, i.e., defined in language??.php, that is called on edit text, title strings and search expressions before they are processed. The Germans could, for example, define it such that ö is always translated to "o (or they could in fact use '"o' as a notation, if they wanted). They could then unambiguously search for Goedel. I assume that the Polish would want similar things, and perhaps the people at Vikipedio would also like a special notation for their special letters, but I know the situation is a bit more complicated there.
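A small Python sketch to make both points concrete. The first function shows what UTF-8 means at the byte level: characters up to 7F stay single bytes, while ö (U+00F6) becomes the two bytes C3 B6, both above 7F. The second is a hypothetical German pre-parse hook that expands the '"o' notation into real umlauts, so that edit text, titles and search strings all end up in the same form. The table and function names are assumptions for illustration; in the wiki itself such a hook would presumably be written in PHP in the language file.

```python
def utf8_hex(s: str) -> list[str]:
    """Return the UTF-8 encoding of s as a list of hex byte values."""
    return [f"{b:02X}" for b in s.encode("utf-8")]

# Hypothetical German notation table: '"o' -> 'ö' and so on.
GERMAN_NOTATION = {
    '"a': "ä", '"o': "ö", '"u': "ü",
    '"A': "Ä", '"O': "Ö", '"U': "Ü",
}

def preparse_de(text: str) -> str:
    """Hypothetical pre-parse hook, applied to edit text, titles and searches."""
    for notation, letter in GERMAN_NOTATION.items():
        text = text.replace(notation, letter)
    return text

print(utf8_hex("Godel"))  # ['47', '6F', '64', '65', '6C']
print(utf8_hex("Gödel"))  # ['47', 'C3', 'B6', '64', '65', '6C']
print(preparse_de('G"odel') == preparse_de("Gödel"))  # True
```

A naive table like this also converts a quotation mark that happens to precede a vowel (as in `"also"`), so a real rule would need an escape mechanism or a narrower trigger.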
-- Jan Hidders
wikitech-l@lists.wikimedia.org