Source code and discussion follow below.
Over the last 24 hours I've polled some URLs every 10 minutes and measured the response time. This is a statistics report:
  min    avg     max (count)   <2  2-5 5-15  >15  URL - response times in seconds
 0.75   2.00   11.79 ( 144)   68%  27%   4%   0%  http://pl.wikipedia.com/wiki.cgi?Szwecja
 0.56   2.70   89.78 ( 144)   95%   0%   0%   3%  http://www.wikipedia.com/wiki.png
 1.47   2.90    8.63 ( 144)   27%  64%   7%   0%  http://pl.wikipedia.com/wiki.cgi?Ostatnie_zmiany
 1.36   2.98   28.68 ( 144)   30%  62%   6%   0%  http://pl.wikipedia.com/
 0.83   3.21   91.86 ( 144)   94%   0%   0%   4%  http://eo.wikipedia.com/vikio.png
 1.00   4.45  140.53 ( 144)   63%  29%   2%   4%  http://eo.wikipedia.com/wiki/Svedio
 1.08   5.70  137.43 ( 144)   53%  34%   7%   4%  http://eo.wikipedia.com/
 3.40   7.09   68.23 ( 144)    0%  38%  56%   4%  http://eo.wikipedia.com/wiki/Lastaj_Sxangxoj
 3.35  13.59  203.39 ( 144)    0%  18%  63%  18%  http://www.wikipedia.com/wiki/special:RecentChanges
 1.61  28.38  411.43 ( 144)    4%  23%  33%  38%  http://www.wikipedia.com/wiki/Sweden
 3.86  45.16  359.17 ( 144)    0%   2%  38%  58%  http://www.wikipedia.com/
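The polling harness itself isn't shown above. A minimal sketch of such a poller in Python (hypothetical — the `timed_fetch` and `poll` names, the injectable `fetch` function, and the scheduling are my assumptions, not Lars's actual script) could look like this:

```python
import time
from urllib.request import urlopen

def timed_fetch(url, fetch=None):
    """Return the wall-clock response time in seconds for one request."""
    if fetch is None:
        fetch = lambda u: urlopen(u).read()  # default: plain HTTP GET
    start = time.time()
    fetch(url)
    return time.time() - start

def poll(urls, rounds, interval_seconds, fetch=None):
    """Sample every URL once per round; 144 rounds at a 600-second
    interval reproduces the 24-hour, every-10-minutes schedule."""
    samples = {url: [] for url in urls}
    for _ in range(rounds):
        for url in urls:
            samples[url].append(timed_fetch(url, fetch))
        time.sleep(interval_seconds)
    return samples
```

A real run would be `poll(urls, rounds=144, interval_seconds=600)`; the `fetch` parameter exists so the timing logic can be exercised without the network.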
The first three columns present the minimum, average, and maximum response times. The rows are sorted on the average column. As you can see, the minimums are very good: a 0.56-second round trip from Sweden to San Diego is excellent. Even the English Wikipedia start page (with the worst average in the list) has been served in 3.86 seconds, which is quite good (this happened at 6:10 am GMT). However, the striking numbers are the maximum response times of several minutes.
The fourth column is the number of samples: 144 in 24 hours, i.e. one sample every 10 minutes.
The following four columns present the statistical distribution of the samples in four categories: the percentage of samples under 2 seconds, those between 2 and 5 seconds, those between 5 and 15 seconds, and those in excess of 15 seconds. I think usability gurus like Jakob Nielsen have declared that 5 seconds is an acceptable maximum for normal pages and that most people can accept a 15-second response time for special functions such as searches and the recent changes list.
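One row of the report can be reproduced from a list of raw samples. A sketch of the bucketing in Python (the function name and the rounding to whole percent are my assumptions):

```python
def report_row(times):
    """Summarize response-time samples the way the report does:
    min, average, max, sample count, and the percentage of samples
    falling in the <2s, 2-5s, 5-15s, and >15s buckets."""
    n = len(times)
    buckets = [0, 0, 0, 0]
    for t in times:
        if t < 2:
            buckets[0] += 1
        elif t < 5:
            buckets[1] += 1
        elif t < 15:
            buckets[2] += 1
        else:
            buckets[3] += 1
    percentages = [round(100 * b / n) for b in buckets]
    return min(times), sum(times) / n, max(times), n, percentages
```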
The Polish Wikipedia does not have a single sample above 15 seconds in these 24 hours. The majority of samples are in the lower two categories, which is very good.
The Esperanto Wikipedia has a small number of samples above 15 seconds, which is sad but perhaps not alarming. The "recent changes" page has 56% of its samples in the high 5-15 second response time interval, which is a little high. Perhaps this could be fixed by changing the default "recent changes" list from 7 to 3 days. Who can change this setting? Tell me when you change it, and I will report how the response time changes. Almost all other samples are in the 0-2 and 2-5 categories, which is very good.
For the English Wikipedia, the static logotype image is served in less than 2 seconds in 95 % of the samples. For this URL, there are no samples in the 2-5 or 5-15 intervals, but a few samples have very long response times. Perhaps the entire server was put on hold by some other event? The last three lines of the report are depressing. Almost none of the samples fall in the 0-2 or 2-5 categories. This has to be analysed further by instrumenting the source code to report where the delay is introduced.
I now have a running copy of the Wikipedia software on my computer and have started to experiment with this instrumentation. It's really straightforward. In version 1.14 of wiki.phtml, Magnus Manske introduced the function getmicrotime(), but the call to it is commented out. Just after getmicrotime(), I introduce a new function:
function trace($text) {
    global $startTime, $traceText;
    $now = getmicrotime();
    $elapsed = $now - $startTime;
    if ($elapsed > 3.0) {
        $traceText = "$traceText\nAfter $elapsed seconds: $text";
        $startTime = $now;
    }
}
Then at the beginning of the "main" program (where Magnus left a commented-out first call to getmicrotime), I declare:
global $startTime, $traceText;
$startTime = getmicrotime();
$traceText = "";
Then at various points throughout the code, just after function calls that I suspect are time bandits, I insert calls to my trace function:
trace("Just after updating the database");
These informative texts will accumulate in $traceText. An entry is only recorded when more than 3 seconds have elapsed since the previous trace point, because trace() resets $startTime each time it records one.
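The same accumulate-and-reset logic can be sketched as a small Python analog of the PHP trace() (hypothetical — the `Tracer` class and the injectable `clock` are mine, added so the behavior is easy to verify):

```python
import time

class Tracer:
    """Analog of the wiki.phtml trace() sketch: record a message only
    when more than `threshold` seconds have passed since the last
    recorded trace point, then restart timing from that point."""
    def __init__(self, threshold=3.0, clock=time.time):
        self.threshold = threshold
        self.clock = clock
        self.start = clock()
        self.text = ""

    def trace(self, message):
        now = self.clock()
        elapsed = now - self.start
        if elapsed > self.threshold:
            self.text += "\nAfter %.1f seconds: %s" % (elapsed, message)
            self.start = now  # next entry measures from this trace point
```

With a fake clock one can see that fast steps leave no entry while a slow step is recorded with its elapsed time.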
At the end of the "main" program comes the question, what should be done with this $traceText? Should it be inserted into a new database table? Or appended to the end of a text log file? Or inserted as an HTML comment into the generated web page? Where can we best use this information? The easiest but perhaps least useful is this:
trace("bottom of wiki.phtml main");
if ($traceText != "")
    $out = "<!-- traceText:$traceText -->\n$out";
Who can implement this in the real source code? I'm not in the gang.
Just a couple quick notes for now...
On Sun, 2002-05-05 at 13:21, Lars Aronsson wrote:
> The Esperanto Wikipedia has a small number of samples above 15 seconds,
> which is sad but perhaps not alarming. The "recent changes" has 56 % of
> its samples in the high 5-15 seconds response time interval, which is a
> little high.
Note also that the perl-based Esperanto wiki filters output through a character set conversion and doesn't do any page caching (since caching didn't interact well with the conversion), which is bound to slow it down a little bit.
> Perhaps this could be fixed by setting the default "recent changes"
> list from 7 or 3 days. Who can change this setting? Tell me when you
> change, and I will report how the response time changed. Almost all
> other samples are in the 0-2 and 2-5 categories, which is very good.
Have you tried running your tests on, say, http://eo.wikipedia.com/wiki.cgi?action=rc&days=3 ?
-- brion vibber (brion @ pobox.com)
Brion L. VIBBER wrote:
> Have you tried running your tests on, say,
> http://eo.wikipedia.com/wiki.cgi?action=rc&days=3 ?
I took a few samples just now but couldn't see any difference. I'll let it run for the next 24 hours, though.
wikitech-l@lists.wikimedia.org