Eliminating 'unsuccessful search' and 'special' pages from the count,
then analysing 100,000 lines from the raw log with this filtering,
gives the following stats:
bin (secs)   total pages   cumulative %
 0              57360      83.443651%
 1               6929      93.523516%
 2               2028      96.473720%
 3               1034      97.977917%
 4                640      98.908948%
 5                314      99.365735%
 6                157      99.594129%
 7                 81      99.711962%
 8                 61      99.800701%
 9                 46      99.867619%
10                 18      99.893804%
11                 12      99.911261%
12                 16      99.934537%
13                 13      99.953448%
14                  6      99.962177%
15                  6      99.970905%
16                  6      99.979634%
17                  2      99.982543%
18                  0      99.982543%
19                  3      99.986907%
20                  2      99.989817%
summary 68741 hits in 41366.343 secs, avg = 0.601771039118
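For reference, the binning above can be reproduced with something like
the following sketch. The log format, field positions, and filter
patterns here are assumptions, not the real script -- each line is taken
to carry a timestamp, a service time in seconds, and a URL:

```python
import math
from collections import Counter

def bin_service_times(lines):
    """Bin page accesses by the integer part of their service time,
    skipping 'special' and unsuccessful-search pages."""
    bins = Counter()
    total_time = 0.0
    hits = 0
    for line in lines:
        # Assumed format: "<timestamp> <service_time_secs> <url>"
        timestamp, secs, url = line.split(None, 2)
        # Filter patterns are guesses at how the excluded pages appear:
        if "title=Special:" in url or "Unsuccessful" in url:
            continue
        secs = float(secs)
        bins[math.floor(secs)] += 1
        total_time += secs
        hits += 1
    # Cumulative percentage per bin, as in the table above
    running = 0
    rows = []
    for b in sorted(bins):
        running += bins[b]
        rows.append((b, bins[b], 100.0 * running / hits))
    return rows, hits, total_time

rows, hits, total_time = bin_service_times([
    "20020713011714 0.412 /wiki/Sport",
    "20020713011715 28.783 /wiki/Historical_anniversaries",
    "20020713011716 0.251 /w/wiki.phtml?title=Special:Orphans",
])
# The Special: hit is filtered; the two kept hits land in bins 0 and 28.
```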
Only 9 non-special pages took over 20 seconds; here they are:
20020713011714 28.783 /wiki/Historical_anniversaries
20020713012523 20.301 /wiki/Sport
20020713014205 23.161 /wiki/Federal_Standard_1037C
20020713014723 25.357 /w/wiki.phtml?title=Free_On-line_Dictionary_of_Computing/O_-_Q&redirect=no
20020713015936 21.513 /w/wiki.phtml?title=Wikipedia:Bug_reports&action=history
20020713022203 25.252 /wiki/Free_On-line_Dictionary_of_Computing/L_-_N
20020713025105 29.975 /w/wiki.phtml?title=Free_On-line_Dictionary_of_Computing/E_-_H&redirect=no
20020713033140 20.802 /wiki/Feature_requests
20020713043401 41.392 /w/wiki.phtml?title=Complete_list_of_encyclopedia_topics/R&diff=78830&oldid=71983
It's interesting to note that random spidering hits 'special' pages
about 30% of the time.
The page accesses above have been binned by the integer part of their
service time as recorded in the logs.
This is looking really good.
-------------------------------------------------
SUGGESTION #1:
Looking at the logs suggests that many of the worst results are
generated by the special-page options with large counts -- particularly
the versions with count=5000.
Here's my proposal: we should not list the options with count > 500 for
users *who are not logged in*.
So, at the bottom of the orphans page, a logged-in user would see
View (previous 50) (next 50) (20 | 50 | 100 | 250 | 500 | 1000 | 2500
| 5000).
and a casual browser (and any busy bots or spiders) would see
View (previous 50) (next 50) (20 | 50 | 100 | 250 | 500).
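A minimal sketch of that gating, in Python for brevity -- the real
change would of course live in the PHP special-page code, and the names
here are made up:

```python
PAGE_COUNTS = [20, 50, 100, 250, 500, 1000, 2500, 5000]
MAX_ANON_COUNT = 500  # proposed cap for users who are not logged in

def visible_counts(logged_in):
    """Return the count options to offer in the 'View (...)' footer."""
    if logged_in:
        return PAGE_COUNTS
    return [c for c in PAGE_COUNTS if c <= MAX_ANON_COUNT]
```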
Random selection from the first list will search on average
(50+50+20+50+100+250+500+1000+2500+5000) / 10 = 952 pages.
Random selection from the second list (7 options) will search on average
(50+50+20+50+100+250+500) / 7 = ~146 pages,
a reduction in load of more than a factor of six.
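Checking the arithmetic -- this assumes a spider picks uniformly among
the footer links, and note that the second list has only 7 options:

```python
# All ten links in the full footer, including (previous 50) and (next 50)
first_list = [50, 50, 20, 50, 100, 250, 500, 1000, 2500, 5000]
# The seven links left after dropping the count > 500 options
second_list = [50, 50, 20, 50, 100, 250, 500]

avg_first = sum(first_list) / len(first_list)     # 952.0 pages
avg_second = sum(second_list) / len(second_list)  # ~145.7 pages
reduction = avg_first / avg_second                # ~6.5x
```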
Removing these big outlier loads may well take some of the strain off
ordinary page loads that happen to occur at the same time.
------------------------------------------------
SUGGESTION #2:
The 'unsuccessful search' pages can be enormous: they accumulate all the
bad searches in a whole month. As Wikipedia has become more popular,
they have grown huge, and they now take a long time to load. We should
make these weekly or daily instead of monthly, and perhaps split up the
old ones using a script.
This will also have the effect of improving the 'most wanted' rating of
frequently missed searches, as currently only one instance a month counts.
Or perhaps they should be generated as a special page from the database?
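The splitting script could look something like this sketch -- the
one-entry-per-line format with a YYYYMMDDhhmmss timestamp is a guess
based on the log excerpts above, not the actual file layout:

```python
import datetime
from collections import defaultdict

def split_by_week(lines):
    """Group monthly bad-search entries into ISO-week buckets,
    so each bucket can be written out as its own smaller page."""
    buckets = defaultdict(list)
    for line in lines:
        # Assumed format: "<YYYYMMDDhhmmss> <search term>"
        stamp, term = line.split(None, 1)
        date = datetime.date(int(stamp[0:4]), int(stamp[4:6]), int(stamp[6:8]))
        year, week, _ = date.isocalendar()
        buckets[(year, week)].append(term)
    return buckets
```

Generating them as a special page straight from the database would make
this moot, since the date range could then just be a query parameter.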
---------------------------------------------------
Neil