FWIW, COUNT(*) may be slow, but the slow step in generating the category
page is probably a combination of sorting the result in MySQL and
stepping through the query results in PHP. On my outdated Mac OS X
dual G5 server, I get the following with my pre-beta extension, using
a category with 124,520 articles and 3 subcategories:

  Count the articles: 5 sec
  Do the query method: 16-24 sec
  n = 3

For a category with 74,945 articles:

  Count the articles: 2-3 sec
  Do the query method: 9-10 sec
  n = 3

These timings come from putting time() calls in the code and subtracting. This is
right on the edge of not working at all, and sometimes when I load
the page it fails. Of course, I ran the tests while some spammer was
trying to flood my server on another application. The query numbers
are much higher than one would get for the real CategoryPage.php,
since the query to collect all the subcategories has to traverse
the whole list of categorylinks.
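
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of the "take a timestamp before and after, then subtract" approach. It is written in Python against an in-memory SQLite table rather than PHP and MySQL, so the absolute numbers mean nothing. The table layout only loosely mirrors categorylinks, and the helper name timed() and the sample rows are made up for illustration. It uses a sub-second clock because a whole-second clock such as PHP's time() cannot resolve anything under a second.

```python
import sqlite3
import time


def timed(fn):
    """Run fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Hypothetical stand-in for the categorylinks table, filled with fake rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_sortkey TEXT)"
)
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?, ?)",
    [(i, "Eukaryota", f"Page {i:06d}") for i in range(50000)],
)


def count_articles():
    # The "count the articles" step.
    return conn.execute(
        "SELECT COUNT(*) FROM categorylinks WHERE cl_to = ?", ("Eukaryota",)
    ).fetchone()[0]


def fetch_sorted():
    # The "query method" step: sort in the database, then step through
    # every returned row in application code.
    rows = conn.execute(
        "SELECT cl_from, cl_sortkey FROM categorylinks "
        "WHERE cl_to = ? ORDER BY cl_sortkey",
        ("Eukaryota",),
    )
    return sum(1 for _ in rows)


n_counted, t_count = timed(count_articles)
n_fetched, t_fetch = timed(fetch_sorted)
print(f"count: {n_counted} rows in {t_count:.4f} s")
print(f"fetch: {n_fetched} rows in {t_fetch:.4f} s")
```

Running each step several times and reporting the range, as in the numbers above, helps separate real cost from noise such as other load on the server.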
I'm hoping faster hardware will help. Or at some point, perhaps I
should just show the count and write a message to the screen like:
"You really don't want to click through all the pages of articles.
Try a subcategory instead!"
;)
More seriously, I wonder if it would be worth having a cron job
periodically generate denormalized tables for the articles and
subcategories in general.
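
A rough sketch of what such a periodic job might do, again in Python with SQLite standing in for MySQL. The summary table category_summary, its columns, and the function name rebuild_category_summary() are all hypothetical; the sketch also assumes subcategories can be told apart from articles by joining to the page table on the Category namespace (14).

```python
import sqlite3

NS_CATEGORY = 14  # MediaWiki's namespace number for Category pages


def rebuild_category_summary(conn):
    """Recompute per-category article and subcategory counts in one pass.

    Intended to be run periodically (e.g. from cron); readers then hit
    the small summary table instead of scanning categorylinks.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS category_summary ("
        "cat_title TEXT PRIMARY KEY, n_articles INTEGER, n_subcats INTEGER)"
    )
    with conn:  # commit the delete and insert together, or roll back both
        conn.execute("DELETE FROM category_summary")
        conn.execute(
            """
            INSERT INTO category_summary (cat_title, n_articles, n_subcats)
            SELECT cl_to,
                   SUM(CASE WHEN page_namespace = ? THEN 0 ELSE 1 END),
                   SUM(CASE WHEN page_namespace = ? THEN 1 ELSE 0 END)
            FROM categorylinks JOIN page ON page_id = cl_from
            GROUP BY cl_to
            """,
            (NS_CATEGORY, NS_CATEGORY),
        )


# Tiny made-up data set: two articles and one subcategory in Eukaryota.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_title TEXT)"
)
conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?)",
    [(1, 0, "Yeast"), (2, 0, "Human"), (3, NS_CATEGORY, "Fungi")],
)
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [(1, "Eukaryota"), (2, "Eukaryota"), (3, "Eukaryota"), (1, "Fungi")],
)

rebuild_category_summary(conn)
summary = conn.execute(
    "SELECT cat_title, n_articles, n_subcats FROM category_summary "
    "ORDER BY cat_title"
).fetchall()
print(summary)
```

The tradeoff is staleness: counts are only as fresh as the last run, which may be acceptable on a low-traffic wiki.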
Oh, and I'm still wondering about the inheritance problem.
Jim
=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054
On Feb 20, 2007, at 2:04 PM, Jim Hu wrote:
Yeah, I'm using COUNT(*)...and it is slow. But for some uses on low
traffic sites like mine, it's a tradeoff some sysadmins will be
willing to take. I don't see this getting used on Wikipedia for
performance reasons, as you note.
I was thinking about how one might cache the count, but then it seems
to me that you need all kinds of triggers for whenever the category
links change... I'm much too much of a newbie to dive that far into
the code...at least so far, and I don't know if this can even be done
as an extension without a lot of new hooks.
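
To make the trigger idea concrete, here is a minimal sketch using database-level triggers, in Python with SQLite standing in for MySQL. The side table category_count, the trigger names, and the helper cached_count() are hypothetical, and this says nothing about whether MediaWiki's hooks or its supported MySQL versions would allow it in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);

    -- Hypothetical side table holding one cached count per category.
    CREATE TABLE category_count (
        cat_title TEXT PRIMARY KEY,
        n INTEGER NOT NULL
    );

    -- Bump the cached count whenever a category link is added.
    CREATE TRIGGER cl_added AFTER INSERT ON categorylinks
    BEGIN
        INSERT OR IGNORE INTO category_count (cat_title, n)
            VALUES (NEW.cl_to, 0);
        UPDATE category_count SET n = n + 1 WHERE cat_title = NEW.cl_to;
    END;

    -- Decrement it whenever a category link is removed.
    CREATE TRIGGER cl_removed AFTER DELETE ON categorylinks
    BEGIN
        UPDATE category_count SET n = n - 1 WHERE cat_title = OLD.cl_to;
    END;
    """
)


def cached_count(title):
    """Read the cached count with a single-row lookup instead of COUNT(*)."""
    row = conn.execute(
        "SELECT n FROM category_count WHERE cat_title = ?", (title,)
    ).fetchone()
    return row[0] if row else 0


conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [(1, "Eukaryota"), (2, "Eukaryota"), (3, "Fungi")],
)
conn.execute("DELETE FROM categorylinks WHERE cl_from = 2")

print(cached_count("Eukaryota"), cached_count("Fungi"), cached_count("Missing"))
```

This covers only inserts and deletes; a link that is updated in place to point at a different category would need a third trigger.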
I'm sure the rest of you have given this much more thought than I
have.
On Feb 20, 2007, at 1:14 PM, Simetrical wrote:
On 2/20/07, Jim Hu <jimhu(a)tamu.edu> wrote:
I'm developing the extension for 1.8.3. I have now gotten the basic
version working with the minor hack to CategoryPage.php.
So far, it improves on CategoryPage.php in two ways:
- Shows the subcategories no matter how many articles are shown
- Shows the real counts for articles when the number is >200.
I'm hoping to make it more like Special:Allpages next...I'm using it
on a wiki where there are categories with >100K articles in the
biggest categories:
http://gowiki.tamu.edu/GO/wiki/index.php/Category:Eukaryota
Paging through 200 at a time gets old fast!!
Are you using COUNT(*)? That's O(N) on InnoDB, which is why we don't
use it: too slow for large categories. If you have an extra field in
the database somewhere rather than COUNT(*), maybe it would be good
for trunk (although that's for Tim, Domas, etc. to decide).
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitech-l