FWIW, COUNT may be slow, but the slow step in generating the category page is probably a combination of sorting the results in MySQL and stepping through them in PHP. On my outdated Mac OS X dual G5 server, using my pre-beta extension on a category with 124,520 articles and 3 subcategories, I get the following:
  Count the articles: 5 sec
  Do the query method: 16-24 sec
  (n = 3)

For a category with 74,945 articles:
  Count the articles: 2-3 sec
  Do the query method: 9-10 sec
  (n = 3)
These numbers come from putting time() calls in the code and subtracting. This is right on the edge of not working at all, and sometimes the page fails to load. Of course, I ran the tests while some spammer was trying to flood another application on my server. The query numbers are much higher than one would get for the real CategoryPage.php, since the query I use to collect all the subcategories has to traverse the whole categorylinks table.
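A note on the method: PHP's time() returns whole seconds, so each difference can be off by nearly a second, which matters for the 2-3 sec figures; microtime(true) gives sub-second resolution. A minimal sketch of the same repeat-and-subtract approach, written in Python with a stand-in workload in place of the real query (the helper names time_runs and summarize are hypothetical):

```python
import time


def time_runs(fn, n=3):
    """Call fn() n times and return the elapsed wall-clock seconds of each run.

    time.perf_counter() is a monotonic, high-resolution clock, so sub-second
    differences show up instead of being rounded to whole seconds.
    """
    elapsed = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        elapsed.append(time.perf_counter() - start)
    return elapsed


def summarize(elapsed):
    """Report the runs as a min-max range plus n, like the figures above."""
    return f"{min(elapsed):.2f}-{max(elapsed):.2f} sec n = {len(elapsed)}"


if __name__ == "__main__":
    # Stand-in workload; replace with the query being measured.
    print(summarize(time_runs(lambda: sum(range(1_000_000)))))
```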
I'm hoping faster hardware will help. Or at some point, perhaps I should just show the count and write a message to the screen like:
"You really don't want to click through all the pages of articles. Try a subcategory instead!" ;) More seriously, I wonder if it would be worth having a cron job periodically generate denormalized tables for the articles and subcategories in general.
Oh, and I'm still wondering about the inheritance problem.
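On the cron-job idea, one simple form of it is a summary table of per-category counts that a periodic job rebuilds, so the full scan of categorylinks happens once per run instead of once per page view, at the cost of counts that are stale between runs. A minimal sketch, using SQLite from Python as a stand-in for MySQL; cl_from and cl_to follow MediaWiki's categorylinks columns, while the category_counts table and its cc_cat/cc_pages columns are hypothetical:

```python
import sqlite3


def rebuild_category_counts(conn):
    """Recompute per-category page counts into a summary table.

    Meant to be run periodically (e.g. from cron). The expensive GROUP BY
    over categorylinks happens here, not on every category page view.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS category_counts ("
            " cc_cat TEXT PRIMARY KEY, cc_pages INTEGER NOT NULL)"
        )
        conn.execute("DELETE FROM category_counts")
        conn.execute(
            "INSERT INTO category_counts (cc_cat, cc_pages) "
            "SELECT cl_to, COUNT(*) FROM categorylinks GROUP BY cl_to"
        )


def cached_count(conn, cat):
    """Read the count as of the last rebuild; 0 for unknown categories."""
    row = conn.execute(
        "SELECT cc_pages FROM category_counts WHERE cc_cat = ?", (cat,)
    ).fetchone()
    return row[0] if row else 0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
    conn.executemany(
        "INSERT INTO categorylinks VALUES (?, ?)",
        [(1, "Eukaryota"), (2, "Eukaryota"), (3, "Bacteria")],
    )
    rebuild_category_counts(conn)
    print(cached_count(conn, "Eukaryota"))  # prints 2
```

The same job could also write out the subcategory lists, so the category page never has to walk all of categorylinks to find them.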
Jim
=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054
On Feb 20, 2007, at 2:04 PM, Jim Hu wrote:
Yeah, I'm using COUNT(*)...and it is slow. But for some uses on low-traffic sites like mine, it's a tradeoff some sysadmins will be willing to make. I don't see this getting used on Wikipedia for performance reasons, as you note.
I was thinking about how one might cache the count, but then it seems to me that you need all kinds of triggers for whenever the category links change... I'm much too much of a newbie to dive that far into the code...at least so far, and I don't know if this can even be done as an extension without a lot of new hooks.
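On the triggers: the bookkeeping could live in the database itself, which might sidestep the need for new hooks, assuming the wiki's database account is allowed to create triggers (MySQL supports them from 5.0 on, with somewhat different syntax). A minimal sketch using SQLite from Python; the category_counts table, its columns, and the trigger names are hypothetical, and only cl_from/cl_to mirror MediaWiki's categorylinks:

```python
import sqlite3

# Hypothetical schema: a cut-down categorylinks plus a counter table that the
# triggers below keep in step with it on every insert, delete, and move.
SCHEMA = """
CREATE TABLE categorylinks (
    cl_from INTEGER NOT NULL,
    cl_to   TEXT    NOT NULL,
    PRIMARY KEY (cl_from, cl_to)
);
CREATE TABLE category_counts (
    cc_cat   TEXT PRIMARY KEY,
    cc_pages INTEGER NOT NULL DEFAULT 0
);

-- New link: create the counter row if needed, then increment it.
CREATE TRIGGER cl_after_insert AFTER INSERT ON categorylinks BEGIN
    INSERT OR IGNORE INTO category_counts (cc_cat, cc_pages) VALUES (NEW.cl_to, 0);
    UPDATE category_counts SET cc_pages = cc_pages + 1 WHERE cc_cat = NEW.cl_to;
END;

-- Removed link: decrement the old category's counter.
CREATE TRIGGER cl_after_delete AFTER DELETE ON categorylinks BEGIN
    UPDATE category_counts SET cc_pages = cc_pages - 1 WHERE cc_cat = OLD.cl_to;
END;

-- Link moved to another category: shift one count from old to new.
CREATE TRIGGER cl_after_update AFTER UPDATE OF cl_to ON categorylinks BEGIN
    UPDATE category_counts SET cc_pages = cc_pages - 1 WHERE cc_cat = OLD.cl_to;
    INSERT OR IGNORE INTO category_counts (cc_cat, cc_pages) VALUES (NEW.cl_to, 0);
    UPDATE category_counts SET cc_pages = cc_pages + 1 WHERE cc_cat = NEW.cl_to;
END;
"""


def page_count(conn, cat):
    """Single indexed lookup of the cached count instead of COUNT(*)."""
    row = conn.execute(
        "SELECT cc_pages FROM category_counts WHERE cc_cat = ?", (cat,)
    ).fetchone()
    return row[0] if row else 0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO categorylinks VALUES (?, ?)",
        [(1, "Eukaryota"), (2, "Eukaryota"), (3, "Bacteria")],
    )
    conn.execute("DELETE FROM categorylinks WHERE cl_from = 1")
    print(page_count(conn, "Eukaryota"))  # prints 1
```

One tradeoff: every link change in a huge category updates the same counter row, so on a busy wiki that row could become a point of contention.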
I'm sure the rest of you have given this much more thought than I have.
On Feb 20, 2007, at 1:14 PM, Simetrical wrote:
On 2/20/07, Jim Hu <jimhu@tamu.edu> wrote:
I'm developing the extension for 1.8.3. I have now gotten the basic version working with the minor hack to CategoryPage.php.
So far, it improves on CategoryPage.php in two ways:
- Shows the subcategories no matter how many articles are shown
- Shows the real counts for articles when the number is >200
I'm hoping to make it more like Special:Allpages next...I'm using it on a wiki whose biggest categories have >100K articles:
http://gowiki.tamu.edu/GO/wiki/index.php/Category:Eukaryota
Paging through 200 at a time gets old fast!!
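On paging through 200 at a time: keyset pagination addresses each page by where it starts rather than by an OFFSET, which also allows jumping straight into the middle of a category, the way Special:Allpages lets you pick a starting title. A minimal sketch in Python/SQLite; cl_sortkey and cl_from follow MediaWiki's categorylinks columns, while category_page is a hypothetical helper and sort keys stand in for page titles:

```python
import sqlite3


def category_page(conn, cat, start=("", 0), limit=200):
    """Return (sort_keys, next_start) for one page of a category listing.

    A page is addressed by a (cl_sortkey, cl_from) starting point instead of
    an OFFSET, so with an index on (cl_to, cl_sortkey, cl_from) the database
    can seek toward the start rather than reading and discarding earlier rows.
    cl_from breaks ties between equal sort keys so no row is shown twice.
    One extra row is fetched to learn where the following page begins.
    """
    key, page_id = start
    rows = conn.execute(
        "SELECT cl_sortkey, cl_from FROM categorylinks "
        "WHERE cl_to = ? AND cl_sortkey >= ? "
        "AND (cl_sortkey > ? OR cl_from >= ?) "
        "ORDER BY cl_sortkey, cl_from LIMIT ?",
        (cat, key, key, page_id, limit + 1),
    ).fetchall()
    next_start = rows[limit] if len(rows) > limit else None
    return [r[0] for r in rows[:limit]], next_start


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_sortkey TEXT)"
    )
    conn.execute(
        "CREATE INDEX cl_sortkey_idx ON categorylinks (cl_to, cl_sortkey, cl_from)"
    )
    conn.executemany(
        "INSERT INTO categorylinks VALUES (?, 'Eukaryota', ?)",
        [(1, "A"), (2, "B"), (3, "B"), (4, "C"), (5, "D")],
    )
    start = ("", 0)
    while start is not None:
        keys, start = category_page(conn, "Eukaryota", start, limit=2)
        print(keys)  # ['A', 'B'], then ['B', 'C'], then ['D']
```

Passing a start such as ("M", 0) begins the listing at "M" directly, so nobody has to click through every earlier page.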
Are you using COUNT(*)? That's O(N) on InnoDB, which is why we don't use it: too slow for large categories. If you have an extra field in the database somewhere rather than COUNT(*), maybe it would be good for trunk (although that's for Tim, Domas, etc. to decide).
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitech-l