On Feb 20, 2007, at 7:50 AM, Jim Hu wrote:
Hi,
I'm trying to write an extension that makes an alternative category page where the subcategories are split off from the articles. I'm hooking at CategoryPageView, and I thought that the simplest approach would be to extend the CategoryViewer class from CategoryPage.php. But when I try that, I get:
[Tue Feb 20 07:26:33 2007] [error] PHP Fatal error: Class 'Article' not found in /Library/WebServer/Documents/wiki/includes/CategoryPage.php on line 15
This is after I require CategoryPage.php in the extension. Requiring Article.php in the extension doesn't seem to help, but it works if I hack in a
require_once("$IP/includes/Article.php");
in CategoryPage.php itself. Obviously, CategoryPage doesn't normally need this, and I'd prefer to not hack the base code. I know that I could get around this by just copying the CategoryViewer code into the extension, but if I can I'd like to do this by inheritance.
I don't understand why this happens. The hook is called from CategoryPage.php, so why does it have to be required again? Any explanations or suggestions? Thanks!!
Jim
=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054
Spoke too soon. Adding the require to CategoryPage.php stops my test wiki from crashing on every page, but it still crashes when I try to view a category. :(
Jim Hu
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Feb 20, 2007, at 11:56 AM, Simetrical wrote:
What version is this? I had some idea this shouldn't happen in 1.7+ due to autoloading. Not totally sure, though, just a guess.
On 2/20/07, Jim Hu jimhu@tamu.edu wrote:
I'm developing the extension for 1.8.3. I now have the basic version working with the minor hack to CategoryPage.php.
So far, it improves on CategoryPage.php in two ways:
- Shows the subcategories no matter how many articles are shown
- Shows the real counts for articles when the number is >200.
I'm hoping to make it more like Special:Allpages next. I'm using it on a wiki where the biggest categories have >100K articles:
http://gowiki.tamu.edu/GO/wiki/index.php/Category:Eukaryota
Paging through 200 at a time gets old fast!!
On Feb 20, 2007, at 1:14 PM, Simetrical wrote:
Are you using COUNT(*)? That's O(N) on InnoDB, which is why we don't use it: too slow for large categories. If you have an extra field in the database somewhere rather than COUNT(*), maybe it would be good for trunk (although that's for Tim, Domas, etc. to decide).
On 2/20/07, Gregory Maxwell gmaxwell@gmail.com wrote:
O(N).. er. Ah, you mean O(N) on results...
I believe it's O(N) on number of rows in the table, actually, not just on number of results (which any query is going to be, I guess, although I don't know much of anything about computational complexity). I was really just repeating what Tim said:
"It's a common pitfall for new developers to submit code containing SQL queries which examine huge numbers of rows. Remember that COUNT(*) is O(N), counting rows in a table is like counting beans in a bucket."
http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/docs/database.txt?vie...
Apparently this is because of the transactional nature of the tables (http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html). I'm not sure why it couldn't maintain a slightly-inaccurate count (so that behavior doesn't differ between storage engines, I guess), but it doesn't, apparently, so it has to recount them every time.
Then again, probably I misunderstood you and am not telling you anything you didn't already know.
On Feb 20, 2007, at 2:04 PM, Jim Hu wrote:
Yeah, I'm using COUNT(*)...and it is slow. But for some uses on low-traffic sites like mine, it's a tradeoff some sysadmins will be willing to make. I don't see this getting used on Wikipedia for performance reasons, as you note.
I was thinking about how one might cache the count, but then it seems to me that you'd need all kinds of triggers for whenever the category links change... I'm much too much of a newbie to dive that far into the code, at least so far, and I don't know if this can even be done as an extension without a lot of new hooks.
I'm sure the rest of you have given this much more thought than I have.
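The cached-count idea in this message can be tried out away from MediaWiki. What follows is only a minimal sketch under stated assumptions, not MediaWiki code: it uses Python's sqlite3 module instead of PHP and MySQL so that it runs on its own, the categorylinks table is cut down to two columns, and the category_counts table, the trigger names, and the helper functions are all hypothetical. It shows two row-level triggers keeping a per-category counter in step with inserts and deletes, so that reading the count no longer needs COUNT(*).

```python
import sqlite3

# Simplified schema. category_counts and both triggers are hypothetical
# names invented for this sketch.
SCHEMA = """
CREATE TABLE categorylinks (
    cl_from INTEGER NOT NULL,   -- page id of the member page
    cl_to   TEXT    NOT NULL,   -- category name
    PRIMARY KEY (cl_from, cl_to)
);
CREATE TABLE category_counts (
    cat_name  TEXT PRIMARY KEY,
    cat_pages INTEGER NOT NULL DEFAULT 0
);
CREATE TRIGGER cl_after_insert AFTER INSERT ON categorylinks
BEGIN
    -- Create the counter row on first use, then bump it.
    INSERT OR IGNORE INTO category_counts (cat_name, cat_pages)
        VALUES (NEW.cl_to, 0);
    UPDATE category_counts SET cat_pages = cat_pages + 1
        WHERE cat_name = NEW.cl_to;
END;
CREATE TRIGGER cl_after_delete AFTER DELETE ON categorylinks
BEGIN
    UPDATE category_counts SET cat_pages = cat_pages - 1
        WHERE cat_name = OLD.cl_to;
END;
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def cached_count(conn, category):
    """Read the maintained counter: a single primary-key lookup."""
    row = conn.execute(
        "SELECT cat_pages FROM category_counts WHERE cat_name = ?",
        (category,),
    ).fetchone()
    return row[0] if row else 0


def slow_count(conn, category):
    """Count matching rows directly, as COUNT(*) does."""
    return conn.execute(
        "SELECT COUNT(*) FROM categorylinks WHERE cl_to = ?", (category,)
    ).fetchone()[0]


conn = open_db()
conn.executemany(
    "INSERT INTO categorylinks (cl_from, cl_to) VALUES (?, ?)",
    [(page_id, "Eukaryota") for page_id in range(1, 1001)],
)
conn.execute("DELETE FROM categorylinks WHERE cl_from <= 10")
print(cached_count(conn, "Eukaryota"), slow_count(conn, "Eukaryota"))  # both 990
```

The sketch does not handle an UPDATE that moves a row to a different cl_to; a real version would need a third trigger or would have to treat such a change as a delete plus an insert. Whether the same bookkeeping could be done from an extension, without database triggers, depends on which hooks fire when category links change, and the thread leaves that open.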
FWIW, COUNT(*) may be slow, but the slow step in generating the category page is probably a combination of sorting the results in MySQL and stepping through the query results in PHP. On my outdated dual G5 Mac OS X server, I get the following with my pre-beta extension, using a category with 124,520 articles and 3 subcategories:
Count the articles: 5 sec
Do the query method: 16-24 sec
n = 3
For a category with 74945 articles:
Count the articles: 2-3 sec
Do the query method: 9-10 sec
n = 3
These numbers come from putting time() calls in the code and subtracting. This is right on the edge of not working at all, and sometimes when I load the page it fails. Of course, I ran the tests while some spammer was trying to flood my server on another application. The query numbers are much higher than one would get for the real CategoryPage.php, since the query to collect all the subcategories has to traverse the whole list of categorylinks.
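PHP's time() returns whole seconds, so a 2-3 second reading carries a large rounding error; microtime(true) would give sub-second readings in the extension itself. The measure-and-subtract approach can be sketched as below, again in Python so that it runs standalone. The helper time_runs and the stand-in workload fake_category_query are hypothetical names for this illustration and are not part of the extension.

```python
import time


def time_runs(func, n=3):
    """Call func() n times; return (fastest, slowest) wall-clock seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()      # high-resolution start time
        func()
        durations.append(time.perf_counter() - start)
    return min(durations), max(durations)


def fake_category_query():
    # Hypothetical stand-in for the sort-and-iterate work on a category.
    return sorted(range(200_000), reverse=True)


fastest, slowest = time_runs(fake_category_query, n=3)
print(f"Do the query method: {fastest:.3f}-{slowest:.3f} sec, n = 3")
```

Reporting the fastest and slowest of n runs mirrors the ranges quoted above and makes it easier to see how much of the spread comes from other load on the server.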
I'm hoping faster hardware will help. Or at some point, perhaps I should just show the count and write a message to the screen like:
"You really don't want to click through all the pages of articles. Try a subcategory instead!" ;)
More seriously, I wonder if it would be worth having a cron job periodically generate denormalized tables for the articles and subcategories in general.
Oh, and I'm still wondering about the inheritance problem.
Jim