---------- Forwarded Message ----------
Subject: [Wikitech-l] Multiple categories
Date: Sun, 30 May 2004 12:07:09 +0200
From: Nikola Smolenski smolensk@EUnet.yu
To: Wikimedia developers wikitech-l@Wikipedia.org
I have made a patch which enables display of all articles that belong to multiple categories. Thus, for example, Category:Female/Orthodox/Saint would return all female orthodox saints (that is, all articles which are categorised with at least [[Category:Female]], [[Category:Orthodox]] and [[Category:Saint]]).
I am not sure whether this would make for a much greater database load, but that remains to be seen in practice. The line if(strchr($t,"/")) { and the corresponding else block are not actually needed, but I have left them in because I figure that displaying a single category is slightly faster this way, and because it makes the feature easier to turn off if needed. It should also be easy to limit the number of categories if that turns out to be necessary.
Index: Parser.php
===================================================================
RCS file: /cvsroot/wikipedia/phase3/includes/Parser.php,v
retrieving revision 1.135
diff -u -3 -p -r1.135 Parser.php
--- Parser.php 15 May 2004 00:29:07 -0000 1.135
+++ Parser.php 30 May 2004 09:11:00 -0000
@@ -318,7 +318,21 @@ class Parser
   # FIXME: add limits
   $t = wfStrencode( $this->mTitle->getDBKey() );
-  $sql = "SELECT DISTINCT cur_title,cur_namespace FROM cur,categorylinks WHERE cl_to='$t' AND cl_from=cur_id ORDER BY cl_sortkey" ;
+  $t = preg_replace("'/+'","/",$t);
+  if(strchr($t,"/")) {
+    $ta = explode("/",$t);
+    $tt="cur"; $tw="";
+
+    $i=0;
+    foreach($ta as $v) {
+      $tt.=",categorylinks as cl$i";
+      $tw.="cl$i.cl_to='$v' AND cl$i.cl_from=cl".++$i.".cl_from AND ";
+    }
+    $tw=preg_replace("' AND cl[0-9]+[.]cl_from=cl[0-9]+[.]cl_from AND $'U","",$tw);
+    $sql="SELECT DISTINCT cur_title,cur_namespace FROM $tt WHERE $tw AND cl0.cl_from=cur_id ORDER BY cl0.cl_sortkey";
+  } else {
+    $sql = "SELECT DISTINCT cur_title,cur_namespace FROM cur,categorylinks WHERE cl_to='$t' AND cl_from=cur_id ORDER BY cl_sortkey" ;
+  }
   $res = wfQuery ( $sql, DB_READ ) ;
   while ( $x = wfFetchObject ( $res ) ) $data[] = $x ;
_______________________________________________ Wikitech-l mailing list Wikitech-l@Wikipedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l
-------------------------------------------------------
Nikola Smolenski wrote:
I have made a patch which enables display of all articles that belong to multiple categories.
I think the following patch is slightly more efficient.
The database query itself is very inefficient though. If this was Perl, I could make it way more efficient. In short, the "DISTINCT" and "ORDER BY" are performance killers. They should be done in PHP instead.
Timwi
Index: includes/Parser.php
===================================================================
RCS file: /cvsroot/wikipedia/phase3/includes/Parser.php,v
retrieving revision 1.135
diff -u -r1.135 Parser.php
--- includes/Parser.php 15 May 2004 00:29:07 -0000 1.135
+++ includes/Parser.php 29 Jun 2004 22:34:32 -0000
@@ -318,7 +318,25 @@
   # FIXME: add limits
   $t = wfStrencode( $this->mTitle->getDBKey() );
-  $sql = "SELECT DISTINCT cur_title,cur_namespace FROM cur,categorylinks WHERE cl_to='$t' AND cl_from=cur_id ORDER BY cl_sortkey" ;
+  $t = preg_replace("'/+'","/",$t);
+  if(strchr($t,"/")) {
+    $ta = explode("/",$t);
+    $tt = "cur c";
+    $tw = "";
+
+    $i = 0;
+    foreach($ta as $v) {
+      $tt .= ", categorylinks cl$i";
+      if ($tw != "") { $tw .= " AND "; }
+      $tw .= "cl$i.cl_to='$v' AND cl$i.cl_from=c.cur_id";
+      $i++;
+    }
+    $sql = "SELECT DISTINCT c.cur_title, c.cur_namespace FROM $tt " .
+      " WHERE $tw ORDER BY cl0.cl_sortkey";
+  } else {
+    $sql = "SELECT DISTINCT c.cur_title, c.cur_namespace FROM cur c, categorylinks cl " .
+      " WHERE cl.cl_to='$t' AND cl.cl_from=c.cur_id ORDER BY cl.cl_sortkey" ;
+  }
   $res = wfQuery ( $sql, DB_READ ) ;
   while ( $x = wfFetchObject ( $res ) ) $data[] = $x ;
On Wednesday 30 June 2004 00:35, Timwi wrote:
Nikola Smolenski wrote:
I have made a patch which enables display of all articles that belong to multiple categories.
I think the following patch is slightly more efficient.
I don't think that checking inside a loop is generally more efficient than a single call to preg_replace; however, it could be for a small number of iterations, which is the most likely case. It really should be checked.
The database query itself is very inefficient though. If this was Perl, I could make it way more efficient. In short, the "DISTINCT" and "ORDER BY" are performance killers. They should be done in PHP instead.
These take place after the WHERE clause is executed and should have no effect on database performance beyond what already exists when selecting a single category.
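To make the suggestion of doing "DISTINCT" and "ORDER BY" in PHP concrete, here is a minimal sketch written for current PHP. It assumes the query is changed to also select cl_sortkey, and the helper name dedupeAndSortRows() and the row shape are hypothetical; neither is part of the patches in this thread. Whether this beats letting the database do the work is exactly the point in dispute and would have to be measured.

```php
<?php
// Sketch only: dedupeAndSortRows() is a hypothetical helper, not MediaWiki code.
// Assumes each row is an object with cur_namespace, cur_title and cl_sortkey.

function dedupeAndSortRows(array $rows): array {
    $unique = [];
    foreach ($rows as $row) {
        // PHP-side DISTINCT: keep the first row seen for each (namespace, title).
        $key = $row->cur_namespace . ':' . $row->cur_title;
        if (!isset($unique[$key])) {
            $unique[$key] = $row;
        }
    }
    $result = array_values($unique);
    // PHP-side ORDER BY: byte-wise comparison of the sort keys.
    usort($result, function ($a, $b) {
        return strcmp($a->cl_sortkey, $b->cl_sortkey);
    });
    return $result;
}
```

The rows would be collected from wfFetchObject() as in the patch and passed through this function before display.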
On Tue, Jun 29, 2004 at 01:16:43AM +0200, Nikola Smolenski wrote:
I have made a patch which enables display of all articles that belong to multiple categories. Thus, for example, Category:Female/Orthodox/Saint would return all female orthodox saints (that is, all articles which are categorised with at least [[Category:Female]], [[Category:Orthodox]] and [[Category:Saint]]).
There are several category names with a / in some languages:
==en==
+----------------------------+
| 9/11                       |
| 9/11_Commission            |
| AC/DC                      |
| AC/DC_albums               |
| HIV/AIDS                   |
| Hebrew_Bible/Tanakh        |
| Hebrew_Bible/Tanakh_people |
| Hebrew_Bible/Tanakh_places |
| Jewish_texts/Ketuvim       |
| Jewish_texts/Nevi'im       |
+----------------------------+

==de==
+-------------------------------------+
| Deutsche/r                          |
| Deutsche/r_Schriftsteller/in        |
| Nationalpark_in_Australien/Ozeanien |
| Romanist/in                         |
| Sportturnier_/_-wettbewerb          |
+-------------------------------------+

==da==
+----------------------+
| Blomster_i_april/maj |
| Blomster_i_maj/juni  |
+----------------------+
Those would be broken by your patch.
Regards,
JeLuF
Jens Frank wrote:
There are several category names with a / in some languages:
Although some of them really really need to be renamed ("9/11" should be "September 11, 2001 attacks"; "Deutsche/r" should be "Deutsche(r)"), some are valid (e.g. "Sportturnier / -wettbewerb").
It seems that the pipe ("|") character is disallowed in our article titles (and hence, in categories too), but allowed in URLs (I couldn't find the relevant RfC on this for definite confirmation; can someone find it?). How about we use that instead?
Of course, the problem with that is that we can't link to combined categories. Any alternative suggestions are welcome.
Index: includes/Parser.php
===================================================================
RCS file: /cvsroot/wikipedia/phase3/includes/Parser.php,v
retrieving revision 1.135
diff -u -r1.135 Parser.php
--- includes/Parser.php 15 May 2004 00:29:07 -0000 1.135
+++ includes/Parser.php 3 Jul 2004 10:49:33 -0000
@@ -318,7 +318,25 @@
   # FIXME: add limits
   $t = wfStrencode( $this->mTitle->getDBKey() );
-  $sql = "SELECT DISTINCT cur_title,cur_namespace FROM cur,categorylinks WHERE cl_to='$t' AND cl_from=cur_id ORDER BY cl_sortkey" ;
+  $t = preg_replace("'\|\|+'","|",$t);
+  if(strchr($t,"|")) {
+    $ta = explode("|",$t);
+    $tt = "cur c";
+    $tw = "";
+
+    $i = 0;
+    foreach($ta as $v) {
+      $tt .= ", categorylinks cl$i";
+      if ($tw != "") { $tw .= " AND "; }
+      $tw .= "cl$i.cl_to='$v' AND cl$i.cl_from=c.cur_id";
+      $i++;
+    }
+    $sql = "SELECT DISTINCT c.cur_title, c.cur_namespace FROM $tt " .
+      " WHERE $tw ORDER BY cl0.cl_sortkey";
+  } else {
+    $sql = "SELECT DISTINCT c.cur_title, c.cur_namespace FROM cur c, categorylinks cl " .
+      " WHERE cl.cl_to='$t' AND cl.cl_from=c.cur_id ORDER BY cl.cl_sortkey" ;
+  }
   $res = wfQuery ( $sql, DB_READ ) ;
   while ( $x = wfFetchObject ( $res ) ) $data[] = $x ;
Timwi wrote:
Jens Frank wrote:
There are several category names with a / in some languages:
It seems that the pipe ("|") character is disallowed in our article titles (and hence, in categories too), but allowed in URLs (I couldn't find the relevant RfC on this for definite confirmation; can someone find it?). How about we use that instead?
Actually, I have a better idea now. How about two slashes (//)? Then we can have something like "Category:9/11//People" to have the intersection of Category:9/11 and Category:People.
Timwi
Index: includes/Parser.php
===================================================================
RCS file: /cvsroot/wikipedia/phase3/includes/Parser.php,v
retrieving revision 1.135
diff -u -r1.135 Parser.php
--- includes/Parser.php 15 May 2004 00:29:07 -0000 1.135
+++ includes/Parser.php 3 Jul 2004 16:14:36 -0000
@@ -318,7 +318,25 @@
   # FIXME: add limits
   $t = wfStrencode( $this->mTitle->getDBKey() );
-  $sql = "SELECT DISTINCT cur_title,cur_namespace FROM cur,categorylinks WHERE cl_to='$t' AND cl_from=cur_id ORDER BY cl_sortkey" ;
+  $t = preg_replace("'///+'","//",$t);
+  if(strchr($t,"//")) {
+    $ta = explode("//",$t);
+    $tt = "cur c";
+    $tw = "";
+
+    $i = 0;
+    foreach($ta as $v) {
+      $tt .= ", categorylinks cl$i";
+      if ($tw != "") { $tw .= " AND "; }
+      $tw .= "cl$i.cl_to='$v' AND cl$i.cl_from=c.cur_id";
+      $i++;
+    }
+    $sql = "SELECT DISTINCT c.cur_title, c.cur_namespace FROM $tt " .
+      " WHERE $tw ORDER BY cl0.cl_sortkey";
+  } else {
+    $sql = "SELECT DISTINCT c.cur_title, c.cur_namespace FROM cur c, categorylinks cl " .
+      " WHERE cl.cl_to='$t' AND cl.cl_from=c.cur_id ORDER BY cl.cl_sortkey" ;
+  }
   $res = wfQuery ( $sql, DB_READ ) ;
   while ( $x = wfFetchObject ( $res ) ) $data[] = $x ;
On Saturday 03 July 2004 18:14, Timwi wrote:
Timwi wrote:
Jens Frank wrote:
There are several category names with a / in some languages:
It seems that the pipe ("|") character is disallowed in our article titles (and hence, in categories too), but allowed in URLs (I couldn't find the relevant RfC on this for definite confirmation; can someone find it?). How about we use that instead?
Actually, I have a better idea now. How about two slashes (//)? Then we can have something like "Category:9/11//People" to have the intersection of Category:9/11 and Category:People.
I like this idea. By the way, whichever is done,
$t = preg_replace("'///+'","//",$t);
if(strchr($t,"//")) {
should really be
if(strchr($t,"//")) {
  $t = preg_replace("'///+'","//",$t);
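As an illustration of the reordered check with the "//" separator, here is a small self-contained sketch for current PHP. The function name splitCombinedCategory() is hypothetical and not part of the patch; it only shows that single slashes inside a category name survive, and that the regular expression runs only when a separator is actually present.

```php
<?php
// Sketch only: splitCombinedCategory() is a hypothetical helper, not MediaWiki code.

function splitCombinedCategory(string $title): array {
    // Cheap test first: a title without "//" names a single category.
    if (strpos($title, '//') === false) {
        return [$title];
    }
    // Collapse runs of three or more slashes to one separator, as the patch does.
    $title = preg_replace("'///+'", '//', $title);
    return explode('//', $title);
}
```

With this, "9/11//People" splits into "9/11" and "People", while "AC/DC" stays a single category.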
Jens Frank wrote:
On Tue, Jun 29, 2004 at 01:16:43AM +0200, Nikola Smolenski wrote:
I have made a patch which enables display of all articles that belong to multiple categories. Thus, for example, Category:Female/Orthodox/Saint would return all female orthodox saints (that is, all articles which are categorised with at least [[Category:Female]], [[Category:Orthodox]] and [[Category:Saint]]).
There are several category names with a / in some languages:
Those would be broken by your patch.
These are very few. Could these diagonals be replaced by something else in the category names, and the diagonal otherwise disallowed in category titles?
Ec
Ray Saintonge wrote:
[ on categories with / in them ]
These are very few. Could these diagonals be replaced by something else in the category names, and the diagonal otherwise disallowed in category titles?
The "/" is a common/usual punctuation mark, and so should be allowed. Even if we were to try to discourage its use, people would still create categories with it because they would think nothing of it.
Hm, I have a new idea, actually. See my posting in a few minutes.
Timwi
On Saturday 03 July 2004 11:44, Jens Frank wrote:
On Tue, Jun 29, 2004 at 01:16:43AM +0200, Nikola Smolenski wrote:
I have made a patch which enables display of all articles that belong to multiple categories. Thus, for example, Category:Female/Orthodox/Saint would return all female orthodox saints (that is, all articles which are categorised with at least [[Category:Female]], [[Category:Orthodox]] and [[Category:Saint]]).
There are several category names with a / in some languages:
Well, I submitted the patch before people started categorising massively; if someone had taken care of it then, there might be no such categories. Now it should be decided whether a slash in category names is really necessary: if it isn't, those categories could be renamed and the slash used as the separator, and if it is, some other token could be suggested.
By the way, I was already thinking about more precise queries and this might just be the place and time to discuss them.
Displaying all but a single category:
Category:Female/-Orthodox/Saint
would return all female saints who are not Orthodox. I hope that there are no categories whose names start with "-".
Displaying articles which are in one of several categories:
Category:Female/Catholic,Protestant/Saint
would return all female saints who are either Catholic or Protestant. It would not be possible for the comma to have higher priority than the slash, but such a need occurs rarely, and if such an approach is good enough for Google it should be good enough for us. I again hope that there are no categories which have a comma not followed by a space in their names.
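The proposed syntax can be sketched as a small parser plus a matcher, written here for current PHP. This reads each slash-separated segment as a required term, a leading "-" as exclusion, and commas inside a segment as alternatives, following the two examples above. The names parseCategoryQuery() and articleMatches() and the in-memory matching are assumptions made for illustration; a real implementation would generate SQL instead.

```php
<?php
// Sketch only: parseCategoryQuery() and articleMatches() are hypothetical helpers.

/**
 * Split a query such as "Female/-Orthodox/Saint" into terms.
 * Each term is ['negate' => bool, 'anyOf' => string[]].
 */
function parseCategoryQuery(string $query, string $separator = '/'): array {
    $terms = [];
    foreach (explode($separator, $query) as $segment) {
        if ($segment === '') {
            continue; // ignore empty segments from repeated separators
        }
        $negate = ($segment[0] === '-');
        if ($negate) {
            $segment = substr($segment, 1);
        }
        // A comma separates alternatives inside one segment (logical OR).
        $alternatives = array_values(array_filter(explode(',', $segment), 'strlen'));
        if ($alternatives !== []) {
            $terms[] = ['negate' => $negate, 'anyOf' => $alternatives];
        }
    }
    return $terms;
}

/** Check one article's category list against the parsed terms. */
function articleMatches(array $terms, array $articleCategories): bool {
    foreach ($terms as $term) {
        $hit = count(array_intersect($term['anyOf'], $articleCategories)) > 0;
        if ($hit === $term['negate']) {
            return false; // a required term is missing, or an excluded one is present
        }
    }
    return true;
}
```

An article categorised as Female, Saint and Protestant matches "Female/Catholic,Protestant/Saint"; one categorised as Female, Saint and Orthodox does not match "Female/-Orthodox/Saint".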
Category:Female/Catholic,Protestant/Saint
Why not Category:Female,Catholic,Protestant,Saints ???
This way, we could order Categories by several ways:
Category:Saints lists all category of Saints
Category:Female lists all category of Females
Category:Female/Saints is the same as Category:Saints/Female
If I wasn't clear, I can try to explain better.
-- Riba
On Wednesday 14 July 2004 13:06, Ribamar Santarosa de Sousa wrote:
Category:Female/Catholic,Protestant/Saint
Why not Category:Female,Catholic,Protestant,Saints ???
This way, we could order Categories by several ways:
Category:Saints lists all category of Saints
Category:Female lists all category of Females
Category:Female/Saints is the same as Category:Saints/Female
Yes, that is what already exists. I was proposing a future extension:
Category:Saint/Pope would return all Popes who are also saints
Category:Pope,Patriarch would return all Popes and Patriarchs
Category:Saint/Pope,Patriarch would return all saints who were Popes or Patriarchs
etc.
By the way, what happened to this patch?
Nikola Smolenski wrote:
On Wednesday 14 July 2004 13:06, Ribamar Santarosa de Sousa wrote:
Category:Female/Catholic,Protestant/Saint
Why not Category:Female,Catholic,Protestant,Saints ???
This way, we could order Categories by several ways:
Category:Saints lists all category of Saints
Category:Female lists all category of Females
Category:Female/Saints is the same as Category:Saints/Female
Yes, that is what already exists. I was proposing a future extension:
Category:Saint/Pope would return all Popes who are also saints
Category:Pope,Patriarch would return all Popes and Patriarchs
Category:Saint/Pope,Patriarch would return all saints who were Popes or Patriarchs
I don't see how this can be more than marginally useful unless it also searches all subcategories to infinite depth (with recursion checks?!).
Rob
On Sunday 18 July 2004 17:54, Rob Hooft wrote:
Nikola Smolenski wrote:
Category:Saint/Pope would return all Popes who are also saints
Category:Pope,Patriarch would return all Popes and Patriarchs
Category:Saint/Pope,Patriarch would return all saints who were Popes or Patriarchs
I don't see how this can be more than marginally useful unless it also searches all subcategories to infinite depth (with recursion checks?!).
It would not be so useful with current categories, but I assume that when/if it is implemented, people would make use of it and not create subcategories manually. IMO, the finely-tuned categories which are present now came to be because there is no way to display articles which belong to multiple categories.
Nikola Smolenski wrote:
On Sunday 18 July 2004 17:54, Rob Hooft wrote:
Nikola Smolenski wrote:
Category:Saint/Pope would return all Popes who are also saints
Category:Pope,Patriarch would return all Popes and Patriarchs
Category:Saint/Pope,Patriarch would return all saints who were Popes or Patriarchs
I don't see how this can be more than marginally useful unless it also searches all subcategories to infinite depth (with recursion checks?!).
It would not be so useful with current categories, but I assume that when/if it is implemented, people would make use of it and not create subcategories manually. IMO, the finely-tuned categories which are present now came to be because there is no way to display articles which belong to multiple categories.
I see what you mean, but with the category system you envision, none of the categories would be very useful to browse by eye, and that happens to be a very fruitful way of browsing wikipedia. And all articles would need to be added to very, very many categories, bringing lots of hard manual work with it. In any system with subcategories, a search function MUST also search the tree.
Rob
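A search of "all subcategories to infinite depth (with recursion checks)" could look roughly like the sketch below, written for current PHP. It assumes the subcategory links are already available as an in-memory map from a category to its direct subcategories; that map, the sample names and the function name collectCategoryTree() are hypothetical. The set of visited categories is what stops a loop such as A containing B and B containing A.

```php
<?php
// Sketch only: collectCategoryTree() is a hypothetical helper, not MediaWiki code.
// $subcategories maps a category name to the list of its direct subcategories.

function collectCategoryTree(array $subcategories, string $root): array {
    $seen = [$root => true];   // recursion check: each category is visited once
    $queue = [$root];
    while ($queue !== []) {
        $current = array_shift($queue);
        foreach ($subcategories[$current] ?? [] as $child) {
            if (!isset($seen[$child])) {
                $seen[$child] = true;
                $queue[] = $child;
            }
        }
    }
    // Array keys that look like integers come back as ints, so cast to string.
    return array_map('strval', array_keys($seen));
}
```

The resulting list of category names could then be fed into the same kind of query the patches above build for a single category.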
On Sunday 18 July 2004 20:49, Rob Hooft wrote:
Nikola Smolenski wrote:
On Sunday 18 July 2004 17:54, Rob Hooft wrote:
Nikola Smolenski wrote:
Category:Saint/Pope would return all Popes who are also saints
Category:Pope,Patriarch would return all Popes and Patriarchs
Category:Saint/Pope,Patriarch would return all saints who were Popes or Patriarchs
I don't see how this can be more than marginally useful unless it also searches all subcategories to infinite depth (with recursion checks?!).
It would not be so useful with current categories, but I assume that when/if it is implemented, people would make use of it and not create subcategories manually. IMO, the finely-tuned categories which are present now came to be because there is no way to display articles which belong to multiple categories.
I see what you mean, but with the category system you envision, none of the categories would be very useful to browse by eye, and that happens to be a very fruitful way of browsing wikipedia. And all articles would
You are right that most categories would have a lot of articles, but I don't think that makes them unfruitful. If someone needs a list of all cities featured on Wikipedia, there it is; if someone needs a list of all cities in Germany, it could be made. With the current system, one is fine with a list of all cities in Germany, but there is no way at all of getting a list of all cities.
need to be added to very, very many categories, bringing lots of hard
I don't think so. For example, a city in Germany might need to be added to Category:City and Category:Germany and that's about all. Possibly there are articles where a lot more could be needed (Leonardo Da Vinci?) but I guess that they would have a lot of categories anyway.
manual work with it. In any system with subcategories, a search function MUST also search the tree.
But you are right, that should be added in the future.
Nikola Smolenski wrote:
I see what you mean, but with the category system you envision, none of the categories would be very useful to browse by eye, and that happens to be a very fruitful way of browsing wikipedia. And all articles would
You are right that most categories would have a lot of articles, but I don't think that makes them unfruitful. If someone needs a list of all cities featured on Wikipedia, there it is; if someone needs a list of all cities in Germany, it could be made. With the current system, one is fine with a list of all cities in Germany, but there is no way at all of getting a list of all cities.
This argument is fallacious. You say "if someone needs a list of all cities in Germany, it could be made", referring to a hypothetical future feature. Then you say "there is no way at all of getting a list of all cities", but it is obviously possible for there to be a future feature for that too (display a category and all sub-categories).
Timwi
On Monday 19 July 2004 23:01, Timwi wrote:
Nikola Smolenski wrote:
I see what you mean, but with the category system you envision, none of the categories would be very useful to browse by eye, and that happens to be a very fruitful way of browsing wikipedia. And all articles would
You are right that most categories would have a lot of articles, but I don't think that makes them unfruitful. If someone needs a list of all cities featured on Wikipedia, there it is; if someone needs a list of all cities in Germany, it could be made. With the current system, one is fine with a list of all cities in Germany, but there is no way at all of getting a list of all cities.
This argument is fallacious. You say "if someone needs a list of all cities in Germany, it could be made", referring to a hypothetical future feature. Then you say "there is no way at all of getting a list of all cities", but it is obviously possible for there to be a future feature for that too (display a category and all sub-categories).
No it isn't. One of the future features isn't hypothetical, it is made, tested (by you), found to work, and I don't know what is preventing its inclusion in the CVS. The other feature is a hypothetical one; it doesn't exist and it is possible (though unlikely) that it never will.