How bad is a category with 21,008 pages for the servers?
http://en.wikipedia.org/wiki/Category:Articles_with_unsourced_statements
--Ligulem
On 8/20/06, Ligulem ligulem@pobox.com wrote:
How bad is a category with 21,008 pages for the servers?
http://en.wikipedia.org/wiki/Category:Articles_with_unsourced_statements
IIRC, it's not a problem now that we have category paging, but my recollection may be faulty.
On 8/20/06, Simetrical Simetrical+wikitech@gmail.com wrote:
On 8/20/06, Ligulem ligulem@pobox.com wrote:
How bad is a category with 21,008 pages for the servers?
http://en.wikipedia.org/wiki/Category:Articles_with_unsourced_statements
IIRC, it's not a problem now that we have category paging, but my recollection may be faulty.
20k isn't that large.. we have quite a few with a lot more than that...
On 8/20/06, Gregory Maxwell gmaxwell@gmail.com wrote:
20k isn't that large.. we have quite a few with a lot more than that...
Now that you mention it, we do, and even a special page to account for them: http://en.wikipedia.org/wiki/Special:Mostlinkedcategories. The largest weighs in at over 110,000 pages.
On 21/08/06, Simetrical Simetrical+wikitech@gmail.com wrote:
On 8/20/06, Gregory Maxwell gmaxwell@gmail.com wrote:
20k isn't that large.. we have quite a few with a lot more than that...
Now that you mention it, we do, and even a special page to account for them: http://en.wikipedia.org/wiki/Special:Mostlinkedcategories. The largest weighs in at over 110,000 pages.
Is that the one with an article for each faux pas of a well-known US political figure in?
Rob Church
On Mon, Aug 21, 2006 at 04:53:06AM +0100, Rob Church wrote:
Is that the one with an article for each faux pas of a well-known US political figure in?
Hee. -- j
Gregory Maxwell wrote:
On 8/20/06, Simetrical Simetrical+wikitech@gmail.com wrote:
On 8/20/06, Ligulem ligulem@pobox.com wrote:
How bad is a category with 21,008 pages for the servers?
http://en.wikipedia.org/wiki/Category:Articles_with_unsourced_statements
IIRC, it's not a problem now that we have category paging, but my recollection may be faulty.
20k isn't that large.. we have quite a few with a lot more than that...
Thanks for the responses. So this is not a technical problem then. I just wonder what's the benefit of having such huge categories...
I thought categories were meant as a tool for editors to iterate over the articles contained in them. I can't imagine that a human would ever iterate over a set of 20K pages (at least not without using specialized tools like AWB [1] or bots).
On 8/21/06, Ligulem ligulem@pobox.com wrote:
Thanks for the responses. So this is not a technical problem then. I just wonder what's the benefit of having such huge categories...
I thought categories were meant as a tool for editors to iterate over the articles contained in them. I can't imagine that a human would ever iterate over a set of 20K pages (at least not without using specialized tools like AWB [1] or bots).
There are several distinct uses of categories:
a) To allow human readers to browse related articles
b) To organise articles for future distribution, publishing etc
c) To assist quality control, such as labelling articles that need cleanup etc
d) To allow bots to work some kind of magic.
Steve
On 8/21/06, Steve Bennett stevage@gmail.com wrote:
On 8/21/06, Ligulem ligulem@pobox.com wrote:
Thanks for the responses. So this is not a technical problem then. I just wonder what's the benefit of having such huge categories...
I thought categories were meant as a tool for editors to iterate over the articles contained in them. I can't imagine that a human would ever iterate over a set of 20K pages (at least not without using specialized tools like AWB [1] or bots).
There are several distinct uses of categories:
a) To allow human readers to browse related articles
b) To organise articles for future distribution, publishing etc
c) To assist quality control, such as labelling articles that need cleanup etc
d) To allow bots to work some kind of magic.
I've been thinking for some weeks or more now that a good feature to improve the usefulness of large categories would be "Random article in this category". It would be excellent for maintenance, where nobody is going to iterate through the whole lot but people still like to try to keep order etc.
Andrew Dunbar (hippietrail)
On 8/21/06, Andrew Dunbar hippytrail@gmail.com wrote:
I've been thinking for some weeks or more now that a good feature to improve the usefulness of large categories would be "Random article in this category". It would be excellent for maintenance, where nobody is going to iterate through the whole lot but people still like to try to keep order etc.
Yes. Definitely support that.
And, going even further, we could do what most "funny photo" sites do, which is to find "similar articles" to the current one by looking for articles that share most of the same categories. Given the differences between our "categories" and the "tags" on other sites, it might not work as well, but it would still be interesting...
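A minimal sketch of the "similar articles" idea, assuming the category membership of each article is already available as a mapping from title to a set of category names. The function name, the sample titles, and the sample categories are all hypothetical and only illustrate ranking by shared-category count; nothing here reflects how MediaWiki stores or queries categories.

```python
def similar_articles(article, categories_of, top_n=3):
    """Rank other articles by how many categories they share with `article`."""
    own = categories_of[article]
    shared = {}
    for other, cats in categories_of.items():
        if other == article:
            continue
        overlap = len(own & cats)
        if overlap:
            shared[other] = overlap
    # Most shared categories first; ties broken by title so output is stable.
    return sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical sample data, not real Wikipedia categorisation.
CATEGORIES_OF = {
    "Imagine (song)": {"John Lennon songs", "1971 songs"},
    "I Am the Walrus": {"John Lennon songs", "The Beatles songs", "1967 songs"},
    "Hey Jude": {"The Beatles songs", "1968 songs"},
    "Thames": {"Rivers of England"},
}
```

On a real wiki the huge maintenance categories would dominate the overlap count, so some weighting by category size would probably be needed; that is left out here.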
Steve
On 21/08/06, Andrew Dunbar hippytrail@gmail.com wrote:
I've been thinking for some weeks or more now that a good feature to improve the usefulness of large categories would be "Random article in this category". It would be excellent for maintenance, where nobody is going to iterate through the whole lot but people still like to try to keep order etc.
Someone tell me why the bollocking hell I never implemented that? I'm pretty sure I set out to do so at least once in the past.
** adds it to the big list(tm) **
Rob Church
Rob Church wrote:
On 21/08/06, Andrew Dunbar hippytrail@gmail.com wrote:
I've been thinking for some weeks or more now that a good feature to improve the usefulness of large categories would be "Random article in this category". It would be excellent for maintenance, where nobody is going to iterate through the whole lot but people still like to try to keep order etc.
Someone tell me why the bollocking hell I never implemented that? I'm pretty sure I set out to do so at least once in the past.
Because it would require a DB query with an execution time proportional to the number of articles in the category? Having categories with lots of members is fine, as long as we don't try to traverse them all the time.
The efficient way to do it would be to enumerate the category members, saving the results to a table for later lookup. The entries in this special table could have an expiry time. Is that how you were planning on doing it?
-- Tim Starling
On 21/08/06, Tim Starling t.starling@physics.unimelb.edu.au wrote:
Because it would require a DB query with an execution time proportional to the number of articles in the category? Having categories with lots of members is fine, as long as we don't try to traverse them all the time.
That would be it. I couldn't find an effective method which Domas would agree with.
The efficient way to do it would be to enumerate the category members, saving the results to a table for later lookup. The entries in this special table could have an expiry time. Is that how you were planning on doing it?
I don't bother planning things any more. Having four months' worth of notes turn useless is not a fun experience.
More details? Enumerate them when, and save which results?
Rob Church
Rob Church wrote:
On 21/08/06, Tim Starling t.starling@physics.unimelb.edu.au wrote:
Because it would require a DB query with an execution time proportional to the number of articles in the category? Having categories with lots of members is fine, as long as we don't try to traverse them all the time.
That would be it. I couldn't find an effective method which Domas would agree with.
The efficient way to do it would be to enumerate the category members, saving the results to a table for later lookup. The entries in this special table could have an expiry time. Is that how you were planning on doing it?
I don't bother planning things any more. Having four months' worth of notes turn useless is not a fun experience.
More details? Enumerate them when, and save which results?
Say if you have a large category. You could find a random member like this:
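$res = $dbr->select('categorylinks', 'cl_from', array('cl_to' => $cat));
// Pick a random row number between 0 and N-1
$rand = mt_rand(0, $dbr->numRows($res) - 1);
// Jump to that row of the buffered result set
$dbr->dataSeek($res, $rand);
$randomRow = $dbr->fetchObject($res);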
By selecting all members of the category into a buffer, we have implicitly assigned each row of the result set a number between 0 and N-1. This is what I mean by enumeration. Once you have this enumeration, you pick a random number between 0 and N-1 and then quickly retrieve it. But it's slow, because you have to load all the members every time you want to pick a random one.
Here's how you could save the enumeration:
$res = $dbr->select('categorylinks', 'cl_from', array('cl_to' => $cat));
$i = 0;
$enumRows = array();
// Number each member 0..N-1 in result order
while ( $row = $dbr->fetchObject( $res ) ) {
    $enumRows[] = array(
        'ce_index' => $i++,
        'ce_from' => $row->cl_from,
        'ce_to' => $cat
    );
}
$dbw->insert('categoryenum', $enumRows);
// Record the member count and when the enumeration expires
$dbw->insert('categoryenum_info', array(
    'cei_cat' => $cat,
    'cei_count' => $i,
    'cei_expiry' => wfTimestamp(TS_MW, time() + $expiry)
));
Once you've done that, you can quickly pick random members from the set:
$count = $dbr->selectField( 'categoryenum_info', 'cei_count',
    array( 'cei_cat' => $cat ) );
$rand = mt_rand( 0, $count - 1 );
$randRow = $dbr->selectRow( 'categoryenum', '*',
    array( 'ce_index' => $rand, 'ce_to' => $cat ) );
This is fast as long as you have indexes on cei_cat and (ce_to,ce_index). I've left out a few details, such as:
* Expiry.
* Refreshing existing enumerations. This could be done with a simple delete/insert, but better performance might be obtained by deleting individual items, and then filling the gaps by moving entries from the end of the enumeration to the middle.
* Verification. You can quickly verify that the element that you pick from the cached enumeration is still in the category, and if it's not, you can try again. After a few iterations, you'd have to either give up or declare the enumeration expired and regenerate it.
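The expiry and verification details listed above can be sketched end to end. What follows is an illustration in Python against an in-memory SQLite database, not MediaWiki code: the table and column names are taken from the snippets above, while the function names, the `ttl` and `max_tries` parameters, and the choice to regenerate after too many stale hits are assumptions made for the example.

```python
import random
import sqlite3
import time


def make_db():
    """In-memory stand-in for the tables named in the thread."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
        CREATE TABLE categoryenum (
            ce_to TEXT, ce_index INTEGER, ce_from INTEGER,
            PRIMARY KEY (ce_to, ce_index));
        CREATE TABLE categoryenum_info (
            cei_cat TEXT PRIMARY KEY, cei_count INTEGER, cei_expiry INTEGER);
    """)
    return db


def build_enumeration(db, cat, ttl=3600):
    """Number the members of `cat` 0..N-1 and cache the numbering."""
    db.execute("DELETE FROM categoryenum WHERE ce_to = ?", (cat,))
    members = db.execute(
        "SELECT cl_from FROM categorylinks WHERE cl_to = ? ORDER BY cl_from",
        (cat,)).fetchall()
    db.executemany(
        "INSERT INTO categoryenum (ce_to, ce_index, ce_from) VALUES (?, ?, ?)",
        [(cat, i, row[0]) for i, row in enumerate(members)])
    db.execute(
        "INSERT OR REPLACE INTO categoryenum_info VALUES (?, ?, ?)",
        (cat, len(members), int(time.time()) + ttl))
    return len(members)


def _pick(db, cat, rng):
    """Fetch one cached member by random index, or None if the cache is empty."""
    row = db.execute(
        "SELECT cei_count FROM categoryenum_info WHERE cei_cat = ?",
        (cat,)).fetchone()
    if row is None or row[0] == 0:
        return None
    index = rng.randrange(row[0])
    return db.execute(
        "SELECT ce_from FROM categoryenum WHERE ce_to = ? AND ce_index = ?",
        (cat, index)).fetchone()[0]


def random_member(db, cat, rng=random, max_tries=5):
    """Pick a random member, verifying it against the live category."""
    info = db.execute(
        "SELECT cei_expiry FROM categoryenum_info WHERE cei_cat = ?",
        (cat,)).fetchone()
    if info is None or info[0] < time.time():
        build_enumeration(db, cat)          # missing or expired cache
    for _ in range(max_tries):
        page = _pick(db, cat, rng)
        if page is None:
            return None
        still_member = db.execute(
            "SELECT 1 FROM categorylinks WHERE cl_to = ? AND cl_from = ?",
            (cat, page)).fetchone()
        if still_member:
            return page
    # Too many stale hits: declare the enumeration expired and regenerate it.
    build_enumeration(db, cat)
    return _pick(db, cat, rng)
```

Each lookup touches a fixed number of indexed rows; only `build_enumeration` walks the whole category.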
Hopefully you get the idea though.
Come to think of it, that incremental refresh algorithm above would be fast enough to perform on page save, in LinksUpdate.php. Then you wouldn't need expiry, and it wouldn't be a cache anymore, just a permanent data structure. You could just add a cl_index column to the categorylinks table.
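The incremental refresh (delete an entry, then fill the gap with the entry from the end) is easier to see on a small in-memory structure. This is only an illustration of the technique in Python: the class and method names are made up, and a real implementation would do the equivalent updates on a `cl_index` column rather than on a list and a dict.

```python
import random


class IndexedCategory:
    """Category members numbered 0..N-1 with no gaps."""

    def __init__(self):
        self.members = []    # index -> page id (plays the role of cl_index)
        self.position = {}   # page id -> index

    def add(self, page_id):
        """Append a new member at index N."""
        if page_id in self.position:
            return
        self.position[page_id] = len(self.members)
        self.members.append(page_id)

    def remove(self, page_id):
        """Delete a member and move the last entry into the gap it leaves."""
        index = self.position.pop(page_id, None)
        if index is None:
            return
        last = self.members.pop()
        if last != page_id:
            self.members[index] = last
            self.position[last] = index

    def random_member(self, rng=random):
        """Pick a member uniformly at random, or None if the category is empty."""
        return rng.choice(self.members) if self.members else None
```

Adding and removing each touch at most two entries, which is why the update could plausibly run on every page save.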
-- Tim Starling
Tim Starling wrote:
The efficient way to do it would be to enumerate the category members, saving the results to a table for later lookup. The entries in this special table could have an expiry time. Is that how you were planning on doing it?
I'm not sure I understand why you're coming up with a new scheme when there exists already a scheme for "Random Article".
"Random Article" works by having a page_random column in the page table which just contains a random number.
In my mind, therefore, the obvious solution is to add a cl_random column to the categorylinks table. Then you select a random article from a category in the same way you select a random article from the entire site:
$sql = "SELECT page_id,page_title
    FROM $categorylinks, $page $use_index
    WHERE page_namespace=$namespace AND page_is_redirect=0
    AND cl_random>$randstr AND cl_to=$category AND page_id=cl_from
    ORDER BY cl_random";
$sql = $db->limitResult($sql, 1, 0);
$res = $db->query( $sql, $fname );
The index, of course, would be on (cl_to, cl_random).
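For comparison, here is a small runnable sketch of the cl_random idea, again in Python on an in-memory SQLite table. The `cl_random` column and the (cl_to, cl_random) index follow the proposal above; the function names, the index name, and the wrap-around to the smallest value when nothing lies above the random threshold are assumptions. The join to the page table and the namespace and redirect filters are left out to keep it short.

```python
import random
import sqlite3


def make_db(rows):
    """rows: iterable of (cl_from, cl_to, cl_random) tuples."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_random REAL)")
    db.execute("CREATE INDEX cl_to_random ON categorylinks (cl_to, cl_random)")
    db.executemany("INSERT INTO categorylinks VALUES (?, ?, ?)", rows)
    return db


def random_in_category(db, cat, rng=random):
    """Return the member whose cl_random is the first at or above a random value."""
    threshold = rng.random()
    row = db.execute(
        "SELECT cl_from FROM categorylinks "
        "WHERE cl_to = ? AND cl_random >= ? ORDER BY cl_random LIMIT 1",
        (cat, threshold)).fetchone()
    if row is None:
        # Nothing above the threshold: wrap around to the smallest value.
        row = db.execute(
            "SELECT cl_from FROM categorylinks "
            "WHERE cl_to = ? ORDER BY cl_random LIMIT 1",
            (cat,)).fetchone()
    return row[0] if row else None
```

Note that the pick is only approximately uniform: a member is chosen with probability equal to the gap between its cl_random and the previous one, so in a small category some members come up more often than others.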
Timwi
On 22/08/06, Timwi timwi@gmx.net wrote:
In my mind, therefore, the obvious solution is to add a cl_random column to the categorylinks table. Then you select a random article from a category in the same way you select a random article from the entire site:
That isn't what the *links tables are for, however.
Rob Church
Rob Church wrote:
On 22/08/06, Timwi timwi@gmx.net wrote:
In my mind, therefore, the obvious solution is to add a cl_random column to the categorylinks table. Then you select a random article from a category in the same way you select a random article from the entire site:
That isn't what the *links tables are for, however.
WTF? What sort of argument is that? o.O
But why do I care -- it's not like /I/ want that feature :-p
On 24/08/06, Timwi timwi@gmx.net wrote:
Rob Church wrote:
On 22/08/06, Timwi timwi@gmx.net wrote:
In my mind, therefore, the obvious solution is to add a cl_random column to the categorylinks table. Then you select a random article from a category in the same way you select a random article from the entire site:
That isn't what the *links tables are for, however.
WTF? What sort of argument is that? o.O
I thought it had all the nice hallmarks of a classic wikitech-l argument. Trivial, pointless and full of bull shit. Oh well.
I'd prefer to see a separate table to aggregate data for randomisation, however.
Rob Church
On 8/21/06, Steve Bennett stevage@gmail.com wrote:
There are several distinct uses of categories:
a) To allow human readers to browse related articles
b) To organise articles for future distribution, publishing etc
c) To assist quality control, such as labelling articles that need cleanup etc
d) To allow bots to work some kind of magic.
Surprised that you didn't name this one, since it is one of the more useful human oriented ones (and a primary application for the living people cat):
e) Produce a filtered recent changes feed (http://en.wikipedia.org/w/index.php?title=Special:Recentchangeslinked&ta...)
On 8/21/06, Gregory Maxwell gmaxwell@gmail.com wrote:
Surprised that you didn't name this one, since it is one of the more useful human oriented ones (and a primary application for the living people cat):
e) Produce a filtered recent changes feed (http://en.wikipedia.org/w/index.php?title=Special:Recentchangeslinked&ta...)
Heh, didn't know you could do that.
Lots of these functionalities would be better if they handled subcategories, but for that to work we really need a better subcatting system. But I haven't got a solution yet.
Steve
Steve Bennett schrieb:
On 8/21/06, Gregory Maxwell gmaxwell@gmail.com wrote:
Surprised that you didn't name this one, since it is one of the more useful human oriented ones (and a primary application for the living people cat):
e) Produce a filtered recent changes feed (http://en.wikipedia.org/w/index.php?title=Special:Recentchangeslinked&ta...)
Heh, didn't know you could do that.
Lots of these functionalities would be better if they handled subcategories, but for that to work we really need a better subcatting system. But I haven't got a solution yet.
I have. It's in the current code, turned off. It's a filter to mass-sift through articles fast. The only implemented use (also turned off) is to filter Recent Changes to show only articles in a category *and its subcategories*. I have asked to turn it on for testing some time ago, but was more or less ignored (as usual;-).
It could be tested on one of the smaller wikis to check what impact it would have on en or de.
Magnus
On 8/21/06, Magnus Manske magnus.manske@web.de wrote:
I have. It's in the current code, turned off. It's a filter to mass-sift through articles fast. The only implemented use (also turned off) is to filter Recent Changes to show only articles in a category *and its subcategories*. I have asked to turn it on for testing some time ago, but was more or less ignored (as usual;-).
It could be tested on one of the smaller wikis to check what impact it would have on en or de.
How does it cope with category cycles? Basically I feel that since we have no real definition of what "subcategory" means (on en, at least), it's not that meaningful at this stage to presume that subcategories should be searched along with the main category. Or perhaps the user should be able to choose whether or not that's meaningful for the category he's searching...
Steve
Steve Bennett schrieb:
On 8/21/06, Magnus Manske magnus.manske@web.de wrote:
I have. It's in the current code, turned off. It's a filter to mass-sift through articles fast. The only implemented use (also turned off) is to filter Recent Changes to show only articles in a category *and its subcategories*. I have asked to turn it on for testing some time ago, but was more or less ignored (as usual;-).
It could be tested on one of the smaller wikis to check what impact it would have on en or de.
How does it cope with category cycles?
IIRC it remembers which categories were already checked, and doesn't cycle forever ;-)
Also, it starts with the categories a given list of articles are in, then goes *down* through the tree, towards the parent/root (or was that "up"? I keep forgetting), and checks if it finds a given category.
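As a rough illustration of that description (and not the actual code that is turned off in MediaWiki), the walk from an article's categories towards the parents with a "seen" set might look like the sketch below. The function name, the `parents_of` mapping, and the sample data are hypothetical.

```python
from collections import deque


def is_in_category_tree(start_categories, target, parents_of):
    """Walk from an article's categories towards the root, looking for `target`.

    Categories already checked are remembered, so a cycle cannot loop forever.
    """
    seen = set(start_categories)
    queue = deque(start_categories)
    while queue:
        cat = queue.popleft()
        if cat == target:
            return True
        for parent in parents_of.get(cat, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


# Hypothetical data with a deliberate cycle between Biology and Science.
PARENTS_OF = {
    "Botany": ["Biology"],
    "Biology": ["Science"],
    "Science": ["Biology"],
}
```

Because each category is visited at most once, the cost is bounded by the number of distinct ancestor categories, cycle or no cycle.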
Basically I feel that since we have no real definition of what "subcategory" means (on en, at least), it's not that meaningful at this stage to presume that subcategories should be searched along with the main category. Or perhaps the user should be able to choose whether or not that's meaningful for the category he's searching...
Well, saying "show me recent changes in biology and subcategories" is definitely something I'd enjoy.
Magnus
2006/8/21, Magnus Manske magnus.manske@web.de:
Basically I feel that since we have no real definition of what "subcategory" means (on en, at least), it's not that meaningful at this stage to presume that subcategories should be searched along with the main category. Or perhaps the user should be able to choose whether or not that's meaningful for the category he's searching...
Well, saying "show me recent changes in biology and subcategories" is definitely something I'd enjoy.
I'm afraid that it would be awfully time-consuming. I recently checked the categories of a single article together with all their supercategories, and the total ran into the hundreds, perhaps over 1,000. Subcategories, certainly those of a high-level category like Biology, might well have the same problem.
On 8/21/06, Magnus Manske magnus.manske@web.de wrote:
Well, saying "show me recent changes in biology and subcategories" is definitely something I'd enjoy.
Yeah. It just depends a lot on the kind of category. You probably wouldn't want *all* its subcats, there might be thousands of articles? Maybe you just want one level of subcats?
On the other hand, clear-cut subcats like "Musicians by country" would definitely be interesting.
Steve
Steve Bennett wrote:
On 8/21/06, Gregory Maxwell gmaxwell@gmail.com wrote:
Surprised that you didn't name this one, since it is one of the more useful human oriented ones (and a primary application for the living people cat):
e) Produce a filtered recent changes feed (http://en.wikipedia.org/w/index.php?title=Special:Recentchangeslinked&ta...)
Lots of these functionalities would be better if they handled subcategories, but for that to work we really need a better subcatting system. But I haven't got a solution yet.
Not wanting to suggest anything here, but I'll just throw something in that I came up with at work.
Essentially, I maintain a website that has "categories", and some features need to work for "this category and all its sub-categories". Since the category tree can theoretically grow arbitrarily, I figured that iterative queries ("cycling" through the sub-categories) is unacceptable, so instead I've done this:
* Create a new table, "CategoryTC" (TC = transitive closure) which has only two columns, "AncestorCID" and "CID" (CID = category ID). Each row in this table tells you that the category CID is somewhere below AncestorCID in the category tree. (For the pedants: It's actually the reflexive-transitive closure, i.e. every category is also its own ancestor.)
* In your SQL query, instead of
      some_variable=$cid
  you say
      some_variable=tc.CID and tc.AncestorCID=$cid
  or
      some_variable IN (select CID from CategoryTC where AncestorCID=$cid)
  and it works for sub-categories within the same query.
* When creating, deleting or re-parenting categories, obviously the transitive closure needs to be kept up to date. This is easier than it sounds.
* The CategoryTC table makes it absolutely trivial to prevent the creation of loops in the category tree.
The system would need some re-thinking for MediaWiki because in MediaWiki, the category structure is not a tree as it is in my case. In fact, MediaWiki's category structure isn't even acyclic, even though I'm pretty sure most people would agree that it should be acyclic and that the system should enforce acyclicity by rejecting edits that would create a category loop.
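A small sketch of the CategoryTC idea for the tree case described above, written in Python against in-memory SQLite. The table and column names CategoryTC, AncestorCID and CID come from the description; the Item table, the function names, and the restriction to adding leaf categories (no deleting or re-parenting) are assumptions made to keep the example short.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE CategoryTC (AncestorCID INTEGER, CID INTEGER,
                                 PRIMARY KEY (AncestorCID, CID));
        CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, CID INTEGER);
    """)
    return db


def add_category(db, cid, parent=None):
    """Add a new leaf category and extend the closure table."""
    # Every category is its own ancestor.
    db.execute("INSERT INTO CategoryTC VALUES (?, ?)", (cid, cid))
    if parent is not None:
        # Every ancestor of the parent (the parent included) is an ancestor too.
        db.execute(
            "INSERT INTO CategoryTC (AncestorCID, CID) "
            "SELECT AncestorCID, ? FROM CategoryTC WHERE CID = ?",
            (cid, parent))


def would_create_loop(db, cid, new_parent):
    """True if putting `cid` under `new_parent` would close a loop."""
    row = db.execute(
        "SELECT 1 FROM CategoryTC WHERE AncestorCID = ? AND CID = ?",
        (cid, new_parent)).fetchone()
    return row is not None


def items_under(db, cid):
    """Items in `cid` and all of its sub-categories, in a single query."""
    rows = db.execute(
        "SELECT ItemID FROM Item WHERE CID IN "
        "(SELECT CID FROM CategoryTC WHERE AncestorCID = ?) ORDER BY ItemID",
        (cid,)).fetchall()
    return [row[0] for row in rows]
```

The loop check is a single primary-key lookup: a move is rejected exactly when the category being moved already appears as an ancestor of its proposed parent.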
Timwi
On 8/22/06, Timwi timwi@gmx.net wrote:
The system would need some re-thinking for MediaWiki because in MediaWiki, the category structure is not a tree as it is in my case. In fact, MediaWiki's category structure isn't even acyclic, even though I'm pretty sure most people would agree that it should be acyclic and that the system should enforce acyclicity by rejecting edits that would create a category loop.
Heh, I'm apparently not "most editors". I know I keep harping on, but if we don't define what the relationship "subcategory" means, how do we know that cycles are meaningless? In particular, how would you know where to break a cycle? In some of our fuzzier areas, the subcategory relationships are so vague and arbitrary that there doesn't seem any major harm in following the chain and ending up back where you started.
Steve
Steve Bennett wrote:
Heh, I'm apparently not "most editors". I know I keep harping on, but if we don't define what the relationship "subcategory" means, how do we know that cycles are meaningless? In particular, how would you know where to break a cycle? In some of our fuzzier areas, the subcategory relationships are so vague and arbitrary that there doesn't seem any major harm in following the chain and ending up back where you started.
Steve
Can someone point me to one of these cycles? I don't remember seeing one, although I've no doubt they exist. I just can't imagine a situation in which much is gained by allowing it and, given that certain features would be a lot easier if the system were guaranteed cycle-free, I think we might be better off disallowing them. But I'd like to see a few of them "in the wild" before making my mind up.
Charlie
On 8/22/06, Charlie Reams calr3@cam.ac.uk wrote:
Can someone point me to one of these cycles? I don't remember seeing one, although I've no doubt they exist. I just can't imagine a situation in which much is gained by allowing it and, given that certain features would be a lot easier if the system were guaranteed cycle-free, I think we might be better off disallowing them. But I'd like to see a few of them "in the wild" before making my mind up.
They're obviously not easy to find since we have no built in features to do that, and every time one becomes known, it gets fixed :) Maybe Magnus's tool can help us out? I don't think we have any permanent, accepted ones, if that's what you're asking...they just happen from time to time when enough sufficiently dubious category links get made...
Steve
Steve Bennett wrote:
On 8/22/06, Charlie Reams calr3@cam.ac.uk wrote:
Can someone point me to one of these cycles? I don't remember seeing one, although I've no doubt they exist. I just can't imagine a situation in which much is gained by allowing it and, given that certain features would be a lot easier if the system were guaranteed cycle-free, I think we might be better off disallowing them. But I'd like to see a few of them "in the wild" before making my mind up.
They're obviously not easy to find since we have no built in features to do that, and every time one becomes known, it gets fixed :) Maybe Magnus's tool can help us out? I don't think we have any permanent, accepted ones, if that's what you're asking...they just happen from time to time when enough sufficiently dubious category links get made...
Steve
The problem is different senses of 'subcategory': there's [1] "the set of objects in B is a strict subset of the set of objects in A", which is loop-proof, and [2] 'B is a topic that is usually discussed in the context of A'
For example:
London -[2]-> Thames Valley -[1]-> London, comes to mind
Russia -[2]-> Soviet Union -[1]-> Russia, as well.
Also, another entirely reasonable cycle would be:
Universe -[contains]-> Human beings -[who have]-> Human thought -[which includes]-> Philosophy -[which studies the]-> Universe
See [[WordNet]] for a serious attempt to tease out the relationships between concepts in detail: there are _lots_ of possible ways that one thing can be related to another.
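Since the thread notes there is no built-in way to find such loops, here is a hedged sketch of how one could be located given the category links as a plain mapping. It is a standard depth-first search in Python; the function name and the sample graph (built from the London example above) are illustrative only, and the recursive form would need rewriting with an explicit stack for a graph as deep as a real wiki's.

```python
def find_cycle(subcats_of):
    """Return one cycle as a list of categories (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    state = {}
    path = []

    def visit(cat):
        state[cat] = GREY
        path.append(cat)
        for child in subcats_of.get(cat, ()):
            child_state = state.get(child, WHITE)
            if child_state == GREY:
                # `child` is already on the current path: a cycle is closed.
                return path[path.index(child):] + [child]
            if child_state == WHITE:
                found = visit(child)
                if found:
                    return found
        path.pop()
        state[cat] = BLACK
        return None

    for cat in list(subcats_of):
        if state.get(cat, WHITE) == WHITE:
            found = visit(cat)
            if found:
                return found
    return None


# Edges as written in the example above: London -> Thames Valley -> London.
EXAMPLE = {
    "London": ["Thames Valley"],
    "Thames Valley": ["London"],
    "Europe": ["London"],
}
```

This reports only that a loop exists and which categories are on it; deciding which link to break is the editorial question raised above and is not something the search can answer.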
-- Neil
On 8/23/06, Neil Harris neil@tonal.clara.co.uk wrote:
The problem is different senses of 'subcategory': there's [1] "the set of objects in B is a strict subset of the set of objects in A", which is loop-proof, and [2] 'B is a topic that is usually discussed in the context of A'
For example:
London -[2]-> Thames Valley -[1]-> London, comes to mind
Russia -[2]-> Soviet Union -[1]-> Russia, as well.
Good explanation!
See [[WordNet]] for a serious attempt to tease out the relationships between concepts in detail: there are _lots_ of possible ways that one thing can be related to another.
One major complication not to forget is that our categories are at the *article* level, not at the *topic* or *concept* level. Often one article = one topic, but not always: for example, [[Beamish and Crawford]] is about both a brewery and a beer. [[Controlled Impact Demonstration]] covers the topics of an air safety program, a type of jet fuel, and a specific event.
If anyone has some schemas or diagrams or whatever that could attempt to make sense of this, I'd love to see them. Clearly a strict hierarchy as the only classificational structure is inappropriate. However, we can also do better than unstructured tagging. What's a good compromise? How can we get the benefits of a hierarchy where it works, and the flexibility of tagging when it doesn't?
Steve
Steve Bennett wrote:
On 8/23/06, Neil Harris neil@tonal.clara.co.uk wrote:
The problem is different senses of 'subcategory': there's [1] "the set of objects in B is a strict subset of the set of objects in A", which is loop-proof, and [2] 'B is a topic that is usually discussed in the context of A'
For example:
London -[2]-> Thames Valley -[1]-> London, comes to mind
Russia -[2]-> Soviet Union -[1]-> Russia, as well.
Good explanation!
See [[WordNet]] for a serious attempt to tease out the relationships between concepts in detail: there are _lots_ of possible ways that one thing can be related to another.
One major complication not to forget is that our categories are at the *article* level, not at the *topic* or *concept* level. Often one article = one topic, but not always: for example, [[Beamish and Crawford]] is about both a brewery and a beer. [[Controlled Impact Demonstration]] covers the topics of an air safety program, a type of jet fuel, and a specific event.
If anyone has some schemas or diagrams or whatever that could attempt to make sense of this, I'd love to see them.
Indeed they have. However, the main efforts in this direction are so complex and abstract that they are effectively unusable for ordinary mortals. See, for example, http://www.w3.org/TR/owl-features/
Clearly a strict hierarchy as the only classificational structure is inappropriate. However, we can also do better than unstructured tagging. What's a good compromise? How can we get the benefits of a hierarchy where it works, and the flexibility of tagging when it doesn't?
I like [[WordNet]]'s (relatively) simple way of identifying different classes of relationships: even things like (say) [[located-in::Category:Places in Texas]], or [[subset-of::Category:Places in Texas]] would still be easier to read, write and understand than stuff like (for example)
<owl:Class rdf:ID="TexasThings">
  <owl:equivalentClass>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#locatedIn" />
      <owl:someValuesFrom rdf:resource="#TexasRegion" />
    </owl:Restriction>
  </owl:equivalentClass>
</owl:Class>
-- Neil
On 8/23/06, Steve Bennett stevage@gmail.com wrote:
One major complication not to forget is that our categories are at the *article* level, not at the *topic* or *concept* level. Often one article = one topic, but not always: for example, [[Beamish and Crawford]] is about both a brewery and a beer. [[Controlled Impact Demonstration]] covers the topics of an air safety program, a type of jet fuel, and a specific event.
I don't think that's a problem. *Categories* are still at a concept level, after all, and even if they aren't, hierarchy would still work as expected (just there wouldn't be many available strict supercategories). [[Beamish and Crawford]] would be in both [[Category:Breweries]] and [[Category:Beers]]. Arguably that breaks down the idea of strict super-/subsets, but there's no reason to apply that to articles as a whole if it would make sense to apply it to only part of an article that deals with multiple topics.
On 8/23/06, Neil Harris neil@tonal.clara.co.uk wrote:
I like [[WordNet]]'s (relatively) simple way of identifying different classes of relationships: even things like (say) [[located-in::Category:Places in Texas]], or [[subset-of::Category:Places in Texas]] would still be easier to read, write and understand than stuff like (for example)
<owl:Class rdf:ID="TexasThings">
  <owl:equivalentClass>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#locatedIn" />
      <owl:someValuesFrom rdf:resource="#TexasRegion" />
    </owl:Restriction>
  </owl:equivalentClass>
</owl:Class>
We don't even need [[located-in::]] for the main encyclopedia (leaving aside the mythical Wikidata, which is a good idea but still in the indefinite future). We just need three types of relationships: related-to, superset-of, subset-of.
=Category:Humans=
:''The main article for this category is [[Human]].''

==Subcategories==
All of the following are {{PAGENAME}}.

* People by city
* People by company
* People by country of residence
* People by nationality
* People by occupation
* People by political orientation
* People by race or ethnicity
* People by religion
* People by status
. . .

==Supercategories==
All {{PAGENAME}} are each of the following.

* Apes

==Related categories==
* Lists of people

==Articles about {{PAGENAME}}==
There are no articles about {{PAGENAME}} that are not also about a [[#Subcategories|subcategory]] of {{PAGENAME}}.

==Articles related to {{PAGENAME}}==
* Human
* Person
* People
On 8/24/06, Andre Engels andreengels@gmail.com wrote:
Waves -> Sound -> Music -> Music events -> Music competitions -> Eurovision Song Contest -> Eurovision host cities -> Rome -> History of Rome -> Ancient Rome -> Ancient Roman religion -> Ancient Roman Christianity -> Patristics -> Heresy -> Heretics -> People executed for heresy -> Jesus -> Doctrines and teachings of Jesus -> Nonviolence
<snip>
What a total mess. I had no idea it was this bad. Why is the category Jesus under "People executed for heresy"? The only subcategories I could think of for a category like that would be "Americans executed for heresy" or "People executed for heresy in the middle ages".
Now, the rest of a post I meant to send yesterday:
On 8/23/06, Simetrical Simetrical+wikitech@gmail.com wrote:
I don't think that's a problem. *Categories* are still at a concept level, after all, and even if they aren't, hierarchy would still work as expected (just there wouldn't be many available strict supercategories). [[Beamish and Crawford]] would be in both [[Category:Breweries]] and [[Category:Beers]]. Arguably that breaks down the idea of strict super-/subsets, but there's no reason to apply that to articles as a whole if it would make sense to apply it to only part of an article that deals with multiple topics.
Yes, I think you're right, I didn't really think it through.
We don't even need [[located-in::]] for the main encyclopedia (leaving aside the mythical Wikidata, which is a good idea but still in the indefinite future). We just need three types of relationships: related-to, superset-of, subset-of.
How would you describe the relationship between Category:John Lennon and Category:The Beatles, knowing that Category:John Lennon contains songs that have nothing to do with The Beatles? Actually, just to flesh this out, what are the relationships between these (possibly fictional) categories: English rock bands, The Beatles, John Lennon, John Lennon songs, The Beatles songs. Similarly, into which categories would these articles go: [[The Beatles]], [[John Lennon]], [[I Am The Walrus]] (John Lennon/Beatles song), [[Imagine (song)]] (John Lennon solo song).
Steve
2006/8/24, Steve Bennett stevage@gmail.com:
On 8/24/06, Andre Engels andreengels@gmail.com wrote:
Waves -> Sound -> Music -> Music events -> Music competitions -> Eurovision Song Contest -> Eurovision host cities -> Rome -> History of Rome -> Ancient Rome -> Ancient Roman religion -> Ancient Roman Christianity -> Patristics -> Heresy -> Heretics -> People executed for heresy -> Jesus -> Doctrines and teachings of Jesus -> Nonviolence
<snip>
What a total mess. I had no idea it was this bad. Why is the category Jesus under "People executed for heresy"? The only subcategories I could think of for a category like that would be "Americans executed for heresy" or "People executed for heresy in the middle ages".
Well, that's your opinion, and mine, but I already have given up arguing that. It's just not doable.
The last time I argued against it on the Dutch Wikipedia (it was about Category:Upper Silezia being in Category:Provinces of Poland), there were three reactions:
* Would anyone expect that they [=the pages in Category:Upper Silezia] would all be Polish provinces, because they are under "province in Poland"? I don't think so.
* That's a fully normal category. (...) A principle that dozens of users have no problems with and apply automatically. Here and on other wikis.
* Of course [to the person of the second reaction]
If you have an idea about how to improve it, I'd like to hear it. For now, on Wikipedia we are a tiny minority, so it doesn't matter if we're right or wrong, we won't be heard.
On 8/24/06, Andre Engels andreengels@gmail.com wrote:
If you have an idea about how to improve it, I'd like to hear it. For now, on Wikipedia we are a tiny minority, so it doesn't matter if we're right or wrong, we won't be heard.
The basic process would be:
* Convince a few clueful people that there is *a better way*
* Majorly update the help/guideline pages about categories
* Somehow modify the interface for categories so it's clearer what should and should not be added. This is the tricky bit.
Actually, it would be really cool if category pages could draw a local map of related categories, like maybe two nodes up and one node down or something, to give people a visual impression of what the category is doing, and whether it's behaving as it should.
Also, since there tend to be fewer categories in each category than there are articles, it might make sense to display the "sister categories" for each category to again reinforce to the editor whether the category is in the right supercategory or not. Working on categories at the moment is very much like operating through a keyhole - you can't see *anything* of the surrounding context.
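A rough sketch of the "local map" and "sister categories" idea above, in Python. It assumes the category links are available as a plain child-to-parents mapping. The names `PARENTS` and `local_map` and the sample data (including "Prefectures of Japan") are made up for illustration and are not MediaWiki code.

```python
from collections import defaultdict

# Hypothetical child -> parents data; a real wiki would read this
# from its stored category links.
PARENTS = {
    "Cities in Tokyo": ["Geography of Tokyo", "Tokyo"],
    "Geography of Tokyo": ["Tokyo"],
    "Tokyo culture": ["Tokyo"],
    "Tokyo": ["Prefectures of Japan"],
}


def build_children(parents):
    """Invert the child -> parents mapping into parent -> children."""
    children = defaultdict(list)
    for child, cats in parents.items():
        for parent in cats:
            children[parent].append(child)
    return children


def local_map(cat, parents, up=2):
    """Return the neighbourhood of cat: `up` levels above, one level below,
    and the sister categories (other children of cat's parents)."""
    children = build_children(parents)

    levels, frontier = [], {cat}
    for _ in range(up):
        frontier = {p for c in frontier for p in parents.get(c, [])}
        if not frontier:
            break
        levels.append(sorted(frontier))

    sisters = {s for p in parents.get(cat, []) for s in children[p]} - {cat}
    return {
        "up": levels,
        "down": sorted(children[cat]),
        "sisters": sorted(sisters),
    }


if __name__ == "__main__":
    print(local_map("Geography of Tokyo", PARENTS))
```

Showing the sisters next to the parents is the part that would let an editor see at a glance whether a category sits among similar things.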
Steve
On 24/08/06, Steve Bennett stevage@gmail.com wrote:
Also, since there tend to be fewer categories in each category than there are articles, it might make sense to display the "sister categories" for each category to again reinforce to the editor whether the category is in the right supercategory or not. Working on categories at the moment is very much like operating through a keyhole - you can't see *anything* of the surrounding context.
I don't know if it's quite what we're looking for, but we could inspect the category browser feature (for efficiency and load etc.) and perhaps switch it on...?
Rob Church
2006/8/24, Steve Bennett stevage@gmail.com:
The basic process would be:
- Convince a few clueful people that there is *a better way*
Well, good luck. I'm not good at convincing people, and besides, those who really matter (those who work much on categorisation) will undoubtedly say "this is how we do it everywhere" and refuse to even consider your proposal. Also, they are the ones who told me that what is done now is 'fully logical and done by everyone'.
- Majorly update the help/guideline pages about categories
As if that helps. Either I get reverted or they get ignored. Probably both.
- Somehow modify the interface for categories so it's clearer what should and should not be added. This is the tricky bit.
And how would you do that? People will either use the system for their own nonsense, or they will not look at it at all. Or even worse, they will go and revert whatever you did through the localisation. We can make nice texts, but people will have other opinions and go and change the texts in the MediaWiki namespace.
Actually, it would be really cool if category pages could draw a local map of related categories, like maybe two nodes up and one node down or something, to give people a visual impression of what the category is doing, and whether it's behaving as it should.
Problem is, they ARE behaving as they should. Or at least, they are behaving like those who work on them think they should.
Also, since there tend to be fewer categories in each category than there are articles, it might make sense to display the "sister categories" for each category to again reinforce to the editor whether the category is in the right supercategory or not. Working on categories at the moment is very much like operating through a keyhole - you can't see *anything* of the surrounding context.
The category system is a mess, a labyrinth. But I don't see any way to improve that any more. I have given up on them, to me they're just the sewer of Wikipedia now. Which is a shame, because they looked so great when introduced. But apparently it's typically something where the lowest common denominator decides the level of the whole. Where there's two possibilities, both will have people in favor of them, and in the case of categories, it's the stupidest of those two who will prevail.
2006/8/24, Andre Engels andreengels@gmail.com:
The category system is a mess, a labyrinth. But I don't see any way to improve that any more. I have given up on them, to me they're just the sewer of Wikipedia now. Which is a shame, because they looked so great when introduced. But apparently it's typically something where the lowest common denominator decides the level of the whole. Where there's two possibilities, both will have people in favor of them, and in the case of categories, it's the stupidest of those two who will prevail.
I got admonished on the Dutch Wikipedia because I complained about the category system without saying *what* should be improved. The problem is that the whole 'tree' has more dead wood than useful content. So I looked at 10 random categories, and judged about half of them to have major mistakes. Here is a similar attempt on the English Wikipedia:
Category:Cities in Tokyo - is both in "Geography of Tokyo" and "Tokyo". Why?
Category:Romanian military aircraft 1930-1939 - there are 3 articles in this category. There's also 3 categories to put it in. Need I say more?
Category:Spanish translators - Why is this divided by nationality instead of language, which I would find much more logical. And I don't consider translators non-fiction writers either
Category:Sendai class cruisers - A category with only 2 pages is functioning more to confuse than to enlighten. A subcategory of "cruiser classes" although it is a cruiser class rather than a container of cruiser classes. Subcategory of both Cruisers of Japan and Cruisers of the Imperial Japanese Navy which have the same content but different locations in the category tree.
Category:British academics: Is this useful? Why is this a subcategory of Education and Educators?
Category:Fictional Jeet Kune Do practitioners: I hate this type of category, but alas, putting it directly in the parent categories would be even worse
Category:Indoor ice hockey venues in Sweden: What is gained by having indoor and outdoor ice hockey venues in separate categories, except that there are more categories with more possibilities for getting lost?
Category:TRS-80 Color Computer games: Well, nothing *major* wrong with this one
Category:Haywood County, Tennessee: The type of case we are discussing. Only parent category: Tennessee counties
Category:Canadian football stubs: Why o why....
Category:1862 in Mexico: That's two articles. And no doubt many similar categories. There's even a category Underpopulated (Year) in Mexico categories.... Categorizing for the sake of categorizing. Or worse.
On 8/24/06, Andre Engels andreengels@gmail.com wrote:
Category:Cities in Tokyo - is both in "Geography of Tokyo" and "Tokyo". Why?
Category:Romanian military aircraft 1930-1939 - there are 3 articles in this category. There's also 3 categories to put it in. Need I say more?
We lack good model categories, I think. For me, a category like "Tokyo" should only contain two things: Subcategories (Geography of Tokyo, Tokyo culture, People born in Tokyo...), and articles waiting to be subcategorised. It's totally consistent with the wiki principle that articles can be dumped in the simplest category, and then moved later by an editor with more "local" knowledge.
To answer your "why?" question: Because subcategories don't work. The principle that "If X is a subcat of Y, then Z should not be in both X and Y" is totally bogus and unworkable - at the moment.
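For reference, the principle quoted above is easy to state mechanically, whatever one thinks of it. The following Python sketch lists the articles that sit in both a category and one of its ancestors. The data and the names `PARENTS`, `ARTICLE_CATS` and `redundant_memberships` are hypothetical; the article titles are examples only, not a claim about how those pages are actually categorised.

```python
# Hypothetical data: category -> parent categories, article -> categories.
PARENTS = {"Geography of Tokyo": ["Tokyo"]}
ARTICLE_CATS = {
    "Shinjuku": ["Tokyo", "Geography of Tokyo"],
    "Shibuya": ["Geography of Tokyo"],
}


def ancestors(cat, parents):
    """All categories reachable upwards from cat (safe against loops)."""
    seen, stack = set(), list(parents.get(cat, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def redundant_memberships(article_cats, parents):
    """Return (article, subcat, ancestor) triples where the article is
    listed in both the subcategory and one of its ancestors."""
    found = []
    for article, cats in article_cats.items():
        cats = set(cats)
        for cat in sorted(cats):
            for anc in sorted(ancestors(cat, parents) & cats):
                found.append((article, cat, anc))
    return found


if __name__ == "__main__":
    for row in redundant_memberships(ARTICLE_CATS, PARENTS):
        print(row)
```

A report like this only says where the rule is broken. Whether the rule should hold at all is the point under dispute here.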
Category:Spanish translators - Why is this divided by nationality instead of language, which I would find much more logical. And I don't consider translators non-fiction writers either
See http://en.wikipedia.org/wiki/Category:Translators - what more could you ask for, we have [[Translators by nationality]], [[Translators by destination language]] *and* [[Translators by source language]]. In the subcategories, not only do we have [[Category:Translators from Spanish]] but we even have [[Spanish-English translators]]. Perfect! Sure, these categories aren't well populated, but the structure is 10 out of 10.
Category:Sendai class cruisers - A category with only 2 pages is functioning more to confuse than to enlighten. A subcategory of "cruiser classes" although it is a cruiser class rather than a container of cruiser classes.
There is general blurring of the distinction between classes of things and things themselves. See [[Category:Elephants]] for an example. The solution is probably to branch these categories into "Famous X's", "Classes of X" etc. For a category with only 5-6 members I'm not fussed if it contains both articles on general classes of things and specific instances of those things, but for bigger categories we should be more precise.
Category:British academics: Is this useful? Why is this a subcategory of Education and Educators?
A good example of our common "non-strict subset" problem. Lots of academics are "educators". But not all.
Category:Fictional Jeet Kune Do practitioners: I hate this type of category, but alas, putting it directly in the parent categories would be even worse
Why? It seems to be in the right place, a subcat of both Fictional martial artists and Jeet Kune Do practitioners. Should probably be a subcat of some "Computer game characters" or something too.
Category:Indoor ice hockey venues in Sweden: What is gained by having indoor and outdoor ice hockey venues in separate categories, except that there are more categories with more possibilities for getting lost?
I disagree - everything is to be gained by splitting categories whenever they can be unambiguously and precisely split. OTOH, since Category:Outdoor ice hockey venues in Sweden doesn't exist, the point is moot.
Category:Haywood County, Tennessee: The type of case we are discussing. Only parent category: Tennessee counties
Seems to be in good working order?
Category:Canadian football stubs: Why o why....
Yep, these "metacategories" (ie, categories containing meta information about our articles) are a bit ugly, but they don't seem to cause a great deal of harm. And one day it will be trivial to tag them all and do something special with them. They're all subcats of Category:Stub categories after all. And currently they're somewhat useful.
Category:1862 in Mexico: That's two articles. And no doubt many similar categories. There's even a category Underpopulated (Year) in Mexico categories.... Categorizing for the sake of categorizing. Or worse.
What's wrong with "Categorizing for the sake of categorizing"? And what's wrong with underpopulated categories? It's just like a stub - awaiting further development.
Steve
2006/8/24, Steve Bennett stevage@gmail.com:
On 8/24/06, Andre Engels andreengels@gmail.com wrote:
Category:Cities in Tokyo - is both in "Geography of Tokyo" and "Tokyo". Why?
Category:Romanian military aircraft 1930-1939 - there are 3 articles in this category. There's also 3 categories to put it in. Need I say more?
We lack good model categories, I think. For me, a category like "Tokyo" should only contain two things: Subcategories (Geography of Tokyo, Tokyo culture, People born in Tokyo...), and articles waiting to be subcategorised. It's totally consistent with the wiki principle that articles can be dumped in the simplest category, and then moved later by an editor with more "local" knowledge.
I haven't denied that.
To answer your "why?" question: Because subcategories don't work. The principle that "If X is a subcat of Y, then Z should not be in both X and Y" is totally bogus and unworkable - at the moment.
Why that?
Category:Sendai class cruisers - A category with only 2 pages is functioning more to confuse than to enlighten. A subcategory of "cruiser classes" although it is a cruiser class rather than a container of cruiser classes.
There is general blurring of the distinction between classes of things and things themselves. See [[Category:Elephants]] for an example. The solution is probably to branch these categories into "Famous X's", "Classes of X" etc. For a category with only 5-6 members I'm not fussed if it contains both articles on general classes of things and specific instances of those things, but for bigger categories we should be more precise.
This is another case: It is a category for classes, containing a subcategory for instances.
Category:Fictional Jeet Kune Do practitioners: I hate this type of category, but alas, putting it directly in the parent categories would be even worse
Why? It seems to be in the right place, a subcat of both Fictional martial artists and Jeet Kune Do practitioners. Should probably be a subcat of some "Computer game characters" or something too.
Just my gut feeling - what's the use of this category? Would anyone think "Hey, this is a fictional Jeet Kune Do practitioner, I would want more of those"? Still, I agree that after all the category is good, because you *would* want to have them in Category:Jeet Kune Do practitioners, but when getting to that category from a real person, you don't want to bump into the fictional characters. So, in the end, I'm happy with this one.
Category:Indoor ice hockey venues in Sweden: What is gained by having indoor and outdoor ice hockey venues in separate categories, except that there are more categories with more possibilities for getting lost?
I disagree - everything is to be gained by splitting categories whenever they can be unambiguously and precisely split.
I disagree with that. Strongly. A category is to find similar subjects, and that is best done by having a certain size of categories. Split them up further, and you only give more work to those trying to look something up. Going with your principle would leave only categories with 1 page and categories with 2 categories - everything else can be split. The goal of categorization is to make navigation easier, it is not to categorize as much as possible.
Category:Haywood County, Tennessee: The type of case we are discussing. Only parent category: Tennessee counties
Seems to be in good working order?
Well, it's the thing we discussed before. Or at least the thing I thought we were discussing before. It's a subcategory of Tennessee counties, but the articles in it are not about Tennessee counties, but about cities. I don't see why you vehemently oppose putting "Jesus" under "People executed for heresy" yet consider putting "Haywood County, Tennessee" in "Tennessee counties" 'good working order'. To me it's twice the same kind of thing.
Category:1862 in Mexico: That's two articles. And no doubt many similar categories. There's even a category Underpopulated (Year) in Mexico categories.... Categorizing for the sake of categorizing. Or worse.
What's wrong with "Categorizing for the sake of categorizing"? And what's wrong with underpopulated categories? It's just like a stub - awaiting further development.
We are here to make the encyclopedia, not to make a classification scheme of everything. It would, in my opinion, be so much more useful to have (for example) one category about "1860s in Mexico" than to have to go through 11 categories to find those. Again my question is: What is the use of categories? To me it is, getting similar pages together. And that is done not by splitting up further and further until you have all 1- and 2-page categories, but by bringing them together to a manageable size.
On 8/24/06, Andre Engels andreengels@gmail.com wrote:
To answer your "why?" question: Because subcategories don't work. The principle that "If X is a subcat of Y, then Z should not be in both X and Y" is totally bogus and unworkable - at the moment.
Why that?
Because, as we've agreed, it's not easy to browse both a category and its subcats at the moment. Your response is not to use subcats. My response is to put articles in both the category and its subcat.
Just my gut feeling - what's the use of this category? Would anyone think "Hey, this is a fictional Jeet Kune Do practitioner, I would want more of those"? Still, I agree that after all the category is good, because you *would* want to have them in Category:Jeet Kune Do practitioners, but when getting to that category from a real person, you don't want to bump into the fictional characters. So, in the end, I'm happy with this one.
I think you're overlooking the value of categories as semantic markup.
I disagree with that. Strongly. A category is to find similar subjects, and that is best done by having a certain size of
That's only one, fairly narrow use.
categories. Split them up further, and you only give more work to those trying to look something up. Going with your principle would
Because the software is no good at grouping subcats together. (Not blaming the developers, we just don't have a good category/subcategory model).
Well, it's the thing we discussed before. Or at least the thing I thought we were discussing before. It's a subcategory of Tennessee counties, but the articles in it are not about Tennessee counties, but about cities. I don't see why you vehemently oppose putting "Jesus" under "People executed for heresy" yet consider putting "Haywood County, Tennessee" in "Tennessee counties" 'good working order'. To me it's twice the same kind of thing.
Sorry, that's my mistake, didn't realise that the *category* "Haywood County, Tennessee" was in "Tennessee counties". There's got to be a better model for this stuff that makes more sense.
We are here to make the encyclopedia, not to make a classification scheme of everything. It would, in my opinion, be so much more useful
Well, we could simply have 1,300,000 uncategorised pages. Or we could build an information rich categorisation scheme that makes pages easy to find and establishes meaningful links between them.
to have (for example) one category about "1860s in Mexico" than to have to go through 11 categories to find those. Again my question is: What is the use of categories? To me it is, getting similar pages
They seem to have lots of uses, and the better the model, and the better the implementation, probably the more uses we will come up with. Saying "Categories are only good for X" is unnecessarily restrictive.
Steve
2006/8/24, Steve Bennett stevage@gmail.com:
They seem to have lots of uses, and the better the model, and the better the implementation, probably the more uses we will come up with. Saying "Categories are only good for X" is unnecessarily restrictive.
I haven't seen another use from you where dividing up one category in numerous smaller ones is a gain. I have come up with a very logical use where it is a loss. I'm not saying that categories are *only* for finding similar pages, but it's still one of the main uses. You haven't come up with any other major uses yet, yet apparently you see them, because you seem to support greatly diminishing their use for my usage. You must have reasons for that. Tell them.
On 8/24/06, Andre Engels andreengels@gmail.com wrote:
Well, good luck. I'm not good at convincing people, and besides, those who really matter (those who work much on categorisation) will undoubtedly say "this is how we do it everywhere" and refuse to even consider your proposal. Also, they are the ones who told me that what is done now is 'fully logical and done by everyone'.
I have to believe that if an idea is genuinely good, there are enough sensible people who will eventually realise it and work towards implementing it.
- Majorly update the help/guideline pages about categories
As if that helps. Either I get reverted or they get ignored. Probably both.
Hence getting support first. I agree that most guidelines are generally ignored, hence the third step, which is a bit more 'in your face'.
- Somehow modify the interface for categories so it's clearer what should and should not be added. This is the tricky bit.
And how would you do that? People will either use the system for their own nonsense, or they will not look at it at all. Or even worse, they will go and revert whatever you did through the localisation. We can make nice texts, but people will have other opinions and go and change the texts in the MediaWiki namespace.
That could happen, but I'd like to think it wouldn't.
Problem is, they ARE behaving as they should. Or at least, they are behaving like those who work on them think they should.
I don't know, I've pointed out a few local weirdnesses in categories to people and they've generally been fairly receptive. It *seems* logical at first that Category:John Lennon is a subcategory of Category:The Beatles. But not when you consider that John Lennon ends up being a subsubcategory of Category:English rock bands.
The category system is a mess, a labyrinth. But I don't see any way to improve that any more. I have given up on them, to me they're just the sewer of Wikipedia now. Which is a shame, because they looked so great when introduced. But apparently it's typically something where the lowest common denominator decides the level of the whole. Where there's two possibilities, both will have people in favor of them, and in the case of categories, it's the stupidest of those two who will prevail.
I interpret it differently. I think categorising stuff well is more difficult than editing articles. Generally, categories are structured relatively badly, and used relatively badly - not through ill will, but just lack of understanding. OTOH, it actually doesn't take that long to totally clean up a category. Maybe 2 hours to evaluate and recategorise 100 articles. I've only done it a couple of times though, and I'm still working out what the issues are. And I've not had anyone tell me to stop, or undo my changes.
Categories may be the sewer of Wikipedia, but would you like to live in a society with no sewers?
Steve
2006/8/24, Steve Bennett stevage@gmail.com:
Problem is, they ARE behaving as they should. Or at least, they are behaving like those who work on them think they should.
I don't know, I've pointed out a few local weirdnesses in categories to people and they've generally been fairly receptive. It *seems* logical at first that Category:John Lennon is a subcategory of Category:The Beatles. But not when you consider that John Lennon ends up being a subsubcategory of Category:English rock bands.
No, then you assume that everything that goes into a subcategory also belongs to the main category. Categorisation is not strict set theory, and bundles similar notions. A category:Germany is thus a subcategory of Category:Country, because it is a country.
You may think it's logical that being in a subcategory means that there is some kind of connection with the main category, and I may think so, but others don't. And when they don't think that something is illogical, they don't think their own actions that lead to such a situation are undesirable.
The category system is a mess, a labyrinth. But I don't see any way to improve that any more. I have given up on them, to me they're just the sewer of Wikipedia now. Which is a shame, because they looked so great when introduced. But apparently it's typically something where the lowest common denominator decides the level of the whole. Where there's two possibilities, both will have people in favor of them, and in the case of categories, it's the stupidest of those two who will prevail.
I interpret it differently. I think categorising stuff well is more difficult than editing articles. Generally, categories are structured relatively badly, and used relatively badly - not through ill will, but just lack of understanding. OTOH, it actually doesn't take that long to totally clean up a category.
It does. At least in the way I want to clean it up. That is, move stuff from too-small subcategories to the main category. You have to apply for deletion of the subcategory, get an objection to that, and nothing happens. That's pretty slow I say - 2 weeks to NOT get what you want.
Maybe 2 hours to evaluate and recategorise 100 articles.
I could go and do some categories, but at best it's a mere small improvement where elsewhere there's double as much worsening. More likely I will simply get reverted. If I'm unlucky I'm getting reverted AND branded a vandal.
On 8/24/06, Andre Engels andreengels@gmail.com wrote:
No, then you assume that everything that goes into a subcategory also belongs to the main category. Categorisation is not strict set theory, and bundles similar notions. A category:Germany is thus a subcategory of Category:Country, because it is a country.
So what *do* our subcategories mean? It'd be nice to be able to have *some* strict subcategories, to allow assumptions to be made by bots or editors. And then to have other categories where anything goes...
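To make the "some strict subcategories" idea concrete, here is a small Python sketch in which every subcategory link carries a strict flag, and a bot only draws conclusions along strict links. The flags, the `LINKS` data and the function name `supercategories` are assumptions made for this illustration; nothing in the software records such a flag today, and the "English rock bands" to "British rock bands" link is only a plausible example.

```python
# (child, parent, strict) triples. The strict flags are assumptions
# made for this sketch.
LINKS = [
    ("John Lennon", "The Beatles", False),
    ("The Beatles", "English rock bands", False),
    ("English rock bands", "British rock bands", True),
]


def supercategories(cat, links, strict_only=False):
    """Return every category reachable upwards from cat.

    With strict_only=True, only links flagged as strict subsets are
    followed, so each result can be read as "every member of cat is
    also a member of this category".
    """
    parents = {}
    for child, parent, strict in links:
        if strict or not strict_only:
            parents.setdefault(child, []).append(parent)

    seen, stack = set(), [cat]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    print(supercategories("John Lennon", LINKS))
    print(supercategories("John Lennon", LINKS, strict_only=True))
```

Following every link puts John Lennon under the rock band categories, which is the oddity raised earlier in the thread. Following only strict links does not, while the "anything goes" links stay available for browsing.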
You may think it's logical that being in a subcategory means that there is some kind of connection with the main category, and I may think so, but others don't. And when they don't think that something is illogical, they don't think their own actions that lead to such a situation are undesirable.
All of this is caused by the fact that we don't state anywhere what "subcategory" means. It doesn't mean anything! Maybe even a convention of locally defining it, like "Subcategories of this category must ..." would be a start.
It does. At least in the way I want to clean it up. That is, move stuff from too-small subcategories to the main category. You have to apply for deletion of the subcategory, get an objection to that, and nothing happens. That's pretty slow I say - 2 weeks to NOT get what you want.
Why on earth do you want to do this?
I could go and do some categories, but at best it's a mere small improvement where elsewhere there's double as much worsening. More likely I will simply get reverted. If I'm unlucky I'm getting reverted AND branded a vandal.
If you're proposing going around deleting most categories without discussing it with anyone, then yes, you would probably be branded a vandal. :)
Steve
2006/8/24, Steve Bennett stevage@gmail.com:
You may think it's logical that being in a subcategory means that there is some kind of connection with the main category, and I may think so, but others don't. And when they don't think that something is illogical, they don't think their own actions that lead to such a situation are undesirable.
All of this is caused by the fact that we don't state anywhere what "subcategory" means. It doesn't mean anything! Maybe even a convention of locally defining it, like "Subcategories of this category must ..." would be a start.
It wouldn't solve anything. The category "British rockbands" would have "Subcategories of this category should be British rockbands" and category "Beatles" would have "subcategories of this category should be closely related to the Beatles" - or in my example, countries would say "Subcategories of this category should be countries" and Germany would say "Subcategories of this category should be categories with German topics".
It does. At least in the way I want to clean it up. That is, move stuff from too-small subcategories to the main category. You have to apply for deletion of the subcategory, get an objection to that, and nothing happens. That's pretty slow I say - 2 weeks to NOT get what you want.
Why on earth do you want to do this?
Because a category, to me, is a way to find related pages. And that is easiest with a certain size, say between 10 and 40 pages in the category. If there is a nice category with 15 subjects, I don't want to have to stroll through itself and all its 7 subcategories and sub-subcategories to find them all. And that's not an example that took me long to find.
I could go and do some categories, but at best it's a mere small improvement where elsewhere there's double as much worsening. More likely I will simply get reverted. If I'm unlucky I'm getting reverted AND branded a vandal.
If you're proposing going around deleting most categories without discussing it with anyone, then yes, you would probably be branded a vandal. :)
Well, that's one thing. Another would be putting [[Germany]] in [[Category:Countries]] and taking [[Category:Germany]] out of it.
On 8/24/06, Andre Engels andreengels@gmail.com wrote:
It wouldn't solve anything. The category "British rockbands" would have "Subcategories of this category should be British rockbands" and
I would have thought subcats of "British rock bands" should be more specific categories of British rock bands, or British rock bands categorised by city etc. Or maybe even "Defunct British rock bands" etc etc. Of course, *articles* in the category should be actual bands...
category "Beatles" would have "subcategories of this category should be closely related to the Beatles" - or in my example, countries would say "Subcategories of this category should be countries" and Germany would say "Subcategories of this category should be categories with German topics"
You're very negative about your fellow editors :)
Why on earth do you want to do this?
Because a category, to me, is a way to find related pages. And that is easiest with a certain size, say between 10 and 40 pages in the
Remember how I was saying "subcategories don't work"? Here's why. In theory, there is absolutely no reason you'd want one category of 50 articles rather than 5 subcats of 10 articles each. It's simply that the MediaWiki software is particularly bad at browsing categories. (Mostly because we don't know what subcategories actually mean.)
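As an illustration of what better browsing could look like, the Python sketch below produces one flat listing of a category together with everything in its subcategories, so that five subcats of ten articles can be read like one category of fifty. The names `SUBCATS`, `PAGES` and `flatten` and the placeholder page titles are invented for this sketch; it is not a description of anything MediaWiki does.

```python
# Hypothetical data: category -> subcategories, category -> pages.
SUBCATS = {"1860s in Mexico": ["1861 in Mexico", "1862 in Mexico"]}
PAGES = {
    "1860s in Mexico": ["page D"],
    "1861 in Mexico": ["page C"],
    "1862 in Mexico": ["page A", "page B"],
}


def flatten(cat, subcats, pages, _seen=None):
    """Collect the pages of cat and of all its subcategories, recursively.

    The _seen set guards against category loops, which a real
    category graph may well contain.
    """
    seen = set() if _seen is None else _seen
    if cat in seen:
        return set()
    seen.add(cat)

    found = set(pages.get(cat, []))
    for sub in subcats.get(cat, []):
        found |= flatten(sub, subcats, pages, seen)
    return found


if __name__ == "__main__":
    print(sorted(flatten("1860s in Mexico", SUBCATS, PAGES)))
```

With a view like this the fine-grained categories can stay as they are for markup purposes, and the reader who just wants "similar pages" still gets them in one list.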
category. If there is a nice category with 15 subjects, I don't want to have to stroll through itself and all its 7 subcategories and sub-subcategories to find them all. And that's not an example that took me long to find.
You're absolutely right. But your solution is wrong.
Well, that's one thing. Another would be putting [[Germany]] in [[Category:Countries]] and taking [[Category:Germany]] out of it.
That seems to me to be the right thing to do. If anything, Category:Germany should be in "Category:Country categories" or something.
Steve
2006/8/24, Steve Bennett stevage@gmail.com:
On 8/24/06, Andre Engels andreengels@gmail.com wrote:
It wouldn't solve anything. The category "British rockbands" would have "Subcategories of this category should be British rockbands" and
I would have thought subcats of "British rock bands" should be more specific categories of British rock bands, or British rock bands categorised by city etc. Or maybe even "Defunct British rock bands" etc etc. Of course, *articles* in the category should be actual bands...
Yes, that's what YOU would have thought, and what I would have thought. But someone who wants to put Category:Beatles in Category:British rockbands is going to think differently. And act in that way. And we can say "you should not put categories about rock bands in here", but that's just the same argument that we previously had about [[Category:The Beatles]]. There's nothing won by putting such a text on the category page, because someone who is of the opinion that [[Category:The Beatles]] should be in, is just going to put a text there so that it can go in, and someone who is of the opinion that it should not be in, is going to put a text there so that it cannot go in. Only difference is the place where you get the edit war.
category "Beatles" would have "subcategories of this category should be closely related to the Beatles" - or in my example, countries would say "Subcategories of this category should be countries" and Germany would say "Subcategories of this category should be categories with German topics"
You're very negative about your fellow editors :)
Yes, but also realistic. Someone is not going to change their mind just because they have to write it down somewhere else.
Because a category, to me, is a way to find related pages. And that is easiest with a certain size, say between 10 and 40 pages in the
Remember how I was saying "subcategories don't work"? Here's why. In theory, there is absolutely no reason you'd want one category of 50 articles rather than 5 subcats of 10 articles each. It's simply that the MediaWiki software is particularly bad at browsing categories. (Mostly because we don't know what subcategories actually mean.)
Well, I want solutions that work now, not solutions that would work if the code were changed as well as the policies.
Well, that's one thing. Another would be putting [[Germany]] in [[Category:Countries]] and taking [[Category:Germany]] out of it.
That seems to me to be the right thing to do. If anything, Category:Germany should be in "Category:Country categories" or something.
That's your opinion, and mine. Not that of those who have a say about the category structure on the Dutch Wikipedia.
On 8/24/06, Andre Engels andreengels@gmail.com wrote:
Yes, that's what YOU would have thought, and what I would have thought. But someone who wants to put Category:Beatles in Category:British rockbands is going to think differently. And act in that way. And we can say "you should not put categories about rock bands in here", but that's just the same argument that we previously had about [[Category:The Beatles]]. There's nothing won by putting such a text on the category page, because someone who is of the opinion that [[Category:The Beatles]] should be in, is just going to put a text there so that it can go in, and someone who is of the opinion that it should not be in, is going to put a text there so that it cannot go in. Only difference is the place where you get the edit war.
I'm getting the impression you don't like wikis much...;)
Well, I want solutions that work now, not solutions that would work if the code were changed as well as the policies.
Patience, grasshopper...since this is the technical mailing list, why not discuss the code changes that would make all of this pain go away? What would the right solution be?
Steve
2006/8/24, Steve Bennett stevage@gmail.com:
I'm getting the impression you don't like wikis much...;)
Let's just say that I wouldn't hate Wikipedia this much if I didn't love it this much. I've been with Wikipedia for over 5 years now, and I'm proud to have been part of it. But that also means that I'm getting much too worked up each time things don't go as I think they should go. And that's not good - I have broken things worth quite some money over petty, useless fights about Wikipedia in the last few days (I deformed an unused CD rack and bent my curtain track beyond repair while discussing with you). I know I should just stop, but I can't. It's too addictive.
Well, I want solutions that work now, not solutions that would work if the code were changed as well as the policies.
Patience, grasshopper...since this is the technical mailing list, why not discuss the code changes that would make all of this pain go away? What would the right solution be?
I don't know. Showing pages in subcategories with the category (preferably such that it can be switched off but is switched on by default) would be a good idea, but I'm afraid it should have been finished not yesterday but one year ago to really work. The possibility to have different 'aspects' and get an intersection would be nice too, but probably would take very little time to become as much of a mess as are categories now.
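For what it's worth, the 'aspects' idea above amounts to a plain set intersection. The following is only a rough sketch: the aspect names, the page titles "Band B" and "Band C", and the `pages_with` helper are all made up for illustration and are not anything MediaWiki provides.

```python
# Hypothetical data: each "aspect" maps to the set of pages tagged with it.
ASPECTS = {
    "nationality: British": {"The Beatles", "John Lennon", "Band B"},
    "kind: rock band": {"The Beatles", "Band B", "Band C"},
    "status: defunct": {"The Beatles", "Band C"},
}


def pages_with(*aspects):
    """Return the pages that carry every one of the requested aspects."""
    sets = [ASPECTS[name] for name in aspects]
    # set.intersection needs at least one set, so treat "no aspects" as empty.
    return set.intersection(*sets) if sets else set()


british_rock_bands = pages_with("nationality: British", "kind: rock band")
```

With aspects kept flat like this, "British rock bands" would not need to exist as a category at all; it falls out of the query. Whether editors would keep the aspects themselves tidy is the separate worry raised above.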
Imagine that we would want to have the full Wikipedia on one wiki page, and then allow only section editing on the lowest level. That's the kind of mess categories have become.
On 8/24/06, Andre Engels andreengels@gmail.com wrote:
I don't know. Showing pages in subcategories with the category (preferably such that it can be switched off but is switched on by default) would be a good idea, but I'm afraid it should have been finished not yesterday but one year ago to really work. The possibility to have different 'aspects' and get an intersection would be nice too, but probably would take very little time to become as much of a mess as are categories now.
Imagine it were possible to say, formally (with some special syntax) "This is a strict taxonomy. Every item in every subcategory is effectively in this category too." Would this be good? Some possibilities:
* It would be *possible* to show all those eventual subsubsubcategory articles on the same page (resolving most of your concerns)
* Hypothetical "random article from category" would be really useful.
* Generate graphical trees, and navigate the tree.
I guess I'm proposing a model where categories overall form a loose, generally acyclic graph, but with local trees. I wonder what's the simplest way this could be done? Probably something as simple as a special tag on the parent category, or maybe a special naming scheme.
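A minimal sketch of how the "strict taxonomy" flag might behave, assuming it is stored as a simple marker on the parent category. The category names beyond those mentioned in this thread, the articles "Band B" through "Band D", and the `effective_members` helper are hypothetical.

```python
from collections import deque

# Hypothetical data: category -> articles placed directly in it.
ARTICLES = {
    "British rock bands": {"The Beatles", "Band B"},
    "Defunct British rock bands": {"Band C"},
    "British rock bands by city": set(),
    "Rock bands from Liverpool": {"Band D"},
}
# Hypothetical data: category -> its direct subcategories.
SUBCATS = {
    "British rock bands": ["Defunct British rock bands", "British rock bands by city"],
    "British rock bands by city": ["Rock bands from Liverpool"],
}
# Categories carrying the (assumed) "this is a strict taxonomy" tag.
STRICT = {"British rock bands", "British rock bands by city"}


def effective_members(category):
    """Articles in `category`, plus articles of subcategories of strict categories."""
    seen = {category}  # guards against loops in the wider category graph
    members = set(ARTICLES.get(category, ()))
    queue = deque([category])
    while queue:
        current = queue.popleft()
        if current not in STRICT:
            continue  # untagged category: its subcategories promise nothing
        for sub in SUBCATS.get(current, ()):
            if sub not in seen:
                seen.add(sub)
                members |= ARTICLES.get(sub, set())
                queue.append(sub)
    return members
```

The point of the sketch is that the descent stops as soon as it hits a category without the tag, so the loose parts of the graph never leak into a strict tree. A "random article from category" feature could then pick from `effective_members(...)`.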
I'm liking this idea.
Steve
Steve Bennett wrote:
So what *do* our subcategories mean?
Take one more step back and ask: What do our categories mean?
In the paper "Analyzing and Visualizing the Semantic Coverage of Wikipedia and Its Authors", http://arxiv.org/abs/cs.IR/0512085 from December 2005, the authors make an analysis of the English Wikipedia's categories, but the largest categories (table 2, page 10) are hardly "semantic" subject headings, but a technical tool for identifying various quality problems:
 1. Disambiguation 40,062 articles
 2. 1911_Britannica 10,450
 3. Film_stubs 4,867
 4. Musician_stubs 4,575
 5. American_actors 4,551
 6. American_people_stubs 4,401
 7. Film_actors 4,023
 8. Musical_group_stubs 3,873
 9. Album_stubs 3,859
10. Television_actors 3,207
11. Politician_stubs 3,021
12. Articles_to_be_merged 2,899
13. British_people_stubs 2,706
14. American_politician_stubs 2,694
15. Television_stubs 2,540
16. American_actor_stubs 2,492
17. Cleanup_from_October_2005 2,466
18. Incomplete_lists 2,362
19. Football_(soccer)_biography_stubs 2,298
20. Medicine_stubs 2,297
Today's http://en.wikipedia.org/wiki/Special:Mostlinkedcategories isn't much different.
Perhaps it does make some sense to study which users contributed most to the category "cleanup from October 2005", but rather for finding dilettantes than for finding experts on a subject.
When the authors write that "category assignments were introduced in May 2004 and 78,977 unique categories have come into existence since then" I cannot help but laugh and remember Larry Sanger's initial reaction when this was discussed in January 2002, http://mail.wikipedia.org/pipermail/wikipedia-l/2002-January/018685.html
(Here, "May 2004" refers to new functionality in MediaWiki 1.3.)
Today "categories" is a technical tool that can be used for many purposes. Organizing and structuring the contents of Wikipedia according to subject headings is only one of them. Compare this to another technical tool, "templates", which can be used for cleanup messages, for navigation, for infoboxes, for userboxes, for external link syntax, etc.
The pages describing these features in the Help: and Wikipedia: namespaces are confusing these issues all the time. I think we should aim to separate the issues 1. what do you want to accomplish, and 2. what technical tools are available at this time. For example, if I want to create an infobox, I could use raw HTML tables, wikitext table syntax, templates with numeric parameters or templates with named parameters. I could also use a robot to put these infoboxes in a number of articles. Next year, another version of MediaWiki will provide more tools, but my purpose is still to create an infobox.
On 8/24/06, Lars Aronsson lars@aronsson.se wrote:
When the authors write that "category assignments were introduced in May 2004 and 78,977 unique categories have come into existence since then" I cannot help but laugh and remember Larry Sanger's initial reaction when this was discussed in January 2002, http://mail.wikipedia.org/pipermail/wikipedia-l/2002-January/018685.html
Heh, nice. Interestingly the original syntax proposed was "{{CATEGORY A category,Another category}}" - ok, the comma was ugly, but who talked him out of {{}} and into [[ ]] notation?
The pages describing these features in the Help: and Wikipedia: namespaces are confusing these issues all the time. I think we should aim to separate the issues 1. what do you want to accomplish, and 2. what technical tools are available at this
Good questions. Just based on observation, we apparently have these needs:
- Automatically producing lists of pages sharing some property, for maintenance
- Browsing related pages
- Structuring pages to make working with large numbers of pages more tractable (especially for WikiProjects)
What else?
Steve
On 8/24/06, Steve Bennett stevage@gmail.com wrote:
How would you describe the relationship between Category:John Lennon and Category:The Beatles, knowing that Category:John Lennon contains songs that have nothing to do with The Beatles? Actually, just to flesh this out, what are the relationships between these (possibly fictional) categories: English rock bands, The Beatles, John Lennon, John Lennon songs, The Beatles songs. Similarly, into which categories would these articles go: [[The Beatles]], [[John Lennon]], [[I Am The Walrus]] (John Lennon/Beatles song), [[Imagine (song)]] (John Lennon solo song).
It's simple. Look up at my earlier subcategories of humans:
==Subcategories==
All of the following are {{PAGENAME}}.

* People by city
* People by company
* People by country of residence
. . .
The key line is "All of the following are {{PAGENAME}}.". Well, let's rephrase that slightly: "All articles in each of the following categories are {{PAGENAME}}." Given the line "All articles in each of the following categories are British rock bands", does Category:The Beatles fit?
Now, one problem here is that "The Beatles" actually means "Beatles-related things" here, just as "Jesus" means "Jesus-related things". This is something that will need to be clarified. To that end, notice that I've introduced multiple constructions *hard-coded into the category page* that assume that the category name is in the plural--no more lazy ambiguous shortcuts like "Category:Jesus". If "article X and Y are in category Z" is not synonymous to "article X and Y are Z", then the category is wrong.
Of course, people could change the interface back, but with some encouragement I don't think they would. You just have to make a change that will allow them to have "related to" built into the system, and combined with easy recategorizing/category renaming, I think they'll see the light.
On 8/24/06, Steve Bennett stevage@gmail.com wrote:
How would you describe the relationship between Category:John Lennon and Category:The Beatles, knowing that Category:John Lennon contains songs that have nothing to do with The Beatles? Actually, just to flesh this out, what are the relationships between these (possibly fictional) categories: English rock bands, The Beatles, John Lennon, John Lennon songs, The Beatles songs. Similarly, into which categories would these articles go: [[The Beatles]], [[John Lennon]], [[I Am The Walrus]] (John Lennon/Beatles song), [[Imagine (song)]] (John Lennon solo song).
It's simple. Look up at my earlier subcategories of humans:
==Subcategories==
All of the following are {{PAGENAME}}.
- People by city
- People by company
- People by country of residence
. . .
The key line is "All of the following are {{PAGENAME}}.". Well, let's rephrase that slightly: "All articles in each of the following categories are {{PAGENAME}}." Given the line "All articles in each of the following categories are British rock bands", does Category:The Beatles fit?
Part of the problem is that "British rock bands" implies a group of multiple people making music.
If that assumption is removed, then the John Lennon example works a little bit better.
E.g.: British rock musicians -> The Beatles -> John Lennon.
I.e.
John Lennon "is a" Beatle
The Beatles "are" British rock musicians
John Lennon "is a" British rock musician (transitive example).
I had to make some plurals singular to make the above work as English sentences, but it avoids the "John Lennon 'is a' British rock band" problem.
Then the next problem is that John Lennon has two famous parts of his career - one as part of the Beatles, and one as a solo artist. Maybe you need to separate those two concepts?
How about this:
[[British rock musicians]]----------------
        |                                 \
[[John Lennon]]----------------      [[The Beatles]]
        |                      \            |
[[John Lennon (solo career)]]  [[John Lennon (Beatles career)]]
        |                                   |
[[Imagine (song)]]                 [[I Am The Walrus]]
Now, there are still problems with this: E.g. you can't say "Imagine (song) 'is a' John Lennon (solo career)", or "I Am The Walrus 'is a' John Lennon (Beatles career)", or "Imagine (song) 'is a' British rock musicians".
So the category / subcategory relationship is still a mess :-)
But what if when you specified a category, if you could say what the relationship was? (And the default could be "is a").
E.g.:
* in [[John Lennon]], you could have [[Category:British rock musicians|relationship=is a]]
* in [[Imagine (song)]], you could have [[Category:John Lennon (solo career)|relationship=composed by]]
* in [[The Beatles]], you could have [[Category:John Lennon (Beatles career)|relationship=member of]]
* in [[John Lennon]], you could have [[Category:John Lennon (solo career)|relationship=part of career]]
Then you could apply transitive stuff like "I Am The Walrus 'was composed by' John Lennon (Beatles career) 'who is a member of' The Beatles 'who are' British rock musicians"; and "I Am The Walrus 'was composed by' John Lennon (Beatles career) 'which is a part of the career of' John Lennon 'who is a' British rock musician".
Also in such a system, having loops mightn't be incorrect or a problem at all, as long as the relationship between concepts is specified (and correct).
Also, you would be able to combine transitive concepts, as long as the relationship between them was the same. For example,

[[Labrador]]
     |
   is a
     |
[[Dog]]
     |
   is a
     |
[[Animal]]
Then you could logically transform the above into "Labrador is an Animal", and still be correct (since the relationship in both cases is the same, and therefore can be folded down into one step if desired).
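To make the folding rule concrete, here is a small sketch, assuming each category link is stored as a (page, relationship, target) triple. The `EDGES` list and the `fold` helper are hypothetical; the `relationship=` parameter itself is only the proposal above, not existing wikitext syntax.

```python
# Hypothetical store of typed links: (page, relationship, target).
EDGES = [
    ("Labrador", "is a", "Dog"),
    ("Dog", "is a", "Animal"),
    ("I Am The Walrus", "composed by", "John Lennon (Beatles career)"),
    ("John Lennon (Beatles career)", "member of", "The Beatles"),
    ("The Beatles", "is a", "British rock musicians"),
]


def fold(start, relationship, edges=EDGES):
    """Everything reachable from `start` by following only `relationship` links.

    A chain is only folded while the relationship stays the same, so mixed
    chains such as "composed by" followed by "member of" are never collapsed.
    """
    reachable = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for page, rel, target in edges:
            if page == node and rel == relationship and target not in reachable:
                reachable.add(target)
                stack.append(target)
    return reachable
```

Here `fold("Labrador", "is a")` gives both Dog and Animal, while `fold("I Am The Walrus", "is a")` gives nothing, which is the behaviour wanted: the song never becomes a British rock musician. Because visited targets are remembered, a loop in the graph does not cause trouble either.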
Anyway, that's my thoughts on the topic. But I personally don't care that much about categorization, so feel free to ignore :-)
All the best, Nick.
On 8/24/06, Nick Jenkins nickpj@gmail.com wrote:
E.g. you can't say "Imagine (song) 'is a' John Lennon (solo career)", or "I Am The Walrus 'is a' John Lennon (Beatles career)", or "Imagine (song) 'is a' British rock musicians".
That's because you haven't quite gotten what I'm getting at yet. The category wouldn't be called "John Lennon", because nothing is a John Lennon (other than, presumably, John Lennon, who would be rather lonely). Rather, under "things related to Beatles", you would have "things related to John Lennon" or something. Inelegant, I know, but I think you see where I'm going with this?
The main problem is the terminology. I mean, it works logically, but I don't think people will go for saying "No, things-related-to-the-Beatles isn't a supercategory of things-related-to-John-Lennon, things-related-to-the-Beatles is only related to things-related-to-John-Lennon!" But that's the fault of the English language. I leave it to the wordsmiths to improve upon my terminology. :P
But what if when you specified a category, if you could say what the relationship was?
That's Wikidata. We're still waiting. I was putting forth an interim solution, adding only one new type of category-inclusion, so that you don't have to explicitly state all sorts of things like transitivity and deal with arbitrary numbers of relations and so forth. To be honest, it probably wouldn't be that much of a time-saver, so there's probably no point in an interim solution.
We just need to get someone to stick to one of the various Wikidata-ish projects and finish it. Hopefully this Semantic MediaWiki business that was just committed as an extension will end up in a state where the WMF is willing to use it.
Simetrical wrote:
The main problem is the terminology. [...] the fault of the English language. I leave it to the wordsmiths [...]
As I started to look into categories recently, I wanted to draw the category tree and print it like a mindmap on a poster. I'm realizing now that this is not going to be possible. Instead we have an archipelago of categories, loosely linked in a misty sea of subcategory relationships. And some of the subs can be our enemies. What I need here is a radar screen.
Hmm... can anybody morph this http://web.archive.org/web/20010604173556/www.jimmywales.com/jimmy.jpg with this http://en.wikipedia.org/wiki/Image:The_Hunt_for_Red_October_movie_poster.JPG
E.g. you can't say "Imagine (song) 'is a' John Lennon (solo career)", or "I Am The Walrus 'is a' John Lennon (Beatles career)", or "Imagine (song) 'is a' British rock musicians".
That's because you haven't quite gotten what I'm getting at yet. The category wouldn't be called "John Lennon", because nothing is a John Lennon (other than, presumably, John Lennon, who would be rather lonely). Rather, under "things related to Beatles", you would have "things related to John Lennon" or something. Inelegant, I know, but I think you see where I'm going with this?
OK, but how's that any different than what we have now, except with more words involved, and more potential for confusion?
I.e. you can still say Imagine (song) is a thing related to John Lennon, which is a thing related to The Beatles. The relationship between the Beatles and Imagine (song) is somewhat tenuous, I think.
Also it leads to things like saying "Jesus is related somehow to Waves", to use the original loop example. I'm just not sure that it tells you anything useful.
The main problem is the terminology. I mean, it works logically, but I don't think people will go for saying "No, things-related-to-the-Beatles isn't a supercategory of things-related-to-John-Lennon, things-related-to-the-Beatles is only related to things-related-to-John-Lennon!" But that's the fault of the English language. I leave it to the wordsmiths to improve upon my terminology. :P
So you're saying that supercategories should not imply any transitive relationship whatsoever? But isn't that worse than the current situation, which at least tries for some vague ill-defined type of relationship? Granted, it seems to fail, but at least it tries.
But what if when you specified a category, if you could say what the relationship was?
That's Wikidata. We're still waiting.
Ah, ok. I've just quickly read through the Wikidata page on meta, and while it sounds interesting, it's trying to do much more than just store a relationship type between categories ... possibly too much more.
Hmm... can anybody morph this http://web.archive.org/web/20010604173556/www.jimmywales.com/jimmy.jpg with this http://en.wikipedia.org/wiki/Image:The_Hunt_for_Red_October_movie_poster.JPG
[[Sailing Yacht]] -- is a --> [[sailing ship]] -- is a --> [[Category:Ship types]]
                                                                 ^
                                                                 |
                                                               is a
                                                                 |
            [[The Hunt for Red October]] -- is about a --> [[submarine]]
... or is it a trick question?
All the best, Nick.
On 8/24/06, Nick Jenkins nickpj@gmail.com wrote:
I.e. you can still say Imagine (song) is a thing related to John Lennon, which is a thing related to The Beatles. The relationship between the Beatles and Imagine (song) is somewhat tenuous, I think.
Wave-related stuff -> Sound-related stuff -> Music-related stuff -> Music events -> Music competitions would be a hierarchy; Music competitions -> Eurovision Song Contest-related stuff would *not* be used. Instead, the Eurovision Song Contest article would be in Music competitions, and the Eurovision Song Contest-related stuff category would be *related to* the Music competitions category, not a subcategory of it, so the loop would break. Not everything related to the ESC is a music competition.
Alternatively, if we went Music-related stuff -> Music event-related stuff -> Music competition-related stuff -> Eurovision Song Contest-related stuff, the loop would just break one step later: not all Eurovision host city-related stuff is related to the Eurovision Song Contest (the overwhelming majority is not). Therefore, that would also be "related to", not a subcategory, because it fails the "all X are Y" test. So again, no loop. (And don't try to give me a counterexample, because I'm saying that categories' descendants should be a strict subset of their supercategories' descendants, and that makes loops logically impossible.)
So you're saying that supercategories should not imply any transitive relationship whatsoever?
Quite the contrary. Supercategories would be exclusively transitive; related-to categories would be nontransitive in general.
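As a rough illustration of the "all X are Y" test, assuming we could somehow enumerate the things a category is supposed to describe: the sets below and the `classify` helper are invented for the example, and in practice the test is a judgement editors make, not something the software can compute.

```python
# Hypothetical contents, just enough to run the test.
MUSIC_COMPETITIONS = {"Eurovision Song Contest", "Some other song contest"}
ESC_RELATED_STUFF = {"Eurovision Song Contest", "Eurovision host cities"}
TELEVISED_MUSIC_COMPETITIONS = {"Eurovision Song Contest"}


def classify(candidate, parent):
    """Return "subcategory" only if every candidate item is also a parent item.

    `<` is Python's proper-subset test, which matches the "strict subset"
    wording above; anything that fails it is merely "related to" the parent.
    """
    return "subcategory" if candidate < parent else "related to"


esc_link = classify(ESC_RELATED_STUFF, MUSIC_COMPETITIONS)
tv_link = classify(TELEVISED_MUSIC_COMPETITIONS, MUSIC_COMPETITIONS)
```

A proper subset can never contain its own superset, so a chain of "subcategory" links built this way cannot come back to where it started. Only the "related to" links may form loops, and those are not followed transitively.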
I think this is an excessively confusing conversation for a largely pointless feature, though, so maybe we should drop it here.
2006/8/23, Neil Harris neil@tonal.clara.co.uk:
The problem is different senses of 'subcategory': there's [1] "the set of objects in B are a strict subset of the set of objects in A", which is loop-proof, and [2] 'B is a topic that is usually discussed in the context of A'
For example:
London -[2]-> Thames Valley -[1]-> London, comes to mind
Russia -[2]-> Soviet Union -[1]-> Russia, as well.
Also, another entirely reasonable cycle would be:
Universe -[contains]-> Human beings -[who have]-> Human thought -[which includes]-> Philosophy -[which studies the]-> Universe
Doing a search of cycles, I found that there must be thousands if not more of them on the English Wikipedia. Yours differ somewhat from the average by being short. I chose a category more or less at random (it was [[Category:Waves]]). It had 999 supercategories, and was part of 6 cycles (counting only elementary cycles):
Waves -> Sound -> Music -> Music events -> Music competitions -> Eurovision Song Contest -> Eurovision host cities -> Rome -> History of Rome -> Ancient Rome -> Ancient Roman religion -> Ancient Roman Christianity -> Patristics -> Heresy -> Heretics -> People executed for heresy -> Jesus -> Doctrines and teachings of Jesus -> Nonviolence -> Peace -> Social justice -> Community organizing -> Protests -> History of social movements -> Critical theory -> Cultural studies -> Museology -> Anthropology -> Linguistics -> Proper nouns -> Organizations -> Educational organizations -> Schools -> School subjects -> Biology -> Tree of life -> Eukaryotes -> Animals -> Chordates -> Tetrapods -> Synapsids -> Therapsids -> Theriodonts -> Mammals -> Primates -> Apes -> Humans -> Technology -> Systems -> Dynamical systems -> Waves
Waves -> Sound -> Music -> Music events -> Music competitions -> Eurovision Song Contest -> Eurovision host cities -> Rome -> History of Rome -> Ancient Rome -> Ancient Roman religion -> Ancient Roman Christianity -> Patristics -> Heresy -> Heretics -> People executed for heresy -> Jesus -> Doctrines and teachings of Jesus -> Nonviolence -> Peace -> Social justice -> Community organizing -> Protests -> History of social movements -> Critical theory -> Cultural studies -> Museology -> Anthropology -> Linguistics -> Proper nouns -> Organizations -> Educational organizations -> Schools -> School subjects -> Biology -> Tree of life -> Eukaryotes -> Animals -> Chordates -> Tetrapods -> Synapsids -> Therapsids -> Theriodonts -> Mammals -> Primates -> Apes -> Humans -> Technology -> Applied sciences -> Engineering -> Mechanics -> Dynamical systems -> Waves
Waves -> Sound -> Music -> Music events -> Music competitions -> Eurovision Song Contest -> Eurovision host cities -> Rome -> History of Rome -> Ancient Rome -> Ancient Roman religion -> Ancient Roman Christianity -> Patristics -> Heresy -> Heretics -> People executed for heresy -> Jesus -> Doctrines and teachings of Jesus -> Nonviolence -> Peace -> Social justice -> Community organizing -> Protests -> History of social movements -> Critical theory -> Cultural studies -> Museology -> Anthropology -> Linguistics -> Proper nouns -> Organizations -> Educational organizations -> Schools -> School subjects -> Biology -> Tree of life -> Eukaryotes -> Animals -> Chordates -> Tetrapods -> Synapsids -> Therapsids -> Theriodonts -> Mammals -> Primates -> Apes -> Humans -> Technology -> Applied sciences -> Education -> Education by subject -> Physics education -> Introductory physics -> Waves
Waves -> Sound -> Music -> Music events -> Music competitions -> Eurovision Song Contest -> Eurovision host cities -> Rome -> History of Rome -> Ancient Rome -> Ancient Roman religion -> Ancient Roman Christianity -> Patristics -> Heresy -> Heretics -> People executed for heresy -> Jesus -> Doctrines and teachings of Jesus -> Nonviolence -> Peace -> Social justice -> Community organizing -> Protests -> History of social movements -> Critical theory -> Cultural studies -> Museology -> Anthropology -> Linguistics -> Proper nouns -> Organizations -> Educational organizations -> Schools -> School subjects -> Biology -> Tree of life -> Eukaryotes -> Animals -> Chordates -> Tetrapods -> Synapsids -> Therapsids -> Theriodonts -> Mammals -> Primates -> Apes -> Humans -> Technology -> Applied sciences -> Education -> Academia -> Academic disciplines -> Mathematics -> Dynamical systems -> Waves
Waves -> Sound -> Music -> Music events -> Music competitions -> Eurovision Song Contest -> Eurovision host cities -> Rome -> History of Rome -> Ancient Rome -> Ancient Roman religion -> Ancient Roman Christianity -> Patristics -> Heresy -> Heretics -> People executed for heresy -> Jesus -> Doctrines and teachings of Jesus -> Nonviolence -> Peace -> Social justice -> Community organizing -> Protests -> History of social movements -> Critical theory -> Cultural studies -> Museology -> Anthropology -> Linguistics -> Proper nouns -> Organizations -> Educational organizations -> Schools -> School subjects -> Biology -> Tree of life -> Eukaryotes -> Animals -> Chordates -> Tetrapods -> Synapsids -> Therapsids -> Theriodonts -> Mammals -> Primates -> Apes -> Humans -> Technology -> Applied sciences -> Education -> Academia -> Academic disciplines -> Mathematics -> Differential equations -> Waves
Waves -> Sound -> Music -> Music events -> Music competitions -> Eurovision Song Contest -> Eurovision host cities -> Rome -> History of Rome -> Ancient Rome -> Ancient Roman religion -> Ancient Roman Christianity -> Patristics -> Heresy -> Heretics -> People executed for heresy -> Jesus -> Doctrines and teachings of Jesus -> Nonviolence -> Peace -> Social justice -> Community organizing -> Protests -> History of social movements -> Critical theory -> Cultural studies -> Museology -> Anthropology -> Linguistics -> Proper nouns -> Organizations -> Educational organizations -> Schools -> School subjects -> Biology -> Tree of life -> Eukaryotes -> Animals -> Chordates -> Tetrapods -> Synapsids -> Therapsids -> Theriodonts -> Mammals -> Primates -> Apes -> Humans -> Technology -> Applied sciences -> Education -> Academia -> Academic disciplines -> Mathematics -> Equations -> Differential equations -> Waves
Interestingly, checking out your examples, I found none of them (I checked London, Russia and Universe, with 1060, 1013 and 1001 supercategories) is currently part of a loop.
For Jesus on the other hand, there are 39 different cycles, which however all start with: Jesus -> Doctrines and teachings of Jesus -> Nonviolence -> Peace -> Social justice -> Community organizing -> Protests -> History of social movements -> Critical theory -> Cultural studies
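For anyone wanting to repeat this kind of search, here is a minimal sketch of enumerating the elementary cycles through one category by depth-first search. It is not the script used for the numbers above; the `cycles_through` function is an assumption, and the toy graph collapses the long middle of the Waves chain into a single made-up Sound -> Technology edge.

```python
# Toy graph, category -> subcategories. The Sound -> Technology edge is a
# hypothetical shortcut standing in for the long chain listed above.
SUBCATS = {
    "Waves": ["Sound"],
    "Sound": ["Technology"],
    "Technology": ["Systems", "Applied sciences"],
    "Systems": ["Dynamical systems"],
    "Applied sciences": ["Engineering"],
    "Engineering": ["Mechanics"],
    "Mechanics": ["Dynamical systems"],
    "Dynamical systems": ["Waves"],
}


def cycles_through(start, subcats):
    """Yield every elementary cycle (no category repeated) from `start` back to `start`."""
    path = [start]
    on_path = {start}

    def walk(node):
        for child in subcats.get(node, ()):
            if child == start:
                yield path + [start]  # closed the loop; yield a copy
            elif child not in on_path:
                path.append(child)
                on_path.add(child)
                yield from walk(child)
                path.pop()
                on_path.remove(child)

    yield from walk(start)


waves_cycles = list(cycles_through("Waves", SUBCATS))
```

On the toy graph this finds two cycles, one through Systems and one through Applied sciences. On the real category graph the number of simple paths can blow up quickly, so a depth limit or a proper algorithm such as Johnson's would probably be needed.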
On 8/22/06, Charlie Reams calr3@cam.ac.uk wrote:
Can someone point me to one of these cycles?
The best hypothetical I could come up with is Humans > Science > Anthropology > Humans. Anthropology is a subset of Science, but Science is something done exclusively (that we know of) by humans, and Humans are Anthropology's subject of study. If we separated "strict sub/superset" (as current category system) and "related to" (not as current category system: symmetric and non-transitive), as many have suggested, there should never be any need for circularity.
Steve Bennett wrote:
.. c) To assist quality control, such as labelling articles that need cleanup etc
It seems rather pointless to me to label 20k pages needing citation. But that's probably not a technical question.
If I do a random walk on en Wikipedia, almost every page seems to have some tag needing something ("This page is in need of <insert your pet peeve here>"). This reminds me of all these "under construction" pages in the earlier days of the web.
On 8/21/06, Ligulem ligulem@pobox.com wrote:
Steve Bennett wrote:
.. c) To assist quality control, such as labelling articles that need cleanup etc
It seems rather pointless to me to label 20k pages needing citation. But that's probably not a technical question.
Totally agree on that one. There are some useful ones though, like "pages needing categorisation", "pages needing LaTeX formatting" etc. Any task that can be performed by a non-subject specialist, in particular.
If I do a random walk on en Wikipedia, almost every page seems to have some tag needing something ("This page is in need of <insert your pet peeve here>"). This reminds me of all these "under construction" pages in the earlier days of the web.
Yeah. I can't stand {{cleanup}} - what's the point? But I occasionally use {{wfy}} or {{globalize-USA}} or whatever, the latter mostly to let off steam :)
Steve
On Mon, Aug 21, 2006 at 10:35:23AM +0200, Ligulem wrote:
Steve Bennett wrote:
.. c) To assist quality control, such as labelling articles that need cleanup etc
It seems rather pointless to me to label 20k pages needing citation. But that's probably not a technical question.
How many editors does en have?
If I do a random walk on en Wikipedia, almost every page seems to have some tag needing something ("This page is in need of <insert your pet peeve here>"). This reminds me of all these "under construction" pages in the earlier days of the web.
Well, perhaps, but those sites weren't being crawled hourly (and minutely :-), by people a) equipped and b) inclined to fix things.
I'd ask if anyone has any statistics on how often those tags are affixed and removed, but...
Cheers, -- jra
On 21/08/06, Jay R. Ashworth jra@baylink.com wrote:
I'd ask if anyone has any statistics on how often those tags are affixed and removed, but...
Wikipedians prefer to highlight and tag problems than to take 30 seconds to fix them, it seems. I can understand that larger cases crop up, and warrant attention from more than one person, but stupid little unsourced stubs don't need to be tagged - either find a source on the web, or find reference that a source might exist within five minutes, or "prod" it, or whatever we do now.
Do we still have AfD, or was it renamed again?
Rob Church
On Mon, Aug 21, 2006 at 02:24:19PM +0100, Rob Church wrote:
On 21/08/06, Jay R. Ashworth jra@baylink.com wrote:
I'd ask if anyone has any statistics on how often those tags are affixed and removed, but...
Wikipedians prefer to highlight and tag problems than to take 30 seconds to fix them, it seems. I can understand that larger cases crop up, and warrant attention from more than one person, but stupid little unsourced stubs don't need to be tagged - either find a source on the web, or find reference that a source might exist within five minutes, or "prod" it, or whatever we do now.
Well, that could be a failure in [[Be Bold]], or it could just be laziness. I wonder if there's any way to decide which...
Cheers, -- jra
On 8/21/06, Rob Church robchur@gmail.com wrote:
Wikipedians prefer to highlight and tag problems than to take 30 seconds to fix them, it seems. I can understand that larger cases crop up, and warrant attention from more than one person, but stupid little unsourced stubs don't need to be tagged - either find a source on the web, or find reference that a source might exist within five minutes, or "prod" it, or whatever we do now.
Are you talking about presumed hoaxes or presumed good articles that simply don't have any citations?
Do we still have AfD, or was it renamed again?
You mean Arguments for Dimwits?
Steve
On 21/08/06, Steve Bennett stevage@gmail.com wrote:
On 8/21/06, Rob Church robchur@gmail.com wrote:
Wikipedians prefer to highlight and tag problems than to take 30 seconds to fix them, it seems. I can understand that larger cases crop up, and warrant attention from more than one person, but stupid little unsourced stubs don't need to be tagged - either find a source on the web, or find reference that a source might exist within five minutes, or "prod" it, or whatever we do now.
Are you talking about presumed hoaxes or presumed good articles that simply don't have any citations?
I'm talking about the little stub that looks 50/50, or the single spelling error that causes them to roll out the {{cleanup}} tag.
Do we still have AfD, or was it renamed again?
You mean Arguments for Dimwits?
Those are the best kind!
Rob Church
On 8/21/06, Rob Church robchur@gmail.com wrote:
I'm talking about the little stub that looks 50/50, or the single
I hate those. I really do. They take a lot of time and usually aren't worth it. We should probably prod them. Probably no cause for speedy, and at least by prodding them we're raising a red flag to potential hoaxees.
spelling error that causes them to roll out the {{cleanup}} tag.
That's just dumb. I occasionally use {{cleanup}} on pages that just don't look remotely like a Wikipedia article, or for pages where it's obvious someone started a merge operation and never finished, leaving redundant hunks of text all over the place. Anything minor...well...I wonder if anyone actually goes around carrying out these cleanup operations.
Steve
On 8/21/06, Steve Bennett stevage@gmail.com wrote:
On 8/21/06, Rob Church robchur@gmail.com wrote:
I'm talking about the little stub that looks 50/50, or the single
I hate those. I really do. They take a lot of time and usually aren't worth it. We should probably prod them. Probably no cause for speedy, and at least by prodding them we're raising a red flag to potential hoaxees.
spelling error that causes them to roll out the {{cleanup}} tag.
That's just dumb. I occasionally use {{cleanup}} on pages that just don't look remotely like a Wikipedia article, or for pages where it's obvious someone started a merge operation and never finished, leaving redundant hunks of text all over the place. Anything minor...well...I wonder if anyone actually goes around carrying out these cleanup operations.
Maybe if there were a "go to a random article in need of cleanup" feature then more people would do just that...
Andrew Dunbar (hippietrail)
Steve _______________________________________________ Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l
On 8/21/06, Andrew Dunbar hippytrail@gmail.com wrote:
Maybe if there were a "go to a random article in need of cleanup" feature then more people would do just that...
There is one. You navigate to the category and pick the first one off the list. As it happens, the randomization algorithm has a strong bias toward page names starting with Aaaaaaaaaa, but that's not relevant to addressing the quality of the encyclopedia, is it? I doubt anyone's going to notice or care that our articles beginning with A are higher-quality than those starting with M . . .
On 8/22/06, Simetrical Simetrical+wikitech@gmail.com wrote:
On 8/21/06, Andrew Dunbar hippytrail@gmail.com wrote:
Maybe if there were a "go to a random article in need of cleanup" feature then more people would do just that...
There is one. You navigate to the category and pick the first one off the list. As it happens, the randomization algorithm has a strong bias toward page names starting with Aaaaaaaaaa, but that's not relevant to addressing the quality of the encyclopedia, is it? I doubt anyone's going to notice or care that our articles beginning with A are higher-quality than those starting with M . . .
In fact maybe they already do this and nobody actually does notice or care (-:
Andrew Dunbar (hippietrail)
On Tue, Aug 22, 2006 at 03:03:22PM +1000, Andrew Dunbar wrote:
On 8/22/06, Simetrical Simetrical+wikitech@gmail.com wrote:
On 8/21/06, Andrew Dunbar hippytrail@gmail.com wrote:
Maybe if there were a "go to a random article in need of cleanup" feature then more people would do just that...
There is one. You navigate to the category and pick the first one off the list. As it happens, the randomization algorithm has a strong bias toward page names starting with Aaaaaaaaaa, but that's not relevant to addressing the quality of the encyclopedia, is it? I doubt anyone's going to notice or care that our articles beginning with A are higher-quality than those starting with M . . .
In fact maybe they already do this and nobody actually does notice or care (-:
Actually, it *is* useful for the 'randomization algorithm' to be a bit more... random than that; it spreads the work around, and reduces the "40 people all trying to fix [[Afghanistan]]" problem.
Cheers, -- jra
On 8/22/06, Jay R. Ashworth jra@baylink.com wrote:
Actually, it *is* useful for the 'randomization algorithm' to be a bit more... random than that; it spreads the work around, and reduces the "40 people all trying to fix [[Afghanistan]]" problem.
So pick a random member off the first page instead. Or from the second sometimes. Or the tenth. You don't need a built-in system to do that, just pick one. There are probably only going to be like ten people who actually do this on a regular basis, even on enwiki, however much encouragement is provided, and it's not like it usually takes that long to do cleanup, so conflicts are unlikely even if you just pick the first article off the first page.
On 8/22/06, Andrew Dunbar hippytrail@gmail.com wrote:
Maybe if there were a "go to a random article in need of cleanup" feature then more people would do just that...
Yeah, this has occurred to me. There are various places that you can see a list of "tasks to do", but just in terms of the motivation factor, it might be interesting to have a single link that takes you to a random article in need of something. More of a "lucky dip" factor....Ideally, we could prioritise those with more incoming links...
Steve
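A rough sketch of the "lucky dip" idea above, in Python. It assumes a mapping from page title to incoming-link count is already available; the function name `pick_cleanup_article` and the input format are hypothetical, not an existing MediaWiki feature. It picks one article at random, favouring those with more incoming links:

```python
import random


def pick_cleanup_article(incoming_links, rng=None):
    """Pick one title at random, weighted by incoming-link count.

    incoming_links: dict mapping page title -> number of incoming links.
    This input is an assumption for illustration; in practice it would
    have to be built from the cleanup category and the link tables.
    """
    if rng is None:
        rng = random.Random()
    titles = sorted(incoming_links)
    # Add 1 to every weight so pages with no incoming links can still come up.
    weights = [incoming_links[title] + 1 for title in titles]
    return rng.choices(titles, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Made-up counts, purely for demonstration.
    example = {"Afghanistan": 900, "Aardvark": 50, "Mango": 0}
    print(pick_cleanup_article(example))
```

Because the choice is random rather than "first off the list", this would also spread the work around instead of sending everyone to the same article.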
On 8/21/06, Rob Church robchur@gmail.com wrote:
On 21/08/06, Jay R. Ashworth jra@baylink.com wrote:
I'd ask if anyone has any statistics on how often those tags are affixed and removed, but...
Wikipedians prefer to highlight and tag problems than to take 30 seconds to fix them, it seems. I can understand that larger cases crop up, and warrant attention from more than one person, but stupid little unsourced stubs don't need to be tagged - either find a source on the web, or find reference that a source might exist within five minutes, or "prod" it, or whatever we do now.
I don't think it's so simple. I almost always have a Wiktionary open while I'm working at the reception desk, and it's so easy to hit random and tag imperfect articles with a couple of fix-it categories; that doesn't take your brain away from your job like getting in and editing the article would.
Later, when you don't have your mind split in two, it's easier to go back through stuff you found earlier than to look and fix at the same time.
Andrew Dunbar (hippietrail)
Do we still have AfD, or was it renamed again?
Rob Church
Rob Church wrote:
On 21/08/06, Jay R. Ashworth jra@baylink.com wrote:
I'd ask if anyone has any statistics on how often those tags are affixed and removed, but...
Wikipedians prefer to highlight and tag problems than to take 30 seconds to fix them, it seems. I can understand that larger cases crop up, and warrant attention from more than one person, but stupid little unsourced stubs don't need to be tagged - either find a source on the web, or find reference that a source might exist within five minutes, or "prod" it, or whatever we do now.
People on Wikipedia do what they want if it's allowed. They will NOT do something else, especially not something that is more work, if something they want to do is discouraged or even forbidden; they'll just not do it at all.
Therefore, discouraging work that actually helps the encyclopedia is a very bad idea indeed.
Timwi
If I do a random walk on en Wikipedia, almost every page seems to have some tag needing something ("This page is in need of <insert your pet peeve here>"). This reminds me of all these "under construction" pages in the earlier days of the web.
There is a very important difference between the two, though. The "under construction" pages are an excuse for a lack of content. They were supposed to show to the visitor that someone's working on it and that it'll be ready "Soon™". The Wikipedia tags, however, are an invitation to participate, i.e. a request to someone else. They are a productivity tool, and they are meant to help improve the content, not the visitor frequency.
Timwi
wikitech-l@lists.wikimedia.org