Marc Riddell wrote
This is what I am asking for in WP. That is why both the main and sub categories need to be entered into each Article.
I recall you saying all this some time ago, a propos your own view of what would be convenient. I'm not clear this is actually convenient for most users of Wikipedia, i.e. to have large chunks of nested categories made explicit.
Can you not just accept that the system doesn't revolve about your needs?
Charles
Marc Riddell wrote
This is what I am asking for in WP. That is why both the main and sub categories need to be entered into each Article.
on 4/30/07 9:25 AM, charles.r.matthews@ntlworld.com at charles.r.matthews@ntlworld.com wrote:
I recall you saying all this some time ago, a propos your own view of what would be convenient. I'm not clear this is actually convenient for most users of Wikipedia, i.e. to have large chunks of nested categories made explicit.
Can you not just accept that the system doesn't revolve about your needs?
Charles,
In what ways do you use the WP Category System?
Marc Riddell
On 30/04/07, charles.r.matthews@ntlworld.com charles.r.matthews@ntlworld.com wrote:
Marc Riddell wrote
This is what I am asking for in WP. That is why both the main and sub categories need to be entered into each Article.
I recall you saying all this some time ago, a propos your own view of what would be convenient. I'm not clear this is actually convenient for most users of Wikipedia, i.e. to have large chunks of nested categories made explicit.
Can you not just accept that the system doesn't revolve about your needs?
I am going to be blunt and agree here; I don't think "the category system is fundamentally broken" can reasonably come from "because it doesn't work the way I think it ought". My subtle attempts to intimate this don't seem to have worked, and I want to be clear that I think we're all barking up the wrong tree. There are flaws in our category system, but this is not one of them.
This is the basic issue here. Marc thinks categories should work in a way that conforms to his expectations - essentially, an undifferentiated list of all things with attribute X. Currently, categories are differentiated lists - topics split up into smaller sublists with each page hopefully only appearing once in any given topic-group.
If we change the current system, things will be convenient for Marc; it will be more useful as a database. However, my experience is that most of our readers aren't looking for a database - they're looking for relatively focused, specific, categorisation for navigational purposes, where a tightly topical category of 20-50 articles is substantially more useful than a grand supercategory of 2000-5000.
(Yes, we could have the tight topical categories in Marc's model - but at the cost of swamping pages with references to a huge number of categories which are redundant to one degree or another, and just make navigation that much more tricky for the user)
Somewhere down thread, the mystical expertise of librarians was invoked. I am one, and I believe that tight categorisation is the way to go. I feel that Marc's model, if implemented in the simple quick-fix method of "just include parent and child categories in the same article", will actually make our categorisation less useful for general readers and editors, which in no way justifies the limited benefit of being able to do fancy searches on articles by subject attributes.
We can get search in other ways; improving the methods we use to search, having some pseudo-database functions we can do with categories, would go a long way towards the desired effect. Categorisation is, however, used by our readers, and we shouldn't break it without a very pressing reason.
On 01/05/07, Marc Riddell michaeldavid86@comcast.net wrote:
on 5/1/07 5:11 PM, Andrew Gray at shimgray@gmail.com wrote:
Categorisation is, however, used by our readers, and we shouldn't break it without a very pressing reason.
Do you, personally, use the current WP categorization system?
As a reader? Hard to say to what degree; I definitely make a great deal of use of it as an editor, but the things you need as an editor are different and it's hard to separate the two aspects! Most of my "reader" time is looking for specific facets, drilling straight to individual articles, not browsing, so I'm not really the category target audience... I do use it, though, just not as a major navigational method.
I do, however, get to *see* a lot of people using Wikipedia. The categories are effectively used as topic lists - indeed, many people talk of categories *as* lists - in a way which I think bears out my "keep granular" thesis. Open the cat, glance through the list, pick the one that sounds right, go there. It would require a lot more patience to do this for a much broader category.
on 5/1/07 5:11 PM, Andrew Gray at shimgray@gmail.com wrote:
Categorisation is, however, used by our readers, and we shouldn't break it without a very pressing reason.
On 01/05/07, Marc Riddell michaeldavid86@comcast.net wrote:
Do you, personally, use the current WP categorization system?
on 5/2/07 7:45 AM, Andrew Gray at shimgray@gmail.com wrote:
As a reader? Hard to say to what degree; I definitely make a great deal of use of it as an editor, but the things you need as an editor are different and it's hard to separate the two aspects! Most of my "reader" time is looking for specific facets, drilling straight to individual articles, not browsing, so I'm not really the category target audience... I do use it, though, just not as a major navigational method.
Andrew,
Perhaps this is why we are having a problem finding common ground in this discussion. I am looking at WP Categorization strictly from the POV of a reader. I came to your library, went to your catalogue system, and found it did not meet my needs as a researcher. You, as the librarian, asked me what my needs were, and I told you. Your answer seems to be that we cannot meet your needs without revamping the entire system. I say: OK, I'll wait :-) - as long as you are serious about doing so.
All I am asking, in this age of creative technology, is that someone step up and create a system that can meet the needs of as many of our readers as possible.
Marc
On 02/05/07, Marc Riddell michaeldavid86@comcast.net wrote:
All I am asking, in this age of creative technology, is that someone step up and create a system that can meet the needs of as many of our readers as possible.
Possibly the ideal would be a system that can be all at once: a system of tags that we can run complex queries on, and a category hierarchy of pre-existing queries.
- d.
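One way to picture the idea above is the rough sketch below. It is not a description of anything MediaWiki does; it assumes a hypothetical flat tag set per article and treats each "category" as nothing more than a named, saved query over those tags. All article names, tags, and function names are illustrative.

```python
# Flat tags per article; every name here is illustrative only.
ARTICLE_TAGS = {
    "Volga": {"river", "Europe"},
    "Yangtze": {"river", "Asia"},
    "Alps": {"mountain range", "Europe"},
}

# A "category" is just a named, pre-existing query: a set of required tags.
SAVED_QUERIES = {
    "Rivers of Europe": {"river", "Europe"},
    "Rivers": {"river"},
}


def run_query(required_tags, article_tags=ARTICLE_TAGS):
    """Return the sorted titles of articles carrying all the required tags."""
    required = set(required_tags)
    return sorted(title for title, tags in article_tags.items() if required <= tags)


def category(name):
    """Look up a saved query by name and run it."""
    return run_query(SAVED_QUERIES[name])


print(category("Rivers of Europe"))  # ['Volga']
print(run_query({"Europe"}))         # ['Alps', 'Volga']
```

Under these assumptions the browsable hierarchy and the ad hoc search share one set of metadata: readers see the saved queries, researchers write their own.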
On 02/05/07, Marc Riddell michaeldavid86@comcast.net wrote:
Perhaps this is why we are having a problem finding common ground in this discussion. I am looking at WP Categorization strictly from the POV of a reader.
...from the point of view of a high-level specialised reader who wants to do somewhat complex searches on datasets, though, not from the point of view of the average user. Very different beasts.
I came to your library, went to your catalogue system, and found it did not meet my needs as a researcher. You, as the librarian, asked me what my needs were, and I told you. Your answer seems to be that we cannot meet your needs without revamping the entire system. I say: OK, I'll wait :-) - as long as you are serious about doing so.
My problem is this: we cannot revamp the entire system to meet your needs, in the way you suggest, *without breaking it for everyone else*; we cannot start increasing the number and size of categories without making them much harder to use as a navigational tool.
Basically, there's a division here between metadata and search. Categories are metadata, applied to the individual articles. Opening a category and looking at it is a very crude form of search, surprising though it may seem at a first glance; it tells you all the articles listed under that particular heading.
You want to do a more advanced form of search, on a larger scale than simple navigation, producing results like "all rivers in Eurasia" or "all people who were born in the 20th century". There are two ways to do this:
a) Add more metadata, but keep the existing 'search'. Start slapping on more categories onto articles, so that - to pick an example - "1987 births" is joined by "1980s births" and "20th century births"
b) Get a better search, but keep the existing metadata. Develop a search system that can parse the existing categories, combine them in various ways, and spit out a list for the researcher.
(a) is quick and dirty, but has problems; it makes it harder to navigate the existing category system from the point of view of the normal reader. It also rapidly becomes unworkable if we start thinking about more complex searches than just "the members of all daughter categories" - imagine if we had "1980s births of folk singers" and "20th century births of folk singers" and all the other possible intersections of the already-existing categories being added to pages! There are just so many possible category intersections for any given page... this really won't scale.
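To put a rough number on that scaling problem, here is a small sketch. It assumes, purely for illustration, that a page carries a handful of independent attribute categories and that every combination of two or more of them could become its own intersection category; the attribute names other than "1987 births" and "Folk singers" are made up.

```python
from itertools import combinations


def intersection_categories(attributes):
    """Return every combination of two or more attributes as a tuple."""
    result = []
    for size in range(2, len(attributes) + 1):
        result.extend(combinations(attributes, size))
    return result


# Hypothetical attributes for a single biography.
attributes = ["1987 births", "Folk singers", "Scottish people", "Living people"]
extra = intersection_categories(attributes)
print(len(extra))  # 11, i.e. 2**4 - 4 - 1 intersections for just 4 attributes
```

With n attributes the count is 2**n - n - 1, and that is before adding decade and century variants of each date category.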
(b) is elegant, and scales well. It has the major advantage of being completely disassociated from the metadata, unlike our existing 'search' method, as it's overlaid on top; it doesn't impact in any way the existing system beyond perhaps making us streamline and rationalise it a little.
Unfortunately, it needs someone to go write it. This is what's holding things up.
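For illustration only, the core of option (b) might look something like the sketch below. It is not existing MediaWiki code; the `CategoryIndex` class, its method names, and the example articles are all hypothetical. Each article is tagged only with its most specific category, and the search layer walks down the daughter categories and combines the results.

```python
from collections import defaultdict


class CategoryIndex:
    """Toy search layer over an unchanged, tightly scoped category tree."""

    def __init__(self):
        self.subcategories = defaultdict(set)  # parent -> direct child categories
        self.articles = defaultdict(set)       # category -> articles tagged directly

    def add_subcategory(self, parent, child):
        self.subcategories[parent].add(child)

    def tag(self, article, category):
        self.articles[category].add(article)

    def members(self, category):
        """All articles in a category or in any of its descendant categories."""
        seen, stack, found = set(), [category], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue  # guard against loops in the category graph
            seen.add(current)
            found |= self.articles[current]
            stack.extend(self.subcategories[current])
        return found

    def intersect(self, *categories):
        """Articles that fall under every one of the given categories."""
        sets = [self.members(c) for c in categories]
        return set.intersection(*sets) if sets else set()


index = CategoryIndex()
index.add_subcategory("20th century births", "1980s births")
index.add_subcategory("1980s births", "1987 births")
index.tag("Example Singer", "1987 births")   # hypothetical article
index.tag("Example Singer", "Folk singers")
index.tag("Example Author", "1980s births")  # hypothetical article

print(sorted(index.members("20th century births")))
# ['Example Author', 'Example Singer']
print(sorted(index.intersect("20th century births", "Folk singers")))
# ['Example Singer']
```

Note that no page ever needs "20th century births" added to it; the broader list is computed at query time.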
All I am asking, in this age of creative technology, is that someone step up and create a system that can meet the needs of as many of our readers as possible.
And all I'm asking is that if we're wanting to create a system, we create a system, we don't try to make the existing one sort-of-useful for everyone at the price of making it not-really-useful to anyone :-)
check out how categories could be used via search as facets at http://futef.com
better tools I think would make the problem more tractable.
derek
WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: http://lists.wikimedia.org/mailman/listinfo/wikien-l
Andrew,
Thank you for all of this. I'm learning.
If the right persons, with the right know-how, really believe a new system is needed, and are committed to creating one - I believe we'll have one in time.
Hey, I'm the long-haired (literally) psych researcher, with his mind in someone else's, needing some outside information; knocking on the library door saying "I need to find some stuff, can you help me locate it?"
Marc
On 5/1/07, Marc Riddell michaeldavid86@comcast.net wrote:
Andrew,
Do you, personally, use the current WP categorization system?
To answer a question not addressed to me, I use it constantly, but they are all template driven date-based work queues or image license categories.
In terms of using it to find content, no of course not. Does anyone? Maybe a hypothetical curious reader with a lot of time on their hands whose goal is to find articles in a way that's a little less random than "random article". It would fit their needs nicely. :)
Judson [[:en:User:Cohesion]]
On 02/05/07, cohesion cohesion@sleepyhead.org wrote:
On 5/1/07, Marc Riddell michaeldavid86@comcast.net wrote:
Do you, personally, use the current WP categorization system?
To answer a question not addressed to me, I use it constantly, but they are all template driven date-based work queues or image license categories.
I do in fact use it to find articles related to an article I'm reading.
Usually I don't go further than the same category. I may occasionally go up or down a category.
- d.
On 5/1/07, Marc Riddell michaeldavid86@comcast.net wrote:
Andrew,
Do you, personally, use the current WP categorization system?
on 5/2/07 11:09 AM, cohesion at cohesion@sleepyhead.org wrote:
To answer a question not addressed to me, I use it constantly, but they are all template driven date-based work queues or image license categories.
Judson,
OK, that's fine. If you are getting what you need - that's what matters.
My needs are a bit different. Repeating (for the umpteenth time :-)): Joe Smith died from Cancer; specifically, liver cancer. I want to be able to call up all of the persons who died from cancer, then call up all who died from liver cancer - two separate searches - two separate lists. As WP is constructed now, to do this, both the Categories "Cancer" and "Liver cancer" must be entered in Joe's Article. It's really not that complicated.
Sidebar: (You wrote, above "To answer a question not addressed to me") I like to address the person I am responding to by name because I feel it adds a touch of personal acknowledgment of that person. Your jumping in to add your view is terrific. Imagine we were in a room with a group of people and someone asked you a question. You are very probably going to turn to look at them and answer. I'm also in the room. When you are finished answering the person's question, if I wanted to add something, or give my own perspective to the question, I would jump right in and speak. I would be speaking to you directly, and you would most likely turn to look at me. Or, I may just speak to the group in general. That's how I see the group interaction on this List. I see the List as a continuous Community Meeting, with people coming and going, and each entering and exiting the various conversations in progress, or starting a new one; each being a part of what's being said, and each having some small stake in what's being discussed. That's Community to me.
Marc
On 5/2/07, Marc Riddell michaeldavid86@comcast.net wrote:
My needs are a bit different. Repeating (for the umpteenth time :-)): Joe Smith died from Cancer; specifically, liver cancer. I want to be able to call up all of the persons who died from cancer, then call up all who died from liver cancer - two separate searches - two separate lists. As WP is constructed now, to do this, both the Categories "Cancer" and "Liver cancer" must be entered in Joe's Article. It's really not that complicated.
True, but if we did that, where would I look for articles about cancer in general? I'm thinking [[Paraneoplastic syndrome]], [[Radiation therapy]], [[Breast cancer]], [[Liver cancer]]. If we did this we'd be getting your [[Marlon Brando]] mixed in with my [[Metastasis]], and nobody wants that.
I think that the lesson is that 'as WP is constructed now' isn't sufficient to serve everyone.
I think that this sort of issue has been mentioned to you several times. Did you have any suggestion as to how this difficulty would be overcome, or are you only promoting that categories ought to work in the way that is most convenient for you? (Which is fine, of course: Quoth Adam Smith: "By pursuing his own interest he frequently promotes that of the society more effectually than when he really intends to promote it.")
Tracy Poff
on 5/2/07 1:50 PM, Tracy Poff at tracy.poff@gmail.com wrote:
True, but if we did that, where would I look for articles about cancer in general? I'm thinking [[Paraneoplastic syndrome]], [[Radiation therapy]], [[Breast cancer]], [[Liver cancer]].
Tracy,
My error. I misspoke. What I meant to say was "both the Categories "Cancer deaths" and "Deaths from Liver cancer" must be entered in Joe's Article".
I think that the lesson is that 'as WP is constructed now' isn't sufficient to serve everyone.
It could be constructed to serve more persons than it does now. Persons who know a lot more than me about this have said so many times.
I think that this sort of issue has been mentioned to you several times. Did you have any suggestion as to how this difficulty would be overcome
If I had these suggestions, I would have offered them a long time ago. That's why I am reaching out to others who could.
or are you only promoting that categories ought to work in the way that is most convenient for you?
I am promoting a type of system that can be the most useful to the most people. I voiced what I'm dissatisfied with; someone else can do exactly the same. That's how it's supposed to work.
Marc Riddell
On 5/2/07, Marc Riddell michaeldavid86@comcast.net wrote:
Judson,
OK, that's fine. If you are getting what you need - that's what matters.
My needs are a bit different. Repeating (for the umpteenth time :-)): Joe Smith died from Cancer; specifically, liver cancer. I want to be able to call up all of the persons who died from cancer, then call up all who died from liver cancer - two separate searches - two separate lists. As WP is constructed now, to do this, both the Categories "Cancer" and "Liver cancer" must be entered in Joe's Article. It's really not that complicated.
Sorry for my sarcasm in comparing categories to random article, I was actually agreeing with you, that categories are only useful for things like date-based work queues, which I assume was not the reason they were made.
I think for what you want they aren't that useful. And I think that MANY MORE people would use it in the way you want than in the way I'm using it. Currently they are useful for wikipedians doing specific workflows. Ideally they could be used for actually getting useful information out of wikipedia.
I will preface this all with saying I don't take much part in the on-wiki category discussions, but here's how I would fix it:
1. Forget any concept of hierarchy (except when obviously useful). It's never going to be a large scale hierarchy, so don't try.
2. Stop worrying about having too many categories on a particular article. Maybe every single person *should* be in [[Category:People]]; why not?
That or use semantic mediawiki and say [[died from::cancer]] in the article, but no one likes that idea for some reason.
Judson [[:en:User:Cohesion]]
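As a very rough illustration of the [[died from::cancer]] idea, the sketch below pulls `[[property::value]]` annotations out of page text and builds a lookup from them. It only mimics the annotation syntax quoted above; it is not Semantic MediaWiki's actual parser or query engine, and the function names and the "Jane Doe" page are invented for the example.

```python
import re
from collections import defaultdict

# Matches inline annotations of the form [[property::value]].
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)\]\]")


def extract_properties(wikitext):
    """Map each property name to the set of values annotated in the text."""
    props = defaultdict(set)
    for name, value in ANNOTATION.findall(wikitext):
        props[name.strip().lower()].add(value.strip().lower())
    return props


def build_index(pages):
    """Map (property, value) pairs to the set of page titles carrying them."""
    index = defaultdict(set)
    for title, text in pages.items():
        for name, values in extract_properties(text).items():
            for value in values:
                index[(name, value)].add(title)
    return index


pages = {
    "Joe Smith": "Joe Smith [[died from::cancer]], specifically [[died from::liver cancer]].",
    "Jane Doe": "Jane Doe [[died from::cancer]]; see also [[Radiation therapy]].",
}
index = build_index(pages)
print(sorted(index[("died from", "cancer")]))        # ['Jane Doe', 'Joe Smith']
print(sorted(index[("died from", "liver cancer")]))  # ['Joe Smith']
```

In this toy version Joe's page still carries both annotations; a fuller system could presumably infer the broader value from a hierarchy of values, but that is beyond this sketch.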
On 02/05/07, cohesion cohesion@sleepyhead.org wrote:
I will preface this all with saying I don't take much part in the on-wiki category discussions, but here's how I would fix it:
1. Forget any concept of hierarchy (except when obviously useful). It's never going to be a large scale hierarchy, so don't try.
2. Stop worrying about having too many categories on a particular article. Maybe every single person *should* be in [[Category:People]]; why not?
What you've done here is basically create a tag system with the ability to have tags relate to each other. Which is good and funky and I'd use it too, but... well, we could just create a separate one rather than tearing down the existing categories. Duplicate the category code, change it a bit, implement a new namespace, fiddle around with the CSS, Bob's your uncle.
That or use semantic mediawiki and say [[died from::cancer]] in the article, but no one likes that idea for some reason.
This always struck me as like a dentist complaining no-one ever wants to replace all their teeth with undecayable plastic ones - it's so much better! But they all complain about the inconvenience... ;-)
On 5/2/07, Andrew Gray shimgray@gmail.com wrote:
What you've done here is basically create a tag system with the ability to have tags relate to each other. Which is good and funky and I'd use it too, but... well, we could just create a separate one rather than tearing down the existing categories. Duplicate the category code, change it a bit, implement a new namespace, fiddle around with the CSS, Bob's your uncle.
Sounds great :)
That or use semantic mediawiki and say [[died from::cancer]] in the article, but no one likes that idea for some reason.
This always struck me as like a dentist complaining no-one ever wants to replace all their teeth with undecayable plastic ones - it's so much better! But they all complain about the inconvenience... ;-)
But think of it from the dentist's point of view! :D I realize I'm in the minority on this, and appreciate that input. I lean towards tech solutions for most things, and am sure if everyone was like me wikipedia would be a much smaller crappier place. :) Incidentally that analogy is one of the funnier things I have read today :)
On 03/05/07, cohesion cohesion@sleepyhead.org wrote:
On 5/2/07, Andrew Gray shimgray@gmail.com wrote:
What you've done here is basically create a tag system with the ability to have tags relate to each other. Which is good and funky and I'd use it too, but... well, we could just create a separate one rather than tearing down the existing categories. Duplicate the category code, change it a bit, implement a new namespace, fiddle around with the CSS, Bob's your uncle.
Sounds great :)
I am deeply suspicious that it should be that conceptually simple - I must have skipped something. Any bored developers want to try hacking up a MediaWiki tagging system?
On 5/2/07, Marc Riddell michaeldavid86@comcast.net wrote:
Judson,
OK, that's fine. If you are getting what you need - that's what matters.
My needs are a bit different. Repeating (for the umpteenth time :-)): Joe Smith died from Cancer; specifically, liver cancer. I want to be able to call up all of the persons who died from cancer, then call up all who died from liver cancer - two separate searches - two separate lists. As WP is constructed now, to do this, both the Categories "Cancer" and "Liver cancer" must be entered in Joe's Article. It's really not that complicated.
on 5/2/07 2:00 PM, cohesion at cohesion@sleepyhead.org wrote:
Sorry for my sarcasm in comparing categories to random article, I was actually agreeing with you, that categories are only useful for things like date-based work queues, which I assume was not the reason they were made.
I think for what you want they aren't that useful. And I think that MANY MORE people would use it in the way you want than in the way I'm using it. Currently they are useful for wikipedians doing specific workflows. Ideally they could be used for actually getting useful information out of wikipedia.
I will preface this all with saying I don't take much part in the on-wiki category discussions, but here's how I would fix it:
1. Forget any concept of hierarchy (except when obviously useful). It's never going to be a large scale hierarchy, so don't try.
2. Stop worrying about having too many categories on a particular article. Maybe every single person *should* be in [[Category:People]]; why not?
That or use semantic mediawiki and say [[died from::cancer]] in the article, but no one likes that idea for some reason.
Judson,
No sarcasm taken :-). I welcomed your input, and thank you for this. I believe there are many more persons out there frustrated with the present Category System. I have read this on many a Talk Page. I'm just trying to persuade them to speak up in a forum where they can be heard by the most persons. My needs are very specific, and others have different ones. Many of these needs are not being met by the present system. I hope all of our needs are heard by those that have the ability to meet them.
Marc
On 5/1/07, Andrew Gray shimgray@gmail.com wrote:
On 30/04/07, charles.r.matthews@ntlworld.com charles.r.matthews@ntlworld.com wrote:
Marc Riddell wrote
This is what I am asking for in WP. That is why both the main and sub categories need to be entered into each Article.
I recall you saying all this some time ago, a propos your own view of what would be convenient. I'm not clear this is actually convenient for most users of Wikipedia, i.e. to have large chunks of nested categories made explicit.
Can you not just accept that the system doesn't revolve about your needs?
I am going to be blunt and agree here; I don't think "the category system is fundamentally broken" can reasonably come from "because it doesn't work the way I think it ought". My subtle attempts to intimate this don't seem to have worked, and I want to be clear that I think we're all barking up the wrong tree. There are flaws in our category system, but this is not one of them.
This is the basic issue here. Marc thinks categories should work in a way that conforms to his expectations - essentially, an undifferentiated list of all things with attribute X. Currently, categories are differentiated lists - topics split up into smaller sublists with each page hopefully only appearing once in any given topic-group.
If we change the current system, things will be convenient for Marc; it will be more useful as a database. However, my experience is that most of our readers aren't looking for a database - they're looking for relatively focused, specific, categorisation for navigational purposes, where a tightly topical category of 20-50 articles is substantially more useful than a grand supercategory of 2000-5000.
(Yes, we could have the tight topical categories in Marc's model - but at the cost of swamping pages with references to a huge number of categories which are redundant to one degree or another, and just make navigation that much more tricky for the user)
Somewhere down thread, the mystical expertise of librarians was invoked. I am one, and I believe that tight categorisation is the way to go. I feel that Marc's model, if implemented in the simple quick-fix method of "just include parent and child categories in the same article" will actually make our categorisation less useful for general readers and editors, which in no way justifies the limited benefit of being able to do fancy searches on articles by subject attributes.
We can get search in other ways; improving the methods we use to search, having some pseudo-database functions we can do with categories, would go a long way towards the desired effect. Categorisation is, however, used by our readers, and we shouldn't break it without a very pressing reason.
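As a rough illustration of the "pseudo-database functions" mentioned above, here is a minimal Python sketch of a query that walks the category tree instead of requiring every parent category to be listed on each article. The category names, the two dictionaries, and the function `articles_under` are all made up for this example; this is not how MediaWiki stores or queries categories.

```python
from collections import deque

# Hypothetical toy data: each category lists its direct subcategories and
# the articles filed directly in it (each article appears once per topic).
SUBCATS = {
    "Plants": ["Sages", "Mints"],
    "Sages": [],
    "Mints": [],
}
ARTICLES = {
    "Plants": ["Botany"],
    "Sages": ["Salvia officinalis", "Salvia divinorum"],
    "Mints": ["Mentha spicata"],
}


def articles_under(category):
    """Return every article in `category` or any of its descendants."""
    seen, found = {category}, set()
    queue = deque([category])
    while queue:
        current = queue.popleft()
        found.update(ARTICLES.get(current, []))
        for child in SUBCATS.get(current, []):
            if child not in seen:  # guard against category loops
                seen.add(child)
                queue.append(child)
    return sorted(found)
```

Under these assumptions, `articles_under("Plants")` gives the flat "everything with attribute X" list that Marc wants, while each article page still carries only its most specific category.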
--
- Andrew Gray
andrew.gray@dunelm.org.uk
I personally don't think it's broken *because* it doesn't work the way I want it to work. I think it's broken because no-one can explain to me how it works, and I've received at least two entirely contradictory answers about how it works every single time I've asked.
However, my experience is that most of our readers aren't looking for a database - they're looking for relatively focused, specific, categorisation for navigational purposes, where a tightly topical category of 20-50 articles is substantially more useful than a grand supercategory of 2000-5000.
A question I've asked a million times: how do you categorize things that come in already-created categories that have more than 50 members? For example, plant families can't be categories, because too many have over 50 members; horticultural varieties of a specific species cannot be categories either, because you can't have more than 50ish members, the "substantially useful" size of a category. How precisely should varieties of sage be broken up to conform to the category scheme, and doesn't this wind up being original research when dealing with organism categories?
I'd be generally fine with any coherent scheme, because it simply could not be as frustrating as no one knowing how the current scheme works.
KP
K P wrote:
A question I've asked a million times: how do you categorize things that come in already-created categories that have more than 50 members? For example, plant families can't be categories, because too many have over 50 members; horticultural varieties of a specific species cannot be categories either, because you can't have more than 50ish members, the "substantially useful" size of a category. How precisely should varieties of sage be broken up to conform to the category scheme, and doesn't this wind up being original research when dealing with organism categories?
50 members is not a hard limit, just like 32kb is not a hard limit on article size. If there's a large category that cannot be reasonably subdivided, then it cannot be subdivided and that's that. A very large category is still more useful than one that's been arbitrarily split up or that doesn't exist at all.
In the case of plant families, though, I don't see why they can't be divided into genera subcategories should the need arise.
On 02/05/07, K P kpbotany@gmail.com wrote:
However, my experience is that most of our readers aren't looking for a database - they're looking for relatively focused, specific, categorisation for navigational purposes, where a tightly topical category of 20-50 articles is substantially more useful than a grand supercategory of 2000-5000.
A question I've asked a million times: how do you categorize things that come in already-created categories that have more than 50 members? For example, plant families can't be categories, because too many have over 50 members; horticultural varieties of a specific species cannot be categories either, because you can't have more than 50ish members, the "substantially useful" size of a category. How precisely should varieties of sage be broken up to conform to the category scheme, and doesn't this wind up being original research when dealing with organism categories?
No, no, no, no. "20-50" was an arbitrary number I pulled out of the air - it's just saying that, from a usability perspective, a category of a few dozen is better than a category of a few thousand. This is not an attempt to legislate size!
We shouldn't be *artificially* subdividing categories - we will always have some unwieldy categories, ones which can't fundamentally be broken down any easier. There is nothing wrong with having the occasional overly-large category, as long as there is a reason for that - the reason here being "it would not be helpful to subdivide further".
Categorise on the most granular scale that is useful and practical. If you can't usefully divide a category below a thousand members, then leave it with a thousand members - but most categories can, and should, be broken down well before you reach that point.
On 5/2/07, Andrew Gray shimgray@gmail.com wrote:
On 02/05/07, K P kpbotany@gmail.com wrote:
However, my experience is that most of our readers aren't looking for a database - they're looking for relatively focused, specific, categorisation for navigational purposes, where a tightly topical category of 20-50 articles is substantially more useful than a grand supercategory of 2000-5000.
A question I've asked a million times: how do you categorize things that come in already-created categories that have more than 50 members? For example, plant families can't be categories, because too many have over 50 members; horticultural varieties of a specific species cannot be categories either, because you can't have more than 50ish members, the "substantially useful" size of a category. How precisely should varieties of sage be broken up to conform to the category scheme, and doesn't this wind up being original research when dealing with organism categories?
No, no, no, no. "20-50" was an arbitrary number I pulled out of the air - it's just saying that, from a usability perspective, a category of a few dozen is better than a category of a few thousand. This is not an attempt to legislate size!
We shouldn't be *artificially* subdividing categories - we will always have some unwieldy categories, ones which can't fundamentally be broken down any easier. There is nothing wrong with having the occasional overly-large category, as long as there is a reason for that - the reason here being "it would not be helpful to subdivide further".
Categorise on the most granular scale that is useful and practical. If you can't usefully divide a category below a thousand members, then leave it with a thousand members - but most categories can, and should, be broken down well before you reach that point.
--
- Andrew Gray
andrew.gray@dunelm.org.uk
Unfortunately, you're a librarian and you see and think through the obvious this way, but try telling that sometimes to the folks at Commons, where, if it doesn't fit on a single article page, it CANNOT be a category.
Yes, families can be broken down alphabetically, but why should they, if there is no relationship among members solely because of their place in the alphabet? There's no need to fear ignorance in categorizing plants, as botanists write taxonomy in Latin, so even other botanists are ignorant of what is going on and must repeatedly consult codes.
If you break down a family into genera, then you can wind up with something from one of the big families, where you have 5 genera with 100 members each and a couple of thousand genera or categories with only one member each, again, you haven't done anything useful.
See what happens when you add a professional into the mix to discuss their area, though, instead of letting amateurs try to group think their way out of it? It's simple, there wind up being some huge categories according to the need, but most are more discrete. But that's not what happens on Wikipedia, or Wikimedia Commons, where I've been waiting 2 weeks now for someone to explain categories to me, after they spent the past few months beating me up verbally for not using them their way-of-the-day-of-a-particular-editor/administrator.
The other issue, though, is, do the Wikipedia users use categories to find information? IF this is the case, then they might be built differently from how they would be if they were only internally used by editors. As the categories are listed on the article page, I suspect this is the intention, but I get argued down on this, no one should ever categorize something for the use of the reader, again, especially in Commons, but also in Wikipedia, categories don't exist for users. Then why display them in article space?
KP
On 02/05/07, K P kpbotany@gmail.com wrote:
Categorise on the most granular scale that is useful and practical. If you can't usefully divide a category below a thousand members, then leave it with a thousand members - but most categories can, and should, be broken down well before you reach that point.
Unfortunately, you're a librarian and you see and think through the obvious this way, but try telling that sometimes to the folks at Commons, where, if it doesn't fit on a single article page, it CANNOT be a category.
Categorisation on Commons is an *entirely* different kettle of fish, one which as far as I am aware is in flux right now, and one I don't even begin to try to pretend to understand. Perhaps this might be better asked to commons-l?
The other issue, though, is, do the Wikipedia users use categories to find information? IF this is the case, then they might be built differently from how they would be if they were only internally used by editors. As the categories are listed on the article page, I suspect this is the intention, but I get argued down on this, no one should ever categorize something for the use of the reader, again, especially in Commons, but also in Wikipedia, categories don't exist for users. Then why display them in article space?
On Wikipedia, they exist for readers. I see readers using them. As a reader, I use them (occasionally).
On Commons, the entire concept of what a user is, what a user is looking for, is different. I wouldn't like to try to insist both work in the same way.
On 02/05/07, Andrew Gray shimgray@gmail.com wrote:
Categorisation on Commons is an *entirely* different kettle of fish, one which as far as I am aware is in flux right now, and one I don't even begin to try to pretend to understand. Perhaps this might be better asked to commons-l?
The category tree approach really doesn't work for Commons. Commons seriously needs a tag system with complex queries.
It also needs a search that doesn't suck. The new Mayflower search is like AltaVista compared to Lycos; no-one's managed the Google "search that doesn't suck" breakthrough.
- d.
On 5/2/07, Andrew Gray shimgray@gmail.com wrote:
On 02/05/07, K P kpbotany@gmail.com wrote:
Categorise on the most granular scale that is useful and practical. If you can't usefully divide a category below a thousand members, then leave it with a thousand members - but most categories can, and should, be broken down well before you reach that point.
Unfortunately, you're a librarian and you see and think through the obvious this way, but try telling that sometimes to the folks at Commons, where, if it doesn't fit on a single article page, it CANNOT be a category.
Categorisation on Commons is an *entirely* different kettle of fish, one which as far as I am aware is in flux right now, and one I don't even begin to try to pretend to understand. Perhaps this might be better asked to commons-l?
The other issue, though, is, do the Wikipedia users use categories to find information? IF this is the case, then they might be built differently from how they would be if they were only internally used by editors. As the categories are listed on the article page, I suspect this is the intention, but I get argued down on this, no one should ever categorize something for the use of the reader, again, especially in Commons, but also in Wikipedia, categories don't exist for users. Then why display them in article space?
On Wikipedia, they exist for readers. I see readers using them. As a reader, I use them (occasionally).
On Commons, the entire concept of what a user is, what a user is looking for, is different. I wouldn't like to try to insist both work in the same way.
--
- Andrew Gray
andrew.gray@dunelm.org.uk
(I can't convince anyone in Commons of that, that they use categories differently from how they're used at Wikipedia, and there is no point of Commons-l, because no one at Commons knows what is going on. However, that is a whole 'nother kettle of fish.)
So, back to Wikipedia. If they exist for readers, how they are used and created is different from how they are used and created if they are only for meta data.
And all I'm asking is that if we're wanting to create a system, we create a system, we don't try to make the existing one sort-of-useful for everyone at the price of making it not-really-useful to anyone :-)
I agree with this. Still, I don't think Marc can conceive of just how big this is. I've worked on databases, writing them, small ones, tiny ones, minuscule one-topic ones, and this is a huge thing to ask of a group of volunteers, in my opinion.
However, I do believe that if you created a system, whatever it was, it would be much more useful to everyone than the existing non-system, simply because it was structured and had systematic design. Nothing we use on Wikipedia right now fits this.
KP
K P wrote:
If you break down a family into genera, then you can wind up with something from one of the big families, where you have 5 genera with 100 members each and a couple of thousand genera or categories with only one member each, again, you haven't done anything useful.
I expect the way this would be done according to existing common practice on Wikipedia would be to create subcategories for those five genera with lots of members, and then the remaining thousand species that each belong to their own genera would remain under the root family category. People don't generally create categories that will only ever hold one or two articles, there's no point.
That's assuming all of those species even get articles, of course.
Bryan Derksen wrote:
K P wrote:
If you break down a family into genera, then you can wind up with something from one of the big families, where you have 5 genera with 100 members each and a couple of thousand genera or categories with only one member each, again, you haven't done anything useful.
I expect the way this would be done according to existing common practice on Wikipedia would be to create subcategories for those five genera with lots of members, and then the remaining thousand species that each belong to their own genera would remain under the root family category. People don't generally create categories that will only ever hold one or two articles, there's no point.
I'm sure we have many category specialists who are not easily deterred by the pointlessness of their efforts. :-)
Ec
Ray Saintonge wrote:
Bryan Derksen wrote:
I expect the way this would be done according to existing common practice on Wikipedia would be to create subcategories for those five genera with lots of members, and then the remaining thousand species that each belong to their own genera would remain under the root family category. People don't generally create categories that will only ever hold one or two articles, there's no point.
I'm sure we have many category specialists who are not easily deterred by the pointlessness of their efforts. :-)
True, but this does fortunately describe the most common practice. Usually a category will have sub-categories for anything "common enough to be worth it", and less common stuff will go in the parent category, as sort of a catch-all "other stuff that goes here". For example, there are lots of people from Dresden, so we have a [[Category:People from Dresden]] subcategory of [[Category:People from Saxony]]. But people from small towns in Saxony just go in the main People from Saxony category; there's no [[Category:People from Zörbig]] subcategory with [[Johann Jakob Reiske]] as the sole member.
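The "common enough to be worth it" practice described above can be sketched in Python as follows. The function `diffuse`, the `min_size` threshold of 3, and the placeholder names "Person A" to "Person C" are assumptions made for illustration only; Wikipedia has no fixed threshold, and editors decide this case by case.

```python
from collections import defaultdict


def diffuse(parent, members, min_size=3):
    """Split `members` (article -> subgroup name) into subcategories.

    Subgroups with at least `min_size` articles get their own subcategory;
    everything else stays in `parent` as the catch-all.
    """
    groups = defaultdict(list)
    for article, subgroup in members.items():
        groups[subgroup].append(article)

    layout = {parent: []}
    for subgroup, articles in groups.items():
        if len(articles) >= min_size:
            layout[subgroup] = sorted(articles)
        else:
            layout[parent].extend(articles)
    layout[parent].sort()
    return layout


# Hypothetical input: three placeholder people from Dresden, one from Zörbig.
members = {
    "Person A": "People from Dresden",
    "Person B": "People from Dresden",
    "Person C": "People from Dresden",
    "Johann Jakob Reiske": "People from Zörbig",
}
layout = diffuse("People from Saxony", members)
```

With this toy input, Dresden gets its own subcategory and Reiske stays directly in "People from Saxony", since a one-member Zörbig category would serve no purpose.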
-Mark
On 04/05/07, Delirium delirium@hackish.org wrote:
For example, there are lots of people from Dresden, so we have a [[Category:People from Dresden]] subcategory of [[Category:People from Saxony]]. But people from small towns in Saxony just go in the main People from Saxony category; there's no [[Category:People from Zörbig]] subcategory with [[Johann Jakob Reiske]] as the sole member.
Not really related to your direct point here, but this reminds me. At present, the category system seems to imply[1] that people from Dresden are also Alpine countries and categories requiring diffusion. It seems to me that we need a mechanism to differentiate between direct inheritance (A is an instance of B) and relatedness (A is within group B but is not an instance of B itself).
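One way to picture the distinction drawn above is to label each parent link with its kind and follow only the "instance of" links when asking what something is. The Python sketch below does that; the edge list, the labels `is_a` and `related`, and the function `ancestors` are invented for this example and do not correspond to any existing MediaWiki feature.

```python
# Hypothetical edges: (child, parent, kind). "is_a" means every member of the
# child is an instance of the parent; "related" is only a topical link.
EDGES = [
    ("People from Dresden", "People from Saxony", "is_a"),
    ("People from Saxony", "German people", "is_a"),
    ("People from Saxony", "Saxony", "related"),
    ("Saxony", "States of Germany", "is_a"),
]


def ancestors(category, kinds=("is_a",)):
    """Collect ancestors reachable using only edges of the given kinds."""
    result, stack = set(), [category]
    while stack:
        current = stack.pop()
        for child, parent, kind in EDGES:
            if child == current and kind in kinds and parent not in result:
                result.add(parent)
                stack.append(parent)
    return result
```

Following every link regardless of kind reproduces the odd result that people from Dresden end up under "States of Germany"; restricting the walk to `is_a` edges avoids it.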
I guess this is a roundabout way of saying I don't think this will be worked out until we get Semantic MediaWiki.
[1] http://tools.wikimedia.de/~dapete/catgraph/graph.php?wiki=wikipedia&lang...
On 04/05/07, Earle Martin wikipedia@downlode.org wrote:
I guess this is a roundabout way of saying I don't think this will be worked out until we get Semantic MediaWiki.
Until we come up with a Semantic MediaWiki syntax that's intuitively usable by mere mortals.
- d.
On 04/05/07, David Gerard dgerard@gmail.com wrote:
On 04/05/07, Earle Martin wikipedia@downlode.org wrote:
I guess this is a roundabout way of saying I don't think this will be worked out until we get Semantic MediaWiki.
Until we come up with a Semantic MediaWiki syntax that's intuitively usable by mere mortals.
Well, yes.
Also, thinking about it, I came to the conclusion that what we need is a separate editable ontology that defines the relationships between classes of item. It could probably be kept in its own namespace to leave that sort of thing well out of the view of those selfsame mere mortals.
Er, maybe this is the wrong mailing list for me to be wibbling about this sort of thing, sorry.
On 5/4/07, David Gerard dgerard@gmail.com wrote:
On 04/05/07, Earle Martin wikipedia@downlode.org wrote:
I guess this is a roundabout way of saying I don't think this will be worked out until we get Semantic MediaWiki.
Until we come up with a Semantic MediaWiki syntax that's intuitively usable by mere mortals.
I don't buy this.
First, this is the syntax:
George Washington was an [[nationality::American]] President.
That is usable by a good number of people. Secondly, that example is doing it the hard way. The current infoboxes could be changed so that they include this information when the user uses them in exactly the same way they do now. Currently the article [[George Washington]] has a line in the infobox
| nationality=American
This could remain exactly the same and have the template changed to include this as a semantic statement.
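To make the infobox idea concrete, here is a small Python sketch of the mapping from an existing parameter line to a semantic statement in the `[[property::value]]` form shown above. In practice the change would be made inside the template's own wikitext rather than in Python; the regular expression and the function `to_semantic` are hypothetical and exist only to show the transformation.

```python
import re

# Matches an infobox parameter line such as "| nationality=American".
PARAM_LINE = re.compile(r"^\|\s*(?P<name>[^=|]+?)\s*=\s*(?P<value>.*?)\s*$")


def to_semantic(line):
    """Turn one infobox parameter line into a [[property::value]] statement.

    Returns None for lines that are not parameters or have an empty value.
    """
    match = PARAM_LINE.match(line.strip())
    if not match or not match.group("value"):
        return None
    return "[[{name}::{value}]]".format(**match.groupdict())
```

The editor keeps typing `| nationality=American` exactly as before, and the semantic annotation is produced without any new syntax to learn.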
My biggest objection is to the idea that we should wait until things are easy in general. Not everything has to be easy. Not everyone has to worry about the semantic aspects at all. If we waited until parserfunctions were easy to use, would we have the super useful templates we have today? I don't care if a new user can edit [[Template:Backlognav]]; it is useful regardless.
http://en.wikipedia.org/w/index.php?title=Template:Backlognav&action=edi... or http://en.wikipedia.org/w/index.php?title=Template:Coord/input/dec&actio...
Templates using semantic mediawiki would be drastically less complicated than many of the templates using parserfunctions we use today. If the servers can't handle the increased load that some of the queries might have that's one thing, and maybe we should think about disabling that aspect, but assuming everything has to be dead simple before it can be useful holds up progress.
Judson [[:en:User:Cohesion]]
Bryan Derksen wrote:
K P wrote:
If you break down a family into genera, then you can wind up with something from one of the big families, where you have 5 genera with 100 members each and a couple of thousand genera or categories with only one member each, again, you haven't done anything useful.
I expect the way this would be done according to existing common practice on Wikipedia would be to create subcategories for those five genera with lots of members, and then the remaining thousand species that each belong to their own genera would remain under the root family category. People don't generally create categories that will only ever hold one or two articles, there's no point.
on 5/3/07 9:16 PM, Ray Saintonge at saintonge@telus.net wrote:
I'm sure we have many category specialists who are not easily deterred by the pointlessness of their efforts. :-)
:-)
Marc
On 5/2/07, Bryan Derksen bryan.derksen@shaw.ca wrote:
K P wrote:
If you break down a family into genera, then you can wind up with something from one of the big families, where you have 5 genera with 100 members each and a couple of thousand genera or categories with only one member each, again, you haven't done anything useful.
I expect the way this would be done according to existing common practice on Wikipedia would be to create subcategories for those five genera with lots of members, and then the remaining thousand species that each belong to their own genera would remain under the root family category. People don't generally create categories that will only ever hold one or two articles, there's no point.
That's assuming all of those species even get articles, of course.
Then you've artificially, uniquely and originally to Wikipedia created groups that don't exist elsewhere. Is this acceptable, that you group organisms in ways they aren't already grouped, and when do you do it, and when not? The problem is that these genera are probably already sorted in some level of groups, tribes or other, that aren't necessarily used. So, what if two of the big genera belong to one group, but they've been given their own category, and some of the singular genera, belong in various other groups, but have been grouped together, artificially, and originally, by Wikipedia editors?
The problem is that categories in taxonomy mean something, whether Linnaean or phylogenetic. Nature didn't sort them like Wikipedia wants them, in nice tidy groups. When you group organisms you are implying that they belong together for some reason, such as they are evolutionarily closer to each other than to members of other groups. Any time you use a classification system based on something else, you can't extrapolate a different type of classification system into what you are doing. If organisms are categorized according to taxonomical systems, then Wikipedia editors can't come in and, because of the convenience of or need for categorization, add a layer of unrelated groupings to the system.
In botany we create categories all the time that will only ever hold one or two articles simply because of this, we categorize by families into orders, and some orders have many families, others only one, we classify genera into families, and some families have 20,000 genera, others only one.
Taxonomical systems group organisms based upon morphological similarities or upon evolutionary relationships. Nature didn't order evolution by numbers, only 10 allowed here, 20 there.
So, either we use existing taxonomical systems, and then Wikipedia has to commit to not altering them (what the librarian suggested so readily), or we don't use existing taxonomical systems of categorizing organisms and simply make up our own original system. But what we can't do is use part taxonomy and part something we use to accommodate categorization on Wikipedia.
KP
K P wrote:
On 5/2/07, Bryan Derksen bryan.derksen@shaw.ca wrote:
K P wrote:
If you break down a family into genera, then you can wind up with something from one of the big families, where you have 5 genera with 100 members each and a couple of thousand genera or categories with only one member each, again, you haven't done anything useful.
I expect the way this would be done according to existing common practice on Wikipedia would be to create subcategories for those five genera with lots of members, and then the remaining thousand species that each belong to their own genera would remain under the root family category. People don't generally create categories that will only ever hold one or two articles, there's no point.
That's assuming all of those species even get articles, of course.
Then you've artificially, uniquely and originally to Wikipedia created groups that don't exist elsewhere.
No? I'm not entirely sure what you're saying here, but if my interpretation is correct (that it's "original research" to create subcategories for genera that have lots of members but not those that have only a few members) then I simply disagree. It's not original research to note that genus W has a hundred articles on its members while X, Y and Z have only one each, and create a category structure that has a subcategory for W but not X, Y and Z.
Is this acceptable, that you group organisms in ways they aren't already grouped, and when do you do it, and when not?
Wikipedia's category system has all manner of groupings that don't appear to have been done by other sources, so yes, it seems it's acceptable.
The problem is that these genera are probably already sorted in some level of groups, tribes or other, that aren't necessarily used. So, what if two of the big genera belong to one group, but they've been given their own category, and some of the singular genera, belong in various other groups, but have been grouped together, artificially, and originally, by Wikipedia editors?
Now I'm really not sure what you're saying. If a species is grouped in multiple different ways by scientists, why not have multiple different category structures to hold them in? That's done frequently on Wikipedia as well, for example articles on asteroids are categorized both by spectral class and by orbital characteristics.
The problem is that categories in taxonomy mean something, whether Linnaean or phylogenetic. Nature didn't sort them like Wikipedia wants them, in nice tidy groups. When you group organisms you are implying that they belong together for some reason, such as they are evolutionarily closer to each other than to members of other groups. Any time you use a classification system based on something else, you can't extrapolate a different type of classification system into what you are doing. If organisms are categorized according to taxonomical systems, then Wikipedia editors can't come in and, because of the convenience of or need for categorization, add a layer of unrelated groupings to the system.
But Wikipedia's categories were never _intended_ to exactly represent Linnaean or phylogenetic classifications. They're a way of grouping _articles_, not animals or rocks or what have you. It's only because in this case the articles are each about a specific type of animal or rock that the category structure winds up mirroring the other classification systems so closely.
In botany we create categories all the time that will only ever hold one or two articles simply because of this, we categorize by families into orders, and some orders have many families, others only one, we classify genera into families, and some families have 20,000 genera, others only one.
Taxonomical systems group organisms based upon morphological similarities or upon evolutionary relationships. Nature didn't order evolution by numbers, only 10 allowed here, 20 there.
So, either we use existing taxonomical systems, and then Wikipedia has to commit to not altering them (what the librarian suggested so readily), or we don't use existing taxonomical systems of categorizing organisms and simply make up our own original system. But what we can't do is use part taxonomy and part something we use to accommodate categorization on Wikipedia.
I think what we've done is make up our own system that happens to be based closely on existing taxonomy, with modifications that make it more convenient for its function of grouping encyclopedia articles. Why can't we do this? It's already widely implemented.