Hi Research-l,
My impression is that volunteers on Commons and ENWP spend a lot of time on categorization. I have seen references to analyses of how categorization is done, but I can't recall seeing an analysis of how much use readers make of categories on Commons and ENWP. My guess is that readers often use categories on Commons for media searches, but that ENWP categories are rarely used by readers, although maybe WMF Discovery uses categories to inform search results. Is there data that shows how extensively readers on ENWP and Commons use categories?
Thanks, Pine ( https://meta.wikimedia.org/wiki/User:Pine )
Hi Pine,
On Wed, May 23, 2018 at 9:46 PM, Pine W wiki.pine@gmail.com wrote:
Hi Research-l,
My impression is that volunteers on Commons and ENWP spend a lot of time on categorization. I have seen references to analyses of how categorization is done, but I can't recall seeing an analysis of how much use readers make of categories on Commons and ENWP. My guess is that readers often use categories on Commons for media searches, but that ENWP categories are rarely used by readers, although maybe WMF Discovery uses categories to inform search results. Is there data that shows how extensively readers on ENWP and Commons use categories?
I don't know of recent (or old) studies on this topic, but there are at least a few other things we know that can help you think about whether it's useful to work on the category network in different projects.
Categories are used by (at least) three different groups:
* Editors
* Readers
* Machines
We don't know all the use-cases that categories have for these groups. It seems that generally editors use them to organize their work and make the article space more navigable, readers use them to explore content (in a more serendipitous way), and machines use them extensively for a variety of applications. [We do miss published work about what I just said, btw, and I really hope we or someone else writes more about it in the coming year or two. :)]
While we're trying to figure out what the exact answers for the first two groups are, it's helpful to think about the last group:
The Wikipedia category network, with its known caveats, has been used extensively by researchers to build new insights and technologies. A lot of research on alignment of text across languages (which is in turn used in building dictionaries and automatic translation tools) takes advantage of this (for the most part) human-curated categorization of articles. It's an important side-product of building the encyclopedia (and other projects). I'll give you a couple of examples (non-comprehensive); feel free to dig into the literature reviews of these papers for more:
* The usage of Wikipedia category network for telling apart classes from instances: https://dl.acm.org/authorize.cfm?key=N655914 (a necessary step in knowledge base creation)
* In building YAGO: http://www2007.wwwconference.org/papers/paper391.pdf
* Using Wikipedia category network for building section recommendation systems for Wikipedia: https://arxiv.org/pdf/1804.05995.pdf . Check, for example, http://gapfinder.wmflabs.org/en.wikipedia.org/v1/section/article/Barack_Obam...
There is significant value in the Wikipedia category network; I would not discourage editors from building it. I do hope they know what value this work brings to, at least, the research and scientific community.
Best, Leila
_______________________________________________ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Hello,
A very interesting question. From my experience and talks with readers, I have the impression that readers usually take no notice of the categories. I could not find out why, since the category system may indeed be useful for at least some use cases.
When it comes to Commons, I would be very interested to learn how many readers (or recipients) are actually non Wikipedia editors.
Kind regards Ziko
2018-05-24 19:09 GMT+02:00 Leila Zia leila@wikimedia.org:
[...]
Ziko van Dijk, 24/05/2018 23:08:
When it comes to Commons, I would be very interested to learn how many readers (or recipients) are actually non Wikipedia editors.
It would be useful to consider less common but high-value usage, for instance people looking for illustrations for a publication. Such searches could be substitutes for specialised (and expensive) databases, so the value provided by Commons categories may be higher than the mere usage numbers suggest. (It should be measured in hours saved or something like that.)
Federico
I do outreach including training. From that, I am inclined to agree that readers don’t use categories. People who come to edit training are (unsurprisingly) generally already keen readers of Wikipedia, but categories seem to be something they first learn about in edit training. Indeed, one of my outreach offerings is just a talk about Wikipedia, which includes tips for getting more out of the reader experience, like categories, What Links Here, and lots of things that are in plain view on the standard desktop interface but where people aren't looking.
Also many categories exist in parallel with List-of articles and navboxes, which do more-or-less-but-not-exactly the same thing. It may be that readers are more likely to stumble on the lists or see the navbox entries (particularly if the navbox renders open). But all in all, I still think most readers enter Wikipedia via search engines and then progress further through Wikipedia by link clicking and using the Wikipedia search box as their principal navigation tools.
Editors use categories principally to increase their edit count (cynical but it's hard to think otherwise given what I see on my watchlist); there's an awful lot of messing about with categories for what seems to be very little benefit to the reader (especially as readers don't seem to use them). And the lack of obvious ways to intersect categories (petscan is wonderful but neither readers nor most editors know about it) leads to the never-ending creation of cross-categorisation like
https://en.wikipedia.org/wiki/Category:19th-century_Australian_women_writers
which is pretty clearly the intersection of 4 category trees that probably should be independent: nationality, sex, occupation, time frame. Sooner or later it will inevitably be further subcategorised into
1870s British-born-Australian cis-women poets
First-Monday-in-the-month Indian-born Far-North-Queensland cis-women-with-male-pseudonym romantic-sonnet-poets :-)
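As a rough illustration of the point above about keeping the category trees independent and intersecting them on demand, here is a minimal Python sketch. The category names, article titles and the `intersect_categories` helper are all hypothetical, invented for this example; this is not how petscan or MediaWiki works internally, only the set arithmetic that a category-intersection tool would have to perform.

```python
# Hypothetical sample data: each independent category maps to a set of
# article titles. None of these titles refer to real articles.
CATEGORIES = {
    "Australian people": {"Writer A", "Writer B", "Painter C"},
    "Women": {"Writer A", "Painter C", "Writer D"},
    "Writers": {"Writer A", "Writer B", "Writer D"},
    "19th-century people": {"Writer A", "Writer B", "Painter C", "Writer D"},
}


def intersect_categories(categories, names):
    """Return the articles that belong to every category listed in names."""
    if not names:
        return set()
    member_sets = [categories[name] for name in names]
    return set.intersection(*member_sets)


# Nationality, sex, occupation and time frame are kept as separate
# categories and combined only at query time.
result = intersect_categories(
    CATEGORIES,
    ["Australian people", "Women", "Writers", "19th-century people"],
)
print(sorted(result))
```

With intersection available at query time, a combined category such as "19th-century Australian women writers" would not need to be maintained by hand; whether that trade-off suits readers and editors is a separate question.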
Obviously categories do have some uses to editors. If you have a source that provides you with some information about some aspect of a group of topics, it can be useful to work your way through each of the entries in the category updating it accordingly.
Machines. Yes, absolutely. I use AWB, and doing things across a category (and the recursive closure of a category) is my primary use-case for AWB. My second use-case for AWB is working across the uses of a template (template/infobox use is a de facto category and indeed is a third thing that often parallels a category but, unlike lists and navboxes, this form is invisible to the reader).
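For readers unfamiliar with the term, the "recursive closure of a category" mentioned above is the category plus all of its subcategories, their subcategories, and so on. The following Python sketch shows one way to compute it with a breadth-first traversal. The `SUBCATEGORIES` graph and the `category_closure` function are hypothetical and are not taken from AWB; the sketch also assumes the graph may contain cycles, so it tracks visited categories.

```python
from collections import deque

# Hypothetical subcategory graph: parent category -> direct subcategories.
SUBCATEGORIES = {
    "Writers": ["Poets", "Novelists"],
    "Poets": ["Sonnet poets"],
    "Novelists": [],
    "Sonnet poets": ["Poets"],  # deliberate cycle, to show it is handled
}


def category_closure(root, subcategories):
    """Return root together with every category reachable below it."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in subcategories.get(current, []):
            if child not in seen:  # skip categories already visited
                seen.add(child)
                queue.append(child)
    return seen


closure = category_closure("Writers", SUBCATEGORIES)
print(sorted(closure))
```

A tool would then process the articles in every category of the closure, instead of only those placed directly in the root category.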
With Commons, again, I don't think readers go there, most haven't even heard of it. It's mainly editors at work there and I think they do use categories. The category structure seems to grow there more organically. There is not the constant "let's rename this category worldwide" or the same level of cross-categorisation on Commons that I see on en.Wikipedia.
I note that while we cannot know who is using categories, we can still get page count stats for the category itself. These tend to be close to 0-per-day for a lot of categories (e.g. Town halls in Queensland). Even a category that one might think has much greater interest gets relatively low numbers, e.g. "Presidents of the United States" gets 26-per-day views on average. This compares with 37K daily average for the Donald Trump article, 19K for Barack Obama, and 16K for George Washington. So this definitely suggests that the readers who presumably make up the bulk of the views on the presidential articles are not looking at the obvious category for such folk (although they might be moving between presidential articles using navboxes, succession boxes, lists or other links). Having said that, the Donald Trump article has *53* categories of which Presidents of the United States is number 39 (they appear to be alphabetically ordered), so it is possible that the reader never found the presidential category, which is lost in a sea of categories like "21st century Presbyterians" and "Critics of the European Union". I would really have thought that being in the category Presidents of the USA was slightly more important to the topic of the article than his apparent conversion to Presbyterianism in the 21st century (given he's not categorised as a 20th century Presbyterian).
And, somewhat amazingly, there is no apparent category for "Critics of Donald Trump". I must propose it, along with a fully diffused sub-cat system of Critics of Donald Trump's immigration policies, Critics of Donald Trump's hair, etc. By the time I've added all the relevant articles to those categories, I should have at least another 100K edits to my name!
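To put the figures above in proportion, here is a small Python sketch that expresses the category's daily views as a percentage of each article's daily views. The numbers are the rounded averages quoted in this message; the variable and function names are made up for the example, and the result is only as precise as those rounded inputs.

```python
# Daily averages as quoted in this message (rounded figures).
CATEGORY_DAILY_VIEWS = 26  # "Presidents of the United States" category page
ARTICLE_DAILY_VIEWS = {
    "Donald Trump": 37_000,
    "Barack Obama": 19_000,
    "George Washington": 16_000,
}


def category_share(category_views, article_views):
    """Category page views as a percentage of one article's views."""
    return 100.0 * category_views / article_views


shares = {
    title: category_share(CATEGORY_DAILY_VIEWS, views)
    for title, views in ARTICLE_DAILY_VIEWS.items()
}

for title, share in shares.items():
    print(f"{title}: category views are {share:.2f}% of article views")
```

On these figures the category page receives well under 1% of the views of any single one of the three articles, which is the gap the paragraph above describes.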
Kerry
-----Original Message-----
From: Wiki-research-l [mailto:wiki-research-l-bounces@lists.wikimedia.org] On Behalf Of Federico Leva (Nemo)
Sent: Friday, 25 May 2018 7:14 AM
To: Research into Wikimedia content and communities wiki-research-l@lists.wikimedia.org; Ziko van Dijk zvandijk@gmail.com
Subject: Re: [Wiki-research-l] Reader use of Wikipedia and Commons categories
[...]
Use of interlanguage links and categories varies strongly with placement on the page. We used to be able to see this clearly with the multiple popular skins. Today we can perhaps see this best with the multiple apps and viewers. On WP mobile, surprisingly, readers don't use categories at all!
More seriously: this is a tremendously useful and underutilized slice of wiki knowledge, like the quality and completeness categories, which deserve to be made more visible.
@kerry I expect it isn't for edit count; it is for fixing a facet of knowledge that those editors find critically important (as I do!). Yes, we need something like petscan and intersection to be a standard aspect of on-wiki search: this is precisely the sort of use that good clean categorisation is good for!
categorically yours, sj
On Thu 24 May, 2018, 6:38 PM Kerry Raymond, kerry.raymond@gmail.com wrote:
[...]