Regarding Kerry Raymond's "Patriotic editing hypothesis", I've done a very simple, informal investigation into the quality of geographic articles (mostly on cities, towns, counties, etc.) in en:Wikipedia. Geographic articles have much lower average quality scores than articles on other subjects (see https://en.wikipedia.org/wiki/User:Smallbones/Quality4by4 ). With just a little poking around it's obvious that the quality difference between geo articles and the rest is due to geo articles about countries where English is not the native language. A bit more poking and something that should have been really obvious jumps out: French geo articles on FR:Wiki are much better (or at least longer) than the corresponding EN:Wiki articles; Russian geo articles are much better on RU:Wiki than on EN:Wiki; and so on.
This is certainly consistent with the "Patriotic editing hypothesis" if we define patriotism by language rather than by borders. It could be checked with other language pairs, e.g. German vs. French; Finnish, Estonian, Polish, German, or Hungarian vs. Russian; Chinese vs. any language.
The hypothesis even has a very practical implication: we should translate more geo articles from their native-language Wikipedias.
Hope this helps,
Pete Ekman

====
Date: Tue, 24 Jan 2017 11:12:58 +1000
From: "Kerry Raymond" kerry.raymond@gmail.com
To: "'Research into Wikimedia content and communities'" wiki-research-l@lists.wikimedia.org
Subject: [Wiki-research-l] regional KPIs
Message-ID: 006701d275df$02016b90$060442b0$@gmail.com
Content-Type: text/plain; charset="utf-8"
As previously came up in the discussion about chapters, it would be very useful to have national data about Wikipedia activities, which can (generally) be determined from IP addresses. Now, I understand the privacy argument in relation to logged-in users (though I am not saying I agree with it in relation to aggregate data). However, can we find a proxy that does not raise the privacy considerations?
My hypothesis is that national content is predominantly written by users resident in that nation, and that activity on national content can therefore be used as a proxy for national user editing activity.
In the case of Australia, we could describe Australian national content in either of two ways: articles within the closure of the [[Category:Australia]] and/or those tagged as {{WikiProject Australia}}. There are arguments for and against each: neither is perfect, and in my experience the category closure will tend to have false positives and the project tagging will tend to have false negatives.
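To make "closure" concrete, here is a minimal sketch of collecting the articles under a root category by walking subcategory links. Everything in it is an assumption for illustration: the `category_closure` function, the two dictionaries and the toy category contents are made up, and real data would have to come from the category tables or the API. The `max_depth` parameter is one hypothetical way to limit the false positives that deep category chains tend to produce.

```python
from collections import deque


def category_closure(root, subcats, members, max_depth=None):
    """Return the set of articles reachable from `root` via subcategory links.

    subcats: dict mapping a category name to its direct subcategories.
    members: dict mapping a category name to its direct member articles.
    max_depth: optional limit on how many subcategory levels to descend.
    """
    seen = {root}            # guards against cycles in the category graph
    articles = set()
    queue = deque([(root, 0)])
    while queue:
        cat, depth = queue.popleft()
        articles.update(members.get(cat, ()))
        if max_depth is not None and depth >= max_depth:
            continue         # do not descend any further from this category
        for sub in subcats.get(cat, ()):
            if sub not in seen:
                seen.add(sub)
                queue.append((sub, depth + 1))
    return articles


# Made-up toy data, including a cycle and a plausible false positive.
subcats = {
    "Australia": ["Cities in Australia", "Australian diaspora"],
    "Australian diaspora": ["Diaspora destinations"],
    "Diaspora destinations": ["Australia"],   # cycle back to the root
}
members = {
    "Australia": ["Australia"],
    "Cities in Australia": ["Brisbane", "Perth"],
    "Australian diaspora": ["Australian diaspora"],
    "Diaspora destinations": ["London"],      # not really national content
}

full = category_closure("Australia", subcats, members)
shallow = category_closure("Australia", subcats, members, max_depth=1)
```

In this toy example the full closure picks up "London" through a chain of subcategories, while the depth-limited version does not; the price of a depth limit is that it can also cut off legitimate articles.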
I would like to know what correlation exists between national editor activity (as determined from IP addresses mapped to location) and national content edits and if/how it changes over time for various nations. This is research that only WMF can do because WMF has the IP addresses and the rest of us can't have them for privacy reasons.
If we could establish that a strong enough correlation exists between them, we could use national content activity (for which there is no privacy consideration) as a proxy for national editing activity. And we might even be able to come up with a multiplier for each nation to provide comparable data for national editing activity.
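As a rough illustration of the kind of check this implies, the sketch below computes a Pearson correlation between two monthly series for one nation and a simple multiplier. It is only a sketch under stated assumptions: the function names, the monthly numbers and the choice of "ratio of totals" as the multiplier are all hypothetical, and a real analysis would need the IP-derived series that only WMF holds.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def multiplier(content_edits, editor_edits):
    """Assumed definition: total same-nation editor edits per content edit."""
    return sum(editor_edits) / sum(content_edits)


# Hypothetical monthly counts for one nation (not real data).
content_edits = [1000, 1200, 900, 1500, 1300]   # edits to national content
editor_edits = [2100, 2350, 1900, 3100, 2500]   # edits by same-nation editors

r = pearson(content_edits, editor_edits)
m = multiplier(content_edits, editor_edits)
print(f"correlation: {r:.3f}, multiplier: {m:.2f}")
```

If the correlation were strong and stable over time for a nation, multiplying its publicly visible national content activity by `m` would give an estimate of national editing activity; whether that holds is exactly the open question.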
Now, it may be that we need to restrict the edits themselves in some way to maximise the correlations between national content and same-nation editor activity.
My second hypothesis is that "semantic" edits (e.g. edits that add large amounts of content or citations) to national content will be more highly correlated with same-nation editors than "syntactic" edits (e.g. fixing spelling, punctuation or Manual of Style issues) will be. I suspect most bots and other automated/semi-automated edits are doing syntactic edits.
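One crude way to operationalise this split is sketched below. It is a heuristic under loudly stated assumptions, not an established classifier: the `Edit` record, its fields, the 500-byte threshold and the rule that bot edits count as syntactic are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    bytes_added: int   # net change in page size caused by the edit
    refs_added: int    # number of citations added by the edit
    is_bot: bool       # whether the edit was made by a bot account


def classify(edit, min_bytes=500):
    """Label an edit "semantic" or "syntactic" using simple assumed rules."""
    if edit.is_bot:
        return "syntactic"   # assumption: bot edits are treated as syntactic
    if edit.bytes_added >= min_bytes or edit.refs_added > 0:
        return "semantic"    # large addition or new citation
    return "syntactic"       # small change such as a spelling fix


edits = [
    Edit(bytes_added=1200, refs_added=2, is_bot=False),
    Edit(bytes_added=3, refs_added=0, is_bot=False),
    Edit(bytes_added=5000, refs_added=0, is_bot=True),
]
labels = [classify(e) for e in edits]
```

With labels like these, the correlation with same-nation editors could be computed separately for each class to see whether the semantic edits track national editors more closely.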
Now, some of you will probably be aware of [https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2017-01-17/Recent_research Female Wikipedians aren't more likely to edit women biographies]. So it may well be that my patriotic-editing hypothesis is also untrue. But it would be nice to know one way or the other.
Kerry
A couple of research papers that might be helpful:
1: Hecht, B. and Gergle, D. 2009. Measuring Self-Focus Bias in Community-Maintained Knowledge Repositories. Proceedings of the 2009 International Conference on Communities and Technologies, pp. 11-19. http://www.brenthecht.com/publications/bhecht_CommAndTech2009.pdf
In their paper, Hecht & Gergle study how content in some of the Wikipedia editions is focused on certain countries, and those typically correspond to where the languages are spoken.
2: Warncke-Wang, M., Uduwage, A., Dong, Z., and Riedl, J. "In Search of the Ur-Wikipedia: Universality, Similarity, and Translation in the Wikipedia Inter-language Link Network", in WikiSym 2012. http://www-users.cs.umn.edu/~morten/publications/wikisym2012-urwikipedia.pdf
In this paper we wanted to study similarity based on distance, meaning that we needed to see if we could map a Wikipedia edition to a specific country. It turns out that if you look at the statistics[1], a lot of the language editions get the vast majority of their edits from a single country. While that's not helpful when it comes to the English edition, it arguably solves the problem for quite a few other languages.
Footnotes: 1: https://stats.wikimedia.org/wikimedia/squids/SquidReportPageEditsPerLanguage...
Cheers, Morten
On 24 January 2017 at 07:27, Peter Ekman pdekman@gmail.com wrote: [...]
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Hoi, A similar thing can be found when you look at the history of a country. Articles on the history of Indonesia and Malaysia are much better in the local-language Wikipedias than in English Wikipedia. In the same way, western nobility is much better served in Wikidata than Asian nobility.
This is to be expected.
The point of the original thread is how to measure the effectiveness of a chapter. If you want to give a chapter credit for what it does, you will find that extracting a truth from data is highly problematic when you seek a general rule. Thanks, GerardM
On 24 January 2017 at 16:27, Peter Ekman pdekman@gmail.com wrote: [...]
Hello,
In my 2009 study I looked at some Wikipedias and compared geographical articles with regard to the language in question. There is a theorem by the sociolinguist Heinz Kloss about "eigenbezogene Themen": topics that relate to a linguistic community itself.
According to Kloss, a community is mainly interested in its own language, culture and history, its country and landscape, and also its typical crafts. Kloss argues that there is a relatively rich literature in the community's language about these topics, and much less about other topics such as aeroplane construction.
(I noticed that the University of the Faroe Islands, for example, has courses to educate teachers and also a department for nautical studies and fishing. For other subjects you have to leave the islands, and also your native language.)
In my comparison I checked briefly whether a language version of Wikipedia at least does well in articles about its own linguistic region. For example, someone who is interested in the Dutch province of Friesland will find about as much information in Frisian Wikipedia as in Dutch Wikipedia (at least, in 2008/2009). This was not the case for Corsican and French Wikipedia, with Corsican Wikipedia being much weaker.
I wouldn't call the phenomenon "patriotic editing" because that implies a certain intention that the individual contributors might not have. If I translate Kloss' term, it should be something more like "self-related contributing" or "content relating to one's own (linguistic) community/society".
By the way, I don't think that translations from Wikipedia to Wikipedia are the best way to create good content. An article about Paris in Dutch has to differ from the article in French, as you have a different readership with different backgrounds and interests.
Kind regards Ziko
2017-01-25 8:20 GMT+01:00 Gerard Meijssen gerard.meijssen@gmail.com: [...]