As previously came up in the discussion about chapters, it would be very useful to have national data about Wikipedia activities, which can be determined (generally) from IP addresses. Now I understand the privacy argument in relation to logged-in users (not saying I agree with it, though, in relation to aggregate data). However, can we find a proxy that does not have the privacy considerations?
My hypothesis is that national content is predominantly written by users resident in that nation. And that therefore activity on national content can be used as a proxy for national user editing activity.
In the case of Australia, we could describe Australian national content in either of two ways: articles within the closure of the [[Category:Australia]] and/or those tagged as {{WikiProject Australia}}. There are arguments for and against either (neither is perfect; in my experience the category closure will tend to have false positives and the project tagging will tend to have false negatives).
I would like to know what correlation exists between national editor activity (as determined from IP addresses mapped to location) and national content edits and if/how it changes over time for various nations. This is research that only WMF can do because WMF has the IP addresses and the rest of us can't have them for privacy reasons.
If we could establish that a strong-enough correlation existed between them, we could use national content activity (for which there is no privacy consideration) as a proxy for national editing activity. And we might even be able to come up with a multiplier for each nation to provide comparable data for national editing activity.
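To make the proposed check concrete, here is a minimal sketch of how the correlation and a per-nation multiplier might be computed once both series are available. The monthly counts are made up for illustration, the function names are hypothetical, and fitting the multiplier as a least-squares slope through the origin is only one possible choice, not something the thread prescribes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def multiplier(content_edits, editor_edits):
    """Least-squares slope through the origin: editor_edits ~= k * content_edits."""
    denom = sum(c * c for c in content_edits)
    if denom == 0:
        raise ValueError("no content edits to scale from")
    return sum(c * e for c, e in zip(content_edits, editor_edits)) / denom


# Made-up monthly counts for one hypothetical nation (NOT real data).
content_edits = [100, 120, 90, 150]  # edits to that nation's content
editor_edits = [200, 240, 180, 300]  # edits by editors geolocated to that nation

r = pearson(content_edits, editor_edits)     # strength of the relationship
k = multiplier(content_edits, editor_edits)  # candidate per-nation multiplier
print(f"r = {r:.3f}, multiplier = {k:.3f}")
```

Only the second series needs IP-derived location, so only that part would have to be computed inside WMF; a multiplier is only worth reporting where r is judged strong enough.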
Now, it may be that we need to restrict the edits themselves in some way to maximise the correlations between national content and same-nation editor activity.
My second hypothesis is that "semantic" edits (e.g. edits that add large amounts of content or citations) to national content will be more highly correlated with same-nation editors than "syntactic" edits (e.g. fixing spelling, punctuation or Manual of Style issues) will be. I suspect most bots and other automated/semi-automated edits are doing syntactic edits.
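One rough way to separate the two kinds of edit is sketched below. The rule used here (an edit is "semantic" if it adds at least a threshold number of characters or adds a `<ref>` tag) and the 500-character threshold are assumptions made purely for illustration; the function name is hypothetical, and a real study would need to validate any such heuristic against hand-labelled edits.

```python
import re

# Matches an opening <ref> or <ref name=...> tag, but not <references/>.
REF_TAG = re.compile(r"<ref[\s>]", re.IGNORECASE)


def classify_edit(old_text, new_text, min_added_chars=500):
    """Label an edit 'semantic' or 'syntactic' using a crude, assumed heuristic."""
    added_chars = len(new_text) - len(old_text)
    added_refs = len(REF_TAG.findall(new_text)) - len(REF_TAG.findall(old_text))
    if added_chars >= min_added_chars or added_refs > 0:
        return "semantic"
    return "syntactic"


old = "The town has a populaton of 500."
typo_fix = "The town has a population of 500."
with_citation = typo_fix + "<ref>Census 2016</ref>"

print(classify_edit(old, typo_fix))       # spelling fix only
print(classify_edit(old, with_citation))  # adds a citation
```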
Now, some of you will probably be aware of [https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2017-01-17/Recent_research Female Wikipedians aren't more likely to edit women biographies]. So it may well be that my patriotic-editing hypothesis is also untrue. But it would be nice to know one way or the other.
Kerry
Hoi, What Wikipedia? It is highly likely that articles written about any subject are written by people who know the language involved. This means that all articles about the United States are most likely written in Indonesia when the language is Javanese or in the Netherlands when the language is Dutch. We know from research that was done in the olden days that for some languages there is an emigre community that writes a lot; this was true for Neapolitan.
While I understand the interest in the question, what is it we will benefit from researching this? There is plenty of actionable research we could do. Or to put it more bluntly, when we seek parameters that may drive more editing or quality edits, research will be of benefit. When we want to ensure a more consistent point of view over all our Wikipedias, I would understand the need for research (I have ideas on that one). Thanks, GerardM
On 24 January 2017 at 02:12, Kerry Raymond kerry.raymond@gmail.com wrote:
[...]
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Yes, but when you are one of many English-speaking nations and in a world where English is widely spoken as a 2nd language, it’s hard to know if outreach from your chapter has any impact on en.WP. WMF asks for success metrics / KPIs or whatever you like to call them. Right now it’s hard to gather any evidence. Obviously where there is a high correlation between language and nation, it’s quite plausible that WP contributions in that language have arisen from editor activity in that nation. In Australia, we do not have that situation. Therefore, if there was a known correlation between Australian user activity and Australian content activity, then we could use the content activity as a proxy for editor activity. Right now, I don’t think we have the evidence either way as to whether there would be any validity in that proxy assumption.
My comments follow from the earlier thread about chapters. At the moment we do things in chapters in the hope they “help”. Frankly that could be a big waste of everyone’s time if there is no impact. It’s actionable all right. We might stop doing some things and start doing other things, or we might be motivated to put even more effort into existing things. It could help us determine if a more general program (like 1Lib1Ref) was succeeding or not in different countries, which would be a starting point for trying to understand why it works better in some than others.
Kerry
From: Gerard Meijssen [mailto:gerard.meijssen@gmail.com] Sent: Tuesday, 24 January 2017 3:46 PM To: Kerry Raymond kerry.raymond@gmail.com; Research into Wikimedia content and communities wiki-research-l@lists.wikimedia.org Subject: Re: [Wiki-research-l] regional KPIs
[...]
On Mon, Jan 23, 2017 at 10:50 PM, Kerry Raymond kerry.raymond@gmail.com wrote:
Yes, but when you are one of many English-speaking nations and in a world where English is widely spoken as a 2nd language, it’s hard to know if outreach from your chapter has any impact on en.WP. WMF asks for success metrics / KPIs or whatever you like to call them. Right now it’s hard to gather any evidence.
What I say below is pretty much just process and pointers, not research related:
To the one specific point you raised above: I agree with you, and looking around me I can say that many in WMF do agree with you (and we also know that agreeing is not enough, and we hope to address that soon).
There is a task (https://phabricator.wikimedia.org/T131280) to track the work to release relevant data. Please subscribe to the task to monitor progress if you're interested, and if you want, consider giving it a token if you feel strongly about it. :) Regarding timelines (and keep in mind I'm not in Analytics), Nuria from Analytics says in https://lists.wikimedia.org/pipermail/analytics/2017-January/005654.html that there is hope to have such data out by April 2017. I would keep an eye on https://www.mediawiki.org/wiki/Wikimedia_Engineering/2016-17_Q4_Goals#Analyt... to see if this task gets prioritized for Q4, which is the quarter starting April 2017 and ending June 2017. Even if it doesn't get prioritized, it may get done, but it's always more reassuring if it does.
Best, Leila
"closure of the [[Category:Australia]]" is not going to work. In en.wiki subcategories are not subsets in any mathematical sense and the category tree has many, many loops and no roots.
cheers stuart
-- ...let us be heard from red core to black sky
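The loop problem can be illustrated on a toy graph. The sketch below walks subcategories breadth-first with a visited set (so loops terminate) and a depth limit (so the walk cannot drift arbitrarily far from the starting category). The category names, the `subcats` mapping, and the function name are made up for illustration. It handles only the loops; it does nothing about subcategories not being subsets, so false positives would remain.

```python
from collections import deque


def bounded_closure(subcats, root, max_depth):
    """Return the categories reachable from root within max_depth subcategory steps."""
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        cat, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for child in subcats.get(cat, ()):
            if child not in seen:  # the visited set is what breaks loops
                seen.add(child)
                queue.append((child, depth + 1))
    return seen


# Hypothetical toy category graph containing a loop back to the start.
toy = {
    "Australia": ["Australian people", "Geography of Australia"],
    "Australian people": ["Australian emigrants"],
    "Australian emigrants": ["Australia"],  # loop
    "Geography of Australia": [],
}

print(sorted(bounded_closure(toy, "Australia", 1)))
print(sorted(bounded_closure(toy, "Australia", 10)))
```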
Which is why we have Wikidata?
I think this will be important for us as a baseline to measure all sorts of things regarding chapter activity as well. Australia is probably worse than the Netherlands in terms of regional editing activity, and I have said before that we have a major problem finding US editors in the "fly-over states".
However, regarding your last comment "Now, some of you will probably be aware of [https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2017-01-17/Recent_research Female Wikipedians aren't more likely to edit women biographies]." I think it is important to put this in perspective:
I spoke with the woman who wrote that and we talked about some of the reasons this is true, including the fact that most top names in any profession are male because of systemic bias during and after the lives of those men (years after their death, their names come down to us in history because their names are the ones recorded, etc.). In general, anyone starting out on Wikipedia is more likely to have their edits stick around if those edits are non-controversial and meet the standards of Wikipedia, which is mostly true for edits based on reliable sources about men. Many notable women with biographies on Wikipedia are mentioned only in passing in leading historical sources. Only savvy Wikipedians are able to craft such biographies with proper sourcing to save them from deletion. So this study also shows the difficulty in writing about women on Wikipedia, not necessarily the lack of interest in writing about them. I think it is a very interesting study, but the same conclusion can also be made for other marginalized groups of Wikipedia editors, such as men living in Africa being more likely to write about Western males than African males, etc., or in Australia's case, Aboriginal men being more likely to write about non-Aboriginal men, etc.