Which is why we have Wikidata?

On Tue, Jan 24, 2017 at 8:03 AM, Stuart A. Yeates <syeates@gmail.com> wrote:
"closure of the [[Category:Australia]]" is not going to work. In en.wiki, subcategories are not subsets in any mathematical sense, and the category tree has many, many loops and no roots.

cheers
stuart

--
...let us be heard from red core to black sky

On Tue, Jan 24, 2017 at 2:12 PM, Kerry Raymond <kerry.raymond@gmail.com> wrote:

As previously came up in the discussion about chapters, it would be very useful to have national data about Wikipedia activities, which can generally be determined from IP addresses. I understand the privacy argument in relation to logged-in users (though I am not saying I agree with it in relation to aggregate data). However, can we find a proxy that does not raise the same privacy considerations?

My hypothesis is that national content is predominantly written by users resident in that nation, and that activity on national content can therefore be used as a proxy for national user editing activity.

In the case of Australia, we could describe Australian national content in either of two ways: articles within the closure of [[Category:Australia]] and/or those tagged as {{WikiProject Australia}}. There are arguments for and against each. Neither is perfect: in my experience the category closure will tend to have false positives and the project tagging will tend to have false negatives.
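To make "closure" concrete, here is a minimal sketch in Python of collecting every category reachable from a starting category. It assumes the subcategory links have already been loaded into a plain dictionary (the `toy` data and the names `category_closure`, `subcats` and `max_depth` are all hypothetical, not any real API). Because category graphs can contain loops, the traversal remembers what it has already visited, and an optional depth limit is one possible way to cut down on false positives.

```python
from collections import deque


def category_closure(subcats, root, max_depth=None):
    """Return the set of categories reachable from `root`.

    `subcats` maps a category name to a list of its direct subcategories.
    Each category is visited at most once, so loops cannot cause an
    infinite traversal. `max_depth` (an assumed tuning knob) limits how
    many subcategory hops away from `root` we are willing to go.
    """
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand categories at the depth limit
        for child in subcats.get(category, ()):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return seen


# Made-up toy graph containing a loop back to the starting category.
toy = {
    "Australia": ["Australian culture", "Geography of Australia"],
    "Australian culture": ["Australian cuisine"],
    "Australian cuisine": ["Australia"],
    "Geography of Australia": [],
}

full_closure = category_closure(toy, "Australia")
shallow_closure = category_closure(toy, "Australia", max_depth=1)
```

The depth limit is only a heuristic; it does nothing about subcategories that are not true subsets of their parent.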

I would like to know what correlation exists between national editor activity (as determined from IP addresses mapped to location) and national content edits and if/how it changes over time for various nations. This is research that only WMF can do because WMF has the IP addresses and the rest of us can’t have them for privacy reasons.

If we could establish that a strong-enough correlation existed between them, we could use national content activity (for which there is no privacy consideration) as a proxy for national editing activity. And we might even be able to come up with a multiplier for each nation to provide comparable data for national editing activity.
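As a rough illustration of what that analysis might look like, here is a small Python sketch. It assumes two aligned series per nation, e.g. monthly counts of edits to national content and monthly counts of edits by editors located in that nation. The numbers below are invented purely for illustration and are not real data, and the function names are hypothetical. It uses a Pearson correlation and, as one simple assumed definition of the "multiplier", the ratio of the two totals.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def multiplier(content_edits, editor_edits):
    """Assumed definition: total same-nation editor edits per content edit."""
    return sum(editor_edits) / sum(content_edits)


# Invented example figures, one value per month (NOT real data).
content_edits = [100, 120, 90, 150]   # edits to national content
editor_edits = [200, 240, 180, 300]   # edits by editors located in that nation

r = pearson(content_edits, editor_edits)
m = multiplier(content_edits, editor_edits)
```

Whether a ratio of totals is the right multiplier, or whether something like a regression slope would serve better, is exactly the kind of question the real data would have to answer.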

Now, it may be that we need to restrict the edits themselves in some way to maximise the correlations between national content and same-nation editor activity.

My second hypothesis is that “semantic” edits (e.g. edits that add large amounts of content or citations) to national content will be more highly correlated with same-nation editors than “syntactic” edits (e.g. fixing spelling, punctuation or Manual of Style issues) will be. I suspect most bots and other automated/semi-automated edits are syntactic edits.

Now, some of you will probably be aware of [https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2017-01-17/Recent_research Female Wikipedians aren't more likely to edit women biographies]. So it may well be that my patriotic-editing hypothesis is also untrue. But it would be nice to know one way or the other.

Kerry


_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


