Hello Sneha!

You may already know about our private pageview_hourly dataset, which contains hourly pageview data split by a variety of dimensions, including geolocated country and the Wikipedia Zero carrier, if any.

For unregistered/IP edits, the IP address is permanently and publicly saved in the revision history, and you could easily pass those addresses to a geolocation database to get a generally accurate idea of which country they were made from. You could link this data to other information about the edit, such as its content or whether it was made using the mobile website.
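
To make that concrete, here is a rough Python sketch of the lookup step. It is only an illustration under some assumptions: the small table below is a made-up stand-in for a real geolocation database (the ranges are reserved documentation blocks and the country codes are arbitrary), and the "user" field name and the helper names are hypothetical choices for the example rather than anything we provide.

```python
import ipaddress
from collections import Counter

# Hypothetical lookup table: CIDR block -> ISO country code.
# These are reserved documentation ranges with arbitrary country codes;
# a real analysis would load an actual geolocation database instead.
TOY_GEO_TABLE = {
    "192.0.2.0/24": "IN",
    "198.51.100.0/24": "KE",
    "2001:db8::/32": "BD",
}

NETWORKS = [
    (ipaddress.ip_network(cidr), country)
    for cidr, country in TOY_GEO_TABLE.items()
]


def country_for_ip(address):
    """Return the country code for an IP string, or None if unknown or not an IP."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return None  # e.g. a registered username rather than an IP address
    for network, country in NETWORKS:
        if ip.version == network.version and ip in network:
            return country
    return None


def edits_per_country(revisions):
    """Count revisions per country.

    Assumes each revision is a dict whose "user" field holds the IP address
    for unregistered edits (an assumption made for this example).
    """
    counts = Counter()
    for rev in revisions:
        country = country_for_ip(rev["user"])
        if country is not None:
            counts[country] += 1
    return counts


# Made-up revision records, purely for illustration.
sample_revisions = [
    {"user": "192.0.2.17", "size": 1200},
    {"user": "192.0.2.99", "size": 340},
    {"user": "2001:db8::1", "size": 87},
    {"user": "SomeRegisteredUser", "size": 5000},
]

if __name__ == "__main__":
    print(dict(edits_per_country(sample_revisions)))
```

Registered usernames simply fall through as None here, which is consistent with the point that this approach only covers unregistered edits.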

For registered edits, there is the private geowiki dataset, which from 2012 to 2018 saved an aggregate number of edits per project per country (it has been replaced by the newer, better geoeditors dataset), although it does not have any data about Zero status specifically. There is no way to link this data to information about individual edits, since we only keep per-editor/per-edit geolocation data for 90 days.

Since you've already collaborated with the Research team, you probably know that you would need to set up a formal research collaboration with them to get access to either of the private datasets I mentioned.

Hope that helps!

On Mon, 25 Mar 2019 at 09:38, Sneha Narayan <snehanarayan@gmail.com> wrote:
Hello everyone!

I'm a CS professor at Carleton College (formerly a PhD student at Northwestern), and I've collaborated with folks at WMF on WP research in the past. Most notably, I was the lead author on a paper written with Jonathan Morgan and Jake Orlowitz evaluating the Wikipedia Adventure. I hope to continue having productive collaborations with other people who care about WP, and keep producing research that supports the future of the project.

A potential research idea I'd like to explore is understanding whether and how Wikipedia Zero impacted the amount and nature of participation on WP in the projects that were affected by its rollout. Since the Wikipedia Zero program lasted during a particular time period and then ended, it also sets up a potentially good avenue for a comparative study.

I was wondering if any of you were aware of any datasets that log information about the rollout of/participation in Wikipedia Zero. Specifically, some of the data I'm interested in include:

- Countries that had access to WP Zero, including dates/times that this was turned on and off
- Any information about whether access through WP Zero meant that you could only visit/edit particular parts or language editions of Wikipedia (I don't think this was the case, but I wanted to make sure)
- Edits made to any WP language edition by IP addresses from those countries during that time period
- Whether those edits were being made using a device that accessed WP through WP Zero
- The kind of device being used for editing
- Other generic features of the edit (number of characters, namespace, registered/unregistered, etc.)

I'm aware that the data may not be recorded exactly along those lines, but I was still curious to know what data about Wikipedia Zero is out there, and whether or not it was publicly available.

Thank you for your help!
-Sneha


Sneha Narayan
Department of Computer Science
Carleton College
snehanarayan.com
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics