Hello Sneha!
You may already know about our private pageview_hourly dataset
<https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly>,
which contains hourly pageview data split by a variety of dimensions,
including geolocated country and the Wikipedia Zero carrier, if any.
For unregistered/IP edits, the IP address is permanently, publicly saved in
revision history, and you could easily pass those addresses to a
geolocation database to get a generally accurate idea of which country they
were made from. You could link this data to other information about the edit,
such as its content or whether it was made using the mobile website.
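In case it is useful, here is a minimal Python sketch of that idea. It assumes you have already pulled revisions into a list of dicts with a "user" field (a hypothetical shape, not a specific export format) and that you supply your own lookup_country function backed by whatever geolocation database you choose. Both names are made up for illustration.

```python
import ipaddress
from collections import Counter


def is_ip_editor(user):
    """Return True if the recorded user name parses as an IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(user)
        return True
    except ValueError:
        return False


def edits_per_country(revisions, lookup_country):
    """Count unregistered (IP) edits per country.

    revisions: iterable of dicts with a "user" key (assumed shape).
    lookup_country: hypothetical callable mapping an IP string to a
        country code, or None when the address cannot be located.
    """
    counts = Counter()
    for rev in revisions:
        user = rev.get("user", "")
        if not is_ip_editor(user):
            continue  # registered account: no public IP to geolocate
        counts[lookup_country(user) or "unknown"] += 1
    return counts


# Toy example using reserved documentation addresses and a fake lookup table.
sample = [
    {"user": "192.0.2.1"},
    {"user": "Alice"},
    {"user": "2001:db8::1"},
    {"user": "192.0.2.1"},
]
fake_db = {"192.0.2.1": "IN"}
print(edits_per_country(sample, fake_db.get))
```

Keeping the lookup as a parameter means the same counting code works with any geolocation source, and each revision dict can carry extra fields (size, namespace, tags) if you want to join on them later.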
For registered edits, there is the private geowiki
<https://wikitech.wikimedia.org/wiki/Analytics/Archive/Geowiki> dataset
which from 2012 to 2018 saved an aggregate number of edits per project per
country (it has been replaced by the newer, better geoeditors
<https://wikitech.wikimedia.org/wiki/Analytics/Systems/Geoeditors>
dataset), although it does not have any data about Zero status
specifically. There is no way to link this data to information about the
edits since we only keep per-editor/per-edit geolocation data for 90 days.
Since you've already collaborated with the Research team, you probably know
that you would need to set up a formal research collaboration
<https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations>
with them to get access to either of the private datasets I mentioned.
Hope that helps! 😁
On Mon, 25 Mar 2019 at 09:38, Sneha Narayan <snehanarayan(a)gmail.com> wrote:
Hello everyone!
I'm a CS professor at Carleton College (formerly a PhD student at
Northwestern), and I've collaborated with folks at WMF on WP research in
the past. Most notably, I was the lead author on a paper written with
Jonathan Morgan and Jake Orlowitz evaluating the Wikipedia Adventure
<https://dl.acm.org/citation.cfm?id=2998307>. I hope to continue having
productive collaborations with other people who care about WP, and keep
producing research that supports the future of the project.
A potential research idea I'd like to explore is understanding whether and
how Wikipedia Zero impacted the amount and nature of participation on WP in
the projects that were affected by its rollout. Since the Wikipedia Zero
program lasted during a particular time period and then ended, it also sets
up a potentially good avenue for a comparative study.
I was wondering if any of you were aware of any datasets that log
information about the rollout of/participation in Wikipedia Zero.
Specifically, some of the data I'm interested in include:
- Countries that had access to WP Zero, including dates/times that this
was turned on and off
- Any information about whether access through WP Zero meant that you
could only visit/edit particular parts or language editions of Wikipedia (I
don't think this was the case, but I wanted to make sure)
- Edits made to any WP language edition by IP addresses from those
countries during that time period
- Whether those edits were being made using a device that accessed WP
through WP Zero
- The kind of device being used for editing
- Other generic features of the edit (number of characters, namespace,
registered/unregistered, etc.)
I'm aware that the data may not be recorded exactly along those lines, but
I was still curious to know what data about Wikipedia Zero is out there,
and whether or not it was publicly available.
Thank you for your help!
-Sneha
*Sneha Narayan*
Department of Computer Science
Carleton College
snehanarayan.com
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics