Hello everyone,
I would like to carry out a study of how users work across different projects in the Wikimedia ecosystem. I can’t find any dataset containing the usernames and user IDs across all the projects, or at least those of users with a global account. I’ve tried Quarry, but the query needed to get all the data is too big, so it is not really a solution. Can anybody point me to a resource I can download and process myself, e.g. a dataset of global account users, or the whole user database table in a form that can be queried in Quarry?
Thanks, Alessandro
–––
Alessandro Piscopo
Web and Internet Science Group
School of Electronics and Computer Science
University of Southampton
email: A.Piscopo@soton.ac.uk
Hi Alessandro,
Usernames are now unique across Wikimedia projects, so it is possible to simply union or intersect the usernames from any set of projects to measure the overlap. There are, however, as you have seen, a huge number of users, although most are not active in any given month. One approach I have used [1] is to monitor the recent changes feeds of multiple projects for a period of time and look at the overlap within that data, but that obviously does not speak to longer-term trends. For those, I would first define a suitable activity metric (e.g., one or more edits per month [2], or 5 or more edits [3], etc.) to get a list of active editors per project.
[1] Hale, S. A. (2014). Multilinguals and Wikipedia editing. In Proceedings of the 6th Annual ACM Web Science Conference, WebSci ’14, ACM. http://www.scotthale.net/pubs/?websci2014
[2] https://stats.wikimedia.org/v2/#/en.wikipedia.org/contributing/editors/norma...
[3] https://meta.wikimedia.org/wiki/Research:Wikistats_metrics/Editors
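As a rough illustration of the suggestion above, here is a minimal Python sketch. It assumes you already have one record per edit as a (project, username, month) tuple; the function names, the project codes, the sample data, and the threshold of 5 edits are all hypothetical choices for the example, not part of any Wikimedia tool or dataset. It relies only on the fact that usernames are unique across projects, so the same string can be matched directly.

```python
from collections import Counter, defaultdict


def active_editors(edits, min_edits=5):
    """Map (project, month) to the set of usernames with at least
    min_edits edits on that project in that month.

    `edits` is an iterable of (project, username, month) tuples,
    one tuple per edit (an assumed input format).
    """
    counts = Counter(edits)  # number of edits per (project, username, month)
    active = defaultdict(set)
    for (project, user, month), n in counts.items():
        if n >= min_edits:
            active[(project, month)].add(user)
    return dict(active)


def overlap(active, project_a, project_b, month):
    """Usernames active on both projects in the given month."""
    a = active.get((project_a, month), set())
    b = active.get((project_b, month), set())
    return a & b


# Hypothetical sample data: Alice is active on both wikis, Bob on only one.
edits = (
    [("enwiki", "Alice", "2018-07")] * 6
    + [("dewiki", "Alice", "2018-07")] * 5
    + [("enwiki", "Bob", "2018-07")] * 5
    + [("dewiki", "Bob", "2018-07")] * 2
)
active = active_editors(edits)
print(sorted(overlap(active, "enwiki", "dewiki", "2018-07")))  # ['Alice']
```

Running the same intersection for each month gives a simple time series of cross-project editors; changing min_edits switches between the activity definitions in [2] and [3].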
Best wishes, Scott
On Fri, Aug 3, 2018 at 2:52 PM Piscopo A. <A.Piscopo@soton.ac.uk> wrote:
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Hi Scott,
Thank you very much, the paper is definitely relevant to what I am doing. However, I would like to look at how the behaviour of multilinguals evolves over time, so I will have to find another solution for getting usernames. I’ll keep you and the community updated on future developments.
Cheers, Alessandro
On 3 Aug 2018, at 15:43, Scott Hale <computermacgyver@gmail.com> wrote: