A quick and dirty solution might be to use the HostBot list from the
Teahouse at
https://en.wikipedia.org/wiki/Wikipedia:Teahouse/Hosts/Database_reports
The list is refreshed regularly, so you could pull the account names from
it over the course of a month and then randomly select your sample. Note
that it is biased towards new editors who have made more than 10 edits.
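
As a rough sketch of the pooling and sampling step, assuming you have
already saved the account names from each refresh of the list as a plain
list of strings (the function name and the snapshot format are made up for
illustration; parsing the report page itself is not shown):

```python
import random


def sample_new_editors(snapshots, k=100, seed=None):
    """Pool usernames from several list snapshots and draw a random sample.

    `snapshots` is an iterable of lists of usernames, one list per refresh
    of the report collected during the month (hypothetical input format).
    """
    # Deduplicate: the same editor may appear in more than one snapshot.
    # Sorting makes the result reproducible for a given seed.
    pool = sorted(
        {name.strip() for snap in snapshots for name in snap if name.strip()}
    )
    rng = random.Random(seed)
    # Never ask for more names than the pool holds.
    return rng.sample(pool, min(k, len(pool)))
```

Passing a fixed `seed` lets you reproduce the same monthly sample later. The
bias towards editors with more than 10 edits is inherited from the list and
is not corrected here.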
Otherwise, perhaps you could use recent changes, filtered for logged
actions by new users?
https://en.wikipedia.org/wiki/Special:RecentChanges?userExpLevel=newcomer&a…
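
If you would rather script this than use the web interface, one related
option (my assumption, not the same filter as the link above) is the
MediaWiki API's `list=logevents` query with `letype=newusers`, which lists
account creations in a time window. Note that this samples newly registered
accounts, many of which never edit, so you may need to filter further. The
helper names and the User-Agent string below are made up for illustration:

```python
import json
import random
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def newusers_params(start, end, limit=500):
    """Build query parameters for account-creation log entries.

    `start` and `end` are ISO 8601 timestamps, e.g. "2019-03-01T00:00:00Z".
    """
    return {
        "action": "query",
        "format": "json",
        "list": "logevents",
        "letype": "newusers",
        "lestart": start,
        "leend": end,
        "ledir": "newer",  # enumerate from start towards end
        "leprop": "user|timestamp",
        "lelimit": str(limit),
    }


def fetch_new_users(start, end):
    """Yield usernames from the log, following API continuation."""
    params = newusers_params(start, end)
    while True:
        url = API_URL + "?" + urllib.parse.urlencode(params)
        req = urllib.request.Request(
            url, headers={"User-Agent": "new-editor-sampler/0.1 (sketch)"}
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        for event in data.get("query", {}).get("logevents", []):
            if "user" in event:  # the user field can be hidden
                yield event["user"]
        if "continue" not in data:
            break
        params.update(data["continue"])


def reservoir_sample(names, k, seed=None):
    """Draw k items uniformly from a stream without storing all of it."""
    rng = random.Random(seed)
    sample = []
    for i, name in enumerate(names):
        if i < k:
            sample.append(name)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = name
    return sample
```

Something like
`reservoir_sample(fetch_new_users("2019-03-01T00:00:00Z", "2019-04-01T00:00:00Z"), 100)`
would then give one month's sample without holding the whole log in memory.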
On Wed, 13 Mar 2019 at 04:49, Haifeng Zhang <haifeng1(a)andrew.cmu.edu> wrote:
Hi folks,
My work needs to randomly sample new editors in each month, e.g., 100
editors per month.
Do any of you have good suggestions for how to do this efficiently?
I can think of using the dump files, but I wonder whether there are other options.
Thanks,
Haifeng Zhang
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l