A quick-and-dirty solution might be to use the hostbot list from the Teahouse at https://en.wikipedia.org/wiki/Wikipedia:Teahouse/Hosts/Database_reports. The list is refreshed regularly, so you could pull the account names from it over the course of a month and then randomly select your sample. Note that such a sample is biased towards new editors who have made more than 10 edits.
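A minimal Python sketch of that approach, assuming you have saved the account names from each refresh of the report as a list of usernames (the `snapshots` argument and the function name are hypothetical). It pools the names, drops duplicates, and draws a simple random sample. It does not correct for the more-than-10-edits bias mentioned above.

```python
import random


def sample_from_snapshots(snapshots, k, seed=None):
    """Pool usernames from several saved snapshots of a report, drop
    duplicates, and draw a simple random sample of up to k names.

    Assumption: `snapshots` is an iterable of lists of usernames, e.g. one
    list per refresh of the Teahouse hosts report during a month.
    """
    # Deduplicate, then sort so a fixed seed gives the same sample every run.
    pool = sorted({name for snap in snapshots for name in snap})
    rng = random.Random(seed)
    # Never ask for more names than the pool holds.
    return rng.sample(pool, min(k, len(pool)))


if __name__ == "__main__":
    snaps = [["Alice", "Bob", "Carol"], ["Bob", "Dave"], ["Erin", "Alice"]]
    print(sample_from_snapshots(snaps, 3, seed=42))
```

Sorting the pooled names before sampling matters because Python's iteration order over a set of strings can change between runs, which would otherwise make a seeded sample non-reproducible.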
Otherwise, perhaps use Recent Changes, filtered for logged actions by new users? https://en.wikipedia.org/wiki/Special:RecentChanges?userExpLevel=newcomer&am...
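If you would rather script this than page through Special:RecentChanges, one possible alternative (a swapped-in technique, not what the link above does) is the MediaWiki Action API's `list=logevents` module with `letype=newusers`, which lists account creations. Below is a rough Python sketch assuming the public English Wikipedia endpoint; the function names and the User-Agent string are placeholders. Keep in mind that this samples newly registered accounts, many of which never edit, so you would still need to filter for accounts that actually made edits.

```python
import json
import random
import urllib.parse
import urllib.request
from collections import defaultdict

API = "https://en.wikipedia.org/w/api.php"  # assumed endpoint


def new_accounts(start, end, api=API):
    """Yield (username, timestamp) for account creations between two
    ISO 8601 timestamps, following the API's continuation."""
    params = {
        "action": "query",
        "list": "logevents",
        "letype": "newusers",
        "leprop": "user|timestamp",
        "ledir": "newer",  # oldest first, so lestart is the earlier bound
        "lestart": start,
        "leend": end,
        "lelimit": "max",
        "format": "json",
        "formatversion": "2",
    }
    while True:
        url = api + "?" + urllib.parse.urlencode(params)
        # Placeholder User-Agent: replace with your own tool name and contact.
        req = urllib.request.Request(
            url, headers={"User-Agent": "new-editor-sampling-sketch/0.1 (placeholder)"}
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        for ev in data["query"]["logevents"]:
            if "user" in ev:  # the user field can be hidden on some entries
                yield ev["user"], ev["timestamp"]
        if "continue" not in data:
            break
        params.update(data["continue"])  # carry continuation tokens forward


def sample_per_month(records, k, seed=None):
    """Group (user, timestamp) records by calendar month ("YYYY-MM") and
    draw up to k distinct users from each month."""
    by_month = defaultdict(set)
    for user, ts in records:
        by_month[ts[:7]].add(user)
    rng = random.Random(seed)
    return {
        month: rng.sample(sorted(users), min(k, len(users)))
        for month, users in sorted(by_month.items())
    }
```

For example, `sample_per_month(new_accounts("2019-02-01T00:00:00Z", "2019-03-01T00:00:00Z"), 100)` would draw up to 100 accounts created in February 2019. A busy month can mean many paginated requests (ordinary clients get at most 500 entries per request), so saving the fetched records before sampling is probably worthwhile.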
On Wed, 13 Mar 2019 at 04:49, Haifeng Zhang haifeng1@andrew.cmu.edu wrote:
Hi folks,
My work requires randomly sampling new editors each month, e.g., 100 editors per month.
Do any of you have good suggestions for how to do this efficiently?
I could use the dump files, but I wonder whether there are other options.
Thanks,
Haifeng Zhang
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l