Haifeng,
While some suggest the dumps or notice boards, my immediate thought was a database query, e.g., through Quarry. As it happens, Jonathan T. Morgan has created such a query there:
https://quarry.wmflabs.org/query/310
SELECT user_id, user_name, user_registration, user_editcount
FROM enwiki_p.user
WHERE user_registration > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 DAY), '%Y%m%d%H%i%s')
  AND user_editcount > 10
  AND user_id NOT IN (
    SELECT ug_user
    FROM enwiki_p.user_groups
    WHERE ug_group = 'bot')
  AND user_name NOT IN (
    SELECT REPLACE(log_title, "_", " ")
    FROM enwiki_p.logging
    WHERE log_type = "block"
      AND log_action = "block"
      AND log_timestamp > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 2 DAY), '%Y%m%d%H%i%s'));
You may fork from that query. As another example, there is R. Stuart Geiger (Staeiou)'s fork, which queries for a month: https://quarry.wmflabs.org/query/34256
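For the 100-editors-per-month case, a fork might look roughly like the sketch below. It is only an illustration, not one of the linked queries: the helper name monthly_sample_query and its parameters sample_size and min_edits are made up here, the month window is my own construction, and ORDER BY RAND() LIMIT n is just one simple way to draw a random sample in MariaDB. It also assumes that "new editor" means an account registered in that month with at least one edit; adjust min_edits if you mean something else. The script only builds the SQL text, which can then be pasted into Quarry.

```python
from datetime import date


def monthly_sample_query(year: int, month: int,
                         sample_size: int = 100, min_edits: int = 0) -> str:
    """Build a query that samples accounts registered in one calendar month.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    start = date(year, month, 1)
    # First day of the following month (exclusive upper bound).
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    # MediaWiki stores timestamps as YYYYMMDDHHMMSS strings.
    start_ts = start.strftime("%Y%m%d") + "000000"
    end_ts = end.strftime("%Y%m%d") + "000000"
    return (
        "SELECT user_id, user_name, user_registration, user_editcount\n"
        "FROM enwiki_p.user\n"
        f"WHERE user_registration >= '{start_ts}'\n"
        f"  AND user_registration < '{end_ts}'\n"
        # Assumption: an "editor" has more than min_edits edits.
        f"  AND user_editcount > {int(min_edits)}\n"
        "  AND user_id NOT IN (\n"
        "    SELECT ug_user FROM enwiki_p.user_groups WHERE ug_group = 'bot')\n"
        # Random order plus LIMIT gives a simple random sample of the month.
        "ORDER BY RAND()\n"
        f"LIMIT {int(sample_size)};"
    )


if __name__ == "__main__":
    # Example: print the query for February 2019.
    print(monthly_sample_query(2019, 2))
```

Note that ORDER BY RAND() sorts every matching row before taking the first 100, so it may be slow for large windows; restricting to a single month keeps the candidate set comparatively small.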
Finn Årup Nielsen http://people.compute.dtu.dk/faan/
On 12/03/2019 19:18, Haifeng Zhang wrote:
Hi folks,
My work needs to randomly sample new editors in each month, e.g., 100 editors per month.
Do any of you have good suggestions for how to do this efficiently?
I could think of using the dump files, but I wonder whether there are other options.
Thanks,
Haifeng Zhang
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l