Scott:
A good place to start reading about "bot spam" and its impact on the data
is this page:
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/BotDetection

We recently released a new classification for traffic. Besides classifying
traffic as "user" or "spider", we now also have "automated", which tags
traffic from a number of entities (but not all) that can be described as
"high-volume spammers". You will probably have some questions after reading
the doc, and we can set up a meeting to go over them.
Thanks,
Nuria
On Tue, Jun 16, 2020 at 9:55 AM Scott Bassett <sbassett(a)wikimedia.org>
wrote:
Hello Analytics Team-
The Security Team has recently spent some cycles investigating improved
anti-automation (bad bots, high-volume spammers, etc.) solutions,
particularly around an improved Wikimedia captcha. We were curious if your
team has any methods or advice regarding the analysis of nefarious
automated traffic within the context of raw web requests or any other
relevant analytics data. If the answer is "not really", that's fine. But
if there are some relevant tools, methods, research, etc. your team has
performed that you would like to share with us, that would be much
appreciated. If it makes sense to discuss this further during a quick
call, I can try to find some time for a few of us over the next couple of
weeks. We also have an extremely barebones task where we are attempting to
document various methods of measurement which might be helpful:
https://phabricator.wikimedia.org/T255208
Thanks,
--
Scott Bassett
sbassett(a)wikimedia.org