Re: [Wikitech-l] A potential new way to deal with spambots

11 Feb 2019

We've been working on unflagged bot detection on my team.  It's far from a
real product integration, but we have shown that it works in practice.  We
tested this in Wikidata, but I don't see a good reason why a similar
strategy wouldn't work for English Wikipedia.

Hall, A., Terveen, L., & Halfaker, A. (2018). Bot Detection in Wikidata
Using Behavioral and Other Informal Cues.
*Proceedings of the ACM on Human-Computer Interaction*, *2*(CSCW), 64.
PDF: https://dl.acm.org/ft_gateway.cfm?id=3274333&type=pdf

In theory, we could get this into ORES if there were strong demand.  As Pine
points out, we'd need to delay some other projects.  For reference, the
next thing on the backlog that I'm looking at is setting up article quality
prediction for Swedish Wikipedia.
-Aaron

On Mon, Feb 11, 2019 at 11:19 AM Jonathan Morgan <jmorgan@wikimedia.org>
wrote:

This may be naive, but... isn't the wishlist filling this need? And if not
through a consensus-driven method like the wishlist, how should a WMF team
prioritize which power user tools it needs to focus on?
Or is it just a matter of "Yes, wishlist, but more of it"?

Jonathan

On Mon, Feb 11, 2019 at 2:34 AM bawolff <bawolff+wn@gmail.com> wrote:

Sure, it's certainly a front we can do better on.

I don't think Kasada is a product that's appropriate at this time. Ignoring
the ideological aspect of it being non-free software, there are a lot of
easy things we could and should try first.

However, I'd caution against viewing this as purely a technical problem.
Wikimedia is not like other websites - we have allowable bots. For many
commercial websites, the only good bot is a dead bot. Wikimedia has many
good bots. On enwiki they usually have to be approved; I don't think that's
true on all wikis. We also consider it perfectly OK to do limited testing
of a bot before it is approved. We also encourage the creation of
alternative "clients", which from a server perspective look like bots.
Unlike other websites where anything non-human is evil, here we need to
ensure our blocking corresponds to the social norms of the community. This
may not sound that hard, but I think it complicates bot blocking more than
is obvious at first glance.

Second, this sort of thing tends to fall through the cracks at WMF. AFAIK
the last time there was a team responsible for admin tools & anti-abuse was
2013 (https://www.mediawiki.org/wiki/Admin_tools_development). I believe
(correct me if I'm wrong) that the anti-harassment team is all about human
harassment and not anti-abuse in this sense. Security is adjacent to this
problem, but traditionally has not considered this problem in scope. Even
core tools like checkuser have been largely ignored by the foundation for
many many years.

I guess this is a long-winded way of saying - I think there should be a
team responsible for this sort of stuff at WMF, but there isn't one. I
think there are a lot of rather easy things we can try (off the top of my
head: better captchas, more adaptive rate limits that adjust based on how
evilish you look, etc.), but they definitely require close involvement with
the community to ensure that we do the actual right thing.

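As a rough illustration of the "adaptive rate limits" idea above, here is a
minimal Python sketch of a token bucket whose refill rate drops as a client
looks more suspicious. Everything in it is an assumption made for
illustration: the AdaptiveBucket class, the suspicion score in [0, 1] (which
would have to come from some other signal), and the 90% maximum slowdown are
hypothetical and do not describe any existing MediaWiki feature.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AdaptiveBucket:
    """Token bucket whose refill rate shrinks as suspicion grows (hypothetical)."""

    base_rate: float = 1.0   # tokens (edits) per second for a fully trusted client
    capacity: float = 10.0   # maximum burst size
    tokens: float = 10.0     # tokens currently available
    last: float = field(default_factory=time.monotonic)  # time of last refill

    def allow(self, suspicion: float, now: Optional[float] = None) -> bool:
        """Return True if one edit may proceed; suspicion is clamped to [0, 1]."""
        now = time.monotonic() if now is None else now
        suspicion = min(max(suspicion, 0.0), 1.0)
        # Assumed policy: the most suspicious client refills at 10% of base_rate.
        rate = self.base_rate * (1.0 - 0.9 * suspicion)
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A client is never locked out entirely in this sketch; it is only slowed down,
which leaves room for the community to decide what counts as "evilish".
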
--
Brian
(p.s. Consider this a volunteer hat email)

On Sun, Feb 10, 2019 at 6:06 AM Pine W <wiki.pine@gmail.com> wrote:

To clarify the types of unwelcome bots that we have, here are the ones that
I think are most common:

Spambots

Vandalbots

Unauthorized bots which may be intended to act in good faith but which
may cause problems that could probably have been identified during standard
testing in Wikimedia communities which have a relatively well developed bot
approval process. (See
https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval.)

Maybe unwelcome bots are not a priority for WMF at the moment, in which
case I could add this subject into a backlog. I am sorry if I sound grumpy
at WMF regarding this subject; this is a problem but I know that there are
millions of problems and I don't expect a different project to be dropped
in order to address this one.

While it is a rough analogy, I think that this movie clip helps to
illustrate a problem of bad bots. Although the clip is amusing, I am not
amused by unwelcome bots causing problems on ENWP or anywhere else in the
Wikiverse. https://www.youtube.com/watch?v=lokKpSrNqDA

Thanks,
Pine
( https://meta.wikimedia.org/wiki/User:Pine )

On Sat, Feb 9, 2019, 1:40 PM Pine W <wiki.pine@gmail.com> wrote:

OK. Yesterday I was looking with a few other ENWP people at what I think
was a series of edits by either a vandal bot or an inadequately designed
and unapproved good faith bot. I read that it made approximately 500 edits
before someone who knew enough about ENWP saw what was happening and did
something about it. I don't know how many problematic bots we have, in
addition to vandal bots, but I am confident that they drain a nontrivial
amount of time from stewards, admins, and patrollers.

I don't know how much of a priority WMF places on detecting and stopping
unwelcome bots, but I think that the question of how to decrease the
numbers and effectiveness of unwelcome bots would be a good topic for WMF
to research.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

On Sat, Feb 9, 2019 at 9:24 PM Gergo Tisza <gtisza@wikimedia.org> wrote:

On Fri, Feb 8, 2019 at 6:20 PM Pine W <wiki.pine@gmail.com> wrote:

I don't know how practical it would be to implement an approach like this
in the Wikiverse, and whether licensing proprietary technology would be
required.

They are talking about Polyform [1], a reverse proxy that filters traffic
with a combination of browser fingerprinting, behavior analysis and proof
of work.

Proof of work is not really useful unless you have huge levels of bot
traffic from a single bot operator (also it means locking out users with no
Javascript); browser and behavior analysis very likely cannot be outsourced
to a third party for privacy reasons. Maybe we could do it ourselves
(although it would still bring up interesting questions privacy-wise) but
it would be a huge undertaking.

[1] https://www.kasada.io/product/
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)
-- 

Aaron Halfaker

Principal Research Scientist

Head of the Scoring Platform team
Wikimedia Foundation
