A potential new way to deal with spambots


Pine W

9 Feb 2019

3:19 a.m.

This sounds like an interesting potential approach to deal with spambots, and hopefully to deter the people who make them.

https://techcrunch.com/2019/02/05/kasada-bots/

I don't know how practical it would be to implement an approach like this in the Wikiverse, and whether licensing proprietary technology would be required.

I would be interested in decreasing the quantity and effectiveness of spambots that misuse WMF infrastructure, damage the quality of Wikimedia content, and drain significant cumulative time from the limited supply of good faith contributors.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


Gergo Tisza

9 Feb

10:23 p.m.

On Fri, Feb 8, 2019 at 6:20 PM Pine W <wiki.pine(a)gmail.com> wrote:

...

I don't know how practical it would be to implement an approach like this in the Wikiverse, and whether licensing proprietary technology would be required.

They are talking about Polyform [1], a reverse proxy that filters traffic with a combination of browser fingerprinting, behavior analysis and proof of work.

Proof of work is not really useful unless you have huge levels of bot traffic from a single bot operator (it also means locking out users with no JavaScript); browser and behavior analysis very likely cannot be outsourced to a third party for privacy reasons. Maybe we could do it ourselves (although it would still bring up interesting questions privacy-wise), but it would be a huge undertaking.

[1] https://www.kasada.io/product/
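For readers unfamiliar with the proof-of-work component mentioned above, here is a minimal hashcash-style sketch. It is not Kasada's or Polyform's actual scheme, whose details are not described in this thread; the challenge string, the `solve`/`verify` names and the difficulty value are assumptions made purely for illustration. In a real deployment the solving step would run as JavaScript in the visitor's browser, which is why users without JavaScript would be locked out.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash checks the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce (about 2**difficulty hashes on average)."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


# Hypothetical challenge issued by the server for one request.
nonce = solve("edit-token-123", difficulty=12)
print(nonce, verify("edit-token-123", nonce, difficulty=12))
```

The cost is asymmetric: the client spends thousands of hashes, the server spends one. That only hurts an operator who sends a very large number of requests, which matches the caveat above that proof of work helps mainly against huge volumes of traffic from a single bot operator.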

Pine W

10:40 p.m.

OK. Yesterday I was looking with a few other ENWP people at what I think was a series of edits by either a vandal bot or an inadequately designed and unapproved good faith bot. I read that it made approximately 500 edits before someone who knew enough about ENWP saw what was happening and did something about it. I don't know how many problematic bots we have, in addition to vandal bots, but I am confident that they drain a nontrivial amount of time from stewards, admins, and patrollers.

I don't know how much of a priority WMF places on detecting and stopping unwelcome bots, but I think that the question of how to decrease the numbers and effectiveness of unwelcome bots would be a good topic for WMF to research.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

On Sat, Feb 9, 2019 at 9:24 PM Gergo Tisza <gtisza(a)wikimedia.org> wrote:

...


Pine W

10 Feb

7:06 a.m.

To clarify the types of unwelcome bots that we have, here are the ones that I think are most common:

1) Spambots

2) Vandalbots

3) Unauthorized bots which may be intended to act in good faith but which may cause problems that could probably have been identified during standard testing in Wikimedia communities which have a relatively well developed bot approval process. (See https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval.)

Maybe unwelcome bots are not a priority for WMF at the moment, in which case I could add this subject into a backlog. I am sorry if I sound grumpy at WMF regarding this subject; this is a problem but I know that there are millions of problems and I don't expect a different project to be dropped in order to address this one.

While it is a rough analogy, I think that this movie clip helps to illustrate a problem of bad bots. Although the clip is amusing, I am not amused by unwelcome bots causing problems on ENWP or anywhere else in the Wikiverse. https://www.youtube.com/watch?v=lokKpSrNqDA

Thanks,

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

On Sat, Feb 9, 2019, 1:40 PM Pine W <wiki.pine(a)gmail.com> wrote:

...


bawolff

11 Feb

11:33 a.m.

Sure, it's certainly a front we can do better on.

I don't think Kasada is a product that's appropriate at this time. Ignoring the ideological aspect of it being non-free software, there are a lot of easy things we could and should try first.

However, I'd caution against viewing this as purely a technical problem. Wikimedia is not like other websites - we have allowable bots. For many commercial websites, the only good bot is a dead bot. Wikimedia has many good bots. On enwiki they usually have to be approved; I don't think that's true on all wikis. We also consider it perfectly OK to do limited testing of a bot before it is approved. We also encourage the creation of alternative "clients", which from a server perspective look like bots. Unlike other websites where anything non-human is evil, here we need to ensure our blocking corresponds to the social norms of the community. This may not sound that hard, but I think it complicates bot blocking more than is obvious at first glance.

Second, this sort of thing tends to fall through the cracks at WMF. AFAIK the last time there was a team responsible for admin tools & anti-abuse was 2013 ( https://www.mediawiki.org/wiki/Admin_tools_development). I believe (correct me if I'm wrong) that the anti-harassment team is all about human harassment and not anti-abuse in this sense. Security is adjacent to this problem, but traditionally has not considered this problem in scope. Even core tools like checkuser have been largely ignored by the foundation for many, many years.

I guess this is a long-winded way of saying - I think there should be a team responsible for this sort of stuff at WMF, but there isn't one. I think there are a lot of rather easy things we can try (off the top of my head: better captchas, more adaptive rate limits that adjust based on how evilish you look, etc.), but they definitely require close involvement with the community to ensure that we do the actual right thing.

--
Brian
(p.s. Consider this a volunteer hat email)

On Sun, Feb 10, 2019 at 6:06 AM Pine W <wiki.pine(a)gmail.com> wrote:

...
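One way to picture the "adaptive rate limits" idea from the message above is a token bucket whose refill rate shrinks as a client looks more suspicious. This is only a sketch under stated assumptions: the `AdaptiveBucket` class, the `suspicion` score in the range 0 to 1, and the numbers used are all hypothetical, and nothing here reflects how MediaWiki's existing rate limiting actually works. How the suspicion score is computed, and which approved bots are exempt, is exactly the part that would need community input.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveBucket:
    base_rate: float   # actions per second allowed for a fully trusted client
    capacity: float    # maximum burst size
    tokens: float      # tokens currently available
    last_seen: float = 0.0

    def allow(self, now: float, suspicion: float) -> bool:
        """Return True if one action is allowed at time `now`.

        `suspicion` is a hypothetical score: 0.0 means fully trusted,
        1.0 means the bucket never refills.
        """
        suspicion = min(max(suspicion, 0.0), 1.0)
        rate = self.base_rate * (1.0 - suspicion)
        elapsed = max(now - self.last_seen, 0.0)
        self.tokens = min(self.capacity, self.tokens + elapsed * rate)
        self.last_seen = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = AdaptiveBucket(base_rate=1.0, capacity=2.0, tokens=2.0)
# A trusted client can burst twice, then has to wait for a refill.
print([bucket.allow(now=0.0, suspicion=0.0) for _ in range(3)])
# The same client, now scored as half-suspicious, refills at half speed.
print(bucket.allow(now=1.0, suspicion=0.5), bucket.allow(now=2.0, suspicion=0.5))
```

Scaling the refill rate rather than blocking outright keeps the limit soft: a good-faith client that is misjudged is slowed down, not locked out.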


Jonathan Morgan

6:18 p.m.

This may be naive, but... isn't the wishlist filling this need? And if not through a consensus-driven method like the wishlist, how should a WMF team prioritize which power user tools it needs to focus on?

Or is it just a matter of "Yes, wishlist, but more of it"?

- Jonathan

On Mon, Feb 11, 2019 at 2:34 AM bawolff <bawolff+wn(a)gmail.com> wrote:

...


-- Jonathan T. Morgan Senior Design Researcher Wikimedia Foundation User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>

Aaron Halfaker

6:46 p.m.

We've been working on unflagged bot detection on my team. It's far from a real product integration, but we have shown that it works in practice. We tested this in Wikidata, but I don't see a good reason why a similar strategy wouldn't work for English Wikipedia.

Hall, A., Terveen, L., & Halfaker, A. (2018). Bot Detection in Wikidata Using Behavioral and Other Informal Cues. *Proceedings of the ACM on Human-Computer Interaction*, *2*(CSCW), 64. pdf <https://dl.acm.org/ft_gateway.cfm?id=3274333&type=pdf>

In theory, we could get this into ORES if there was strong demand. As Pine points out, we'd need to delay some other projects. For reference, the next thing on the backlog that I'm looking at is setting article quality prediction for Swedish Wikipedia.

-Aaron

On Mon, Feb 11, 2019 at 11:19 AM Jonathan Morgan <jmorgan(a)wikimedia.org> wrote:

...


-- Aaron Halfaker Principal Research Scientist Head of the Scoring Platform team Wikimedia Foundation

John Erling Blad

13 Feb

1:52 p.m.

It is extremely easy to detect a bot unless the bot operator chooses to make it hard. Just make a model of how the user interacts with the input devices, and do anomaly detection. That implies using JavaScript, though, but users not using JS are either very dubious or quite well-known. There are almost no new users who do not use JS.

Reused a previous tex-file, and did not clean it up? "Magnetic Normal Modes of Bi-Component Permalloy Structures" ;)

On Mon, Feb 11, 2019 at 6:47 PM Aaron Halfaker <ahalfaker(a)wikimedia.org> wrote:

...
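A toy illustration of the "model the input behavior, then do anomaly detection" suggestion above, assuming the event timestamps have already been collected client-side. The feature choice (mean and spread of the gaps between input events), the z-score rule, the threshold of 3 and all function names are assumptions made for this sketch; it is not the method from the Wikidata paper, and a determined bot operator could defeat it by adding jitter.

```python
from statistics import mean, stdev


def session_features(timestamps):
    """Return (mean gap, gap spread) between consecutive input events, in seconds.

    Needs at least three timestamps so that the spread is defined.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return mean(gaps), stdev(gaps)


def fit_baseline(human_sessions):
    """Per-feature (mean, standard deviation) over sessions believed to be human."""
    columns = zip(*(session_features(s) for s in human_sessions))
    return [(mean(col), stdev(col)) for col in columns]


def anomaly_score(timestamps, baseline):
    """Largest absolute z-score of the session's features against the baseline."""
    features = session_features(timestamps)
    return max(abs(f - mu) / sigma for f, (mu, sigma) in zip(features, baseline))


# Made-up example data: irregular human typing versus a perfectly regular script.
humans = [
    [0.0, 0.21, 0.35, 0.80, 1.02],
    [0.0, 0.15, 0.55, 0.70, 1.30],
    [0.0, 0.30, 0.42, 0.95, 1.10],
]
baseline = fit_baseline(humans)
scripted = [0.0, 0.01, 0.02, 0.03, 0.04]
print(anomaly_score(scripted, baseline) > 3.0)
```

A real baseline would need far more than three sessions, and any flag should feed human review rather than an automatic block, given that approved bots and alternative clients are welcome on Wikimedia wikis.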


Yongmin H.

11 Feb

7:08 p.m.

There are only 34 stewards, which is not enough to carry much voting power on the wishlist compared with the enwiki people. What we actually need cannot be obtained that way.

--
Yongmin
Sent from my iPhone
Text licensed under CC BY ND 2.0 KR
Please note that this address is a list-only address and any non-mailing-list mail will be treated as spam. Please use https://encrypt.to/0x947f156f16250de39788c3c35b625da5beff197a

On 12 Feb 2019 at 02:18, Jonathan Morgan <jmorgan(a)wikimedia.org> wrote:

...


Pine W

11:18 p.m.

Thanks for the replies.

I think that detailed discussion of the pros and cons of the Tech Wishlist should be separate from this thread, but I agree that one way to get a subject like unflagged bot detection addressed could be through the Tech Wishlist, assuming that WMF is willing to devote resources to that topic if it ranked in the top X places.

It sounds like there are a few different ways that work in this area could be resourced:

1. As mentioned above, making it a tech wishlist item and having Community Tech work on it;
2. Having the Anti-Harassment Tools team work on it;
3. Having the Security team work on it;
4. Having the ORES team work on it;
5. Funding work through a WMF grants program;
6. Funding through a mentorship program like GSOC. I believe that GSOC previously supported work on CAPTCHA improvements.

Of the above options I suggest first considering 2 and 4. Having AHAT staff work on unflagged bot detection might be scope creep under the existing AHAT charter, but perhaps AHAT's charter could be modified into something that would resemble the charter for an "Administrators' Tools Team". And if the ORES team has already done some work on unflagged bot detection, then perhaps ORES and AHAT staff could collaborate on this topic. In the first half of the next WMF fiscal year, I think that planning for an existing WMF team or combination of staff from existing teams to work on unflagged bot detection would be good.

If WMF does not resource this topic, then if community people want unflagged bot detection to be resourced, we can consider other options such as 1 and 5.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

David Barratt

12 Feb 12 Feb

6:48 a.m.

http://gph.is/2lnp32Z On Mon, Feb 11, 2019 at 5:19 PM Pine W <wiki.pine(a)gmail.com> wrote:

...

Pine W

8:48 p.m.

Hi David, do you have a question? I saw the GIF but I don't know how to interpret it in the context of this conversation. Thanks, Pine ( https://meta.wikimedia.org/wiki/User:Pine ) On Tue, Feb 12, 2019 at 5:49 AM David Barratt <dbarratt(a)wikimedia.org> wrote:

...

http://gph.is/2lnp32Z

Jonathan Morgan

9:32 p.m.

Couple thoughts:

1. ORES platform (ores.wikimedia.org) was designed to host a wide range of machine learning models, not just the ones built by Aaron Halfaker himself. So, if there is a computer scientist out there who is interested in training and maintaining a new bot-detection model, it can be hosted on and surfaced through ORES. Then anyone with some bot- or web-development skills can build tools on top of that model. Noting this because that's one of the main points of having a "scoring platform": it separates the (necessarily WMF-led) work of production platform development from the development of purpose-built tools.
2. If anyone knows a computer scientist who is interested in developing and piloting a model like this please send them our way. Members of the Research team, or Aaron, *may* have capacity to support a formal collaboration.
3. This seems way too complex for a GSOC project to me, but I'd love to be wrong about that. If there are students who are interested in working on this, please send them our way (no promises, obvs).
4. Modifying the charter of an existing WMF product team seems somewhat out of scope for this ask, task, and venue. :)

- J

On Mon, Feb 11, 2019 at 2:19 PM Pine W <wiki.pine(a)gmail.com> wrote:

...

-- Jonathan T. Morgan Senior Design Researcher Wikimedia Foundation User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>

bawolff

8:34 a.m.

The tech wishlist is awesome, and they do a lot of great work. However, I don't think this type of democratic-driven development is appropriate for all things. If it were, we would just get rid of all the other dev teams and just have a wish-list.

In this case what is needed is an anti-abuse strategy, not just a one-off feature. This involves development of many features over the long term, maintenance, long-term product management, integration into the whole, etc. Even in real life, nobody ever votes for maintenance until it's way too late and everything is about to explode. Not to mention the product research aspect of it: the wishlist inherently encourages people to think inside the box, as it is basically asking the question of what's wrong with the current box. You can't vote for something if you don't realize it's a choice.

As others have mentioned, majority rules is also sometimes not the appropriate way to choose what to do. Sometimes there are things that only affect a minority, but it's an important minority. Sometimes there are things that affect everyone slightly and they win over things that affect a small class significantly (of course both types of things are important). Sometimes there are things that are long term important but short term unimportant [not saying that people can't vote rationally for long term tasks, just that the wishlist is mostly developed around the idea of short term tasks, short enough that you can do about 10 of them in a year].

-- Brian

On Mon, Feb 11, 2019 at 5:18 PM Jonathan Morgan <jmorgan(a)wikimedia.org> wrote:

...



Pine W

9:35 p.m.

Since we're discussing how the Tech Wishlist works, I will comment on a few points specifically regarding that wishlist.

1. A gentle correction: the recommendations are ranked by vote, not by consensus. This has pros and cons.

2a. If memory serves me correctly, the wishlist process was designed by WMF rather than designed by community consensus. I may be wrong about this, but in my search of historical records I have not found evidence to the contrary. I think that redesigning the process would be worth considering, and I hope that a redesign would help to account for the types of needs that bawolff described in his second paragraph.

2b. I think that it's an overstatement to say that "nobody ever votes for maintenance until it's way too late and everything is about to explode". I think that many non-WMF people are aware of our backlogs, the endless requests for help and conflict resolution, and the many challenges of maintaining what we have with the current population of skilled and good faith non-WMF people. However, I have the impression that there is a common *tendency* among humans in general to chase shiny new features instead of doing mostly thankless work, and I agree that the tech wishlist is unlikely, even in a redesigned form, to be well suited for long term planning. I think that WMF's strategy process may be a better way to plan for the long term, including for maintenance activities that are mostly thankless and do not necessarily correlate with increasing someone's personal power, making their resume look better, or having fun.

Fortunately the volunteer mentality of many non-WMF people means that we do have people who are willing to do mostly thankless, mundane, and/or stressful work, and I think that some of us feel that our work is important for maintaining the encyclopedia even when we do not enjoy it, but we have a finite supply of time from such people.

Pine ( https://meta.wikimedia.org/wiki/User:Pine )

bawolff

13 Feb 13 Feb

1:13 p.m.

I actually meant a different type of maintenance. Maintaining the encyclopedia (and other wiki projects) is of course an activity that needs software support. But software is also something that needs maintenance. Technology, standards, and circumstances change over time. Software left alone will "bitrot" over time. A long term technical strategy to do anything needs to account for that and plan for that. One-off feature development does not. Democratically directed one-off feature development accounts for that even less.

In response to Jonathan: So let's say that ORES/magic AI detects something is a bot. Then what? That's a small part of the picture. In fact you don't even need AI to do this; plenty of the vandal bots have generic programming language user-agents (AI could of course be useful for the long tail here, but there's much simpler stuff to start off with). Do we expose this to abusefilter somehow? Do we add a tag to mark it in RC/watchlist? Do we block it? Do we rate limit it? What rate of false positives is acceptable? What is the UI for all this? To what extent is this hard coded, and to what extent do communities control the feature? Etc.

We don't need products to detect bots. Making products to detect bots is easy. We need product managers to come up with socio-technical systems that make sense in our special context.

-- Brian

On Tue, Feb 12, 2019 at 8:36 PM Pine W <wiki.pine(a)gmail.com> wrote:

...
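[Editorial illustration, not part of the original message.] The "much simpler stuff to start off with" that Brian mentions, and the follow-up questions about tagging versus rate limiting, could look roughly like the sketch below. Everything in it is an assumption made for illustration: the user-agent patterns, the `Policy` fields, the thresholds, and the action names are hypothetical and do not describe AbuseFilter, ORES, or any existing MediaWiki feature. The point is only that detection is a few lines, while the thresholds and actions are the policy choices a community would have to own.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for the default user-agents of common HTTP libraries.
# Illustrative only; this is not a list used by MediaWiki or any WMF tool.
GENERIC_UA = re.compile(
    r"^(python-requests|python-urllib|curl|wget|java|go-http-client|libwww-perl)\b",
    re.IGNORECASE,
)


@dataclass
class Policy:
    """Settings a community might tune (all names are assumptions)."""
    tag_threshold: int = 1          # signals needed to tag an edit for review
    throttle_threshold: int = 2     # signals needed to rate limit the client
    max_edits_per_minute: int = 10  # edit rate treated as suspicious


def count_signals(user_agent: str, edits_last_minute: int,
                  has_bot_flag: bool, policy: Policy) -> int:
    """Count simple signals that an account may be an unflagged bot."""
    if has_bot_flag:
        # Approved bots are expected to edit quickly with library user-agents.
        return 0
    signals = 0
    ua = user_agent.strip()
    if not ua or GENERIC_UA.match(ua):
        signals += 1
    if edits_last_minute > policy.max_edits_per_minute:
        signals += 1
    return signals


def choose_action(signals: int, policy: Policy) -> str:
    """Map a signal count to one of the hypothetical actions."""
    if signals >= policy.throttle_threshold:
        return "throttle"
    if signals >= policy.tag_threshold:
        return "tag"
    return "allow"
```

For example, with the default `Policy()`, an unflagged client sending `curl/7.64.0` and making 40 edits in a minute would be throttled, while the same user-agent at 3 edits per minute would only be tagged. Raising `tag_threshold` trades fewer false positives for more missed bots, which is the kind of decision the message argues needs product management rather than more detection code.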

Jonathan Morgan

5:26 p.m.

Brian, I think we may be talking past each other. I'm Mr. Socio-technical systems. I thought what was being requested was a way to detect bots. I maintain my own bots, work extensively with product teams, and have a deep and abiding familiarity with the complexity of designing effective tools for Wikipedia. - J On Wed, Feb 13, 2019 at 4:14 AM bawolff <bawolff+wn(a)gmail.com> wrote:

...


-- Jonathan T. Morgan Senior Design Researcher Wikimedia Foundation User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>

Pine W

14 Feb 14 Feb

7:59 a.m.

On Wed, Feb 13, 2019 at 12:13 PM bawolff <bawolff+wn(a)gmail.com> wrote:

...

I understand. I was intending to comment on maintenance activities in general, whether that be maintenance of a city's water system, maintenance of the text of encyclopedia articles, or maintenance of software. My train of thought proceeded into a somewhat detailed commentary regarding maintenance of non-software Wikimedia elements. I think that the tendency to under-resource maintenance in favor of novelties is similar in many domains of human activity, but I also think that humans collectively are not so unwise that we will prefer novelties over maintenance every time that there is a referendum on whether to maintain an existing service or to create something new. {{Citation needed}}

I think that multiple good points have been raised in this thread regarding the subjects of technical and human systems for detecting and intervening against possible unflagged bots. I am wondering what a good way would be to get a WMF product manager or someone similar to dedicate time to this topic. My preference remains that one or more WMF people, or teams, add this to their list of topics to address in a future quarter such as Q1 of the WMF 2019-2020 fiscal year.

I don't know how the WMF Community Tech team plans for maintenance of features after the features are initially built, debugged, and deployed, and based on the current state of this discussion I don't currently have a strong opinion regarding whether Community Tech or a different team would be best suited to work on the topic of unflagged bots. I also don't know how WMF makes decisions about what the goals are for teams other than Community Tech for future quarters, but that information could be helpful to have for this conversation.

Thanks,
Pine ( https://meta.wikimedia.org/wiki/User:Pine )


wikitech-l@lists.wikimedia.org


participants (8)

Aaron Halfaker
bawolff
David Barratt
Gergo Tisza
John Erling Blad
Jonathan Morgan
Pine W
Yongmin H.