sockpuppets and how to find them sooner


Kerry Raymond

23 Aug 2019

5:57 a.m.

Currently, to open a sockpuppet investigation, you must name the two (or more) accounts that you believe to be sockpuppets, with "clear, behavioural evidence of sock puppetry", which typically takes the form of pairs of edits that demonstrate similar edit behaviours unlikely to occur naturally. Now, if you spend enough time on-wiki, you develop an intuition about the behaviours you see on your watchlist and in article edit histories. Often I am highly suspicious that an account is a sockpuppet, but I cannot report it because I don't know which other account is involved.

As an example, I recently encountered User:Shelati, an account about 1 day old at that time with nearly 100 edits in that day, all about 1-2 minutes apart, mostly making a similar change to a large number of Australian place infoboxes.

https://en.wikipedia.org/w/index.php?title=Special:Contributions/Shelati&offset=20190728053057&limit=100&target=Shelati

Genuine new users do not edit that quickly, do not use templates and do not mess structurally with infoboxes (at most they try to change the values). It "smelled" like a sockpuppet. However, as I did not recognise that pattern of edit behaviour as being that of any other user I was familiar with, it wasn't something I could report for sockpuppet investigation. Anyhow, after about 2 weeks, the user was blocked as a sockpuppet. Someone must have noticed and figured out the other account:

https://en.wikipedia.org/wiki/Wikipedia:Sockpuppet_investigations/Meganesia/Archive

Two weeks and 1,279 edits later: that's over 1,000 possibly problematic edits after I first suspected them. But that's nothing compared with another ongoing situation, in which a very large number of different IPs are engaged in a pattern of problem edits on mostly Australian articles (a few different types of edits, but an obvious "quack like a duck" situation). The IP address changes frequently (and, one assumes, deliberately). The edits potentially go back to 2013 but appear to have intensified in 2018/2019. Here's one user's summary of all the IP addresses involved, and the extent to which they have been cleaned up (many thousands of edits are involved):

https://en.wikipedia.org/wiki/User:IamNotU/History_cleanup

As well as the damage done to the content (which harms readers), these IP sockpuppets are consuming enormous amounts of effort to track down and revert, effort that could be more productively used to improve the content. We need better tools to foil these pests, so I want to put that challenge out to this list.

Kerry
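Kerry's observation about User:Shelati (an account about a day old, nearly 100 edits, roughly 1-2 minutes apart) is the kind of signal that can be checked mechanically. The Python sketch below flags a young account with many closely spaced edits. The thresholds and the function name `rapid_new_account` are hypothetical, chosen only for illustration. Such a check cannot say *which* other account is involved, and legitimate editors using tools can trip it too, so at most it is a prompt for a closer human look.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical thresholds, for illustration only; not Wikipedia policy.
MAX_ACCOUNT_AGE = timedelta(days=2)    # what counts as a "young" account
MIN_EDITS = 50                         # enough edits to judge a pattern
MAX_MEDIAN_GAP = timedelta(minutes=3)  # typical spacing between edits


def rapid_new_account(created: datetime, edit_times: list[datetime]) -> bool:
    """Flag a young account whose edits are many and closely spaced.

    A triage signal only: it does not identify a sockpuppet master.
    """
    if len(edit_times) < MIN_EDITS:
        return False
    times = sorted(edit_times)
    # Only consider accounts whose whole observed history is recent.
    if times[-1] - created > MAX_ACCOUNT_AGE:
        return False
    # The median gap is robust to a few long pauses in an editing session.
    gaps = [b - a for a, b in zip(times, times[1:])]
    return median(gaps) <= MAX_MEDIAN_GAP
```

For example, 100 edits spaced 90 seconds apart starting just after account creation would be flagged, while the same number spaced 20 minutes apart would not.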


RhinosF1

23 Aug

6:43 a.m.

Just a note that you can still go through the warnings for vandalism etc. and report to AIV. Or, at that edit speed, you may have a chance of reporting the account at AN for bot-like edits, which will draw attention to it. If you ever need help, there are places like #wikipedia-en-help on Freenode IRC where you can ask other users. RhinosF1 Miraheze Volunteer On Fri, 23 Aug 2019 at 06:57, Kerry Raymond <kerry.raymond(a)gmail.com> wrote:

_______________________________________________ Wiki-research-l mailing list Wiki-research-l(a)lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Timothy Wood

1:42 p.m.

You are correct that in all but the most obvious cases, filing an SPI can be exceptionally time-consuming. I'm afraid there is no obvious technical solution that would not involve a complicated AI, which is probably beyond the ability of the Foundation to produce. There is quite a bit of data available in the form of years of SPIs, but it seems like you're talking about Facebook- or Google-level machine learning, and even years of SPIs are tiny compared with the amount of data they work with. On a separate note, frequently changing IP addresses are most often an indicator of nothing more than someone editing on a mobile connection. This can usually be verified easily with an online IP lookup. V/r TJW/GMG On Fri, Aug 23, 2019, 02:44 RhinosF1 <rhinosf1(a)gmail.com> wrote:
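Timothy's point, that rapidly changing IPs often just mean a mobile connection, can be partly checked offline before any online lookup: if the addresses cluster into a few ranges, that is consistent with one provider reassigning addresses. The Python sketch below groups addresses by an assumed /24 (IPv4) or /64 (IPv6) prefix using the standard `ipaddress` module. The prefix lengths and the function names are assumptions for illustration; clustering does not prove a single editor, and identifying the provider still needs a WHOIS or similar lookup. The example addresses come from ranges reserved for documentation.

```python
import ipaddress
from collections import Counter

# Illustrative prefix lengths for "same range" (assumptions, not a rule):
# a /24 for IPv4, and /64, the standard IPv6 subnet size.
IPV4_PREFIX = 24
IPV6_PREFIX = 64


def covering_range(ip: str) -> str:
    """Return the /24 (IPv4) or /64 (IPv6) network containing `ip`."""
    addr = ipaddress.ip_address(ip)
    prefix = IPV4_PREFIX if addr.version == 4 else IPV6_PREFIX
    # strict=False accepts a host address and masks off the host bits.
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


def group_by_range(ips: list[str]) -> Counter:
    """Count how many of the observed addresses fall in each range."""
    return Counter(covering_range(ip) for ip in ips)
```

With input such as `["198.51.100.7", "198.51.100.45", "203.0.113.9"]`, the result counts two addresses in `198.51.100.0/24` and one in `203.0.113.0/24`.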


Kerry Raymond

3:23 p.m.

On 23 Aug 2019, at 11:42 pm, Timothy Wood <timothyjosephwood(a)gmail.com> wrote:

Timothy Wood

9:51 p.m.

Then again, apparently the Foundation has a PR team whose only job is to compile the latest marketing buzzwords, and they seem to really love AI. You might get some buy-in. Never know. V/r TJW/GMG On Fri, Aug 23, 2019, 11:23 Kerry Raymond <kerry.raymond(a)gmail.com> wrote:


That's why I think we need "signatures", which is my shorthand for things like a hash function or a bounding box: a means by which many non-matching accounts can be eliminated at low cost, reserving the high-cost comparisons (machine or human) for high-probability candidates. The signature is machine-computed and *stored* when a user is banned or blocked. When a suspect user is presented, the system calculates their signature and compares it against the pre-calculated signatures of the bad users. I don't think this is too expensive if we can find the right "signature". CPU cycles are pretty fast. I only have an average laptop CPU-wise, but I burn through loads of comparisons of geographic boundaries (complex polygons with many points) thanks to bounding boxes, which reduce a complex shape to the smallest rectangle that contains it. Testing the intersection of polygons is expensive; testing the intersection of rectangles is trivial.

I think we can probably ignore the myriad of trivial bad guys for the purposes of signature collecting, e.g. those blocked for vandalism after their first few edits. Sockpuppets or their masters don't immediately appear as bad guys on individual edits. It's often more about long-term behaviours like POV pushing, refusal to engage in consensus building, slow-burning edit wars, etc., which do not show on individual edits.

Kerry
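Kerry's bounding-box analogy can be made concrete. The Python sketch below precomputes and stores one box per shape (mirroring the idea of storing a signature once), then uses a four-comparison rectangle test to discard most shapes before any exact polygon test. The names `BBox` and `candidates` are illustrative only; a real system would hand the survivors to a geometry library for the exact test, which is not shown.

```python
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box: the smallest rectangle containing a shape."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    @classmethod
    def of(cls, polygon: list[Point]) -> "BBox":
        xs = [x for x, _ in polygon]
        ys = [y for _, y in polygon]
        return cls(min(xs), min(ys), max(xs), max(ys))

    def intersects(self, other: "BBox") -> bool:
        # Four comparisons, however many vertices the polygons have.
        return (self.min_x <= other.max_x and other.min_x <= self.max_x
                and self.min_y <= other.max_y and other.min_y <= self.max_y)


def candidates(query: list[Point], index: dict[str, BBox]) -> list[str]:
    """Names whose stored box overlaps the query's box.

    Overlapping boxes are necessary but not sufficient for overlapping
    polygons, so no true match is dropped; only the survivors need the
    expensive exact polygon test.
    """
    q = BBox.of(query)
    return [name for name, box in index.items() if box.intersects(q)]
```

The index is built once (`{name: BBox.of(poly) for name, poly in shapes.items()}`) and reused for every query, which is where the saving comes from.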
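The same filter-then-refine idea applied to accounts might look like the sketch below. It swaps in one well-known hash-based signature, MinHash, which estimates the Jaccard similarity of two sets from short fixed-length signatures. Everything specific here is an assumption for illustration: the feature set (titles of pages an account edited), the signature length, the 0.5 threshold and the `SignatureStore` class. As Kerry notes, the behaviours that matter are long-term, so page overlap is at best a crude first filter, and any candidates it returns would still need human (or more expensive machine) review.

```python
import hashlib

NUM_HASHES = 64  # signature length; an illustrative choice


def _hash(seed: int, item: str) -> int:
    # Deterministic 64-bit hash of (seed, item) using stdlib BLAKE2b.
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(features: set[str]) -> tuple[int, ...]:
    """Fixed-length signature: the minimum hash of the set under each seed."""
    if not features:
        raise ValueError("cannot sign an empty feature set")
    return tuple(min(_hash(seed, f) for f in features) for seed in range(NUM_HASHES))


def estimated_jaccard(a: tuple[int, ...], b: tuple[int, ...]) -> float:
    # The fraction of agreeing positions estimates |A & B| / |A | B|.
    return sum(x == y for x, y in zip(a, b)) / len(a)


class SignatureStore:
    """Signatures computed once, at block time, and kept for later lookups."""

    def __init__(self) -> None:
        self._sigs: dict[str, tuple[int, ...]] = {}

    def add_blocked(self, account: str, features: set[str]) -> None:
        self._sigs[account] = minhash(features)

    def candidates(self, features: set[str], threshold: float = 0.5) -> list[str]:
        """Blocked accounts whose estimated similarity meets the threshold."""
        sig = minhash(features)
        return [acct for acct, stored in self._sigs.items()
                if estimated_jaccard(sig, stored) >= threshold]
```

Each comparison touches only 64 integers rather than two full edit histories. With very many stored signatures, locality-sensitive hashing (splitting each signature into bands and bucketing on them) is the usual next step to avoid comparing against every one.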


Nick Wilson (Quiddity)

24 Aug

10:19 a.m.

On Fri, Aug 23, 2019 at 5:23 PM Kerry Raymond <kerry.raymond(a)gmail.com> wrote:

The Wikimedia Scoring Platform team (https://www.mediawiki.org/wiki/Wikimedia_Scoring_Platform_team) might have some insights into these questions, and I believe several of its current and former members are active on this mailing list, so they might chime in here. On Fri, Aug 23, 2019 at 11:52 PM Timothy Wood <timothyjosephwood(a)gmail.com> wrote:

Then again, apparently the Foundation has a PR team whose only job is to [...]

Timothy Wood

25 Aug

1:57 a.m.

Is that what they do? I thought we mostly did that. TJW/GMG On Sat, Aug 24, 2019, 06:20 Nick Wilson (Quiddity) <nwilson(a)wikimedia.org> wrote:


Please do not denigrate groups of people. Communicating about the movement's mission and activities with large parts of the outside world, and helping others in the movement to do so as well, is an important role (and only part of their role), similar to your own role in OTRS. However, that is all off-topic in this thread. I hope everyone has a pleasant weekend. Quiddity

Leila Zia

26 Aug

7:49 p.m.

Kerry, thanks for kicking this off. One update on our end: there is general agreement among a few different teams/departments at WMF that this is an important problem, and that we should support checkusers with it better than we do today. I gave a presentation at Wikimania on the research on sockpuppet detection [1], which is primarily conducted by Srijan Kumar. The goal of the research is to build models that use public data to identify accounts that are predicted to be sockpuppets as soon as possible. Srijan has made significant progress on this front, and we'll be presenting the results of the model to checkusers shortly to get their feedback. Check out the slide deck [2] if you're interested in learning more. More updates about the project will appear in [3]. Best, Leila

[1] https://wikimania.wikimedia.org/wiki/2019:Research/Sockpuppet_detection_in_…
[2] https://wikimania.wikimedia.org/wiki/File:Wikimania2019_research_presentati…
[3] https://meta.wikimedia.org/wiki/Research:Sockpuppet_detection_in_Wikimedia_…

On Sat, Aug 24, 2019 at 6:58 PM Timothy Wood <timothyjosephwood(a)gmail.com> wrote:



wiki-research-l@lists.wikimedia.org


participants (5)

Kerry Raymond
Leila Zia
Nick Wilson (Quiddity)
RhinosF1
Timothy Wood