Did the paper look at the other side of the coin, where an anonymous reviewer uses their dislike of someone to give them a poorer score than they deserved? From the description of the experiment, I assume none of the participants knew one another before the experiment and so could not have brought a dislike into the room, so that issue was presumably not tested.
Generally in reviewing there are 3 roles involved:
* The “owner” of the thing being reviewed
* The “reviewer”
* The “coordinator” of the reviewing
In blind-to-owner reviewing, the coordinator (and sometimes other reviewers) know the identity of the reviewer even if the owner does not. To some extent, this keeps reviewers honest about any conflict of interest with the owner: they either review scrupulously fairly or refuse to review. Obviously in smaller communities (e.g. a niche academic topic), where real names are used and there is a strong chance of the coordinator (or other reviewers) knowing about your likes and dislikes, there is a stronger likelihood of the reviews being honest or being refused. This means a small number of reviews can suffice as a reasonable opinion on the merits of the paper. In a large community like TripAdvisor, which uses usernames and which is not a blind review (in the sense that user names are disclosed but real names are not), it is far less likely that conflicts of interest will be detected, so a “reasonable opinion” requires a lot more reviews in the hope of dominating any bias introduced by friends/foes acting as reviewers.
On Wikipedia, everyone is a potential reviewer (and executioner) of every edit, and watchlists allow individuals to choose to be reviewers of particular articles (making them the most likely reviewers of edits to those articles). As a result, we often have few “reviews” and an “enemy” can easily revert an edit. An enemy can also use a user’s contributions page to easily find their other edits and revert them too (if some flimsy pretext exists), allowing harassment (a pattern of behaviour that may be very visible to the victim but not to the larger community). We really only have a “negative review” system on Wikipedia. If a reviewer feels positive about an edit, they usually leave it alone. They can leave a thanks or WikiLove or a positive Talk message (though the stats would suggest this is rarely done), but none of those positive actions prevents a revert. Oh, and we do it all behind anonymity and pseudonyms, with a policy against outing ensuring there is little real-world accountability for people’s actions.
Is it any wonder we have a toxic culture on Wikipedia? Build a housing estate with dark alleyways with no visibility to passersby and you create a mugger’s paradise and a no-go zone for everyone else. Build a collaborative environment online, create lots of dark corners, remove real-world accountability, have relatively ineffective solutions to sockpuppeting, and we get the ultimate playground for the cyberbully. Oh, and we’ve just designed a Visual Editor to funnel more newcomers into the dark corners.
I think we need to do a lot of work on Wikipedia’s design for a good culture. But in the meantime (knowing that hell freezes over faster than changes on Wikipedia), we could use big data to try to proactively find patterns of undesirable behaviour. If one account experiences a high level of reverts from another account, maybe someone else should be reviewing those edits to see if it’s bullying or vandalism. If one account appears to be doing a lot of reverting, maybe someone else should be reviewing to see if there’s some kind of undesirable pattern to it. At the moment, we do not appear to be proactive. We rely on victims knowing how to complain and how to provide a lot of documentation to support their case. This is something newcomers are unlikely to know how to do; indeed, even more experienced editors have been known to just walk away rather than put up with aggressive behaviour.
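To make the idea concrete, here is a minimal sketch of the kind of pattern-mining I mean, using Python. The account names, thresholds, and the in-memory revert log are all invented for illustration; a real version would pull revert data from the revision history or an API dump, and the thresholds would need tuning.

```python
from collections import Counter

# Hypothetical revert log: (reverter, reverted) account pairs.
# In practice these would be extracted from article revision histories.
revert_log = [
    ("AccountA", "Newbie1"), ("AccountA", "Newbie1"),
    ("AccountA", "Newbie1"), ("AccountB", "Vandal9"),
    ("AccountA", "Newbie2"), ("AccountA", "Newbie1"),
]

PAIR_THRESHOLD = 3    # one account repeatedly reverting the same account
TOTAL_THRESHOLD = 4   # one account doing a lot of reverting overall

pair_counts = Counter(revert_log)
reverter_counts = Counter(reverter for reverter, _ in revert_log)

# Flag pairs where one account repeatedly reverts the same other account
# (possible targeting/bullying worth a human second look).
targeted = {pair: n for pair, n in pair_counts.items() if n >= PAIR_THRESHOLD}

# Flag accounts doing a lot of reverting in general.
prolific = {a: n for a, n in reverter_counts.items() if n >= TOTAL_THRESHOLD}

print(targeted)  # {('AccountA', 'Newbie1'): 4}
print(prolific)  # {'AccountA': 5}
```

Neither flag proves misbehaviour on its own; the point is simply to route flagged cases to an uninvolved reviewer rather than waiting for a victim to file a complaint.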
Kerry
From: wiki-research-l-bounces@lists.wikimedia.org [mailto:wiki-research-l-bounces@lists.wikimedia.org] On Behalf Of Kevin G Crowston
Sent: Friday, 14 August 2015 6:55 AM
To: wiki-research-l@lists.wikimedia.org
Subject: Re: [Wiki-research-l] "identity disclosure hurt the reliability of review systems, but not necessarily efforts provision"
Can someone who has access to that paper please share the method and results as fair use?
Here are a few key paragraphs from the methods and results section.
Method
"We test two “social conditions”: Anonymous vs. Identity disclosure (anonymous: participants do not know each other; identity: participants were first asked to introduce themselves before experiments and, during the experiments, participants were shown the name of their reviewer on the computer screen and vice versa). In addition to social condition, we introduce “repeated matching” (1 round: matching of counterpart and reviewer changes every round; 3 round: each participant will be matched with the same counterpart and the same reviewer for three consecutive rounds) as another dimension to compare the effect of short-term and long-term social pressure. Employing a 2 × 2 design, our experiments consist of four treatments, varying in two dimensions.
Parameters used are P = 120, k = 0.15. NE prediction is e* = 10 under correct-reporting (Proposition 1) and e* = 1 under over-reporting (Proposition 2). Expected profit is π* = 45 and π* = 59.85, respectively. Experiments were implemented using z-Tree software (Fischbacher 2007). Participants of experiments consist of 108 undergraduate students at a large public research university in the United States. For the Anonymous 1 round treatment, we conducted three experimental sessions. Each of the remaining three treatments consisted of two sessions. Each session had 12 participants with 15 decision rounds. Subjects received a course credit and cash payment based on their results of the game."
Findings
"In Anonymous, correct-reporting is most frequently observed at 42% (1-round) and 59% (3-round). In Identity, over-reporting accounts for more than 64% and 72%, respectively. The findings in reporting behaviors are in line with our expectation as shown in Propositions 1 and 2. That is, when identity is revealed, reviewers feel social pressure to be nice toward players. Therefore, evaluation systems are considered less reliable because they fail to reflect the true efforts.
Another finding is that reporting behavior seems strengthened for a longer period of matching. Repeated matching seems to make reviewers form a stronger social pressure. For example, the higher proportion of over-reporting in Identity 3 round suggests an increased social pressure due to expectation of a long-term relationship between reviewers and players."
"Median efforts from four treatments are all lower than that of the standard economics prediction (assuming objective-reporting, e* = 10). This suggests that players suspect the reliability of evaluation systems. For Anonymous 1 round and 3 round, where correct-reporting is observed at a high rate, the median efforts are both 8. These high median efforts imply that players in anonymous settings generally hold a stronger belief that evaluation systems are still reliable to some extent.
Surprisingly, for Identity 1 round, where scores are mostly inflated, the median effort is still 8. The average effort is 8.2, even higher than both Anonymous treatments (7.3 and 7.6). This finding suggests a gap between players’ belief and reviewers’ behavior. While reviewers feel strong social pressure by revealing identity, players doubt that social pressure on reviewers will be enough to be nice. However, such doubt disappears when players recognize a long-term relationship. For Identity 3 round, the median and average efforts decrease to 5 and 5.6, respectively, the lowest level among all treatments."
Kevin Crowston | Distinguished Professor of Information Science | School of Information Studies
Syracuse University
348 Hinds Hall
Syracuse, New York 13244
t (315) 443.1676 f (315) 443.5806 e crowston@syr.edu
http://crowston.syr.edu/