All,
As Tim mentioned, we are seriously looking at privacy/identity/security/anonymity issues, specifically as they pertain to IP address exposure -- from both a legal and a technical standpoint. This won't happen overnight, as we need to get people to work on this and there are a lot of asks, but it is on our radar.
On a related note, let's skip the sarcasm and treat each other with straightforward honesty. For non-English speakers -- who need this just as much, if not more -- sarcasm can be very confusing.
Thanks, Lila
On Fri, Apr 3, 2015 at 4:02 PM, Cristian Consonni <kikkocristian@gmail.com> wrote:
Hi Brian,
2015-03-30 0:25 GMT+02:00 Brian <reflection@gmail.com>:
Although the initial goal of the Netflix Prize was to design a collaborative filtering algorithm, it became notorious when the data was used to de-anonymize Netflix users. Researchers proved that given just a user's movie ratings on one site, you can plug those ratings into another site, such as the IMDB. You can then take that information, and with some Google searches and optionally a bit of cash (for websites that sell user information, including, in some cases, their SSN) figure out who they are.
You could even drive up to their house and take a selfie with them, or follow them to work and meet their boss and tell them about their views on the topics they were editing.
Somewhat tangentially, and to bring this topic back to a more scientific setting, I would like to point out that there has already been research on this in the past.
I highly recommend reading the following paper:
Lieberman, Michael D., and Jimmy Lin. "You Are Where You Edit: Locating Wikipedia Contributors through Edit Histories." ICWSM. 2009. (PDF < http://www.pensivepuffin.com/dwmcphd/syllabi/infx598_wi12/papers/wikipedia/l...
)
For those of you who don't want to read the whole paper, you can find a recap of the most relevant findings in this presentation by Maurizio Napolitano: < http://www.slideshare.net/napo/social-geography-wikipedia-a-quick-overwiew
The main idea is to associate spatial coordinates with Wikipedia articles where possible; these articles are called "geopages". Then you extract from the article histories the users who have edited a geopage. If you plot the geopages edited by a given contributor, you can see that they tend to cluster, so you can define an "edit area". The study finds that 30-35% of contributors concentrate their edits in an edit area smaller than 1 deg^2 (~12,362 km^2, approximately the area of Connecticut or Northern Ireland[1] (thanks, Wikipedia!)).
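To make the idea concrete, here is a minimal sketch in Python. It assumes the edit area can be approximated by the latitude/longitude bounding box of the geopages a user has edited; the paper may define it differently. The function name `edit_areas`, the input shapes, and the sample users and coordinates are all made up for illustration and are not taken from the paper.

```python
from collections import defaultdict


def edit_areas(edits, geopages):
    """Approximate each user's edit area as a bounding box in square degrees.

    edits:    iterable of (user, page_title) pairs (hypothetical input shape)
    geopages: dict mapping page_title -> (lat, lon) for pages with coordinates

    Simplifications: longitude wrap-around at +/-180 is ignored, and a square
    degree is treated as a flat unit even though its real size shrinks
    towards the poles.
    """
    points = defaultdict(set)
    for user, title in edits:
        # Only pages with known coordinates ("geopages") count.
        if title in geopages:
            points[user].add(geopages[title])

    areas = {}
    for user, coords in points.items():
        lats = [lat for lat, _ in coords]
        lons = [lon for _, lon in coords]
        areas[user] = (max(lats) - min(lats)) * (max(lons) - min(lons))
    return areas


# Illustrative data only: approximate coordinates, invented users.
geopages = {
    "Trento": (46.07, 11.12),
    "Rovereto": (45.89, 11.04),
    "Bolzano": (46.50, 11.35),
    "Sydney": (-33.87, 151.21),
}
edits = [
    ("alice", "Trento"),
    ("alice", "Rovereto"),
    ("alice", "Bolzano"),
    ("alice", "Pizza"),  # no coordinates, so it is ignored
    ("bob", "Trento"),
    ("bob", "Sydney"),
]

areas = edit_areas(edits, geopages)
# Users whose edits fall inside a box smaller than 1 square degree.
small = sorted(user for user, area in areas.items() if area < 1.0)
print(small)
```

Note that the sketch only needs a stable identifier per user, not an IP address or a real name.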
For another free/libre project with a geographic focus, like OpenStreetMap, this is even more marked; check out, for example, the tool «“Your OSM Heat Map” (aka Where did you contribute?)»[2] by Pascal Neis.
This, of course, is not a straightforward de-anonymization, but these methods work in principle for every contributor even if you obfuscate their IP or username (provided that you can still assign all the edits from a given user to a unique, unambiguous identifier).
C
[1] https://en.wikipedia.org/wiki/Square_degree
[2a] http://yosmhm.neis-one.org/
[2b] http://neis-one.org/2011/08/yosmhm/
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe