Hi,
You may have followed the discussion on Wikimedia-l (and enwiki-l).
Out of mere intellectual curiosity, I would like to know why hashing the IPs with a varying salt won't work.
Wouldn't that provide a way to obfuscate IP addresses while maintaining uniqueness (i.e. a given IP always gets hashed to the same hash)?
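A minimal sketch of the scheme being asked about, assuming a keyed SHA-256 hash (HMAC) over the packed address. The function name `pseudonymize_ip` and the salt values are hypothetical, chosen only for illustration; nothing here reflects how MediaWiki actually handles IPs.

```python
import hashlib
import hmac
import ipaddress


def pseudonymize_ip(ip: str, salt: bytes) -> str:
    """Return a hex pseudonym for an IP address under a given salt.

    Hypothetical helper: HMAC-SHA256 is one possible choice of salted hash.
    """
    # Normalise the textual address to its packed binary form first, so that
    # different spellings of the same address hash identically.
    packed = ipaddress.ip_address(ip).packed
    return hmac.new(salt, packed, hashlib.sha256).hexdigest()


# Hypothetical salts; "varying" is read here as "changed from time to time".
salt_a = b"hypothetical-salt-A"
salt_b = b"hypothetical-salt-B"

same_1 = pseudonymize_ip("192.0.2.1", salt_a)
same_2 = pseudonymize_ip("192.0.2.1", salt_a)
other_salt = pseudonymize_ip("192.0.2.1", salt_b)
other_ip = pseudonymize_ip("192.0.2.2", salt_a)

print(same_1 == same_2)      # same IP, same salt: identical pseudonym
print(same_1 == other_salt)  # same IP, different salt: different pseudonym
print(same_1 == other_ip)    # different IP, same salt: different pseudonym
```

The uniqueness property holds only while the salt stays the same: once the salt changes, edits from the same IP are no longer linked to one pseudonym.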
Tim said in a message on enwiki-l that he has looked into the matter but hasn't found any satisfying solution.
So what's the problem with salted hashes?
Note: I have read something about hashing but I am far from being an expert; please assume I am the classic layman.
Thanks in advance to anyone who will take the time to explain.
C

---------- Forwarded message ----------
From: "Lila Tretikov" lila@wikimedia.org
Date: 05/Apr/2015 11:30
Subject: Re: [Wikimedia-l] Announcing: The Wikipedia Prize!
To: "Wikimedia Mailing List" wikimedia-l@lists.wikimedia.org
Cc:
All,
As Tim mentioned, we are seriously looking at privacy/identity/security/anonymity issues, specifically as they pertain to IP address exposure -- from both a legal and a technical standpoint. This won't happen overnight, as we need to get people to work on this and there are a lot of asks, but this is on our radar.
On a related note, let's skip the sarcasm and treat each other with straightforward honesty. And for non-English speakers -- who are also (if not more) in need of this -- sarcasm can be very confusing.
Thanks, Lila
On Fri, Apr 3, 2015 at 4:02 PM, Cristian Consonni kikkocristian@gmail.com wrote:
Hi Brian,
2015-03-30 0:25 GMT+02:00 Brian reflection@gmail.com:
Although the initial goal of the Netflix Prize was to design a collaborative filtering algorithm, it became notorious when the data was used to de-anonymize Netflix users. Researchers proved that given just a user's movie ratings on one site, you can plug those ratings into another site, such as the IMDB. You can then take that information, and with some Google searches and optionally a bit of cash (for websites that sell user information, including, in some cases, their SSN) figure out who they are.
You could even drive up to their house and take a selfie with them, or follow them to work and meet their boss and tell them about their views on the topics they were editing.
Somewhat tangentially, and to bring this topic back to a more scientific setting, I would like to point out that there has already been research on this in the past.
I highly recommend reading the following paper:
Lieberman, Michael D., and Jimmy Lin. "You Are Where You Edit: Locating Wikipedia Contributors through Edit Histories." ICWSM. 2009. (PDF < http://www.pensivepuffin.com/dwmcphd/syllabi/infx598_wi12/papers/wikipedia/l... >)
For those of you who don't want to read the whole paper, you can find a recap of the most relevant findings in this presentation by Maurizio Napolitano: < http://www.slideshare.net/napo/social-geography-wikipedia-a-quick-overwiew >
The main idea is to associate spatial coordinates with Wikipedia articles when possible; these articles are called "geopages". Then you extract from the article histories the users who have edited a geopage. If you plot the geopages edited by a given contributor you can see that they tend to cluster, so you can define an "edit area". The study finds that 30-35% of contributors concentrate their edits in an edit area smaller than 1 deg^2 (~12,362 km^2, approximately the area of Connecticut or Northern Ireland[1] (thanks, Wikipedia!)).
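The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the "edit area" is approximated here as the latitude/longitude bounding box of the geopages a user edited, which may differ from the definition used in the paper; the function `edit_areas`, the user names, and the sample coordinates are all hypothetical. The sketch also ignores the antimeridian and the shrinking of longitude degrees away from the equator.

```python
from collections import defaultdict


def edit_areas(edits, geopages):
    """Approximate each user's edit area in square degrees.

    edits:    iterable of (user, page_title) pairs taken from page histories
    geopages: dict mapping page_title -> (latitude, longitude) in degrees
    """
    # Collect the coordinates of every geopage each user has edited;
    # pages without coordinates are not geopages and are skipped.
    coords_by_user = defaultdict(list)
    for user, title in edits:
        if title in geopages:
            coords_by_user[user].append(geopages[title])

    # Bounding-box area: (latitude span) * (longitude span).
    areas = {}
    for user, points in coords_by_user.items():
        lats = [lat for lat, _ in points]
        lons = [lon for _, lon in points]
        areas[user] = (max(lats) - min(lats)) * (max(lons) - min(lons))
    return areas


# Hypothetical sample data, with approximate coordinates.
geopages = {
    "Trento": (46.07, 11.12),
    "Rovereto": (45.89, 11.04),
    "Sydney": (-33.87, 151.21),
}
edits = [
    ("alice", "Trento"),
    ("alice", "Rovereto"),
    ("alice", "Salt (cryptography)"),  # not a geopage, ignored
    ("bob", "Trento"),
    ("bob", "Sydney"),
]

areas = edit_areas(edits, geopages)
for user, area in sorted(areas.items()):
    label = "smaller than 1 deg^2" if area < 1.0 else "1 deg^2 or larger"
    print(user, label)
```

With this sample data, alice's edits fall inside a small box around Trentino while bob's span two continents, so only alice would count among the contributors with an edit area under 1 deg^2.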
For another free/libre project with a geographic focus, like OpenStreetMap, this is even more marked; check out, for example, the tool «“Your OSM Heat Map” (aka Where did you contribute?)»[2] by Pascal Neis.
This, of course, is not a straightforward de-anonymization, but these methods work in principle for every contributor even if you obfuscate their IP or username (provided that you can still assign all the edits from a given user to a single, unambiguous identifier).
C

[1] https://en.wikipedia.org/wiki/Square_degree
[2a] http://yosmhm.neis-one.org/
[2b] http://neis-one.org/2011/08/yosmhm/
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe