Hi,
You may have followed the discussion on Wikimedia-l (and enwiki-l).
Out of mere intellectual curiosity, I would like to know why hashing the IPs with a varying salt won't work.
Wouldn't that provide a way to obfuscate IP addresses while maintaining uniqueness (i.e. a given IP always gets hashed to the same hash)?
Tim said in a message on enwiki-l that he has looked into the matter but hasn't found any satisfying solution.
So what's the problem with salted hashes?
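For concreteness, here is a minimal sketch (in Python, with SHA-256 and made-up addresses of my own choosing, not anything Tim actually evaluated) of the property being asked about. Note that for "a given IP always gets hashed to the same hash" to hold, the salt must stay fixed; a salt that truly varied per request would break the uniqueness.

```python
import hashlib

def hash_ip(ip: str, salt: str) -> str:
    """Obfuscate an IP by hashing it together with a salt."""
    return hashlib.sha256((salt + ip).encode()).hexdigest()

salt = "site-wide-secret"  # hypothetical; must stay fixed for uniqueness

# The same (salt, IP) pair always yields the same digest...
assert hash_ip("203.0.113.7", salt) == hash_ip("203.0.113.7", salt)
# ...while distinct IPs yield distinct digests,
# and changing the salt changes every mapping.
assert hash_ip("203.0.113.8", salt) != hash_ip("203.0.113.7", salt)
assert hash_ip("203.0.113.7", "other-salt") != hash_ip("203.0.113.7", salt)
```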
Note: I have read something about hashing, but I am far from being an expert, so please assume I am the classical layman.
Thanks in advance to anyone who will take the time to explain.
C ---------- Forwarded message ---------- From: "Lila Tretikov" lila@wikimedia.org Date: 05/Apr/2015 11:30 Subject: Re: [Wikimedia-l] Announcing: The Wikipedia Prize! To: "Wikimedia Mailing List" wikimedia-l@lists.wikimedia.org Cc:
All,
As Tim mentioned, we are seriously looking at privacy/identity/security/anonymity issues, specifically as they pertain to IP address exposure -- from both legal and technical standpoints. This won't happen overnight as we need to get people to work on this and there are a lot of asks, but this is on our radar.
On a related note, let's skip the sarcasm and treat each other with straightforward honesty. And for non-English speakers -- who are also (if not more) in need of this -- sarcasm can be very confusing.
Thanks, Lila
On Fri, Apr 3, 2015 at 4:02 PM, Cristian Consonni kikkocristian@gmail.com wrote:
Hi Brian,
2015-03-30 0:25 GMT+02:00 Brian reflection@gmail.com:
Although the initial goal of the Netflix Prize was to design a collaborative filtering algorithm, it became notorious when the data was used to de-anonymize Netflix users. Researchers proved that given just a user's movie ratings on one site, you can plug those ratings into another site, such as the IMDB. You can then take that information, and with some Google searches and optionally a bit of cash (for websites that sell user information, including, in some cases, their SSN) figure out who they are. You could even drive up to their house and take a selfie with them, or follow them to work and meet their boss and tell them about their views on the topics they were editing.
Somewhat tangentially, and to bring this topic back to a more scientific setting, I would like to point out that there has already been research on this topic in the past.
I highly recommend reading the following paper:
Lieberman, Michael D., and Jimmy Lin. "You Are Where You Edit: Locating Wikipedia Contributors through Edit Histories." ICWSM. 2009. (PDF: http://www.pensivepuffin.com/dwmcphd/syllabi/infx598_wi12/papers/wikipedia/l... )
For those of you who don't want to read the whole paper, you can find a recap of the most relevant findings in this presentation by Maurizio Napolitano: < http://www.slideshare.net/napo/social-geography-wikipedia-a-quick-overwiew
The main idea is associating spatial coordinates with Wikipedia articles when possible; these articles are called "geopages". Then you extract from the history of the articles the users who have edited a geopage. If you plot the geopages edited by a given contributor, you can see that they tend to cluster, so you can define an "edit area". The study finds that 30-35% of contributors concentrate their edits in an edit area smaller than 1 deg^2 (~12,362 km^2, approximately the area of Connecticut or Northern Ireland[1] (thanks, Wikipedia!)).
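As a rough illustration of the clustering idea (this is my own toy sketch, not the authors' actual method, and the coordinates are invented):

```python
def edit_area_sq_deg(coords):
    """Bounding-box area, in square degrees, of a list of
    (latitude, longitude) pairs for edited geopages."""
    lats = [lat for lat, _ in coords]
    lons = [lon for _, lon in coords]
    return (max(lats) - min(lats)) * (max(lons) - min(lons))

# Invented coordinates of geopages edited by one contributor:
edits = [(45.46, 9.19), (45.48, 9.22), (45.07, 7.69)]
print(edit_area_sq_deg(edits) < 1.0)  # True: the edits cluster in < 1 deg^2
```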
For another free/libre project with a geographic focus, like OpenStreetMap, this is even more marked; check out for example the tool «"Your OSM Heat Map" (aka Where did you contribute?)»[2] by Pascal Neis.
This, of course, is not a straightforward de-anonymization, but these methods work in principle for every contributor even if you obfuscate their IP or username (provided that you can still assign all the edits from a given user to a unique and univocal identifier).
C [1] https://en.wikipedia.org/wiki/Square_degree [2a] http://yosmhm.neis-one.org/ [2b] http://neis-one.org/2011/08/yosmhm/
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
Things that come to my mind:
* range blocks become impossible, and it's impossible to tell if vandals are using nearby IPs
* can't do a whois on the IP to see if it's a library or something
I suppose those first two come down to the drawback of not knowing the IP: you don't know the IP.
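The range-block point can be made concrete: CIDR range blocks rely on IPs in the same network sharing a common prefix, and a cryptographic hash destroys exactly that structure. A toy sketch (SHA-256 and the example addresses are my own choices):

```python
import hashlib

def hash_ip(ip: str, salt: str = "secret") -> str:
    """Salted hash of an IP address."""
    return hashlib.sha256((salt + ip).encode()).hexdigest()

a, b = "203.0.113.7", "203.0.113.200"
# Plain IPs in the same /24 share a prefix a range block can match...
print(a.rsplit(".", 1)[0] == b.rsplit(".", 1)[0])  # True
# ...but their hashes share no exploitable structure at all.
print(hash_ip(a) == hash_ip(b))  # False
```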
More importantly, as details of the mapping become public, it's hard to hide them again. IPv4 addresses are usually dynamic; eventually some people will publish their hash and IP, and then everyone knows the hash (and if you follow a specific user, you may be able to link one hash to another hash as belonging to the same ISP, and slowly puzzle things together. I imagine data-mining algorithms could be effective here, especially if you have edit history from before and after the switch). This could result in a false sense of security. Often, in privacy situations, less security is better than false security.
If people are looking into it, they probably know better than I do.
--bawolff
This has been discussed countless times. Some links for starters: https://phabricator.wikimedia.org/T20981
Nemo
2015-04-05 17:31 GMT+02:00 Brian Wolff bawolff@gmail.com:
Things that come to my mind:
* range blocks become impossible, and it's impossible to tell if vandals are using nearby IPs
* can't do a whois on the IP to see if it's a library or something
Oh, I didn't think about this!
What about:
- creating a new permission group (say "IP watchers") that can see the IP in non-hashed form?
- compiling some sort of list and automatically tagging edits from schools and libraries? (this could be useful regardless of hashing IPs)
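The second idea could be sketched like this (the ranges below are documentation addresses I made up; a real list would have to be compiled from whois data or public registries). Since the lookup can happen before hashing, it would indeed work regardless of whether the IPs are later obfuscated:

```python
import ipaddress

# Hypothetical institution ranges; real data would come from
# whois lookups or registries, not from this hard-coded dict.
TAGGED_RANGES = {
    "library": [ipaddress.ip_network("203.0.113.0/26")],
    "school": [ipaddress.ip_network("198.51.100.0/24")],
}

def tag_for_ip(ip: str):
    """Return the institution tag for an IP, or None if untagged."""
    addr = ipaddress.ip_address(ip)
    for tag, networks in TAGGED_RANGES.items():
        if any(addr in net for net in networks):
            return tag
    return None

print(tag_for_ip("198.51.100.42"))  # school
print(tag_for_ip("192.0.2.1"))      # None
```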
I suppose those first two come down to the drawback of not knowing the IP: you don't know the IP.
I think that restricting the view of IPs to users that may need them (admins, checkusers, ...) could address this.
More importantly, as details of the mapping become public, it's hard to hide them again. IPv4 addresses are usually dynamic; eventually some people will publish their hash and IP, and then everyone knows the hash (and if you follow a specific user, you may be able to link one hash to another hash as belonging to the same ISP, and slowly puzzle things together.
(This was also what I imagined as the main drawback of this algorithm.)
I imagine data-mining algorithms could be effective here, especially if you have edit history from before and after the switch). This could result in a false sense of security. Often, in privacy situations, less security is better than false security.
Yeah, as I said on Wikimedia-l[1], there are already studies that can mine data from Wikipedia and locate a user within an area (a fairly large area, but still), and this would continue to be possible.
In this light, even obfuscating IPs only for unregistered users and keeping them visible for registered users may be an idea.
Also, I don't think that hashing would provide greater security; it would probably just raise the bar a little for people wanting to locate users, but this would be a small bump in the road for an organization (say, the NSA) or an individual with enough commitment and resources.
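One concrete reason the bar is low: the IPv4 space holds only 2^32 addresses, so anyone who learns the salt can enumerate every address and invert the hashes by lookup. A toy sketch of that attack (the salt, addresses, and the small candidate list are all invented for illustration):

```python
import hashlib

def hash_ip(ip: str, salt: str) -> str:
    """Salted hash of an IP address."""
    return hashlib.sha256((salt + ip).encode()).hexdigest()

def crack(target_hash, salt, candidates):
    """Recover an IP from its salted hash by exhaustive search."""
    for ip in candidates:
        if hash_ip(ip, salt) == target_hash:
            return ip
    return None

salt = "leaked-secret"  # hypothetical: assume the salt became known
target = hash_ip("203.0.113.7", salt)

# A real attacker would iterate all ~4.3 billion IPv4 addresses;
# a tiny candidate list suffices to show the idea.
candidates = ["203.0.113.%d" % i for i in range(256)]
print(crack(target, salt, candidates))  # 203.0.113.7
```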
If people are looking into it, they probably know better than I do.
Thanks for your answer!
2015-04-05 22:04 GMT+02:00 Federico Leva (Nemo) nemowiki@gmail.com:
This has been discussed countless times. Some links for starters: https://phabricator.wikimedia.org/T20981
Well, it doesn't look like the discussion was much more developed than what bawolff said here.
C [1] https://lists.wikimedia.org/pipermail/wikimedia-l/2015-April/077404.html
Cristian Consonni wrote on 2015/04/08 at 3:00:
2015-04-05 17:31 GMT+02:00 Brian Wolff bawolff@gmail.com:
Things that come to my mind:
* range blocks become impossible, and it's impossible to tell if vandals are using nearby IPs
* can't do a whois on the IP to see if it's a library or something
Oh, I didn't think about this!
What about:
- creating a new permission group (say "IP watchers") that can see the
IP in non-hashed form?
- compile some sort of list and automatically tagging edits from
schools and libraries? (this could be useful regardless of hashing IPs)
You are solving a problem that doesn't exist and creating more serious ones. There really is no "right to privacy" as it extends to editing Wikipedia, and that the WMF has manufactured one is more a source of trouble than benefit.
As it stands, pretty much any technically literate user can look at editing histories and begin contributing by analyzing vandalism patterns and making reports and decisions about them. I rely heavily on well-formed reports of vandalism by users that have already done the preliminary grunt work of detecting similar edits from people in the same geographic region or carrier, and I don't want anything that makes it more difficult for them to do it. Vandalism and block-evasion are *real* problems. The imaginary right to carry out public actions anonymously shouldn't get in the way of solving them.
KWW
2015-04-08 16:11 GMT+02:00 Kevin Wayne Williams kwwilliams@kwwilliams.com:
What about:
- creating a new permission group (say "IP watchers") that can see the
IP in non-hashed form?
- compile some sort of list and automatically tagging edits from
schools and libraries? (this could be useful regardless of hashing IPs)
You are solving a problem that doesn't exist and creating more serious ones. There really is no "right to privacy" as it extends to editing Wikipedia, and that the WMF has manufactured one is more a source of trouble than benefit.
1) I just wanted to discuss this from a technical point of view; I am not saying that *we must* implement this.
2) Actually, this discussion started from another user on en.wiki and not from the Wikimedia Foundation, which IMHO did well in considering the problem and saying "we have looked into this but found no satisfactory solution so far".
C
On Apr 8, 2015 11:12 AM, "Kevin Wayne Williams" kwwilliams@kwwilliams.com wrote:
Systemic bias due to real-life consequences (or perceived real-life consequences) of online actions is also a real problem (or has the potential to be, anyhow; I don't know if anyone has attempted to measure that). For some people that might be fear of the NSA (or an equivalent agency/evil big government), but one doesn't have to reach for the government bogeyman to see legitimate needs for privacy - harassment campaigns by groups like Wikipediocracy demonstrate why privacy is important. Which is why we have things like logged-in users for pseudo-anonymity.
Privacy and abuse mitigation are both goods, but they are at odds. Where the appropriate balance is, is debatable, but I think everyone would agree that extremes in either direction are not good for wikis. (On one end you have Citizendium - not very much vandalism on that site, not much of anything else either. On the other end you would have what would happen if we totally eliminated users (anon and registered) and all edits were independent of each other, which sounds unworkable to me at least, but maybe extreme soft security [1] advocates would like it.)
Tl;dr: both privacy and abuse mitigation are important. Extremes in either direction would suck; it is important to discuss trade-offs and find the best balance, which might even turn out to be the status quo.
--bawolff