Hi Brian,
2015-03-30 0:25 GMT+02:00 Brian <reflection(a)gmail.com>:
Although the initial goal of the Netflix Prize was to design a
collaborative filtering algorithm, it became notorious when the data was
used to de-anonymize Netflix users. Researchers proved that given just a
user's movie ratings on one site, you can plug those ratings into another
site, such as the IMDB. You can then take that information, and with some
Google searches and optionally a bit of cash (for websites that sell user
information, including, in some cases, their SSN) figure out who they are.
You could even drive up to their house and take a selfie with them, or
follow them to work and meet their boss and tell them about their views on
the topics they were editing.
Somewhat tangentially, and to bring this topic back to a more
scientific setting, I would like to point out that there has already
been research on this topic.
I highly recommend reading the following paper:
Lieberman, Michael D., and Jimmy Lin. "You Are Where You Edit:
Locating Wikipedia Contributors through Edit Histories." ICWSM. 2009.
(PDF
<http://www.pensivepuffin.com/dwmcphd/syllabi/infx598_wi12/papers/wikipedia/lieberman-lin.YouAreWhereYouEdit.ICWSM09.pdf>)
For those of you that don't want to read the whole paper, you can find
a recap of the most relevant findings in this presentation by Maurizio
Napolitano:
<http://www.slideshare.net/napo/social-geography-wikipedia-a-quick-overwiew>
The main idea is to associate spatial coordinates with Wikipedia
articles when possible; these articles are called "geopages". Then you
extract from the article histories the users who have edited a
geopage. If you plot the geopages edited by a given contributor, you
can see that they tend to cluster, so you can define an "edit area".
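
To make the "edit area" idea concrete, here is a minimal Python sketch.
Everything in it is an assumption for illustration: the sample
geopages and edits are made up, the function name edit_areas is
hypothetical, and the area is simply a latitude/longitude bounding box
in square degrees, which may well differ from the exact definition
used in the paper.

```python
from collections import defaultdict

# Hypothetical sample data: geopage title -> (latitude, longitude) in degrees.
geopages = {
    "Trento": (46.07, 11.12),
    "Bolzano": (46.50, 11.35),
    "Rovereto": (45.89, 11.04),
    "Tokyo": (35.68, 139.69),
}

# Hypothetical edit history as (user id, article title) pairs.
edits = [
    ("alice", "Trento"),
    ("alice", "Bolzano"),
    ("alice", "Rovereto"),
    ("alice", "Pizza"),   # not a geopage, so it is ignored
    ("bob", "Trento"),
    ("bob", "Tokyo"),
]


def edit_areas(edits, geopages):
    """Return {user: bounding-box area in square degrees}.

    Simplification: a plain lat/lon bounding box that ignores
    wrap-around at the 180th meridian.
    """
    points = defaultdict(set)
    for user, title in edits:
        if title in geopages:
            points[user].add(geopages[title])

    areas = {}
    for user, coords in points.items():
        lats = [lat for lat, _ in coords]
        lons = [lon for _, lon in coords]
        areas[user] = (max(lats) - min(lats)) * (max(lons) - min(lons))
    return areas


areas = edit_areas(edits, geopages)
for user, area in sorted(areas.items()):
    print(f"{user}: {area:.4f} deg^2")
```

Note that the only thing the sketch needs is a stable user id per
edit; it never looks at an IP address or a real name.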
The study finds that 30-35% of contributors concentrate their edits in
an edit area smaller than 1 deg^2 (~12,362 km^2 near the equator,
approximately the area of Connecticut or Northern Ireland[1] (thanks,
Wikipedia!)).
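
As a rough sanity check on that conversion, the sketch below computes
the ground area of a 1 x 1 degree cell. It assumes a spherical Earth
with a mean radius of 6371 km (my assumption, not a figure from the
paper), and the function name cell_area_km2 is made up for this
example. The area shrinks as you move away from the equator, so the
figure above is best read as an upper bound.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical Earth is an assumption


def cell_area_km2(lat_deg, size_deg=1.0):
    """Area of a size_deg x size_deg cell whose southern edge is at lat_deg.

    Uses the spherical formula R^2 * dlon * (sin(lat2) - sin(lat1)).
    """
    phi1 = math.radians(lat_deg)
    phi2 = math.radians(lat_deg + size_deg)
    dlon = math.radians(size_deg)
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(phi2) - math.sin(phi1))


area_equator = cell_area_km2(0.0)
area_45 = cell_area_km2(45.0)
print(f"1 deg^2 cell at the equator: {area_equator:.0f} km^2")
print(f"1 deg^2 cell at 45 degrees:  {area_45:.0f} km^2")
```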
For another free/libre project with a geographic focus, like
OpenStreetMap, this is even more marked; see, for example, the
tool «“Your OSM Heat Map” (aka Where did you contribute?)»[2] by
Pascal Neis.
This, of course, is not a straightforward de-anonymization, but these
methods work in principle for every contributor even if you obfuscate
their IP or username (provided that you can still assign all the edits
from a given user to a single, unambiguous identifier).
C
[1] https://en.wikipedia.org/wiki/Square_degree
[2a] http://yosmhm.neis-one.org/
[2b] http://neis-one.org/2011/08/yosmhm/