Pine, have you considered asking Milowent who they work with on the IP
data? I really, really doubt that there is some sort of shady back-alley
data dealing going down here. - Jonathan
On Thu, Oct 16, 2014 at 9:52 PM, Pine W <wiki.pine(a)gmail.com> wrote:
Thanks Toby.
I understand that IPs are not an especially accurate way to look at unique
visitors, but for the purposes of the Signpost's traffic report and the Top
25 I feel that they are a reasonable approximation for filtering out what
appear to be automated requests.
I am ok with holding those logs for 30 days, although I am a little
surprised to hear that this is happening. However, what worries me a bit
more is the idea that a staff member can access those logs without
that access being recorded. This might be something that you wish to
investigate further.
I am not interested in getting this staff person into trouble. The
information that they are providing is useful to the Signpost and certainly
seems to be sanitized to a reasonable degree. However, it does concern me
that they can access these logs without anyone knowing about it. It seems
to me that this sort of activity should be proactively disclosed to the
people at WMF who conduct legal and security reviews. I hope you will
consider what security features are appropriate to ensure that every
access to the raw logs is recorded in a robust manner. I worry
that if this one staffer can access logs without the higher-ups knowing
about it, it is possible that someone who intends to misuse WMF's data
could also access the logs without being noticed.
Thanks,
Pine
On Thu, Oct 16, 2014 at 9:31 PM, Toby Negrin <tnegrin(a)wikimedia.org>
wrote:
Hi Pine --
Thanks for this -- it's a challenging topic but one that the Analytics
team takes very seriously.
I'm not familiar with the IP address review that's referenced in the
link. I don't know who the staffer might be. We don't currently calculate
unique visitors to anything in Analytics and IP address is not a
particularly accurate way to assess unique visitors regardless (due to
proxies/NATs/etc).
We do store IPs as part of page requests in our raw logs which are
deleted every 30 days. This data is kept on a system where access is
limited and controlled by the operations team. We're in line with the
privacy policy on this.
To be clear, we are currently considering mechanisms to count unique
"requests" -- we rely on Comscore for this data and for several reasons,
primarily related to mobile usage, it's not sufficient to understand our
usage patterns. We are putting together some proposals to do this in as
limited a way as possible while being respectful of our users. We'll share
this with the community when we feel we understand the use cases and
trade-offs well enough to discuss in an informed manner.
-Toby
We do store the IP address associated with varnish requests as part of
the log. This data is
On Thu, Oct 16, 2014 at 8:50 PM, Pine W <wiki.pine(a)gmail.com> wrote:
Hi again Analytics,
I was under the impression that no records are kept of which IPs access
which articles on Wikipedia when no edits are made, but it appears that
such records are in fact kept [1].
Is this proper? This practice appears to be permissible under the
Privacy Policy which states that "We use IP addresses for research and
analytics; to better personalize content, notices, and settings for you; to
fight spam, identity theft, malware, and other kinds of abuse; and to
provide better mobile and other applications."
It is possible that this information is relevant for determining the
number of unique visitors that Wikipedia gets and that this information is
always properly filtered before it gets to the Signpost. However, given
recent discussions which I thought said that Wikipedia was not instrumented
to track unique visitors, I am surprised to learn that this already seems
to be happening and that the situation has been this way for some time, so
I would appreciate clarification.
I want to emphasize that this question is about clarifying the practice
of tracking likely unique visitors by IP. This question is not intended to
start flame wars, get people into trouble, or limit the Signpost's access
to properly filtered information if there has been a determination that
WMF's retention of the raw data is appropriate. There might be appropriate
secondary questions about making sure that access to the raw IP access data
is carefully contained and secured.
Thank you very much,
Pine
[1]
https://en.wikipedia.org/w/index.php?title=User_talk%3ASerendipodous&di…
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Jonathan T. Morgan
Learning Strategist
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
jmorgan(a)wikimedia.org