Hey Andrew,

That’s a great question. I asked Legal to review the implications of publicly releasing a snapshot of this data, and I’ll post the outcome of the audit on this list. FWIW, the data in question will be aggregated from the logs of raw HTTP requests that WMF passively receives. This is the same type of data we previously used for the presentation on readership trends that the Analytics Team gave at Monthly Metrics in December [1]. The format of the logs and the data they contain is described here [2].

Personally identifiable information (such as IP addresses or User Agents) will be used only to filter out bots and automated requests. Clickthrough data will be obtained by parsing and counting specific string occurrences (such as an article title) in the referer string of an HTTP request. In other words, we will be counting and aggregating requests for article B whose referer contains article A. I’ll work with Ellery to release the code of the log parsing script so it can be publicly reviewed before we move forward.
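
To make the counting step above concrete, here is a rough sketch in Python. It is not the actual log parsing script, which has not been released yet. The record fields (`referer`, `uri`, `is_bot`), the `/wiki/` path prefix, and the function names are all assumptions made for illustration; the real log schema is the one described in [2].

```python
from collections import Counter
from urllib.parse import urlsplit, unquote

# Assumption: article pages live under /wiki/<title>.
WIKI_PATH_PREFIX = "/wiki/"


def title_from_url(url):
    """Return the article title from a /wiki/<title> URL or path, else None."""
    path = urlsplit(url).path
    if not path.startswith(WIKI_PATH_PREFIX):
        return None
    title = unquote(path[len(WIKI_PATH_PREFIX):])
    return title or None


def count_clickthroughs(requests):
    """Count (referer article A, requested article B) pairs.

    `requests` is an iterable of dicts with hypothetical keys:
    'referer', 'uri', and 'is_bot' (set by an earlier filtering step).
    Only the two titles are kept; no IP address or User Agent is stored.
    """
    counts = Counter()
    for req in requests:
        if req.get("is_bot"):
            continue  # skip bots and automated requests
        source = title_from_url(req.get("referer") or "")
        target = title_from_url(req.get("uri") or "")
        if source and target:
            counts[(source, target)] += 1
    return counts


# Made-up sample records, purely illustrative.
sample = [
    {"referer": "https://en.wikipedia.org/wiki/A", "uri": "/wiki/B", "is_bot": False},
    {"referer": "https://en.wikipedia.org/wiki/A", "uri": "/wiki/B", "is_bot": False},
    {"referer": "https://en.wikipedia.org/wiki/A", "uri": "/wiki/C", "is_bot": True},
    {"referer": "https://www.google.com/", "uri": "/wiki/B", "is_bot": False},
]
clicks = count_clickthroughs(sample)
```

In this sketch only the aggregated (A, B) counts leave the function, and requests with an external or missing referer are simply dropped.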

Hope this addresses your concerns,

Dario

[1] https://meta.wikimedia.org/w/index.php?title=File:2014_Readership_Update,_WMF_Metrics_Meeting,_December.pdf&page=10
[2] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive

On Jan 12, 2015, at 1:27 PM, Andrew Gray <andrew.gray@dunelm.org.uk> wrote:

Hi all,

I'm curious about the privacy implications as well. I can't think of
specific problems with this data, *but* it's information that I didn't
think we'd ever been logging. We've historically been quite hands-off
with any kind of reader information, other than raw hit counts, and
there might well be some community discomfort at discovering it's been
both tracked and released, even if completely anonymised.

Andrew.

On 12 January 2015 at 20:08, Toby Negrin <tnegrin@wikimedia.org> wrote:
Thanks Amir -- feel free to have your friend reach out to this list
directly.

As Ellery said, we're figuring out if there are any privacy implications in
releasing this dataset.

-Toby

On Mon, Jan 12, 2015 at 12:05 PM, Amir E. Aharoni
<amir.aharoni@mail.huji.ac.il> wrote:

I am asking for a real-life friend who is doing some research. It's not
for any particular project of mine, but I can easily imagine that it can be
useful for a lot of editors and product managers as I wrote in the opening
post.

(And I cannot think of any privacy problems if the data is not tied to any
particular people, but maybe I'm naive.)


--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
‪“We're living in pieces,
I want to live in peace.” – T. Moore‬

2015-01-12 22:00 GMT+02:00 Toby Negrin <tnegrin@wikimedia.org>:

Hi Amir --

Would you like to see these datasets released publicly or was there a
specific project you were interested in using them for?

thanks,

-Toby

On Mon, Jan 12, 2015 at 5:44 AM, Amir E. Aharoni
<amir.aharoni@mail.huji.ac.il> wrote:

Hi,

Are there metrics about which links in each article are the most
clicked?

I can think there's a lot to be learned from it:
* Data-driven suggestions for the manual of style about linking (too many
and too few links are a perennial topic of argument)
* How do people traverse between topics.
* Which terms in the article may need a short explanation in parentheses
rather than just a link.
* How far down into the article do people bother to read.

Anyway, I think that access to such data can improve both
readership and editing.

And maybe this can be just taken right from the logs, without any
additional EventLogging.

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
‪“We're living in pieces,
I want to live in peace.” – T. Moore‬

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics



-- 
- Andrew Gray
 andrew.gray@dunelm.org.uk
