I'll review Daniel's email and will get back to him/you on this list
in the next day or so.
Leila
--
Leila Zia
Senior Research Scientist
Wikimedia Foundation
On Mon, Jul 24, 2017 at 7:59 AM, Nuria Ruiz <nuria@wikimedia.org> wrote:
> Daniel,
>
> Signing an NDA is not enough to get access to the data; you also need to
> be part of a formal research collaboration with our research team. They
> have a number of those and are not likely to accept any more soon, but
> you can contact them in that regard:
> https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations
>
> Thanks,
>
> Nuria
>
>
>
> On Mon, Jul 24, 2017 at 6:37 AM, Daniel Oberski <daniel.oberski@gmail.com>
> wrote:
>>
>> Dear list,
>>
>> I'm posting a recent conversation with Dan below, as well as a few
>> follow-up questions.
>>
>> Dan was kind enough to point out this list. I apologize that the post is
>> "backward" (in email-thread format) due to my ignorance; I will use this
>> list from now on.
>>
>> Thanks, Daniel
>>
>>
>> ----
>>
>> Hi Dan
>>
>>
>> Thanks for getting back to me so quickly!
>>
>> > Thanks for writing. In general these questions are best asked on our
>> > public list, so other people can see and benefit from any answers:
>> > https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>> Thanks, I've joined this list and will ask subsequent questions there.
>>
>> > * pairs of pages: we have two datasets that are mentioned in this task,
>> > https://phabricator.wikimedia.org/T158972, which should be very
>> > interesting for this purpose. They aren't being updated right now, and
>> > the task is to do just that. We'll probably get to that within the next
>> > 3 months, but a bunch of us are on paternity leave this summer, so
>> > things are a little slower than normal.
>>
>> This seems close to what I need. From the descriptions I gather the
>> linkage is by session. Is there also a linkage by IP (with the IPs
>> removed, of course)?
>>
>> > * country data for pageviews: for privacy reasons we only allow access
>> > to this with an NDA. We have good data on it, but you need to sign this
>> > NDA and use our cluster to access it, being careful about what you
>> > report about it to the world at large. Here's information on that:
>> > https://wikitech.wikimedia.org/wiki/Volunteer_NDA
>>
>> I've read this and am happy to sign an NDA. I understand it is best to be
>> as specific as possible about the reasoning, intentions with the data, and
>> permissions required. For me to figure this out it would be useful to know
>> the relevant parts of the database schema, and perhaps a hint as to which
>> data might be most interesting there. Would you be able to point me
>> towards that?
>>
>> > Hope that helps, and feel free to write back to the public list in the
>> > future.
>>
>> Definitely, very helpful and thank you!
>>
>> Best, Daniel
>>
>>
>> On Wed, Jul 19, 2017 at 9:51 AM, Oberski, D.L. (Daniel)
>> <d.l.oberski@uu.nl> wrote:
>> Dear Dan,
>>
>>
>> My name is Daniel Oberski; I'm an associate professor of data science
>> methodology in the department of statistics at Utrecht University in the
>> Netherlands.
>>
>> I've been using your incredibly useful pageviews API to study correlations
>> between the amount of interest people show in a topic (pageviews) and
>> other data such as political party preference over time. That has yielded
>> some interesting results (which I have yet to write up).
>>
>> However, to do a better study it would be very helpful to have slightly
>> more information than is in the API. Specifically, it would be very useful
>> to be able to query, for each _pair_ of pages, how many people (or IPs)
>> viewed _both_ of those pages. That way I can find out which pages are
>> really indicative of interest in a specific common topic, rather than just
>> correlated by accident. In addition, I've found it hard to figure out
>> pageviews for specific pages by country rather than language.
>>
>> My question is, would you happen to know if there is any way to obtain
>> this information? (It does not necessarily have to be through the API.) Or
>> do you know if there are people to whom I might talk about this?
>>
>> Thanks for reading (to) the end and best regards,
>>
>> Daniel
>>
>>
>>
>> _______________________________________________
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics