We have talked in the past about releasing granular geocoded pageview data
so that we may track the spread of diseases. The efforts of the Los Alamos
National Lab folks to do this in a privacy sensitive way are on-going, and
we have our own efforts as well, but completely solving this problem in the
general case is known to be very hard.
So, given Zika and the confusing coverage it has seen, I felt compelled to
offer my help personally. I can run queries, test hypotheses, and help
publish data that could back up articles. The privacy of our editors is of
course still protected, but that's easier to do in a specific case with
human review than in the general case.
I offer as much of my volunteer time as will get the job done, plus any of
my official time that my team-mates deem appropriate (they're pretty
awesome, so you probably have me double full time if you need me).
Hi Jeremiah,
I hope you don't mind that I've cc-ed our public mailing list, which is
where discussions like this belong. Dear list, Jeremiah is asking about
our plans for the future of the pageview API.
We definitely see the API as a priority. Right now we are doing
maintenance: fixing bugs and improving capacity and loading processes. We
are a small team and we want to make sure we have a solid platform on which
to build new features.
But we are increasingly committed to publishing open data and are actively
working through the tricky security and privacy implications; that's one of
our main goals this quarter [1].
As for the date range available, we only have the quality source data we
need going back to May 1st, 2015. We will finish back-filling to that date
but we can't go further back since we delete the more sensitive raw logs
that we generate this data from (for privacy reasons).
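For anyone who wants to query the API with that limit in mind, here is a minimal sketch in Python. It assumes the per-article endpoint layout of the public REST API and treats May 1st, 2015 as the earliest usable date, as described above. The function name `pageviews_url` and its default arguments are illustrative only and not part of any official client.

```python
from datetime import date
from urllib.parse import quote

# Assumption: base path of the public per-article pageview endpoint.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
# Earliest date with quality source data, per this message.
EARLIEST = date(2015, 5, 1)


def pageviews_url(project, article, start, end,
                  access="all-access", agent="user"):
    """Build a daily per-article request URL, clamping start to EARLIEST."""
    if end < EARLIEST:
        raise ValueError("no data available before " + EARLIEST.isoformat())
    start = max(start, EARLIEST)
    if start > end:
        raise ValueError("start date is after end date")
    # Titles use underscores instead of spaces and must be URL-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([BASE, project, access, agent, title, "daily",
                     start.strftime("%Y%m%d"), end.strftime("%Y%m%d")])
```

The sketch only builds the URL, so it can be checked without network access; fetching it is left to whatever HTTP client you prefer.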
To follow our work in general, please see our backlog [2] and the tag for
the pageview api [3]. We also have a large number of requests that we
haven't turned into individual tasks; they are in the form of a
conversation here: https://phabricator.wikimedia.org/T112956
[1] https://www.mediawiki.org/wiki/Wikimedia_Engineering/2015-16_Q3_Goals#Analy…
[2] https://phabricator.wikimedia.org/tag/analytics-kanban/
[3] https://phabricator.wikimedia.org/tag/pageviews-api/ (this is new so we
are still tagging the relevant tasks).
On Wednesday, February 17, 2016, Jeremiah Lewis <jeremiah.lewis(a)razorfish.de>
wrote:
> Hey Dan,
>
>
> Oliver Keyes, with whom I've been chatting about his pageviews
> package for R, gave me your contact details. Oliver indicated that you are
> currently in charge of the pageviews rest API and might be able to shed
> light on the roadmap for future releases.
>
>
> Although I just stumbled across the dataset this past month, the
> Wikimedia pageviews data appears to be a wonderful resource with a breadth
> and depth unusual for open source data sets about internet users and the
> topics they are interested in.
>
>
> That said, the API is currently limited in the date range of data which it
> displays (only a bit over 5 months, if I'm not mistaken). What are the next
> steps for the API? Is it being actively maintained? Is it seen as a
> priority for the Wikimedia analytics team? Or are you still trying to
> figure out if there is enough interest in a product based on the
> pageviews data?
>
>
> For both personal and business reasons, I'd love to see this API expanded
> and actively kept up; let me know if there's anything I could do vis a vis
> publicity (e.g. a blog post) that might give this dataset higher
> priority internally.
>
>
> I look forward to hearing your thoughts on the API's future.
>
>
> Best,
>
> Jeremiah
>
>
>
> *Jeremiah Lewis* / Junior Business Analyst /// Skype: jpsl91
>
> Razorfish GmbH
> Stralauer Allee 2b
> 10245 Berlin
>
> Chamber of Commerce: Frankfurt am Main – HRB 45639, company registered and
> located in Frankfurt am Main. Directors: Sascha Martini, Ariel Marciano.
> Authorized signatory: Kai Greib.
Lightning talks start in about 5 minutes. Public link at
http://www.youtube.com/watch?v=D3fyCgBWvFc
Optional IRC participation in #wikimedia-tech. (Note, not #wikimedia-office)
Cheers,
Pine
--
Hi everyone,
Just a reminder that the February Lightning Talks
<https://www.mediawiki.org/wiki/Lightning_Talks#February_2016> start in
*25 minutes.*
Come join us in the 5th Floor Collab Space or follow along here:
http://www.youtube.com/watch?v=D3fyCgBWvFc
IRC: #wikimedia-tech
Hope to see you there!
Megan
On Tue, Feb 2, 2016 at 4:22 PM, Kevin Leduc <kevin(a)wikimedia.org> wrote:
> Hi All,
>
>
> The next Lightning Talks are scheduled for February 16th (two weeks from
> today). We hope at least 4 people will sign up for the talks by Friday
> February 12th otherwise we will postpone them another month. Lightning
> Talks are an opportunity for teams @ WMF & in the Community to showcase
> something they have achieved: a quarterly goal, milestone, release, or
> anything of significance to the rest of the foundation and the movement as
> a whole.
>
>
> Each presentation will be 10 minutes or less including time for questions.
>
> Sign up here: https://www.mediawiki.org/wiki/Lightning_Talks#February_2016
>
>
> Next round of Lightning Talks:
>
> When: Tuesday February 16, 1900 UTC
> <http://www.timeanddate.com/worldclock/fixedtime.html?msg=Lightning+Talks&is…>,
> 11am PST (We have added this Lightning Talk to the WMF Engineering, Fun &
> Learning, and Staff calendars)
>
> Where: 5th Floor
>
> Remotees: On-Air google hangout will be provided just before the meeting
>
> IRC: #wikimedia-tech
>
> YouTube stream: http://www.youtube.com/watch?v=D3fyCgBWvFc
>
>
> Thanks!
>
> Kevin Leduc, Megan Neisler, Brendan Campbell
>
>
> _______________________________________________
> Wmfall mailing list
> Wmfall(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wmfall
>
>
--
Megan Neisler
Project Coordinator- Engineering
Wikimedia Foundation
mneisler(a)wikimedia.org <mguss(a)wikimedia.org>
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
I should have started this discussion a while ago, but it's easier to catch
up on work during vacation :)
We currently have 3 available static file dumps of pageview data. I will
explain them here and explain my thoughts on simplifying the situation.
Feel free to turn this thread into a wiki.
* PAGECOUNTS-RAW <http://dumps.wikimedia.org/other/pagecounts-raw/>. We
have this data going back to 2007. It uses a very simple pageview
definition, which incorrectly counts things such as banner views as
pageviews.
* PAGECOUNTS-ALL-SITES
<http://dumps.wikimedia.org/other/pagecounts-all-sites/>. We have this
data starting in late 2014. Compared to PAGECOUNTS-RAW, this dataset also
adds traffic from the mobile versions of our sites. But it's still using
the same simple pageview definition.
* PAGEVIEWS <http://dumps.wikimedia.org/other/pageviews/>. We have this
data starting in May 2015. It implements the new and much improved
pageview definition <https://meta.wikimedia.org/wiki/Research:Page_view>
that we now use. This is the same pageview definition used in the pageview
API. This dataset also removes spider traffic and any automata traffic
that we can detect.
All three datasets are in the same format (Domas's archive format).
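As a rough illustration of working with that shared format, here is a minimal Python sketch that parses one record. It assumes each line holds four space-separated fields (project code, page title, view count, response size in bytes); the names `PageCount` and `parse_line` are made up for this example.

```python
from typing import NamedTuple, Optional


class PageCount(NamedTuple):
    project: str     # e.g. "en"
    title: str       # page title with underscores
    views: int       # number of requests counted in the period
    bytes_sent: int  # total response size for the period


def parse_line(line: str) -> Optional[PageCount]:
    """Parse one 'project title views bytes' record; None if malformed."""
    parts = line.rstrip("\n").split(" ")
    if len(parts) != 4:
        return None
    project, title, views, size = parts
    try:
        return PageCount(project, title, int(views), int(size))
    except ValueError:
        # Non-numeric count or size: treat the record as malformed.
        return None
```

Returning `None` for malformed lines, rather than raising, lets a caller skip bad records while streaming through a large file.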
So, before we can simplify this confusing situation, we need your help and
input about what to keep and how to keep it. Here's the approach I would
take:
Combine pagecounts-raw with pagecounts-all-sites into a new dataset called
"pagecounts". Keep producing data to this dataset forever, but remove
"pagecounts-raw" and "pagecounts-all-sites". This way, we can compare new
data with historical data going back as far as we need. We would explain
on dumps.wikimedia.org/other that this dataset gains mobile data starting
in October 2014, to explain the relative local spike that happens there.
This dataset would remain a pretty bad estimate of actual page views, and
would remain sensitive to automata and spider spikes. But in combination
with the "pageviews" dataset, I think it would be useful.
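To make the proposed merge concrete, here is a small Python sketch of how a combined "pagecounts" series could pick its source for a given day. The exact switch-over day is an assumption (only "October 2014" is stated above), and `MOBILE_START`, `source_for`, and `includes_mobile` are hypothetical names.

```python
from datetime import date

# Assumption: the precise first day with mobile traffic is not stated in
# this thread; October 1st, 2014 is used here as a placeholder.
MOBILE_START = date(2014, 10, 1)


def source_for(day: date) -> str:
    """Name the existing dataset that would supply data for this day."""
    if day < MOBILE_START:
        return "pagecounts-raw"
    return "pagecounts-all-sites"


def includes_mobile(day: date) -> bool:
    """True if counts for this day include mobile-site traffic."""
    return day >= MOBILE_START
```

A consumer comparing counts across the boundary could use `includes_mobile` to flag the expected jump instead of reading it as a traffic spike.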
What do you all think? Sound off in this thread, and if we have consensus
I'll start the cleanup.
(cross-posting)
Reminder that these lightning talks are happening tomorrow, Tuesday
February 16, at 1900 UTC / 11:00 AM Pacific. Because there are 3 presenters
and a 1-hour block of time, each presenter has about 15 minutes including
time for questions. We might finish early.
On the agenda:
* Pine: "LearnWiki" instructional video series on Wikipedia mechanics
(including VE and citoid) and community practices
<https://meta.wikimedia.org/wiki/Grants:IEG/Motivational_and_educational_vid…>
* Madhu Viswanathan: "Counting unique devices accessing Wikipedia projects
using Last access method"
* Rosemary Rein: "Program Capacity and Learning-Building a Roadmap Together"
<https://commons.wikimedia.org/wiki/File:Program_Capacity_and_Learning_Roadm…>
Hope to see you there!
Pine
On Tue, Feb 2, 2016 at 5:47 PM, Kevin Leduc <kevin(a)wikimedia.org> wrote:
> Thanks for forwarding Pine! I welcome any 10 minute talks from GLAM and
> Education as well. If you add your name to the list [1], email me as well
> so I can contact you and forward notes for Lightning Talk speakers.
>
> [1] https://www.mediawiki.org/wiki/Lightning_Talks#February_2016
>
> On Tue, Feb 2, 2016 at 4:59 PM, Pine W <wiki.pine(a)gmail.com> wrote:
>
>> Boldly forwarding* in case others would like to view or present a
>> lightning talk. I plan to give a lightning talk about the video series
>> <https://meta.wikimedia.org/wiki/Grants:IEG/Motivational_and_educational_vid…>
>> which I'm in the process of producing with the support of an individual
>> engagement grant.
>>
>> Although these talks can be about technical topics like video formats, I
>> think that there are education and GLAM activities that could fit under the
>> umbrella as well, especially if they have technical or research aspects.
>> For example, I'll probably focus much of my presentation on my background
>> research and project design process.
>>
>> Hope to see you there!
>> Pine
>>
>> * To boldly forward where no one has forwarded before
>>
>> ---------- Forwarded message ----------
>> From: Kevin Leduc <kevin(a)wikimedia.org>
>> Date: Tue, Feb 2, 2016 at 4:23 PM
>> Subject: [Wikitech-l] Fwd: February 2016 Lightning Talks
>> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
>>
>>
>> ---------- Forwarded message ----------
>> From: Kevin Leduc <kevin(a)wikimedia.org>
>> Date: Tue, Feb 2, 2016 at 4:22 PM
>> Subject: February 2016 Lightning Talks
>> To: "Staff (All)" <wmfall(a)lists.wikimedia.org>
>>
>>
>
Hi Folks,
I have no trouble connecting to s1-analytics-slave.eqiad.wmnet via
terminal, but when I try to use my favorite GUI (Sequel Pro), I am suddenly
having trouble. I was out for over a month, so there might have been
changes during that time--I think I had trouble just before leaving as well.
Here is my setup:
[image: Inline image 1]
And here is my error.
* MySQL said: Lost connection to MySQL server at 'reading initial
communication packet', system error: 0*
Could anyone spare some time tomorrow to troubleshoot with me? The FAQ
for Sequel Pro says the following:
"On the server, configure MySQL by editing /etc/my.cnf and comment or
remove skip-networking from the [mysqld] section. Then, restart MySQL
Server."
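One way to narrow this down before touching any server configuration is to check whether the machine running the GUI can open a TCP connection to the database port at all. The following Python sketch does only that; it assumes the server listens on the default MySQL port 3306, and `tcp_reachable` is a hypothetical helper, not part of any MySQL tooling.

```python
import socket


def tcp_reachable(host: str, port: int = 3306, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False
```

If this returns False from the GUI machine but the terminal connection works from another host, the difference is likely in network path or tunnelling rather than in the server's MySQL settings.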
Thanks,
J