Hi all,
For all Hive users using stat1002/1004, you might have seen a deprecation
warning when you launch the hive client - that claims it's being replaced
with Beeline. The Beeline shell has always been available to use, but it
required supplying a database connection string every time, which was
pretty annoying. We now have a wrapper
<https://github.com/wikimedia/operations-puppet/blob/production/modules/role…>
script
setup to make this easier. The old Hive CLI will continue to exist, but we
encourage moving over to Beeline. You can use it by logging into the
stat1002/1004 boxes as usual, and launching `beeline`.
There is some documentation on this here:
https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Beeline.
If you run into any issues using this interface, please ping us on the
Analytics list or #wikimedia-analytics or file a bug on Phabricator
<http://phabricator.wikimedia.org/tag/analytics>.
(If you are wondering stat1004 whaaat - there should be an announcement
coming up about it soon!)
Best,
--Madhu :)
Hello!
The Analytics team would like to announce that we have migrated the
reportcard to a new domain:
https://analytics.wikimedia.org/dashboards/reportcard/#pageviews-july-2015-…
The migrated reportcard includes both legacy and current pageview data,
daily unique devices and new editors data. Pageview and devices data is
updated daily but editor data is still updated ad-hoc.
The team is working at this time on revamping the way we compute edit data
and we hope to be able to provide monthly updates for the main edit metrics
this quarter. Some of those will be visible in the reportcard but the new
wikistats will have more detailed reports.
You can follow the new wikistats project here:
https://phabricator.wikimedia.org/T130256
Thanks,
Nuria
We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia
http://dx.doi.org/10.6084/m9.figshare.1305770 <http://dx.doi.org/10.6084/m9.figshare.1305770>
This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.
This data can be used for various purposes:
• determining the most frequent links people click on for a given article
• determining the most common links people followed to an article
• determining how much of the total traffic to an article clicked on a link in that article
• generating a Markov chain over English Wikipedia
We created a page on Meta for feedback and discussion about this release: https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream <https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream>
Ellery and Dario
Google Code-in is an annual contest for 13-17 year old students. It
will take place from Nov28 to Jan17 and is not only about coding tasks.
While we wait whether Wikimedia will get accepted:
* You have small, self-contained bugs you'd like to see fixed?
* Your documentation needs specific improvements?
* Your user interface has small design issues?
* Your Outreachy/Summer of Code project welcomes small tweaks?
* You'd enjoy helping someone port your template to Lua?
* Your gadget code uses some deprecated API calls?
* You have tasks in mind that welcome some research?
Also note that "Beginner tasks" (e.g. "Set up Vagrant" etc) and
"generic" tasks are very welcome (e.g. "Choose & fix 2 PHP7 issues
from the list in https://phabricator.wikimedia.org/T120336 ").
Because we will need hundreds of tasks. :)
And we also have more than 400 unassigned open 'easy' tasks listed:
https://phabricator.wikimedia.org/maniphest/query/HCyOonSbFn.z/#R
Would you be willing to mentor some of those in your area?
Please take a moment to find / update [Phabricator etc.] tasks in your
project(s) which would take an experienced contributor 2-3 hours. Check
https://www.mediawiki.org/wiki/Google_Code-in/Mentors
and please ask if you have any questions!
For some achievements from last round, see
https://blog.wikimedia.org/2017/02/03/google-code-in/
Thanks!,
andre
--
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/
As was previously announced on the xmldatadumps-l list, the sql/xml dumps
generated twice a month will be written to an internal server, starting
with the November run. This is in part to reduce load on the web/rsync/nfs
server which has been doing this work also until now. We want separation
of roles for some other reasons too.
Because I want to get this right, and there are a lot of moving parts, and
I don't want to rsync all the prefetch data over to these boxes again next
month after cancelling the move:
********
If needed, the November full run will be delayed for a few days.
If the November full run takes too long, the partial run, usually starting
on the 20th of the month, will not take place.
*********
Additionally, as described in an earlier email on the xmldatadumps-l list:
*********
files will show up on the web server/rsync server with a substantial
delay. Initially this may be a day or more. This includes index.html and
other status files.
*********
You can keep track of developments here:
https://phabricator.wikimedia.org/T178893
If you know folks not on the lists in the recipients field for this email,
please forward it to them and suggest that they subscribe to this list.
Thanks,
Ariel
Hi everybody,
mysql will not be available on db1047 for some time due to maintenance for
https://phabricator.wikimedia.org/T177405.
Ping me on IRC (elukey) or the #wikimedia-analytics channell if you
encounter any issue or if you have any questions.
Thanks!
Luca
o/
Prior to Thursday, 28th September, if your client-side EventLogging
instrumentation logged event via mw.track, then only events tracked
during the first pageview of a user's session were logged.
Now, technically, the events weren't ignored or dropped. Instead, the
subscriber for the "event" topic was never subscribed when the module
was loaded from the ResourceLoader's cache and so events published on
that topic simply weren't received and logged.
This bug was discovered while testing some instrumentation maintained
by Readers Web [0] and independently by Timo Tijhof, who submitted the
ideal fix [1] promptly.
-Sam
[0] https://phabricator.wikimedia.org/T175918
[1] https://gerrit.wikimedia.org/r/#/c/378804/
---
Engineering Manager
Readers
Timezone: BST (UTC+1)
IRC (Freenode): phuedx