Hi all,
If you use Hive on stat1002/1004, you may have seen a deprecation
warning when launching the hive client, saying that it is being replaced
by Beeline. The Beeline shell has always been available to use, but it
required supplying a database connection string every time, which was
pretty annoying. We now have a wrapper script
<https://github.com/wikimedia/operations-puppet/blob/production/modules/role…>
set up to make this easier. The old Hive CLI will continue to exist, but we
encourage moving over to Beeline. You can use it by logging into the
stat1002/1004 boxes as usual and launching `beeline`.
There is some documentation on this here:
https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Beeline.
If you run into any issues using this interface, please ping us on the
Analytics list or in #wikimedia-analytics, or file a bug on Phabricator
<http://phabricator.wikimedia.org/tag/analytics>.
(If you are wondering stat1004 whaaat - there should be an announcement
coming up about it soon!)
Best,
--Madhu :)
Hello!
The Analytics team would like to announce that we have migrated the
reportcard to a new domain:
https://analytics.wikimedia.org/dashboards/reportcard/#pageviews-july-2015-…
The migrated reportcard includes legacy and current pageview data,
daily unique devices, and new editors data. Pageview and devices data
are updated daily, but editor data is still updated ad hoc.
The team is currently revamping the way we compute edit data, and we hope
to provide monthly updates for the main edit metrics this quarter. Some of
those will be visible in the reportcard, but the new wikistats will have
more detailed reports.
You can follow the new wikistats project here:
https://phabricator.wikimedia.org/T130256
Thanks,
Nuria
We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia:
http://dx.doi.org/10.6084/m9.figshare.1305770
This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.
This data can be used for various purposes:
• determining the most frequent links people click on for a given article
• determining the most common links people followed to an article
• determining how much of the total traffic to an article clicked on a link in that article
• generating a Markov chain over English Wikipedia
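As a minimal sketch of the last use case above: building a Markov chain from the dataset amounts to normalizing each referer's outgoing click counts into transition probabilities. The (referer, article, count) triples below are synthetic, and the exact column layout of the released files is not shown here; check the dataset's documentation on figshare for the actual schema.

```python
from collections import defaultdict

def transition_probabilities(pairs):
    """Turn (referer, article, count) triples into per-referer
    transition probabilities, i.e. the rows of a Markov chain."""
    totals = defaultdict(int)
    for prev, curr, n in pairs:
        totals[prev] += n
    probs = defaultdict(dict)
    for prev, curr, n in pairs:
        probs[prev][curr] = n / totals[prev]
    return probs

# Tiny synthetic example, not real clickstream counts:
pairs = [
    ("London", "England", 30),
    ("London", "River_Thames", 10),
]
p = transition_probabilities(pairs)
# p["London"]["England"] -> 0.75, p["London"]["River_Thames"] -> 0.25
```

The same per-referer totals also answer the first two bullets directly (most-clicked links from, and most-followed links to, an article).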
We created a page on Meta for feedback and discussion about this release: https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream
Ellery and Dario
Hi everybody,
just wanted to let you know that we have stopped the EventLogging MySQL
Kafka consumers on eventlog1001 for
https://phabricator.wikimedia.org/T183123. They will be re-enabled as soon
as possible.
Thanks!
Luca
Hi everybody,
as outlined in https://phabricator.wikimedia.org/T181518, the Analytics team
needs to repurpose the notebook1002 host (one of the PAWS/Jupyter nodes) as
a Kafka Analytics broker for an urgent maintenance procedure. We are not
aware of anybody actively using it (as is the case with notebook1001), but
to be on the safe side all home directories will be saved to notebook1001's
/srv directory in case somebody needs that data.
We are in the process of ordering new hardware to replace the current
notebook1001 and 1002 hosts, so the absence of notebook1002 will only be
temporary.
Thanks!
Luca (on behalf of the Analytics team)
Hello from Analytics Team!
We are happy to announce the alpha release of Wikistats 2. Wikistats has
been redesigned for architectural simplicity, faster data processing, and a
more dynamic and interactive user experience. Our first goal is to match
the numbers of the current system and to provide the most important
reports, as decided by the Wikistats community (see survey) [1]. Over time,
we will continue to migrate reports and add new ones that you find useful.
We can also analyze the data in new and interesting ways, and we look
forward to hearing your feedback and suggestions. [2]
You can go directly to Spanish Wikipedia
https://stats.wikimedia.org/v2/#/es.wikipedia.org
or browse all projects
https://stats.wikimedia.org/v2/#/all-projects
The new site comes with a whole new set of APIs, similar to our existing
Pageview API but with edit data. You can start using them today; they are
documented here:
https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats
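As a hedged sketch of what calling the new edits APIs looks like: the host and path segments below follow the public Wikimedia REST API conventions, but treat the exact route and parameter values as assumptions and verify them against the documentation page above before relying on them.

```python
# Build a request URL for an aggregate edits metric.
# All route segments below are assumptions; check the AQS/Wikistats
# documentation for the authoritative routes and parameter values.
BASE = "https://wikimedia.org/api/rest_v1/metrics/edits/aggregate"

def edits_url(project, editor_type="all-editor-types",
              page_type="all-page-types", granularity="monthly",
              start="20170101", end="20171201"):
    """Assemble the slash-separated route for an edits query."""
    return "/".join([BASE, project, editor_type, page_type,
                     granularity, start, end])

url = edits_url("es.wikipedia.org")
# An HTTP GET on this URL returns JSON with one item per month.
```

A plain `requests.get(url).json()` (or any HTTP client) would then fetch the monthly series.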
FAQ:
Why is this an alpha?
There are features we feel a full-fledged product should have that are
still missing, such as localization. The data-processing pipeline for the
new Wikistats has been rebuilt from scratch (it uses distributed-computing
tools such as Hadoop), and we want to see how it is used before calling it
final. Also, while we aim to update data monthly, updates will land a few
days after the month ends because of the amount of data to move and compute.
How about comparing data between two wikis?
You can do it with two tabs, but we are aware this UI might not solve all
use cases for the most advanced Wikistats users. We aim to tackle those in
the future.
How do I file bugs?
Use the handy link in the footer:
https://phabricator.wikimedia.org/maniphest/task/edit/?title=Wikistats%20Bu…
How do I comment on design?
The consultation on design already happened but we are still watching the
talk page:
https://www.mediawiki.org/wiki/Wikistats_2.0_Design_Project/RequestforFeedb…
[1]
https://www.mediawiki.org/wiki/Analytics/Wikistats/DumpReports/Future_per_r…
[2] https://wikitech.wikimedia.org/wiki/Talk:Analytics/Systems/Wikistats
Hi Everyone,
The next Research Showcase will be live-streamed this Wednesday, December
13, 2017, at 11:15 AM PST (19:15 UTC).
YouTube stream: https://www.youtube.com/watch?v=OoVwus1Owtk
As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here.
This month's presentation:
*The State of the Article Expansion Recommendation System*
By Leila Zia
Only 1% of English Wikipedia articles are labeled with quality class Good
or better, while 37% are stubs. We are building an article expansion
recommendation system to change this in Wikipedia, across many languages.
In this presentation, I will talk about our current thinking on the vision
and direction of the research that can help us build such a recommendation
system, and share more about one specific area of research we have focused
on heavily in the past months: building a recommendation system that helps
editors identify what sections to add to an existing article. I will
present some of the challenges we faced, the methods we devised or used to
overcome them, and the results of the first line of experiments on the
quality of such recommendations (teaser: the results are really promising;
precision and recall at 10 are both 80%).
--
Lani Goto
Project Assistant, Engineering Admin
Hi everybody,
we need to reboot the analytics1003 host for Linux kernel and openjdk
updates tomorrow, Dec 07 at 10 AM CET. Hive and Oozie will stop for a
(hopefully) brief amount of time, and since they'll need to stop before the
reboot, in-flight jobs/queries may fail. We'll try to avoid the reboot if
too many jobs are running, but at some point we'll need to pull the
trigger.
Please let me know on IRC (#wikimedia-analytics, elukey) or via email if
you have any issues with this maintenance.
Thanks and sorry for the trouble!
Luca (on behalf of the Analytics team)