Hi all,
For all Hive users on stat1002/1004: you might have seen a deprecation
warning when you launch the hive client, saying that it's being replaced
with Beeline. The Beeline shell has always been available to use, but it
required supplying a database connection string every time, which was
pretty annoying. We now have a wrapper script
<https://github.com/wikimedia/operations-puppet/blob/production/modules/role…>
set up to make this easier. The old Hive CLI will continue to exist, but we
encourage moving over to Beeline. You can use it by logging into the
stat1002/1004 boxes as usual, and launching `beeline`.
There is some documentation on this here:
https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Beeline.
If you run into any issues using this interface, please ping us on the
Analytics list or #wikimedia-analytics or file a bug on Phabricator
<http://phabricator.wikimedia.org/tag/analytics>.
(If you are wondering stat1004 whaaat - there should be an announcement
coming up about it soon!)
Best,
--Madhu :)
We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia:
http://dx.doi.org/10.6084/m9.figshare.1305770
This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.
This data can be used for various purposes:
• determining the most frequent links people click on for a given article
• determining the most common links people followed to an article
• determining how much of the total traffic to an article clicked on a link in that article
• generating a Markov chain over English Wikipedia
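
As a rough illustration of the last use case, the sketch below turns (referer, article, count) rows into first-order transition probabilities, i.e. the share of a referer's outgoing clicks that went to each article. The row layout, the function names and the toy counts are assumptions made for this example, not the dataset's actual column format or values.

```python
from collections import defaultdict


def transition_probabilities(rows):
    """Estimate P(article | referer) from (referer, article, count) rows.

    The three-field row layout is an assumption for illustration; adapt it
    to the columns of the released file.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for referer, article, n in rows:
        counts[referer][article] += n

    probs = {}
    for referer, targets in counts.items():
        total = sum(targets.values())
        probs[referer] = {article: n / total for article, n in targets.items()}
    return probs


def top_links(probs, referer, k=3):
    """Return the k most frequently followed links from `referer`."""
    targets = probs.get(referer, {})
    return sorted(targets.items(), key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    # Toy rows with made-up counts, only to show the shape of the input.
    toy_rows = [
        ("London", "England", 30),
        ("London", "River_Thames", 10),
        ("England", "London", 5),
    ]
    chain = transition_probabilities(toy_rows)
    print(top_links(chain, "London"))
```

The same per-referer table also covers the first use case in the list (most frequent links clicked from a given article); grouping by article instead of referer would give the second.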
We created a page on Meta for feedback and discussion about this release: https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream
Ellery and Dario
Hi everyone!
Wikimedia is releasing a new service today: EventStreams
<https://wikitech.wikimedia.org/wiki/EventStreams>. This service allows us
to publish arbitrary streams of JSON event data to the public. Initially,
the only stream available will be good ol’ RecentChanges
<https://www.mediawiki.org/wiki/Manual:RCFeed>. This event stream overlaps
functionality already provided by irc.wikimedia.org and RCStream
<https://wikitech.wikimedia.org/wiki/RCStream>. However, this new service
has advantages over these (now deprecated) services.
1. We can expose more than just RecentChanges.
2. Events are delivered over streaming HTTP (chunked transfer) instead of
   IRC or socket.io. This requires less client side code and fewer special
   routing cases on the server side.
3. Streams can be resumed from the past. By using EventSource, a
   disconnected client will automatically resume the stream from where it left
   off, as long as it resumes within one week. In the future, we would like
   to allow users to specify historical timestamps from which they would like
   to begin consuming, if this proves safe and tractable.
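
For clients that cannot use a ready-made EventSource library, points 2 and 3 can be sketched by hand. The example below parses server-sent-event text lines into (id, data) pairs and builds the standard `Last-Event-ID` request header that an EventSource client sends when it reconnects. The function names, the sample lines and the shape of the ids are assumptions for illustration; the ids the service actually emits may look different.

```python
import json


def parse_sse(lines):
    """Yield (event_id, data) for each event in an iterable of SSE text lines.

    Follows the basic server-sent-events framing: "field: value" lines,
    a blank line ends an event, lines starting with ":" are comments.
    """
    event_id = None
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the event collected so far, if any.
            if data_lines:
                yield event_id, json.loads("\n".join(data_lines))
            data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "id":
            event_id = value
        elif field == "data":
            data_lines.append(value)


def resume_headers(last_event_id=None):
    """Request headers for (re)connecting; resumes after last_event_id if set."""
    headers = {"Accept": "text/event-stream"}
    if last_event_id is not None:
        headers["Last-Event-ID"] = last_event_id
    return headers


if __name__ == "__main__":
    # Made-up sample lines, not real output from the service.
    sample = [
        "id: 42",
        "event: message",
        'data: {"title": "Foo"}',
        "",
        ": keep-alive",
        "id: 43",
        'data: {"title": "Bar"}',
        "",
    ]
    last_id = None
    for last_id, event in parse_sse(sample):
        print(last_id, event["title"])
    print(resume_headers(last_id))
```

In a real client the lines would come from a streaming HTTP response opened with these headers; remembering the last seen id is what lets a reconnect pick up where the previous connection stopped.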
I did say deprecated! Okay okay, we may never be able to fully deprecate
irc.wikimedia.org. It’s used by too many (probably sentient by now) bots
out there. We do plan to obsolete RCStream, and to turn it off in a
reasonable amount of time. The deadline iiiiiis July 7th, 2017. All
services that rely on RCStream should migrate to the HTTP based
EventStreams service by this date. We are committed to assisting you in
this transition, so let us know how we can help.
Unfortunately, unlike RCStream, EventStreams does not have server side
event filtering (e.g. by wiki) quite yet. How and if this should be done
is still under discussion <https://phabricator.wikimedia.org/T152731>.
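
Until server side filtering exists, a client can do the filtering itself after decoding each event. The sketch below keeps only events whose fields match the requested values; the helper name and the `wiki` and `type` field names used in the toy events are assumptions for illustration, so check them against the events you actually receive.

```python
def filter_events(events, **wanted):
    """Yield only the events whose fields equal every requested value.

    `events` is any iterable of dicts (decoded JSON events); `wanted`
    maps field names to the values they must have.
    """
    for event in events:
        if all(event.get(field) == value for field, value in wanted.items()):
            yield event


if __name__ == "__main__":
    # Toy events with assumed field names, not real stream output.
    toy_events = [
        {"wiki": "enwiki", "type": "edit", "title": "Foo"},
        {"wiki": "dewiki", "type": "edit", "title": "Bar"},
        {"wiki": "enwiki", "type": "new", "title": "Baz"},
    ]
    for event in filter_events(toy_events, wiki="enwiki", type="edit"):
        print(event["title"])
```

Filtering on the client means every event still crosses the network, which is the cost the server side filtering discussion is trying to avoid.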
The RecentChanges data you are used to remains the same, and is available
at https://stream.wikimedia.org/v2/stream/recentchange. However, we may
have something different for you, if you find it useful. We have been
internally producing new MediaWiki-specific events
<https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema…>
for a while now, and could expose these via EventStreams as well.
Take a look at these events, and tell us what you think. Would you find
them useful? How would you like to subscribe to them? Individually as
separate streams, or would you like to be able to compose multiple event
types into a single stream via an API? These things are all possible.
I asked for a lot of feedback in the above paragraphs. Let’s try and
centralize this discussion over on the mediawiki.org EventStreams talk page
<https://www.mediawiki.org/wiki/Talk:EventStreams>. In summary, the
questions are:
- What RCStream clients do you maintain, and how can we help you migrate
  to EventStreams? <https://www.mediawiki.org/wiki/Topic:Tkjkee2j684hkwc9>
- Is server side filtering, by wiki or arbitrary event field, useful to
  you? <https://www.mediawiki.org/wiki/Topic:Tkjkabtyakpm967t>
- Would you like to consume streams other than RecentChanges?
  <https://www.mediawiki.org/wiki/Topic:Tkjk4ezxb4u01a61> (Currently
  available events are described here
  <https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema…>.)
Thanks!
- Andrew Otto
Hey there!
I'm helping someone who is looking for a couple of indicators based on
Wikipedia data (specifically number of Wiki pages available to a country’s
population and the number of Wiki edits per user).
I can access the latest data on the first indicator, which requires number
of articles in each language. This is at the link found here
https://stats.wikimedia.org/EN/TablesArticlesTotal.htm
However, I see from this link
<https://stats.wikimedia.org/wikimedia/squids/SquidReportsCountriesLanguages…>
that we have not published data on the number of Wiki page edits since 2014. Do
we plan to update this data at some point in the near future? Or has this
metric been discontinued?
Thanks,
Anne
--
*Anne Gomez* // Reading Product Manager, New Readers
<https://meta.wikimedia.org/wiki/New_Readers>
https://wikimediafoundation.org/
*Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment. Donate
<http://donate.wikimedia.org>. *
Hey,
I'm trying to create a report of pageviews for all the articles that use a
file from a specific Commons category ("Wikimedia Israel - Channel 2
videos").
I pulled up the list of articles using Glamorous:
https://tools.wmflabs.org/glamtools/glamorous.php?doit=1&category=Wikimedia…
Now I'm looking for a way to use Massviews to get the pageviews of
these pages (or even better, pageviews per file).
One last thing: although I asked this half a year ago, I'll try again in case
something has changed since. Is there an easy way to get video view
statistics for these files as well?
Thanks :)
*Regards, Itzik Edri*
Chairperson, Wikimedia Israel
+972-54-5878078 | http://www.wikimedia.org.il
Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment!
Hi everyone,
We are planning an upgrade of the Hadoop cluster on February 28th. We need
to take the cluster down for this upgrade. The actual upgrade shouldn’t
take more than 2 hours, but we’re going to reserve the whole work day of
February 28th to do this, just in case something goes wrong.
Hadoop, Hive and Spark will all be unavailable during this time. (Kafka,
Druid, pivot, and stat* boxes will continue to be accessible.)
You can keep track of our progress in Phabricator:
https://phabricator.wikimedia.org/T152714
If you have a hard need to use Hadoop on this day, let us know soon. We
can reschedule this upgrade for another date if we need to.
Thanks!
-Andrew & Luca, your friendly A-Team Ops Engineers :)
Wikistats no longer seems to include many Wikipedias in its results; for
example, if you look at the sitemap for Wikipedias
<https://stats.wikimedia.org/EN/Sitemap.htm>, many large and active wikis
are listed among the projects that were excluded because they had fewer
than 10 articles and fewer than 10 edits in January. These include Turkish,
Thai, Hindi, Ukrainian, Hebrew, and Chinese.
Any idea what's going on?
--
Neil Patel Quinn <https://meta.wikimedia.org/wiki/User:Neil_P._Quinn-WMF>,
product analyst
Wikimedia Foundation
Hi Analytics!
I was looking at the page views chart for the Special:RecentChanges page, here:
https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-…
Over the last year, the pageview count has gone through some pretty wild swings, some of which were sustained over months. I assume these are artifacts or bugs, and not a reflection of true usage patterns? Is that right? (The last month has seen a new surge; any idea what that is?)
_____________________
Joe Matazzoni
Product Manager, Collaboration
Wikimedia Foundation, San Francisco
mobile 202.744.7910
jmatazzoni(a)wikimedia.org
"Imagine a world in which every single human being can freely share in the sum of all knowledge."
Hi everybody,
We are currently experiencing a wide outage of the AQS API, so Pageviews
are not available at the moment. I am working on it and will update
this list as soon as possible.
Really sorry for the trouble,
Luca