Hi everybody,
as part of https://phabricator.wikimedia.org/T183297 the Analytics team is
migrating all the Varnishkafka Eventlogging traffic from Kafka Analytics to
Kafka Jumbo. The procedure that we are going to use is the following:
1) change the varnishkafka configuration - this will effectively migrate
all the traffic from the caching hosts to Kafka Jumbo.
2) eventlogging keeps pulling data from the Kafka analytics topics until it
consumes all the events.
3) change eventlogging's config to pull data from Kafka Jumbo.
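The drain condition in step 2 can be sketched in a few lines: the old-cluster consumers are done once the committed offset has caught up with the end offset on every partition of the old topics. This is an illustrative sketch only (not the actual migration tooling), and the topic name and offset numbers below are made up; in practice the offsets would come from Kafka's consumer-group metadata.

```python
# Illustrative: decide when eventlogging has drained the old Kafka
# Analytics topics, i.e. committed offsets have caught up with the
# end offsets on every partition.

def is_drained(end_offsets, committed_offsets):
    """Return True once the consumer has caught up on every partition.

    end_offsets: {(topic, partition): latest offset in the old cluster}
    committed_offsets: {(topic, partition): offset committed so far}
    """
    return all(
        committed_offsets.get(tp, 0) >= end
        for tp, end in end_offsets.items()
    )

# Made-up offsets: partition 1 is still 5 events behind, so not drained yet.
ends = {("eventlogging-example", 0): 1000, ("eventlogging-example", 1): 980}
committed = {("eventlogging-example", 0): 1000, ("eventlogging-example", 1): 975}
print(is_drained(ends, committed))  # False
```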
This change should not cause any data loss or inconsistency, but we had to
create new dashboards:
- https://grafana.wikimedia.org/dashboard/db/eventlogging and
https://grafana.wikimedia.org/dashboard/db/eventlogging-schema are still
pointing to the analytics cluster, so they will likely show a big drop
after the migration. I added a big banner explaining what's happening in
both.
- https://grafana.wikimedia.org/dashboard/db/eventlogging-jumbo and
https://grafana.wikimedia.org/dashboard/db/eventlogging-schema-jumbo point
to the new Kafka Jumbo cluster, so they should soon start showing data.
Those dashboards are still a work in progress, so if you notice any
inconsistency please follow up with me (elukey) or Andrew (ottomata) on
#wikimedia-analytics.
Thanks!
Luca (on behalf of the Analytics team)
Hello!
The Analytics team would like to announce that we have migrated the
reportcard to a new domain:
https://analytics.wikimedia.org/dashboards/reportcard/#pageviews-july-2015-…
The migrated reportcard includes both legacy and current pageview data,
daily unique devices, and new editors data. Pageview and devices data are
updated daily, but editor data is still updated ad hoc.
The team is currently working on revamping the way we compute edit data,
and we hope to provide monthly updates for the main edit metrics this
quarter. Some of those will be visible in the reportcard, but the new
wikistats will have more detailed reports.
You can follow the new wikistats project here:
https://phabricator.wikimedia.org/T130256
Thanks,
Nuria
Hi Nick,
I made a Quarry query to do this for you: https://quarry.wmflabs.org/query/25400
You will have to fork it and remove the "LIMIT 10" to get it to run on
all the English Wikipedia articles. It may take too long or produce
too much data, in which case please ask on this list for someone who
can run it for you.
USE enwiki_p;
SELECT page_title AS article,
       COUNT(DISTINCT pli.pl_from) AS inlinks,
       COUNT(DISTINCT plo.pl_title) AS outlinks
FROM page
JOIN pagelinks AS pli
  ON page.page_title = pli.pl_title AND pli.pl_namespace = 0
JOIN pagelinks AS plo
  ON page.page_id = plo.pl_from AND plo.pl_namespace = 0
WHERE page.page_namespace = 0
  AND page.page_is_redirect = 0
GROUP BY article
LIMIT 10;
Refs.:
https://www.mediawiki.org/wiki/Manual:Pagelinks_table
https://www.mediawiki.org/wiki/Manual:Page_table
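Once the forked query finishes, Quarry lets you download the result set as CSV, which is easy to post-process with minimal programming. Here is a short sketch that summarises the exported counts; the sample CSV rows are made up, and you would feed in the file Quarry actually gives you.

```python
import csv
import io

# Summarise the CSV that Quarry exports for the query above
# (columns: article, inlinks, outlinks).

def summarize(csv_text):
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        "articles": len(rows),
        "total_inlinks": sum(int(r["inlinks"]) for r in rows),
        "total_outlinks": sum(int(r["outlinks"]) for r in rows),
        "most_linked": max(rows, key=lambda r: int(r["inlinks"]))["article"],
    }

# Made-up sample of what the exported CSV looks like:
sample = """article,inlinks,outlinks
Mathematics,120,85
Bristol,45,60
"""
print(summarize(sample))
```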
> From: Nick Bell <bhink03(a)gmail.com>
> Subject: [Analytics] Ingoing and outgoing internal links enquiry
>
> Dear Analytics Team,
>
> I’m doing a project on Wikipedia for my Maths degree, and I was hoping you
> could help me acquire some data about Wikipedia.
>
> I would like to get the number of incoming internal links and outgoing
> internal links for every page, if possible. I could limit this if needs be,
> as I am aware this totals around 11 million values.
>
> I have minimal programming experience, so if this is unreasonable or
> impossible please let me know. I very much appreciate your time considering
> my request.
>
>
>
> Many thanks,
>
>
> Nicholas Bell
>
> Mathematics Undergraduate
>
> University of Bristol
Dear Analytics Team,
I’m doing a project on Wikipedia for my Maths degree, and I was hoping you
could help me acquire some data about Wikipedia.
I would like to get the number of incoming internal links and outgoing
internal links for every page, if possible. I could limit this if needs be,
as I am aware this totals around 11 million values.
I have minimal programming experience, so if this is unreasonable or
impossible please let me know. I very much appreciate your time considering
my request.
Many thanks,
Nicholas Bell
Mathematics Undergraduate
University of Bristol
Hi everybody,
today as part of https://phabricator.wikimedia.org/T114199 we migrated all
the eventlogging daemons (except the zmq-forwarder, see
https://gerrit.wikimedia.org/r/#/c/415218/) from eventlog1001 (Ubuntu
Trusty) to eventlog1002 (Debian Stretch). The maintenance involved a brief
downtime (13:26->13:32 UTC) that will probably be reflected in all the
Eventlogging metrics, including the per-schema ones.
Please let us know if you notice anything out of the ordinary during the
next hours.
Thanks a lot!
Luca (on behalf of the Analytics team)
Hi everybody,
today, while performing maintenance on the Eventlogging master database, we
ran into https://phabricator.wikimedia.org/T188991 (TL;DR: two hours of
data were inserted into the slave database but not the master one). We are
working to find a feasible solution to avoid losing data and to get out of
this inconsistent state, so as a precautionary measure the Eventlogging
mysql consumers have been stopped.
A couple of notes:
- The Eventlogging machinery is working as expected, except mysql insertion
of course.
- The HDFS data has not been affected by this issue.
Please check the task for more updates, or follow up with the Analytics
team on IRC (#wikimedia-analytics on freenode).
Thanks and sorry for the trouble!
Luca (on behalf of the Analytics team)
Hi everybody,
tomorrow EU morning (Wed Mar 7th) I'd need to reboot stat100[56] and
analytics1003 for kernel security updates. Hive and Oozie (Analytics Hadoop
cluster) will not be available for a (hopefully) brief period of time.
Please let me know if you are doing important work that cannot be stopped,
and the maintenance will be postponed accordingly :)
Tracking task: https://phabricator.wikimedia.org/T188594
Thanks!
Luca (on behalf of the Analytics team)
Sorry, forwarding to Analytics...
Hi Angelina,
I don't think there's any (legal) way of tracking Wikipedia traffic.
All Wikipedia traffic data is protected by WMF's privacy policy[1]
and handled accordingly.
We do, however, provide public sanitized high-level statistics on page
views for Wikipedia in various ways (not to specific companies or
organizations, but rather to the world at large). What "Next Big Sound"
is probably doing, is consuming one of those public sources, but we
don't know which one.
These are 2 of the main sources this company might be grabbing stats from:
https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews
https://dumps.wikimedia.org/
Cheers!
[1] https://wikimediafoundation.org/wiki/Privacy_policy
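For a concrete idea of what the first source looks like, here is a sketch that builds a request URL for the public per-article Pageviews REST API described on the Wikitech page above. The article title and date range are made up; note the "access" parameter (all-access vs. desktop vs. mobile-web/mobile-app), which is exactly the kind of knob that could explain a desktop-only discrepancy between two consumers of the data.

```python
from urllib.parse import quote

# Build a request URL for the public per-article Pageviews REST API.
# access can be: all-access, desktop, mobile-web, mobile-app.

def pageviews_url(project, article, start, end,
                  access="all-access", agent="user", granularity="daily"):
    base = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
    return "/".join([base, project, access, agent,
                     quote(article, safe=""), granularity, start, end])

# Example request URL (made-up article and dates):
print(pageviews_url("en.wikipedia.org", "Adele", "20180201", "20180228"))
```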
On Fri, Mar 2, 2018 at 5:16 PM, Marcel Ruiz Forns <mforns(a)wikimedia.org>
wrote:
> Hi Angelina,
>
> I don't think there's any (legal) way of tracking Wikipedia traffic.
> All Wikipedia traffic data is protected by WMF's privacy policy[1]
> and handled accordingly.
>
> We do, however, provide public sanitized high-level statistics on page
> views for Wikipedia in various ways (not to specific companies or
> organizations, but rather to the world at large). What "Next Big Sound"
> is probably doing, is consuming one of those public sources, but we
> don't know which one.
>
> These are 2 of the main sources this company might be grabbing stats from:
> https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews
> https://dumps.wikimedia.org/
>
> Cheers!
>
> [1] https://wikimediafoundation.org/wiki/Privacy_policy
>
>
> On Fri, Mar 2, 2018 at 4:19 PM, Marcel Ruiz Forns <mforns(a)wikimedia.org>
> wrote:
>
>> Oh, forgot the subscribe link, here:
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>> Cheers!
>>
>> On Fri, Mar 2, 2018 at 4:18 PM, Marcel Ruiz Forns <mforns(a)wikimedia.org>
>> wrote:
>>
>>> Hi Angelina,
>>>
>>> I'm the administrator of this mailing-list. Just to let you know that
>>> your email was automatically filtered out by the mailing-list bot because
>>> your address is not subscribed to it. I just unblocked it, so you will
>>> receive a response shortly. However, please subscribe to send further
>>> emails to the list.
>>>
>>> Thanks!
>>>
>>>
>>> On Wed, Feb 28, 2018 at 5:04 PM, BTShasSTOLENmyHEART <
>>> zangeliniz(a)gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I recently spoke with "Next Big Sound" which is a company that tracks
>>>> Wikipedia page views on certain artists. They informed me that they got
>>>> details of the views directly from Wikipedia (because I had emailed them
>>>> that the View counts mentioned on Wikipedia and Next Big Sound show a major
>>>> discrepancy). There are rumors flying about saying that the information
>>>> only gathered is from Desktop Views, in which the counts are extremely
>>>> similar. Is there any way you can confirm this as true? Or is there another
>>>> method you also count that is gathered for other companies that collect
>>>> views? I know you have no idea of what Next Big Sound is presenting to the
>>>> world wide audience, but I wanted to know if you can explain what
>>>> information is given to Next Big Sound in terms of data. Thank you
>>>>
>>>>
>>>> Sincerely,
>>>>
>>>> Angelina Zamora
>>>>
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> Analytics(a)lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>
>>>>
>>>
>>>
>>> --
>>> *Marcel Ruiz Forns*
>>> Analytics Developer
>>> Wikimedia Foundation
>>>
>>
>>
>>
>> --
>> *Marcel Ruiz Forns*
>> Analytics Developer
>> Wikimedia Foundation
>>
>
>
>
> --
> *Marcel Ruiz Forns*
> Analytics Developer
> Wikimedia Foundation
>
--
*Marcel Ruiz Forns*
Analytics Developer
Wikimedia Foundation