Hi!
This was the final sprint of Q4 of fiscal year 2012-2013. We successfully
delivered the initial version of Wikimetrics. In total we delivered 38
points; for the coming sprint we have scheduled 30 points. For the coming
quarter, we will focus on three epics:
1) Additional metrics for Wikimetrics -
https://mingle.corp.wikimedia.org/projects/analytics/cards/779
2) Stabilize Hadoop -
https://mingle.corp.wikimedia.org/projects/analytics/cards/795
3) Canary Event Modeling -
https://mingle.corp.wikimedia.org/projects/analytics/cards/789
Apologies for cross-posting; ideally you should receive this on the
Analytics mailing list so we can have one focal point for conversation. If
you are not on the Analytics list, please subscribe at
https://lists.wikimedia.org/mailman/listinfo/analytics
## Defects & Features completed (Ready for Showcase/Shipping/Done) during
Sprint ending 2013-07-24 ##
#577 F - Authentication messages - Dario Taraborelli - Product (E3) -
Wikimetrics
#578 F - Job management - Dario Taraborelli - Product (E3) - Wikimetrics
#585 F - Custom repo name mapping for Gerrit/Git integration - Diederik van
Liere - Analytics - Kraken
#615 I - Puppetize Hue - Operations - Kraken
#664 F - Use language selector instead of giant project list in cohort
upload screen - Dario Taraborelli - Product (E3) - Wikimetrics
#700 F - Port static cohort user interaction - Jessie Wild - Learning &
Evaluation - Wikimetrics
#711 F - Enforce cohort description as part of the upload process - Jessie
Wild - Learning & Evaluation - Wikimetrics
#716 I - Debianization dClass-dev and dClass-dev-jni - Operations - Kraken
## Current Sprint (ending 2013-08-07) ##
Stories in progress from last sprint:
#429 F - View detailed list of jobs / requests in queue - Dario
Taraborelli - Product (E3) - Wikimetrics
New stories:
#703 F - Measure contributions made by an editor - Dario Taraborelli -
Product (E3) - Wikimetrics
#768 I - High Availability Namenode - Diederik van Liere - Analytics - Kraken
#798 F - Setup X-CS sampled Wikipedia Zero and do reportcard analysis with
this data - Amit Kapoor - Wikipedia Zero
#817 D - Use new dclass-api in Kraken - Tomasz Finc - Mobile - Kraken
#797 F - Limited user role - Jaime Anstee - Grantmaking and Program
Evaluation - Wikimetrics
F = Feature
D = Defect
I = Infrastructure Task
S = Spike
Any Mingle card can be accessed using the base URL
https://mingle.corp.wikimedia.org/projects/analytics/cards/XYZ, where XYZ
is the Mingle card id.
If you have any questions, comments or feedback: please let us know!
Best,
Diederik
Hi all,
Over the years we've had several serious issues with huge underreporting
of page view data due to message loss in udp2log.
There are now several diagnostic tools: alerts are sent, and there is
real-time monitoring (http://tinyurl.com/kqmtfss). But none of those help
to quantify total monthly loss.
I upgraded an existing CSV file to an HTML report, to be updated monthly:
http://stats.wikimedia.org/wikimedia/squids/SquidDataMonthlyPerSquidSet.htm
This report shows total monthly message loss as a percentage, plus a
breakdown of message loss and traffic volume by server role and location.
The basic idea behind the report is that, as we use 1:1000 sampling, for
each squid server the sequence numbers of logged messages should be 1000
apart, on average. If we actually find they are 1050 apart, that
translates into 4.7% data loss (1 - 1000/1050).
On how this is calculated, see
http://stats.wikimedia.org/wikimedia/squids/SquidDataMonthlyPerSquidSet.htm#calc
I use a weighted average for calculating the total percentage of data
loss, taking into account data volume per server cluster and ignoring
servers where the sequence number mechanism is still broken (the ssl
servers).
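To make this concrete, here is a minimal sketch of the calculation in
Python. This is not the actual Wikistats code; the cluster names, gaps,
and volumes below are invented for illustration:

SAMPLING = 1000  # udp2log logs 1 in every 1000 requests

def loss_from_avg_gap(avg_gap):
    # If logged sequence numbers average avg_gap apart instead of 1000,
    # the fraction of messages lost is 1 - SAMPLING / avg_gap.
    return max(0.0, 1.0 - float(SAMPLING) / avg_gap)

# (server cluster, average sequence-number gap, monthly message volume)
clusters = [
    ("text squids, eqiad", 1050, 4.2e9),  # the "1050 apart" example above
    ("upload squids, esams", 1010, 2.9e9),
    ("ssl servers", None, 1.1e8),         # sequence numbers still broken
]

weighted_loss = total_volume = 0.0
for name, avg_gap, volume in clusters:
    if avg_gap is None:
        continue  # skip servers whose sequence numbers are unusable
    weighted_loss += loss_from_avg_gap(avg_gap) * volume
    total_volume += volume

print("overall message loss: %.1f%%" % (100 * weighted_loss / total_volume))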
The role and implementation of udp2log are in flux, but in any setup it
would be good to have such an overall assessment of loss.
Cheers,
Erik
Of this page...
<http://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountryB…>
Pretty please! :)
--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as charitable by
the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
A colleague asks for a pointer to measurements of the influence of
WikiLove. Thanks.
--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation
I was poking around on stats.wikimedia.org and reportcard.wmflabs.org to
see if I could find out how overall editing levels had changed (if at all)
over the past year. Unfortunately, it seems that all of our "edits per
month" graphs show all edits, including bot edits. Since changes in bot
editing levels are often dramatic from month to month, this noise
effectively drowns out the trends the graphs are meant to show. For
example, you
can see a huge spike in March when I presume the Wikidata bots were
running at full force:
http://reportcard.wmflabs.org/#secondary-graphs-tab
http://stats.wikimedia.org/EN/ChartsWikipediaEN.htm#3
My question is: Would it be possible to replace or augment these graphs
with graphs that exclude bot edits? I know that bot status is not stored
in the revision table, so this would be quite expensive to tally. Would
it be prohibitively expensive? Sorry if this is a dumb question.
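For what it's worth, here is a hypothetical sketch of such a tally in
Python, assuming direct access to a database replica and the current
MediaWiki schema (revision.rev_user, user_groups.ug_user/ug_group). Note
that it classifies by current bot-group membership, so historical bot
flags are only approximated, and on a large wiki the full-table scan
would indeed be expensive:

import os
import MySQLdb  # assumes a MySQL replica reachable via ~/.my.cnf credentials

QUERY = """
    SELECT LEFT(rev_timestamp, 6) AS month, COUNT(*) AS edits
    FROM revision
    LEFT JOIN user_groups
           ON ug_user = rev_user AND ug_group = 'bot'
    WHERE ug_user IS NULL  -- keep only edits by accounts not in the bot group
    GROUP BY month
    ORDER BY month
"""

conn = MySQLdb.connect(host="enwiki.labsdb",  # hypothetical replica host
                       db="enwiki_p",
                       read_default_file=os.path.expanduser("~/.my.cnf"))
cur = conn.cursor()
cur.execute(QUERY)
for month, edits in cur.fetchall():
    print("%s: %d non-bot edits" % (month, edits))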
Ryan Kaldari
On Jul 24, 2013 12:43 AM, "Ikuya Yamada" <ikuya(a)sfc.keio.ac.jp> wrote:
> It seems that the page view statistics data does not contain the
> actual data for the last few hours.
>
> http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-07/
>
> Are there any failures on the server-side?
Just looking at file sizes, I can see 15, 16, and 20-05 (the current hour)
UTC all look smaller than normal. (Yes, something's broken.)
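For anyone who wants to repeat that check, a rough Python sketch; the
size-parsing regex assumes the Apache-style directory listing, so treat
it as untested:

import re
import urllib2

URL = "http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-07/"
html = urllib2.urlopen(URL).read()

# Index rows look roughly like:
#   pagecounts-20130724-150000.gz   24-Jul-2013 16:05   62M
sizes = {}
for name, num, unit in re.findall(
        r'(pagecounts-\d{8}-\d{6}\.gz)[^\n]*?([\d.]+)([KMG])', html):
    sizes[name] = float(num) * {"K": 1e3, "M": 1e6, "G": 1e9}[unit]

median = sorted(sizes.values())[len(sizes) // 2]
for name in sorted(sizes):
    if sizes[name] < 0.5 * median:  # arbitrary "much smaller" cutoff
        print("%s looks small: %.0f bytes vs median %.0f"
              % (name, sizes[name], median))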
-Jeremy
Hi folks,
Is there an API for requesting statistical data about Wikipedia? By
statistical data I mean what Wikistats provides, but there I can only get
HTML tables, which I'd need to parse "by hand". At the same time, I don't
know if there is a Limn server with these data; is there?
Also: would an API providing this make sense, or does an easier solution
already exist?
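To show the kind of hand-parsing I mean, here is a rough Python sketch;
the URL is just an example page, and the regexes assume simple table
markup:

import re
import urllib2

URL = "http://stats.wikimedia.org/EN/TablesWikipediaEN.htm"  # example page
html = urllib2.urlopen(URL).read()

# Pull every table row, then the text of each cell, stripping inner markup.
for row in re.findall(r"<tr[^>]*>(.*?)</tr>", html, re.S | re.I):
    cells = [re.sub(r"<[^>]+>", " ", c).strip()
             for c in re.findall(r"<t[dh][^>]*>(.*?)</t[dh]>", row,
                                 re.S | re.I)]
    if any(cells):
        print("\t".join(cells))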
Thank you,
--
Jonas Xavier
Ô__ ---- social: https://joindiaspora.com/i/f58dad0668db
c/ /'_ --- email: eris(a)sdf.org
(*) \(*) -- chat: eris(a)jabber.sdf.org
~~~~~~~~~~ -
FYI, there is indeed some issue. The newest files are empty!
Erik
-----Original Message-----
From: wikitech-l-bounces(a)lists.wikimedia.org [mailto:wikitech-l-bounces@lists.wikimedia.org] On Behalf Of Ikuya Yamada
Sent: Wednesday, July 24, 2013 6:43 AM
To: wikitech-l(a)lists.wikimedia.org
Subject: [Wikitech-l] Page view stats failure
Hello,
It seems that the page view statistics data does not contain the actual data for the last few hours.
http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-07/
Are there any failures on the server-side?
Thanks,
Ikuya