Hi Analytics,
I've been digging through some of the wiki pagecounts files and found some
strange results.
In several files, the view count for Main_page is vastly lower than expected:
mruttley$ cat pagecounts-20141101-170000 | grep "^en Main_page"
en Main_page 260 6202982
mruttley$ cat pagecounts-20150201-170000 | grep "^en Main_page"
en Main_page 200 4802139
Only 260 and 200 page views!
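In case it helps, here's an equivalent check in Python - a minimal sketch,
assuming the standard four-column pagecounts format (project, page title,
hourly view count, bytes transferred); note that the title match here, like
grep's, is exact and case-sensitive:

    # Sketch: sum hourly view counts for an exact, case-sensitive title match.
    def count_views(path, project, title):
        total = 0
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                fields = line.split(" ")
                # Each line: <project> <title> <count> <bytes>
                if len(fields) == 4 and fields[0] == project and fields[1] == title:
                    total += int(fields[2])
        return total

    print(count_views("pagecounts-20141101-170000", "en", "Main_page"))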
What do you reckon? Am I doing it wrong?
Best regards,
Matthew
+ Analytics list
---------- Forwarded message ----------
From: Vojtěch Dostál <vojtech.dostal(a)wikimedia.cz>
Date: Fri, Feb 27, 2015 at 2:19 AM
Subject: [Wikimetrics] Monthly metrics
To: wikimetrics(a)lists.wikimedia.org
Hi, does anyone know when the Monthly metrics for Wikimedia projects
will get updated?
http://stats.wikimedia.org/CS/TablesWikipediaCS.htm
I know it says the process is delayed, but it would be nice if
someone could look into the problem - I am using these metrics to analyze
our impact.
thanks
Vojtěch Dostál
místopředseda / vice-chairman
Wikimedia Česká republika / Wikimedia Czech Republic
http://www.wikimedia.cz
Facebook <https://www.facebook.com/Wikimedia.CR> | Twitter
<https://twitter.com/Wikimedia_CR> | Newsletter <http://eepurl.com/FsHJr>
--
Jonathan T. Morgan
Community Research Lead
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
jmorgan(a)wikimedia.org
Hi,
just a quick heads-up that the Analytics cluster got stuck today: jobs
deadlocked, each waiting for other jobs to free resources.
For the time being, I have suspended the refining jobs so that the
cluster can catch up on the missed hours. This frees enough resources to
re-import the Kafka data that was missed during the day.
But this also means that the datasets pagecounts-all-sites,
pagecounts-raw, and legacy_tsvs will fall behind a bit, and
wmf.webrequest will not see new data while the cluster is catching up.
Tomorrow, in the European morning, once the cluster has caught up, I'll
enable refining again, and those datasets should fill back in.
Sorry for the inconvenience,
Christian
P.S.: Suspending refining looks a bit drastic. But if we had only killed
the resource-hungry jobs without stopping refining, refining would start
while Camus was still catching up and produce faulty datasets. Hence, we
suspended refining for now. Tomorrow, we'll resume the suspended jobs and
let the datasets catch up.
P.P.S.: If you have resource-hungry jobs for the Analytics cluster,
please wait until tomorrow to run them, if possible.
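To illustrate the ordering problem from the P.S. - a toy sketch, with
illustrative paths and flag names rather than our actual job setup:
refining an hour must only start once the raw import for that hour is
complete, otherwise it reads a half-imported hour and produces a faulty
dataset.

    import os

    RAW = "/wmf/data/raw/webrequest"  # illustrative path, not the real layout

    def hour_imported(hour):
        # Hadoop-style convention: the import job drops a flag file once an
        # hour's raw data is fully written.
        return os.path.exists(os.path.join(RAW, hour, "_IMPORTED"))

    def maybe_refine(hour):
        if not hour_imported(hour):
            return "skip"  # refining now would read a partial hour
        return "refine"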
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3, 4293 Gutau, Austria
Email: christian(a)quelltextlich.at
Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
Hey all,
As part of the quality-assurance work on the new pageview definition,
I've hand-coded 20,000 rows from the webrequest table that the new
definition identifies as pageviews - 10,000 from mobile and 10,000
from desktop, spread out and pseudo-randomly sampled over multiple
days and hours.
TL;DR: it looks really promising, but we need to expand how we're
storing pageIDs, and to filter out edit attempts on the desktop side.
And we still have a lot of hand-coding to do.
On Mobile, the definition is doing exactly what we expect it to do,
and including precisely the classes of pages we want. There is,
seriously, a 100% success rate there. The only limiting factor is
around turning "pageviews" (views of our HTML content) into the sort
of pageviews that can be aggregated on a per-page basis - in other
words, grabbing the pageID and namespace. A recent patch to MediaWiki
by Ori, Otto and others means that the pageID and namespace are now
automatically passed through to Varnish, which makes this a LOT
easier.
Buuuut...they're not being passed through for app requests, which is a
big blind spot if we assume apps behave differently. They're also not
being passed through for, e.g., index.php?action=render-style
requests.
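For the per-page aggregation, something like the following is what this
enables - a minimal sketch, assuming the patched requests carry an
X-Analytics-style semicolon-delimited string such as 'ns=0;page_id=12345'
(the field names are my assumption), and returning None for requests -
like the app and action=render ones above - that lack the fields:

    def parse_x_analytics(value):
        # Parse 'k1=v1;k2=v2' into a dict; tolerate stray separators.
        fields = {}
        for part in value.split(";"):
            if "=" in part:
                key, _, val = part.partition("=")
                fields[key.strip()] = val.strip()
        return fields

    def page_key(value):
        # Return (namespace, pageID) when both fields are present, else None.
        fields = parse_x_analytics(value)
        if "ns" in fields and "page_id" in fields:
            return int(fields["ns"]), int(fields["page_id"])
        return None

    print(page_key("ns=0;page_id=12345"))  # (0, 12345)
    print(page_key("https=1"))             # None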
On Desktop, the definition is doing /almost/ what we want it to do.
The big problem is that, due to a change in the MIME type that edit
requests report, it's including edit attempts: whoops. We should be
able to filter these out with a trivial regex change...I think. I'd
need a better idea of whether URL parameters tend to be localised.
So, promising, needs edits filtered!
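To make the regex idea concrete - a minimal sketch, assuming edit
attempts surface as action=edit or action=submit in the query string
(and, per the localisation question above, those parameter values may
not hold on every wiki):

    import re

    # Treat requests whose query string carries an edit-ish action as edit
    # attempts rather than pageviews.
    EDIT_ATTEMPT = re.compile(r"[?&]action=(edit|submit)\b")

    def is_edit_attempt(uri_query):
        return bool(EDIT_ATTEMPT.search(uri_query or ""))

    print(is_edit_attempt("?title=Foo&action=edit"))  # True
    print(is_edit_attempt("?title=Foo"))              # False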
The next step is a further round of hand-coding, this time targeted at
requests the new definition /excludes/.
--
Oliver Keyes
Research Analyst
Wikimedia Foundation
Hey all!
We've released a highly-aggregated dataset of readership data -
specifically, data about where, geographically, traffic to each of our
projects (and all of our projects) comes from. The data can be found
at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
put together an exploration tool for it at
https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/
Hope it's useful to people!
--
Oliver Keyes
Research Analyst
Wikimedia Foundation
CC-ing Ori. He mentioned he was given a box today, but gave no further details.
Thanks,
Nuria
On Wed, Feb 25, 2015 at 5:25 PM, Sean Pringle <springle(a)wikimedia.org>
wrote:
> On Sun, Feb 22, 2015 at 1:20 PM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
> > Coordination on Monday sounds good.
>
> Did you guys come to any conclusion about vanadium?
Respected Sir / Ma'am,
I am a third-year undergraduate pursuing a Bachelor's in Computer Science
in India, and a Data Science enthusiast.
I have a deep interest in the field of Data Science and Analytics and have
done various research projects in Data Mining, Data Visualization, and
Social Media Analytics at institutes such as IIM Ahmedabad and IISc Bangalore.
As Wikimedia is the world's largest resource of educational data, many
kinds of analytics could be performed on it to make various services easier
and more predictable.
I would like to work on any project that the Wikimedia Research and
Analytics team is working on and assist in developing something great.
I am willing to work full-time as an intern from home, with no stipend
demands. I am attaching my resume along with some of my research reports,
which should give you an idea of my previous work and knowledge in this
field.
I request you to please have a look at my resume; I am just a data geek
who wants to work with some of the best professionals and develop something
really useful. I assure you that, given an opportunity, I won't let you
down.
Thanks a lot.
Waiting eagerly for your reply.
Regards,
Vikramank Singh,
Co-Founder at www.notemybook.in
+91-9768 251 481
FYI
-------- Forwarded message --------
Subject: [Xmldatadumps-l] Your comments needed (long term dumps rewrite?)
Date: Thu, 19 Feb 2015 12:30:01 +0200
From: Ariel Glenn WMF <ariel(a)wikimedia.org>
To: Xmldatadumps-l(a)lists.wikimedia.org
The MediaWiki Core team has opened a discussion about getting more
involved in, and maybe redoing, the dumps infrastructure. A good starting
point is to understand how folks already use the dumps, or want to use
them but can't; some questions about that are listed here:
https://www.mediawiki.org/wiki/Wikimedia_MediaWiki_Core_Team/Backlog/Improv…
I've added some notes, but please go weigh in. Don't be shy about what
you do or what you need; this is the time to get it all on the table.
Ariel