Hi,
The Mobile team will be running WikiGrok experiments in the first half
of March 2015. Dario and I will be working closely with the team and will
coordinate with Analytics-devs to make sure EventLogging can handle the
throughput. We expect throughput similar to what EL experienced during the last
WikiGrok experiment in early January. This email is a heads-up: EL has
limited capacity, other teams may want to run experiments, and we need to
plan for experiments in advance.
Best,
Leila
Who is still using s1-analytics-slave, and for what sorts of things?
The analytics-store is just an alias for dbstore1002, which is about to be
duplicated as dbstore2002 in CODFW with all wikis and eventlogging.
We could potentially dub dbstore2002 as "analytics-slave" or similar, and
reclaim the old s1-analytics-slave in EQIAD -- which is really db1047, out
of warranty, and likely one of a batch of database nodes that will be
decommissioned this year.
dbstore2002 has nearly 3x more memory than db1047, and twice as many cores,
so you guys would be gaining from this transaction ;-)
Sean
--
DBA @ WMF
FYI,
Yesterday Christian and I re-enabled bits webrequest log production from the varnish hosts. We also updated the oozie jobs that process these logs, so the refined webrequest table now includes all webrequest sources. Also, as of 2015-02-13T00:00, the refined table’s is_pageview field uses the newer pageview definition, which counts 304 requests.
-Ao
(Thanks Christian!)
Hey all,
The pageviews stored at stats.wikimedia.org and the Vital Signs
dashboards showed a substantial drop in pageviews to Wikimedia
Commons, primarily from mobile, beginning on 1 January 2015. I was
tasked with investigating and I'm reporting what I found so that we
have a note of the problems this brings up.
From an investigation of requests to that site at that time, it
appears that this is a perfect storm of known deficiencies in the
legacy pageviews definition, fundraising changes, and mobile changes.
To summarise:
1. The legacy Pageviews definition contains Special pages, including
Special:BannerRandom and Special:HideBanner;
2. The mobile website was historically loading things from Commons in
such a way as to trigger calls to Special:HideBanner, which were
picked up by the legacy definition as "pageviews to commons";
3. The Mobile team deployed changes to their image loading setup at
the end of December that stopped this from happening, and that
coincided with the disabling of the Fundraising primary campaign;
4. The result of this was an apparent massive drop in traffic to
Commons from the mobile site - when the actual inaccuracy was the
inclusion of that traffic in the first place.
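The effect described in the points above can be illustrated with a minimal sketch. It assumes requests are available as a simple list of page titles; the function names, the sample titles, and the title-normalisation rule are hypothetical and are not taken from the legacy or the new pageview definition. Only the two special page names come from the summary above.

```python
# Hypothetical illustration only: this is not the legacy or the new
# pageview definition, just a sketch of why banner requests inflate counts.
BANNER_SPECIAL_PAGES = {"Special:BannerRandom", "Special:HideBanner"}


def is_banner_request(title):
    """Return True if the title is one of the fundraising banner special pages."""
    # Assumption: a title may carry a query string or subpage; compare the base only.
    base = title.split("?", 1)[0].split("/", 1)[0]
    return base in BANNER_SPECIAL_PAGES


def count_pageviews(titles, exclude_banners):
    """Count requests, optionally leaving out banner special pages."""
    return sum(1 for t in titles if not (exclude_banners and is_banner_request(t)))


# Made-up sample of requested titles.
sample = [
    "File:Example.jpg",
    "Special:HideBanner",
    "Main_Page",
    "Special:BannerRandom?uselang=en",
]
legacy_like = count_pageviews(sample, exclude_banners=False)
filtered = count_pageviews(sample, exclude_banners=True)
```

If the banner requests stop arriving, `legacy_like` falls while `filtered` stays the same, which is the kind of apparent drop described above.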
There are several lessons to be learned from this. First, it is worth
reiterating the deficiencies and inaccuracies inherent in the legacy
pageview definition, many (but certainly not all) of which centre on
how it treats the fundraising banners. We are working as rapidly as we
can to completely deprecate this definition, replacing it with a new
one which is not subject to this kind of variation. We are currently
in the middle of performing final QA testing on the new definition:
once it is satisfactory, we will deploy it as soon as humanly possible
and deprecate the legacy definition.
Second, let me emphasise how critical it is that the teams building
MediaWiki and our instances of it - Platform, Operations, Mobile, you
name it - keep us in the loop about changes that they make. This was a
very dramatic shift in client logic around requests: it flew under our
radar. We should have a process in place for letting Analytics know
about these changes before they happen so that we do not end up with
inaccurate data and a constant game of catchup.
Thanks,
--
Oliver Keyes
Research Analyst
Wikimedia Foundation
I have not seen updated statistics on the Wikipedia article traffic statistics site (stats.grok.se).
Hey folks,
Dario and I just updated the scholarly citations dataset to include Digital
Object Identifiers. We found 742k citations (524k unique DOIs) in 172k
articles. Our spot checking suggests that 98% of these DOIs resolve. The
remaining 2% were extracted correctly, but they appear to be typos.
http://dx.doi.org/10.6084/m9.figshare.1299540
Like the dataset that we released for PubMed Identifiers, this dataset includes
the first known occurrence of a DOI citation in an English Wikipedia
article and the associated revision metadata, based on the most recent
complete content dump of English Wikipedia.
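As a rough illustration of what "first known occurrence" means here, the sketch below scans an article's revisions in chronological order and returns the earliest one whose text contains a given DOI. The revision fields (`rev_id`, `timestamp`, `text`), the function name, and the sample DOI are hypothetical; this is not the extraction code used to build the dataset.

```python
def first_occurrence(revisions, doi):
    """Return the earliest revision whose text contains the DOI, or None."""
    # Assumption: timestamps are ISO 8601 strings, so sorting them as text
    # also sorts them chronologically.
    for rev in sorted(revisions, key=lambda r: r["timestamp"]):
        # DOI names are case-insensitive, so compare in lower case.
        if doi.lower() in rev["text"].lower():
            return rev
    return None


# Made-up revisions and a made-up DOI, for illustration only.
revs = [
    {"rev_id": 3, "timestamp": "2014-03-01T00:00:00Z", "text": "cite doi:10.1000/example again"},
    {"rev_id": 1, "timestamp": "2012-01-01T00:00:00Z", "text": "no citation yet"},
    {"rev_id": 2, "timestamp": "2013-06-15T00:00:00Z", "text": "added doi:10.1000/EXAMPLE"},
]
first = first_occurrence(revs, "10.1000/example")
```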
Feel free to share this with anyone interested via:
https://twitter.com/WikiResearch/status/564908585008627712
We'll be organizing our own work and analysis of these citations here:
https://meta.wikimedia.org/wiki/Research:Scholarly_article_citations_in_Wik…
-Aaron
Team:
EL dropped events for about 8 hours last night. The analytics team
will work on backfilling that data. Here is the backlog item associated with
that task:
https://phabricator.wikimedia.org/T88692
Thanks,
Nuria
Hi Guys,
I have to use your data for an assignment and need to know which
timezone you use when naming your pagecount files.
Thanks a lot!
Mike