Re: [Analytics] Data Collection

21 Dec 2015

Caitlin, it sounds to me like you might benefit from knowing where the
pageviews come from (in your case Australia and New Zealand).  We have been
trying to release that data but it's much more sensitive so we have to be
very careful how we aggregate.  If you think that would be useful, you can
always post a task in Phabricator and try to find someone who has access to
our internal cluster to run some specific queries for you.  In that case,
knowing exactly what you want and exactly what format you want it in would
be helpful.

On Sun, Dec 20, 2015 at 5:14 PM, <Caitlin.Gardner(a)csiro.au> wrote:

  Hi Dan,

 The aim of our project is to determine whether we can establish a
 prediction technique for high impact (not high risk) species before they
 enter Australia and New Zealand. We are using data for species of 18
 industries that have already entered and are high or low impact species (at
 this stage removing moderate impact). Monthly pageviews going back further
 than May 2015 would be useful (or even daily pageviews, but monthly would
 suffice). I should be able to use this response for my analysis. Pageview
 data may only show us high risk pest species - but it is all worth an
 investigation for us.

 Unfortunately I'm not familiar with the methods used to access the older
 data - but the links you have all sent me will be useful and if I decide I
 need more data, I can try those methods myself.

 Thank you all for your help! I should be okay from here.

 Cheers,
 Caitlin

 -----Original Message-----
 From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf
 Of analytics-request(a)lists.wikimedia.org
 Sent: Saturday, 19 December 2015 5:46 AM
 To: analytics(a)lists.wikimedia.org
 Subject: Analytics Digest, Vol 46, Issue 38

 Send Analytics mailing list submissions to
         analytics(a)lists.wikimedia.org

 To subscribe or unsubscribe via the World Wide Web, visit
         https://lists.wikimedia.org/mailman/listinfo/analytics
 or, via email, send a message with subject or body 'help' to
         analytics-request(a)lists.wikimedia.org

 You can reach the person managing the list at
         analytics-owner(a)lists.wikimedia.org

 When replying, please edit your Subject line so it is more specific than
 "Re: Contents of Analytics digest..."

 Today's Topics:

    1. Re: Data collection (Dan Andreescu)

 ----------------------------------------------------------------------

 Message: 1
 Date: Tue, 15 Dec 2015 08:50:11 -0500
 From: Dan Andreescu <dandreescu(a)wikimedia.org>
 To: "A mailing list for the Analytics Team at WMF and everybody who
         has an interest in Wikipedia and analytics."
         <analytics(a)lists.wikimedia.org>
 Subject: Re: [Analytics] Data collection
 Message-ID:
         <CA+aepCRqZ9YwHPCdo-1F-2uF-Vy3OiBZj=
 PjCD-MturFy0qyVA(a)mail.gmail.com>
 Content-Type: text/plain; charset="utf-8"

 Hi Caitlin,

 Using the Python client for the pageview API (
 https://github.com/mediawiki-utilities/python-mwviews), you could do:

 from mwviews.api import PageviewsClient
 p = PageviewsClient()
 p.article_views('en.wikipedia',

['Abacarus_hystrix','Acarus_siro','Aceria_tosichella','Acyrthosiphon_pisum','Ahasverus_advena','Anthrenus_flavipes','Aphis_craccivora','Arhopalus','Balaustium_medicagoense','Bemisia_tabaci','Brevicoryne_brassicae','Bruchus','Ceratitis_capitata','Cicadulina','Cryptolestes','Daktulosphaira_vitifoliae','Delia','Ephestia_elutella','Ephestia_kuehniella','Etiella_behrii','Frankliniella_occidentalis','Frankliniella','Henosepilachna_vigintioctopunctata','Heteronychus_arator','Lachesilla_quercus','Lasioderma_serricorne','Liposcelis_bostrychophila','Macrosiphum_euphorbiae','Marchalina_hellenica','Myzus_persicae','Naupactus','Nezara_viridula','Oligonychus_ununguis','Oryzaephilus_surinamensis','Panonychus_ulmi','Penthaleus','Pieris_rapae','Piezodorus','Plodia_interpunctella','Plutella_xylostella','Rhopalosiphon','rhopalosiphum_maidis','Rhopalosiphum_padi','Rhyzopertha_dominica','Sirex_noctilio','Sitophilus_granarius','Sitophilus_oryzae','Sitotroga_cerealella','Sminthurus_viridis','Spodoptera_exempta','Stegobium_paniceum','Tetranychus','Thrips_palmi','Thrips','Tribolium_castaneum','Tribolium_confusum','Trogoderma_granarium','Trogoderma'],
 start='20150501')

 Some of the articles in your list don't exist on en.wikipedia (like
 'Frankliniella') but for what exists this returns the views as far back as
 we have them.  When we're done filling up the API we'll have data back to
 May 2015, but right now it only goes to August.  If you need it back
 further, you have to parse the dumps as others have said.  I'm curious why
 you need the older data; it's interesting to us as we try to figure out
 what else to expose through the API.  Would monthly pageviews be just as
 good?

 I attached the result of that query here in JSON format.
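
Since monthly totals would suffice for this analysis, here is a rough sketch of rolling daily counts up by month. It assumes the result of article_views is a mapping from a date to a per-article dictionary of view counts, with None where there is no data (for example, for articles that don't exist); that shape is an assumption and worth checking against the client's actual output. The helper name monthly_totals and the sample numbers are hypothetical, not real pageview data.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily_views):
    """Sum daily view counts into per-article monthly totals.

    `daily_views` is assumed to map a date to an {article: views} dict,
    where views may be None when there is no data for that article/day.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for day, per_article in daily_views.items():
        month = day.strftime('%Y-%m')
        for article, views in per_article.items():
            # Skip missing data rather than counting it as zero views.
            if views is not None:
                totals[article][month] += views
    return {article: dict(months) for article, months in totals.items()}


# Made-up numbers, for illustration only (not real pageview data).
sample = {
    date(2015, 8, 30): {'Acarus_siro': 10, 'Frankliniella': None},
    date(2015, 8, 31): {'Acarus_siro': 12, 'Frankliniella': None},
    date(2015, 9, 1): {'Acarus_siro': 7, 'Frankliniella': None},
}

result = monthly_totals(sample)
print(result)
```

Articles with no data on any day simply do not appear in the result, which makes it easy to spot titles that need checking.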
