Analytics October 2013

analytics@lists.wikimedia.org

36 participants
27 discussions

Updated ScanMail aka Wikimedia mail stats
by Federico Leva (Nemo) 26 Oct '13

http://www.infodisiac.com/Wikipedia/ScanMail/ has not been updated since February, so I've made an update myself: https://toolserver.org/~nemobis/ScanMail/

If you want to add aliases for your name or someone else's, it's easy to submit patches, even for a script monkey like me, [1] so you can just edit the list after %aliases in <https://git.wikimedia.org/blob/analytics%2Fwikistats.git/HEAD/mail-lists%2F…>,[2] without flooding Erik's email. :)

A useful part to update is the list of WMF board members, in the same file after the line "my $board = $false ;": can anybody do it?

Updating the stats only requires downloading some 300 MB of archives and less than 5 minutes of your CPU: if someone else wants to take over the update and hosting, please do so.

Nemo

[1] https://gerrit.wikimedia.org/r/#/projects/analytics/wikistats,dashboards/de… https://www.mediawiki.org/wiki/Git/Tutorial
[2] Currently <https://git.wikimedia.org/blob/analytics%2Fwikistats.git/73240944e60acf409f…>

1 0

What kind of traffic data would you be interested to analyze from NCBI?
by Daniel Mietchen 26 Oct '13

Dear all,

I am at http://www.ncbi.nlm.nih.gov/ today and have mentioned that it would be interesting for our community to see how much traffic they get from Wikimedia servers. They said that they would be willing to look into the possibility of providing such data, but before embarking on that, they would like to get an overview of what kind of data or analyses people would be interested in.

If you have suggestions in this regard, please post them at https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_National_Institute….

Thanks and cheers,
Daniel

1 0

Join the inaugural Wiki Research Hackathon on November 9
by Dario Taraborelli 26 Oct '13

Cross-posting the announcement from the Wikimedia Blog. The details of the event are on Meta and we're also creating meetup.com pages for the local events. Check them out and RSVP if you're planning to attend.

Looking forward to seeing you on November 9!

Dario, on behalf of the organizers

Join the inaugural Wiki Research Hackathon on November 9

Last summer at Wikimania in Hong Kong, the annual global Wikimedia conference, we (a group of Wikipedia researchers) discussed how we could make wiki research more impactful. In our work in academia and on Wikimedia projects, we saw a host of missed opportunities to share ideas, hypotheses, code, and research methods. We set out to create a space to bring researchers together with Wikipedians and facilitate problem solving, discovery and innovation with the use of open data and open source tools.

Labs2 (L2) aims to build this space, by providing infrastructure and venues for collaborative wiki research. Today we’re thrilled to announce the inaugural Wiki Research Hackathon – a global event hosted by Wikimedia Foundation researchers, academic researchers and Wikipedians from around the world on Saturday, November 9, 2013.

What

This hackathon is an opportunity for anyone interested in research on wikis, Wikipedia, and open collaboration to meet, share ideas, and work together. It is targeted at Wikipedia editors, students, researchers, coders and anyone interested in designing new tools, statistics and data visualization, and producing new knowledge about Wikimedia projects and their communities.

The goals of this event are to:
- share knowledge about research tools and datasets (and how to use them)
- ask burning research questions (and learn how to answer them)
- get involved in ongoing research projects (or start new ones)
- design new data-driven apps and tools (or hack existing ones)

Where

(Locations are approximate)

This hackathon will be held both as a series of local meetups (Perth, Mannheim, Oxford, Rio de Janeiro, Chicago, Minneapolis, San Francisco, Seattle, etc.) and virtual meetups (Asia/Oceania, Europe/Africa & The Americas) for those who can’t make it to the local events. An IRC channel (#wikimedia-labsconnect) and a Google Hangout open throughout the day will allow attendees to connect online.

How

Interested attendees can sign up for the event on Meta-wiki. Local and virtual meetups are listed on the event page. All you need to do is add your name to the list of participants for the event that makes sense for you.

Who

For any question about the event (including volunteering for a local meetup), you can reach us at wrh(a)wikimedia.org or leave a message on the hackathon’s talk page on Meta-wiki.

We look forward to seeing you on November 9.

Aaron Halfaker, Wikimedia Foundation
Jonathan Morgan, Wikimedia Foundation
Morten Warncke-Wang, University of Minnesota
Aaron Shaw, Northwestern University
Dario Taraborelli, Wikimedia Foundation
Taha Yasseri, Oxford University
Henrique Andrade, Wikimedia Foundation

1 0

Upcoming mobile request logs from ulsfo hosts
by Mark Bergsma 25 Oct '13

Hi,

I've just configured 4 new ulsfo servers as mobile caches:

cp4011.ulsfo.wmnet
cp4012.ulsfo.wmnet
cp4019.ulsfo.wmnet
cp4020.ulsfo.wmnet

These are listed in cache.pp accordingly.

Please make sure that mobile requests from these servers are accounted for, and let me know when this is the case. I'd like to start giving them traffic ASAP. :)

Thanks,

--
Mark Bergsma <mark(a)wikimedia.org>
Lead Operations Architect
Wikimedia Foundation

3 2

Re: [Analytics] Danger zone: EventLogging editing data (action needed)
by Arthur Richards 25 Oct '13

Greetings Ori and analytics team.

Is there documentation somewhere about how and where to determine that EventLogging events are being properly recorded? We had to do a quick deployment last night to change schemas and how we handle schemas internally for MF, and realized after deploying the changes that the best we could do was make sure that the events were firing. We had no idea how to inspect the pipeline, and everyone who did know was asleep/offline/etc. (see below for more backstory).

On Thu, Oct 24, 2013 at 10:21 AM, Jon Robson <jrobson(a)wikimedia.org> wrote:
> I can confirm we are still logging.
> In terms of stat1.wikimedia.org access I forget exactly how I did it
> but you will need to talk to someone in analytics - maybe Dario to get
> setup there. I'd recommend doing this sooner rather than later to
> avoid this problem again.
>
> On Wed, Oct 23, 2013 at 7:31 PM, Arthur Richards <arichards(a)wikimedia.org> wrote:
> >
> > On Wed, Oct 23, 2013 at 6:57 PM, Jon Robson <jrobson(a)wikimedia.org> wrote:
> >>
> >> Note if the events are firing and there are no errors in the console then
> >> the change was successful :) If someone can double check they are showing up
> >> on stat1 though even better!
> >
> > Are there details published somewhere on how to do this? After Kaldari got
> > the changes out successfully, we realized neither of us knew how to check on
> > stat1 nor could I quickly find docs.
> >
> > --
> > Arthur Richards
> > Software Engineer, Mobile
> > [[User:Awjrichards]]
> > IRC: awjr
> > +1-415-839-6885 x6687
>

--
Arthur Richards
Software Engineer, Mobile
[[User:Awjrichards]]
IRC: awjr
+1-415-839-6885 x6687

8 11

Seconds variance in page view stats filenames
by Brian Keegan 22 Oct '13

Apologies if I missed some documentation or prior discussion about this, but is there a reason why the seconds field in the /pagecounts-raw/ dump filenames varies? It seems unnecessary to scrape and parse the HTML to get the true filenames (e.g., pagecounts-20131021-160013.gz) instead of being able to pass clean filenames (e.g., pagecounts-20131021-160000.gz), especially when no precision at the level of seconds is needed here. Is it unreasonable to request that these be renamed to a more consistent and clean format?

Thanks!
Brian
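Until the names are made consistent, the scraping step described above can at least be kept small. The following is a minimal sketch, assuming the directory listing is already available as an HTML string; the sample listing, the second filename in it, and the helper name index_by_hour are hypothetical and only serve the illustration.

```python
import re

# Matches names such as pagecounts-20131021-160013.gz and captures
# the date, the hour, and the (varying) minutes+seconds part.
PAGECOUNTS_RE = re.compile(r"pagecounts-(\d{8})-(\d{2})(\d{4})\.gz")


def index_by_hour(listing_html):
    """Map (date, hour) to the actual filename found in the listing."""
    files = {}
    for match in PAGECOUNTS_RE.finditer(listing_html):
        date, hour, _minutes_seconds = match.groups()
        files[(date, hour)] = match.group(0)
    return files


# Hypothetical excerpt of a directory listing, for illustration only.
sample_listing = """
<a href="pagecounts-20131021-150001.gz">pagecounts-20131021-150001.gz</a>
<a href="pagecounts-20131021-160013.gz">pagecounts-20131021-160013.gz</a>
"""

hourly = index_by_hour(sample_listing)
print(hourly[("20131021", "16")])
```

With an index like this, a caller can ask for a date and hour and get back whatever filename the listing actually contains, without guessing the seconds.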

1 0

Number of non compressed requests?
by Matthew Walker 19 Oct '13

tl;dr: Do we have data on the number of compressed vs. uncompressed requests we serve?

Hey all,

I'm investigating a fundraising issue where it appears that banners that should be about the same size compressed, but which are differently sized upon decompression, show markedly different conversion rates (the only thing that's different about them is the name of the banner, which affects content length).

One of the angles I'm investigating is whether we're serving a significant number of banners uncompressed, which would affect the amount of time a banner takes to appear on the site. If we have this data already, I can compare it to data that I'm going to take from the banner stream [1].

Another thing I'm considering is whether it takes the caching layer longer to retrieve or serve certain banner content and/or cache keys.

-- The Data --

For the truly curious, the two tests I've run so far that have led me down this path both use two banners with the same content (cloned) but with different names. Because names get substituted into the banners multiple times through keyword expansion, the content lengths will be different. The test then looks at how many clicks each banner gets. This is multivariate, with the two variables being content length and cache key.

Cache key setup 1 (long name has a worse spot in the cache):
Short Name: 0.22% success rate (155300 samples)
Long Name: 0.19% success rate (160800 samples)
The 95% confidence interval has the long name performing from -31% to 3% worse than the short name with a power of 0.014.

Cache key setup 2 (long name has a better spot in the cache):
Short Name: 0.20% success rate (294900 samples)
Long Name: 0.19% success rate (309500 samples)
The 95% CI here still has the long name performing worse, but with power that is effectively not useful.

[1] https://gerrit.wikimedia.org/r/#/c/90667/

~Matt Walker
Wikimedia Foundation
Fundraising Technology Team
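For readers who want to check this kind of comparison themselves, here is a minimal sketch of a 95% confidence interval for the difference between two conversion rates, using a plain normal approximation. It is not the method used for the figures above and is not expected to reproduce them exactly. The success counts are assumptions, back-calculated from the rounded rates and sample sizes of cache key setup 1, and the function name is hypothetical.

```python
import math


def diff_of_proportions_ci(successes_a, n_a, successes_b, n_b, z=1.96):
    """Return (difference, lower, upper) for rate_b - rate_a.

    Uses the unpooled normal approximation; z=1.96 gives roughly a 95% CI.
    """
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    diff = rate_b - rate_a
    std_err = math.sqrt(
        rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b
    )
    return diff, diff - z * std_err, diff + z * std_err


# Assumed counts: 0.22% of 155300 is about 342, 0.19% of 160800 is about 306.
short_clicks, short_n = 342, 155300
long_clicks, long_n = 306, 160800

diff, lower, upper = diff_of_proportions_ci(
    short_clicks, short_n, long_clicks, long_n
)
print(f"long - short: {diff:.6f} (95% CI {lower:.6f} to {upper:.6f})")
```

Under these assumed counts the interval contains zero, which is in line with the observation that the tests so far cannot separate the two banners with confidence.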

3 3

Question around wikimetrics output
by Jessie Wild 19 Oct '13

Is there a way to link up the username IDs which are included in the output with the actual usernames?

Thank you so much!

--
Jessie Wild
Grantmaking Learning & Evaluation
Wikimedia Foundation

Imagine a world in which every single human being can freely share in the sum of all knowledge. Help us make it a reality!
Donate to Wikimedia <https://donate.wikimedia.org/>

8 17

Fwd: [Wikidata-l] Statistics
by Erik Moeller 18 Oct '13

FYI, useful new stats :) We might want to build a directory of reports generated on ToolLabs somewhere in the analytics hub on mediawiki.org.

Erik

---------- Forwarded message ----------
From: Gerard Meijssen <gerard.meijssen(a)gmail.com>
Date: Thu, Oct 17, 2013 at 10:26 PM
Subject: [Wikidata-l] Statistics
To: WikiData-l <wikidata-l(a)lists.wikimedia.org>

Hoi,

I do not know if you have seen the statistics compiled by Magnus [1]. They are up to date and useful. I blogged about it [2].

As far as I am concerned, the biggest challenge we face is the lack of labels. Given that 280+ languages are represented in Wikidata it clearly demonstrates that Wikidata is useless as it is for most languages.

Please tell me that I am wrong and explain why.

Thanks,
GerardM

[1] http://tools.wmflabs.org/wikidata-todo/stats.php
[2] http://ultimategerardm.blogspot.nl/2013/10/statistics-for-wikidata.html

_______________________________________________
Wikidata-l mailing list
Wikidata-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l

--
Erik Möller
VP of Engineering and Product Development, Wikimedia Foundation

4 6

Importing data into Hadoop
by Dan Andreescu 18 Oct '13

Hi,

I spoke to Dario today about investigating uses for our Hadoop cluster. This is an internal cluster but it's mirrored on labs so I'm posting to the public list in case people are interested in the technology and hearing what we're up to.

The questions we need to answer are:

- What's an easy way to import lots of data from MySQL without killing the source servers? We've used sqoop and drdee's sqoopy, but we think these would hammer the prod servers too hard.
- drdee mentioned a way to pass a comment with select statements to make them lower priority. Is this documented somewhere?
- Could we just stand up the MySQL backups and import them?
- Could we import from the xml dumps?
- Is there a way to do incremental importing once an initial load is done?

Once we figure this out, the fun starts. What are some useful questions once we have access to the core mediawiki db tables across all projects?

3 4
