Erik,
Thanks a lot for the appreciation.
As Sajjad mentioned, we have already obtained an edit-per-location
dataset from Evan (Rosen) that has the following column structure:
*language,country,city,start,end,fraction,ts*
*start* and *end* denote the beginning and ending dates for counting the
number of edits, and *ts* is the timestamp.
The *fraction*, however, gives a national ratio of edit activity: that
is, it gives 'total edits from that city for that language Wikipedia
project' divided by 'total edits from that country for that language
Wikipedia project'. Hence, it cannot be used to understand global edit
contributions to a Wikipedia project (for a time period).
It seems that the original data (from which this dataset is extracted)
should also yield the global fractions -- total edits from a city divided
by total edits from the whole world, for a project, for a time period.
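For what it's worth, if the raw per-city edit counts were available, the
global fractions would be straightforward to derive. A minimal sketch in
Python/pandas (the file name and column names below are assumptions for
illustration, not the actual data):

import pandas as pd

# Hypothetical raw counts with columns:
# language, country, city, start, end, edits
edits = pd.read_csv("edits_per_city.csv")

keys = ["language", "start", "end"]
# Total edits per country and per world, for each project and time period.
country_totals = edits.groupby(keys + ["country"])["edits"].transform("sum")
world_totals = edits.groupby(keys)["edits"].transform("sum")

edits["national_fraction"] = edits["edits"] / country_totals
edits["global_fraction"] = edits["edits"] / world_totals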
Would you know if the global fractions can also be derived from the XML
dumps? Or, even better, is the relevant raw data available in CSV form
somewhere else?
Bests,
sumandro
-------------
sumandro
ajantriks.net
On Wednesday 15 May 2013 12:32 AM, analytics-request(a)lists.wikimedia.org
wrote:
>
> Date: Tue, 14 May 2013 19:40:00 +0200
> From: "Erik Zachte" <ezachte(a)wikimedia.org>
> To: "'A mailing list for the Analytics Team at WMF and everybody who
> has an interest in Wikipedia and analytics.'"
> <analytics(a)lists.wikimedia.org>
> Subject: Re: [Analytics] Visualizing Indic Wikipedia projects.
> Message-ID: <016f01ce50ca$0fe736b0$2fb5a410$(a)wikimedia.org>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Awesome work! I like the flexibility of the charts, easy to switch metrics
> and presentation mode.
>
>
>
> 1. WMF has never captured ip->geo data at the city level, but afaik this is
> going to change with Kraken.
>
>
>
> 2. Total edits per article per year can be derived from the XML dumps. I may
> have some CSV data that could come in handy.
>
> For edit wars you need to track reverts on a per-article basis, right? That
> can also be derived from the dumps.
>
> For a long history you need the full archive dumps and need to calculate a
> checksum per revision text. (Stub dumps have checksums, but only for the
> last year or two.)
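(As an aside, here is a rough Python sketch of the checksum-per-revision
approach described above, reading a full-history XML dump. The schema
namespace URI, the use of SHA-1, and the simple "identical text seen
before" revert heuristic are assumptions for illustration, not necessarily
the exact method used.)

import hashlib
import xml.etree.ElementTree as ET

# Namespace of the dump schema; the exact version string varies by dump.
NS = "{http://www.mediawiki.org/xml/export-0.10/}"

def iter_identity_reverts(dump_path):
    """Yield (page_title, revision_id) for revisions whose text is
    byte-identical to an earlier revision of the same page."""
    seen = set()
    page_title = None
    for _, elem in ET.iterparse(dump_path, events=("end",)):
        if elem.tag == NS + "title":
            page_title = elem.text
            seen = set()                     # new page: reset history
        elif elem.tag == NS + "revision":
            text = elem.findtext(NS + "text") or ""
            checksum = hashlib.sha1(text.encode("utf-8")).hexdigest()
            if checksum in seen:
                yield page_title, elem.findtext(NS + "id")
            seen.add(checksum)
            elem.clear()                     # keep memory bounded on large dumps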
>
>
>
> Erik Zachte
>
>
>
Hi everyone!
Has anyone tried to observe how different Wikipedias use templates: how
often, what the average depth of template calls is, etc.?
-----
Yury Katkov, WikiVote
Hi all,
On the Growth team, we (and by we, I mean Aaron Halfaker) have been doing a
great deal of work to understand trends in new article creation,[1]
particularly from the new user perspective. Along with this and our launch
of the new Draft namespace, we've discovered that our current data sources
for tracking page creations, moves, and deletions are far too slow and
awkward to use on a daily or weekly basis.
To solve this problem and answer on-going questions about how many page
creators there are, how successful they are, and what workflows they use,
we've created three new schemas:
- https://meta.wikimedia.org/wiki/Schema:PageCreation
- https://meta.wikimedia.org/wiki/Schema:PageDeletion
- https://meta.wikimedia.org/wiki/Schema:PageMove
We envision using these similarly to how we're using schemas like
Schema:ServerSideAccountCreation and Schema:PrefUpdate. We will likely be
implementing these in our team's next sprint, starting on February 5th, so
if you have feedback please speak up soon. :)
1. https://meta.wikimedia.org/wiki/Research:Wikipedia_article_creation
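As a footnote for anyone who wants to poke at the data once events start
flowing: pulling counts out of the EventLogging database should look
roughly like the sketch below. The table name (following the usual
SchemaName_revisionId convention), host, and timestamp format are
placeholders, not the actual schema:

import pymysql  # or any MySQL client

# Placeholder table name and revision id; EventLogging tables are
# named SchemaName_revisionId in the 'log' database.
QUERY = """
SELECT LEFT(timestamp, 8) AS day, COUNT(*) AS creations
FROM PageCreation_12345
GROUP BY day
ORDER BY day
"""

conn = pymysql.connect(host="analytics-store.example", user="research",
                       password="...", db="log")
with conn.cursor() as cur:
    cur.execute(QUERY)
    for day, creations in cur.fetchall():
        print(day, creations)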
--
Steven Walling,
Product Manager
https://wikimediafoundation.org/
I was a bit irritated yesterday to learn that we can automate the
creation of Limn graphs and speed up the process.
I had become so tired of manually copying and pasting existing graphs
and manually editing them to work for a new graph that I knocked up a
script to do this for me. The script simply takes an SQL query and a
config file and generates all the necessary JSON files so that the
graph shows up on the Limn dashboard.
With this script I was able to generate 5 graphs in the time it takes
me to generate 1.
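For anyone curious, the core of such a generator is small. A hypothetical
sketch (not the actual generate-graph.py, and with an illustrative JSON
layout rather than Limn's real schema):

import json
import os

def generate_graph(slug, title, datafile_url, out_dir="."):
    # One JSON file describing where the data lives...
    datasource = {"id": slug, "format": "csv", "url": datafile_url}
    # ...and one describing the graph that renders it.
    graph = {"id": slug, "name": title,
             "root": {"nodeType": "canvas",
                      "children": [{"nodeType": "line", "metric": slug}]}}
    for subdir, payload in (("datasources", datasource), ("graphs", graph)):
        os.makedirs(os.path.join(out_dir, subdir), exist_ok=True)
        with open(os.path.join(out_dir, subdir, slug + ".json"), "w") as f:
            json.dump(payload, f, indent=2)

generate_graph("main-menu-home", "Main menu: Home taps",
               "http://example.org/main-menu-home.csv")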
However, since uploading the script [1] I have now learnt that other
scripts like this exist. Can we please standardise on a way to generate
these graphs (either locally or on the server) and detail it in the
README, to make the whole process of graph generation nicer for everyone
involved?
I've added some graphs [2] (which should update soon) that show activity
in the left navigation menu, on the watchlist page and on the diff
page. We had this data, so it seemed silly not to display it somewhere.
When the data becomes available you'll notice that, interestingly, the
'Home' link in the main menu is our most widely used feature. It will
be great to see how that changes when search becomes available on
special pages. Likewise, 'Random' is a very widely used feature - we
should continue experimenting with it and try to use it to engage
new editors.
[1] https://gerrit.wikimedia.org/r/#/c/110271/2/generate-graph.py
[2] http://mobile-reportcard.wmflabs.org/#other-graphs-tab
One of the more interesting analytics stacks being built on top of Hadoop
is coming out of UC Berkeley.
https://amplab.cs.berkeley.edu/software/
Particularly interesting is SparkR, which they just blogged about here:
https://amplab.cs.berkeley.edu/2014/01/26/large-scale-data-analysis-made-ea…
We don't want to get ahead of ourselves; after all, we're just starting to
get some page view data into HDFS, but it's important to understand that
one of the reasons we like Hadoop is the ecosystem of open source tools
built around it.
-Toby
Hi Everyone,
Please extend a warm welcome to the newest member of the Analytics team,
Charles Salvia. We're really excited to have Charles on the team!
In his own words:
Charles Salvia is a software engineer/open-source enthusiast who has been
programming computers since the days of MS-DOS. Charles worked for a (very
small) startup, where he developed a search engine and web crawler from
scratch, and spent plenty of time researching natural language processing,
computational morphology, and machine learning, which eventually landed him
a job at Bloomberg working on web crawlers and pattern-recognition
algorithms. Charles regularly posts on Stack Overflow, and contributes to
the Boost C++ project, as well as WebKit. Charles can be contacted at
csalvia(a)wikimedia.org.
He'll be working out of New York. Welcome to the Foundation, Charles!
-Toby
Team:
We are going to be making some maintenance changes to the EventLogging
database; you can expect a brief outage of about a couple of hours.
Thanks,
Nuria