Thanks a lot for the appreciation.
As Sajjad mentioned, we have already obtained an edit-per-location
dataset from Evan (Rosen) that has the following column structure:
*start* and *end* denote the beginning and ending date for counting the
number of edits, and *ts* is time stamp.
The *fraction*, however, gives a national ratio of edit activity; that
is, it gives 'total edits from that city for that language Wikipedia
project' divided by 'total edits from that country for that language
Wikipedia project'. Hence, it cannot be used to understand global edit
contributions to a Wikipedia project (for a time period).
It seems that the original data (from which this dataset was extracted)
should also have the global fractions -- total edits from a city divided
by total edits from the whole world, for a project, for a time period.
Would you know if the global fractions can also be derived from the XML
dumps? Or, even better, is the relevant raw data available in CSV form?
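To make the distinction concrete, here is a minimal sketch of how both ratios could be computed if raw per-city edit counts were available. The row layout and the sample numbers are hypothetical and are not taken from Evan's dataset.

```python
from collections import defaultdict

# Hypothetical raw rows for one time period: (project, country, city, edits).
rows = [
    ("ml", "IN", "Kochi", 60),
    ("ml", "IN", "Bengaluru", 20),
    ("ml", "AE", "Dubai", 20),
]


def edit_fractions(rows):
    """Return {(project, country, city): (national_fraction, global_fraction)}."""
    national_totals = defaultdict(int)  # edits per (project, country)
    global_totals = defaultdict(int)    # edits per project, worldwide
    for project, country, _city, edits in rows:
        national_totals[(project, country)] += edits
        global_totals[project] += edits

    result = {}
    for project, country, city, edits in rows:
        result[(project, country, city)] = (
            edits / national_totals[(project, country)],
            edits / global_totals[project],
        )
    return result


fractions = edit_fractions(rows)
```

In this made-up sample, Dubai has a national fraction of 1.0 but a global fraction of only 0.2, which is why the national ratio alone cannot be compared across countries.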
On Wednesday 15 May 2013 12:32 AM, analytics-request(a)lists.wikimedia.org wrote:
> Date: Tue, 14 May 2013 19:40:00 +0200
> From: "Erik Zachte" <ezachte(a)wikimedia.org>
> To: "'A mailing list for the Analytics Team at WMF and everybody who
> has an interest in Wikipedia and analytics.'"
> Subject: Re: [Analytics] Visualizing Indic Wikipedia projects.
> Message-ID: <016f01ce50ca$0fe736b0$2fb5a410$(a)wikimedia.org>
> Content-Type: text/plain; charset="iso-8859-1"
> Awesome work! I like the flexibility of the charts, easy to switch metrics
> and presentation mode.
> 1. WMF has never captured ip->geo data on city level, but afaik this is
> going to change with Kraken.
> 2. Total edits per article per year can be derived from the xml dumps. I may
> have some csv data that come in handy.
> For edit wars you need to track reverts on a per-article basis, right? That
> can also be derived from dumps.
> For long history you need full archive dumps and need to calc checksum per
> revision text. (stub dumps have checksum but only for last year or two)
> Erik Zachte
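As a rough illustration of the checksum approach Erik describes, the sketch below flags a revision as a revert when its text hashes to the same value as an earlier, non-adjacent revision of the same article. It assumes the revision texts have already been extracted from a full archive dump, in chronological order. The function name is hypothetical, and SHA-1 is used here only as an example of a stable checksum. It catches exact restorations only, not partial reverts.

```python
import hashlib


def find_identity_reverts(revision_texts):
    """Return (revert_index, restored_index) pairs for one article's history.

    A revision counts as a revert when its checksum equals that of an
    earlier revision other than the one immediately before it.
    """
    last_seen = {}  # checksum -> index of the latest revision with that text
    reverts = []
    for i, text in enumerate(revision_texts):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        previous = last_seen.get(digest)
        # Skip null edits, where the text equals the immediately preceding one.
        if previous is not None and previous < i - 1:
            reverts.append((i, previous))
        last_seen[digest] = i
    return reverts
```

Counting the returned pairs per article and per year would give a simple proxy for edit-war intensity.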
I was a bit irritated yesterday to learn that we can automate the
creation of Limn graphs and speed up the process.
I had become so tired of manually copying and pasting existing graphs
and manually editing them to work for a new graph that I knocked up a
script to do this for me. The script simply takes an SQL query and a
config file and generates all the necessary JSON files so that the
graph shows up on the Limn dashboard.
With this script I was able to generate 5 graphs in the time it takes
me to generate 1.
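For anyone who has not seen such a script, the general idea can be sketched as below. This is not the script mentioned above, and every key name and directory name in it is a hypothetical placeholder rather than Limn's actual schema; it only shows the pattern of turning one query plus one config into the JSON files a dashboard would read.

```python
import json
from pathlib import Path


def write_graph_files(config, sql, out_dir):
    """Write one datasource file and one graph file for a single SQL query.

    All key names below are hypothetical placeholders, not Limn's real schema.
    """
    out = Path(out_dir)
    slug = config["id"]
    datasource = {
        "id": slug,
        "title": config["title"],
        "query": sql,
        "data_url": f"/data/{slug}.csv",
    }
    graph = {
        "id": slug,
        "title": config["title"],
        "datasource": slug,
        "columns": config["columns"],
    }
    paths = []
    for kind, payload in (("datasources", datasource), ("graphs", graph)):
        target = out / kind / f"{slug}.json"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
        paths.append(target)
    return paths
```

Keeping the per-graph details in a small config is what lets a new graph be added without copying and hand-editing existing files.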
However, since uploading the script I have learnt that other scripts
like this exist. Please can we standardise on a way to generate these
graphs (either locally or on the server) and detail it in the README
to make this whole process of graph generation nicer for everyone?
I've added some graphs (which should update soon) that show activity
in the left navigation menu, on the watchlist page and on the diff
page. We had this data so it seemed silly not to display it somewhere.
When the data becomes available you'll notice that, interestingly, the
'Home' link in the main menu is our most widely used feature. It will
be great to see how that changes when search becomes available on
special pages. Likewise, 'Random' is a very widely used feature - we
should continue experimenting with that and try and use it to engage
Please extend a warm welcome to the newest member of the Analytics team,
Charles Salvia. We're really excited to have Charles on the team!
In his own words:
Charles Salvia is a software engineer/open-source enthusiast who has been
programming computers since the days of MS-DOS. Charles worked for a (very
small) startup, where he developed a search-engine and webcrawler from
scratch, and spent plenty of time researching natural language processing,
computational morphology, and machine learning, which eventually landed him
a job at Bloomberg working on web crawlers and pattern recognition
algorithms. Charles regularly posts on Stack Overflow, and contributes to
the Boost C++ project, as well as WebKit. Charles can be contacted at
He'll be working out of New York. Welcome to the Foundation, Charles!