Hello everyone,
My name is Sajjad. My colleague Sumandro and I are working on a project with The Centre for Internet and Society (http://cis-india.org/), India, to visualise activity on the Indic Wikipedia projects. We have already developed a few basic visualisations; you can see our work here: http://geohacker.github.io/indicwiki/
We are writing to ask for some help with the next phase of our work.
The next round of visualisations will focus on two aspects:
1. Edits by geography: we want to see where in the world contributions to the Indic Wikipedia projects come from. City-level data would be ideal for visualising this, so we are looking for the total number of edits to each Indic Wikipedia project from each location over a given period (preferably per year).
2. Most edited articles and edit wars: we would like to explore the most edited articles in each project and see whether it is possible to visualise edit wars. For this we are looking for total edit counts (the number of edits and the volume of changes) for each article above a given threshold, for each Indic Wikipedia project, per year.
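To pin down what the first dataset would contain, here is a minimal Python sketch of the aggregation, assuming a hypothetical per-edit record of (project, city, timestamp). It does not imply that such a city-level feed exists; the `EditRecord` type, its field names and the timestamp format are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EditRecord:
    """Hypothetical per-edit record; the field names are illustrative only."""
    project: str    # wiki database name, e.g. "hiwiki"
    city: str       # city the edit was geolocated to (assumed available)
    timestamp: str  # UTC timestamp, e.g. "2012-03-04T10:15:00Z"


def edits_per_city_per_year(records):
    """Total edits keyed by (project, city, year)."""
    counts = Counter()
    for record in records:
        year = datetime.strptime(record.timestamp, "%Y-%m-%dT%H:%M:%SZ").year
        counts[(record.project, record.city, year)] += 1
    return counts


if __name__ == "__main__":
    # Made-up records, only to show the shape of the output.
    sample = [
        EditRecord("hiwiki", "Delhi", "2012-03-04T10:15:00Z"),
        EditRecord("hiwiki", "Delhi", "2012-11-20T08:00:00Z"),
        EditRecord("hiwiki", "London", "2013-01-02T12:00:00Z"),
        EditRecord("tawiki", "Chennai", "2012-05-05T09:30:00Z"),
    ]
    for (project, city, year), total in sorted(edits_per_city_per_year(sample).items()):
        print(project, city, year, total)
```

Keying a `Counter` on `(project, city, year)` gives exactly one total per project, city and year, which maps directly onto one CSV row per key.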
Jessie Wild suggested that we write to this list for pointers on the data.
Please let us know if any of you have worked with this kind of data, or if you can help us get hold of it.
Thank you so much!
Cheers, Sajjad.
-- Sajjad Anwar | http://sajjad.in | @geohacker
Hi Sajjad,
Thanks for your note. The WMF Language Engineering team is also starting work on tooling for our language extensions and tools, to collect a diverse set of metrics on various language wiki projects, including the Indic wikis.
We look forward to coordinating with you, so let me know if you have any specific questions we can help with. Is there a technical spec for the overall project, and for the tools and APIs you plan to use to collect the metrics you mentioned above?
Look forward to hearing from you. Best, Alolita
On Fri, May 10, 2013 at 3:38 AM, Sajjad Anwar me@sajjad.in wrote:
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Dear Alolita,
The work on tools for collecting different metrics about wiki projects sounds quite exciting. Please let us know if we can help in any way.
As for our work, as you can see from the project site on GitHub (http://geohacker.github.io/indicwiki/), we have visualised static datasets obtained from the Wikimedia Analytics Team (WAT) rather than dynamic data sources. The datasets we received from WAT were in CSV format, which suits us well, but data available through a database with API access would also be fine.
We are looking for two sets of data: (1) annual edit counts by city, from across the world, for each Indic Wikipedia project, and (2) a list of articles with their annual edit counts (number of edits and total volume) for each Indic Wikipedia project.
We hope these can be extracted from the database dumps (http://dumps.wikimedia.org/backup-index.html), but we are not very familiar with them. It would be great if someone could give us hints on how to navigate the dumps.
Alternatively, if someone has already collected or used similar datasets, suggestions on how to collect the data mentioned above would be very helpful.
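As a starting point for navigating the dumps, the following minimal Python sketch streams a full-history XML dump with `xml.etree.ElementTree.iterparse` and builds the second dataset: edits per article per year. It assumes each `<page>` carries an `<ns>` element and lists its revisions oldest first with their full `<text>`. The function name, the `min_edits` threshold, and the definition of "volume" as the sum of absolute byte changes between consecutive revisions are assumptions, not an established metric.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def article_edits_per_year(xml_file, namespace="0", min_edits=1):
    """Return {(title, year): [edit_count, byte_volume]} for one wiki's dump.

    Assumes a full-history export where each <page> lists its <revision>s
    oldest first and each revision includes its full <text>. "Volume" is
    taken to be the sum of absolute size changes (bytes) between consecutive
    revisions, one possible definition only. Entries with fewer than
    `min_edits` edits in a year are dropped.
    """
    stats = defaultdict(lambda: [0, 0])
    title = ns = timestamp = None
    size = prev_size = 0
    for event, elem in ET.iterparse(xml_file, events=("start", "end")):
        tag = local_name(elem.tag)
        if event == "start":
            if tag == "page":        # new page: reset per-page state
                title, ns, prev_size = None, None, 0
            elif tag == "revision":  # new revision: reset per-revision state
                timestamp, size = None, 0
            continue
        if tag == "title":
            title = elem.text
        elif tag == "ns":
            ns = elem.text
        elif tag == "timestamp":
            timestamp = elem.text
        elif tag == "text":
            size = len((elem.text or "").encode("utf-8"))
        elif tag == "revision":
            if ns == namespace and timestamp:
                entry = stats[(title, int(timestamp[:4]))]
                entry[0] += 1
                entry[1] += abs(size - prev_size)
            prev_size = size
            elem.clear()             # drop the finished revision's text
        elif tag == "page":
            elem.clear()
    return {key: value for key, value in stats.items() if value[0] >= min_edits}


if __name__ == "__main__":
    # Tiny hand-written dump fragment; the schema version is illustrative.
    sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page><title>A</title><ns>0</ns><id>1</id>
    <revision><id>10</id><timestamp>2012-01-01T00:00:00Z</timestamp><text>abc</text></revision>
    <revision><id>11</id><timestamp>2012-06-01T00:00:00Z</timestamp><text>abcdef</text></revision>
    <revision><id>12</id><timestamp>2013-02-01T00:00:00Z</timestamp><text>ab</text></revision>
  </page>
</mediawiki>"""
    print(article_edits_per_year(io.BytesIO(sample)))
```

For a real dump, a compressed file can be streamed without unpacking it first by passing a `bz2.open(path, "rb")` file object as `xml_file`; clearing finished elements keeps the parser from holding every revision text in memory.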
Bests,
sumandro
-------------
sumandro ajantriks.net
On Friday 10 May 2013 08:06 PM, Alolita Sharma wrote:
-- Alolita Sharma, Director of Engineering, Language Engineering (i18n/L10n), Wikimedia Foundation
Wow!
The motion chart reminds me a bit of Gapminder.
Nice work!
Be Bold! Sophie Österberg 0733-832670 sophie.osterberg@wikimedia.se
Every single contribution to Wikipedia is a gift of free knowledge to humanity.
--
Awesome work! I like the flexibility of the charts: it's easy to switch metrics and presentation modes.
1. WMF has never captured IP-to-geo data at city level, but AFAIK this is going to change with Kraken.
2. Total edits per article per year can be derived from the XML dumps. I may have some CSV data that could come in handy.
For edit wars you need to track reverts on a per-article basis, right? That can also be derived from the dumps.
For long histories you need the full archive dumps, and you need to calculate a checksum of each revision's text (the stub dumps have checksums, but only for the last year or two).
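The checksum approach can be sketched as identity-revert detection: hash each revision's text and flag a revision whose hash matches an earlier revision of the same page. This minimal Python sketch assumes the texts of one page are already available oldest first (e.g. from a full archive dump) and computes its own SHA-1 rather than relying on checksums stored in the dump. The function names and the rule that consecutive identical revisions are not counted are illustrative choices; it only catches exact restorations, not partial reverts.

```python
import hashlib


def text_checksum(text):
    """Hex SHA-1 of a revision's text, encoded as UTF-8."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def identity_reverts(revisions):
    """Find revisions that exactly restore an earlier version of one page.

    `revisions` is an oldest-first list of (rev_id, text) pairs for a single
    page. A revision counts as a revert when its checksum equals that of some
    earlier revision other than its immediate predecessor, so consecutive
    revisions with unchanged text are not counted.
    Returns (reverting_rev_id, restored_rev_id) pairs, where restored_rev_id
    is the most recent earlier revision with the same text.
    """
    last_seen = {}   # checksum -> latest rev_id having that text
    previous = None  # checksum of the immediately preceding revision
    reverts = []
    for rev_id, text in revisions:
        digest = text_checksum(text)
        if digest in last_seen and digest != previous:
            reverts.append((rev_id, last_seen[digest]))
        last_seen[digest] = rev_id
        previous = digest
    return reverts


if __name__ == "__main__":
    history = [(1, "stable text"), (2, "vandalised text"), (3, "stable text"),
               (4, "new text"), (5, "new text")]
    print(identity_reverts(history))  # prints [(3, 1)]
```

Counting these pairs per article and per year would give one rough edit-war signal; pages with many reverts in a short period are candidates for closer inspection.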
Erik Zachte
From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Sophie Österberg
Sent: Tuesday, May 14, 2013 4:20 PM
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] Visualizing Indic Wikipedia projects.
On Tue, May 14, 2013 at 11:10 PM, Erik Zachte ezachte@wikimedia.org wrote:
Awesome work! I like the flexibility of the charts: it's easy to switch metrics and presentation modes.
Thank you so much, Erik. Glad you liked it!
- WMF has never captured IP-to-geo data at city level, but AFAIK this is going to change with Kraken.
Awesome. We did have some luck with city-level data shared by Evan Rosen, but it had a few problems with how the aggregation was done, which we haven't yet figured out how to resolve.
- Total edits per article per year can be derived from the XML dumps. I may have some CSV data that could come in handy.
For edit wars you need to track reverts on a per-article basis, right? That can also be derived from the dumps.
For long histories you need the full archive dumps, and you need to calculate a checksum of each revision's text (the stub dumps have checksums, but only for the last year or two).
Okay. I'll spend some time digging through the XML dumps and keep you posted. Meanwhile, it would be fantastic if you could share the CSV data, so that we can start working on this :) Thank you so much!
Cheers, Sajjad.