---------- Forwarded message ----------
From: Arjuna Rao Chavala arjunaraoc@gmail.com
Date: 2013/1/22
Subject: Status of Indian language wikipedias in 2012 and priorities, Telugu WP report
To: wmin-members wmin-members@googlegroups.com
Hi
Please check out my blog post on the status of Indian language Wikipedias in 2012 and priorities: http://blog.wikimedia.in/2013/01/22/analysis-of-indian-language-wikipedias-f...
The 2012 report on Telugu Wikipedia is available at http://upload.wikimedia.org/wikipedia/commons/4/41/Telugu_Wikipedia_-2012_Re...
Cheers Arjuna Rao Chavala
On Tue, Jan 22, 2013 at 4:35 PM, Arjuna Rao Chavala arjunaraoc@gmail.com wrote:
[snip]
Please check out my blog post on the status of Indian language Wikipedias in 2012 and priorities: http://blog.wikimedia.in/2013/01/22/analysis-of-indian-language-wikipedias-f...
You write "As each one of us intuitively understand that number of page views and number of edits are correlated, I derived a metric called ‘Activity’ as their product". Could you explain why the product of these two factors is a measure of activity?
Hi Sankarshan,
2013/1/22 sankarshan foss.mailinglists@gmail.com
[snip]
Both are key measures of activity toward Wikipedia's goal, that is, sharing the sum of human knowledge. If there are few page views because there are few readers, editors will be less enthusiastic about contributing. If there are more page views, more people will be interested in becoming editors. Given the nature of these metrics and their different ranges, neither is a reliable measure by itself; the interaction between them in the Wikipedia ecosystem is a more appropriate measure. I have also heard in some wiki discussions that when the Chinese-language Wikipedia was banned in China, the number of editors fell a lot.
Note that the metric uses total database edits, which include edits made by bots. Apart from adding less useful interwiki links, bots can improve the quality of Wikipedia by correcting spelling mistakes. By one estimate, bots contribute 10% to 30% of edits in the top-ranked language Wikipedias.[1]
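A minimal sketch of the 'Activity' metric as described above: total page views multiplied by total edits (bot edits included). It assumes the yearly totals are summed from monthly figures, as Arjuna clarifies later in the thread, and only then multiplied; whether the blog post multiplies yearly totals or sums monthly products is not stated, so treat that as an assumption. `MonthlyStats` and `annual_activity` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MonthlyStats:
    """Hypothetical monthly figures for one language Wikipedia."""
    page_views: int
    edits: int  # all database edits, bot edits included


def annual_activity(months: list[MonthlyStats]) -> int:
    """Activity = (sum of page views over the year) * (sum of edits over the year)."""
    total_views = sum(m.page_views for m in months)
    total_edits = sum(m.edits for m in months)
    return total_views * total_edits


# Example with made-up figures for two months.
print(annual_activity([MonthlyStats(100, 10), MonthlyStats(200, 20)]))  # prints 9000
```

Because the metric is a product, doubling either factor doubles the score, and a wiki with zero edits (or zero views) scores zero no matter how large the other factor is.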
Cheers Arjuna
[1] R. Stuart Geiger, "The Lives of Bots", in Critical Point of View: A Wikipedia Reader, p. 78, http://p2pfoundation.net/Critical_Wikipedia_Reader: "Simple statistics indicate the growing influence of algorithmic actors on the editorial process: in terms of the raw number of edits to the English-language version of Wikipedia, automated bots are 17 of the top 20 most prolific editors and collectively make about 16% of all edits to the encyclopedia project. On other major language versions of the project, the percentage of edits made by bots ranges from around 10% (Japanese) to 30% (French)."
--
sankarshan mukhopadhyay https://twitter.com/#!/sankarshan
Wikimediaindia-l mailing list Wikimediaindia-l@lists.wikimedia.org To unsubscribe from the list / change mailing preferences visit https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l
Arjuna,
Your approach of gauging a community's activity by combining several metrics is interesting. Comparing monthly averages instead of end-of-year performance is also a good idea.
*Increase in database size, the number of most active contributors (those making more than 100 edits a month), and page views can be real indicators of growth and activity*. Is there any way to find the database size? We don't have that, or many other useful statistics, for the period after May 2010. See http://stats.wikimedia.org/EN/TablesWikipediaTA.htm for example.
Also, any inference based on page views should normalize them by the native-speaker population, to give a better perspective on the community's performance relative to its potential/size.
Ravi
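Ravi's suggested normalization can be sketched as page views per native speaker. This assumes per-language page-view totals and native-speaker counts are supplied from outside (Arjuna points out in his reply that speaker counts vary by source); the function names and input format are hypothetical.

```python
def views_per_speaker(page_views: int, native_speakers: int) -> float:
    """Page views per native speaker; speaker counts must come from an external source."""
    if native_speakers <= 0:
        raise ValueError("native_speakers must be positive")
    return page_views / native_speakers


def rank_by_views_per_speaker(data: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """Rank languages by page views per native speaker, highest first.

    `data` maps a language code to (page_views, native_speakers).
    """
    rates = [(lang, views_per_speaker(views, speakers)) for lang, (views, speakers) in data.items()]
    return sorted(rates, key=lambda item: item[1], reverse=True)
```

A language with fewer absolute views can rank higher here if it has far fewer speakers, which is the perspective on "potential/size" the suggestion is after.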
Hi,
2013/1/22 Ravishankar ravidreams@gmail.com
Arjuna,
Your approach of gauging a community's activity by combining several metrics is interesting. Comparing monthly averages instead of end-of-year performance is also a good idea.
Thanks. One clarification: I took the sum of the metrics for the entire year rather than the average.
*Increase in database size, the number of most active contributors (those making more than 100 edits a month), and page views can be real indicators of growth and activity*. Is there any way to find the database size? We don't have that, or many other useful statistics, for the period after May 2010. See http://stats.wikimedia.org/EN/TablesWikipediaTA.htm for example.
Yes and no, as any metric can be compromised. If database size is used as a parameter, it may encourage too many stub articles, which adversely affects quality.
The number of most active contributors is too small in most Indian language Wikipedias. There has been debate about the percentage of people who contribute to the English Wikipedia. While Jimmy is reported to have said that a small fraction of contributors does the bulk of the work, Aaron Swartz's analysis, based on the bytes-added metric, found that the bulk of the content is contributed by a large number of Wikipedians. In Wikipedia, I think the former is more true, though it could be different where people contribute stubs rather than reasonably sized articles.
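The two views of contribution above differ mainly in what is counted. A hypothetical sketch: given per-contributor totals (edit counts or bytes added, from whatever source is available), compute the share contributed by the top fraction of contributors. Running it once on edit counts and once on bytes added shows whether the two measures of concentration disagree on a given wiki. The function name, input format, and the figures in the example are assumptions.

```python
def top_share(contributions: dict[str, float], top_fraction: float) -> float:
    """Share of the total contributed by the top `top_fraction` of contributors.

    `contributions` maps a contributor to a total (edit count or bytes added).
    """
    values = sorted(contributions.values(), reverse=True)
    total = sum(values)
    if total == 0:
        return 0.0
    k = max(1, round(len(values) * top_fraction))  # always include at least one contributor
    return sum(values[:k]) / total


# Made-up numbers: user "a" makes most edits, but most text comes from the others.
edit_counts = {"a": 900, "b": 40, "c": 30, "d": 30}
bytes_added = {"a": 5_000, "b": 20_000, "c": 25_000, "d": 50_000}
print(top_share(edit_counts, 0.25), top_share(bytes_added, 0.25))  # prints 0.9 0.5
```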
You can get an account on the Toolserver and obtain the database size information from there.
Also, any inference based on page views should normalize them by the native-speaker population, to give a better perspective on the community's performance relative to its potential/size.
My analysis focuses more on the relative change in each language than on absolute numbers. The graphs include all the languages together to give a general feel for how they compare.
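Since the analysis focuses on relative change rather than absolute numbers, a small sketch of year-over-year relative change per language may help. The input dictionaries (language code to a yearly total of any metric) and the function names are hypothetical, as are the figures in the example.

```python
def relative_change(previous: float, current: float) -> float:
    """Fractional change from `previous` to `current` (0.25 means +25%)."""
    if previous == 0:
        raise ValueError("relative change is undefined when the previous value is 0")
    return (current - previous) / previous


def changes_by_language(prev: dict[str, float], curr: dict[str, float]) -> dict[str, float]:
    """Relative change for every language code present in both years."""
    return {lang: relative_change(prev[lang], curr[lang]) for lang in sorted(prev.keys() & curr.keys())}


# Hypothetical yearly edit totals for two languages.
print(changes_by_language({"te": 100, "ta": 50}, {"te": 110, "ta": 100}))  # prints {'ta': 1.0, 'te': 0.1}
```

A small wiki that doubles its edits shows a larger relative change than a big wiki that grows by 10%, even if the big wiki adds more edits in absolute terms.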
Population is not the only significant factor; there are several others, such as speakers' love for the language and their geographic distribution. Population numbers could be considered, but they vary a lot depending on the source.
An exhaustive study would be useful. If we could determine an independent measure of Wikipedia's impact in accomplishing its mission (maybe through an annual survey), then the available metrics, such as edits, page views, and their combinations, could be tested against it using statistical methods to arrive at the key parameters or parameter combinations.
I think it would be good for the WMF to work on such a thing; otherwise we could lose focus by looking only at simple metrics like active editors or page views.
Cheers Arjuna
On Tue, Jan 22, 2013 at 6:01 PM, Arjuna Rao Chavala arjunaraoc@gmail.com wrote:
[snip]
When I realized that "Activity" is a product of the two quantities, I understood that they play interchangeable (commutative) roles. However, one aspect puzzled me: the number of views/viewers is a function of the richness of the content. In other words, while the language community can certainly influence that value, it cannot control it. By contrast, the number of edits is within the language community's sphere of control. Within that, edits could perhaps be classified (if the available data allows that deep dive) as human and bot edits. Within the human-edited subset, there are ways to visualize trends in edits. Kiran did a bit of this a while back: http://jace.zaiki.in/tag/mwclient
TL;DR: activity could perhaps be reflected more accurately by considering specific data points around editing rather than a relationship with views.
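Splitting edits into human and bot edits, as suggested above, could look like the following sketch. It assumes each edit record already carries a bot flag (in MediaWiki, bot accounts are normally in the "bot" user group, but how that flag reaches these records is not shown). The `Edit` type and both functions are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Edit:
    """One edit record; these fields are hypothetical, not a real dump or API schema."""
    user: str
    month: str    # e.g. "2012-01"
    is_bot: bool  # assumed to come from the account's bot flag


def bot_share(edits: list[Edit]) -> float:
    """Fraction of edits made by bots (0.0 for an empty list)."""
    if not edits:
        return 0.0
    return sum(e.is_bot for e in edits) / len(edits)


def human_edits_per_month(edits: list[Edit]) -> dict[str, int]:
    """Monthly counts of human (non-bot) edits in month order, suitable for a trend plot."""
    counts = Counter(e.month for e in edits if not e.is_bot)
    return dict(sorted(counts.items()))
```

The monthly human counts give the trend within the human-edited subset; the bot share shows how much of a total-edits figure comes from automation.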
2013/1/22 sankarshan foss.mailinglists@gmail.com
[snip]
Rather than just the views, we need to look at the value perceived by viewers (maybe through an annual survey) as a measure of the impact of Wikipedia's mission. In the absence of such a measure, we are looking at page views as an outcome. Since page views may themselves depend on edits and several other factors, and in turn influence other factors, I have treated them as an input as well, because they are an important contributor to activity.
There are several ways to assess the quality of a Wikipedia, but that is a different topic.
Cheers Arjuna
On Jan 23, 2013 11:58 AM, "Arjuna Rao Chavala" arjunaraoc@gmail.com wrote:
[snip]
That is a very challenging thought, Arjuna. Online surveys are potentially quite weak, as evidenced by the Facebook voting process (for policy changes, not the Facebook 'like' feature, whose purpose itself is full of questions).
One possibility is a continuous rating system, with radio buttons on each page, visible to logged-in users (?), or actionable only by logged-in users, rather than an optional survey on some other page. I see difficulties with this as well, mind you, but I throw it out as a suggestion.
On the plus side, adding such user-involving features may be a way to bring in more 'mature' community involvement in knowledge creation and dissemination. I am often struck by how some relatively half-baked 'improvement' becomes quite popular while worthwhile technology lies unused. 'The fault, dear Brutus, is not in our stars, But in ourselves, that we are underlings', said the poet, but I hope we are part of the solution, not content to remain the problem.
At some point in the future, I see some kind of 'sharing' as a possibility, whereby individuals with common interests can actively and dynamically connect through Wikipedia pages. That might throw up an interesting metric as well.
2013/1/23 Vickram Crishna vvcrishna@radiophony.com
On Jan 23, 2013 11:58 AM, "Arjuna Rao Chavala" arjunaraoc@gmail.com wrote:
Rather than just the views, we need to look at the value felt by the viewers (may be by an annual survey) as a measure of the impact of Wikipedia's mission.
--cut--
That is a very challenging thought, Arjuna. Online surveys are potentially quite weak, as evidenced by the Facebook voting process (for policy changes, not the Facebook 'like' feature, whose purpose itself is full of questions).
One possibility is a continuous rating system, with radio buttons on each page, visible to logged-in users (?), or actionable only by logged-in users, rather than an optional survey on some other page. I see difficulties with this as well, mind you, but I throw it out as a suggestion.
On the plus side, adding such user-involving features may be a way to bring in more 'mature' community involvement in knowledge creation and dissemination. I am often struck by how some relatively half-baked 'improvement' becomes quite popular while worthwhile technology lies unused. 'The fault, dear Brutus, is not in our stars, But in ourselves, that we are underlings', said the poet, but I hope we are part of the solution, not content to remain the problem.
The Article Feedback Tool (http://blog.wikimedia.org/2012/12/20/article-feedback-new-research-and-next-steps/), currently in beta, could provide an impact metric suitable for this purpose.
At some point in the future, I see some kind of 'sharing' as a possibility, whereby individuals with common interests can actively and dynamically connect through Wikipedia pages. That might throw up an interesting metric as well.
I understand that part of the plan for Wikipedia is to make it more social.
Cheers Arjuna
There is a much better report by Shiju here: http://shijualex.wordpress.com/2013/01/27/analysis-of-the-indic-language-statistical-report-2012/ and here: http://shijualex.wordpress.com/2013/01/19/indic-language-wikipedias-statistical-report-2012-2/
On Thu, Jan 24, 2013 at 5:23 PM, Arjuna Rao Chavala arjunaraoc@gmail.com wrote:
[snip]
Hi Shiju,
Thanks for the analysis. We now know where we need to improve (i.e., developing older stubs). Thanks to Jyotis for pulling out the data. It would be great if we could also measure other quality indicators, such as the number/percentage of articles with citations and the average number of citations per article.
- Sundar "That language is an instrument of human reason, and not merely a medium for the expression of thought, is a truth generally admitted." - George Boole, quoted in Iverson's Turing Award Lecture
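Sundar's suggested citation indicators could be approximated from article wikitext. This sketch assumes the wikitext of each article is already available (how it is fetched is not shown) and counts `<ref>` tags as citations; citations written without `<ref>` tags are missed, and a reused named reference such as `<ref name="x" />` is counted again. All names are hypothetical.

```python
import re

# Matches "<ref>", "<ref name=...>" and "<ref name=... />", but not "</ref>" or "<references />".
REF_TAG = re.compile(r"<ref\b", re.IGNORECASE)


def citation_count(wikitext: str) -> int:
    """Number of <ref> tags in one article's wikitext."""
    return len(REF_TAG.findall(wikitext))


def citation_indicators(articles: dict[str, str]) -> tuple[float, float]:
    """Return (percentage of articles with citations, average citations per article)."""
    if not articles:
        return 0.0, 0.0
    counts = [citation_count(text) for text in articles.values()]
    with_refs = sum(1 for c in counts if c > 0)
    return 100 * with_refs / len(counts), sum(counts) / len(counts)
```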
From: Ramesh N G rameshng@gmail.com
To: Wikimedia India Community list wikimediaindia-l@lists.wikimedia.org
Cc: Shiju Alex shijualexonline@gmail.com
Sent: Sunday, January 27, 2013 10:23 PM
Subject: Re: [Wikimediaindia-l] Fwd: Status of Indian language wikipedias in 2012 and priorities, Telugu WP report
[snip]
Isn't Konkani in incubation too? FN
-- FN Land +91-832-240-9490 Cell +91-982-212-2436 fn@goa-india.org Goa,1556's updated list of books available on and from Goa: http://www.scribd.com/doc/76671049/Goa1556-Catalogue-Books-from-Goa
On 22 January 2013 16:35, Arjuna Rao Chavala arjunaraoc@gmail.com wrote:
[snip]
Interesting new metric, Arjun, although, as you said, it's difficult to capture "the health and vitality" of a wiki in a single metric. Apart from external factors, I'd consider the following to influence the path any wiki takes:
1. The seed community: small wikis typically start with just one motivated editor. If that person puts the wiki above themselves, it helps attract other motivated editors. [We were extremely lucky to have the visionary and unassuming Mayooranathan as our first active editor.] Then the first few active editors (six, if I have to put a number on it) set the culture for generations to come. If they truly get the wiki philosophy and are committed to open knowledge in their mother tongues, the wiki is safely off the stunted-growth path. Diversity at that stage helps a lot.
2. Organic vs. inorganic growth: wikis that use inorganic means like bots and automated translation cautiously, with the intent of driving further organic growth, will have "durable growth." That is part of the reason for the success (so far) of the Tamil and Malayalam wikis. Bangla, for example, seems to be making that choice. Other wikis may be on the right track too, though I might not know about them.

Beyond these, the relative growth rates of various wikis would depend on external factors. A few case studies by editors of their respective wikis, along the lines of http://ta.wikipedia.org/s/bs8, would help us identify patterns. We can do number crunching at a higher level, but I'm sure the editors of any wiki would have special insider insights to share.
- Sundar
From: Arjuna Rao Chavala arjunaraoc@gmail.com
To: Discussion list on Indian language projects of Wikimedia. wikimediaindia-l@lists.wikimedia.org
Sent: Tuesday, January 22, 2013 4:35 PM
Subject: [Wikimediaindia-l] Fwd: Status of Indian language wikipedias in 2012 and priorities, Telugu WP report
[snip]