Hi all, I am Mayur, a Hindi Wikipedian. I have prepared a survey to determine the overall quality of a Wikipedia project. The survey report is at https://docs.google.com/document/d/1IFphBpq14eMUjoBcy0wk-WWQNTQ5vxpq_o285Dyr... Here is a summary of it.
*Why did we do that survey?*
As we know, there are many Indic wiki projects with different numbers of active users and articles. We usually compare them by number of articles or by depth, but that does not show the actual overall growth of a project, because many projects are largely bot-generated, as we know from the Nepal Bhasha and Bishnupriya Manipuri Wikipedias. A project's growth is better measured from both its article quality and its number of articles. A project with a large number of good-quality articles has grown more.
*How did we measure quality of a wiki project?*
The quality of a wiki project depends on how many of its articles have been developed beyond their stub origins. Generally, every article starts as a stub. The quality of an article then depends mainly on how many times it has been edited and on how many words it contains, that is, how long it is. So we chose several factors that directly affect article quality. These are:
1) *Nr. of good articles (articles having size > 2 KB)*
A project with more articles larger than 2 KB has grown more than another. But this alone cannot decide the quality of a project, which is why we included some other factors. We took the percentage of articles larger than 2 KB and gave this a marking scale of 150.
2) *Nr. of average articles (articles having size > 0.5 KB)*
We chose this criterion to filter out the stubs. We gave it a marking scale of 100, lower than the one above, because an article larger than 2 KB reflects more growth than a 0.5 KB article.
3) *Average number of words in an article*
This was tricky to calculate, but for a rough estimate we divided the total number of words in a project by its total number of articles. To keep this on the marking scale of 100, we multiplied the resulting value by 0.1 in our formula.
4) *Avg. size of article (KB)*
This was also tricky to calculate, but for a rough estimate we divided the total size of a project by its total number of articles. To keep this on the marking scale of 100, we multiplied the resulting value by 10 in our formula.
5) *Mainspace edits per total nr. of articles*
This is also an important factor, showing how frequently an article is edited to keep it updated. The resulting value was already on a scale of 100, so we did not multiply or divide it.
6) *Total edits/article*
This is also an important factor because it reflects how many extra edits (categorization, image uploads and other similar work) are being performed in a wiki project. The resulting value was already on a scale of 100, so we did not multiply or divide it.
7) *Bot edits*
This was the most important factor, because every factor discussed above can be driven to a high value by running bots, as on the Nepal Bhasha and Bishnupriya Manipuri Wikipedias. So we multiplied the overall score by the percentage of non-bot edits. However, bot edits are to some extent also necessary, so we set 50% bot edits as a cut-off mark. For a wiki with more than 50% bot edits, we multiplied the total score by (100 - (bot edit percentage - 50)).
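The factor definitions above can be sketched in code. This is only a minimal illustration, not the script used for the survey: the function names and the per-article size list are hypothetical, 1 KB is assumed to be 1024 bytes, and the bot multiplier divides by 100 on the assumption that the expression in point 7 is meant as a percentage.

```python
def share_above(sizes_bytes, threshold_kb):
    """Percentage of articles whose size exceeds threshold_kb (1 KB = 1024 bytes assumed)."""
    if not sizes_bytes:
        return 0.0
    limit = threshold_kb * 1024
    count = sum(1 for size in sizes_bytes if size > limit)
    return 100.0 * count / len(sizes_bytes)


def average(total, article_count):
    """Rough per-article average: a project-wide total divided by the article count."""
    return total / article_count if article_count else 0.0


def bot_multiplier(bot_share_pct, cutoff_pct=50.0):
    """Penalty multiplier for bot edits, applied only to the share above the cut-off."""
    excess = max(0.0, bot_share_pct - cutoff_pct)
    return (100.0 - excess) / 100.0


# Made-up article sizes in bytes, for illustration only.
sizes = [300, 600, 1500, 2500, 4096]
print(share_above(sizes, 2))    # share of articles larger than 2 KB
print(share_above(sizes, 0.5))  # share of articles larger than 0.5 KB
print(bot_multiplier(30), bot_multiplier(80))
```

With this reading, a wiki at or below 50% bot edits keeps its full score, and only the share above the cut-off reduces it.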
*Formula for Quality factor = (R*0.1 + Q*10 + E + F*1.5 + P + O) * (100 - N) / 100*
*R - Average number of words in an article*
*Q - Avg. size of article (KB)*
*E - Share of articles larger than 2 KB (%)*
*F - Share of articles larger than 0.5 KB (%)*
*P - Total edits/article*
*O - Mainspace edits per total nr. of articles*
*N - Share of bot edits (%)*
*Overall Score = Number of articles * Quality factor*
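As a worked illustration, the formula can be written out as follows. This is a sketch that follows the formula exactly as written above, so it applies the 1.5 weight to F and uses N directly as the bot share, even though the prose assigns the 150 scale to the share of articles larger than 2 KB and describes a 50% cut-off for bots. The function names and the input values are hypothetical and are not taken from the survey data.

```python
def quality_factor(r, q, e, f, p, o, n):
    """Quality factor as written in the summary.

    r: average words per article      q: average article size (KB)
    e: % of articles larger than 2 KB f: % of articles larger than 0.5 KB
    p: total edits per article        o: mainspace edits per article
    n: share of bot edits (%)
    """
    weighted_sum = r * 0.1 + q * 10 + e + f * 1.5 + p + o
    return weighted_sum * (100 - n) / 100


def overall_score(article_count, quality):
    """Overall score = number of articles * quality factor."""
    return article_count * quality


# Hypothetical wiki: 1000 articles, 40% bot edits.
qf = quality_factor(r=100, q=2, e=30, f=60, p=20, o=15, n=40)
print(qf)                       # (10 + 20 + 30 + 90 + 20 + 15) * 60 / 100 = 111.0
print(overall_score(1000, qf))
```

Because the overall score multiplies by the article count, a large wiki with a modest quality factor can still outrank a small wiki with a high one.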
*Thank you and Regards*
*Mayur*
Hoi, I wanted to have a look but I learned that I need permission. You can allow the world to see without giving permission to change the document I am sure. Thanks, GerardM
Wikimediaindia-l mailing list Wikimediaindia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l
Mayur might have forgotten to set the document's privacy setting to *Public*.
Hopefully he will change the privacy settings soon.
Shiju
Hi all, I forgot to change the sharing setting for the PDF file. A small mistake, sorry for that. Now anyone can see the survey :-)
Thank you and Regards
Hi, I noticed that the link I provided did not point to a good-quality PDF. Please use this updated link instead: https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srci...
Hi Mayur,
Interesting analysis. Thanks for the hard work.
Is there any academic reference on which the formula is based, or is it empirical?
One possible indicator of a Wikipedia's quality may be offline activism and organizing, such as meetups or releasing CDs, as it indicates the community's rapport, which is vital for overall quality control.
Ravi
Thank you, Mayur, for your work. It could help the growth of our Bengali Wikipedia as well.
Yes, I chose many important factors that were available in the Wikimedia stats. I derived the formula myself; as a former student of economics, I tried to do my best. Basically, I marked each factor on a scale of 100 so that no single factor could dominate the whole survey.
Thank you
Good work, Mayur. It would be even better if interested Wikimedians wrote their own analyses of this survey. Each Wikimedian can draw different conclusions from this report.
The quality factor can be refined by including or excluding various parameters. One important parameter that I would like to see included is *the localization statistics*.
For non-Latin wikis, an article size of 0.5 KB or 2 KB is negligible. A higher size threshold needs to be included (in the reports published at http://stats.wikimedia.org) for a better comparison of non-Latin language wikis.
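One way to see why byte thresholds behave differently across scripts, assuming article size is counted in UTF-8 bytes: basic Latin letters take one byte each, while Devanagari characters take three. The sketch below uses sample words chosen only for illustration, and the helper name is hypothetical.

```python
def utf8_size(text):
    """Size of the text in bytes when encoded as UTF-8."""
    return len(text.encode("utf-8"))


latin = "Wikipedia"
hindi = "विकिपीडिया"

# Prints character count and byte count for each sample word.
print(len(latin), utf8_size(latin))
print(len(hindi), utf8_size(hindi))
```

Under that assumption, a Devanagari article reaches 0.5 KB or 2 KB with roughly a third as many characters as a Latin-script article.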
Shiju
Hi Shiju,
This may be useful:
http://stats.wikimedia.org/EN/TablesWikipediaML.htm#distribution
Cheers, Erik
Thanks Mayur.
The result can vary based on the weights and the choice of factors. Nevertheless, it is a worthy attempt at a holistic view. Kudos.
Ravi
Maybe the Foundation can consider a project, with qualified statisticians or through academic grants, to measure the health of a wiki in a holistic way. Comparing by any single metric may not give a complete understanding.
Ravi
Mayur, this is really great work! It should be an inspiration to improve on the weak areas of each Wikipedia.
regards Ramesh NG