On Fri, Jul 24, 2009 at 5:55 PM, Felipe Ortega <glimmer_phoenix@yahoo.es> wrote:
You can check more precise figures and graphs in my thesis about general statistics for survivability for all logged editors and core editors (the top 10% most active editors in each month), from the beginning until Dec. 2007, in the top-ten language versions (at that time).
http://libresoft.es/Members/jfelipe/phd-thesis (page) http://libresoft.es/Members/jfelipe/thesis-wkp-quantanalysis (doc)
As for the percentages of users by age, education level, etc. my impression is that opinions from experienced community members are often well oriented. But they're only opinions. Until we get the results of the general survey, we won't have a clear picture of the current "recruitment" targets for all versions.
Nevertheless, according to our updates, it seems that the situation is not getting better from Jan 2008 onwards.
Great work, Felipe! I've seen your work mentioned before, but until now I hadn't read it. I have now looked through the highlights of your thesis, and they are very informative. I am quoting some of the conclusions here:
Q5: What is the average lifetime of Wikipedia volunteer authors in the project?: The main conclusion we can infer from our survival analysis performed on the community of authors in the top ten Wikipedias is that there is an extraordinarily high mortality rate in all languages. Actually, we show that the monthly number of deaths of logged authors in the top ten language versions surpassed the monthly number of new logged authors coming to contribute for the first time in a certain version. Therefore, the higher mortality rate, since the beginning of 2007, offers a possible explanation for the steady-state reached by the monthly number of contributions and monthly number of active pages in all versions during the same period. A significant proportion of authors (more than 50% in all versions) abandons the project after more than 200 days. Moreover, reaching the core group of very active authors does not ensure that those authors will exhibit better survivability since, in fact, more than 50% of them abandon that core of very active authors after less than 100 days (less than 30 in the case of the Portuguese and English Wikipedias). Complementing these findings, the application of the Cox proportional hazards model lets us demonstrate that the participation of logged authors in FAs or talk pages has a significant positive impact on the survivability of such contributors, with contribution to both key types of pages presenting the highest enhancement effect on the average lifetime of authors.
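As a rough illustration of the kind of survival estimate Q5 refers to, here is a minimal Kaplan-Meier sketch in Python. It is not the thesis's code: the function name `km_median_lifetime`, the input layout (days between first and last edit, plus a flag saying whether the author is considered to have left), and the sample numbers are all assumptions made for the example.

```python
from fractions import Fraction


def km_median_lifetime(durations, left_project):
    """Return the first time at which the Kaplan-Meier survival estimate
    drops to 0.5 or below, or None if it never does.

    durations    -- days each author stayed (hypothetical input)
    left_project -- True if the author is considered "dead" (event observed),
                    False if still active at the end of the data (censored)
    """
    pairs = sorted(zip(durations, left_project))
    at_risk = len(pairs)
    survival = Fraction(1)  # exact arithmetic avoids rounding issues at 0.5
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Group all authors sharing the same duration t.
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1]:
                deaths += 1
            leaving += 1
            i += 1
        if deaths:
            survival *= Fraction(at_risk - deaths, at_risk)
        if survival <= Fraction(1, 2):
            return t
        # Both deaths and censored authors leave the risk set after t.
        at_risk -= leaving
    return None


# Made-up data: five authors, one of them still active (censored) at day 10.
median = km_median_lifetime([5, 10, 20, 30, 40], [True, False, True, True, True])
print(median)
```

Censored authors still count in the risk set up to the day they were last seen, which is what distinguishes this from simply taking the median of the observed durations.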
Q7: Is it possible to infer, based on previous history data, any sustainability conditions affecting the top-ten Wikipedias in due course?: As a main conclusion, looking at the evolution of the key parameters already identified as relevant to explain the progress in time of the top ten Wikipedias and their communities, we find that those statistics describing the activity of logged authors tend to follow Pareto-like distributions that become, in general, more and more log-linear as time elapses. On the other hand, metrics describing articles have progressively lost the old Pareto-like shape for their distribution, reaching a lognormal shape during 2007 (probably, as a result of the stabilization of the number of logged authors in all versions, as well). The analysis of the evolution in time of contributions from the core of very active authors identified in each month of history of a certain language version reveals that former core authors do not provide a comparable amount of effort to the level offered by new, even more active members of the core. Nevertheless, again the evolution parameters point out a somewhat delicate situation, since the monthly inequality level of the contributions from logged authors still maintains the same values as in previous years. Thus, this indicates that either the inequality of the distribution of revisions maintains the present level (in which case the authors would not be able to address as many articles as in previous years) or else, that the inequality level of this distribution will continue to grow, until core authors begin to find their natural limit in the maximum number of revisions performed and number of different articles reviewed.
5.1.2 Sustainability conditions
The main conclusion that we can infer from the overall results of our quantitative analysis is that there exists a severe risk in the top-ten language versions of Wikipedia, about maintaining their current activity level in due course. According to our graphs and numbers, the inequality level of the contributions from logged authors is becoming more and more biased towards the core of very active authors. At the same time, the monthly Gini coefficients show that the inequality level of contributions from logged authors has remained stable over time, at the cost of demanding more and more contributions from active authors to alleviate this deficit of monthly revisions.
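For readers unfamiliar with the Gini coefficient mentioned above, the following is a minimal Python sketch of how it can be computed from per-author revision counts in one month. The function name `gini` and the sample counts are assumptions for illustration; the thesis's own computation may differ.

```python
def gini(revision_counts):
    """Gini coefficient of non-negative counts.

    0 means every author contributed the same amount; values close to 1
    mean contributions are concentrated in very few authors.
    """
    xs = sorted(revision_counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on ascending-sorted data:
    #   G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,  i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Made-up monthly revision counts per logged author.
equal_month = gini([5, 5, 5, 5])       # everyone contributes equally
skewed_month = gini([0, 0, 0, 10])     # one author does all the work
print(equal_month, skewed_month)
```

Computing this value per month and plotting it over time gives the kind of monthly inequality series the paragraph above describes.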
Furthermore, we have seen that the distribution of the total number of revisions per author follows an upper truncated Pareto distribution. As more core authors begin to reach the upper limit of their human contribution capacity, we will see a point in the future of these language versions in which the steady-state of the monthly Gini coefficient will start to decrease. This situation would not pose a problem in itself, except for the fact that we have demonstrated that the most significant part of the content creation effort in Wikipedia is not undertaken by casual, passing-by authors, but by members of the core of very active contributors.
On top of that, the lack of new core members seriously threatens the scalability of the top-ten language versions regarding the quality of their content. We have demonstrated in the analysis previously presented that the eldest, top-active contributors are responsible for the majority of revisions in FAs, as well. Since the number of core authors has reached a steady-state (due to the leverage in the total number of active authors per month), the group of authors providing the primary source of effort in the revision of quality articles has stalled. Without new core members, the number of different articles that could potentially become FAs cannot expand, since we do not have enough reviewers for that content. Since the total number of quality articles generated so far in the top-ten language editions is fairly low, we can conclude that this approach will not contribute to dynamizing the creation of quality content in Wikipedia in due course. It is true that Wikipedia has succeeded in competing with other traditional encyclopaedias, namely Britannica [44], but if we do not have a clear strategy for making the creation of quality content in Wikipedia more agile, the project will never evolve from its current character of “good starting point to look for a quick introduction of a new topic, from which we can jump to more serious information sources”.
To conclude this section, it would be disappointing to avoid offering some insights about possible solutions for the top-ten Wikipedias to improve their current trend. Nevertheless, some of the knowledge needed to formulate such recommendations could perfectly well be the subject of a doctoral thesis of its own, namely the causes driving Wikipedia authors to eventually join the core of very active users. Since we have not answered such questions, we can simply settle for enumerating direct countermeasures to alleviate the problems these findings point to.
In the first place, increasing the number of core authors should become a priority for the project, and as a first step, Wikipedia should focus on increasing the number of monthly active authors. Indeed, donation campaigns are necessary to aid in the financial support of the project, but attracting new contributors or recovering older ones should be an equally important goal, given the current situation. Apparently, a lot of work still has to be done, not only to create new articles, broadening Wikipedia coverage, but also to revise current articles so that they reach the FA distinction at some point. Whether or not featuring some of these quality articles on the main page directly influences the number of revisions received, it is undoubtedly true that content featured on the main page of every language version at least obtains superior visibility in the community. A good idea could then be to promote “candidate articles” on the main page, thus favoring the reception of new revisions. Many times, users do not know about the existence of articles until they are featured on the main page, or else, until they need to access them explicitly. In the same way, we recommend displaying a “randomly selected” article (instead of the current approach of providing a simple link), to try and increase the number of revisions received in standard articles, as well.
Since the importance of the core of very active members has been demonstrated, thinking about possible tools to further automate their daily tasks, thus facilitating their normal activities, should also be taken into account. We know about current useful tools made with this goal in mind, but perhaps trying to collect new ideas and suggestions from these users could be another option. Since Wikipedia is an open community, it would be quite difficult to further reduce vandalism, and the access of trolls and other undesirable contributors to articles and talk pages. Moreover, previous research has demonstrated that these acts of vandalism against content or the community itself have been effectively controlled with the current approaches.
Finally, we cannot ignore the potential benefits of large scale contributions coming from specific communities, especially from educational institutions at all levels. The potential applications of Wikipedia to learning environments have also been a matter of research, and some authors have shown that direct contribution approaches may have negative consequences for both the quality of content and the willingness of young authors to continue to contribute if they get strictly negative responses to their first revisions. All the same, semi-controlled strategies, like providing a final version of the contribution, may have better effects for both the quality of content and maintaining the involvement of young contributors. In this regard, providing special tools for highlighting these contributions could facilitate the work of experienced Wikipedia authors, who could then provide more focused comments.