Hello,
I have distinguished four ways of counting Wikipedians:
- Wikimedia Statistics, with "Wikipedians", "active" and "very active users"; as so often, Zachte's statistics are great, but easily misleading.
- Looking at user pages with babel lists; but not all active people have babel lists (or user pages, or are even registered), and some people's only edit is creating a user page with a babel list. Often there are many babel lists indicating level zero, sometimes even more than there are native speakers.
- Asking Wikipedians what they know or estimate. For that, a definition is important, of course, especially for the bigger WPs. The small ones have little fluctuation.
- Counting them according to the edits people make.
I have tried to outline a workable definition, as I explained. My observations at Recent Changes show that in many tiny WPs (I call them Micro-WPs) most of the activity is vandalism, countervandalism and bot activity, mostly interwiki linking. The interwiki linking usually relates to "geographical stubs". This is also true for nearly all human "Foreign helpers": they took a picture of their home town, put it in Commons, and integrated it into the articles on that town in all language editions (and the like). So, without the bot-generated pseudo content there would hardly be any activity at all.
In my definition it is not important whether a contributor is a native speaker; he can also contribute with a lower level. If necessary, I look at the kind of edits. In nearly all cases it was very obvious whether the edit was made knowing the language or not. (Certainly if one considers only editors with at least 10 edits.) For example, I am not a native speaker of Dutch, and do not often contribute to nl.WP, but according to my edits and my definition I am a "regular contributor" of nl.WP, not a Foreign helper.
Take vo.WP for example. According to WM Statistics, it has ca. 16 "very active users" a month. According to the babel lists, two persons indicate "level 2", and three "level 1". 58 indicate "zero". Recent Changes show that content contributions come only from the five people "knowing" Volapük.
My own concern with my definition is that I should raise the minimum number of edits of a regular contributor. Also the period of observation should be longer. But that would make the observation more work; counting ten edits is faster than using the "user edit counter". Maybe a developer could create a tool that simplifies the work, with a human being needed only to tell who is a content contributor and not a Foreign helper.
Ziko
P.S.: I must say that I find some reactions on this mailing list a little bit strange. I am simply asking what you think about my definition of a regular contributor, trying to get a better picture of Wikipedia language editions in comparison.
I am willing to explain what I mean by this or that expression, and I am open to all kinds of suggestions to improve the definition. (Yes, a definition is ultimately subjective and depends on the researcher's interests.) Although I have become familiar with a number of language editions, I believe that the members of this mailing list know a lot about the issue and have ideas; and I received some good ideas for which I am grateful.
But I do not see where I am "dividing the community" or "imagine it too simple". Of course I present things first in a short version, that does not mean that I have not thought them through before asking others. (Maybe I understood some remarks wrongly, and vice versa.)
2008/10/22 Han-Teng Liao (OII) han-teng.liao@oii.ox.ac.uk
Dear Ziko,

No worries about limitations. The rule is usually simple: acknowledge them or overcome them, but do not hide them. Still, if your goal is a method to be applied by all Wikipedia researchers, I am not sure you can do without strong empirical data. A universal method requires strong evidence, a robust mechanism, or a compelling story. May I suggest that, since you know the vls.WP version so well, you start a model from that and collect the necessary data for that particular version. Do not assume you will find no problems in the process. Since your methods seem to be very quantitative, you can try to start small from there.

The time-edit distribution (71->80) explanation seems plausible, and that is exactly what I suggested earlier about determining the threshold from the actual distribution. You might not have the whole distribution at this moment, but it sounds much better if you at least provide a concrete example to explain why you picked that number. Still, your definition will be much more definitive if you have solid overall data, previous studies, etc. The more supporting material you have, the stronger the case for the threshold number that you pick (you can then change "may be" into "more likely").

Again, as for the foreign helpers, I do think it depends on the context and the questions you are asking. Try to think about how you would apply that model to minority languages or dialects on other Wikipedia projects, such as Latin, Hakka, etc. It is not as simple as you imagine it to be. Also, machine-translated content across Wikipedias, though not allowed, is still quite common. You have to define what you mean by foreign helpers or native contributors. It is not totally impossible for a foreign helper to have a native account. Some foreign helpers may read but not write the language, so their contribution pattern may be different.
Having said that, I guess on this point you can simply say that it is not of research interest to you and treat them as outliers (as in quantitative methods). Do remember to document that you do so when you do.

Some people get offended, I guess, because you seem to make a hasty generalization and a strong definition without enough evidence. The first version you proposed, "I calculate....", is very problematic in this regard. Research is always a balance between moving things forward and taking solid steps. The suggestions that I made are not designed to slow you down or stop you, but are rather a warm reminder that you are jumping too fast.

Reagle's research, which uses the self-reported category of "active users", can provide some dimension on self-perception. It might be interesting to see how the two dimensions (perceived activity and edit frequency) match or mismatch in the future. It is through reviewing previous work that you can make solid advances, though sometimes it feels like a drag.....
hanteng
Ziko van Dijk wrote:
Dear Han-Teng,
Thank you for the substantial answer, which helps me to go on.
My problem is that my technical skills are limited, and I am also looking for methods that can easily be applied by all Wikipedia researchers (and to all WPs). It is no problem to tell how many "regular contributors" vls.WP has, because there are only three guys who know each other well.
I have counted with the help of "Recent Changes", and looked closer at those Wikipedians who made at least one edit in one specific week. Otherwise I would not have known where to look. Maybe I should look longer than a week (like three months, and then drop the six-months-ago-first-edit criterion), but that would mean a lot more work, at least in the bigger Wikipedias.
I have chosen a minimum of 10 edits because Wikimedia Statistics does so for "Wikipedians". It seems enough to see whether a person (usually an I.P.) shows interest only in one specific article he wants to set right, but is not interested in editing after that. By the way, if I shortened the six months (since the first edit) to three, the number of regular contributors would rise from 71 to 80. That may be suitable as well.
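To illustrate how sensitive the count is to the first-edit criterion, here is a minimal sketch. It assumes one already has the first-edit date of every editor who passes the other criteria; the function name `count_regulars`, the reference day, and the dates are hypothetical and are not the eo.WP data behind the 71 and 80 figures.

```python
from datetime import date, timedelta


def count_regulars(first_edits, reference_day, min_age_days):
    """Count editors whose first edit is at least min_age_days before reference_day."""
    cutoff = reference_day - timedelta(days=min_age_days)
    return sum(1 for first_edit in first_edits if first_edit <= cutoff)


# Hypothetical first-edit dates of editors who already meet the other criteria.
reference_day = date(2008, 8, 17)  # assumed last day of the observed week
first_edits = [
    date(2006, 1, 5),
    date(2007, 11, 20),
    date(2008, 3, 10),
    date(2008, 4, 25),
    date(2008, 7, 1),
]

# Compare roughly six months (182 days) with roughly three months (91 days).
for min_age_days in (182, 91):
    print(min_age_days, count_regulars(first_edits, reference_day, min_age_days))
```

Running the same comparison over several cut-offs shows whether the count changes gradually or jumps at a particular value.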
I consider only speakers of the language concerned because only they can contribute text that makes sense (it does not matter whether they contribute a lot of content, only that they are able to). The Foreign Helpers are very important, but secondary. They would not "exist" if speakers of the language had not created content etc. One cannot do interwiki linking and anti-vandalism if there is no WP or no article.
Ziko
2008/10/22 Han-Teng Liao (OII) <han-teng.liao@oii.ox.ac.uk>:
Putting the philosophical questions aside, "analytical" categories (rather than social categories) should be linked to your research questions. Analytical categories should thus not be universal in this sense, but rather tied back to your research questions.
I guess it is better to say, "I develop a way to define a 'regular contributor'....in eo.WP" rather than "I calculated a..." because it is not a pure math calculation but a definition of your own making (with the credit AND responsibility that follow).
Below are a point-by-point critique and suggestions...
- made at least one edit in that week
--It seems arbitrary to come up with a number within a certain time frame. Again, if you can come up with a distribution of edits over contributors, either through previous study or your own study, showing that the contributors who match your profile have made 75% of the new edits in the past month (the time frame issue still needs to be sorted out with regard to the frequency of edits), it will be much more convincing....
- obviously speaks Esperanto (is no "foreign helper" like someone who does Interwiki linking)
--If your research question is about actual content contributors in the strict sense, then you might "exclude" those foreign helpers. However, you have to take that as a limitation because you might lose those who provide foreign links that then have real impact on the content. In my limited experience in Chinese Wikipedia, this happens quite often in entries and issues that involve East Asian or Sino-US context.
- made his first edit at least six months ago
--Again, it seems arbitrary. If you can come up with a distribution of users' contributions over time (i.e. frequency), you might be able to develop a matrix that can include a certain amount of people that you call "regular contributors". You have to acknowledge that you exclude the newbies with this because you, again, cite previous research or use common sense, suggesting most of the newbies are not becoming "regular contributors". Still, if you do so, you have to follow up on your research to see whether it is true that those newbies who do become "regular contributors" will not have significant impact on your results and analysis.
- made at least ten edits at all
--Again, it seems arbitrary. Find the overall profile. Define your questions. Determine the selection threshold and be ready to defend your picks with previous research or common sense.
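The suggestion above to derive a threshold from the actual distribution of edits over contributors could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `share_of_edits` and `threshold_for_share` are hypothetical, and the edit counts are made up, not taken from any Wikipedia.

```python
def share_of_edits(edit_counts, min_edits):
    """Fraction of all edits made by contributors with at least min_edits edits."""
    total = sum(edit_counts)
    if total == 0:
        return 0.0
    covered = sum(count for count in edit_counts if count >= min_edits)
    return covered / total


def threshold_for_share(edit_counts, target_share):
    """Edit count of the last contributor needed, taking the most active
    contributors first, to cover target_share of all edits."""
    total = sum(edit_counts)
    running = 0
    for count in sorted(edit_counts, reverse=True):
        running += count
        if running >= target_share * total:
            return count
    return 0


# Made-up edits per contributor within one observation period.
edit_counts = [120, 60, 30, 10, 5, 3, 1, 1]

# What share of all edits comes from contributors with at least ten edits?
print(share_of_edits(edit_counts, 10))

# Which minimum edit count captures 75% of all edits?
print(threshold_for_share(edit_counts, 0.75))
```

With real data, the first function shows how much of the activity a chosen minimum (such as ten edits) actually covers, and the second turns a target share (such as 75%) back into a minimum.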
Ziko van Dijk wrote:
Hello,
From time to time I ask myself (and others) what is a "regular contributor" to a Wikipedia language edition. According to "Tell us about your Wikipedia" the definitions are quite different. At eo.WP I once checked a week long (in this August) who was making edits, and I calculated a "regular contributor" if someone
- made at least one edit in that week
- obviously speaks Esperanto (is no "foreign helper" like someone who does Interwiki linking)
- made his first edit at least six months ago
- made at least ten edits at all
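The four criteria above can be written down as a simple check. The sketch below is only an illustration: the `Editor` record, its field names, the week-end date, and the sample editors are hypothetical, and the `speaks_language` flag stands for a judgment that a human has to make by looking at the edits.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Editor:
    name: str
    first_edit: date        # date of the editor's first edit
    total_edits: int        # all edits ever made
    edits_in_week: int      # edits during the observed week
    speaks_language: bool   # judged by a human from the kind of edits


def is_regular_contributor(editor, week_end, min_age_days=182, min_total_edits=10):
    """Apply the four criteria; six months is approximated as 182 days."""
    return (
        editor.edits_in_week >= 1
        and editor.speaks_language
        and editor.first_edit <= week_end - timedelta(days=min_age_days)
        and editor.total_edits >= min_total_edits
    )


week_end = date(2008, 8, 17)  # assumed last day of the observed week
editors = [
    Editor("A", date(2006, 1, 5), 2500, 40, True),
    Editor("B", date(2008, 6, 1), 300, 12, True),                 # first edit too recent
    Editor("InterwikiBot", date(2005, 3, 1), 90000, 200, False),  # foreign helper
    Editor("C", date(2007, 2, 1), 4, 1, True),                    # fewer than ten edits
]

regulars = [e.name for e in editors if is_regular_contributor(e, week_end)]
print(regulars)
```

Keeping the minimum age and the minimum edit count as parameters makes it easy to try other values later.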
My result was: 71, compared to 141 "active users" and 50 "very active users" (Wikimedia Statistics, May 2008). What do you think about this definition?
Kind regards
Ziko van Dijk
-- Liao,Han-Teng DPhil student at the OII(web) needs you(blog) _______________________________________________ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
-- Ziko van Dijk NL-Silvolde
-- Liao http://zhongwen.com/cgi-bin/zipux2.cgi?b5=%E5%BB%96, Han http://zhongwen.com/cgi-bin/zipux2.cgi?b5=%E6%BC%A2 -Teng http://zhongwen.com/cgi-bin/zipux2.cgi?b5=%E9%A8%B0 DPhil student at the OII http://people.oii.ox.ac.uk/hanteng/about/ (web) needs you http://people.oii.ox.ac.uk/hanteng/ (blog)