I think this conversation isn't going anywhere useful because everyone is using the same words but with different meanings. In particular "quality" ...

Edits can do a range of things (and often more than one). The edits might relate to:
* the information content of an article (add/edit/remove propositions -- facts if you prefer -- about the topic, e.g. Fred Smith was born in 1770)
* the references (add/edit/delete the external sources that support a proposition)
* the presentation of the article, e.g. structure of the article, spelling, grammar, appropriately nuanced selection among synonyms, clear prose, conformance to the Manual of Style, wikifying, etc.

Each of these can have some kind of quality metric attached (although most will be somewhat subjective -- "an article in the New York Post is a better quality source than ...").

At the moment many in this conversation are using GA as the only quality metric. But I think we should see GA as a goal, not as a binary "quality / not quality" metric. To achieve GA, clearly you need facts, verification, and presentation, all in both quantity and quality.

Next, who is an IP? Well, we know that IPs don't necessarily map to individual people and individual people do not map to a single IP. An IP edit might be done by someone who is a registered user (but too lazy to log in -- I'm guilty of that), who may later become a registered user, or who may never be a registered user.

I postulate that good faith IP edits are predominantly small edits of facts or localized edits of presentation (e.g. spelling). I postulate that edits by logged-in users would be both large and small and involve facts, references and presentation, although clearly individual users may have their own particular profiles of edit behavior.

In particular, to get an article to GA, you need one person (or just a few people) to polish the writing (presentation). Getting a super-readable document with many voices is very difficult. Therefore I would expect that the final push to achieve GA would inevitably involve registered users and not IPs.

Also, GA status is a concept and process that is very much "insider" knowledge about WP. Anonymous editors and low-activity editors are unlikely to have even heard about GA status, so they are not going to be working toward it. Only the very active insiders would see it as their goal and therefore work towards it. So I think it is pointless to discuss contribution to quality in terms of who gets an article to GA status.

I think we do better to ask about the quality of an edit (or the set of edits done by a particular user) in terms of whether it adds "correct" information, adds references that support information, or improves presentation. If someone adds a "fact" and that edit is later obliterated by a rewrite of a section but the information is retained (albeit in a different presentation), the original edit was still good quality even if it doesn't survive as a string of characters. I think using "edit survival" to measure the quality of an edit fails to distinguish between information content and presentation. I acknowledge that "edit survival" is easily measured and "information content survival" is not, but we should be cautious about using one as a proxy for the other.
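To make the distinction concrete, here is a rough Python sketch of the two measures. It is only an illustration under my own assumptions: the function names, the hand-labelled key terms and the rewritten sentence are all made up for the example, and substring matching is a very crude stand-in for "information content survival".

```python
from difflib import SequenceMatcher
from typing import Sequence


def string_survival(added: str, current: str) -> float:
    """Fraction of the added words that survive as one contiguous run
    in the current text (a crude stand-in for "edit survival")."""
    added_words = added.lower().split()
    current_words = current.lower().split()
    if not added_words:
        return 0.0
    matcher = SequenceMatcher(None, added_words, current_words, autojunk=False)
    match = matcher.find_longest_match(0, len(added_words), 0, len(current_words))
    return match.size / len(added_words)


def information_survival(key_terms: Sequence[str], current: str) -> float:
    """Fraction of hand-labelled key terms still present anywhere in the
    current text (an assumed, very rough proxy for information survival)."""
    if not key_terms:
        return 0.0
    text = current.lower()
    return sum(term.lower() in text for term in key_terms) / len(key_terms)


# Illustrative data only: the original edit and a later rewrite of the sentence.
added = "Fred Smith was born in 1770"
rewritten = "Born in 1770, Fred Smith is the subject of this article"

print("string survival:", string_survival(added, rewritten))
print("information survival:", information_survival(["Fred Smith", "1770"], rewritten))
```

In this made-up case the rewrite keeps both facts but breaks up the original wording, so the string-based score is low while the information-based score is not. A real study would need human judgement (or something much smarter than substring matching) for the second measure.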

I think a qualitative assessment of a set of randomly selected articles, analysing the contribution made by each individual edit in terms of:
* the quality of the article as it was immediately before and after the edit (immediate contribution)
* the quality of the article as it is today (overall contribution)

is more likely to come up with better answers to the question of the contributions of anonymous edits, relative to those of low-activity registered editors and of high-activity registered editors.
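As a sketch of how such an assessment could be tabulated once the human assessors have done their work, something like the following Python would do. Everything here is an assumption for illustration: the record layout, the three editor classes as labels, the scoring scale and the sample scores are invented, not taken from any real data.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class AssessedEdit:
    editor_class: str      # "anonymous", "low_activity" or "high_activity"
    quality_before: float  # assessor's score just before the edit
    quality_after: float   # assessor's score just after the edit
    overall: float         # assessor's judgement of the edit's contribution to today's article


def summarise(edits):
    """Mean immediate and overall contribution per editor class."""
    grouped = defaultdict(list)
    for edit in edits:
        grouped[edit.editor_class].append(edit)
    return {
        cls: {
            "immediate": mean(e.quality_after - e.quality_before for e in group),
            "overall": mean(e.overall for e in group),
            "n": len(group),
        }
        for cls, group in grouped.items()
    }


# Made-up scores, purely to show the shape of the output.
sample = [
    AssessedEdit("anonymous", 2.0, 2.5, 0.5),
    AssessedEdit("anonymous", 2.5, 2.5, 0.0),
    AssessedEdit("high_activity", 2.5, 4.0, 1.0),
]

for cls, stats in summarise(sample).items():
    print(cls, stats)
```

The hard part is of course the assessing, not the averaging; the code only shows that immediate and overall contribution can be kept as two separate numbers per edit and compared across editor classes.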

For the purposes of this conversation, I am ignoring vandalism (and other bad faith behaviour) and edits to reverse them. 


On 01/11/2012, at 10:08 AM, Laura Hale <laura@fanhistory.com> wrote:

On Thu, Nov 1, 2012 at 9:14 AM, Piotr Konieczny <piokon@post.pl> wrote:
I agree, having a high number of edits does not signify creating high quality content - it may only attest to heavy use of semi-automated tools for minor edits.

I also don't dispute that anons can contribute high quality content, and they do a lot of edits. My point was:
* anons don't contribute significantly to most content on Wikipedia that gets peer reviewed (as Pierre noted, by that time they've probably registered anyway);
* hence the majority of Wikipedia's GA+ content is not written by anonymous editors (but the GA+ content is only a small percentage of Wikipedia's total content).

Do you have any evidence that anons don't contribute significantly to content that gets peer reviewed?  The reason they appear not to be involved in processes is that, more often than not, they are expressly prohibited from taking part.  The implication here could be: IP addresses are contributing GA level content, but regular contributors are not monitoring articles where IP addresses are doing lots of work, and regular contributors are not supporting taking the work to the highest level.

http://toolserver.org/~daniel/WikiSense/Contributors.php?wikilang=en&wikifam=.wikipedia.org&grouped=on&page=Samantha_Stosur is one of the more active articles (which is admittedly crap) with a high IP address ratio.  There are several highly active Wikipedia editors contributing to it. 463 of the 749 editors are IP addresses.  Still, total edits by registered editors outnumber those by unregistered editors, with 1,150 total edits to 1,175.  Despite this, the volume of contributors is not actually resulting in edits that work towards improving assessment.

A better analysis could be something like this: IP addresses are more likely to represent a large editing population on an article that has higher visibility and more traffic.  The quality of the contributions to these articles is universally poor for registered and unregistered users alike.  At the same time, Wikipedia processes favour articles that have less visibility and where there is less inherent conflict.  The necessity of covering a topic comprehensively is also a barrier to taking these higher visibility articles to GA: it is a challenge, and it discourages editors from taking an article through the processes.  GA, Peer Review and FAC favour narrower topics that are less visible and get less traffic.  This type of article is likely to have a much smaller editing pool, and is less likely to be found by IP address editors.  (Example: tennis articles have more IP address edits than articles about sport shooting.)  This means IP addresses are less likely to be actively contributing to these articles.  As processes implicitly lock them out, there is little reason for these users to improve these less visible articles per the guidelines.

Laura Hale

twitter: purplepopple
blog: ozziesport.com

Wiki-research-l mailing list