Hi all,
Hope to see many of you at Wikimania next week (yes, it's only one week away).
I'd like to propose that some time be carved out for a BOAF (birds of a feather) session for wiki researchers. Friday and Sunday evenings seem to be free, or it could be Thursday before things get started.
Here are some issues I'd love to talk to other folks about; please feel free to add:
1. Heuristics for recognizing patterns in edit histories. Most pressing is an algorithm to determine what constitutes an edit war, vandalism or any other type of "noise" in the system if one is measuring "substantive" edits. (This is hard: even the "I'll know it when I see it" method is problematic, as evidenced by the recent dispute with, and departure of, RickK.) Much of the research done by me, Jakob Voss, Cathy Ma and others depends on analyzing edit histories and drawing conclusions about article quality. So far, none of the research I've seen has "factored out" the effect of edit wars and vandalism.
2. Classifying types of edits, using diffs or edit summaries. There is a desire to distinguish qualitatively between lexical (spelling, punctuation), factual (numbers, dates), organizational (rearranging) and prose (style, tense change) edits, etc. What are the best practices in detecting and classifying these?
3. Comparative approaches to researching and modelling article "clusters." Last year, while comparing entries in a print encyclopedia against the categories in Wikipedia, the toughest part was matching up the taxonomical classifications and the variation in how topics are broken down into subtopics. How are people dealing with this mapping?
Please add to the list, and I will help assemble.
-Andrew Lih
University of Hong Kong
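For item 2, one possible starting point is a plain word-level diff. The Python sketch below is an illustration under stated assumptions, not a method anyone in this thread has described. It compares two versions of a passage with `difflib` and sorts the change into the categories named in the list, using heuristics that are made up for this example. The function names `classify_edit` and `is_lexical`, the naive sentence splitter, and the 0.75 similarity cut-off are all hypothetical choices. The sketch ignores wiki markup and edit summaries.

```python
import difflib
import re
from collections import Counter

TOKEN = re.compile(r"\w+|[^\w\s]")          # words, or single punctuation marks
PUNCT = re.compile(r"[^\w\s]")               # one punctuation character
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # naive sentence splitter (assumption)


def sentences(text: str) -> list[str]:
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def is_lexical(op, old_tok, new_tok) -> bool:
    """True if one diff opcode looks like a punctuation or spelling fix."""
    tag, i1, i2, j1, j2 = op
    if all(PUNCT.fullmatch(t) for t in old_tok[i1:i2] + new_tok[j1:j2]):
        return True  # punctuation added, removed or swapped
    if tag == "replace" and i2 - i1 == j2 - j1:
        # One-for-one word swaps between nearly identical strings,
        # e.g. "goverment" -> "government". 0.75 is an arbitrary cut-off.
        return all(difflib.SequenceMatcher(a=o, b=n).ratio() >= 0.75
                   for o, n in zip(old_tok[i1:i2], new_tok[j1:j2]))
    return False


def classify_edit(old: str, new: str) -> str:
    """Give one coarse label to a revision of a passage of text."""
    old_s, new_s = sentences(old), sentences(new)
    if old_s != new_s and Counter(old_s) == Counter(new_s):
        return "organizational"  # same sentences, different order
    old_tok, new_tok = TOKEN.findall(old), TOKEN.findall(new)
    matcher = difflib.SequenceMatcher(a=old_tok, b=new_tok, autojunk=False)
    ops = [op for op in matcher.get_opcodes() if op[0] != "equal"]
    if not ops:
        return "none"  # whitespace-only change
    changed = [t for _, i1, i2, j1, j2 in ops for t in old_tok[i1:i2] + new_tok[j1:j2]]
    if all(any(ch.isdigit() for ch in t) for t in changed):
        return "factual"  # only numbers or dates changed
    if all(is_lexical(op, old_tok, new_tok) for op in ops):
        return "lexical"
    return "prose"  # everything else, e.g. rewording or added text
```

For example, `classify_edit("The war began in 1939.", "The war began in 1940.")` returns `"factual"`, and `classify_edit("The goverment fell.", "The government fell.")` returns `"lexical"`. Real edits often mix several categories. This sketch falls back to `"prose"` whenever the cheap tests fail, so it would need refining before it could support any conclusions about article quality.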
Hi Andrew,
A fine idea. Please add any discussion topics here, linking each section to a page devoted to that topic: http://meta.wikimedia.org/wiki/Wikimania:Discussions
Thursday, Friday, and Sunday evenings are pretty free. Friday there will be a few fun things going on (a trivia contest, storytelling, etc.).
SJ
On 7/29/05, Andrew Lih andrew.lih@gmail.com wrote:
Wiki-research-l mailing list Wiki-research-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wiki-research-l
Andrew Lih wrote:
- Heuristics for recognizing patterns in edit histories. Most pressing is an algorithm to determine what constitutes an edit war, vandalism or any other type of "noise" in the system if one is measuring "substantive" edits. (This is hard: even the "I'll know it when I see it" method is problematic, as evidenced by the recent dispute with, and departure of, RickK.) Much of the research done by me, Jakob Voss, Cathy Ma and others depends on analyzing edit histories and drawing conclusions about article quality. So far, none of the research I've seen has "factored out" the effect of edit wars and vandalism.
Revert wars and near-revert-wars are probably easier to identify algorithmically than other types of edit wars. How do we distinguish two very active editors working pleasantly together in a back-and-forth session of mutual improvement and reinforcement from two very active editors working unpleasantly together in a back-and-forth session that becomes a mutually reinforcing downward spiral of useless edits?
I think it's pretty hard to do... algorithmically.
As Andrew suggests, we all do this all the time in our own private evaluations of what is going on. We know that person X is a jerk, and a problematic editor, and so is person Y, so when we see them going crazy on an article, we know it is bad news. But if we see Angela and Andrew Lih both quickly and repeatedly editing an article, we know it is probably good news.
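Jimmy's point that revert wars are the most tractable case can be made concrete. The following is a minimal Python sketch. It assumes the history of one article is available as an ordered list of revisions, each with the editor's name and the full text. The `Revision` type and both functions are hypothetical. The sketch treats a revision whose text exactly matches an earlier, non-adjacent revision as an identity revert. It flags two editors when they have reverted each other in both directions at least `min_reverts` times. It cannot tell a pleasant back-and-forth from an unpleasant one, and it misses partial reverts.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass


@dataclass
class Revision:  # hypothetical record for one edit
    editor: str  # user name or IP address
    text: str    # full article text after this edit


def find_reverts(history: list[Revision]) -> list[tuple[str, str]]:
    """(reverting editor, reverted editor) pairs for identity reverts.

    A revision is an identity revert when its text exactly matches an
    earlier revision other than its immediate predecessor.
    """
    last_seen: dict[str, int] = {}  # text hash -> latest revision index
    pairs: list[tuple[str, str]] = []
    for i, rev in enumerate(history):
        digest = hashlib.sha1(rev.text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        if j is not None and j < i - 1:
            # Everyone else who edited after revision j was undone by this edit.
            undone = {r.editor for r in history[j + 1:i]} - {rev.editor}
            pairs.extend((rev.editor, other) for other in sorted(undone))
        last_seen[digest] = i
    return pairs


def revert_war_pairs(history: list[Revision], min_reverts: int = 3) -> list[tuple[str, str]]:
    """Editor pairs who reverted each other, in both directions, at least
    min_reverts times in total. The default of 3 is an arbitrary choice."""
    directed = Counter(find_reverts(history))
    wars = set()
    for (a, b), n in directed.items():
        back = directed.get((b, a), 0)
        if back and n + back >= min_reverts:
            wars.add(tuple(sorted((a, b))))
    return sorted(wars)
```

A vandalism revert (C undoing B once) yields a single one-directional pair and is not flagged. Repeated mutual reverts between A and B are flagged. Weighting pairs by what we already know about the editors involved, as Jimmy describes, would need data this sketch does not have.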
Aggressive edit wars aren't always destructive to the final article. Articles often improve dramatically when editors force each other to document every assertion and push forward. The best ways to measure the progress of editing algorithmically are:
1. The number of links out. Growing articles, even hostile ones, tend to have increasing link density as new concepts are added.
2. The number of links in. Growing articles, even hostile ones, tend to get increasing traffic in from the article space.
3. The number of links from non-article pages. If an article is getting a large number of links from talk pages that also have recent conjugate edits, this is a very good sign that discussion about the page has broken down.
4. Edit wars generate RFCs, talk page comments and so on. Also measure the "chili ranking" of what is on the talk page. Destructive edit wars are accompanied by links to wiki policies or citations thereof. In other words, the more often NPOV is mentioned in close proximity to other wiki policies, the more likely it is that discussion has broken down. People who are editing well generally have better things to talk about, even if they are arguing about them.
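The "chili ranking" in item 4 is not defined precisely in the message. One hypothetical reading is sketched here in Python. It counts how often NPOV appears within a few words of another policy shortcut on a talk page, normalized by page length. The shortcut list (NOR, 3RR, CIVIL, NPA), the 20-word window and the name `chili_score` are illustrative assumptions. Matching is case-sensitive so that the ordinary word "nor" is not counted.

```python
import re

# Illustrative policy shortcuts (assumption), matched case-sensitively so
# that the English word "nor" is not mistaken for NOR. A real list would
# need curating.
OTHER_POLICIES = {"NOR", "3RR", "CIVIL", "NPA"}
WORD = re.compile(r"[A-Za-z0-9]+")


def chili_score(talk_text: str, window: int = 20) -> float:
    """(NPOV, other policy) mention pairs per 1,000 words of talk page text.

    A pair is counted when an NPOV token and another policy shortcut are
    no more than `window` words apart. Window and term list are guesses.
    """
    words = WORD.findall(talk_text)
    if not words:
        return 0.0
    npov_at = [i for i, w in enumerate(words) if w == "NPOV"]
    other_at = [i for i, w in enumerate(words) if w in OTHER_POLICIES]
    close = sum(1 for i in npov_at for j in other_at if abs(i - j) <= window)
    return 1000.0 * close / len(words)
```

Under this reading, a talk page about sources and census figures scores 0.0, and one where NPOV and 3RR keep turning up together scores high. Whether the score tracks breakdowns in discussion would have to be checked against pages people have already judged by hand.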
The best way to find out whether you are short of RAM is to measure hard-drive thrash. Likewise, the best way to measure edit wars is by the amount of "thrash" being generated: disappearing links in or out, talk page links with conjugate edits, a high correlation with incoming references from the mailing list, and mentions of wiki policies on the talk page.
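The "thrash" idea for links (links in or out that keep disappearing) could be scored as the ratio of gross link changes to net link change, by analogy with work done versus progress made. What follows is a minimal Python sketch that assumes each revision's wikitext is available. `outgoing_links` and `link_thrash` are hypothetical names, and the link regex is deliberately simplified. The same ratio would apply to sets of incoming links, if those were collected per revision.

```python
import re

# Simplified (assumption): takes the target of [[Target]] or [[Target|label]]
# and drops any #section. Templates, namespaces and case rules are ignored.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def outgoing_links(wikitext: str) -> set[str]:
    return {target.strip() for target in WIKILINK.findall(wikitext)}


def link_thrash(link_sets: list[set[str]]) -> float:
    """Gross link churn divided by net link change over a run of revisions.

    0.0 means no link changes; 1.0 means every added or removed link
    stayed that way; larger values mean links keep appearing and
    disappearing.
    """
    if len(link_sets) < 2:
        return 0.0
    churn = sum(len(prev ^ cur) for prev, cur in zip(link_sets, link_sets[1:]))
    net = len(link_sets[0] ^ link_sets[-1])
    return churn / max(net, 1)
```

Steady growth such as `[{"A"}, {"A", "B"}, {"A", "B", "C"}]` scores 1.0. A link that flaps in and out four times, `[{"A"}, {"A", "B"}, {"A"}, {"A", "B"}, {"A"}]`, scores 4.0. The thread does not specify what value should count as thrash.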
On Jul 29, 2005, at 9:27 PM, Jimmy Wales wrote:
wiki-research-l@lists.wikimedia.org