Hi all,
Hope to see many of you at Wikimania next week (yes, it's only one week away).
I'd like to propose that some time be carved out for a BOF (birds-of-a-feather) session for wiki researchers. It seems Friday and Sunday evenings are free, or it could be Thursday, before things get started.
Here are some issues I'd love to talk to other folks about. Please feel free to add your own:
1. Heuristics for recognizing patterns in edit histories. Most pressing is an algorithm to determine what constitutes an edit war, vandalism or any other type of "noise" in the system if one is measuring "substantive" edits. (This is hard: even the "I'll know it when I see it" method is problematic, as evidenced by the recent dispute with and departure of RickK.) Much of the research that Jakob Voss, Cathy Ma, others and I do depends on analyzing edit histories and drawing conclusions about article quality. So far, none of the research I've seen has "factored out" the effect of edit wars and vandalism.
2. Classifying types of edits, using diffs or edit summaries. There is a desire to qualitatively classify edits as lexical (spelling, punctuation), factual (numbers, dates), organizational (rearranging), prose (style, tense change), etc. What are the best practices in detecting and classifying these?
3. Comparative approaches to research and modelling article "clusters." Last year, while comparing entries in a print encyclopedia against the categories in Wikipedia, the toughest part was trying to match up the taxonomical classifications, and the variation in breakdown into subtopics. How are people dealing with this mapping?
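
To make items 1 and 2 a bit more concrete, here is a rough Python sketch, meant only as a conversation starter and not as anyone's actual method. It assumes each revision is available as a plain string of the full article text, in chronological order. The function names, the category rules and the 0.8 similarity threshold are all hypothetical choices made for illustration. The first part flags "identity reverts" (a revision whose text exactly matches an earlier one) and treats everything in between as candidate noise; the second part gives a very coarse guess at the type of an edit from a word-level diff.

```python
import difflib
import hashlib
import re


def find_identity_reverts(revisions):
    """Return (reverting_index, restored_index) pairs.

    A revision counts as a revert when its text is identical to an
    earlier revision and differs from the revision just before it.
    """
    seen = {}  # text digest -> index of the latest revision with that text
    reverts = []
    prev_digest = None
    for i, text in enumerate(revisions):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in seen and digest != prev_digest:
            reverts.append((i, seen[digest]))
        seen[digest] = i
        prev_digest = digest
    return reverts


def noisy_revisions(revisions):
    """Indices of reverted edits plus the reverts themselves (assumed noise)."""
    noisy = set()
    for reverting, restored in find_identity_reverts(revisions):
        noisy.update(range(restored + 1, reverting + 1))
    return noisy


def classify_edit(old, new):
    """Very coarse guess at an edit type from a word-level diff.

    The rules below are illustrative assumptions, not an established scheme.
    """
    old_tokens, new_tokens = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed.extend(old_tokens[i1:i2])
            added.extend(new_tokens[j1:j2])
    if not removed and not added:
        return "none"
    # Same words, different positions: text was rearranged.
    if sorted(removed) == sorted(added):
        return "organizational"
    # Any changed token containing a digit: numbers or dates were touched.
    if any(re.search(r"\d", tok) for tok in removed + added):
        return "factual"
    # One-for-one swaps of near-identical words: spelling or punctuation.
    if len(removed) == len(added) and all(
        difflib.SequenceMatcher(a=r, b=a).ratio() >= 0.8
        for r, a in zip(removed, added)
    ):
        return "lexical"
    return "prose"


if __name__ == "__main__":
    history = ["A", "A vandalized", "A", "A plus POV", "A"]
    print(find_identity_reverts(history))
    print(sorted(noisy_revisions(history)))
    print(classify_edit("born in 1950 in Paris", "born in 1951 in Paris"))
```

Exact-match reverts are only the easiest case. Partial reverts, slow-moving edit wars and subtle vandalism would all slip through, and that gap is exactly the kind of thing worth discussing at the session.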
Please add to the list, and I will help assemble.
-Andrew Lih
University of Hong Kong