Hi all,
Hope to see many of you at Wikimania next week (yes, it's only one week
away).
I'd like to propose that some time be carved out for a birds-of-a-feather
(BOAF) session for wiki researchers. Friday and Sunday evenings seem to be
free, or it could be Thursday, before things get started.
Here are some issues I'd love to discuss with other folks; please feel
free to add your own:
1. Heuristics for recognizing patterns in edit histories. The most pressing
need is an algorithm to determine what constitutes an edit war, vandalism or
any other type of "noise" in the system when one is measuring "substantive"
edits. (This is hard: even the "I'll know it when I see it" method is
problematic, as the recent dispute with, and departure of, RickK shows.) Much
of the research that Jakob Voss, Cathy Ma, others and I do depends on
analyzing edit histories and drawing conclusions about article quality. So
far, none of the research I've seen has "factored out" the effect of edit
wars and vandalism.
2. Classifying types of edits, using diffs or edit summaries. We'd like to
distinguish qualitatively among lexical (spelling, punctuation), factual
(numbers, dates), organizational (rearranging), prose (style, tense change)
and other kinds of edits. What are the best practices for detecting and
classifying these?
3. Comparative approaches to research and modelling article "clusters."
Last year, when I compared entries in a print encyclopedia against the
categories in Wikipedia, the toughest part was matching up the taxonomic
classifications and reconciling the different ways topics are broken down
into subtopics.
How are people dealing with this mapping?
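For item 1, one crude starting point is to look for "identity reverts": a
revision whose full text exactly matches an earlier, non-adjacent revision.
Below is a minimal Python sketch, assuming an article's history is available
as a plain list of revision texts, oldest first; the function names are
hypothetical. It treats the revert plus every revision it undid as "noise"
to be factored out. It misses partial reverts and can't tell vandalism from
a good-faith dispute, which is exactly the hard part.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def detect_identity_reverts(revisions: List[str]) -> List[Tuple[int, int]]:
    """Return (revert_index, restored_index) pairs (hypothetical helper).

    A revision counts as an identity revert when its text hashes to the
    same digest as some earlier revision that is not its immediate
    predecessor (identical consecutive revisions are just null edits).
    """
    seen: Dict[str, int] = {}  # digest -> most recent index with that text
    reverts: List[Tuple[int, int]] = []
    for i, text in enumerate(revisions):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest in seen and seen[digest] < i - 1:
            reverts.append((i, seen[digest]))
        seen[digest] = i
    return reverts


def noise_revisions(revisions: List[str]) -> Set[int]:
    """Indices of reverted revisions plus the reverts themselves."""
    noise: Set[int] = set()
    for revert_idx, restored_idx in detect_identity_reverts(revisions):
        # Everything after the restored version, up to and including
        # the revert, is treated as non-substantive.
        noise.update(range(restored_idx + 1, revert_idx + 1))
    return noise


history = ["A", "A B", "vandal", "A B", "A B C"]
print(detect_identity_reverts(history))  # [(3, 1)]
print(noise_revisions(history))          # {2, 3}
```

Counting how often such reverts occur within a short span of revisions would
be one (assumed, untested) way to flag a possible edit war.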
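For item 2, here is a toy diff-based classifier, again only a sketch in
Python. It tokenizes both versions, diffs the token sequences with the
standard `difflib` module, and applies a few rules of thumb: same tokens in a
different order is "organizational", digits-only changes are "factual",
punctuation-only or near-identical word swaps are "lexical", and anything
else falls through to "prose". The function name, the token regex and the
0.8 similarity threshold are arbitrary assumptions, not established practice.

```python
import difflib
import re
from collections import Counter
from typing import List

TOKEN = re.compile(r"\w+|[^\w\s]")  # words, or single punctuation marks


def tokenize(text: str) -> List[str]:
    return TOKEN.findall(text)


def classify_edit(old: str, new: str) -> str:
    """Guess a coarse edit type from two revision texts (hypothetical)."""
    a, b = tokenize(old), tokenize(new)
    if a == b:
        return "none"
    if Counter(a) == Counter(b):
        return "organizational"  # same tokens, only the order changed
    removed: List[str] = []
    added: List[str] = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            removed.extend(a[i1:i2])
            added.extend(b[j1:j2])
    changed = removed + added
    if all(re.fullmatch(r"\d+", t) for t in changed):
        return "factual"  # only numbers (e.g. years) changed
    if all(not re.match(r"\w", t) for t in changed):
        return "lexical"  # punctuation only
    if len(removed) == len(added) and all(
        difflib.SequenceMatcher(a=r, b=s).ratio() >= 0.8
        for r, s in zip(removed, added)
    ):
        return "lexical"  # small in-word changes, e.g. spelling fixes
    return "prose"


print(classify_edit("Born in 1901.", "Born in 1902."))  # factual
print(classify_edit("He runs fast.", "He ran quickly."))  # prose
```

Edit summaries could be used the same way, e.g. as a tie-breaker when the
diff rules are inconclusive.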
Please add to the list, and I will help assemble it.
-Andrew Lih
University of Hong Kong