On Thu, Aug 20, 2009 at 6:36 PM, Robert Rohde rarohde@gmail.com wrote:
On Thu, Aug 20, 2009 at 2:10 PM, Anthony wikimail@inbox.org wrote:
"if one chooses a random page from Wikipedia right now, what is the probability of receiving a vandalized revision" The best way to answer that question would be with a manually processed random sample taken from a pre-chosen moment in time. As few as 1000 revisions would probably be sufficient, if I know anything about statistics, but I'll let someone with more knowledge of statistics verify or refute that. The results will depend heavily on one's definition of "vandalism", though.
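As a rough check on the "1000 revisions" figure, here is a minimal sketch of the sampling error for an estimated proportion. It assumes a simple random sample and a 95% confidence level; the function names and the example rates are illustrative only, not measurements from Wikipedia.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation interval for a proportion p
    estimated from a simple random sample of size n (z=1.96 is ~95%)."""
    return z * math.sqrt(p * (1 - p) / n)

def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k 'vandalized' revisions out of n sampled.
    It behaves better than the normal approximation when the rate is small."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Worst case (p = 0.5) and a hypothetical 1% vandalism rate, both with n = 1000.
worst_case = margin_of_error(0.5, 1000)
low_rate = margin_of_error(0.01, 1000)
lo, hi = wilson_interval(10, 1000)
```

With n = 1000 the worst-case margin is about 3.1 percentage points. If the true rate were around 1%, the margin would be about 0.6 points, which is large relative to the rate itself, so 1000 revisions would give a rough estimate rather than a precise one.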
Only in dreadfully obvious cases can you look at a revision by itself and know it contains vandalism. If the goal is really to characterize whether any vandalism has persisted in an article from any time in the past, then one really needs to look at the full edit history to see what has been changed / removed over time.
I wouldn't suggest looking at the edit history at all, just the most recent revision as of whatever moment in time is chosen. If vandalism is found, then and only then would one look through the edit history to find out when it was added.
Of course, this assumes a valid methodology. Using "admin rollback, the undo function, the revert bots, various editing tools, and commonly used phrases like "rv", "rvv", etc." to find vandalism is heavily skewed toward vandalism that doesn't last very long (or at least doesn't last very many edits). It's basically useless.
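To illustrate the skew being described, here is a minimal sketch of revert detection by edit summary. The pattern and the sample summaries are hypothetical and are not the ones used in the analysis under discussion.

```python
import re

# Hypothetical pattern: the markers named above ("rv", "rvv") plus a few
# common revert words. An assumption for illustration only.
REVERT_RE = re.compile(
    r"\b(rvv?|revert(ed|ing|s)?|undid|undo|rollback)\b", re.IGNORECASE
)

def looks_like_revert(summary: str) -> bool:
    """True if the edit summary contains a revert-style marker."""
    return REVERT_RE.search(summary) is not None

# Made-up edit summaries. The last one could be vandalism nobody has noticed.
summaries = [
    "rvv",
    "Reverted edits by 192.0.2.1",
    "fixed typo",
    "added population figures",
]
flagged = [s for s in summaries if looks_like_revert(s)]
```

Only the first two summaries are flagged. A search like this can only find vandalism that someone has already reverted, so anything still standing in the article is invisible to it.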
Yes, as I acknowledged above, "new vandalism".
"New vandalism" which has not yet been reverted wouldn't be included.
My personal interest is also skewed in that direction. If you don't like it and don't find it useful, feel free to ignore me and/or do your own analysis.
I do. I also feel free to criticize your methods publicly, since you decided to share them publicly.
Anyway, I provided my data point and described what I did so others could judge it for themselves. Regardless of your opinion, it addressed an issue of interest to me, and I would hope others also find some useful insight in it.
And I presented my criticism, which hopefully others will find some useful insight in as well.