[Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

Nathan nawrich at gmail.com
Fri Aug 28 02:07:52 UTC 2009


On Thu, Aug 27, 2009 at 9:47 PM, Anthony <wikimail at inbox.org> wrote:

> Just took a quick sample of 10 instances of vandalism to [[Ted Stevens]].
> Of those 10 instances of vandalism, either 2 or 4 would not have been found
> by the automated tool described. 2 if every edit summary containing the
> word "vandalism" is counted as vandalism, and 4 if not. The former would
> probably significantly overcount vandalism.
>
>
> http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=173527553&oldid=173381871 (Removed vandalism)
>
> http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=180054904&oldid=179982198 (rmv vandalism)
>
> http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=168486242&oldid=168438600 no edit summary
>
> http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=162332870&oldid=162038733 (yes it is funny, but this doesn't belong here)
>
> On Thu, Aug 27, 2009 at 9:31 PM, Thomas Dalton <thomas.dalton at gmail.com> wrote:
>
> > 2009/8/28 Anthony <wikimail at inbox.org>:
> > > I suggested a better approach last time we had this thread: statistical sampling.
> >
> > This research was based on a sample. What are you talking about?
>
>
> I'm talking about taking a sample and examining it manually. First, spend a
> few weeks coming up with an objective definition of vandalism. Then pick
> 5,000 random article views from the http log, and publish the URL/date/time.
> Then advertise the list all over the place (especially on sites like
> Wikipedia Review) asking people to find instances of vandalism in it.
> People can use automated means which they then go through by hand to remove
> false positives, manual error checking, spot checking, whatever. The number
> of confirmed instances of vandalism will grow for a while, and eventually
> will start to level off.
>
> May not be perfect, but it'll provide a lower bound on the amount of
> vandalism, at least.  Have a statistician tell us what our exact error
> bounds are.  And then prepare for a second study, improving on everything
> (the definition of "vandalism", the number of random article views, the
> amount of time to wait) based on what we learned.
>

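A minimal sketch, not part of the original thread, of the kind of error bound a statistician might attach to the sampling study proposed above. It assumes each of the 5,000 article views is an independent random draw and that a view counts as "vandalized" once a hand-checked instance is confirmed. It uses the Wilson score interval, one standard choice for a binomial proportion, swapped in here for illustration. The function name `wilson_interval` and the counts in the example are hypothetical. Because missed instances can only make the confirmed count too low, the resulting interval is best read as a floor, in line with the "lower bound" point above.

```python
import math


def wilson_interval(vandalized: int, sampled: int, z: float = 1.96) -> tuple[float, float]:
    """Hypothetical helper: Wilson score interval (default ~95%) for the share
    of sampled article views that showed a confirmed instance of vandalism."""
    if sampled <= 0:
        raise ValueError("sample size must be positive")
    if not 0 <= vandalized <= sampled:
        raise ValueError("vandalized count must be between 0 and the sample size")
    p_hat = vandalized / sampled          # observed proportion
    z2 = z * z
    denom = 1 + z2 / sampled
    centre = (p_hat + z2 / (2 * sampled)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / sampled + z2 / (4 * sampled * sampled)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical example: 25 confirmed vandalized views out of 5,000 sampled.
    low, high = wilson_interval(25, 5000)
    print(f"share of views showing vandalism: {low:.4%} to {high:.4%}")
```

Re-running the function as the confirmed count grows shows how the bound moves while volunteers keep finding instances; once the count levels off, the interval stabilizes too.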

Out of curiosity, Anthony, do you still refrain from editing Wikimedia
projects over licensing issues? How long has it been, a year?

Nathan

