[Foundation-l] Frequency of Seeing Bad Versions - now with traffic data
Nathan
nawrich at gmail.com
Fri Aug 28 02:07:52 UTC 2009
On Thu, Aug 27, 2009 at 9:47 PM, Anthony <wikimail at inbox.org> wrote:
> Just took a quick sample of 10 instances of vandalism to [[Ted Stevens]].
> Of those 10 instances of vandalism, either 2 or 4 would not have been found
> by the automated tool described: 2 if every edit summary containing the
> word "vandalism" is counted as vandalism, and 4 if not. The former would
> probably significantly overcount vandalism.
>
>
> http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=173527553&oldid=173381871
> (Removed vandalism)
>
> http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=180054904&oldid=179982198
> (rmv vandalism)
>
> http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=168486242&oldid=168438600
> no edit summary
>
> http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=162332870&oldid=162038733
> (yes it is funny, but this doesn't belong here)
>
> On Thu, Aug 27, 2009 at 9:31 PM, Thomas Dalton <thomas.dalton at gmail.com>
> wrote:
>
> > 2009/8/28 Anthony <wikimail at inbox.org>:
> > > I suggested a better approach last time we had this thread: statistical
> > > sampling.
> >
> > This research was based on a sample. What are you talking about?
>
>
> I'm talking about taking a sample and examining it manually. First, spend a
> few weeks coming up with an objective definition of vandalism. Then pick
> 5,000 random article views from the http log, and publish the URL/date/time.
> Then advertise the list all over the place (especially on sites like
> Wikipedia Review) asking people to find instances of vandalism in it.
> People can use automated means which they then go through by hand to remove
> false positives, manual error checking, spot checking, whatever. The number
> of confirmed instances of vandalism will grow for a while, and eventually
> will start to level off.
>
> May not be perfect, but it'll provide a lower bound on the amount of
> vandalism, at least. Have a statistician tell us what our exact error
> bounds are. And then prepare for a second study, improving on everything
> (the definition of "vandalism", the number of random article views, the
> amount of time to wait) based on what we learned.
>
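As a rough illustration of the error bounds the quoted proposal asks for, here is a minimal sketch in Python. It assumes the 5,000 article views are drawn independently and uniformly at random, and it uses the Wilson score interval, which is one common choice and not something named in the thread. The function name and the count of confirmed instances are hypothetical placeholders, not results.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    hits: confirmed vandalized views in the sample (hypothetical input)
    n:    number of sampled article views
    z:    normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= hits <= n:
        raise ValueError("hits must be between 0 and n")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    sample_size = 5000   # from the proposal
    confirmed = 20       # hypothetical count, for illustration only
    low, high = wilson_interval(confirmed, sample_size)
    print(f"observed rate: {confirmed / sample_size:.4%}")
    print(f"approx. 95% interval: {low:.4%} to {high:.4%}")
```

Because reviewers can only confirm the vandalism they actually find, the count fed into this function undercounts the true number, so the interval describes the confirmed rate. That is consistent with the proposal's point that the study yields a lower bound.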
Out of curiosity, Anthony, do you still refrain from editing Wikimedia
projects over licensing issues? How long has it been, a year?
Nathan