[WikiEN-l] Dealing with crap deletion nominations

David Gerard fun at thingy.apana.org.au
Mon Jan 23 00:24:56 UTC 2006


David Gerard (fun at thingy.apana.org.au) [060123 10:52]:

> Gmaxwell has left the project, which is annoying for many reasons, one of
> them being that the scripts and such he was running are on the
> toolserver. But here are two diagrams to start you off:
> 
>     http://en.wikipedia.org/wiki/Image:Articles_distinct_histo.png
> 
> That's how many editors per article. Note that ALMOST ALL articles have one
> editor, maybe two. And only 200 or so have over 100 editors (123 over 1000
> editors), and almost all those are way overedited for the wiki process
> (e.g. G.W. Bush, which is THE busiest article on the wiki and arguably
> pathological).
> 
>    http://en.wikipedia.org/wiki/Image:Articles_distinct.png
> 
> - same thing as above, as a CDF (cumulative distribution): the breakdown
> of editors per article.


I just asked Kim's permission to quote what he said on #wikimedia last night.
At the end is the URL for the original data, bz2-compressed (a 16MB file that
expands to about 70-odd MB).

I've deleted the words of everyone except Kim and myself. Also, Kim wants
to note that:

[23:58] <kim_register> you can take my irc discussion too if you like
[23:59] <kim_register> though note that I'm a lot less collected on irc
[23:59] <kim_register> so with disclaimer
[23:59] <kim_register> and I reserve the right to not mean half of what
I said ;-)

Note also that all credit for the heavy lifting on this one goes to
gmaxwell.


- d.



[23:41]  <kim_register> I have something cool to show him that gmaxwell
drew ^^;;
[00:03]  <kim_register>
http://en.wikipedia.org/wiki/Image:Articles_distinct_histo.png
[00:03]  <kim_register> BINGO!
[00:03]  <kim_register> most articles are only edited by a couple of users
[00:03]  <kim_register> do you see, do you SEE how far left that single
peak is?
[00:03]  <kim_register> it's almost invisible
[00:03]  <kim_register> OMG this is cool!
[00:03]  * kim_register raves
[00:04]  <kim_register> death to all the wikidemocracy folks
[00:04]  <kim_register> long live consensus!
[00:04]  <kim_register> (well, maybe not death to them, but...)
[...]
[00:05]  <kim_register>
http://en.wikipedia.org/wiki/Image:Articles_distinct.png
[00:05]  <kim_register> here's a [[CDF]] view
[00:06]  <kim_register> GOAAAL
[00:06]  <kim_register> I mean wooohooo
[00:06]  <kim_register> sorry
[00:06]  <kim_register> :-)
[00:06]  <kim_register> I'm so happy :-)
[00:07]  <kim_register> thank Gmaxwell too :-)
[00:19]  <kim_register>
http://en.wikipedia.org/wiki/Image:Articles_distinct_loghisto.png
[00:19]  <kim_register> here's a scaled one
[00:19]  <kim_register> it's LOG SCALED, so be DARN careful interpreting
it, if you're not used to log scaling
[00:20]  <kim_register> I'm noting how many editors edit each article
[00:20]  <kim_register> This basically shows that most articles have few
editors
[00:21]  <kim_register> brian0918, I'm using them to show that consensus
will still work
[00:21]  <kim_register> and the whole voting thing is silly
[00:21]  <kim_register> so we don't need to make drastic policy changes
"because the wiki is too big now"
[00:21]  <kim_register> all that is pure nonsense, is what these graphs
prove
[00:21]  <kim_register> the wiki segments conversations into small
manageable areas that can be handled by consensus
[00:22]  <kim_register> brian0918, well, we'll do some more graphs
[00:22]  <kim_register> but these are pretty darn indicative already
[00:22]  <kim_register> I don't expect to see drastically different results
elsewhere
[00:22]  <kim_register> if it's this clear here
[00:22]  <kim_register> it's not going to differ by orders of magnitude
elsewhere :-)
[00:25]  <kim_register> oh gosh
[00:25]  <kim_register> anyway, I love these graphs
[00:27]  <kim_register>
http://en.wikipedia.org/wiki/Image:Articles_distinct_logcdf.png
[00:27]  <kim_register> tricky to read one
[00:27]  <kim_register> (01:27:05) nullc: well there really is only one
thing to say..
[00:27]  <kim_register> (01:27:12) Kim Bruning: yeah
[00:27]  <kim_register> (01:27:17) nullc: that articles have few editors.
[00:27]  <kim_register> (01:27:21) Kim Bruning: *grin*
[00:32]  <kim_register>     * 67.20% have been edited by less than 10
distinct Users/IPs.
[00:32]  <kim_register>     * 86.07% have been edited by less than 20
distinct Users/IPs.
[00:32]  <kim_register>     * 91.90% have been edited by less than 30
distinct Users/IPs.
[00:32]  <kim_register>     * 99.21% have been edited by less than 150
distinct Users/IPs.
[00:33]  <kim_register> (01:33:12) nullc: 72% of the articles have 11
editors or less.
[00:36]  <kim_register> so we'll throw out all articles with > 200 editors
[00:36]  <kim_register> that's maybe 100 or so
[00:36]  <kim_register> and then see the new graphs
[00:36]  <kim_register> this is kinda cool ;-)
[00:37]  <kim_register> I'm willing to bet that the >200 editor set also
has trouble with NPOV
[00:37]  <kim_register> as well as many other things
[00:46]  <kim_register> Ok, so based on this alone
[00:46]  <kim_register> normal wikipedia policy should be consensus
consensus consensus
[00:46]  <kim_register> large pages can best be split and transcluded
[00:47]  <kim_register> transcluded sections will survive AFD anyway, since
even those will have like 50 editors or so
[00:47]  <kim_register> this way we can keep a consistent and coherent
policy across the entire encyclopedia namespace
[00:47]  <kim_register> now we need to consistentize in wikipedia namespace
[00:47]  <kim_register> this has low priority in theory
[00:47]  <kim_register> but is going to be tricky in practice
[00:47]  <kim_register> but like WOOT
[00:47]  <kim_register> love those numbers!
[00:47]  <kim_register> yippee!
[00:48]  <kim_register> we don't need to change much at all, even though
we've grown so much!
[00:50]  <kim_register> well I need to get some cooperative fire support
[00:50]  <kim_register> you're already helping keep them down ;-)
[00:50]  <kim_register> I'll have to recruit some more :-)
[00:50]  <kim_register> I'll start one by one :-)
[01:18]  <DavidGerard> kim_register: that's fantastic to read
[01:19]  <DavidGerard> and squares with my experience also: articles are
either contentious or ghost towns
[01:19]  <DavidGerard> with a few in between
[01:25]  <kim_register> yes
[01:25]  <kim_register> we're going to have so much fun fixing policy this
year :-)
[01:25]  <kim_register> I'd love to have special dispensation from jimbo
though
[01:25]  <kim_register> otherwise it's going to be hell fighting through
wikinomic
[01:26]  <kim_register> alternately we can just start policy from scratch
[01:26]  <kim_register> though that too might have some downsides...
[01:26]  <kim_register> but basically, the graphs show that wikipedia
policy will still work
[01:26]  <kim_register> and that it scales well
[01:26]  <kim_register> our job is to make sure it STAYS that way ;-)
[01:29]  <kim_register> DavidGerard, I have the base dataset in my mailbox
now. Do you want me to forward you a copy?
[01:31]  <DavidGerard> nonono, i believe your graphs ;-) what I'm
interested in is your writeup
[01:31]  <DavidGerard> kim_register: jimbo is just a little tired of
wikinomic
[01:32]  <DavidGerard> the stupidity and shittiness of afd is now causing
wikipedia lots of real world problems
[01:32]  <DavidGerard> i.e., the foundation is getting a lot of mail from
the aggrieved
[01:32]  <DavidGerard> and then the fuckwits tried CFDing [[Category:Living
people]] or whatever it was called
[01:32]  <DavidGerard> and jimbo killed the CFD
[01:32]  <DavidGerard> and it was *recreated*
[01:32]  <DavidGerard> THREE TIMES
[01:33]  <DavidGerard> and jimbo saw what moronic shit was said on that CFD
[01:33]  <DavidGerard> and thunder and lightning came down from the
heavens.
[01:33]  <kim_register> hello mindspillage , greg is messing with lovely
pristine numbers. Maybe might be fun if you could help him out :-)
[01:33]  <kim_register> DavidGerard, *ghrin*
[01:33]  <kim_register> DavidGerard, what kinda writeup are you expecting
btw?
[01:33]  <DavidGerard> AFD is about to discover that the rest of wikipedia
does in fact have a few things to say about the standard of behaviour
there.
[01:34]  <kim_register> and I mean, there's a zillion bits of info :-)
[01:34]  <kim_register> that we haven't explored yet
[01:34]  <kim_register> but the graph with the single peak all the way left
..
[01:34]  <kim_register> that does it for me
[01:34]  <kim_register> DavidGerard, I'm going to propose killing all > 100
participant pages, one way or the other
[01:34]  <DavidGerard> kim_register: hah!
[01:34]  <kim_register> DavidGerard, still need data for wikimedia
namespace to be sure that's a great idea
[01:35]  <DavidGerard> permanent semiprotect?
[01:35]  <kim_register> no, I mean kill
[01:35]  <kim_register> as in mark as deprecated
[01:35]  <DavidGerard> hmm.
[01:35]  <kim_register> use a different system
[01:35]  <DavidGerard> :-O
[01:35]  <DavidGerard> gosh!
[01:35]  <DavidGerard> case my case please
[01:35]  <DavidGerard> case by case i mean
[01:35]  <DavidGerard> but may basically be a good idea
[01:36]  <kim_register> Well, there are roughly 100-200 cases to consider
[01:36]  <kim_register> guesstimating by current data
[01:36]  <kim_register> actually probably much less
[01:36]  <kim_register> articles as a general case can be split up and
transcluded
[01:37]  <kim_register> leaving only wikipedia namespace stuff
[01:37]  <kim_register> the latter hasn't been looked at
[01:37]  <kim_register> not yet
[01:37]  <kim_register> so these are prelim findings so far :-)
[01:37]  <kim_register> wikipedia namespace will need some case by case
work on splitting, yes :-)
[01:38]  <kim_register> for all the *FDs, we can replace the whole lot with
true wiki deletion I'll wager
[01:38]  <kim_register> some people will hate it :-)
[01:38]  <kim_register> mindspillage, we'd need to check stats on that :-)
[01:38]  <kim_register> DavidGerard, but it would be too early to say
things about wikipedia namespace
[01:39]  <kim_register> in any case, on articles, there should be no
problems even now
[01:39]  <DavidGerard> well. gosh!
[01:40]  <DavidGerard> i eagerly await your writeup ;-)
[01:42]  <kim_register> LOL
[01:42]  <kim_register> I HATE WRITEUPS
[01:43]  <DavidGerard> cut'n'paste your comments here. or get greg to write
it up ;-)
[01:47]  <DavidGerard> if you make preliminary notes and make your data
available (as available as you can), then that will be enough for fun
[01:47]  <kim_register> grin

The data file: http://bruning.xs4all.nl/~kim/kim_query1.bz2

[02:10]  <kim_register> page_id | distinct_editors | oldest_revision |
rev_count |
[02:10]  <kim_register> 2004_distinct | 2004_rev_count | 2003_distinct |
2003_rev_count |
[02:10]  <kim_register> 2002_distinct | 2002_rev_count
[02:11]  <kim_register> (this is the key to the bz2 file)
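
For anyone who wants to poke at the data file themselves, here is a rough
sketch of how the "edited by less than N distinct Users/IPs" percentages
quoted in the log could be recomputed from it. The column names come from
Kim's key above; everything else is an assumption, not a description of the
actual file or of gmaxwell's scripts: the "|" delimiter, one row per article,
the possible header or separator lines, and the helper names
(read_distinct_editors, share_below, load) are all made up for illustration.

```python
import bz2

# Column order taken from the key quoted above. That the file really is
# pipe-delimited with one row per article is an ASSUMPTION.
COLUMNS = [
    "page_id", "distinct_editors", "oldest_revision", "rev_count",
    "2004_distinct", "2004_rev_count", "2003_distinct", "2003_rev_count",
    "2002_distinct", "2002_rev_count",
]


def read_distinct_editors(lines, delimiter="|"):
    """Yield distinct_editors as an int for every well-formed data row."""
    idx = COLUMNS.index("distinct_editors")
    for line in lines:
        # Drop surrounding whitespace and any leading/trailing delimiter.
        fields = [f.strip() for f in line.strip().strip(delimiter).split(delimiter)]
        if len(fields) != len(COLUMNS):
            continue  # blank line, separator line, or other junk
        try:
            yield int(fields[idx])
        except ValueError:
            continue  # header row or non-numeric value


def share_below(counts, thresholds):
    """Return {threshold: percent of articles with fewer editors than it}."""
    counts = list(counts)
    total = len(counts)
    if total == 0:
        return {t: 0.0 for t in thresholds}
    return {
        t: 100.0 * sum(1 for c in counts if c < t) / total
        for t in thresholds
    }


def load(path):
    """Read editor counts from a bz2-compressed text file at `path`."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return list(read_distinct_editors(fh))
```

With the file downloaded locally, something like
share_below(load("kim_query1.bz2"), [10, 20, 30, 150]) would give the four
thresholds from the log, provided the layout assumptions above actually hold.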



