[WikiEN-l] Dealing with crap deletion nominations
David Gerard
fun at thingy.apana.org.au
Mon Jan 23 00:24:56 UTC 2006
David Gerard (fun at thingy.apana.org.au) [060123 10:52]:
> Gmaxwell has left the project, which is annoying for many reasons; one
> reason is that the scripts and such he was running are on the
> toolserver. But here are two diagrams to start you off:
>
> http://en.wikipedia.org/wiki/Image:Articles_distinct_histo.png
>
> That's how many editors per article. Note that ALMOST ALL articles have one
> editor, maybe two. And only 200 or so have over 100 editors (123 over 1000
> editors), and almost all those are way overedited for the wiki process
> (e.g. G.W. Bush, which is THE busiest article on the wiki and arguably
> pathological).
>
> http://en.wikipedia.org/wiki/Image:Articles_distinct.png
>
> - same thing as above as a CDF: cumulative breakdown of editors per article.
I just asked Kim's permission to quote from him on #wikimedia last night.
At the end is the URL with the original data, bz2 compressed (16MB file,
expands to about 70-odd MB).
I've deleted the words of everyone except Kim and myself. Also, Kim wants
to note that:
[23:58] <kim_register> you can take my irc discussion too if you like
[23:59] <kim_register> though note that I'm a lot less collected on irc
[23:59] <kim_register> so with disclaimer
[23:59] <kim_register> and I reserve the right to not mean half of what
I said ;-)
Note also that all credit for the heavy lifting on this one goes to
gmaxwell.
- d.
[23:41] <kim_register> I have something cool to show him that gmaxwell
drew ^^;;
[00:03] <kim_register>
http://en.wikipedia.org/wiki/Image:Articles_distinct_histo.png
[00:03] <kim_register> BINGO!
[00:03] <kim_register> most articles are only edited by a couple of users
[00:03] <kim_register> do you see, do you SEE how far left that single
peak is?
[00:03] <kim_register> it's almost invisible
[00:03] <kim_register> OMG this is cool!
[00:03] * kim_register raves
[00:04] <kim_register> death to all the wikidemocracy folks
[00:04] <kim_register> long live consensus!
[00:04] <kim_register> (well, maybe not death to them, but...)
[...]
[00:05] <kim_register>
http://en.wikipedia.org/wiki/Image:Articles_distinct.png
[00:05] <kim_register> here's a [[CDF]] view
[00:06] <kim_register> GOAAAL
[00:06] <kim_register> I mean wooohooo
[00:06] <kim_register> sorry
[00:06] <kim_register> :-)
[00:06] <kim_register> I'm so happy :-)
[00:07] <kim_register> thank Gmaxwell too :-)
[00:19] <kim_register>
http://en.wikipedia.org/wiki/Image:Articles_distinct_loghisto.png
[00:19] <kim_register> here's a scaled one
[00:19] <kim_register> it's LOG SCALED, so be DARN careful interpreting
it, if you're not used to log scaling
[00:20] <kim_register> I'm noting how many editors edit each article
[00:20] <kim_register> This basically shows that most articles have few
editors
[00:21] <kim_register> brian0918, I'm using them to show that consensus
will still work
[00:21] <kim_register> and the whole voting thing is silly
[00:21] <kim_register> so we don't need to make drastic policy changes
"because the wiki is too big now"
[00:21] <kim_register> all that is pure nonsense, is what these graphs
prove
[00:21] <kim_register> the wiki segments conversations into small
manageable areas that can be handled by consensus
[00:22] <kim_register> brian0918, well, we'll do some more graphs
[00:22] <kim_register> but these are pretty darn indicative already
[00:22] <kim_register> I don't expect to see drastically different results
elsewhere
[00:22] <kim_register> if it's this clear here
[00:22] <kim_register> it's not going to differ by orders of magnitude
elsewhere :-)
[00:25] <kim_register> oh gosh
[00:25] <kim_register> anyway, I love these graphs
[00:27] <kim_register>
http://en.wikipedia.org/wiki/Image:Articles_distinct_logcdf.png
[00:27] <kim_register> tricky to read one
[00:27] <kim_register> (01:27:05) nullc: well there really is only one
thing to say..
[00:27] <kim_register> (01:27:12) Kim Bruning: yeah
[00:27] <kim_register> (01:27:17) nullc: that articles have few editors.
[00:27] <kim_register> (01:27:21) Kim Bruning: *grin*
[00:32] <kim_register> * 67.20% have been edited by less than 10
distinct Users/IPs.
[00:32] <kim_register> * 86.07% have been edited by less than 20
distinct Users/IPs.
[00:32] <kim_register> * 91.90% have been edited by less than 30
distinct Users/IPs.
[00:32] <kim_register> * 99.21% have been edited by less than 150
distinct Users/IPs.
[00:33] <kim_register> (01:33:12) nullc: 72% of the articles have 11
editors or less.
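(Editorial sketch, not part of the log: figures like the percentages above
can be computed from a list of per-article distinct-editor counts roughly as
follows. The function name share_below and the sample numbers are made up
for illustration; this is not the script gmaxwell actually ran.)

```python
def share_below(editor_counts, threshold):
    """Percentage of articles edited by fewer than `threshold` distinct editors."""
    if not editor_counts:
        raise ValueError("need at least one article")
    below = sum(1 for n in editor_counts if n < threshold)
    return 100.0 * below / len(editor_counts)


# Made-up sample of distinct-editor counts, NOT the real Wikipedia data.
sample = [1, 1, 2, 3, 8, 12, 25, 40, 160, 1]

for threshold in (10, 20, 30, 150):
    pct = share_below(sample, threshold)
    print(f"{pct:.2f}% have been edited by fewer than {threshold} distinct Users/IPs.")
```

Evaluating share_below at every threshold gives the points of the CDF plot;
the histogram is just the count of articles at each exact editor count.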
[00:36] <kim_register> so we'll throw out all articles with > 200 editors
[00:36] <kim_register> that's maybe 100 or so
[00:36] <kim_register> and then see the new graphs
[00:36] <kim_register> this is kinda cool ;-)
[00:37] <kim_register> I'm willing to bet that the >200 editor set also
has trouble with NPOV
[00:37] <kim_register> as well as many other things
[00:46] <kim_register> Ok, so based on this alone
[00:46] <kim_register> normal wikipedia policy should be consensus
consensus consensus
[00:46] <kim_register> large pages can best be split and transcluded
[00:47] <kim_register> transcluded sections will survive AFD anyway, since
even those will have like 50 editors or so
[00:47] <kim_register> this way we can keep a consistent and coherent
policy across the entire encyclopedia namespace
[00:47] <kim_register> now we need to consistentize in wikipedia namespace
[00:47] <kim_register> this has low priority in theory
[00:47] <kim_register> but is going to be tricky in practice
[00:47] <kim_register> but like WOOT
[00:47] <kim_register> love those numbers!
[00:47] <kim_register> yippee!
[00:48] <kim_register> we don't need to change much at all, even though
we've grown so much!
[00:50] <kim_register> well I need to get some cooperative fire support
[00:50] <kim_register> you're already helping keep them down ;-)
[00:50] <kim_register> I'll have to recruit some more :-)
[00:50] <kim_register> I'll start one by one :-)
[01:18] <DavidGerard> kim_register: that's fantastic to read
[01:19] <DavidGerard> and squares with my experience also: articles are
either contentious or ghost towns
[01:19] <DavidGerard> with a few in between
[01:25] <kim_register> yes
[01:25] <kim_register> we're going to have so much fun fixing policy this
year :-)
[01:25] <kim_register> I'd love to have special dispensation from jimbo
though
[01:25] <kim_register> otherwise it's going to be hell fighting through
wikinomic
[01:26] <kim_register> alternately we can just start policy from scratch
[01:26] <kim_register> though that too might have some downsides...
[01:26] <kim_register> but basically, the graphs show that wikipedia
policy will still work
[01:26] <kim_register> and that it scales well
[01:26] <kim_register> our job is to make sure it STAYS that way ;-)
[01:29] <kim_register> DavidGerard, I have the base dataset in my mailbox
now. Do you want me to forward you a copy?
[01:31] <DavidGerard> nonono, i believe your graphs ;-) what I'm
interested in is your writeup
[01:31] <DavidGerard> kim_register: jimbo is just a little tired of
wikinomic
[01:32] <DavidGerard> the stupidity and shittiness of afd is now causing
wikipedia lots of real world problems
[01:32] <DavidGerard> i.e., the foundation is getting a lot of mail from
the aggrieved
[01:32] <DavidGerard> and then the fuckwits tried CFDing [[Category:Living
people]] or whatever it was called
[01:32] <DavidGerard> and jimbo killed the CFD
[01:32] <DavidGerard> and it was *recreated*
[01:32] <DavidGerard> THREE TIMES
[01:33] <DavidGerard> and jimbo saw what moronic shit was said on that CFD
[01:33] <DavidGerard> and thunder and lightning came down from the
heavens.
[01:33] <kim_register> hello mindspillage , greg is messing with lovely
pristine numbers. Maybe might be fun if you could help him out :-)
[01:33] <kim_register> DavidGerard, *ghrin*
[01:33] <kim_register> DavidGerard, what kinda writeup are you expecting
btw?
[01:33] <DavidGerard> AFD is about to discover that the rest of wikipedia
does in fact have a few things to say about the standard of behaviour
there.
[01:34] <kim_register> and I mean, there's a zillion bits of info :-)
[01:34] <kim_register> that we haven't explored yet
[01:34] <kim_register> but the graph with the single peak all the way left
..
[01:34] <kim_register> that does it for me
[01:34] <kim_register> DavidGerard, I'm going to propose killing all > 100
participant pages, one way or the other
[01:34] <DavidGerard> kim_register: hah!
[01:34] <kim_register> DavidGerard, still need data for wikimedia
namespace to be sure that's a great idea
[01:35] <DavidGerard> permanent semiprotect?
[01:35] <kim_register> no, I mean kill
[01:35] <kim_register> as in mark as deprecated
[01:35] <DavidGerard> hmm.
[01:35] <kim_register> use a different system
[01:35] <DavidGerard> :-O
[01:35] <DavidGerard> gosh!
[01:35] <DavidGerard> case my case please
[01:35] <DavidGerard> case by case i mean
[01:35] <DavidGerard> but may basically be a good idea
[01:36] <kim_register> Well, there are roughly 100-200 cases to consider
[01:36] <kim_register> guesstimating by current data
[01:36] <kim_register> actually probably much less
[01:36] <kim_register> articles as a general case can be split up and
transcluded
[01:37] <kim_register> leaving only wikipedia namespace stuff
[01:37] <kim_register> the latter hasn't been looked at
[01:37] <kim_register> not yet
[01:37] <kim_register> so these are prelim findings so far :-)
[01:37] <kim_register> wikipedia namespace will need some case by case
work on splitting, yes :-)
[01:38] <kim_register> for all the *FDs, we can replace the whole lot with
true wiki deletion I'll wager
[01:38] <kim_register> some people will hate it :-)
[01:38] <kim_register> mindspillage, we'd need to check stats on that :-)
[01:38] <kim_register> DavidGerard, but it would be too early to say
things about wikipedia namespace
[01:39] <kim_register> in any case, on articles, there should be no
problems even now
[01:39] <DavidGerard> well. gosh!
[01:40] <DavidGerard> i eagerly await your writeup ;-)
[01:42] <kim_register> LOL
[01:42] <kim_register> I HATE WRITEUPS
[01:43] <DavidGerard> cut'n'paste your comments here. or get greg to write
it up ;-)
[01:47] <DavidGerard> if you make preliminary notes and make your data
available (as available as you can), then that will be enough for fun
[01:47] <kim_register> grin
The data file: http://bruning.xs4all.nl/~kim/kim_query1.bz2
[02:10] <kim_register> page_id | distinct_editors | oldest_revision |
rev_count |
[02:10] <kim_register> 2004_distinct | 2004_rev_count | 2003_distinct |
2003_rev_count |
[02:10] <kim_register> 2002_distinct | 2002_rev_count
[02:11] <kim_register> (this is the key to the bz2 file)
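(Editorial sketch, not part of the log: one way to read the file using the
column key above. It ASSUMES the decompressed file is pipe-delimited text
with one article per line, which the key suggests but the post does not
confirm. The helper names parse_rows and read_distinct_editors are
hypothetical.)

```python
import bz2

# Column order as given in the key above.
COLUMNS = [
    "page_id", "distinct_editors", "oldest_revision", "rev_count",
    "2004_distinct", "2004_rev_count", "2003_distinct", "2003_rev_count",
    "2002_distinct", "2002_rev_count",
]


def parse_rows(lines, columns=COLUMNS):
    """Yield one dict per data row; values are kept as strings.

    Assumption: fields are separated by '|' (possibly with a trailing '|').
    Header, separator, and blank lines are skipped.
    """
    for line in lines:
        fields = [f.strip() for f in line.strip().strip("|").split("|")]
        if len(fields) != len(columns):
            continue  # not a data row in the assumed layout
        if not fields[0].isdigit():
            continue  # e.g. a header row repeating the column names
        yield dict(zip(columns, fields))


def read_distinct_editors(path):
    """Return the distinct_editors column as a list of ints."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return [int(row["distinct_editors"]) for row in parse_rows(fh)]
```

Usage would be along the lines of
read_distinct_editors("kim_query1.bz2") on a downloaded copy; if the real
layout differs (tabs, empty trailing fields), the splitting line is the part
to adjust.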