David Gerard (fun@thingy.apana.org.au) [060123 10:52]:
Gmaxwell has left the project, which is annoying for many reasons; the one that matters here is that the scripts and such he was running are on the toolserver. But here are two diagrams to start you off:
http://en.wikipedia.org/wiki/Image:Articles_distinct_histo.png
That's how many editors per article. Note that ALMOST ALL articles have one editor, maybe two. And only 200 or so have over 100 editors (123 over 1000 editors), and almost all those are way overedited for the wiki process (e.g. G.W. Bush, which is THE busiest article on the wiki and arguably pathological).
http://en.wikipedia.org/wiki/Image:Articles_distinct.png
- same thing as above in CDF. Breakdown of editors per article.
I just asked Kim's permission to quote what he said on #wikimedia last night. At the end is the URL for the original data, bz2 compressed (16MB file, expands to about 70-odd MB).
I've deleted the words of everyone except Kim and myself. Also, Kim wants to note that:
[23:58] <kim_register> you can take my irc discussion too if you like
[23:59] <kim_register> though note that I'm a lot less collected on irc
[23:59] <kim_register> so with disclaimer
[23:59] <kim_register> and I reserve the right to not mean half of what I said ;-)
Note also that all credit for the heavy lifting on this one goes to gmaxwell.
- d.
[23:41] <kim_register> I have something cool to show him that gmaxwell drew ^^;;
[00:03] <kim_register> http://en.wikipedia.org/wiki/Image:Articles_distinct_histo.png
[00:03] <kim_register> BINGO!
[00:03] <kim_register> most articles are only edited by a couple of users
[00:03] <kim_register> do you see, do you SEE how far left that single peak is?
[00:03] <kim_register> it's almost invisible
[00:03] <kim_register> OMG this is cool!
[00:03] * kim_register raves
[00:04] <kim_register> death to all the wikidemocracy folks
[00:04] <kim_register> long live consensus!
[00:04] <kim_register> (well, maybe not death to them, but...)
[...]
[00:05] <kim_register> http://en.wikipedia.org/wiki/Image:Articles_distinct.png
[00:05] <kim_register> heres a [[CDF]] view
[00:06] <kim_register> GOAAAL
[00:06] <kim_register> I mean wooohooo
[00:06] <kim_register> sorry
[00:06] <kim_register> :-)
[00:06] <kim_register> I'm so happy :-)
[00:07] <kim_register> thank Gmaxwell too :-)
[00:19] <kim_register> http://en.wikipedia.org/wiki/Image:Articles_distinct_loghisto.png
[00:19] <kim_register> here's a scaled one
[00:19] <kim_register> it's LOG SCALED, so be DARN careful interpreting it, if you're not used to log scaling
[00:20] <kim_register> I'm noting how many editors edit each article
[00:20] <kim_register> This basically shows that most articles have few editors
[00:21] <kim_register> brian0918, I'm using them to show that consensus will still work
[00:21] <kim_register> and the whole voting thing is silly
[00:21] <kim_register> so we don't need to make drastic policy changes "because the wiki is too big now"
[00:21] <kim_register> all that is pure nonsense, is what these graphs prove
[00:21] <kim_register> the wiki segments conversations into small managable areas that can be handled by consensus
[00:22] <kim_register> brian0918, well, we'll do some more graphs
[00:22] <kim_register> but these are pretty darn indicative already
[00:22] <kim_register> I don't expect to see drastically different results elsewhere
[00:22] <kim_register> if it's this clear here
[00:22] <kim_register> it's not going to differ by orders of magnitude elsewhere :-)
[00:25] <kim_register> oh gosh
[00:25] <kim_register> anyway, I love these graphs
[00:27] <kim_register> http://en.wikipedia.org/wiki/Image:Articles_distinct_logcdf.png
[00:27] <kim_register> tricky to read one
[00:27] <kim_register> (01:27:05) nullc: well there really is only one thing to say..
[00:27] <kim_register> (01:27:12) Kim Bruning: yeah
[00:27] <kim_register> (01:27:17) nullc: that articles have few editors.
[00:27] <kim_register> (01:27:21) Kim Bruning: *grin*
[00:32] <kim_register> * 67.20% have been edited by less than 10 distinct Users/IPs.
[00:32] <kim_register> * 86.07% have been edited by less than 20 distinct Users/IPs.
[00:32] <kim_register> * 91.90% have been edited by less than 30 distinct Users/IPs.
[00:32] <kim_register> * 99.21% have been edited by less than 150 distinct Users/IPs.
[00:33] <kim_register> (01:33:12) nullc: 72% of the articles have 11 editors or less.
[00:36] <kim_register> so we'll throw out all articles with > 200 editors
[00:36] <kim_register> that's maybe 100 or so
[00:36] <kim_register> and then see the new graphs
[00:36] <kim_register> this is kinda cool ;-)
[00:37] <kim_register> I'm willing to bet that the >200 editor set also has trouble with NPOV
[00:37] <kim_register> as well as many other things
[00:46] <kim_register> Ok, so based on this alone
[00:46] <kim_register> normal wikipedia policy should be consensus consensus consensus
[00:46] <kim_register> large pages can best be split and transcluded
[00:47] <kim_register> transcluded sections will survive AFD anyway, since even those will have like 50 editors or so
[00:47] <kim_register> this way we can keep a consistent and coherent policy across the entire encyclopedia namespace
[00:47] <kim_register> now we need to consistentize in wikipedia namespace
[00:47] <kim_register> this has low priority in theory
[00:47] <kim_register> but is going to be tricky in practice
[00:47] <kim_register> but like WOOT
[00:47] <kim_register> love those numbers!
[00:47] <kim_register> yippee!
[00:48] <kim_register> we don't need to change much at all, even though we've grown so much!
[00:50] <kim_register> well I need to get some cooperative fire support
[00:50] <kim_register> you're already helping keep them down ;-)
[00:50] <kim_register> I'll have to recruit some more :-)
[00:50] <kim_register> I'll start one by one :-)
[01:18] <DavidGerard> kim_register: that's fantastic to read
[01:19] <DavidGerard> and squares with my experience also: articles are either contentious or ghost towns
[01:19] <DavidGerard> with a few in between
[01:25] <kim_register> yes
[01:25] <kim_register> we're going to have so much fun fixing policy this year :-)
[01:25] <kim_register> I'd love to have special dispensation from jimbo though
[01:25] <kim_register> otherwise it's going to be hell fighting through wikinomic
[01:26] <kim_register> alternately we can just start policy from scratch
[01:26] <kim_register> though that too might have some downsides...
[01:26] <kim_register> but basically, the graphs show that wikipedia policy will still work
[01:26] <kim_register> and that it scales well
[01:26] <kim_register> our job is to make sure it STAYS that way ;-)
[01:29] <kim_register> DavidGerard, I have the base dataset in my mailbox now. Do you want me to forward you a copy?
[01:31] <DavidGerard> nonono, i believe your graphs ;-) what I'm interested in is your writeup
[01:31] <DavidGerard> kim_register: jimbo is just a little tired of wikinomic
[01:32] <DavidGerard> the stupidity and shittiness of afd is now causing wikipedia lots of real world problems
[01:32] <DavidGerard> i.e., the foundation is getting a lot of mail from the aggrieved
[01:32] <DavidGerard> and then the fuckwits tried CFDing [[Category:Living people]] or whatever it was called
[01:32] <DavidGerard> and jimbo killed the CFD
[01:32] <DavidGerard> and it was *recreated*
[01:32] <DavidGerard> THREE TIMES
[01:33] <DavidGerard> and jimbo saw what moronic shit was said on that CFD
[01:33] <DavidGerard> and thunder and lightning came down from the heavens.
[01:33] <kim_register> hello mindspillage , greg is messing with lovely pristine numbers. Maybe might be fun if you could help him out :-)
[01:33] <kim_register> DavidGerard, *ghrin*
[01:33] <kim_register> DavidGerard, what kinda writeup are you expecting btw?
[01:33] <DavidGerard> AFD is about to discover that the rest of wikipedia does in fact have a few things to say about the standard of behaviour there.
[01:34] <kim_register> and I mean, there's a zillion bits of info :-)
[01:34] <kim_register> that we haven't explored yet
[01:34] <kim_register> but the graph with the single peak all the way left ..
[01:34] <kim_register> that does it for me
[01:34] <kim_register> DavidGerard, I'm going to propose killing all > 100 participant pages, one way or the other
[01:34] <DavidGerard> kim_register: hah!
[01:34] <kim_register> DavidGerard, still need data for wikimedia namespace to be sure that's a great idea
[01:35] <DavidGerard> permanent semiprotect?
[01:35] <kim_register> no, I mean kill
[01:35] <kim_register> as in mark as deprecated
[01:35] <DavidGerard> hmm.
[01:35] <kim_register> use a different system
[01:35] <DavidGerard> :-O
[01:35] <DavidGerard> gosh!
[01:35] <DavidGerard> case my case please
[01:35] <DavidGerard> case by case i mean
[01:35] <DavidGerard> but may basically be a good idea
[01:36] <kim_register> Well, there are roughly 100-200 cases to consider
[01:36] <kim_register> guesstimating by current data
[01:36] <kim_register> actually probably much less
[01:36] <kim_register> articles as a general case can be split up and transcluded
[01:37] <kim_register> leaving only wikipedia namespace stuff
[01:37] <kim_register> the latter hasn't been looked at
[01:37] <kim_register> not yet
[01:37] <kim_register> so these are prelim findings so far :-)
[01:37] <kim_register> wikipedia namespace will need some case by case work on splitting, yes :-)
[01:38] <kim_register> for all the *FDs, we can replace the whole lot with true wiki deletion I'll wager
[01:38] <kim_register> some people will hate it :-)
[01:38] <kim_register> mindspillage, we'd need to check stats on that :-)
[01:38] <kim_register> DavidGerard, but it would be too early to say things about wikipedia namespace
[01:39] <kim_register> in any case, on articles, there should be no problems even now
[01:39] <DavidGerard> well. gosh!
[01:40] <DavidGerard> i eagerly await your writeup ;-)
[01:42] <kim_register> LOL
[01:42] <kim_register> I HATE WRITEUPS
[01:43] <DavidGerard> cut'n'paste your comments here. or get greg to write it up ;-)
[01:47] <DavidGerard> if you make preliminary notes and make your data available (as available as you can), then that will be enough for fun
[01:47] <kim_register> grin
The data file: http://bruning.xs4all.nl/~kim/kim_query1.bz2
[02:10] <kim_register> page_id | distinct_editors | oldest_revision | rev_count |
[02:10] <kim_register> 2004_distinct | 2004_rev_count | 2003_distinct | 2003_rev_count |
[02:10] <kim_register> 2002_distinct | 2002_rev_count
[02:11] <kim_register> (this is the key to the bz2 file)
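
For anyone who wants to check the "edited by less than N distinct Users/IPs" percentages against the data file, here is a minimal sketch in Python. It has not been run against the real file. It assumes that kim_query1.bz2 decompresses to text with one pipe-separated row per article, in the column order given in the key, so that distinct_editors is the second field. The function names and the sample rows are made up for illustration; the sample is not real Wikipedia data.

```python
import bz2


def distinct_editor_counts(lines):
    """Yield distinct_editors (assumed to be the 2nd field) for each data row.

    Header lines, separator rules and footers are skipped because their
    first two fields are not both plain integers.
    """
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 2:
            continue
        if not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        yield int(fields[1])


def share_below(counts, threshold):
    """Percentage of articles with fewer than `threshold` distinct editors."""
    counts = list(counts)
    if not counts:
        return 0.0
    below = sum(1 for c in counts if c < threshold)
    return 100.0 * below / len(counts)


def load_counts(path):
    """Read a bz2-compressed dump (assumed layout) and return the counts."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return list(distinct_editor_counts(fh))


# Invented rows in the assumed layout, only to exercise the functions.
SAMPLE = [
    " page_id | distinct_editors | oldest_revision | rev_count",
    "---------+------------------+-----------------+-----------",
    "      12 |                1 | 20020225154311  |         1",
    "      25 |                2 | 20030101000000  |         3",
    "      39 |               15 | 20021111111111  |        40",
    "     290 |              250 | 20011231235959  |      5000",
    "(4 rows)",
]

if __name__ == "__main__":
    counts = list(distinct_editor_counts(SAMPLE))
    for n in (10, 20, 30, 150):
        print(f"{share_below(counts, n):.2f}% edited by less than {n} distinct editors")
```

To try it on the real data, pass the downloaded file's path to load_counts() and feed the result to share_below(). If the actual layout differs from the assumption, only distinct_editor_counts() should need changing.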