(Sorry for the cross posting, but there are several groups who might find this interesting.)
For years now, it has been common for people to claim that "all the good editors are jumping ship" or "we are losing our best people". Generally, this has not proven to be true: people come and go, to be sure, but as some people drift away, others have drifted in. Whether the rate of burnout is "too high" or "too low" or "just right" is quite hard to say.
However, it ought to be possible to at least quantify what that rate actually is, by using the Erik Zachte statistics or a modification of them.
I would be fascinated if we could figure out such statistics as
"For any given edit, what is the average length of service of the editor?" "For any given edit, what is the median length of service of the editor?" These could be measured by either time since first edit, or total number of edits or (perhaps best) some weighted average of the edit history.
It would be nice to track that number over time... are we becoming "younger" as a community, "older" as a community? Staying about the same? Are old-timers sticking around longer than they used to, or jumping ship faster?
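Here is a minimal sketch of the per-edit statistic described above. It uses the "time since first edit" definition and groups edits by calendar month so the number can be tracked over time. It assumes (hypothetically) that every edit is available as a `(user, timestamp)` pair and that each user's first edit is in the data. The function name is illustrative only.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median


def editor_age_by_month(edits):
    """Per-edit editor age, summarised by month.

    edits: iterable of (user, datetime) pairs, one per edit.
    Returns {"YYYY-MM": (mean_age_days, median_age_days)}, where the age of
    an edit is the number of days since that user's first edit.
    """
    first_edit = {}
    ages = defaultdict(list)
    for user, ts in sorted(edits, key=lambda e: e[1]):  # chronological order
        first_edit.setdefault(user, ts)  # the earliest edit is seen first
        ages[ts.strftime("%Y-%m")].append((ts - first_edit[user]).days)
    return {month: (mean(a), median(a)) for month, a in sorted(ages.items())}


edits = [
    ("Alice", datetime(2005, 1, 1)),
    ("Alice", datetime(2005, 3, 1)),
    ("Bob", datetime(2005, 3, 11)),
    ("Alice", datetime(2005, 3, 21)),
]
print(editor_age_by_month(edits))  # {'2005-01': (0, 0), '2005-03': (46, 59)}
```

Swapping `(ts - first_edit[user]).days` for a running edit count would give the "total number of edits" variant instead.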
There is also a whole set of related questions around newbies:
Are newbies more likely or less likely to stick around than they were a year ago? Some people feel we are being overrun by newbies; others feel that we are becoming a more closed and cliquish community which does not welcome newbies.
I would measure this by saying "Of people who made at least 100 edits a month ago, how many of them made at least 100 edits this month". And similar stats for "at least 10 edits". (Merely looking at "new accounts" would not be right, because we had a huge spike in new account creation when it became necessary to have an account to create a new page.)
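The month-over-month measure above could be computed roughly as follows. This is a sketch that assumes each month's edits are available as a plain list of usernames, one entry per edit. That input format is hypothetical. Run it once with `threshold=100` and once with `threshold=10`.

```python
from collections import Counter


def retention_rate(last_month, this_month, threshold=100):
    """Share of users with >= threshold edits last month who again made
    >= threshold edits this month.

    last_month, this_month: iterables of usernames, one entry per edit.
    Returns None when nobody reached the threshold last month.
    """
    before = Counter(last_month)
    after = Counter(this_month)  # missing users count as 0 edits
    active_before = [user for user, n in before.items() if n >= threshold]
    if not active_before:
        return None
    kept = sum(1 for user in active_before if after[user] >= threshold)
    return kept / len(active_before)


last_month = ["A"] * 12 + ["B"] * 10 + ["C"] * 3
this_month = ["A"] * 15 + ["B"] * 2
print(retention_rate(last_month, this_month, threshold=10))  # 0.5
```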
--Jimbo
I would expect the "length of service" to be fairly short, if anons are counted.
Mark
On 04/06/06, Jimmy Wales jwales@wikia.com wrote:
-- ####################################################################### # Office: 1-727-231-0101 | Free Culture and Free Knowledge # # http://www.wikipedia.org | Building a free world # #######################################################################
Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l
Mark Williamson wrote:
I would expect the "length of service" to be fairly short, if anons are counted.
Yes, that would be interesting to know, but what I am thinking of is more like:
When a person becomes "active" (by some definition), how long do they tend to remain "active" (by that same definition).
I would say looking at this in a variety of ways would be useful. Anons pose special problems in interpreting the data, of course, and so probably we won't get much useful information out of those. But since we know that almost all people who are really part of the community get a user account, I suspect that looking at the activity patterns for user accounts would be the most interesting.
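One way to make "how long do they remain active" concrete is to treat "active" as at least N edits in a calendar month and measure runs of consecutive active months. The sketch below assumes per-user monthly edit counts are available. Both the monthly granularity and the default threshold of 10 are assumptions, not definitions from the thread. Runs still going at the end of the data are cut short, so averages will understate true spell lengths.

```python
def active_spells(monthly_counts, threshold=10):
    """Lengths, in months, of each run of consecutive months in which a user
    made at least `threshold` edits. A run still going in the final month
    is included but is cut short by the end of the data (right-censored)."""
    spells, run = [], 0
    for count in monthly_counts:
        if count >= threshold:
            run += 1
        elif run:
            spells.append(run)
            run = 0
    if run:
        spells.append(run)
    return spells


def mean_spell_length(counts_by_user, threshold=10):
    """Average active-spell length across all users, or None if there are none."""
    spells = [s for counts in counts_by_user.values()
              for s in active_spells(counts, threshold)]
    return sum(spells) / len(spells) if spells else None


counts_by_user = {"A": [12, 15, 0, 11, 11, 11, 3], "B": [0, 20]}
print(active_spells(counts_by_user["A"]))  # [2, 3]
print(mean_spell_length(counts_by_user))   # 2.0
```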
On 6/5/06, Jimmy Wales jwales@wikia.com wrote:
Yes, that would be interesting to know, but what I am thinking of is more like:
When a person becomes "active" (by some definition), how long do they tend to remain "active" (by that same definition).
I would say looking at this in a variety of ways would be useful. Anons pose special problems in interpreting the data, of course, and so probably we won't get much useful information out of those. But since we know that almost all people who are really part of the community get a user account, I suspect that looking at the activity patterns for user accounts would be the most interesting.
I thought of a simple mood metric: the percentage of wikipedians (active for at least two months) who made more edits this month than the previous month. It would be affected by seasons but it might be interesting.
Steve
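Below is a sketch of the mood metric. "Active for at least two months" is ambiguous. Here it is taken, as an assumption, to mean that the user's first edit came at least two months before the current month. The input is hypothetical: per-user monthly edit counts, oldest first, all covering the same calendar months.

```python
def mood_metric(counts_by_user, min_months=2):
    """Percentage of established users whose edit count in the current (last)
    month exceeds their count in the previous month.

    counts_by_user: {user: [edits per month, oldest first]}, all lists covering
    the same calendar months. "Established" is taken (hypothetically) to mean
    the first edit came at least `min_months` (>= 1) months before the last month.
    """
    eligible = up = 0
    for counts in counts_by_user.values():
        first = next((i for i, c in enumerate(counts) if c > 0), None)
        if first is None or len(counts) - 1 - first < min_months:
            continue  # never edited, or too new to count
        eligible += 1
        if counts[-1] > counts[-2]:
            up += 1
    return 100 * up / eligible if eligible else None


counts = {"A": [5, 3, 8], "B": [5, 9, 2], "C": [0, 4, 10], "D": [0, 0, 1]}
print(mood_metric(counts))  # 50.0 (A is up, B is down, C and D are too new)
```

As Steve notes, seasons will move this number, so comparing the same month across years may be fairer than comparing consecutive months.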
Steve Bennett wrote:
I thought of a simple mood metric: the percentage of wikipedians (active for at least two months) who made more edits this month than the previous month. It would be affected by seasons but it might be interesting.
Patterns by day of the week would also be interesting. My entirely subjective observation is that there are fewer people around on a Saturday night.
Ec
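To check the day-of-week pattern, count edit timestamps by weekday, and by hour for a "Saturday night" view. This is a minimal sketch. MediaWiki stores timestamps in UTC, so "night" depends on the editor's time zone, and that is not corrected for here.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def edits_by_weekday(timestamps):
    """Number of edits falling on each day of the week."""
    counts = Counter(ts.weekday() for ts in timestamps)  # Monday == 0
    return {DAY_NAMES[d]: counts[d] for d in range(7)}


def edits_by_weekday_hour(timestamps):
    """Finer breakdown: edits per (day name, hour of day)."""
    return Counter((DAY_NAMES[ts.weekday()], ts.hour) for ts in timestamps)


stamps = [datetime(2006, 6, 3, 22), datetime(2006, 6, 3, 23),
          datetime(2006, 6, 5, 10)]
print(edits_by_weekday(stamps)["Saturday"])  # 2
```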
Do we have a database dump pre-formatted for this kind of statistics? That is, simply date, page, user and edit summary for every edit, without the actual diffs or current page content.
Steve
"Steve Bennett" wrote:
Do we have a database dump pre-formatted for this kind of statistics? That is, simply date, page, user and edit summary for every edit, without the actual diffs or current page content.
Steve
I'd support that kind of dump: the history, but without the revision text.
Platonides wrote:
"Steve Bennett" wrote:
Do we have a database dump pre-formatted for this kind of statistics? That is, simply date, page, user and edit summary for every edit, without the actual diffs or current page content.
I'd support that kind of dump: the history, but without the revision text.
This is actually generated as a temporary file during the dump process, but they're not currently kept. If you guys want, I can have it keep them instead perhaps. The next one won't be for a few days, though.
-- brion vibber (brion @ pobox.com)
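The thread does not describe the format of that temporary file. The sketch below assumes it follows the usual MediaWiki XML export layout, with the text left out: `page/title`, then `revision/timestamp`, `contributor/username` (or `ip` for anons) and `comment`. The export namespace version varies, so tag namespaces are stripped. It streams the file with `iterparse` and yields one `(date, page, user, edit summary)` tuple per edit. The function names are illustrative.

```python
import io
import xml.etree.ElementTree as ET
from datetime import datetime


def _local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(source):
    """Yield (timestamp, page_title, user_or_ip, comment) per <revision>.

    source: a filename or binary file object holding a MediaWiki XML export.
    Any <text> element that happens to be present is ignored.
    """
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        tag = _local(elem.tag)
        if tag == "title":
            title = elem.text
        elif tag == "revision":
            fields = {_local(child.tag): child for child in elem}
            ts = datetime.strptime(fields["timestamp"].text, "%Y-%m-%dT%H:%M:%SZ")
            user = None
            contributor = fields.get("contributor")
            if contributor is not None:
                for child in contributor:
                    if _local(child.tag) in ("username", "ip"):
                        user = child.text
            comment_el = fields.get("comment")
            comment = (comment_el.text or "") if comment_el is not None else ""
            yield ts, title, user, comment
            elem.clear()  # release the revision's subtree to save memory
        elif tag == "page":
            elem.clear()


def iter_revisions_from_string(xml_text):
    """Convenience wrapper for small in-memory samples."""
    return iter_revisions(io.BytesIO(xml_text.encode("utf-8")))
```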
Brion Vibber brion@pobox.com writes:
This is actually generated as a temporary file during the dump process, but they're not currently kept. If you guys want, I can have it keep them instead perhaps. The next one won't be for a few days, though.
I run some offline statistics for dawiki, and I could use such a dump. If article size could be included, it would be a nice bonus, but even without that, it would be a help.
Anders Wegge Jakobsen wrote:
I run some offline statistics for dawiki, and I could use such a dump. If article size could be included, it would be a nice bonus, but even without that, it would be a help.
The page table is available as an SQL dump and has the page_len field for current revisions. We don't have lengths stored for older revisions, though, so getting the length means extracting the text (slow).
-- brion vibber (brion @ pobox.com)
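If the text is extracted, older revision sizes can be recomputed in the same unit as page_len, which is bytes. The sketch below assumes the text is stored as UTF-8, so Danish letters such as æ, ø and å count as two bytes each. It also reports each revision's change in size, a common derived statistic. The input is simply the revision texts of one page, oldest first.

```python
def size_history(revision_texts):
    """(size_in_bytes, change_from_previous) for each revision, oldest first.

    Sizes are counted as UTF-8 bytes (the unit page_len uses), so
    non-ASCII letters such as æ, ø and å count as two bytes each.
    """
    sizes = [len(text.encode("utf-8")) for text in revision_texts]
    return [(size, size - prev) for prev, size in zip([0] + sizes, sizes)]


print(size_history(["Hej", "Hej æble"]))  # [(3, 3), (9, 6)]
```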
Jimmy Wales wrote:
I would be fascinated if we could figure out such statistics as
"For any given edit, what is the average length of service of the editor?" "For any given edit, what is the median length of service of the editor?" These could be measured by either time since first edit, or total number of edits or (perhaps best) some weighted average of the edit history.
Turns out we can.
http://meta.wikimedia.org/wiki/Image:Days_since_first_edit.png
Average was easiest, so I did that as a first try. It shows an initial upward trend, stabilising in early 2005 at about 300 days.
How's that?
-- Tim Starling
Moin,
On Monday 05 June 2006 09:54, Tim Starling wrote:
Turns out we can.
http://meta.wikimedia.org/wiki/Image:Days_since_first_edit.png
Average was easiest, so I did that as a first try. It shows an initial upward trend, stabilising in early 2005 at about 300 days. How's that?
Cool :)
Does that mean that about 300 days after account creation, people give up editing?
best wishes,
tels
Tels wrote:
Does that mean that about 300 days after account creation, people give up editing?
No, the average time before people abandon their accounts would be much shorter. Many accounts have never edited at all, and I imagine there are also many accounts with only 1 or 2 edits. An average over accounts would be skewed towards this end. This is an average over edits, so it is skewed towards very active editors.
One interpretation would be to assume that most edits are performed by a core group of very active editors. Then you could say that the average age of a very active editor is about 300 days. You have to phrase it carefully, because there are two factors which limit this figure: attrition and growth. Editors getting tired and leaving will cause it to be reduced, as will an influx of new editors.
What would be really nice is if we could get a handle on these two factors separately.
-- Tim Starling
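The gap between an average over accounts and an average over edits shows up clearly in a small, made-up example. Each user's edits are given as day numbers. One hypothetical regular edits every 10 days for 600 days, and nine hypothetical drive-by accounts edit once and leave. Accounts that never edited cannot appear in edit data at all, which would pull the per-account figure down further.

```python
from statistics import mean


def mean_tenure_per_account(edit_days):
    """Average over accounts of (last edit day - first edit day)."""
    return mean(max(days) - min(days) for days in edit_days.values() if days)


def mean_age_per_edit(edit_days):
    """Average over edits of (edit day - that user's first edit day)."""
    ages = []
    for days in edit_days.values():
        if days:
            first = min(days)
            ages.extend(day - first for day in days)
    return mean(ages)


# Hypothetical data: one very active editor and nine one-edit accounts.
edit_days = {"Regular": list(range(0, 601, 10))}
edit_days.update({f"Drive-by {i}": [0] for i in range(9)})

print(mean_tenure_per_account(edit_days))       # 60
print(round(mean_age_per_edit(edit_days), 1))   # 261.4
```

The regular's 61 edits dominate the per-edit average, while the nine drive-bys dominate the per-account one.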
Tim Starling wrote:
One interpretation would be to assume that most edits are performed by a core group of very active editors. Then you could say that the average age of a very active editor is about 300 days. You have to phrase it carefully, because there are two factors which limit this figure: attrition and growth. Editors getting tired and leaving will cause it to be reduced, as will an influx of new editors.
What could also be interesting are the edit patterns: a graph of the number of edits on each day of an editor's editing life. How gradually does it tail off?
Also, to what extent do people effectively stop editing on en:wikipedia but still remain active on other projects?
Ec
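A rough version of that tail-off curve totals edits by time since each editor's first edit, across all editors. This sketch uses weeks rather than days to smooth the curve, and assumes edits are given as day numbers per user (a hypothetical input). Editors who joined recently can only contribute to early weeks. For a fair curve, each week's total should be divided by the number of editors whose careers could have reached that week, which is not done here.

```python
from collections import Counter


def activity_by_career_week(edit_days):
    """edit_days: {user: [day numbers of that user's edits]}.

    Returns Counter {weeks since the user's first edit: number of edits},
    summed over all users.
    """
    profile = Counter()
    for days in edit_days.values():
        if not days:
            continue
        first = min(days)
        profile.update((d - first) // 7 for d in days)
    return profile


sample = {"A": [0, 1, 2, 8, 20], "B": [5, 6, 30]}
print(sorted(activity_by_career_week(sample).items()))
# [(0, 5), (1, 1), (2, 1), (3, 1)]
```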
On 6/5/06, Tim Starling t.starling@physics.unimelb.edu.au wrote:
Turns out we can.
http://meta.wikimedia.org/wiki/Image:Days_since_first_edit.png
Average was easiest, so I did that as a first try. It shows an initial upward trend, stabilising in early 2005 at about 300 days.
How's that?
But what does it all mean?? Are people staying with the project longer, or are newbies not sticking around?
A couple of possibilities:
* If everything remains constant (no newbies, just the same oldbies editing at the same rate), it goes up.
* If people edit for a year then never edit again, then it presumably remains flat, at around 150 days.
* It would take a seriously strange set of events for that graph to come down (mass exodus of oldbies, and influx of newbies).
Bear in mind that the first year or so of data is a bit suspicious, because it wasn't *possible* for any 2- or 3-year oldbies to be editing. So I don't really see a lot of evolution in this data at all; it most closely matches my second hypothesis above: people arrive, stick around (in this case, for 2 years [1]), then piss off.
Steve
[1] I came up with 2 years by presuming that if the average edit was by a 1-year-old editor, then editors have a lifespan of 2 years before they stop editing. However, I'm not at all sure of my logic, and I can't explain it. :)
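Steve's footnote logic (an average edit by a 1-year-old implies a 2-year lifespan) holds in at least one idealised model, sketched below. The model is an assumption, not a claim about real editors: one new editor joins every day from day 0, and each editor makes one edit per day for exactly `lifespan_days` days, then leaves for good. Once the project is older than the lifespan, the per-edit mean age settles at (L - 1) / 2, about half the lifespan. Before that, it is half the project's age and rises, which is one way an initial upward trend could arise.

```python
def steady_state_mean_age(lifespan_days, observe_day):
    """Mean editor age (days since first edit) over the edits made on
    `observe_day`, in a toy model where one editor joins per day from day 0
    and each makes one edit per day for exactly `lifespan_days` days."""
    ages = [observe_day - start                      # age of the editor who joined on `start`
            for start in range(observe_day + 1)
            if observe_day - start < lifespan_days]  # still active on observe_day
    return sum(ages) / len(ages)


# Two-year lifespan, observed early (day 100) and long after start-up (day 1000).
print(steady_state_mean_age(730, 100))   # 50.0: still rising with project age
print(steady_state_mean_age(730, 1000))  # 364.5: flat at about half the lifespan
```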
Tim Starling wrote:
Turns out we can.
http://meta.wikimedia.org/wiki/Image:Days_since_first_edit.png
Average was easiest, so I did that as a first try. It shows an initial upward trend, stabilising in early 2005 at about 300 days.
How's that?
That's pretty cool. :)
--Jimbo
It'd be nice to have something like this outside of Wikimedia, though...
On Jun 5, 2006, at 5:51 AM, Jimmy Wales wrote:
That's pretty cool. :)
--Jimbo
I wrote:
Average was easiest, so I did that as a first try. It shows an initial upward trend, stabilising in early 2005 at about 300 days.
Here's a histogram series for the same data in the form of an animated GIF. It shows the distribution of editor age, averaged over a whole year, for years 2002-2005.
http://meta.wikimedia.org/wiki/Image:Days_since_first_edit_histogram.gif
-- Tim Starling
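A histogram like this can be built by binning each edit's editor age (days since first edit) by year, and converting the counts into percentages of that year's edits. The same binning underlies the 100-day percentage table later in the thread. In this sketch, the input format of `(year, age_days)` per edit and the half-open bucket boundaries are assumptions. A bucket labelled X covers ages from X - 100 up to, but not including, X.

```python
from collections import Counter, defaultdict


def age_histogram_by_year(edits, bucket_days=100):
    """edits: iterable of (year, age_days), one pair per edit.

    Returns {year: {bucket_upper_bound: percentage of that year's edits}},
    where bucket X covers ages in [X - bucket_days, X).
    """
    counts = defaultdict(Counter)
    for year, age in edits:
        upper = (age // bucket_days + 1) * bucket_days
        counts[year][upper] += 1
    result = {}
    for year, c in counts.items():
        total = sum(c.values())
        result[year] = {upper: 100 * n / total for upper, n in sorted(c.items())}
    return result


sample = [(2005, 0), (2005, 50), (2005, 150), (2005, 399), (2004, 10)]
print(age_histogram_by_year(sample)[2005])  # {100: 50.0, 200: 25.0, 400: 25.0}
```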
Thanks for posting the data. I replotted it as a set of four superimposed lines, and here's my interpretation:
* Overall not much has changed between 2002 and now. Obviously we now have a "longer tail", but other than that the curves are very similar.
* 2003 was a very un-newbie year. It had the sharpest dip in newbie contributions and a strong middle section (300-700 days).
* 2004 was more newbieish than 2005: more contributions under 300 days, fewer between 300 and 700 days, and about the same afterwards.
In fact, breaking age ranges up into 0-300, 300-700 and 700+ days (just because that's what the data shows), the years rank as follows (most contributions to least contributions):
0-300: 2002, 2004, 2005, 2003
300-700: 2003, 2005, 2004, 2002
700+: 2004, 2005, 2003, 2002
But I'm not really finding any major conclusions leaping out. It would be silly to talk of "drop off in newbie participation" because all these figures are proportional, not absolute.
Just for interest, here are the percentage contributions for 2005 divided into blocks of 100 days. Read the column on the left as "People with up to X days"
Days   % of edits
100    27.43
200    18.40
300    15.01
400    12.50
500     8.76
600     6.16
700     4.09
800     3.69
900     2.31
1000    1.64
I think someone else pointed out the medians, but just to double-check, they are (in days):
2002: 150
2003: 240
2004: 180
2005: 220
I calculated these semi-manually and it's quite possible my method was wonky. Basically, I tried to find the figure where half the occurrences were from younger editors, and half were older editors. No guarantees.
Steve
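The "half the edits from younger editors, half from older" search above can be made mechanical. Walk the cumulative percentages and interpolate linearly inside the bucket where they cross 50%. This sketch assumes edits are spread evenly within each 100-day bucket, and that each row of Steve's 2005 table covers the 100 days up to its label. On that table it gives roughly 228 days, close to the 220 estimated by hand.

```python
def binned_median(buckets, bucket_days=100):
    """buckets: list of (upper_bound_days, percentage), in increasing order,
    each bucket covering the `bucket_days` days up to its upper bound.
    Linearly interpolates the point where the cumulative share reaches 50%."""
    cumulative = 0.0
    for upper, pct in buckets:
        if cumulative + pct >= 50:
            lower = upper - bucket_days
            return lower + (50 - cumulative) / pct * bucket_days
        cumulative += pct
    return None  # the listed buckets never reach 50%


table_2005 = [(100, 27.43), (200, 18.40), (300, 15.01), (400, 12.50),
              (500, 8.76), (600, 6.16), (700, 4.09), (800, 3.69),
              (900, 2.31), (1000, 1.64)]
print(round(binned_median(table_2005)))  # 228
```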
On 6/5/06, Tim Starling t.starling@physics.unimelb.edu.au wrote:
wikitech-l@lists.wikimedia.org