We have a new article in The Atlantic,
http://www.theatlantic.com/technology/archive/2012/10/surmounting-the-insurm...
(which, by the way, I found by following Dario's Twitter, @ReaderMeter, which I recommend)
and this is still the same story of whether we have reached the limit of what can be written, etc. Without going into the details of this animated debate (I have something to say: for instance, I just created two articles which have about a hundred red links, and the material to fill in these red links is available, but this would lead us away from the topic), I am curious whether anybody has ever tried to estimate the possible number of notable topics for articles. On short time scales, it should grow linearly with time, since we have new sports events, elections, TV shows, movies, books, etc., and many people who were previously not notable become notable. Thus, this number should be approximately
N = a + b (t-2012),
where a is the number of topics notable now, t is the time in years, and b is the number of new topics which become notable every year.
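As a rough illustration of the linear model (a sketch, not part of the original post), the function below evaluates N for given a and b. The inputs are placeholders, not measurements: a = 120 million is the figure cited later in this thread from Emijrp's study, and b = 50 000 is an arbitrary value within the "tens of thousands" guessed here.

```python
def notable_topics(a: float, b: float, year: float, base_year: int = 2012) -> float:
    """Linear model N = a + b * (t - base_year).

    a: topics notable in base_year; b: new notable topics per year; year: t.
    """
    return a + b * (year - base_year)


if __name__ == "__main__":
    # Placeholder inputs (assumptions, not measurements):
    # a = 120 million topics, b = 50 000 new topics per year.
    a, b = 120_000_000, 50_000
    for year in (2012, 2020, 2030):
        print(year, notable_topics(a, b, year))
```

With these placeholders, even two decades of linear growth add only 1 million topics, under 1% of a, so on short time scales the answer is dominated by a.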
Has there been any research on the orders of magnitude of a and b? I guess b must be in the order of tens of thousands, since we are talking about people. What is a? Is it dominated by the number of insect species, or celestial bodies, or what?
I tried to ask this question several years ago on the Russian Wikipedia, but there was no conclusive answer.
Cheers Yaroslav
hi,
hmm...
But what would be the unit of measurement? Also, by analogy: the resolution of new cameras improves from year to year. When exactly should it stop? There's no easy answer, because it all depends on how much you think you should be able to magnify a picture without pixelization.
I'd say that for all practical purposes Wikipedia will be saturated when the vast majority of searches are covered, users find an abundance of information on whatever topic they research, and this information is given to them at exactly the level of sophistication they're ready to comprehend. We're waaay far from this ideal.
best,
dariusz
On Sun, 28 Oct 2012 11:08:29 +0100, Dariusz Jemielniak wrote:
Hi Dariusz,
I do not understand your question. In my formula, a is measured in articles, and b in articles per year, as stated.
I believe there are two different issues. The first is the maximum possible number of articles (this is what I asked). The second is how many of them we can actually write: for all practical purposes (the manpower we have, the time until Wikipedia collapses and ceases to exist, etc.) we will only be able to write a tiny fraction of them. This is why the media are discussing questions like whether the English Wikipedia will ever reach 5M articles. I think this second issue is much more complex; it has to do with editor retention dynamics and the general lifetime of internet companies.
Cheers Yaroslav
Yup, my bad, misread - sorry :)
dj
On Sun, Oct 28, 2012 at 9:25 PM, Yaroslav M. Blanter putevod@mccme.ru wrote:
There is a lot of content missing. The maximum could actually be quite large. A fair amount of material simply has not been adequately covered to begin with. It isn't just that new notable topics (politicians, sports competitors, sports team seasons, hurricanes, elections, etc.) keep appearing; there is a huge fountain of uncreated articles about these subjects in pre-existing literature. Beyond that, valid spin-off articles do not yet exist for many topics. (Within my own framework, there are few articles on women's sport in a given country, or on specific women's sports in a given country.) [[Sport in Kiribati]] does not exist, nor does [[Women's sport in Kiribati]]. And this goes further down... With the way the English Wikipedia is structured, you could have an endless variety of these as topics get more and more filled in.
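The point that spin-off articles multiply can be sketched as a simple combinatorial enumeration. This is a minimal illustration, assuming hypothetical lists of countries, sports, and audiences and a hypothetical "[Women's ]<sport> in <country>" title pattern chosen to reproduce the two titles above; it is not a Wikipedia naming convention or API. The number of candidate titles is the product of the list lengths, so each new dimension multiplies the count.

```python
from itertools import product

# Hypothetical example dimensions; a real enumeration would use full lists.
COUNTRIES = ["Kiribati", "Tuvalu", "Nauru"]
SPORTS = [None, "football", "athletics"]  # None stands for the general "sport" article
AUDIENCES = ["", "Women's "]              # "" = general article, "Women's " = women's spin-off


def spin_off_titles(countries, sports, audiences):
    """Build candidate titles like 'Sport in Kiribati' or "Women's football in Nauru"."""
    titles = []
    for country, sport, audience in product(countries, sports, audiences):
        subject = "sport" if sport is None else sport
        title = f"{audience}{subject} in {country}"
        titles.append(title[0].upper() + title[1:])  # capitalize only the first letter
    return titles


if __name__ == "__main__":
    titles = spin_off_titles(COUNTRIES, SPORTS, AUDIENCES)
    # The count is the product of the list lengths: 3 * 3 * 2 here.
    print(len(titles))
    print(titles[:4])
```

Adding, say, a youth or Paralympic dimension would double or triple the count again, which is the "endless variety" effect in miniature.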
On Sun, 28 Oct 2012 21:58:10 +1100, Laura Hale wrote:
Absolutely. As I mentioned, just today I created an article which contains about 50 redlinks, and these redlinked articles are clearly notable. The problem is that I currently seem to be the only editor on the English Wikipedia qualified to write these articles, and I am busy with other things, so I am not available either. I will probably not be able to finish even what I am working on now before Wikipedia ceases to exist or I die, whichever comes first.
Cheers Yaroslav
Regarding a, there is this fine study by Emijrp: http://en.wikipedia.org/wiki/User:Emijrp/All_human_knowledge
Apparently, a would be roughly 120 000 000.
As media coverage and scientific research become more efficient every year, I suspect that b is constantly growing and follows a geometric progression. Yet I don't know how to work that out concretely: perhaps something like 100 000 * 1.05^n.
100 000 being a low estimate given the current growth of knowledge (new biological species, new people, new political issues, new scientific concepts and discoveries…).
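A minimal sketch of what the geometric guess implies for the total, assuming that n counts years from 2012 with n = 0 as the first year, a = 120 000 000 as quoted above, and b = 100 000 * 1.05^n new topics in year n. Summing b over the years gives a geometric series with the closed form a + b0 (r^n - 1) / (r - 1). The function names are illustrative only.

```python
def new_topics_in_year(n: int, b0: float = 100_000, r: float = 1.05) -> float:
    """Geometric model for b: b0 * r**n new notable topics in year n (n = 0 is the first year)."""
    return b0 * r ** n


def total_topics(years: int, a: float = 120_000_000,
                 b0: float = 100_000, r: float = 1.05) -> float:
    """Notable topics after `years` years: a plus the sum of b over years 0..years-1.

    Closed form of the same sum: a + b0 * (r**years - 1) / (r - 1).
    """
    return a + sum(new_topics_in_year(n, b0, r) for n in range(years))


if __name__ == "__main__":
    for years in (1, 10, 50):
        linear = 120_000_000 + 100_000 * years  # b held constant, for comparison
        print(years, round(total_topics(years)), linear)
```

Under these assumptions, ten years add about 1.26 million topics versus 1 million with constant b, and fifty years add roughly 21 million versus 5 million, so the growth of b matters only on longer horizons.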
PCL
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
On Sun, 28 Oct 2012 11:20:59 +0100, Pierre-Carl Langlais wrote:
Thanks, sounds reasonable.
Cheers Yaroslav
Re: the article. It seems to be one of a number of opinion pieces that use the War of 1812 as their primary example. It must be some new scientific method: proof by War of 1812 :-)
But more seriously, I think the potential for new articles in Wikipedia is limited only by the definition of notability, for which the primary requirement is good-quality sources. So the more that is written, the more there is to write about in Wikipedia. Even if we restricted ourselves to new articles on topics notable prior to 2013 (say), we would still have enormous growth potential.
Generally, Wikipedia has better coverage of contemporary topics than historical ones, because the WWW provides easy access to more sources for topics of contemporary notability than for historic notability. But if every single episode of Seinfeld is notable (as it must be, since each has a WP article!), then surely every book/song/poem/artwork that has ever been reviewed is notable too. And given the apparent notability of current sportspeople and of the results of what seems like every football season, tennis tournament, athletics meet, etc., surely history has plenty of equally notable topics of a similar kind: jousting tournaments in 1517 in Avignon, etc. What about racehorses? A lot has been written on their pedigree, form and prospects for centuries. Lots of growth potential there too.
History has a wealth of new articles for Wikipedia of at least the same notability as current subjects. Whether anyone wants to write them or anyone wants to read them, only time will tell. Notability doesn't necessarily make something interesting to a modern reader. But there is a massive "long tail" of historically notable topics that could be written about.
On 28/10/2012, at 8:55 PM, "Yaroslav M. Blanter" putevod@mccme.ru wrote:
I'm with Kerry.
By the way, 120 million notable articles are possible, but the estimate is far from complete, so the real figure is certainly greater. I love these discussions.
2012/10/28 Kerry Raymond kerry.raymond@gmail.com
Doesn't Russell's paradox apply here? Surely an appropriate lemma would be that no finite number of notable subjects can ever be given.
Fae