Date: Mon, 14 Feb 2011 03:16:12 +0000 From: Ian Woollard ian.woollard@gmail.com Subject: [WikiEN-l] Rating the English wikipedia
This encyclopedia has been rated as C-Class on the project's quality scale. This encyclopedia has been checked against the following criteria for B-Class status:
<snip>
- Coverage and accuracy: criterion not met (currently 3.5 million
of an estimated 4.4 million articles)
<snip>
You think there are only 4.4 million possible topics? Based on what criteria? Stevertigo also thought this in the essay Wikipedia:Concept limit, which I tagged as [citation needed]. There are probably tens of millions of potentially notable topics, if not hundreds of millions. However, we're better at deleting new articles than writing them and writing a new article that will survive these days requires more detailed research than in years gone by.
On 14 February 2011 20:04, Fences&Windows fences_and_windows@yahoo.co.uk wrote:
Date: Mon, 14 Feb 2011 03:16:12 +0000 From: Ian Woollard ian.woollard@gmail.com Subject: [WikiEN-l] Rating the English wikipedia
This encyclopedia has been rated as C-Class on the project's quality scale. This encyclopedia has been checked against the following criteria for B-Class status:
<snip>
- Coverage and accuracy: criterion not met (currently 3.5 million
of an estimated 4.4 million articles)
<snip>
You think there are only 4.4 million possible topics? Based on what criteria? Stevertigo also thought this in the essay Wikipedia:Concept limit, which I tagged as [citation needed]. There are probably tens of millions of potentially notable topics, if not hundreds of millions. However, we're better at deleting new articles than writing them and writing a new article that will survive these days requires more detailed research than in years gone by.
I agree. There are far more than 4.4 million possible topics. Consider all the human settlements that we could write articles about. There could well be millions of those (I really don't know how many there are).
On 14 February 2011 20:04, Fences&Windows fences_and_windows@yahoo.co.uk wrote:
From: Ian Woollard ian.woollard@gmail.com
- Coverage and accuracy: criterion not met (currently 3.5 million
of an estimated 4.4 million articles)
You think there are only 4.4 million possible topics? Based on what criteria?
I recall someone (Ray Saintonge?) working out there'd be at least 20 million, just going on placenames and politicians that are currently in all the large WPs. Anyone got a link on hand to that?
- d.
On Mon, Feb 14, 2011 at 3:17 PM, David Gerard dgerard@gmail.com wrote:
On 14 February 2011 20:04, Fences&Windows fences_and_windows@yahoo.co.uk wrote:
From: Ian Woollard ian.woollard@gmail.com
- Coverage and accuracy: criterion not met (currently 3.5 million
of an estimated 4.4 million articles)
You think there are only 4.4 million possible topics? Based on what criteria?
I recall someone (Ray Saintonge?) working out there'd be at least 20 million, just going on placenames and politicians that are currently in all the large WPs. Anyone got a link on hand to that?
Perhaps http://en.wikipedia.org/wiki/User:Piotrus/Wikipedia_interwiki_and_specialize...
On 14 February 2011 20:48, Gwern Branwen gwern0@gmail.com wrote:
On Mon, Feb 14, 2011 at 3:17 PM, David Gerard dgerard@gmail.com wrote:
I recall someone (Ray Saintonge?) working out there'd be at least 20 million, just going on placenames and politicians that are currently in all the large WPs. Anyone got a link on hand to that?
Perhaps http://en.wikipedia.org/wiki/User:Piotrus/Wikipedia_interwiki_and_specialize...
That's the one!
There's a *heck* of a lot still to be written.
- d.
There are two approaches to predicting the size of Wikipedia, one based on working out how many articles would meet the general notability guideline, the other charting how we have grown and extrapolating the curve.
I'm not totally convinced by the 20 million theory based on articles in different Wikipedias that aren't interwiki linked. I suspect that a bit more work at finding interwiki links would chip away at that. I know from the death anomalies project http://meta.wikimedia.org/wiki/Death_anomalies_table that we are still adding interwiki links, and I'm pretty sure that we've added a lot in the 18 months since the 20 million prediction was made. So the potential size of the pedia might be less than twenty million, but I'm pretty sure it is many millions more than the 3.55 million we currently have. Provided we keep our notability policy and if we can rein in the deletionists, there are a lot of notable topics that don't have articles yet.
There was an extrapolation of the trend done in 2007 that predicted we'd peak at 3.5 million http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia%27s_growth#Logist...
We are currently 1% above that and still growing.
The 4.4 million prediction comes from the Gompertz model http://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia
But the vulnerability of that model, as with any extrapolation, is that the thing you are modelling can change. If something like WYSIWYG editing were to bring in a new wave of editors then the model would break and it would be possible to think in terms of how many potential articles qualify.
WereSpielChequers
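For readers unfamiliar with the Gompertz curve mentioned above, here is a minimal sketch of its shape. Only the 4.4 million ceiling and the 3.5 million current count come from the thread; the rate constant is a made-up illustrative value, not the one fitted on the Size of Wikipedia page.

```python
import math

ASYMPTOTE = 4_400_000  # ceiling quoted in the thread
CURRENT = 3_500_000    # article count quoted in the thread, taken as t = 0
RATE = 0.1             # HYPOTHETICAL decay rate per year, not a fitted value

# Choose the shape constant so the curve passes through CURRENT at t = 0.
B = math.log(ASYMPTOTE / CURRENT)


def gompertz(t_years: float) -> float:
    """Article count t_years after the starting point."""
    return ASYMPTOTE * math.exp(-B * math.exp(-RATE * t_years))


for years in (0, 10, 50, 200):
    print(years, round(gompertz(years)))
```

The ceiling is built into the formula, which is the vulnerability described above: if the editor population changed, the curve would have to be refitted rather than trusted.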
WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: https://lists.wikimedia.org/mailman/listinfo/wikien-l
On 14/02/2011 22:31, WereSpielChequers wrote:
<snip>
If something like WYSIWYG editing were to bring in a new wave of editors then the model would break and it would be possible to think in terms of how many potential articles qualify.
I think there is a point here. There are certainly a number of valid topics without articles in enWP (a million is a good enough figure), but the question is how many people will (a) think they should be written, and then (b) do something about it. The demographics of "new editors" have something to do with (a). We certainly need new editors upgrading our older articles where that has not been done, also (which is on-topic for the thread).
Much of this discussion seems to work still with a rather primitive model of how editors assign themselves to tasks. Among tasks is seeing what the encyclopedia needs by direct inspection of existing content.
Charles
On 14/02/2011, David Gerard dgerard@gmail.com wrote:
On 14 February 2011 20:48, Gwern Branwen gwern0@gmail.com wrote:
Perhaps http://en.wikipedia.org/wiki/User:Piotrus/Wikipedia_interwiki_and_specialize...
Oh riiiiiiiiiiiiiiiiiiiight. So back in 2006, Piotrus claims that there should be 400 million articles.
It turns out he based this essentially only on biographies. In Poland.
Quick sanity check: that's about one bio article for every twentieth person alive on the entire planet. And these would be encyclopedically *notable* people would they?
We can easily see that that's not going to happen, even allowing for the fact that lots of people have died already, most people just aren't that notable, and the current population completely swamps historical populations.
OK, so how did this happen? So I checked back through the history of the article. The first claim was that it essentially needs 400 million biographies of people. It turns out that the 400 million was based on dividing 30 into 1000 to get 0.3% and then dividing that into the biographies in the English Wikipedia. But... 30 in 1000 is 3%. So he's already out by a factor of 10. That's bad enough. So now we're down to 40 million.
His next error is assuming that the English Wikipedia is off by a factor of 33 on its biographies *worldwide*, as opposed to having a blind patch on Poland.
So let's look at this. The biographical encyclopedia that he mentions has 25,000 entries. Poland has 38 million people. So less than 1 person in a thousand is notable in Poland according to this encyclopedia.
I then checked the British biography 'Who's who'. They have about 30,000 entries, but that's only about 1 person in 2000 in Great Britain, so even less.
But again, roughly 1 person in 1000.
The world population is currently about 7 billion.
So if it's as high as 1 in a 1000 then that's about 7 million articles, and to be honest in reality it's probably a *lot* less, a lot of people globally do things like subsistence level farming, and are thus far less likely to be notable. So even that is excessively favourable.
I would guess we're looking at a few million biographies needed, worldwide at the very most. And sure, there's probably other biographical encyclopedias out there, and they may list a few more that Who's who misses, but that kind of thing depends on notability as to whether they'd survive AFDs in a general encyclopedia.
Anyway, so I stop there. Even 40 million appears completely unsupportable. It looks like it's off again by about another order of magnitude.
So, to sum up, this article's claim of 400 million is just based on simple and obvious arithmetic and logical errors, and seems to be two orders of magnitude too high.
- d.
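The sanity check in the message above can be replayed in a few lines. This is only a sketch of the arithmetic as stated there; every input is a figure quoted in the message, and the variable names are mine.

```python
# All inputs are figures quoted in the message above.
ESSAY_CLAIM = 400_000_000          # biographies the essay says are needed
WORLD_POPULATION = 7_000_000_000   # "about 7 billion"

# Slip 1: 30 in 1000 was treated as 0.3% when it is 3%.
claimed_share = 0.003
actual_share = 30 / 1000
corrected_claim = ESSAY_CLAIM * claimed_share / actual_share

# People per entry in the two reference works mentioned.
poland_people_per_entry = 38_000_000 / 25_000
whos_who_people_per_entry = 2000   # "about 1 person in 2000"

# Generous ceiling: one notable person per 1000 people alive today.
ceiling = WORLD_POPULATION / 1000

print(f"essay claim: one biography per {WORLD_POPULATION / ESSAY_CLAIM:.1f} people alive")
print(f"after fixing the percentage slip: {corrected_claim:,.0f}")
print(f"Polish dictionary: one entry per {poland_people_per_entry:,.0f} people")
print(f"ceiling at 1 in 1000: {ceiling:,.0f}")
```

The ceiling counts only people alive today, which is the point the replies below take issue with.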
On 15 February 2011 04:00, Ian Woollard ian.woollard@gmail.com wrote:
Anyway, so I stop there. Even 40 million appears completely unsupportable. It looks like it's off again by about another order of magnitude.
Oh really?
People have been keeping records for a long time. Western Europe has very comprehensive records going back 200 years. More patchy records stretch back about 8000 years.
When you consider the number of politicians, military leaders, aristocracy, industrialists, sportspeople, scientists, writers, artists, musicians, performers and general hangers on there have been in that time it's quite a lot of people.
How many is probably impossible to calculate. There are various lines of attack: asking "how many people does it take to make a person notable", or random sampling of the electoral roll, would be one way to make a start, but as far as I'm aware we haven't done so. We can establish a lower bound, since Thomson-Gale's Biography Resource Center contains over 1,335,000 biographies.
On Tue, Feb 15, 2011 at 4:33 AM, geni geniice@gmail.com wrote:
We can establish a lower bound since the Thomson-Gale's Biography Resource Center contains over 1,335,000 biographies.
The 2007 edition of the ODNB (British biographical history) has "50,113 biographical articles covering 54,922 lives". What criteria are used for the Thomson-Gale's Biography Resource Center? We don't have an article on that, though we do have this:
http://en.wikipedia.org/wiki/Biography_and_Genealogy_Master_Index
"The Biography and Genealogy Master Index (BGMI) was a printed reference index, and is currently a proprietary database published by the Gale Research Company. The database indexes more than 15 million individuals, living and deceased, covered in more than 1700 biographical reference sources."
Is that something different?
http://www.gale.cengage.com/servlet/BrowseSeriesServlet?region=9&imprint...
Carcharoth
On 15 February 2011 11:22, Carcharoth carcharothwp@googlemail.com wrote:
On Tue, Feb 15, 2011 at 4:33 AM, geni geniice@gmail.com wrote:
We can establish a lower bound since the Thomson-Gale's Biography Resource Center contains over 1,335,000 biographies.
The 2007 edition of the ODNB (British biographical history) has "50,113 biographical articles covering 54,922 lives". What criteria are used for the Thomson-Gale's Biography Resource Center? We don't have an article on that, though we do have this:
http://en.wikipedia.org/wiki/Biography_and_Genealogy_Master_Index
"The Biography and Genealogy Master Index (BGMI) was a printed reference index, and is currently a proprietary database published by the Gale Research Company. The database indexes more than 15 million individuals, living and deceased, covered in more than 1700 biographical reference sources."
Is that something different?
http://www.gale.cengage.com/servlet/BrowseSeriesServlet?region=9&imprint...
Carcharoth
It's something listed at:
http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons
On 15 February 2011 04:33, geni geniice@gmail.com wrote:
On 15 February 2011 04:00, Ian Woollard ian.woollard@gmail.com wrote:
Anyway, so I stop there. Even 40 million appears completely unsupportable. It looks like it's off again by about another order of magnitude.
Oh really?
Yeah, really. That page claims we only have 3% of notable Poles. Are you really, seriously, telling me we only have 3% of ALL notable biographies??? Because that's what that page is assuming to calculate that 40 million.
People have been keeping records for a long time. Western Europe has very comprehensive records going back 200 years. More patchy records stretch back about 8000 years.
Yup.
When you consider the number of politicians, military leaders, aristocracy, industrialists, sportspeople, scientists, writers, artists, musicians, performers and general hangers on there have been in that time it's quite a lot of people.
How many is probably impossible to calculate.
It's not impossible to calculate: you look at the counts from encyclopedias of famous people. And they very typically list historical people as well as living people.
--
geni
On 15 February 2011 16:19, Ian Woollard ian.woollard@gmail.com wrote:
Yeah, really. That page claims we only have 3% of notable Poles. Are you really, seriously, telling me we only have 3% of ALL notable biographies??? Because that's what that page is assuming to calculate that 40 million.
It's possible. Our coverage of say British MPs starts to fall apart pre-20th century.
It's not impossible to calculate: you look at the counts from encyclopedias of famous people. And they very typically list historical people as well as living people.
But they all hit dead tree limitations. Sure you can choose a very narrow focus book like the alphabet of the saints. So it seems pretty likely that up until 1992 Southampton FC had a bit over 700 players about which it would be possible to write something. But such books don't really exist for far areas.
Still assuming players play for an average of 2 clubs (remember players didn't used to move around as much) you are looking at about 28 000 English male football bios up until 1992.
But how many captains of the royal navy are notable? How many knights? Mayors?
So while yes, it may be possible to work out how many bios there could be for some individual areas, more generally I don't think it can be done.
On 15/02/2011, geni geniice@gmail.com wrote:
On 15 February 2011 16:19, Ian Woollard ian.woollard@gmail.com wrote:
Yeah, really. That page claims we only have 3% of notable Poles. Are you really, seriously, telling me we only have 3% of ALL notable biographies??? Because that's what that page is assuming to calculate that 40 million.
It's possible. Our coverage of say British MPs starts to fall apart pre-20th century.
But should each MP necessarily have his own biography?
It's not impossible to calculate: you look at the counts from encyclopedias of famous people. And they very typically list historical people as well as living people.
But they all hit dead tree limitations.
Then they're not capable of being reliably sourced.
Sure you can choose a very narrow focus book like the alphabet of the saints. So it seems pretty likely that up until 1992 Southampton FC had a bit over 700 players about which it would be possible to write something. But such books don't really exist for far areas.
Then there's no sources, and no biography.
Still assuming players play for an average of 2 clubs (remember players didn't used to move around as much) you are looking at about 28 000 English male football bios up until 1992.
Only if they're notable, and reliably sourced. I don't think they're notable enough to have their own article simply for having played.
But how many captains of the royal navy are notable? How many knights? Mayors?
Indeed.
So while yes, it may be possible to work out how many bios there could be for some individual areas, more generally I don't think it can be done.
So you're saying that you don't know; and it's not a lot of use is it?
-- geni
On 15 February 2011 18:17, Ian Woollard ian.woollard@gmail.com wrote:
On 15/02/2011, geni geniice@gmail.com wrote:
On 15 February 2011 16:19, Ian Woollard ian.woollard@gmail.com wrote:
Yeah, really. That page claims we only have 3% of notable Poles. Are you really, seriously, telling me we only have 3% of ALL notable biographies??? Because that's what that page is assuming to calculate that 40 million.
It's possible. Our coverage of say British MPs starts to fall apart pre-20th century.
But should each MP necessarily have his own biography?
It's not impossible to calculate: you look at the counts from encyclopedias of famous people. And they very typically list historical people as well as living people.
But they all hit dead tree limitations.
Then they're not capable of being reliably sourced.
Of course they are. It's just the sources are things other than encyclopedias of famous people
Only if they're notable, and reliably sourced. I don't think they're notable enough to have their own article simply for having played.
In practice yes they are. Local newspapers tend to use their local sports teams as filler.
So you're saying that you don't know; and it's not a lot of use is it?
No I'm saying it wasn't possible to know. You were the one who claimed it was.
On 15/02/2011 18:17, Ian Woollard wrote:
On 15/02/2011, geni geniice@gmail.com wrote:
On 15 February 2011 16:19, Ian Woollard ian.woollard@gmail.com wrote:
Yeah, really. That page claims we only have 3% of notable Poles. Are you really, seriously, telling me we only have 3% of ALL notable biographies??? Because that's what that page is assuming to calculate that 40 million.
It's possible. Our coverage of say British MPs starts to fall apart pre-20th century.
But should each MP necessarily have his own biography?
Arguably the answer is "yes", back to the 16th century at least. There has actually been quite a lot of havoc onsite over stub MP biographies during the past year, but it transpires that there are pretty good sources back to 1660, and usually adequate sources in the century leading up to that (if you work at it). The ODNB took a decision not to include all MPs (it says somewhere, in terms that suggest that it was a decision that did at least require a moment's thought). Some parliaments of Henry VIII are apparently lacking lists of MPs, but after then it seems like a good use of WP to collate this information.
Charles
On 15 February 2011 20:18, Charles Matthews charles.r.matthews@ntlworld.com wrote:
Arguably the answer is "yes", back to the 16th century at least. There has actually been quite a lot of havoc onsite over stub MP biographies during the past year, but it transpires that there are pretty good sources back to 1660, and usually adequate sources in the century leading up to that (if you work at it). The ODNB took a decision not to include all MPs (it says somewhere, in terms that suggest that it was a decision that did at least require a moment's thought).
There is a project (even longer-running and slower-burning than the ODNB) to construct a reference work covering all MPs, at least as much as they're known, along with various other bits and pieces:
http://www.histparl.ac.uk/about.html
In the past sixty years, they've managed to cover a little over half the timeframe in twenty-eight (!) volumes. I have never seen their work, I admit, but I'd be intrigued to...
On Tue, Feb 15, 2011 at 8:56 PM, Andrew Gray andrew.gray@dunelm.org.uk wrote:
There is a project (even longer-running and slower-burning than the ODNB) to construct a reference work covering all MPs, at least as much as they're known, along with various other bits and pieces:
Or try http://en.wikipedia.org/wiki/History_of_Parliament for some explanation of its history.
In the past sixty years, they've managed to cover a little over half the timeframe in twenty-eight (!) volumes. I have never seen their work, I admit, but I'd be intrigued to...
I have the CD-Rom containing the volumes published up to 1998 and 12 volumes published since then are on a shelf just above the computer. They are very interesting studies, delving very deep into manuscript sources and using as their sources letters between various senior politicians preserved in the archives. They concentrate only on the subjects' Parliamentary and political activities, so for example the only mention of the diary of Samuel Pepys (MP for Castle Rising 1673-79, Harwich 1679 and 1685-88) is that Pepys stopped writing it before he became an MP.
On 15 February 2011 04:00, Ian Woollard ian.woollard@gmail.com wrote:
I then checked the British biography 'Who's who'. They have about 30,000 entries, but that's only about 1 person in 2000 in Great Britain, so even less.
This is actually quite an interesting angle to come at the problem from.
Who's Who has 34,210 people in it (the selection process is "notable" by their standards, "related to the UK", though this is sometimes stretched, and currently living). Their "legacy archive", of people who were at some point included since publication began c. 1900, is larger; it runs to 89,763 names - thus a total of ~124,000 people, of whom 28% are currently alive.
But that's, of course, an undercount of all people "notable and related to the UK".
* Firstly, Who's Who has gaps; it has an idiosyncratic and, historically, quite old-fashioned selection process. My current work is on the sort of person that stuffy establishment reference works thrived on, but I find perhaps 20% of them aren't covered.
* Secondly, the gaps involve systemic biases; to consider one we can easily check for, only 13% of the "current" biographies are women, and a tiny 4% of the "old" biographies are.
* Thirdly - perhaps the biggest element - notability didn't begin with the people still breathing in 1900. The Who's Who figures don't reflect the long tail of historical biographies from the past; a conservative estimate might be to double or triple the figures.
After making appropriate adjustments for these, we find that the data suggests there might be 400,000 potentially suitable biographies out there within the broad geographical remit of Who's Who; expanding that to the world as a whole would begin to push the high seven figures.
Or, to look at it another way... we currently have around half a million BLPs from around the world. *Without* correcting for the long tail of dead people, then our known coverage of BLPs would suggest there should be around 1,800,000 total "possible" biographies. If we *do* make a corresponding adjustment, then the expected total comes in at three to four million biographies. And, of course, we have known gaps in our BLP coverage, suggesting the total number would come out higher...
We currently have around 900,000 biographies. So even by a *highly conservative* estimate, taking for the sake of argument that we have 100% coverage of living biographies and that the number of people notable before the late nineteenth century was trivial, there'd still be, at the very least, a million notable past biographies still waiting to be written...
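The chain of estimates in this message can be followed step by step. The sketch below uses only the figures quoted above; the one assumption is the long-tail multiplier of 2, which is the low end of the "double or triple" suggestion and is the value that lands in the stated "three to four million" range.

```python
# Who's Who counts quoted in the message.
current_entries = 34_210   # people currently listed
legacy_entries = 89_763    # people listed at some point since c. 1900

total_entries = current_entries + legacy_entries
living_share = current_entries / total_entries   # roughly 28%

# Wikipedia counts quoted in the message.
blps = 500_000             # biographies of living people
existing_bios = 900_000    # all biographies currently on enwp

# If living people are about 28% of everyone notable since c. 1900:
total_since_1900 = blps / living_share

# ASSUMPTION: double the figure for the pre-1900 long tail.
LONG_TAIL_FACTOR = 2
total_with_tail = total_since_1900 * LONG_TAIL_FACTOR

# Conservative gap: ignore the long tail and assume complete BLP coverage.
missing_at_least = total_since_1900 - existing_bios

print(f"Who's Who total: {total_entries:,} ({living_share:.0%} living)")
print(f"possible biographies since c. 1900: {total_since_1900:,.0f}")
print(f"with long tail: {total_with_tail:,.0f}")
print(f"still to write, at the very least: {missing_at_least:,.0f}")
```

Using the unrounded living share rather than a flat 28% moves the totals slightly, but they stay close to the 1.8 million and "around a million" figures given above.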
On Mon, Feb 14, 2011 at 9:54 PM, David Gerard dgerard@gmail.com wrote:
There's a *heck* of a lot still to be written.
On that topic, I came across this interesting essay:
http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia_extended_growth
It tries to project to the year 2025!
Carcharoth
On 16/02/2011, Carcharoth carcharothwp@googlemail.com wrote:
I came across this interesting essay:
http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia_extended_growth
It tries to project to the year 2025!
And fails spectacularly. The extended growth model seems pretty inaccurate, very over-optimistic:
http://en.wikipedia.org/wiki/File:Enwikipediagrowthcomparison.PNG
That graph hasn't been updated recently, but other graphs show that the Gompertz model is still tracking about as well as any simple model could do:
http://en.wikipedia.org/wiki/File:EnwikipediagrowthGom.PNG
although even that is looking perhaps very slightly pessimistic, but it's too early to be absolutely sure.
But we can certainly I think, say with some justification, that the extended growth model is significantly off the mark.
Carcharoth
On 16/02/2011 23:56, Carcharoth wrote:
On Mon, Feb 14, 2011 at 9:54 PM, David Gerard dgerard@gmail.com wrote:
There's a *heck* of a lot still to be written.
On that topic, I came across this interesting essay:
http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia_extended_growth
It tries to project to the year 2025!
I'd be interested in any discussion at all on the amount of useful material out there (on the Web) and how it is changing. It is a fact that there are more and more reliable sources posted that can be used to create articles. This is a factor that affects directly what actually gets written, as opposed to what potentially might be a topic to write about.
I think we just don't know how much will be around in 2025 that could support our work, either in the form of public domain reference material, or respectable scholarly webpages to which we can link. Extrapolations leaving out this factor aren't worth as much as they might be.
Charles
Even if the online resources didn't improve (and we could really do with a big improvement in parts of the developing world), as long as the Internet continues to be updated we can expect a steady flow of new articles. Sports, politics, popular culture and science are all going to generate new articles for the foreseeable future. We currently have half a million biographies of living people; assuming we keep our current notability standards and coverage levels, then to keep that number stable we can expect at least ten thousand more each year. So even without filling in the historical gaps there will be a steady increase in the total number of biographies on the pedia. Large gaps in our coverage of people who retired pre-Internet are slowly being filled in from the obituary pages, and that could continue for decades. Every year there will be new films, books, natural disasters and sports events. So if we still have an editor community to write them, we can expect a steady flow of new articles.
I think we need a model of article growth that blends two elements: multiple bell curves showing the process of initially populating the pedia with various subjects, and an annual input of new articles on newly notable subjects. I expect that on many subjects of interest to our first wave of editors (computing, milhist, contemporary western popular culture and the geography of the English speaking parts of the developed world) we have already gone quite a way over the top of the bell. But there are other bell curves that we are at much earlier stages of. Judging from the newpages I've seen in the last few months, populated places in the Indian subcontinent are very much on the fast rising side of the bell curve. The bell curves of species, astronomical objects, chemicals and genes are all in their early stages. In future, as new editors come on board or existing editors acquire new enthusiasms, we can expect that yet unwritten areas of the pedia will go through their own bell curve expansions.
We still have a huge influx of new editors, though very few stick around. I suspect the ultimate size of the pedia depends at least as much on the way we treat new editors as it does on the availability of easily accessible sources.
WereSpielChequers
On 17 February 2011 09:38, Charles Matthews charles.r.matthews@ntlworld.com wrote:
On 16/02/2011 23:56, Carcharoth wrote:
On Mon, Feb 14, 2011 at 9:54 PM, David Gerard dgerard@gmail.com wrote:
There's a *heck* of a lot still to be written.
On that topic, I came across this interesting essay:
http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia_extended_growth
It tries to project to the year 2025!
I'd be interested in any discussion at all on the amount of useful material out there (on the Web) and how it is changing. It is a fact that there are more and more reliable sources posted that can be used to create articles. This is a factor that affects directly what actually gets written, as opposed to what potentially might be a topic to write about.
I think we just don't know how much will be around in 2025 that could support our work, either in the form of public domain reference material, or respectable scholarly webpages to which we can link. Extrapolations leaving out this factor aren't worth as much as they might be.
Charles
On 17 February 2011 10:54, WereSpielChequers werespielchequers@gmail.com wrote:
I think we need a model of article growth that blends two elements, multiple bell curves showing the process of initially populating the pedia with various subjects, and an annual input of new articles on newly notable subjects.
Sigmoid with a linear limit, i.e. more or less what we see?
- d.
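One way to picture the blended model discussed here is a sum of S-curves, one per subject area being populated, plus a constant yearly stream of newly notable topics. The sketch below is purely illustrative: the wave sizes, midpoints, rates and the yearly figure are all invented numbers, not estimates from the thread.

```python
import math


def logistic(t, size, midpoint, rate):
    """Cumulative articles from one subject area; its yearly output is bell-shaped."""
    return size / (1 + math.exp(-rate * (t - midpoint)))


def total_articles(t, waves, yearly_new):
    """Sum of the subject-area waves plus a linear stream of new topics."""
    return sum(logistic(t, *wave) for wave in waves) + yearly_new * t


# HYPOTHETICAL waves as (size, midpoint in years, rate); not real data.
WAVES = [
    (2_000_000, 5, 1.0),    # e.g. subjects favoured by the first editors
    (1_500_000, 12, 0.5),   # e.g. a later wave such as populated places
]
YEARLY_NEW = 50_000         # HYPOTHETICAL newly notable topics per year

for year in (0, 5, 10, 20, 40):
    print(year, round(total_articles(year, WAVES, YEARLY_NEW)))

early_growth = total_articles(6, WAVES, YEARLY_NEW) - total_articles(5, WAVES, YEARLY_NEW)
late_growth = total_articles(61, WAVES, YEARLY_NEW) - total_articles(60, WAVES, YEARLY_NEW)
print(round(early_growth), round(late_growth))
```

Once every wave has flattened out, yearly growth settles at the linear term, which is the "sigmoid with a linear limit" shape suggested above; a new wave of editors would show up as an extra entry in the list of waves.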
On 02/17/11 2:54 AM, WereSpielChequers wrote:
Even if the online resources didn't improve (and we could really do with a big improvement in parts of the developing world), as long as the Internet continues to be updated we can expect a steady flow of new articles. Sports, politics, popular culture and science are all going to generate new articles for the foreseeable future. We currently have half a million biographies of living people; assuming we keep our current notability standards and coverage levels, then to keep that number stable we can expect at least ten thousand more each year. So even without filling in the historical gaps there will be a steady increase in the total number of biographies on the pedia. Large gaps in our coverage of people who retired pre-Internet are slowly being filled in from the obituary pages, and that could continue for decades. Every year there will be new films, books, natural disasters and sports events. So if we still have an editor community to write them, we can expect a steady flow of new articles.
I missed reading this thread when it was active, but my own estimate of what still needs to be done in historical biographies alone is quite high. For most of its 177 years of publication "The Gentleman's Magazine" provided a steady diet of obituaries. If it averaged 1000 pages a year that's well over 170,000 pages of material. I now also have the first 60 years of "Notes and Queries"; it was the kind of publication that a 19th century Wikipedian would have loved to work on. It includes all sorts of fascinating oddball material. "Who's Who" was followed by "Who Was Who" for deceased persons, but there were also more narrowly focused versions for different places, and different subject areas. Out of curiosity I looked up one surname in the Spanish language "Enciclopedia universal ilustrada". Of the 30 persons with that surname enwp only had articles on 2, eswp only 1. What do we do with such things as the drawings of the proposed new gaol at Bury-St. Edmonds in the August 1801 issue of "The Gentleman's Magazine"? (Does it even still exist?) Then there's the endless stream of books that were reviewed in a wide range of 19th century periodicals. The reviews themselves are as worth reading as the books, because they often contrasted a number of publications around a chosen theme. An estimate of 20,000,000 English Wikipedia articles seems increasingly conservative. The amount of work to be done is enormous even without having to fight with the notability police.
Ec
On Wed, Jul 20, 2011 at 10:17 AM, Ray Saintonge saintonge@telus.net wrote:
I missed reading this thread when it was active, but my own estimate of what still needs to be done in historical biographies alone is quite high.
I agree, but some level of selectivity is needed. I now try and maintain a list of articles I failed to find when looking for information, and also of articles that are on other language Wikipedias but not the English one. I'll post some of those at the end.
For most of its 177 years of publication "The Gentleman's Magazine" provided a steady diet of obituaries. If it averaged 1000 pages a year that's well over 170,000 pages of material.
A good start would be a listing along with how long the obituaries are. You might find some are very short. The obvious thing to focus on is ones where other sources exist, and keep the others as a project list for now.
<snip>
What do we do with such things as the drawings of the proposed new gaol at Bury St Edmunds in the August 1801 issue of "The Gentleman's Magazine"? (Does it even still exist?)
You would first look for it in other sources, and then add it to the history section or article for Bury St Edmunds. Not all material will lend itself to a new article, and corroboration with other sources is important.
Then there's the endless stream of books that were reviewed in a wide range of 19th century periodicals. The reviews themselves are as worth reading as the books, because they often contrasted a number of publications around a chosen theme.
Eh. I'm less enthusiastic about book reviews. I'd transcribe them into Wikisource and link them from the books they review (if the books have articles, and if not, then move on).
An estimate of 20,000,000 English Wikipedia articles seems increasingly conservative. The amount of work to be done is enormous even without having to fight with the notability police.
Sometimes other sites are better suited to some material. I would start with Wikisource for some of the material you have mentioned.
Anyway, a few examples of missing articles:
Gunnarea capensis (marine polychaete worm)
Laboratoire Souterrain à Bas Bruit (LSBB, French research )
Giovanni da Vigo (1450-1525, Italian surgeon)
The latter two have articles on the French (fr) and Italian (it) Wikipedia, so could be dealt with by translation efforts, but nothing on the first example. Some of the more obscure branches of the tree of life are replete with redlinks.
Carcharoth
On 07/20/11 4:23 AM, Carcharoth wrote:
On Wed, Jul 20, 2011 at 10:17 AM, Ray Saintonge saintonge@telus.net wrote:
I missed reading this thread when it was active, but my own estimate of what still needs to be done in historical biographies alone is quite high.
I agree, but some level of selectivity is needed. I now try and maintain a list of articles I failed to find when looking for information, and also of articles that are on other language Wikipedias but not the English one. I'll post some of those at the end.
"Level of selectivity" too easily becomes an excuse for exclusion. Some of us feel that comprehensiveness is closer to the core values of Wikipedia.
For most of its 177 years of publication "The Gentleman's Magazine" provided a steady diet of obituaries. If it averaged 1000 pages a year that's well over 170,000 pages of material.
A good start would be a listing along with how long the obituaries are. You might find some are very short. The obvious thing to focus on is ones where other sources exist, and keep the others as a project list for now.
Some are indeed too short to warrant individual articles. Perhaps the entire content of an issue's obituary (the publication uses the singular to refer to the entire collection of death notices in an issue) needs to be added to Wikisource. I am looking at the October 1801 issue where there are many such stubs, as with an entry for August 16: "A poor old man, named Threadaway belonging to the workhouse at Newington, Surrey, employed in brewing beer for the use of the house, by some accident fell into the boiling liquor, and was scalded to death." This one is not likely ever to be expanded, but others easily have more useful information.
What do we do with such things as the drawings of the proposed new gaol at Bury St Edmunds in the August 1801 issue of "The Gentleman's Magazine"? (Does it even still exist?)
You would first look for it in other sources, and then add it to the history section or article for Bury St Edmunds. Not all material will lend itself to a new article, and corroboration with other sources is important.
Corroboration from other sources should not always be such a necessity. When we are dealing with 200-year-old information, that corroboration is not such an easy task. Even when it exists it is not easily accessible, or will take a great deal of effort to track down. Sometimes you just need to trust your single source on the basis of your experience with the reliability of the source. Corroboration can wait for some other day, though our one source still needs to be fully identified.
Then there's the endless stream of books that were reviewed in a wide range of 19th century periodicals. The reviews themselves are as worth reading as the books, because they often contrasted a number of publications around a chosen theme.
Eh. I'm less enthusiastic about book reviews. I'd transcribe them into Wikisource and link them from the books they review (if the books have articles, and if not, then move on).
I would be less interested in the reviews than the books themselves. It is the books themselves that need articles.
An estimate of 20,000,000 English Wikipedia articles seems increasingly conservative. The amount of work to be done is enormous even without having to fight with the notability police.
Sometimes other sites are better suited to some material. I would start with Wikisource for some of the material you have mentioned.
Anyway, a few examples of missing articles:
Gunnarea capensis (marine polychaete worm)
Laboratoire Souterrain à Bas Bruit (LSBB, French research )
Giovanni da Vigo (1450-1525, Italian surgeon)
The latter two have articles on the French (fr) and Italian (it) Wikipedia, so could be dealt with by translation efforts, but nothing on the first example. Some of the more obscure branches of the tree of life are replete with redlinks.
Absolutely! We can always easily find missing articles on an individual basis. It's the scope that's overwhelming.
Ec
On 20/07/2011 10:17, Ray Saintonge wrote:
I missed reading this thread when it was active, but my own estimate of what still needs to be done in historical biographies alone is quite high.
Yes, that is one area where the material seems available to do much more.
An estimate of 20,000,000 English Wikipedia articles seems increasingly conservative. The amount of work to be done is enormous even without having to fight with the notability police.
On the other hand, the number of active Wikipedians who know where their next 1000 articles are coming from is quite small, IMX. The emphasis on enWP is hardly on being prolific: quality is more highly rated than quantity. That may not be wrong, of course, but to some extent these things are a matter of personal taste, and should remain so. We could do with better support of the "good stub" concept, I think: probably an example of "tacit knowledge" about the site, in that editors who have been around for a while know what that means, while the manual pages have a different slant.
All discussions of the "notability" concept we use seem to end up with the generally broken nature of the thing. It is just that there is no snappy replacement. WP:GNG is a bit objectionable in the insistence on "secondary sources"; it is not completely silly but is not that helpful either when you start pushing the limits.
Charles
On 07/26/11 3:13 AM, Charles Matthews wrote:
On 20/07/2011 10:17, Ray Saintonge wrote:
I missed reading this thread when it was active, but my own estimate of what still needs to be done in historical biographies alone is quite high.
Yes, that is one area where the material seems available to do much more.
An estimate of 20,000,000 English Wikipedia articles seems increasingly conservative. The amount of work to be done is enormous even without having to fight with the notability police.
On the other hand, the number of active Wikipedians who know where their next 1000 articles are coming from is quite small, IMX. The emphasis on enWP is hardly on being prolific: quality is more highly rated than quantity. That may not be wrong, of course, but to some extent these things are a matter of personal taste, and should remain so. We could do with better support of the "good stub" concept, I think: probably an example of "tacit knowledge" about the site, in that editors who have been around for a while know what that means, while the manual pages have a different slant.
All discussions of the "notability" concept we use seem to end up with the generally broken nature of the thing. It is just that there is no snappy replacement. WP:GNG is a bit objectionable in the insistence on "secondary sources"; it is not completely silly but is not that helpful either when you start pushing the limits.
Perhaps this requires a clearer description of what is essential to a good stub.
The WP:GNG is opaque and bureaucratic. It is not suitable to much of the 19th century material that I have. "Notes and Queries" is a fascinating publication where the readership answered questions posed by others. Providing other sources for this could be extremely difficult, and none of it comes close to being subject to BLP requirements.
People who rate quality as more important than quantity fail to see the negative aspects of their condition. A simple "caveat lector" can be more reliable than any guarantee of accuracy.
Ec
On 27/07/2011 08:49, Ray Saintonge wrote:
On 07/26/11 3:13 AM, Charles Matthews wrote:
On 20/07/2011 10:17, Ray Saintonge wrote:
I missed reading this thread when it was active, but my own estimate of what still needs to be done in historical biographies alone is quite high.
Yes, that is one area where the material seems available to do much more.
An estimate of 20,000,000 English Wikipedia articles seems increasingly conservative. The amount of work to be done is enormous even without having to fight with the notability police.
On the other hand, the number of active Wikipedians who know where their next 1000 articles are coming from is quite small, IMX. The emphasis on enWP is hardly on being prolific: quality is more highly rated than quantity. That may not be wrong, of course, but to some extent these things are a matter of personal taste, and should remain so. We could do with better support of the "good stub" concept, I think: probably an example of "tacit knowledge" about the site, in that editors who have been around for a while know what that means, while the manual pages have a different slant.
All discussions of the "notability" concept we use seem to end up with the generally broken nature of the thing. It is just that there is no snappy replacement. WP:GNG is a bit objectionable in the insistence on "secondary sources"; it is not completely silly but is not that helpful either when you start pushing the limits.
Perhaps this requires a clearer description of what is essential to a good stub.
I think a discussion of the nature of "good stubs", in relation though to what we know (or rather guess) about the "long tail" of reference material that is "out there" in some form, sounds like an interesting one to have, and not one I recall having before. Basically there are things that (a) people could want to look up, (b) for which "footnote"-style answers exist and are verifiable, and (c) could appear at that sort of length in WP, where they would be an asset rather than an embarrassment. And we still don't know that much about the whole population of such things.
The WP:GNG is opaque and bureaucratic. It is not suitable to much of the 19th century material that I have. "Notes and Queries" is a fascinating publication where the readership answered questions posed by others. Providing other sources for this could be extremely difficult, and none of it comes close to being subject to BLP requirements.
Yes, a kind of reference desk for those of largely antiquarian interests in the 19th century (and onwards). The GNG has plenty wrong with it in some topic areas, which is why specialised notability guides are written. I don't think it has yet come up in the form "for historical/antiquarian purposes, what is the minimum adequate kind of answer to a query?".
One day I suppose we'll have an overview of "topic policy" based on a census of actual "topics". I think we'll have to get through our second decade before worrying about that, though.
Charles
On 07/27/11 2:42 AM, Charles Matthews wrote:
On 27/07/2011 08:49, Ray Saintonge wrote:
On 07/26/11 3:13 AM, Charles Matthews wrote:
On 20/07/2011 10:17, Ray Saintonge wrote:
I missed reading this thread when it was active, but my own estimate of what still needs to be done in historical biographies alone is quite high.
Yes, that is one area where the material seems available to do much more.
An estimate of 20,000,000 English Wikipedia articles seems increasingly conservative. The amount of work to be done is enormous even without having to fight with the notability police.
On the other hand, the number of active Wikipedians who know where their next 1000 articles are coming from is quite small, IMX. The emphasis on enWP is hardly on being prolific: quality is more highly rated than quantity. That may not be wrong, of course, but to some extent these things are a matter of personal taste, and should remain so. We could do with better support of the "good stub" concept, I think: probably an example of "tacit knowledge" about the site, in that editors who have been around for a while know what that means, while the manual pages have a different slant.
All discussions of the "notability" concept we use seem to end up with the generally broken nature of the thing. It is just that there is no snappy replacement. WP:GNG is a bit objectionable in the insistence on "secondary sources"; it is not completely silly but is not that helpful either when you start pushing the limits.
Perhaps this requires a clearer description of what is essential to a good stub.
I think a discussion of the nature of "good stubs", in relation though to what we know (or rather guess) about the "long tail" of reference material that is "out there" in some form, sounds like an interesting one to have, and not one I recall having before. Basically there are things that (a) people could want to look up, (b) for which "footnote"-style answers exist and are verifiable, and (c) could appear at that sort of length in WP, where they would be an asset rather than an embarrassment. And we still don't know that much about the whole population of such things.
In the shorter obituary notices of Gentleman's Magazine the information often follows a predictable pattern. To the extent that it is within predefined parameters it could fit well in a "List of ..." article. If a particular entry goes beyond that there is a strong argument that it warrants a stub article of its own. The notion that a second source must be provided is often unsound. While there is always the possibility of hoax entries in these old magazines, such entries would still be a tiny segment of the overall content. The majority of contributors, then as now, do so in good faith. A stub from one of these broadly based national publications will often only be mirrored in a local history that had a very small circulation. Those who complain about these stubs are often unwilling to track down even relatively common references.
The WP:GNG is opaque and bureaucratic. It is not suitable to much of the 19th century material that I have. "Notes and Queries" is a fascinating publication where the readership answered questions posed by others. Providing other sources for this could be extremely difficult, and none of it comes close to being subject to BLP requirements.
Ec
On 14 February 2011 20:48, Gwern Branwen gwern0@gmail.com wrote:
Perhaps http://en.wikipedia.org/wiki/User:Piotrus/Wikipedia_interwiki_and_specialize...
I think that page is more a test of how good we are at interwiki linking than anything else. The trend it shows is far too fast to be explained by new articles being written; it must be explained by old articles being linked to.