It is frequently said that Wikipedia is not paper. Specifically, "Wikipedia is not a paper encyclopedia. This means that there is no practical limit to number of topics we can cover other than verifiability and the other points presented on this page."
But paper is not paper, either. That is, paper encyclopedias are NOT physically limited in size. Some encyclopedias (Columbia) have one volume. Some have more. The first edition of the Encyclopedia Britannica had three volumes; the Eleventh Edition had 29. The current Britannica 3 has 32 volumes.
(By the way, the Britannica states, rather hyperbolically, that those 32 volumes offer "a boundless range of information.")
Is the print Britannica limited to 32 volumes by some kind of physical law? Certainly not. In fact, tens of thousands of households that purchase print encyclopedias wisely or foolishly subscribe to yearbook programs, often for many years, until they get tired of gluing little cross-reference stickers into their volumes. So the number of books on the shelf actually grows.
But there is a practical limit of about thirty volumes for a print publication, isn't there? No, there isn't. The existence proof is any journal. Journals can and do grow linearly, year after year, into long rows of bound volumes which libraries, if not homes, manage to find room for on their shelves. I am sure that some homes have more than 30 bound-volumes-worth of the National Geographic neatly stacked up in attics or basements.
So what DOES set the limit to what an encyclopedia can include? It is not any physical characteristic, whether measured in quarto leaves or in bytes.
It is that little detail, "verifiability and the other points presented on this page."
The limit to what an encyclopedia can include is governed basically by the available labor of editors to integrate, synthesize, verify, copy-edit, and fact-check.
What this tells me is that it should be possible to get some kind of reasonable estimate of an appropriate size for Wikipedia by estimating the number of work-hours Wikipedia's volunteers put in, and comparing it with the number of work-hours available to the Britannica.
If we're putting in three times as much work, we should be able to cover three times as much content.
If we try to cover more content than the Britannica without putting in more work than the Britannica, then our reach is exceeding our grasp.
I have no idea how to even begin estimating these numbers, but I think it would be instructive to try.
On 04/10/05, Daniel P. B. Smith dpbsmith@verizon.net wrote:
But there is a practical limit of about thirty volumes for a print publication, isn't there? No, there isn't. The existence proof is any journal. Journals can and do grow linearly, year after year, into long rows of bound volumes which libraries, if not homes, manage to find room for on their shelves. I am sure that some homes have more than 30 bound-volumes-worth of the National Geographic neatly stacked up in attics or basements.
Newspapers (not that many places still keep bound copies) are an even worse case.
Or bibliographies. The British Museum "Catalogue of Printed Books" to 1900 was 95 volumes; to 1905 was another 13-volume supplement. (Four million books, if you're wondering.) The Library of Congress /Catalog/ was 167 volumes for 1899-1942 (covering two million books; it was far less compact because it reproduced the actual catalog cards) - and 1942-7 was another forty. The Bibliothèque Nationale "Catalogue général" ran to 172 volumes as of 1948, and had only got up to 'Sim-' in the alphabet!
"Wikipedia is not paper" is a very good principle for some things - for stylistic issues, for the wonderful ability to massively crosslink, for categorisation and backlinks and the ability to scrawl marginal notes everywhere - but I concur, it's not that meaningful a guideline for inclusion issues.
-- - Andrew Gray andrew.gray@dunelm.org.uk
On 10/4/05, Andrew Gray shimgray@gmail.com wrote:
On 04/10/05, Daniel P. B. Smith dpbsmith@verizon.net wrote:
But there is a practical limit of about thirty volumes for a print publication, isn't there? No, there isn't. The existence proof is any journal. Journals can and do grow linearly, year after year, into long rows of bound volumes which libraries, if not homes, manage to find room for on their shelves. I am sure that some homes have more than 30 bound-volumes-worth of the National Geographic neatly stacked up in attics or basements.
Newspapers (not that many places still keep bound copies) are an even worse case.
Or bibliographies. The British Museum "Catalogue of Printed Books" to 1900 was 95 volumes; to 1905 was another 13-volume supplement. (Four million books, if you're wondering.) The Library of Congress /Catalog/ was 167 volumes for 1899-1942 (covering two million books; it was far less compact because it reproduced the actual catalog cards) - and 1942-7 was another forty. The Bibliothèque Nationale "Catalogue général" ran to 172 volumes as of 1948, and had only got up to 'Sim-' in the alphabet!
National Geographic, newspapers, and bibliographies all have a greater breadth of information than Wikipedia, though.
If I put all the information from the Tampa Tribune into Wikipedia, even ignoring the time-sensitive information, don't you think most of it would be nominated for deletion? Do we have any information in Wikipedia on [[Tampa Bay SalsaFest]], [[Ray Perdomo]], [[Marco Santi]], or [[Town 'N Country Hospital]]? I couldn't find any. Maybe the Tampa Tribune is too local, but I'm sure the same could be said of the stories in the USA Today.
I'm also not so sure the number of work hours put in by Wikipedians isn't many, many times the number of work hours put in by the staff of the USA Today. It would be interesting to try to guesstimate these figures, I suppose.
On 10/4/05, Daniel P. B. Smith dpbsmith@verizon.net wrote:
If we try to cover more content than the Britannica without putting in more work than the Britannica, then our reach is exceeding our grasp.
But we are and it isn't. If you think we lack a decent article about the culture of Thailand, go write about the culture of Thailand. But let's not hamper the efforts of the umpteen other people who have something to say about Pokemon.
So what DOES set the limit to what an encyclopedia can include? It is not any physical characteristic, whether measured in quarto leaves or in bytes.
You don't think it has anything to do with the cost of publishing?
The limit to what an encyclopedia can include is governed basically
by the available labor of editors to integrate, synthesize, verify, copy-edit, and fact-check.
I think you make an excellent point, with regard to the true limit of what Wikipedia can include, but I disagree that that carries to what a dead-tree encyclopedia can include.
The thing is, under the current process it takes more labor to delete an article than it would to integrate, synthesize, verify, copy-edit, and fact check it.
This is one reason I think something like a pure-wiki deletion system is the best way to handle deletion. If deletion is relatively easy to undo, then we don't have to waste so much time making sure we get it right. Even if a deleted article were only kept around for a week, or a month, for review by any interested party, I think we'd waste a lot less time.
Here's a plan. What if we lower the threshold for speedy deletion, but keep speedily deleted articles viewable by all logged in users for one week?
What this tells me is that it should be possible to get some kind of
reasonable estimate of an appropriate size for Wikipedia by estimating the number of work-hours Wikipedia's volunteers put in, and comparing it with the number of work-hours available to the Britannica.
This also neglects the fact that people are more likely to spend time working on some things than working on others. Allow people to contribute information about pokemon, and you're going to get more people willing to contribute (as sick as that is).
If we're putting in three times as much work, we should be able to
cover three times as much content.
If we try to cover more content than the Britannica without putting in more work than the Britannica, then our reach is exceeding our grasp.
Perhaps, but throwing away a large portion of that work isn't going to resolve the problem. I think your primary premise is flawed here. Wikipedia is not paper, and that means a lot. Besides the cost of publishing, there's the ease of searching, and the ability to use technical tools to exclude certain areas. Think about it this way: if we excluded all the non-notable or unfinished articles from everyone except those specifically looking for them, what is the harm in having them?
The only thing I can think of is that you want to force editors to edit certain things. And I don't think that's a very good idea.
I have no idea how to even begin estimating these numbers, but I
think it would be instructive to try.
On 10/5/05, Anthony DiPierro wikispam@inbox.org wrote:
Here's a plan. What if we lower the threshold for speedy deletion, but keep speedily deleted articles viewable by all logged in users for one week?
Increases the pressure on admins. -- geni
geni wrote:
On 10/5/05, Anthony DiPierro wikispam@inbox.org wrote:
Here's a plan. What if we lower the threshold for speedy deletion, but keep speedily deleted articles viewable by all logged in users for one week?
Increases the pressure on admins.
I'd be happy to accept that pressure.
- Ryan
Anthony DiPierro wrote:
Here's a plan. What if we lower the threshold for speedy deletion, but keep speedily deleted articles viewable by all logged in users for one week?
Lower? Certainly it's OK now, other than the controversial new copyvio stuff. And most speedily deleted articles either should not be viewable, or just wouldn't be very useful. Almost all speedied articles look something like "zzzzzzzzzzzzzzzzzzzzzzz" (which needs to be removed very quickly, as we will run out of these letters!), or otherwise are patent nonsense. And there are attack pages, etc, etc.
Additionally, since we have been talking about wikipedia not being paper, and paper not being paper, and publishing costs, etc...don't forget that there could (and SHOULD) eventually be a print version of wikipedia. If the foundation doesn't do it, someone else probably will. Of course, to be comprehensive enough in the mainstream topic areas that would be expected in an encyclopedia, we will also gain a lot of extra stuff that would not be expected in an encyclopedia.
No way would we fit in the 30 volumes of Britannica for this hypothetical print release! Anyway, what if we had a feature in the Wikipedia 1.0 idea, where we could rate how useful the inclusion of an article in a print version would be? This would allow anyone making a print version, be it the foundation or someone else, to trim Wikipedia more easily. Certainly you could do it by hand, but eek, that's huge. With our current database dumps, it would already not be unreasonable to make a script to automatically remove articles with stub tags in them. Obviously these would be worthless in a print version.
What do you all think?
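The stub-stripping script mentioned above could look something like this minimal sketch. It assumes articles arrive as (title, wikitext) pairs from a dump; the regex for stub templates is an assumption on my part, matching any template whose name contains "stub", not a description of any real dump-processing tool:

```python
import re

# Assumed pattern: matches templates like {{stub}}, {{geo-stub}}, {{US-bio-stub}}.
STUB_RE = re.compile(r"\{\{[^{}]*stub\}\}", re.IGNORECASE)

def strip_stubs(articles):
    """Keep only articles whose wikitext carries no stub template."""
    return [(title, text) for title, text in articles
            if not STUB_RE.search(text)]
```

A real version would stream the dump rather than hold it in memory, but the filtering logic would be the same.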
- -- Phroziac | OpenPGP key ID: 0xC2AF5417 | http://tinyurl.com/anya2 | ASCII Ribbon Campaign: Against HTML email & vCards
On 10/4/05, Phroziac phroziac@gmail.com wrote:
Anthony DiPierro wrote:
Here's a plan. What if we lower the threshold for speedy deletion, but
keep
speedily deleted articles viewable by all logged in users for one week?
Lower? Certainly it's OK now, other than the controversial new copyvio stuff. And most speedily deleted articles either should not be viewable, or just wouldn't be very useful. Almost all speedied articles look something like "zzzzzzzzzzzzzzzzzzzzzzz" (which needs to be removed very quickly, as we will run out of these letters!), or otherwise are patent nonsense. And there are attack pages, etc, etc.
Obviously the threshold for speedy deletion isn't OK now if we still have all those articles nominated for VFD. Does it help to keep patent nonsense viewable by logged in non-admins for a week? Well, actually, I'd say yes, it does, because it makes the process more open. But either way, I certainly don't see the harm. We're only talking about a week.
Additionally, since we have been talking about wikipedia not being
paper, and paper not being paper, and publishing costs, etc...don't forget that there could (and SHOULD) eventually be a print version of wikipedia. If the foundation doesn't do it, someone else probably will. Of course, to be comprehensive enough in the mainstream topic areas that would be expected in an encyclopedia, we will also gain a lot of extra stuff that would not be expected in an encyclopedia.
There will eventually be a print version of a subset of Wikipedia. Most likely a very small subset of it. As for whether or not there should be one, I'm actually not so sure of it. Yeah, Jimbo wants to hand them out to people in third world countries, but maybe it'll be possible, by the time the print version is ready, to just include the digital version on the $100 laptops being handed out by that MIT project (http://laptop.media.mit.edu/). I guess you could argue that dead trees are less expensive, but are they really that much less expensive? How much would it cost just to print a single volume 1200 page encyclopedia? I'm thinking $50 or so in heavy bulk, but maybe I have no clue what I'm talking about. Then add in the distribution costs, and maybe we'd be better off just hitching a ride on a laptop.
Incidentally, has someone from Wikimedia talked to the MIT group about including Wikipedia on the laptops? They'd be kind of crazy not to - I think the two projects fit together perfectly.
No way would we fit in the 30 volumes of Britannica for this
hypothetical print release! Anyway, what if we had a feature in the Wikipedia 1.0 idea, where we could rate how useful the inclusion of an article in a print version would be. This would allow anyone making a print version, be it the foundation, or someone else, to trim wikipedia easier. Certainly you could do it by hand, but eek. that's huge. With our current database dumps, it would already not be unreasonable to make a script to automatically remove articles with stub tags in them. Obviously these would be worthless in a print version.
What do you all think?
Actually, when I think about such a question, I come to the conclusion that it's too much work to be worth it. Even for a 1200 page encyclopedia (I use this figure because I have one in front of me), most articles wind up being about a paragraph long. That would be an insane amount of work cutting down all those articles. You could include fewer articles, and make them longer, but I doubt you'd be able to include any of the Wikipedia articles in their entirety without producing a focussed encyclopedia (e.g. encyclopedia of baseball players) rather than a general purpose one.
I used to be really keen on the whole print encyclopedia thing. I started to reconsider when I realized you could buy an encyclopedia from Goodwill (a thrift store) for less money than you could print one (even in bulk; I paid $3 for this 1200 page encyclopedia in front of me). After tying this in with the MIT Media Lab's $100 laptop project, and thinking about it all right now, I'm now convinced that a print Wikipedia is a bad idea.
Anthony
On 05/10/05, Anthony DiPierro wikispam@inbox.org wrote:
Incidentally, has someone from Wikimedia talked to the MIT group about including Wikipedia on the laptops? They'd be kind of crazy not to - I think the two projects fit together perfectly.
Perhaps not. A key cost control in the MIT project is to not include vast amounts of storage space - see http://laptop.media.mit.edu/faq.html - which suggests that including a gig or three of content wouldn't be viable.
(Of course, one of the corollaries of the "instant mesh network" idea they're playing with would be to have a single higher-grade machine on the site which serves as a library; there may be some potential there)
-- - Andrew Gray andrew.gray@dunelm.org.uk
On 5 Oct 2005, at 01:11, Anthony DiPierro wrote:
There will eventually be a print version of a subset of Wikipedia. Most likely a very small subset of it. As for whether or not there should be one, I'm actually not so sure of it. Yeah, Jimbo wants to hand them out to people in third world countries, but maybe it'll be possible, by the time the print version is ready, to just include the digital version on the $100 laptops being handed out by that MIT project (http:// laptop.media.mit.edu/). I guess you could argue that dead trees are less expensive, but are they really that much less expensive? How much would it cost just to print a single volume 1200 page encyclopedia? I'm thinking $50 or so in heavy bulk, but maybe I have no clue what I'm talking about. Then add in the distribution costs, and maybe we'd be better off just hitching a ride on a laptop.
Incidentally, has someone from Wikimedia talked to the MIT group about including Wikipedia on the laptops? They'd be kind of crazy not to - I think the two projects fit together perfectly.
The small print for the MIT laptop saying 1GB means not 1GB of RAM but 1GB of storage. No space for Wikipedia. They believe that P2P comms will be enough to get stuff, and so it won't have a hard drive and, I think, no DVD either. Remember it is a Media Lab project, so it has to be slightly impractical in some way.
Justinc
Justin Cormack wrote:
On 5 Oct 2005, at 01:11, Anthony DiPierro wrote:
Incidently, has someone from Wikimedia talked to the MIT group about including Wikipedia on the laptops? They'd be kind of crazy not to - I think the two projects fit together perfectly.
The small print for the MIT laptop saying 1GB means not 1GB of RAM but 1GB of storage. No space for Wikipedia. They believe that P2P comms will be enough to get stuff, and so it won't have a hard drive and, I think, no DVD either. Remember it is a Media Lab project, so it has to be slightly impractical in some way.
Justinc
How about seeding a slightly different 100MB subset of WP on each laptop, with the highest rank 10,000 articles on all of them, and then (say) one of 20 disjoint pre-selected subsets of other articles? When the laptops link together in their peer-to-peer wireless network, a class of 30 children is highly likely to have a few GB of Wikipedia available to them: user-visited articles could be replicated on the user's local machine, so they don't lose anything they've read when they disassociate from the p-p mesh, and as different groups of people meet and browse, popular articles can be transferred by diffusion to new users.
Come to that, p-p diffusion could work for many other kinds of useful free-distribution documents outside of Wikipedia, in effect creating a p-p Web that does not need online Internet access to work, but can feed itself from it when available.
I wonder if that is a wacky enough idea to appeal to the Media Lab? I can imagine lots of serious gotchas^W^W interesting research topics (reputation systems, digitally signed updates to content?) to be found along the way towards making it workable.
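The seeding scheme described above can be sketched in a few lines. This is purely illustrative: the function name, the core size of 10,000, and the round-robin partition into 20 subsets are the figures proposed in the message, not part of any real deployment, and the scheme assumes articles are already sorted by rank:

```python
def seed_for_laptop(ranked_articles, laptop_id, core_size=10_000, n_subsets=20):
    """Return the article set to preload on one laptop: the top-ranked
    'core' that every machine carries, plus one of n_subsets disjoint
    round-robin slices of the remaining articles, chosen by laptop id."""
    core = ranked_articles[:core_size]
    rest = ranked_articles[core_size:]
    subset = rest[laptop_id % n_subsets::n_subsets]  # disjoint slice per laptop
    return core + subset
```

A classroom of 30 laptops would then carry the core on every machine and, between them, all 20 subsets, so the mesh as a whole approximates the full collection even though no single machine holds it.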
-- Neil
Phroziac wrote:
No way would we fit in the 30 volumes of Britannica for this hypothetical print release! Anyway, what if we had a feature in the Wikipedia 1.0 idea, where we could rate how useful the inclusion of an article in a print version would be. This would allow anyone making a print version, be it the foundation, or someone else, to trim wikipedia easier. Certainly you could do it by hand, but eek. that's huge. With our current database dumps, it would already not be unreasonable to make a script to automatically remove articles with stub tags in them. Obviously these would be worthless in a print version.
What do you all think?
Just for comparison, the current edition of the EB has about 44M words in 32 volumes. As of July 13, the English-language Wikipedia contained 649,000 articles, and a total of roughly 224M words.
Wikipedia currently has over 750,000 articles, so assuming that article size has not reduced, it probably has around 258M words. This is almost six times the size of the EB, and would take at least 187 volumes of EB-equivalent size.
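The back-of-the-envelope arithmetic here can be reproduced directly. All figures are those quoted in the message; "volumes" assumes an EB-equivalent volume size:

```python
# Figures quoted above.
eb_words = 44_000_000          # current EB, 32 volumes
eb_volumes = 32
wp_articles_july = 649_000     # English Wikipedia, July 13
wp_words_july = 224_000_000
wp_articles_now = 750_000

# Assume average article size is unchanged since July.
words_per_article = wp_words_july / wp_articles_july   # ~345 words/article
wp_words_now = wp_articles_now * words_per_article     # ~259M words
words_per_volume = eb_words / eb_volumes               # ~1.4M words/volume
volumes_needed = wp_words_now / words_per_volume       # ~188 EB-sized volumes
size_ratio = wp_words_now / eb_words                   # ~5.9, "almost six times"
```

The "at least 187 volumes" figure follows directly; the real number would be higher still once images and markup overhead are counted.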
In my opinion, an article ranking system would be an ideal way to start collecting data for trying to place articles in rank order for inclusion in a fixed amount of space.
One interesting possibility is, in addition to user rankings, using the number of times the article's title is mentioned on the web -- the Google test -- as an extra input to any hypothetical ranking system.
For example, using this very crude test:
"America" -- 1,260,000,000
"Papua New Guinea" -- 68,400,000
"gallbladder" -- 2,670,000
"Basement Jaxx" -- 2,320,000
"Hilbert space" -- 1,770,000
"catecholamine" -- 1,200,000
"Xenu" -- 595,000
"Horatio Nelson" -- 403,000
"Toad the Wet Sprocket" -- 354,000 [!]
"lutefisk" -- 200,000
"Weebl and Bob" -- 169,000
"Wallace and Futuna" -- 531, but "Wallace et Futuna" -- 20,400
"Beaker folk" -- 777, but "Beaker People" -- 16,700
but, on the other hand,
"Bokak Atoll" -- 498, but "Taongi Atoll" -- 1,140
1715 "riot act" -- 943
1714 "riot act" -- 718
<a minor British celebrity of the 1970s> -- 714
"renifleurism" -- 275
"2-Hydroxyglutaricaciduria" -- 66
Now, this ranking procedure is not perfect: the Wallace and Futuna islands clearly shouldn't be left out of any encyclopedia, and porn stars will be wildly over-ranked due to search-spamming -- but at least it gives a start to establishing the fame or notoriety of any given subject. Given the apparent Zipf distribution, perhaps the logarithm of the Google page count would be an appropriate measure: "America" would score 9.1, "Papua New Guinea" 7.8, "Wallace and/et Futuna" 4.3, and "2-Hydroxyglutaricaciduria" 1.8, using logs to base 10.
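The log-scale scoring can be sketched as follows. The search terms and counts are the examples quoted above; the Google API call itself is omitted, since the scoring step is the only part being proposed here:

```python
import math

# Raw web page counts for article titles, as quoted above.
page_counts = {
    "America": 1_260_000_000,
    "Papua New Guinea": 68_400_000,
    "Wallace et Futuna": 20_400,
    "2-Hydroxyglutaricaciduria": 66,
}

def fame_score(count):
    """Log base 10 of the page count, per the Zipf-distribution argument."""
    return round(math.log10(count), 1)

scores = {title: fame_score(count) for title, count in page_counts.items()}

# At one query per second, scoring ~1.2 million titles would take
# 1,200,000 seconds, i.e. roughly two weeks:
days = 1_200_000 / (60 * 60 * 24)  # ~13.9 days
```

Squashing nine orders of magnitude into a 0-10 scale is what makes the scores comparable; whether the cutoff for inclusion sits at 3 or at 4 would still be an editorial decision.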
Other measures might be to look only at .gov/.gov.uk, or .edu/.ac.uk etc. sites, to gain some idea of relative governmental or academic interest in these subjects, perhaps as a measure of seriousness (interestingly, Toad the Wet Sprocket still get 14 hits in .gov sites).
Still, it would be an interesting exercise to look up all current articles and their redirects. Does anyone have a Google API account charged with approximately 1.2 million searches? At one search a second, we could have the figures ready in about two weeks.
-- Neil
Neil Harris wrote:
Phroziac wrote:
No way would we fit in the 30 volumes of Britannica for this hypothetical print release! Anyway, what if we had a feature in the Wikipedia 1.0 idea, where we could rate how useful the inclusion of an article in a print version would be. This would allow anyone making a print version, be it the foundation, or someone else, to trim wikipedia easier. Certainly you could do it by hand, but eek. that's huge. With our current database dumps, it would already not be unreasonable to make a script to automatically remove articles with stub tags in them. Obviously these would be worthless in a print version.
In my opinion, an article ranking system would be an ideal way to start collecting data for trying to place articles in rank order for inclusion in a fixed amount of space.
One interesting possibility is, in addition to user rankings, using the number of times the article's title is mentioned on the web -- the Google test -- as an extra input to any hypothetical ranking system.
The thing to remember if a ranking system is used is that it is a tool rather than a solution. It can point to problem articles that need work. We don't need to be limited to a single algorithm for evaluating an article. The Google test can be added, but so can others too.
Ec
On 10/5/05, Anthony DiPierro wikispam@inbox.org wrote:
So what DOES set the limit to what an encyclopedia can include? It is not any physical characteristic, whether measured in quarto leaves or in bytes.
You don't think it has anything to do with the cost of publishing?
The limit to what an encyclopedia can include is governed basically
by the available labor of editors to integrate, synthesize, verify, copy-edit, and fact-check.
I think you make an excellent point, with regard to the true limit of what Wikipedia can include, but I disagree that that carries to what a dead-tree encyclopedia can include.
It should not be forgotten that WP costs money too. Everything from maintaining the servers to purchasing bandwidth to employing Brian. And entirely sourced from generous donations from the public.
Not that I'm advocating anything here, I'm just pointing out that while Wiki is not paper there are real physical limits on WP's size (albeit vastly greater than a traditional encyclopaedia).
-- Stephen Bain stephen.bain@gmail.com
However, compared to paper, it is cheap. I admit you have to buy machines costing thousands of dollars, but hard drive space is cheap. Since text doesn't take much room, you can keep filling in more and more. Maintaining it online is cheaper than printing thousands of books. So in the end, Wikipedia has more space than those print encyclopedias, which spend millions upon millions of dollars on printing, research, hiring, etc.
On 10/4/05, Stephen Bain stephen.bain@gmail.com wrote:
On 10/5/05, Anthony DiPierro wikispam@inbox.org wrote:
So what DOES set the limit to what an encyclopedia can include? It is not any physical characteristic, whether measured in quarto leaves or in bytes.
You don't think it has anything to do with the cost of publishing?
The limit to what an encyclopedia can include is governed basically
by the available labor of editors to integrate, synthesize, verify, copy-edit, and fact-check.
I think you make an excellent point, with regard to the true limit of
what
Wikipedia can include, but I disagree that that carries to what a
dead-tree
encyclopedia can include.
It should not be forgotten that WP costs money too. Everything from maintaining the servers to purchasing bandwidth to employing Brian. And entirely sourced from generous donations from the public.
Not that I'm advocating anything here, I'm just pointing out that while Wiki is not paper there are real physical limits on WP's size (albeit vastly greater than a traditional encyclopaedia).
-- Stephen Bain stephen.bain@gmail.com
This is one reason I think something like a pure-wiki deletion system is the best way to handle deletion. If deletion is relatively easy to undo, then we don't have to waste so much time making sure we get it right. Even if a deleted article were only kept around for a week, or a month, for review by any interested party, I think we'd waste a lot less time.
Deletion is already easy to undo. It's called "VFU". All people need to learn is how to properly persuade others that the original discussion was faulty, that the deletion was out of process, or that the situation regarding the subject has changed.
--Mgm
On 10/5/05, MacGyverMagic/Mgm macgyvermagic@gmail.com wrote:
This is one reason I think something like a pure-wiki deletion system is the best way to handle deletion. If deletion is relatively easy to undo, then we don't have to waste so much time making sure we get it right. Even if a deleted article were only kept around for a week, or a month, for review by any interested party, I think we'd waste a lot less time.
Deletion is already easy to undo. It's called "VFU". All people need to learn is how to properly persuade others that the original discussion was faulty, that the deletion was out of process, or that the situation regarding the subject has changed.
Unless you're an admin, you don't even really know what's been deleted without going through the process of petitioning someone else to see what was there. It's not easy to even know what's deleted, much less get it undeleted. If I wanted to go on a salvage hunt through articles deleted, I'd have to find an awfully patient admin to handle my stack of requests.
(Not to imply I like "pure wiki deletion", because I don't.) -- Michael Turley User:Unfocused
Deletion is already easy to undo. It's called "VFU". All people need to learn is how to properly persuade others that the original discussion was faulty, that the deletion was out of process, or that the situation regarding the subject has changed.
It's hard to persuade people that an article should be undeleted when you can't even view the article that was deleted.
--Mgm
Anthony
Daniel P. B. Smith wrote:
It is frequently said that Wikipedia is not paper. Specifically, "Wikipedia is not a paper encyclopedia. This means that there is no practical limit to number of topics we can cover other than verifiability and the other points presented on this page."
But paper is not paper, either. That is, paper encyclopedias are NOT physically limited in size. Some encyclopedias (Columbia) have one volume. Some have more. The first edition of the Encyclopedia Britannica had three volumes; the Eleventh Edition had 29. The current Britannica 3 has 32 volumes.
The 12th edition from 1922 also had 32 volumes.
(By the way, the Britannica states, rather hyperbolically, that those 32 volumes offer "a boundless range of information.")
And we run out of bounds from the boundless
Is the print Britannica limited to 32 volumes by some kind of physical law? Certainly not. In fact, tens of thousands of households that purchase print encyclopedias wisely or foolishly subscribe to yearbook programs, often for many years, until they get tired of gluing little cross-reference stickers into their volumes. So the number of books on the shelf actually grows.
But there is a practical limit of about thirty volumes for a print publication, isn't there? No, there isn't. The existence proof is any journal. Journals can and do grow linearly, year after year, into long rows of bound volumes which libraries, if not homes, manage to find room for on their shelves. I am sure that some homes have more than 30 bound-volumes-worth of the National Geographic neatly stacked up in attics or basements.
So what DOES set the limit to what an encyclopedia can include? It is not any physical characteristic, whether measured in quarto leaves or in bytes.
IIRC there was a time when the Britannica was sold door-to-door in communities where reading was not a routine practice. One had to impress the neighbours. So along with the books you would receive a lovely wooden bookcase to contain them. The set of books had to fit in the bookcase, with a little room left over for the next few yearbooks that could be part of the subscription.
Daniel P. B. Smith wrote:
It is frequently said that Wikipedia is not paper. Specifically, "Wikipedia is not a paper encyclopedia. This means that there is no practical limit to number of topics we can cover other than verifiability and the other points presented on this page."
But paper is not paper, either. That is, paper encyclopedias are NOT physically limited in size. Some encyclopedias (Columbia) have one volume. Some have more. The first edition of the Encyclopedia Britannica had three volumes; the Eleventh Edition had 29. The current Britannica 3 has 32 volumes.
If you read Collison's history (in refs for [[encyclopedia]]), you'll see that size was more often limited by economics than anything else; it took more capital than most publishers had on hand to pay all the contributors, typesetters, and printers before the first set was sold. Subscriptions and incremental releases were among the strategies to cope with this problem, and a number of publishers were bankrupted by their encyclopedia projects.
The problem is still with proprietary encyclopedia makers; they already know their price points (nobody will pay $2700 for a super-duper EB on CD, for instance), so they have to constrain their annual update to whatever they can afford, and reuse as much old material as possible; compare 1911EB entries to present-day, many have only the wording updated (and the references deleted, tsk tsk).
That's not to say WP shouldn't have limits - after all, the OED definition of "compendium" specifically mentions "condensation" and "summary" as defining characteristics, so an encyclopedia that is a "compendium of knowledge" needs to leave out *something*. On the other hand, our mission statement specifically says "sum of all human knowledge", and doesn't say anything about leaving the less-notable parts out. I think a lot of our AfD debate ultimately stems from the inconsistency between definition and mission statement.
Stan