Jimmy Wales wrote:
As a quick followup, Britannica also claims that the 32 volume print encyclopedia has 44 million words.
44,000,000 / 75,000 = about 587 words per article
It seems clear to me that we are already "in the ballpark" of the size of Britannica. Quality is, of course, an entirely different question. I think we are often superior and often drastically inferior. I suspect that our coverage would show strange and conspicuous 'holes' if we went through it via a "top down" approach, i.e. took lists of major topics and checked whether we've covered them.
We already have many "List of ... topics" articles, as well as many other lists that can serve as most wanted lists. We already have a mechanically generated list of "Wanted pages" that works sometimes. A human generated most wanted list would also be very welcome if it's not allowed to become so long that it's useless.
When I first joined Wikipedia I made extensive use of the "Wanted pages" just to find things to do. A more experienced Wikipedian never runs short of things to do, but even then looking at the Wanted pages for something a little different can break monotonous habits. In other words "Wanted pages" is a lot more important to newbies than to veterans; nothing works better for retaining new contributors than letting them feel that they can contribute something that somebody else wants. It would be nice to see that feature working a little better than has been the case in recent months.
As I said in my previous response, I see a CD edition as a snapshot of WP at a given moment in time. I would suggest that there be a tag for a "print approved" version of any article. This is a version that has been reviewed for such things as libellous comments, copyright infringements and adherence to NPOV. The snapshot would be a collection of all the latest print approved versions, stripped of dead links. WP1.0 will clearly be much smaller than its successors, but that's fine too because we can continue with a maintainable promise of better things to come.
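To make the snapshot idea a little more concrete, here is a rough sketch of how it might be assembled. Everything in it is an assumption for illustration only: the Revision record, the print_approved flag and the build_snapshot function are hypothetical names, not anything that exists in the software, and "dead link" is taken to mean a [[wiki link]] whose target has no print approved version in the snapshot.

```python
import re
from dataclasses import dataclass


@dataclass
class Revision:
    """Hypothetical record for one saved version of an article."""
    title: str
    timestamp: int        # larger means newer
    text: str
    print_approved: bool  # assumed tag set by a reviewer


# Matches [[Target]] and [[Target|label]] style links.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def build_snapshot(revisions):
    """Return {title: text} using the latest print approved revision of each article."""
    latest = {}
    for rev in revisions:
        if not rev.print_approved:
            continue  # unreviewed versions never reach the snapshot
        current = latest.get(rev.title)
        if current is None or rev.timestamp > current.timestamp:
            latest[rev.title] = rev

    titles = set(latest)

    def strip_dead(match):
        target, label = match.group(1), match.group(2)
        if target in titles:
            return match.group(0)  # link target is in the snapshot, keep it
        return label or target     # dead link: keep the words, drop the markup

    return {title: LINK.sub(strip_dead, rev.text) for title, rev in latest.items()}
```

An article with no approved version is simply left out, and a newer but unapproved edit does not displace an older approved one, which is why WP1.0 would come out smaller than the live site.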
Try as we might, it is inevitable that unwanted material will creep into the published versions. This could be minimized by having a lot of paid staff who do nothing but check for this stuff, but that is not a cost-effective strategy consistent with our volunteer nature. To minimize the damage from such incidents, I would recommend short production runs that would ensure that supplies would be exhausted before the expiry of any safe harbor periods. The next production run would have the formally questioned material excised while the matter is being reviewed; it can always be re-inserted at a later time if the challenge turns out to be groundless. Being seen as acting quickly on these problems is a lot more important than going to extraordinary lengths to remove things where we only guess that they '''may''' be offending. Obviously offending material would still be excluded as a matter of routine.
Ec