[WikiEN-l] Wikipedia reaches 3 millionth article
David Gerard
dgerard at gmail.com
Wed Aug 19 13:15:46 UTC 2009
2009/8/19 Carcharoth <carcharothwp at googlemail.com>:
> Sure. It will take time. :-)
> But once done, you will have space for more!
> 200,000 pages at 10 pages a day is 20,000 days, which is 54.79 years.
> You might need to crowdsource the scanning.
One option is cutting the binding off and auto-feeding the stack of
pages into a scanner-photocopier. This destroys the books, but is very
efficient.
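As a rough check of the arithmetic quoted above, here is a minimal sketch. The function name scanning_time and the 1,000 pages/day figure for a sheet-fed scanner are illustrative assumptions, not numbers from this thread.

```python
def scanning_time(total_pages, pages_per_day):
    """Return (days, years) needed to scan total_pages at a steady daily rate."""
    if pages_per_day <= 0:
        raise ValueError("pages_per_day must be positive")
    days = total_pages / pages_per_day
    years = days / 365  # 365-day year, matching the quoted 54.79 figure
    return days, years


# Figures from the quoted message: 200,000 pages at 10 pages a day.
days, years = scanning_time(200_000, 10)
print(f"by hand: {days:.0f} days, {years:.2f} years")

# Hypothetical sheet-fed rate (an assumption for comparison only).
fed_days, fed_years = scanning_time(200_000, 1_000)
print(f"auto-fed: {fed_days:.0f} days, {fed_years:.2f} years")
```

At the assumed auto-feed rate the same stack would take months rather than decades, which is the point of the trade-off against destroying the books.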
> How do Google Books and libraries and Project Gutenberg and others do
> mass scanning and OCR of books? Do they use lots of money and funding
> to pay lots of people to do lots of scanning on lots of machines, or
> do they automate it in some way?
I believe they have machines to turn pages, and software that takes
the distorted photo of each open page and renders it how it would look
as a flat page.
- d.