[WikiEN-l] Wikipedia reaches 3 millionth article

David Gerard dgerard at gmail.com
Wed Aug 19 13:15:46 UTC 2009


2009/8/19 Carcharoth <carcharothwp at googlemail.com>:

> Sure. It will take time. :-)
> But once done, you will have space for more!
> 200,000 pages at 10 pages a day is 20,000 days, which is 54.79 years.
> You might need to crowdsource the scanning.


There's also the option of cutting the binding off and auto-feeding
the stack of pages into a scanner-photocopier. This destroys the
books, but is very efficient.
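
To put rough numbers on that, here is a minimal sketch of the
arithmetic. The 200,000 pages and 10 pages a day are the figures quoted
above; the 5,000 pages a day for an auto-feeder is purely an assumed
figure for illustration, not a measured rate, and the function names
are made up for this example.

```python
def days_to_scan(total_pages: int, pages_per_day: float) -> float:
    """Days needed to scan total_pages at a steady pages_per_day rate."""
    if pages_per_day <= 0:
        raise ValueError("pages_per_day must be positive")
    return total_pages / pages_per_day


def days_to_years(days: float, days_per_year: float = 365.0) -> float:
    """Convert days to years, ignoring leap years."""
    return days / days_per_year


TOTAL_PAGES = 200_000

# Figures from the quoted message: 10 pages a day by hand.
manual_days = days_to_scan(TOTAL_PAGES, 10)
manual_years = days_to_years(manual_days)

# ASSUMPTION: 5,000 pages a day through an auto-feeder (illustrative only).
feeder_days = days_to_scan(TOTAL_PAGES, 5_000)

print(f"By hand: {manual_days:.0f} days, about {manual_years:.2f} years")
print(f"Auto-fed (assumed rate): {feeder_days:.0f} days")
```

Swap in whatever throughput your own scanner actually manages; the
point is only that the total scales inversely with the daily rate.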


> How do Google Books and libraries and Project Gutenberg and others do
> mass scanning and OCR of books? Do they use lots of money and funding
> to pay lots of people to do lots of scanning on lots of machines, or
> do they automate it in some way?


I believe they have machines to turn the pages, and software that
takes the distorted photo of the open book and renders it as it would
look as a flat page.


- d.


