Andrew Turvey wrote:
We had a discussion at a recent Wikimedia UK board meeting about
potentially buying some digitisation equipment which could be
used to generate content for the Wikimedia projects. This recent
email to the EN-WP list sparked my interest.
Does anyone have any experience with equipment like this, and
could you recommend anything? Any idea what the price range and
quality typically is?
Also, is anyone else in the Wikimedia community currently doing
I'm on the board of Wikimedia Sverige (Sweden), and also the
founder (in 1992) of Project Runeberg, the Scandinavian offspring
of Project Gutenberg. The Swedish-language Wikisource isn't doing
much, because Project Runeberg still does a lot of book scanning.
Its archive now contains images of 550,000 book pages,
corresponding to nearly 30 linear metres of shelving.
Book digitization is a matter of using the right tools for each
job. Much depends on the kind of book and the kind of labour. If
you use unpaid volunteers, you can afford slower equipment. If you
need to pay your staff, any equipment that speeds up the work will
quickly pay its own cost, including those very expensive
"professional" book scanners. On the scale of Google Book Search,
aiming to digitize millions of books, it pays off to let Google
engineers work on developing even faster equipment, just like
Google develops its own Linux-based storage architecture.
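
To make the paid-staff argument concrete, here is a minimal sketch in
Python of the break-even point. The function name and every figure in
the example (wage, scanning speeds, price difference) are assumptions
made up for illustration, not measurements from any real project.

```python
import math


def break_even_pages(extra_equipment_cost, hourly_wage,
                     slow_pages_per_hour, fast_pages_per_hour):
    """Number of pages after which faster, pricier equipment has paid
    for itself through saved labour cost.

    Returns math.inf when labour is free (unpaid volunteers), because
    then the faster equipment never saves any money.
    """
    if fast_pages_per_hour <= slow_pages_per_hour:
        raise ValueError("the fast equipment must actually be faster")
    # Labour cost per page with each kind of equipment.
    slow_cost_per_page = hourly_wage / slow_pages_per_hour
    fast_cost_per_page = hourly_wage / fast_pages_per_hour
    saving_per_page = slow_cost_per_page - fast_cost_per_page
    if saving_per_page <= 0:
        return math.inf
    return extra_equipment_cost / saving_per_page


if __name__ == "__main__":
    # Hypothetical numbers: the fast scanner costs 9000 euro more,
    # staff cost 20 euro/hour, speeds are 100 vs 500 pages/hour.
    paid = break_even_pages(9000, 20, 100, 500)
    print(f"paid staff: break-even after about {paid:,.0f} pages")

    # Same equipment, but unpaid volunteers: no labour cost to save.
    unpaid = break_even_pages(9000, 0, 100, 500)
    print(f"volunteers: break-even after {unpaid} pages")
```

Under these assumed numbers the expensive scanner pays for itself
well within an archive of a few hundred thousand pages, while with
volunteers it never does, which is the point of the paragraph above.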
It's hard to measure the usefulness of a digitized book, since
Wikisource (and Project Runeberg) doesn't have any income. (And
neither has Google Book Search, I believe.) If your success is
measured in how much money you spend (as some charities have it),
it is very easy to invest a lot of money, without much result.
The worst you can do is to spend a lot to digitize something that
is already available for free download. Look around first.
You should start by asking what you want to achieve. Is
there some book, or genre of books, that would be really useful
for Wikipedia to have on Wikisource? Anything British that all
those American projects haven't already covered? For us from
non-English speaking countries, it's far easier. Very little has
been digitized, so there is a lot to do. The most useful thing is
to digitize an old encyclopedia, just like the 11th edition of the
Encyclopaedia Britannica (from 1911).
Now, encyclopedias are common items in used bookstores or online
auctions. You can buy 20 volumes for 200 euro, or even cheaper.
At that price, the best investment is a paper cutter (or ask a
print shop to help you) and a two-sided (duplex) sheet-feeding
scanner, such as the Canon DR-2050C or Fujitsu ScanSnap S510.
OCR software might be included with the scanner. Or you can buy
it separately for 160 euro.
The total investment would be less than 1000 euro (scanner + OCR
software + 20 volume encyclopedia + ask a print shop to cut the
spines). After this, you only need hours of volunteer work.
That's how I digitized the "New Student's Reference Work" (from
1914, 5 volumes, some 2500 pages) for Wikisource in 2005, just to
show that Wikisource could be used that way.
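
The budget arithmetic for the sheet-feeding set can be sketched in a
few lines of Python. The 200 euro for the encyclopedia and 160 euro
for OCR software come from the text above; the scanner price and the
print shop fee are not stated there, so the values used below are
placeholders I assume only for illustration.

```python
BUDGET_EUR = 1000

# Figures taken from the text above.
KNOWN_COSTS_EUR = {
    "20-volume used encyclopedia": 200,
    "OCR software": 160,
}

# Assumed placeholder prices, not quotes from any shop.
ASSUMED_COSTS_EUR = {
    "duplex sheet-feeding scanner": 450,
    "print shop cutting the spines": 50,
}


def remaining_budget(budget, costs):
    """Money left after paying for every item in `costs`."""
    return budget - sum(costs.values())


if __name__ == "__main__":
    # How much is left for the scanner and the cutting together.
    headroom = remaining_budget(BUDGET_EUR, KNOWN_COSTS_EUR)
    print(f"left for scanner and cutting: {headroom} euro")

    # Check that the assumed prices fit inside that headroom.
    all_costs = {**KNOWN_COSTS_EUR, **ASSUMED_COSTS_EUR}
    left = remaining_budget(BUDGET_EUR, all_costs)
    print(f"left over with assumed prices: {left} euro")
```

Replace the assumed prices with real quotes before deciding; the
known items alone leave 640 euro for the scanner and the cutting.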
Here are some old pictures (with Swedish text, from 2001),
These scanners and that OCR software are not open source products,
but neither is my digital camera, and I use that to produce free
pictures for Wikimedia Commons. I know there have been many
attempts to make free OCR software, but is it any good?
In fact, if you have books where you can't afford to cut the
spine, maybe some rare thing that you only find in a library, a 10
megapixel digital camera is very useful. You need to experiment a
little with tripod stands and good lights. You only need to open
the book at 90 degrees, to get a good view of a page, which is
much friendlier to the book than an old flatbed scanner (and
faster). If you have two cameras, you can shoot left pages with
one, and right pages with the other. That's in fact how the
fastest modern "book scanners" work. Google builds their own, and
so can you. Some radical ideas are found on http://bkrpr.org/
Again, a pair of digital cameras is a total investment of less
than 1000 euro. That's a good starting range. You can achieve a
lot, and learn even more, in very little time.
What can you do with 2000 euro? Buy one sheet-feeding set, and
one pair of digital cameras. Let two teams compete against each
other. Write a report for next year's Wikimania. Have great fun!
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
Project Runeberg - free Nordic literature - http://runeberg.org/
Wikimedia Sverige - support free knowledge - http://wikimedia.se/