[Wikisource-l] Which books to scan to support Wikipedia

21 Jul 2012

After Wikimania, I took a trip to Toronto, where the
Internet Archive has a large book scanning center.
If you have worked on Wikisource, I think you know that
many of the books there come from the Internet Archive,
which provides scanned images and OCR text in the form
of a DjVu file, which can be proofread and presented
in Wikisource.

I went there to learn more about how to scan books, but
another possibility is to use their existing facilities,
which currently are not used at full capacity.

The way the collaboration is set up, the room is provided
by the University of Toronto, but the equipment and staff
belong to the Internet Archive. The participating libraries
around Canada decide which books to digitize and pay a
small fee to the Internet Archive for scanning them.

This opens up an interesting opportunity. If we want a
particular book to be digitized, and we can find it in
the library catalogs of the University of Toronto, it
might be possible (in theory, at least) for us to present
this wish and perhaps provide the money needed, either
directly from the Wikimedia Foundation or from its chapters.
Canada is a country with many immigrants and the library
system has books in many languages.

This brings me to the question: Which books would be the
most important to scan, to help Wikipedia?

For the Swedish and Danish Wikipedia, I know these are the
out-of-copyright encyclopedias from the early 20th century,
which I already digitized in 2003 and 2008, respectively.

The Czech "Ottuv slovnik" has already been digitized by Google,
as the Wikipedia article points out:
http://cs.wikipedia.org/wiki/Ott%C5%AFv_slovn%C3%ADk_nau%C4%8Dn%C3%BD
But are the black-and-white scans good enough, or would the
Czech community be interested in Internet Archive quality
scans in color? If so, we have to figure out whether the
Internet Archive will scan it for free, whether the UofT
library will pay, whether the Czech chapter can pay (some
chapters have money but are not allowed to send donated money
abroad, because of tax deductions), or whether the WMF
might. I saw the Ottuv slovnik on
the shelves, in the same building as the scanning stations.
It's just waiting for somebody to piece the puzzle together.

But let's begin with a wish list of which books to scan. Of
course they need to be out-of-copyright to fit in the
Internet Archive, Wikimedia Commons, and Wikisource.
Perhaps illustrated works are more interesting than text?

This would be a GLAM + wiki cooperation that cuts across
national borders.

-- 
   Lars Aronsson (lars(a)aronsson.se)
   Project Runeberg - free Nordic literature - http://runeberg.org/
