A brief account of the workshop from a participant's perspective.
The workshop was meant as an introduction to DIY digitization: using a
simple digital camera, rather than investing in a scanner, to digitize
books and documents effectively.
The following topics were covered by Viswaprabha, who led the workshop,
along with Subhashish (who had been trained by Viswaprabha beforehand) and
Shiju Alex:
a) best practices in capturing images using a camera and tripod, through
demonstration; introduction to types of scanners;
b) how to handle books, and the need to treat old books with respect (very
critical);
c) discussion of image formats and a basic comparison (e.g. DjVu, PDF,
JPEG, TIFF, BMP, GIF);
d) introduction to, and practical use of, SM Tether (with a Nikon DSLR) for
capturing images;
e) practical demonstration of Scan Tailor (free software) for
post-processing scanned pages: splitting, deskewing, rearranging borders
and despeckling;
f) a basic discussion of copyright and an introduction to Wikisource;
g) the importance of online archival resources (DLI, DSAL, Archive.org,
etc.), and when to redo, or not redo, the scanning of books that are
already available in scanned form (e.g. depending on image resolution);
h) OCR and Indian languages.
Basically, the workshop focused on how an 'ordinary' Wikimedian without
access to high-tech infrastructure can effectively and collaboratively
digitize old books (especially in Indian languages) and use Wikisource to
share this knowledge openly.
This was an impromptu workshop, organized at short notice because
Viswaprabha (WMIN EC member) was in Bangalore for WMIN EC meetings. It was
also conceived as an alpha-stage presentation, meant to be improved
further. All the participants were asked to improve the presentation; in
true wiki style, Viswa made all the participants co-authors as well.
I think it was a successful workshop, all the more so because I have
personally witnessed how much money various institutions and the
government spend on ineffective digitization. Similar 'small-scale'
workshops could empower interested Wikimedians and help create
decentralized, effective, need-based, collaborative digitization of books
(our cultural heritage), especially in Indian languages.
The workshop was collaboratively organized by WMIN and CIS-A2K, and no
expenditure other than coffee and snacks was incurred. Going forward, it
would be useful to improve this presentation collaboratively, preferably
through practice rather than discourse. It would also be useful to train
at least 3-4 Wikimedians from each language community across India, who
can pass this knowledge on to others.
Best,
Vishnu
On 19 August 2013 21:00, L. Shyamal <lshyamal(a)gmail.com> wrote:
Thank you Subhashish for the response at:
http://meta.wikimedia.org/wiki/Talk:India_Access_To_Knowledge/Events/Bangal…
Dear Shyamal, this workshop was for demonstrating to the participants how
to create a home-made setup to scan books, edit the scanned images and
make an eBook. OCR is still not stable for Indic languages and would need
another workshop. Here, "creating text based documents" basically means
manual editing on Wikisource. Please feel free to edit and correct any
mistakes. --Subhashish Panigrahi
<http://meta.wikimedia.org/wiki/User:Psubhashish> (talk
<http://meta.wikimedia.org/wiki/User_talk:Psubhashish>) 10:34, 19 August
2013 (UTC)

That sounds too basic. Almost any office assistant these days is expected
to know how to use a scanner and create at least a PDF. It would be good
if you could post your proposed workshops. There are really excellent
(and, if I may add, local) resource people who could be proposed by the
community. Shyamal <http://meta.wikimedia.org/wiki/User:Shyamal> (talk
<http://meta.wikimedia.org/wiki/User_talk:Shyamal>) 15:00, 19 August 2013
(UTC)
----
To step back a bit from this: may I suggest that any planned workshops
work out their topic coverage on-wiki, so that the structure is useful and
the expectations are clear? Asking potential attendees about their areas
of interest or expectations may also help. Since there appears to be a
plan to do this on an India-wide scale, I think it would be good to
include several other topics. Some of the things I have had to consider in
the course of scanning thousands of pages of documents for the Internet
Archive are listed below. I hope others can add to them.
* Image formats: DjVu and PDF comparisons, PDF/A, greyscale vs.
black-and-white, use of JPEG 2000, TIFF
* Online archival options
* Digital libraries: it would also be good to bring in people (librarians,
archivists) involved with Indian digital library projects, including ICAR
(http://www.icarlibrary.nic.in/), DLI (even if their website is terrible:
http://dli.ernet.in/) and other local digital library/archive ventures
(http://statelibrary.kerala.gov.in/rarebooks/,
http://dspace.gipe.ac.in/jspui/)
* Book-scanners
* Home scanning tricks and solutions, e.g. how to scan large sources (such
as a newspaper that is bigger than the scanner bed)
* OCR: use of layers, problems with Indic languages, approaches (Is it
possible to crowdsource manual transcription? Can the technology be
extended to allow people who do not know the language to help with manual
transcription?)
best wishes
Shyamal
_______________________________________________
Wikimediaindia-l mailing list
Wikimediaindia-l(a)lists.wikimedia.org