A brief account of the workshop, from a participant's perspective.

The workshop was meant to show how to digitize books and documents effectively, DIY style, using a simple digital camera instead of investing in a scanner.

The following topics were covered by Viswaprabha, who led most of the workshop, along with Subhashish (who had been trained by Viswaprabha beforehand) and Shiju Alex:
a) best practices for capturing images with a camera and tripod, through demonstration; an introduction to types of scanners;
b) how to hold books, and the need to treat old books with respect (very critical);
c) a discussion of image formats and a basic comparison of them (DjVu, PDF, JPEG, TIFF, BMP, GIF);
d) an introduction to, and practical use of, SM Tether (with a Nikon DSLR) for capturing images;
e) a practical demonstration of Scan Tailor (free software) for post-processing scanned pages: splitting, deskewing, adjusting borders and despeckling;
f) a basic discussion of copyright and an introduction to Wikisource;
g) the importance of online archival resources (DLI, DSAL, Archive.org, etc.), and when to redo, or not redo, the scanning of books that are already available in scanned form (e.g. depending on image resolution);
h) OCR and Indian languages.

Overall, the workshop focused on how an 'ordinary' Wikimedian without access to high-tech infrastructure can effectively and collaboratively digitize old books (especially in Indian languages) and use Wikisource to share this knowledge openly.

This was an impromptu workshop, organized at short notice because Viswaprabha (WMIN EC member) was in Bangalore for the WMIN EC meetings. It was also conceived as an alpha-stage presentation, meant to be improved further. In true wiki style, Viswa made all the participants authors of the presentation and asked everyone to improve it.

I think the workshop was a success, especially since I have personally witnessed how much money various institutions and the government spend on ineffective digitization. Similar 'small-scale' workshops could empower interested Wikimedians and help build decentralized, effective, need-based, collaborative digitization of books (our cultural heritage), especially in Indian languages.

The workshop was organized jointly by WMIN and CIS-A2K, and no expenditure other than coffee and snacks was incurred. Going forward, it would be useful to keep improving this presentation collaboratively, preferably through practice rather than discussion. It would also be good to train at least 3-4 Wikimedians from each language community across India, who can then pass this knowledge on to others.

Best,
Vishnu


On 19 August 2013 21:00, L. Shyamal <lshyamal@gmail.com> wrote:
Thank you Subhashish for the response at:
http://meta.wikimedia.org/wiki/Talk:India_Access_To_Knowledge/Events/Bangalore/Digitization_workshop_18August2013

Dear Shyamal, this workshop was meant to demonstrate to participants how to create a home-made setup to scan books, edit the scanned images and make an eBook. OCR is still not stable for Indic languages and would need a separate workshop. Here, "creating text based documents" basically means manual editing on Wikisource. Please feel free to edit and correct any mistakes you find. --Subhashish Panigrahi (talk) 10:34, 19 August 2013 (UTC)
That sounds too basic. Almost any office assistant these days is expected to know how to use a scanner and create at least a PDF. It would be good if you could post your proposed workshops. There are really excellent (and, if I may add, local) resource people who could be proposed by the community. Shyamal (talk) 15:00, 19 August 2013 (UTC)
----
To step back a bit from this: may I suggest that any planned workshop work out its topic coverage on-wiki, so that the structure is useful and the expectations are clear? Asking potential attendees about their areas of interest or expectations may also help. Since there appears to be a plan to do this on an India-wide scale, I think it would be good to include several other topics. Some of the things I have had to consider while scanning thousands of pages of documents for the Internet Archive are listed below. I hope others can add to them.

* Image formats - DjVu and PDF comparisons, PDF/A, greyscale vs. black-and-white, use of JPEG 2000, TIFF
* Online archival options
* Digital libraries - it would also be good to bring in people (librarians, archivists) involved with Indian digital library projects, including ICAR (http://www.icarlibrary.nic.in/), DLI (even if its website is terrible: http://dli.ernet.in/) and other local digital library/archive ventures (http://statelibrary.kerala.gov.in/rarebooks/, http://dspace.gipe.ac.in/jspui/)
* Book-scanners
* Home scanning tricks and solutions, e.g. how to scan large sources (such as a newspaper that is bigger than the scanner bed)
* OCR - use of layers, problems with Indic languages, approaches (is it possible to crowdsource manual transcription? Can the technology be extended to let people who do not know the language help with manual transcription?)

best wishes
Shyamal

_______________________________________________
Wikimediaindia-l mailing list
Wikimediaindia-l@lists.wikimedia.org
To unsubscribe from the list / change mailing preferences visit https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l