[Labs-l] Image manipulation tools into Labs

Alex Brollo alex.brollo at gmail.com
Wed Feb 5 22:39:22 UTC 2014


Thanks Jeremy; as you can imagine, it's a typical Wikisource idea.

Here: http://www.opal.unito.it/ there's a large collection of free scans
of ancient Italian books, published as double-page PDFs. The idea is to
upload them to Internet Archive, but at present the IA OCR doesn't split
the pages itself, so I can't upload the PDFs as they are. I'm therefore
testing routines to extract the TIFF/JPG images from each PDF, split them
(with Python PIL) and wrap them into zip files that can be uploaded to IA.
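
To make the split-and-zip step concrete, here is a rough sketch of what I have in mind. It assumes the page images have already been extracted from the PDF, that each scan is a two-page spread with the gutter exactly in the middle, and that Pillow (the maintained PIL fork) is installed. The function names and the "a"/"b" file suffixes are my own invention, and whatever naming IA expects for the zip and its contents is not covered here.

```python
import zipfile
from pathlib import Path


def half_page_boxes(width, height):
    """Return the (left, upper, right, lower) crop boxes of the two pages.

    Assumption: the gutter sits exactly at the horizontal midpoint.
    """
    middle = width // 2
    return (0, 0, middle, height), (middle, 0, width, height)


def split_spread(scan_path, out_dir):
    """Split one double-page scan into two single-page JPEGs."""
    # Pillow is imported here so the other helpers work without it.
    from PIL import Image

    scan_path, out_dir = Path(scan_path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pages = []
    with Image.open(scan_path) as spread:
        # "a" is the left-hand page, "b" the right-hand one.
        for suffix, box in zip("ab", half_page_boxes(*spread.size)):
            page_path = out_dir / f"{scan_path.stem}{suffix}.jpg"
            # Convert to RGB because JPEG cannot store every TIFF mode.
            spread.crop(box).convert("RGB").save(page_path, "JPEG", quality=90)
            pages.append(page_path)
    return pages


def pack_pages(page_paths, zip_path):
    """Store the page images in a zip file, sorted by file name."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED) as archive:
        for page in sorted(Path(p) for p in page_paths):
            archive.write(page, arcname=page.name)
    return zip_path
```

With scans named 0001.tif, 0002.tif and so on, calling split_spread on each one and passing all the returned paths to pack_pages would give a zip containing 0001a.jpg, 0001b.jpg, 0002a.jpg, ... in reading order. Real scans are rarely centred this neatly, so the midpoint rule would probably need a per-book offset.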

As soon as IA derives the files, both Wikisource and the whole web will find
the job already done, and can use the resulting searchable PDF, DjVu or any
other derived file.

OPAL shares thousands of rare books, so any help from automated routines
makes a difference.

This is a big (perhaps, too big) challenge for my present limited skills,
but I found that I learn only from similar "missions impossible" :-)

Alex


2014-02-05 Jeremy Baron <jeremy at tuxmachine.com>:

> On Feb 5, 2014 3:11 AM, "Alex Brollo" <alex.brollo at gmail.com> wrote:
> > Just to avoid "rediscovering the wheel", is someone doing something
> similar into Labs?
>
> I'm unsure what your goal/purpose is but
> https://wikimania2012.wikimedia.org/wiki/Submissions/Open_Access_Media_Importer seems relevant. (git repo linked from there)
>
> -Jeremy
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l
>
>