I've been tinkering with the ia-upload tool and incorporating Alex Brollo's better system of DjVu generation (better than converting from PDF, that is; instead it works from the original Jpeg2000 files and merges the OCR data in).
I've set up a test installation of the tool at http://tools.wmflabs.org/ia-upload/test/ and would love anyone to have a go at it, and to report any bugs at https://github.com/wikisource/ia-upload/issues
Because DjVu generation can take a while (quite a while if you've got a crappy slow laptop like mine), the tool runs each job on the grid engine, starting every 5 minutes. The queue is shown on the homepage of the tool, with the status of each job. (Unless you're just re-using an existing DjVu file from the IA, in which case it's just uploaded directly to Commons while you wait, like the tool's always done.)
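For anyone who wants a mental model of the flow described above, here is a rough sketch in Python. It is not the tool's actual code: the `Job` and `Queue` names, the status strings, and the placeholder "conversion" step are all made up for illustration. It only shows the routing idea: an existing DjVu is uploaded straight away, anything else is queued and picked up by a periodic run.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    """One queued conversion (hypothetical structure, not the tool's own)."""
    item_id: str
    status: str = "queued"


@dataclass
class Queue:
    jobs: List[Job] = field(default_factory=list)

    def submit(self, item_id: str, has_djvu: bool) -> str:
        """Route a request: direct upload if IA already has a DjVu, else queue it."""
        if has_djvu:
            # Placeholder: the real tool would upload to Commons while the user waits.
            return "uploaded"
        self.jobs.append(Job(item_id))
        return "queued"

    def run_pending(self) -> int:
        """Stand-in for one scheduled run; returns how many jobs were processed."""
        processed = 0
        for job in self.jobs:
            if job.status == "queued":
                # Placeholder for the JPEG 2000 + OCR -> DjVu conversion and upload.
                job.status = "done"
                processed += 1
        return processed
```

In this sketch, `run_pending()` would be called by whatever scheduler fires every 5 minutes, and the homepage would simply list `jobs` with their `status`.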
Thanks!
This new feature is now live on the ia-upload tool: http://tools.wmflabs.org/ia-upload/ Please raise any issues on GitHub: https://github.com/wikisource/ia-upload/issues
The conversion process seems to take about 15 minutes for most books. (For books that already have DjVus at IA, it uploads them immediately, though.)
Thanks, Sam.
On Thu, 2 Feb 2017, at 09:33 AM, Sam Wilson wrote:
I've been tinkering with the ia-upload tool and incorporating Alex Brollo's better system of DjVu generation (better than converting from PDF, that is; instead it works from the original Jpeg2000 files and merges the OCR data in).
I've set up a test installation of the tool at http://tools.wmflabs.org/ia-upload/test/ and would love anyone to have a go at it, and to report any bugs at https://github.com/wikisource/ia-upload/issues
Because DjVu generation can take a while (quite a while if you've got a crappy slow laptop like mine), the tool runs each job on the grid engine, starting every 5 minutes. The queue is shown on the homepage of the tool, with the status of each job. (Unless you're just re-using an existing DjVu file from the IA, in which case it's just uploaded directly to Commons while you wait, like the tool's always done.)
Thanks!
Thanks Sam! Now we should focus on guidance about the requirements of a good, Wikisource-oriented IA upload: proper scan quality, good file names and useful metadata. IMHO it would be great to build a "Wikisource collection" at IA, since collection admins can edit any item detail except its ID, and so can fix most mistakes.
Alex
2017-02-09 4:10 GMT+01:00 Sam Wilson sam@samwilson.id.au:
This new feature is now live on the ia-upload tool: http://tools.wmflabs.org/ia-upload/ Please raise any issues on GitHub: https://github.com/wikisource/ia-upload/issues
The conversion process seems to take about 15 minutes for most books. (For books that already have DjVus at IA, it uploads them immediately, though.)
Thanks, Sam.
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
On Thu, 9 Feb 2017, at 03:13 PM, Alex Brollo wrote:
Thanks Sam!
Now we should focus on guidance about the requirements of a good, Wikisource-oriented IA upload: proper scan quality, good file names and useful metadata. IMHO it would be great to build a "Wikisource collection" at IA, since collection admins can edit any item detail except its ID, and so can fix most mistakes.
That sounds like a great idea! So it sounds like[1] we need to have 50 items already uploaded before they'll create a collection for us. Then, maybe we build it into ia-upload: a way of uploading and setting metadata for a set of scan files? It would upload files to IA and then do the DjVu-creating thing and upload just the DjVu to Commons?
Or do people upload to Commons first? And then our tool takes a file (or category of files), uploads it to IA, and then pulls the DjVu back from there and adds it to the same category?
(I'm sort of thinking aloud...)
Links:
1. https://archive.org/about/faqs.php#Collections
Hi everyone, I made this; hopefully it's helpful: https://docs.google.com/spreadsheets/d/158GvBrPBW0KfREHRmLFK7EhuB-FQBkLbm9qx...
It's the list of the files on Commons uploaded from Internet Archive. The idea, right now, is that every language Wikisource would take care of its own uploads, and once there are more than 50 of them it creates an "Italian/German/Bengali Wikisource" collection on Internet Archive. The whole set of collections will sit inside one global "Wikisource" collection.
Make sense? Do you agree?
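To make the per-language idea concrete, here is a small Python sketch of how one might check which language Wikisources already have enough uploads to ask for a collection. It is only an illustration: the `collection_readiness` function, the `(language, item_id)` pair format and the default threshold of 50 are assumptions for the example, not anything taken from the spreadsheet or from IA's rules, and whether the cut-off is "50" or "more than 50" would need checking with IA.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def collection_readiness(
    uploads: Iterable[Tuple[str, str]], minimum: int = 50
) -> Dict[str, bool]:
    """Map each language code to whether it has at least `minimum` uploads.

    `uploads` is an iterable of (language_code, ia_item_id) pairs; this
    shape is hypothetical and chosen only for the example.
    """
    counts = Counter(lang for lang, _item_id in uploads)
    return {lang: count >= minimum for lang, count in counts.items()}
```

Each language that comes back as ready could then request its own collection, with all of them nested under the global one.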
On Thu, Feb 9, 2017 at 8:38 AM, Sam Wilson sam@samwilson.id.au wrote:
That sounds like a great idea! So it sounds like (https://archive.org/about/faqs.php#Collections) we need to have 50 items already uploaded before they'll create a collection for us. Then, maybe we build it into ia-upload: a way of uploading and setting metadata for a set of scan files? It would upload files to IA and then do the DjVu-creating thing and upload just the DjVu to Commons?
Or do people upload to Commons first? And then our tool takes a file (or category of files), uploads it to IA, and then pulls the DjVu back from there and adds it to the same category?
(I'm sort of thinking aloud...)
That's a great idea!
I think we can use Wikidata to build the list: http://tinyurl.com/zwdbzyq
I had been erroneously thinking along the lines that we'd have to be uploading something to the items before making it part of a Wikisource collection, but of course that's not necessary. I think your hierarchy of wikisource collections sounds perfect.
It'd be cool if items with a page on a Wikisource could have a little footnote like they do for Open Library ones ("[Open Library icon] This book has an editable web page[1] on Open Library[2].").
—sam
On Sun, 12 Feb 2017, at 08:17 PM, Andrea Zanni wrote:
Hi everyone,
I made this; hopefully it's helpful:
https://docs.google.com/spreadsheets/d/158GvBrPBW0KfREHRmLFK7EhuB-FQBkLbm9qx...
It's the list of the files on Commons uploaded from Internet Archive. The idea, right now, is that every language Wikisource would take care of its own uploads, and once there are more than 50 of them it creates an "Italian/German/Bengali Wikisource" collection on Internet Archive.
The whole set of collections will be inside one "Wikisource" global collection. Make sense? Do you agree?
Links:
1. http://openlibrary.org/ia/thatremystre00gaut
2. https://openlibrary.org/
3. https://archive.org/about/faqs.php#Collections
On Mon, Feb 13, 2017 at 1:59 AM, Sam Wilson sam@samwilson.id.au wrote:
That's a great idea! I think we can use Wikidata to build the list: http://tinyurl.com/zwdbzyq
en.source is probably the only one that has filled in all its Wikisource data in Wikidata... Or have other Wikisources done that too? Do you have a workflow to share?
I had been erroneously thinking along the lines that we'd have to be uploading something to the items before making it part of a Wikisource collection, but of course that's not necessary. I think your hierarchy of wikisource collections sounds perfect.
It'd be cool if items with a page on a Wikisource could have a little footnote like they do for Open Library ones ("[Open Library icon] This book has an editable web page (http://openlibrary.org/ia/thatremystre00gaut) on Open Library (https://openlibrary.org/).").
We can try to convince them of that. It would only be for a fraction of their books: a few thousand out of the millions they have.