On Wed, 10 May 2017 at 15:38 +0200, Andrea Zanni wrote:
Dear all, Wikimedia Italia has set aside 3000€ in its budget for Wikisource-related work. When we discussed this months ago, we thought about paying a developer to fix the DjVu issue in the IA-Upload tool, which has since been resolved by our beloved Sam Wilson.
The tool is still not perfect (I often get errors), so some development may still be needed, but I'd like to ask you (especially technically skilled people like Tpt, Sam, Philippe, etc.) whether you think there is some low-hanging fruit that could be reached with that kind of budget. Of course, we will be looking for developers, so if you want to propose yourself for something, please do! ;-)
Aubrey
On Wed, May 10, 2017 at 4:54 PM, Philippe Elie phil.el@free.fr wrote:
It's possible that Tpt, I, and perhaps Sam will be interested, but the scope of the work is still unclear. Do you have any recent examples of errors?
-- Phe
On Wed, 10 May 2017 at 17:14 +0200, Andrea Zanni wrote:
You can see in the queue that a lot of processes just freeze, e.g. https://tools.wmflabs.org/ia-upload/log/bullettinodella04italgoog
There is also an issue with HTML tags: sometimes they are present in the IA description, which means they are also copied into the Commons Book template during the workflow. When that happens, you get an error before the book is uploaded to Commons.
Aubrey
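The HTML-tag problem described above could be handled by cleaning the IA description before it is placed in the Book template. The following is only a minimal sketch under that assumption, not how IA-Upload actually works; the names `strip_html` and `_TextExtractor` are hypothetical, and only the Python standard library is used.

```python
from html.parser import HTMLParser

# Tags after which a space is inserted so adjacent words do not get glued together.
BREAKING_TAGS = {"br", "p", "div", "li"}


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and drops the tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BREAKING_TAGS:
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in BREAKING_TAGS:
            self._chunks.append(" ")

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def strip_html(description):
    """Return the description without HTML tags and with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(description)
    parser.close()  # flush any text still buffered by the parser
    return " ".join(parser.text().split())


if __name__ == "__main__":
    print(strip_html("<p>Storia &amp; <b>arte</b></p><br />Vol. 4"))
```

Whether tags should be removed or converted to wikitext instead is a design choice for the tool maintainers; this sketch simply removes them.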
Isn't there also a tendency, when converting from jp2 --> pdf, to produce DjVu files that are too big?
-- Phe
On Wed, 10 May 2017 at 18:00 +0200, Andrea Zanni wrote:
Could you please explain that in more detail? I don't understand.
On Wed, May 10, 2017 at 6:01 PM, Philippe Elie phil.el@free.fr wrote:
Aren't the DjVu files that are produced often too big?
-- Phe
On 10 May 2017 at 18:03, Andrea Zanni zanni.andrea84@gmail.com wrote:
It may be. I'm not sure how Sam and Tpt solved that issue.
Aubrey
On Thu, 11 May 2017, at 12:07 AM, Thomas PT wrote:
Changing the subject, as the conversation has diverged.
Regarding "Not sure how Sam and Tpt solved that issue": to my knowledge, it's not solved yet.
Thomas
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
Yeah, it's still not fixed for books of more than about 500 pages. :-(
But it's on my list to work on, along with https://phabricator.wikimedia.org/T159796, which hopefully will be done before the hackathon next week. I've been having some dramas getting JP2 things working on my new computer...
Unfortunately, at the moment, xtools is taking priority.
—sam
PS: For any IA Upload bugs, feel free to add the community-tech tag in Phabricator, so they get a bit more visibility.
On Thu, May 11, 2017 at 1:30 AM, Sam Wilson sam@samwilson.id.au wrote:
This is very cool news. :)
One possibly not-too-onerous feature would be to permit uploading file types other than DjVu (e.g. PDF). Or there's the whole topic of creating or finding Wikidata items for the uploaded books and updating them with the IA identifier. That would probably require the uploading user to specify a Wikidata ID, though, which is what the {{book}} template on Commons should work from anyway, in my opinion (because it can't be done via a sitelink).
I'm very happy to help with whatever I can!
—sam
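As an illustration of the "finding Wikidata items" idea above, here is a small sketch that builds a SPARQL query looking up items by their IA identifier. It assumes the lookup would go through Wikidata property P724 ("Internet Archive ID"); the function name `build_ia_lookup_query` is hypothetical, and actually sending the query to the Wikidata Query Service is left out.

```python
import json

# Assumption: P724 is the Wikidata property holding the Internet Archive identifier.
IA_ID_PROPERTY = "P724"


def build_ia_lookup_query(ia_identifier):
    """Build a SPARQL query that finds items whose IA identifier matches exactly."""
    # json.dumps quotes the value and escapes quotes and backslashes,
    # which is sufficient for a SPARQL string literal.
    literal = json.dumps(ia_identifier)
    return (
        "SELECT ?item WHERE { "
        f"?item wdt:{IA_ID_PROPERTY} {literal} . "
        "} LIMIT 10"
    )


if __name__ == "__main__":
    print(build_ia_lookup_query("bullettinodella04italgoog"))
```

If the query returns no item, the tool could fall back to asking the uploading user for a Wikidata ID, as suggested in the message above.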
On Fri, 30 Jun 2017, at 03:23 PM, Andrea Zanni wrote:
Hello everyone, before we talk about this again, let me say that I think we have a "major" bug in IA-Upload: sometimes the OCR is misaligned with the pages, meaning the OCR text is correct but is shown for the following page...
Aubrey
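One way to check a book for the misalignment described above is to compare two per-page text lists and look for the shift that makes them agree. This is a diagnostic sketch only: it assumes the per-page OCR text from the source and the per-page text layer of the generated DjVu have already been extracted by some other means, and `estimate_page_offset` is a hypothetical helper, not part of IA-Upload.

```python
def estimate_page_offset(reference_pages, djvu_pages, max_shift=3):
    """Return the shift s for which djvu_pages[i + s] best matches reference_pages[i].

    A result of 1 means each page's text shows up on the following page;
    0 means the two lists are aligned.
    """

    def normalise(text):
        # Ignore case and whitespace differences when comparing pages.
        return " ".join(text.split()).lower()

    ref = [normalise(t) for t in reference_pages]
    got = [normalise(t) for t in djvu_pages]

    best_shift, best_score = 0, -1
    # Try small shifts first so that ties favour the smallest correction.
    for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
        score = sum(
            1
            for i, text in enumerate(ref)
            if text and 0 <= i + shift < len(got) and got[i + shift] == text
        )
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


if __name__ == "__main__":
    source = ["cover", "title page", "chapter one", "chapter two"]
    shifted = ["", "cover", "title page", "chapter one"]
    print(estimate_page_offset(source, shifted))
```

Exact matching after normalisation is a deliberate simplification; a real check would probably need fuzzy comparison.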
On Fri, Jun 30, 2017 at 10:00 AM, Sam Wilson sam@archives.org.au wrote:
This is indeed a bug! I can't replicate it, though. Does it happen for every book for you, or only sometimes? Do you know what is different about the ones that fail? Is it related to removing (or not removing) the Google cover page?
I can find time this weekend, I think, to work on this.
2017-06-30 10:10 GMT+02:00 Andrea Zanni zanni.andrea84@gmail.com:
Unfortunately, it happens only sometimes, and apparently it's not related to the Google cover page (at least, I removed a page from one book and it doesn't have the problem, while another book is misaligned even though the cover was not removed).
Look at this: https://it.wikisource.org/wiki/Indice:Decio_Albini_-_La_spedizione_di_Sapri,_Tip._delle_Terme_diocleziane_di_G._Balbi,_Roma_1891.djvu
2017-06-30 12:20 GMT+02:00 Alex Brollo alex.brollo@gmail.com:
Take a look at this case: https://archive.org/details/GiacomoRacioppiLAgiografiaDiSanLaverioDel1162Images
Here the OCR (as you can see from the _djvu.xml file) seems severely bugged, and obviously the DjVu file built by the IA Upload tool can't be better than its source.
Aubrey, please keep notifying me of any faulty DjVu files coming from IA or built from IA files by the IA Upload tool.
Alex
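A quick way to inspect a _djvu.xml file like the one mentioned above is to count the OCR words on each page and list the pages that have none. The sketch below assumes the usual DjVuXML layout, with one OBJECT element per page and WORD elements inside HIDDENTEXT; the helper names and the inline sample are hypothetical, and a real file would be downloaded from IA first.

```python
import xml.etree.ElementTree as ET

# Tiny made-up sample in the assumed DjVuXML layout: page 1 has two words, page 2 none.
SAMPLE = """<DjVuXML><BODY>
<OBJECT><HIDDENTEXT><PAGECOLUMN><REGION><PARAGRAPH><LINE>
<WORD coords="1,2,3,4">Agiografia</WORD><WORD coords="5,6,7,8">di</WORD>
</LINE></PARAGRAPH></REGION></PAGECOLUMN></HIDDENTEXT></OBJECT>
<OBJECT><HIDDENTEXT/></OBJECT>
</BODY></DjVuXML>"""


def words_per_page(djvu_xml):
    """Return the number of non-empty WORD elements for each OBJECT (page)."""
    root = ET.fromstring(djvu_xml)
    return [
        sum(1 for word in page.iter("WORD") if (word.text or "").strip())
        for page in root.iter("OBJECT")
    ]


def empty_pages(djvu_xml):
    """Return the 1-based numbers of pages that carry no OCR words at all."""
    return [
        number
        for number, count in enumerate(words_per_page(djvu_xml), start=1)
        if count == 0
    ]


if __name__ == "__main__":
    print(words_per_page(SAMPLE))
    print(empty_pages(SAMPLE))
```

Pages without words are not necessarily faulty (blank pages and plates are common), so the output is only a starting point for a manual check.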
Oops... I only *presume* that _djvu.xml is bugged; really, I only examined the full text file (derived, I think, from the _djvu.xml file). I'll take a deeper look, also examining the searchable PDF.
Alex