I recently added two recommendations for improving Media Viewer here:
https://www.mediawiki.org/wiki/Multimedia/Media_Viewer#Recommendations_for_i...
These are my rather exotic requests, listed in the "Bugs" section (I don't know whether I posted them in the right section):
- While supporting multi-page djvu files, implement a tool to copy to the clipboard, with a click from the displayed djvu page: (1) the pure text layer, (2) the lisp-like mapped text-layer structure at line or word detail, (3) the XML-mapped text layer (Alex brollo)
- While supporting multi-page djvu files, implement a tool to crop and download illustrations (as jpg or png) from the image layer of the current djvu page (Alex brollo)
Access to these two tools could, IMHO, greatly enhance the usability of the data wrapped in our beloved djvu files. Some JS tool could manage the mapped djvu text with interesting results. Many of us can get the same data from a local DjVuLibre application, but getting it from Media Viewer would avoid the use of local command-line applications and make these interesting, so far unused data easier to access for anyone.
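For context, text layers (1) and (2) are exactly what the DjVuLibre command-line tools already expose; a minimal sketch of what a server-side helper might run (the file name and page number are placeholders, and DjVuLibre must be installed):

```python
import subprocess

def djvutxt_cmd(djvu_path, page):
    # (1) pure text layer of a single page
    return ["djvutxt", "--page=%d" % page, djvu_path]

def djvused_cmd(djvu_path, page):
    # (2) lisp-like mapped text layer, with page/line/word coordinates
    return ["djvused", djvu_path, "-e", "select %d; print-txt" % page]

def run(cmd):
    # requires the DjVuLibre utilities on the PATH
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

print(djvutxt_cmd("book.djvu", 3))
# ['djvutxt', '--page=3', 'book.djvu']
```

Splitting command construction from execution keeps the same sketch usable whether the output is served to a browser or piped into another script.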
Alex
Actually, I would rather have a tool that pulls apart djvu files as they are uploaded, keeping the text in WS and the pics in Commons.
A djvu file is the wrong file type from which to extract an image: its image layer is a jpg, and we are all aware of the lossy issues with jpgs and with editing them as images. There are better file types to base an image on, and archive.org has those available, so something that allowed easy extraction of better-quality images would be preferable. Personally, I still use the archive.org image viewer, expand it to the largest size, and then right-click to save the image from the browser.
Pulling a djvu file apart into text and image also destroys what is integral to the format.
An online cropping tool for djvu files would be interesting, and WinDjView already does that if you have the software. My issue is that I don't see the advantage of having it in MV if we have already extracted the work, and it isn't hard to do if it hasn't been extracted.
Regards, Billinghurst
2014-10-01 9:18 GMT+02:00 Jane Darnell jane023@gmail.com:
Actually, I would rather have a tool that pulls apart djvu files as they are uploaded; keeping the text in WS and the pics in Commons
This is very interesting, since abbyy.xml files contain both a full, character-by-character map of the text and its formatting, and the coordinates of any non-textual content (illustrations) on the scanned page. Used appropriately, such data would make it possible to extract illustrations and other graphical elements of a page automatically. Nevertheless, I have seen that this "self-cropping" of illustrations sometimes fails, and is often confused by unusual illustration or graphic layouts, so that many "illustrations" are nonsense or have to be cropped again. Unluckily, djvu files carry no such "illustration coordinates" inside.
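As a rough sketch of how those coordinates could be read out, assuming the usual ABBYY FineReader XML layout in which non-text regions appear as `block` elements with `blockType="Picture"` and `l`/`t`/`r`/`b` pixel attributes (the inline sample is invented; real files also carry an XML namespace, which the tag check strips):

```python
import xml.etree.ElementTree as ET

SAMPLE = """<document>
  <page width="2400" height="3600">
    <block blockType="Picture" l="210" t="340" r="1180" b="1420"/>
    <block blockType="Text" l="200" t="1500" r="2200" b="3400"/>
  </page>
</document>"""

def picture_boxes(xml_text):
    root = ET.fromstring(xml_text)
    boxes = []
    for el in root.iter():
        # strip any namespace prefix real ABBYY files may carry
        if el.tag.split("}")[-1] == "block" and el.get("blockType") == "Picture":
            boxes.append(tuple(int(el.get(k)) for k in ("l", "t", "r", "b")))
    return boxes

print(picture_boxes(SAMPLE))
# [(210, 340, 1180, 1420)]
```

Each returned tuple is a (left, top, right, bottom) crop box that could be fed straight to an image-cropping step.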
Alex
I have seen many messy text-image mixes on Google Books, especially older texts from the days of manual typesetting. That's why I was wondering whether it would be possible to have a tool that stores pages as you go, so you can step in and adjust on a per-page basis. I am not familiar with abbyy.xml files, but this may be the way to go.
I burned out a few million neurons attempting to parse abbyy xml files, since I'm not a professional programmer, but what I vaguely saw and got is very, very exciting. Unluckily my scripts are so rough that they can't be shared, but I'm certain that real programmers could get unbelievable results from such tons of data. I also found confidence values for the OCR recognition of each character and each word, so that uncertain words could be highlighted on import... or passed to a recaptcha-like tool. But using abbyy xml would be a later step; what I'd like for now is simply the mapped text layer from djvu files, made simple and useful for any wikisource user.
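In ABBYY XML, each `charParams` element typically carries a `charConfidence` attribute (0–100); a minimal sketch of flagging uncertain characters, where the inline sample and the threshold are invented:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<line>
  <charParams charConfidence="95">c</charParams>
  <charParams charConfidence="40">l</charParams>
  <charParams charConfidence="97">a</charParams>
</line>"""

def suspicious_chars(xml_text, threshold=60):
    root = ET.fromstring(xml_text)
    # keep (character, confidence) pairs below the threshold;
    # a missing attribute defaults to -1 and is skipped
    return [(el.text, int(el.get("charConfidence", "-1")))
            for el in root.iter()
            if el.tag.split("}")[-1] == "charParams"
            and 0 <= int(el.get("charConfidence", "-1")) < threshold]

print(suspicious_chars(SAMPLE))
# [('l', 40)]
```

The flagged characters could then drive the highlighting (or recaptcha-style review) described above.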
Alex
Do we have a list of crop tools somewhere, made with both Commons and Wikisource workflows in mind?
Nemo
Please note the order of my list of requests:
1. Extracting the mapped text, in lisp-like format (lighter, and easy to obtain via two DjVuLibre routines, but a little tricky to convert into a JS object) and/or in XML format; this is really simple to get with a server-side routine, and difficult for an ordinary user to obtain.
2. Cropping images (all that's needed is a tool to crop the jpgs already produced by the MediaWiki software); this is not so important.
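The "tricky to convert" part of point 1 — the lisp-like output of djvused's `print-txt` — yields to a small s-expression parser; a sketch (the sample line is invented, and real output may also contain escaped non-ASCII characters needing further decoding):

```python
import re

def parse_sexpr(text):
    # tokenize into parens, quoted strings, and bare atoms
    tokens = re.findall(r'\(|\)|"(?:\\.|[^"\\])*"|[^\s()]+', text)
    pos = 0
    def read():
        nonlocal pos
        tok = tokens[pos]; pos += 1
        if tok == "(":
            lst = []
            while tokens[pos] != ")":
                lst.append(read())
            pos += 1  # consume the closing ")"
            return lst
        if tok.startswith('"'):
            return tok[1:-1]  # quoted word text, quotes stripped
        return int(tok) if tok.lstrip("-").isdigit() else tok
    return read()

sample = '(line 100 200 900 260 (word 100 200 300 260 "Hello"))'
print(parse_sexpr(sample))
# ['line', 100, 200, 900, 260, ['word', 100, 200, 300, 260, 'Hello']]
```

The resulting nested lists serialize directly to JSON, which is the "JS object" form a gadget would want.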
Yes, sometimes the default jpg images are not the best, but IMHO a low-resolution image, cropped and uploaded to Commons, could be very useful as a "placeholder" until a better one is derived from the full-resolution scans of the page (usually the original TIFF or derived JP2 collections at IA).
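As a sketch of how such a placeholder crop could later be redone at full resolution: if the crop box is known in the low-resolution jpg's coordinates, it only needs rescaling to the full-resolution scan (the function name and sizes here are illustrative):

```python
def scale_box(box, lowres_size, fullres_size):
    # box = (left, top, right, bottom) in low-res pixel coordinates;
    # sizes are (width, height) of the two renderings of the same page
    sx = fullres_size[0] / lowres_size[0]
    sy = fullres_size[1] / lowres_size[1]
    l, t, r, b = box
    return (round(l * sx), round(t * sy), round(r * sx), round(b * sy))

print(scale_box((100, 150, 400, 450), (1024, 1536), (4096, 6144)))
# (400, 600, 1600, 1800)
```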
Alex
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l