Hi, I have set up a new OCR service on tools.wmflabs.org. Through some JavaScript hosted on wikisource.org, it provides word-location data for djvu/pdf Index: pages. It can be used by adding
mw.loader.load('//wikisource.org/w/index.php?title=MediaWiki:Hocr.js&action=raw&ctype=text/javascript&dontcountme=s');
to your site-wide MediaWiki:Common.js or to your own user common.js. The script works in the Page: namespace, in edit or view mode. There is no user interface, except that double-clicking a word should highlight it on the scan. I found it very useful for encyclopedias, where it can be time-consuming to find the position of words on the image.
Because the OCR text and the proofread text always differ, the word location is often shifted by one or more words, so the location provided is only approximate.
Hello Phil, Thanks for the script! Our MediaWiki:common.js already has a line reading:
mw.loader.load('//wikisource.org/w/index.php?title=MediaWiki:OCR.js&action=raw&ctype=text/javascript');
Should I replace it with your line or keep it and just add yours? Thanks!
On Thu, Aug 21, 2014 at 1:07 PM, Philippe Elie phil.el@free.fr wrote:
-- Phe
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
On Thu, 21 Aug 2014 at 15:03 +0300, Nahum Wengrov wrote:
Hello Phil, Thanks for the script! Our MediaWiki:common.js already has a line reading:
mw.loader.load('//wikisource.org/w/index.php?title=MediaWiki:OCR.js&action=raw&ctype=text/javascript');
Should I replace it with your line or keep it and just add yours? Thanks!
Keep it and add the new one; they are two different services.
I forgot to mention that requesting OCR with the OCR button should be much faster nowadays, as I have OCR'ed about 24,000 Wikisource books over the last few weeks.
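For the Common.js question, a minimal sketch of keeping both loaders side by side might look like the following. The two URLs are the ones quoted in this thread; the `scripts` array and the `loadAll` helper are hypothetical names introduced only for illustration, and two plain `mw.loader.load(...)` lines would do the same job.

```javascript
// Both scripts are kept: OCR.js (the existing OCR button) and Hocr.js
// (the new word-location service). URLs are copied from this thread.
var scripts = [
  '//wikisource.org/w/index.php?title=MediaWiki:OCR.js&action=raw&ctype=text/javascript',
  '//wikisource.org/w/index.php?title=MediaWiki:Hocr.js&action=raw&ctype=text/javascript&dontcountme=s'
];

// Hypothetical helper: calls loader.load() once per URL, in order.
function loadAll(loader) {
  scripts.forEach(function (url) {
    loader.load(url);
  });
}

// On a wiki page the global `mw` exists; the guard only keeps this
// sketch from failing when it is run outside MediaWiki.
if (typeof mw !== 'undefined' && mw.loader) {
  loadAll(mw.loader);
}
```

The order of the two entries should not matter here, since the two scripts provide separate services.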