Wikisource-l October 2015

wikisource-l@lists.wikimedia.org

14 participants
11 discussions

Fwd: [Wikimediaindia-l] Google's Optical Character Recognition software now works with all South Asian languages

by Jayanta Nath

Hi all, I have checked it on Bengali images, and it works fine, with 100% accuracy. Can it somehow be implemented in the Proofread extension? Regards, Jayanta

---------- Forwarded message ---------- From: Subhashish Panigrahi <subhashish(a)cis-india.org> Date: Sat, Aug 29, 2015 at 3:22 PM Subject: [Wikimediaindia-l] Google's Optical Character Recognition software now works with all South Asian languages To: wikimediaindia-l(a)lists.wikimedia.org

Google's OCR, which is apparently the most accurate OCR we have seen so far, works really well for all the major South Asian scripts: http://globalvoicesonline.org/2015/08/29/googles-optical-character-recognition-software-now-works-with-all-south-asian-languages

Here are test cases for many Indian scripts: https://goo.gl/3X75iR. Except for Gurmukhi, most scripts work really well. This could be really useful for Indian-language Wikimedians and will come in handy for digitizing printed and scanned text. Here is an animated tutorial for Wikimedians on using this tool for Wikisource/Wikipedia: https://commons.wikimedia.org/wiki/File:Tutorial_to_use_Google_Optical_Character_Recognition.gif Please write to me if anyone wants to localize this tutorial into their language.

Best! Subhashish Panigrahi, Programme Officer, Access To Knowledge, Centre for Internet and Society, @subhapa / https://cis-india.org

8 years, 3 months

Bengali OCR for Proofread Page

by Jayanta Nath

A new version of the Bengali trained data was released just 4 months ago. Could anyone update it for Bengali Wikisource? Regards, Jayantanth https://github.com/tesseract-ocr/tessdata/blob/master/ben.traineddata

8 years, 6 months

Import 80mb+ file size to commons through ia-upload

by Jayanta Nath

Hi all, how do I import a file larger than 80 MB to Commons through ia-upload? During the import, the tool gave a garbage error. Can anyone try to import https://archive.org/details/UpendraKishoreRachanaSamagra to Commons through ia-upload? Regards, Jayanatanth BNWS

8 years, 6 months

FineReader 11 on Internet Archive

by Federico Leva (Nemo)

Today I rehearsed ScanTailor and unpaper to upload a rare book from WWI: https://archive.org/details/lettere-ferrovieri-patria

I noticed the OCR is now "ABBYY FineReader 11"; IA only recently switched to 9 after many years on 8, so this seems good news. If I see correctly, 11 claims new and improved language support:
* New OCR languages: Turkmen (Latin) and Old Slavonic
* New ICR languages: Danish, Norwegian (Bokmal & Nynorsk), Old English, Serbian (Cyrillic), Tajik
* Latin language has full dictionary support
* Improved OCR for Chinese, Japanese & Korean

10 claims improved language support:
* Chinese
* Korean
* Japanese

It seems IA technology is evolving rather fast: https://blog.archive.org/2015/10/21/grant-to-develop-the-next-generation-wa… https://blog.archive.org/2015/10/23/zoom-in-to-9-3-million-internet-archive… Nemo

8 years, 6 months

Wikimedia Digitization User Group

by Luiz Augusto

FYI, https://meta.wikimedia.org/wiki/Wikimedia_Digitization_User_Group

8 years, 6 months

Wikisource Conference Program

by Andrea Zanni

Hi everyone, as you know, in a month there will be the Wikisource Conference in Wien, and I hope many of you will participate. The organizing team discussed the program with the WMF and we came up with a very simple schedule. We wanted to have *very few goals*, so that it's easier to achieve them.

* Friday for GLAMs/workshops/hackathon: Friday, as the first day, will be the warm-up. Several people have signed up to give a presentation [1] or to host a workshop. We think it would be a good day for talking, learning, coding, whatever you feel like. Presentations are good for beginners, as are workshops. We feel that as a small community we can self-organize and do not need a facilitator that day. There is the possibility of visiting an archive in Wien, but that would be just after lunch and would break up the sessions. We could make the trip a fringe event on Thursday afternoon for those who arrive early... What do you think? We would really like to know your opinion on this.

* Saturday is the main day, for the discussion on "Wikisource identity and future". It's a very important moment, possibly crucial for us as an international community. We would most likely be facilitated by a professional, as we'd have the goal of writing a Wikisource "manifesto". The goal of this is simple: gain a clear picture of what Wikisource is and what we want it to be, with a roadmap. This document would then be the start of a conversation with the WMF about the future of Wikisource. Many people could be "intimidated" by a discussion like this :-), so there would be the possibility of a parallel session on the relationship between Wikisource and Wikidata, which is an important issue to tackle. Both these sessions would be wrapped up on Sunday morning.

Again, we would like your feedback on this: as you see, the conference is pretty simple, having few (but big!) topics, and we have few goals but we must succeed in achieving them. The conference itself is a pilot, and if we fail we may not be given another chance. Please tell us what you think. Aubrey

[1] https://meta.wikimedia.org/wiki/Wikisource_Community_User_Group/Wikisource_…

8 years, 6 months

Watermark on Google scan.

by Trần Nguyễn Minh Huy

Hello everybody. I have a question about watermark removal on Wikisource. If I remove Google's copyright-claim page and watermark from Google-scanned books following this guidance <https://en.wikisource.org/wiki/Help:DjVu_files#Removing_a_copyright_page>, is that legal? According to Google's statement, such pages and the watermark must be preserved in Google-scanned books.

This guidance on watermark removal is mainly based on the case Bridgeman Art Library v. Corel Corp. (see also <https://commons.wikimedia.org/wiki/Commons:When_to_use_the_PD-scan_tag#USA>), which, according to Wikipedia <https://en.wikipedia.org/wiki/Bridgeman_Art_Library_v._Corel_Corp.#Subseque…>, is controversial: "We are not convinced that the single case to which we are pointed where copyright was awarded for a “slavish copy” remains good law." The appeals court ruling cited and followed the United States Supreme Court <https://en.wikipedia.org/wiki/United_States_Supreme_Court> decision in Feist Publications v. Rural Telephone Service <https://en.wikipedia.org/wiki/Feist_Publications_v._Rural_Telephone_Service> (1991), explicitly rejecting difficulty of labor or expense as a consideration in copyrightability..."

Suppose that in the future Google sues Wikipedia for removing Google's copyright-claim page from Google-scanned books, the case is heard by an appeals court or the US Supreme Court (both higher than the Southern District of New York court), and the court reverses the ruling. What should we do then?

Trần Nguyễn Minh Huy, Supporter, Wikimedia Projects

8 years, 6 months

Large uploads to Wikisource

by Arne Wossink

Hi all, Wikimedia Nederland has recently been approached by several institutions that would like to upload source material. Wikisource would be the preferred platform for this, as the material would be searchable (which it wouldn't be if it were only uploaded as PDF to Commons). I would like to know whether there have been previous projects involving large uploads by institutions, and whether there's any documentation on how to proceed with these. Thanks! Arne Wossink, Projectleider / Project Lead, Wikimedia Nederland, Tel. +31 (0)6 11000505. Postal address: Postbus 167, 3500 AD Utrecht. Visiting address: Mariaplaats 3, Utrecht

8 years, 6 months

Registration now open: 1st International Wikisource Conference, Vienna - November 20-22

by Claudia Garád

Dear all, registration <http://goo.gl/forms/K5SFPHNkWP> for the international Wikisource Conference in Vienna is now open until *October 13, 2015*. As our PEG proposal was successful, we will be able to offer some scholarships to increase diversity among participants in terms of language and geography. The registration process also includes the scholarship application for participants who are not funded by a Wikimedia affiliate or other organization. We especially encourage visa applicants to register and get in touch with the team from WMAT soon. We are looking forward to seeing some of you in Vienna soon! Claudia, Andrea and David -- Claudia Garád, Executive Director, Wikimedia Österreich - Verein zur Förderung Freien Wissens, Siebensterngasse 25/15, 1070 Wien, 0699 141 28615, www.wikimedia.at <http://www.wikimedia.at/>

8 years, 6 months

Virtual Transcription Laboratory

by Federico Leva (Nemo)

Apparently, this rather mysterious project allows some select libraries to import a special flavour of DjVu into a system for manual transcription (and OCR training?). https://confluence.man.poznan.pl/community/display/WLT/Introduction+to+Virt… Found via http://succeed-project.eu/wiki/index.php/Cutouts Nemo

8 years, 7 months
