Hi,
For a long time, Indic-language Wikisource projects have depended entirely
on manual proofreading, which wastes a great deal of time and energy.
Recently Google released OCR software for more than 20 Indic languages,
along with other Asian languages. This software is far more accurate than
the previous OCRs, but it has many limitations. Uploading the same large
file twice (once for Google OCR and again to Commons) is not a workable
solution for most contributors, as Internet connections in India are very
slow. If we develop a tool that can feed the PDF or DjVu files already
uploaded to Commons directly to Google's OCR, this double upload can be
avoided.
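To make the idea concrete, here is a minimal sketch in Python of what such
a tool could do: ask the Commons API for one page of a multipage file as a
JPEG thumbnail, then submit that image to an OCR endpoint. The Cloud
Vision API used below is only an assumption about which Google OCR
endpoint such a tool would call, and the file name, page number and API
key are placeholders.

import base64
import requests

COMMONS_API = "https://commons.wikimedia.org/w/api.php"
VISION_API = "https://vision.googleapis.com/v1/images:annotate"  # assumed endpoint

def commons_page_image_url(filename, page, width=2048):
    # Ask the Commons API for a thumbnail URL of one page of a
    # multipage DjVu/PDF ("page<N>-<W>px" is the thumbnail parameter).
    params = {
        "action": "query", "format": "json",
        "titles": "File:" + filename,
        "prop": "imageinfo", "iiprop": "url",
        "iiurlwidth": width,
        "iiurlparam": "page%d-%dpx" % (page, width),
    }
    data = requests.get(COMMONS_API, params=params).json()
    page_info = next(iter(data["query"]["pages"].values()))
    return page_info["imageinfo"][0]["thumburl"]

def google_ocr(image_url, api_key, lang="bn"):
    # Download the page image and submit it for OCR with a language hint.
    image = requests.get(image_url).content
    body = {"requests": [{
        "image": {"content": base64.b64encode(image).decode("ascii")},
        "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
        "imageContext": {"languageHints": [lang]},
    }]}
    resp = requests.post(VISION_API, params={"key": api_key}, json=body)
    resp.raise_for_status()
    result = resp.json()["responses"][0]
    return result.get("fullTextAnnotation", {}).get("text", "")

# Hypothetical usage: OCR page 5 of a Bengali DjVu already on Commons.
# text = google_ocr(commons_page_image_url("Example.djvu", 5), "MY_API_KEY")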
This was proposed in the 2015 Community Wishlist Survey. Now that voting
on the wishlist has started, the proposal needs your support. Please
follow the link:
https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey/Wikisource#T…
FYI, this proposal was also identified as a highest-priority need at the
2015 Wikisource Conference in Vienna.
(https://etherpad.wikimedia.org/p/wscon2015needs)
Regards
--
Bodhisattwa Mandal
Administrator, Bengali Wikipedia
''Imagine a world in which every single person on the planet is given
free access to the sum of all human knowledge.''
Hi All,
I have checked it with Bengali images; it works fine, with 100% accuracy.
Anyhow, how can it be implemented in the ProofreadPage extension?
Regards,
Jayanta
---------- Forwarded message ----------
From: Subhashish Panigrahi <subhashish(a)cis-india.org>
Date: Sat, Aug 29, 2015 at 3:22 PM
Subject: [Wikimediaindia-l] Google's Optical Character Recognition software
now works with all South Asian languages
To: wikimediaindia-l(a)lists.wikimedia.org
Google's OCR, which is apparently the most accurate OCR we have seen so
far, works really well for all the major South Asian scripts:
http://globalvoicesonline.org/2015/08/29/googles-optical-character-recognition-software-now-works-with-all-south-asian-languages
Here are test cases for many Indian scripts: https://goo.gl/3X75iR.
Except for Gurmukhi, most scripts work really well.
This could be really useful for Indian-language Wikimedians and will come
in handy for the digitization of printed and scanned text. Here is an
animated tutorial for Wikimedians on using this tool for
Wikisource/Wikipedia:
https://commons.wikimedia.org/wiki/File:Tutorial_to_use_Google_Optical_Character_Recognition.gif
Please write to me if you want to localize this tutorial into your
language.
--
Best!
Subhashish Panigrahi
Programme Officer, Access To Knowledge
Centre for Internet and Society
@subhapa / https://cis-india.org
Hi,
As of now we don't have a separate Wikisource project for Punjabi. We have
two Wikipedias, as Punjabi is written in two scripts: one LTR (Gurmukhi)
and the other RTL (Shahmukhi). I was wondering if it is possible to have
one Wikisource for both scripts. Many texts, especially those written
before 1947, are available in both scripts, and we could work on making
newer ones available in the other script by using a transliteration tool
and some proofreading. Is it possible to create an interface for these two
different scripts?
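For what it's worth, a transliteration tool could start from a simple
rule-based core like the Python sketch below. The character table is
illustrative and very incomplete; real Gurmukhi-to-Shahmukhi conversion
also needs vowel signs, nasalization and dictionary lookups, which is
exactly why the proofreading step would still matter.

# Illustrative, incomplete mapping from Gurmukhi to Shahmukhi letters.
GURMUKHI_TO_SHAHMUKHI = {
    "ਕ": "ک", "ਸ": "س", "ਲ": "ل", "ਮ": "م",
    "ਨ": "ن", "ਪ": "پ", "ਬ": "ب", "ਰ": "ر",
}

def transliterate(text):
    # Characters without a mapping pass through unchanged, which
    # conveniently flags them for manual proofreading.
    return "".join(GURMUKHI_TO_SHAHMUKHI.get(ch, ch) for ch in text)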
Regards
Satdeep Gill
FYI: Danny said good things about us in a Wikimedia-l thread, so I replied
as you see below.
Good job, guys.
Aubrey
On Wed, Dec 16, 2015 at 9:55 PM, Danny Horn <dhorn(a)wikimedia.org> wrote:
> The Wikisource community did a tremendous job in showing up and giving
> support to the Wikisource proposals. The top wishes in that category got 41
> and 39 votes, which is really impressive considering the relative size of
> the projects.
>
> The discussion on using Google's OCR in Indic language Wikisource is
> especially interesting -- a lively debate about finding the right solution
> to what is clearly a deeply-felt need from a community that's working
> really hard to add their languages' knowledge to the movement. I hope that
> having that debate here is a step towards a larger discussion about how we
> can support Wikisource projects.
>
Thanks, Danny.
At the Wikisource Conference, held in Vienna from 20 to 22 November, we
discussed at length what Wikisource needs to reach its full potential as a
project. We decided to agree on a priority list (here:
https://etherpad.wikimedia.org/p/wscon2015needs) and also to participate in
the Survey.
But, if you feel brave enough, there is the whole 665-line Etherpad here:
https://etherpad.wikimedia.org/p/wscon2015weekend [1]
Wikisource, as a project, is completely dependent on the ProofreadPage
extension [2].
Unfortunately, the extension is maintained by volunteers only (I think
just one: Tpt).
Also, the extension doesn't support RTL languages, so Wikisources in
Arabic, Hebrew, Farsi and Indic languages don't really work like the
others. Add to this the fact that there is no good embedded OCR for Indic
languages right now.
And, finally, we'd love to have the VisualEditor *within* the ProofreadPage
extension: Wikisource uses a *lot* of formatting, and that could enable
many, many more users to proofread and validate pages.
Of course, we are a small community, but we're trying really hard to make
our case.
At the moment, to the best of my knowledge, there is not, and never has
been, any software development from the WMF dedicated to Wikisource.
Aubrey
(also a member of the Wikisource Community User Group)
[1] I hereby claim this as the longest Etherpad written by a group of
wikimedians (~40). I hope there is a prize for it. You can even read the
Wikisource mission forged and translated in real time in 21 languages (line
564).
[2] https://www.mediawiki.org/wiki/Extension:Proofread_Page
After I had another hissy fit at Commons administrators about WS works
being deleted with no means to track them, they created
https://commons.wikimedia.org/wiki/User:SteinsplitterBot/DR/enwikisource
I would encourage other wikis to talk to Steinsplitter to get a
similar page set up for their Wikisource.
I am going to need a better means of monitoring it than manually checking
a watchlist; however, it is a start.
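For anyone in the same position, one rough way to watch that page without
a watchlist is to poll its latest revision through the Commons API. A
sketch in Python; the ten-minute interval and the print are placeholders
for whatever alerting you prefer:

import time
import requests

API = "https://commons.wikimedia.org/w/api.php"
PAGE = "User:SteinsplitterBot/DR/enwikisource"

def latest_revision():
    # Fetch the newest revision id and timestamp of the tracking page.
    params = {
        "action": "query", "format": "json", "titles": PAGE,
        "prop": "revisions", "rvprop": "ids|timestamp", "rvlimit": 1,
    }
    data = requests.get(API, params=params).json()
    page = next(iter(data["query"]["pages"].values()))
    return page["revisions"][0]

last_seen = latest_revision()["revid"]
while True:
    time.sleep(600)  # poll every ten minutes
    rev = latest_revision()
    if rev["revid"] != last_seen:
        last_seen = rev["revid"]
        print("DR tracking page updated at", rev["timestamp"])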
Regards, Billinghurst
I think the below is also interesting to the Wikisource community.
---------- Forwarded message ----------
From: Dario Taraborelli <dtaraborelli(a)wikimedia.org>
Date: 2015-12-04 16:43 GMT+01:00
Subject: [Wikidata] Fwd: "Wikipedia as the front matter to all research": A
brown bag on scholarly citations in Wikipedia this Friday 12/4 @ 12 PT
To: "Discussion list for the Wikidata project." <
wikidata(a)lists.wikimedia.org>
A reminder that this will be streamed today at 9pm CET / 12pm PST
We’ll be talking
<https://meta.wikimedia.org/wiki/Wikipedia_as_the_front_matter_to_all_resear…>
about unique identifiers and bibliographic/citation data in general, as
well as
https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData
You can join the conversation via IRC on #wikimedia-office
Dario
Begin forwarded message:
*From: *Dario Taraborelli <dtaraborelli(a)wikimedia.org>
*Date: *December 2, 2015 at 11:01:51 AM PST
*To: *wikimedia-l(a)lists.wikimedia.org, Research into Wikimedia content and
communities <wiki-research-l(a)lists.wikimedia.org>
*Subject: **"Wikipedia as the front matter to all research": A brown bag on
scholarly citations in Wikipedia this Friday 12/4 @ 12 PT *
Come and join us for a brown bag this *Friday*, *December 4*, at 12 PT to
learn about *unique identifiers* and *scholarly citations in Wikipedia*,
why they matter and how we can bridge the gap between the Wikimedia,
research and librarian communities.
*Wikipedia as the front matter to all research*
YouTube stream: http://www.youtube.com/watch?v=mB_oexqz8pA
Event information on Meta:
https://meta.wikimedia.org/wiki/Wikipedia_as_the_front_matter_to_all_resear…
*Measuring citizen engagement with the scholarly literature through
Wikipedia citations.*
Geoffrey Bilder, CrossRef
Wikipedia (in toto) is probably the 5th largest referrer of citations to
the scholarly literature. That is, more Wikipedia users click on and follow
citations to the scholarly literature *from* Wikipedia domains than from
any single scholarly publisher in the world. What does this tell us about
general interest in the scholarly literature? What does this tell us about
scholarly engagement with editing Wikipedia articles? The short answer is
“we don’t know.” But we are actively working with Wikimedia to find out.
*Building the sum of all human citations*
Dario Taraborelli, Wikimedia Foundation
As sourcing and verifiability of online information are threatened
<http://www.slideshare.net/dartar/citing-as-a-public-service-building-the-su…>
by the explosion of answer engines and the changing habits of web users,
Wikimedia has an outstanding opportunity to extract and store source data
for every conceivable statement and make it transparently verifiable by its
users. In this talk, I’ll present a grassroots effort
<https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData> to
create a human-curated, comprehensive repository of all human citations in
Wikidata.
–––––––––––––
Bonus read: a real-time tracker of scholarly citations added to Wikipedia,
built with Raspberry Pi
http://blog.crossref.org/2015/12/crossref-labs-plays-with-the-raspberry-pi-…
*Dario Taraborelli *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>
I'm deeply convinced that splitting Wikisource projects into various
languages was a mistake.
Is anyone bold enough to imagine that it is possible to revert that mistake?
Or are we forced to travel along the *diabolicum* trail?
Alex
I'm playing with DjVu files. Luckily, I found a "simple" way to build a
GUI, so I'm using a self-built DjVu text editor with some features that
open the way to many developments (selecting and saving cropped images
from page images; saving text to the ws Page namespace via pywikibot;
aligning the djvu text layer with the ws Page-namespace text).
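For illustration, the pywikibot step might look like this minimal sketch
(the page title and text are placeholders, not lines from the actual
scripts):

import pywikibot

site = pywikibot.Site("it", "wikisource")
# "Pagina:" is the Page namespace on it.wikisource; the title is made up.
page = pywikibot.Page(site, "Pagina:Esempio.djvu/12")
page.text = "corrected text produced by the editor..."  # placeholder
page.save(summary="Djvu Editor: update Page text from the djvu text layer")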
Please don't ask me to put the scripts into Git or GitHub, since I simply
can't do this... it's a shame, but I can't.
Here is a draft document about what I'm doing:
https://it.wikisource.org/wiki/Utente:Alex_brollo/Djvu_Editor_-_Documentazi…
It's in Italian, but I found that Chrome's Italian-to-English translator
does a surprisingly good job.
Obviously I'll be happy to share the code with any of you; but consider
that I'm a DIY "programmer", and that reading my code would therefore be a
terrible pain for a good programmer.
Alex