Wikisource-l April 2014

wikisource-l@lists.wikimedia.org

19 participants
15 discussions

Zurich hackathon
by Andrea Zanni 14 Apr '14

Hi guys, does anybody plan to go to the Zurich hackathon? http://www.mediawiki.org/wiki/Z%C3%BCrich_Hackathon_2014 Wikimedia CH can provide the accommodation, but it is too late to ask them for a scholarship. There may be other means, but please get in contact with us ASAP. The registration deadline is *tomorrow*. Aubrey

Nepali Wiki Source
by Ganesh Paudel 12 Apr '14

Hi everyone, I am Ganesh from Nepal. It was good to have time with Andrea in Berlin to discuss Wikisource. We need a Wikisource in the Nepali language (ne). Please help us out. Best, Ganesh K. Paudel

Re: [Wikisource-l] Improving information retrieval methods for OCR data sets consisting of Indic scripts
by Jayanta Nath 08 Apr '14

I am expecting a response from the Ankur Group regarding the Bengali/Bangla OCR project, which we are interested in using in practice. Are there any binary files to use offline?

Jayanta Nath
Bengali Wikisource Community

On Mon, Feb 3, 2014 at 5:57 PM, Jayanta Nath <jayantanth(a)gmail.com> wrote:

> Hi Sankarshan,
>
> Thank you for the prompt initiative after our talk at the Kolkata book fair. The Bengali Wikipedia community (Wikisource, bn.wikisource.org) is ready to do anything except coding to crack this OCR issue. As you all know, this will not only help us; it has been a long-awaited wish.
>
> Regards,
> Jayanta
>
> On Monday, February 3, 2014, Sankarshan Mukhopadhyay <sankarshan.mukhopadhyay(a)gmail.com> wrote:
>
>> Hi Rabindra,
>>
>> Thank you for writing in.
>>
>> I am replying as a top-post because I have copied in the mailing list we use to discuss project ideas (subscription interface should be available from <http://lists.ankur.org.in/listinfo.cgi/project-ideas-ankur.org.in>).
>>
>> I have also added Jayanta Nath to the list. I met Jayanta yesterday (after a suitably long period of interactions over email) and we ended up chatting about the usual: "how to crack this OCR issue in a manner that helps the Bengali Wikipedia community and, especially, Wikisource".
>>
>> I am glad to note that you have taken a look at Abhishek's existing work. Have you been able to reach out to him and discuss in some level of detail the current state of the work? The voting piece is somewhat based on the concept that a larger number of users of the system can help train the system for a higher degree of accuracy.
>>
>> ankur.org.in will be putting in an application as a mentoring organization. However, acceptance in GSoC2014 is always subject to [1] a good set of project ideas; [2] reasonable success from the previous year, etc. So, there is a period of waiting before one gets to know about being selected as a mentoring organization and, thereafter, begins the process of selecting strong applications from students.
>>
>> I would recommend that you spend this time catching up with Abhishek and also Jayanta in order to be able to understand a real-life utilization of your project (should ankur.org.in be selected and you be accepted as a student).
>>
>> /sankarshan
>>
>> On Mon, Feb 3, 2014 at 12:56 PM, Rabindra Rakshit <rovir2r(a)gmail.com> wrote:
>>
>> > I (Rabindra Rakshit) am interested in applying for GSOC 2014, and would like to know if Ankur India is applying as a mentoring organization this year also.
>> >
>> > I am currently pursuing my B.tech in Computer Science (CSE) at the College of Engineering and Management, Kolaghat, and, being born a Bengali, would love to see my language flourish in the open source community.
>> >
>> > I am particularly interested in the project about Improving information retrieval methods for OCR data sets consisting of Indic scripts (Info Rescue). I had a look at the work plan of Abhishek Gupta; the final voting system in a general (abstract) manner is yet to be implemented.
>> >
>> > I don't have any direct experience with OCR, but I do have experience of working with Information Retrieval Systems. In fact, right now I am working on Consensus Sequence Segmentation, an Unsupervised Text Segmentation algorithm that relies entirely on statistical relationships among alphabets in the input sequence to detect the location of word boundaries. I have attached a document of our work, which is still in progress.
>> >
>> > Link: http://arxiv.org/abs/1308.3839
>>
>> --
>> sankarshan mukhopadhyay
>> <https://twitter.com/#!/sankarshan>

Wikisource stats
by Andrea Zanni 04 Apr '14

Hi guys, the Learning & Evaluation team pulled some updated Wikisource stats this week (see charts below, which Siko Bouterse presented to Sue and Erik yesterday as part of the "how IEG is having impact by supporting community organizers" talk). The number of editors on Wikisource has kept growing since the IEG that David and I did last year. This is not to brag: we couldn't have done it without the Wikisource community all around the world. It is a small piece of evidence that when we all work in an organized way (say, Wikisource birthday contests) there are tangible results. I just wanted to remind you that we have a Wikisource Community User Group, which can help in organizing Wikisource-related events/projects. And I really hope to see many of you wikisourcerors in London this summer: Wikimania is always the place where these awesome things start (there has to be beer in the equation though :-) Cheers Aubrey [image: Inline image 1] [image: Inline image 2]

Merging djvu.xml and abbyy.xml
by Alex Brollo 01 Apr '14

Just to inspire any of you who can comfortably manage XML: I'm going to try to merge two interesting files from the Internet Archive derive routine, djvu.xml and abbyy.xml. As many of you know, both contain mapped text from OCR. The first one contains mapped text at word level, without any formatting data or data about recognition probability; the second one is much more complex: it gives a large set of data at the level of the single character, and it is so complex and detailed that it is almost unusable. Both seem to come from the same process and seem identical in the data they share. So it seems possible to extract some interesting, selected data from the abbyy.xml file and inject them into the djvu.xml file, then load the result into the djvu text layer; the most interesting data being the probability of recognition of words. Has such an idea already been explored/developed? I often "rediscover the wheel" :-) Alex brollo
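The merge described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the two inline samples are simplified stand-ins, and the element and attribute names (`WORD`, `char`, `conf`, `wordStart`, `confidence`) are hypothetical rather than the exact names used in real djvu.xml and abbyy.xml files. The sketch averages per-character confidence into a per-word value and attaches it to the matching word of the word-level file, pairing words by reading order and checking that the text agrees.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical samples. Real djvu.xml / abbyy.xml files use
# richer schemas; these names are assumptions for illustration only.
DJVU_SAMPLE = """<PAGE><LINE>
<WORD coords="10,40,60,10">Hello</WORD>
<WORD coords="70,40,130,10">world</WORD>
</LINE></PAGE>"""

ABBYY_SAMPLE = """<page><line>
<char conf="90" wordStart="true">H</char><char conf="80">e</char>
<char conf="100">l</char><char conf="90">l</char><char conf="90">o</char>
<char conf="100"> </char>
<char conf="50" wordStart="true">w</char><char conf="60">o</char>
<char conf="70">r</char><char conf="60">l</char><char conf="60">d</char>
</line></page>"""


def word_confidences(abbyy_root):
    """Group characters into words and average their confidence values."""
    words = []  # each entry: [text, [confidence, ...]]
    for ch in abbyy_root.iter("char"):
        text = ch.text or ""
        if not text.strip():
            continue  # skip whitespace characters between words
        if ch.get("wordStart") == "true" or not words:
            words.append(["", []])
        words[-1][0] += text
        words[-1][1].append(int(ch.get("conf")))
    return [(text, sum(confs) / len(confs)) for text, confs in words]


def merge(djvu_root, confidences):
    """Attach a 'confidence' attribute to each WORD whose text matches."""
    for word, (text, conf) in zip(djvu_root.iter("WORD"), confidences):
        if word.text == text:  # only annotate when both files agree
            word.set("confidence", f"{conf:.0f}")
    return djvu_root


merged = merge(ET.fromstring(DJVU_SAMPLE),
               word_confidences(ET.fromstring(ABBYY_SAMPLE)))
print(ET.tostring(merged, encoding="unicode"))
```

Pairing words purely by order is the weakest part of this sketch; if the two files ever disagree on word segmentation, matching on the mapped coordinates would probably be more robust.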
