Wikisource-l

wikisource-l@lists.wikimedia.org

4 participants
1318 discussions

Fwd: [Foundation-l] Call for nominations! Steward Elections 2012
by Federico Leva (Nemo) 15 Jan '12

Forwarding to missing sister projects' lists.

-------- Original Message --------
Subject: [Foundation-l] Call for nominations! Steward Elections 2012
Date: Sun, 15 Jan 2012 06:10:59 +0600
From: Tanvir Rahman
Reply-To: Wikimedia Foundation Mailing List <foundation-l(a)lists.wikimedia.org>
To: Wikimedia Foundation Mailing List <foundation-l(a)lists.wikimedia.org>, Wikimedia India list <wikimediaindia-l(a)lists.wikimedia.org>, Wikimedia BD List <wikimedia-bd(a)lists.wikimedia.org>, Wikimedia Commons Discussion List <commons-l(a)lists.wikimedia.org>, Wikimedia Translators <translators-l(a)lists.wikimedia.org>

Hello everyone,

I am pleased to announce the beginning of Steward Election 2012 [1]. We are now taking nominations from eligible candidates. Interested candidates can check their eligibility and the procedure for submitting a nomination on the guidelines page [2]. We are open to candidate submissions until January 28, 2012, 23:59 (UTC). Questions to the candidates can be submitted until February 6, 2012, 23:59 (UTC). Guidelines about questioning can also be found on our guidelines page [2].

This time we are also arranging the confirmation of existing stewards [3]. The confirmation will begin on February 8 and will finish on February 27, 2012.

Please remember, the voting has not yet begun and will not begin until February 8, 2012, 00:00 (UTC). We will poke you once again when the voting starts.

As you all know, the steward election is a global event, so we need help from volunteers to translate the necessary pages into the languages they speak. For those who want to help us out with translation, please see our translation portal [4]. If you have any queries related to translation or anything else related to the election, you can ask us on the talk page [5]. Alternatively, you are free to poke us on the IRC channel #wikimedia-stewards-elections.

For those who are a little bit surprised to see another election within a few months, I want to let you know that the previous one (in September-October of 2011) was a special one which we arranged out of need. This is the regular one to elect new stewards and to confirm existing ones.

Please feel free to forward this e-mail to any list if you think it will be useful. :-)

Regards,
Tanvir Rahman
Wikitanvir on Wikimedia
(On behalf of the Election Committee.)

[1] http://meta.wikimedia.org/wiki/Stewards/Elections_2012
[2] http://meta.wikimedia.org/wiki/Stewards/Elections_2012/Guidelines
[3] http://meta.wikimedia.org/wiki/Stewards/Confirm/2012/en
[4] http://meta.wikimedia.org/wiki/Stewards/Elections_2012/Translation
[5] http://meta.wikimedia.org/wiki/Talk:Stewards/Elections_2012

_______________________________________________
foundation-l mailing list
foundation-l(a)lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l

1 0

Fwd: [Bug 21653] Creating a PDF with collection extension does not render the <pages> tag hook from proofread page extension
by billinghurst 12 Jan '12

This looks like very good news. We can only wait, and perhaps gently ask, to see when it will be incorporated into WMF's mw rollout.

Regards, Andrew

-------- Original Message --------
Subject: [Bug 21653] Creating a PDF with collection extension does not render the <pages> tag hook from proofread page extension
Date: Wed, 11 Jan 2012 21:54:13 +0000
From: bugzilla-daemon(a)wikimedia.org
To: billinghurst(a)gmail.com

https://bugzilla.wikimedia.org/show_bug.cgi?id=21653

Ralf Schmitt <ralf(a)brainbot.com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
             Status|NEW     |RESOLVED
         Resolution|        |FIXED

--- Comment #16 from Ralf Schmitt <ralf(a)brainbot.com> 2012-01-11 21:54:13 UTC ---
implemented in mwlib 0.13.2, which is already live on pdf cluster. please report further issues on github: https://github.com/pediapress/mwlib/issues/new

--
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You voted for the bug.
You are on the CC list for the bug.

1 0

password
by Marc Galli 07 Jan '12

Hi, I'm a sysop on French Wikisource. I have lost my password and I have no email address on record with which to request a new one. Explanations here: http://fr.wikisource.org/wiki/Wikisource:Scriptorium#Mot_de_passe_perdu I know it is possible to insert a new email address in the database, so I am looking for someone who has access to it. Thanks for your help. Marc

2 2

wikicaptcha on GitHub
by Cristian Consonni 05 Jan '12

[X-posting also on Wikitech-l]

Dear all,

I hope your season's holidays are going well. I'm writing to inform you that I have released on GitHub a first (0.1) version of wikicaptcha[1], a ReCAPTCHA-like program for Wiki*. It was born from an initial observation by Alex brollo[2], which we have discussed at length both on the WMI mailing list and on Wikisource-l[3].

For starters the code there is a rewriting of Alex's scripts and nothing more, and here's what the program does for now: 1) gets a djvu, 2) extracts the text layer, 3) identifies non recognized words and 4) produces a tiff image of them.

But I would like to write a "proof of concept" of the whole process, from getting OCR-ed djvu's from Commons to producing challenges, serving them, collecting answers and then using those answers in some useful way. That said, you see there's still a long way to go.

Obviously in the long run we could use this system as a backup for the current one, which has demonstrated some limitations[4], but I'm sure there are many aspects of the problem which go beyond my knowledge (I'm a physicist not a computer scientist, you know) so any help and/or advice is welcome.

Thanks for your time.

Cristian

[1] https://github.com/CristianCantoro/wikicaptcha
[2] http://it.wikisource.org/wiki/Utente:Alex_brollo
[3] http://lists.wikimedia.org/pipermail/wikisource-l/2011-February/000939.ht…
[4] http://lists.wikimedia.org/pipermail/wikitech-l/2011-November/056078.html
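
Step 3 in the message above (identifying non recognized words) could be approximated, as a rough sketch, by checking each word of an already extracted text layer against a word list. This is only one possible reading of that step and is not taken from the wikicaptcha code: the function name `unrecognized_words`, the tiny word list and the sample text are all illustrative assumptions.

```python
import re


def unrecognized_words(text, known_words):
    """Return words in `text` that are absent from `known_words`.

    Comparison is case-insensitive; each unknown word is reported once,
    in order of first appearance. (Hypothetical helper for illustration.)
    """
    known = {w.lower() for w in known_words}
    seen = set()
    result = []
    # [^\W\d_]+ matches runs of letters (Unicode-aware), skipping digits and underscores.
    for word in re.findall(r"[^\W\d_]+", text):
        key = word.lower()
        if key not in known and key not in seen:
            seen.add(key)
            result.append(word)
    return result


# Made-up OCR output with two misread words, and a made-up word list.
sample_text = "The qnick brown fox jumps ovcr the lazy dog"
word_list = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]

print(unrecognized_words(sample_text, word_list))
```

In a real pipeline the word list would presumably be a full dictionary for the language of the book, and the candidates found this way would then be cropped from the page image to build challenges.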

1 0

Music module (LilyPond)
by Federico Leva (Nemo) 16 Dec '11

The WMF is working on it, there's some chance to get it deployed at last. Coders needed. https://wikisource.org/wiki/Wikisource:Scriptorium#Music_module Nemo

2 1

Re: [Wikisource-l] [cultural-partners] ABBYY Finereader 11 on Toolserver: do we like it?
by Andrea Zanni 15 Dec '11

Wow, thank you all for the quick responses. I'll try to reply in-line.

2011/11/28 Mathias Schindler <mathias.schindler(a)gmail.com>
> I recommend sticking and supporting open source technology that has
> been made available by third parties, such as
> http://code.google.com/p/ocropus/ /
> http://code.google.com/p/tesseract-ocr/

This is true, and this would be the optimal way, but apparently it failed. I don't know why the OCR button is not running anymore; it seems to me that when ThomasV left, things were not updated or something like that. From my experience (I have used these programs for professional projects) the quality and the usability of the software are very different. Of course, having Tesseract is better than having nothing.

2011/11/28 Lars Aronsson <mathias.schindler(a)gmail.com>
> I think this is what the Internet Archive uses, as well as
> several European libraries. We could look into establishing
> a cooperation with the Internet Archive or perhaps with
> Europeana in this area. Maybe the Internet Archive can
> open up an API for OCR-ing a single page at a time?

This would be awesome :-) I don't have a clue about the technicalities here; if you want to ask them, be my guest :-)

@Tomasz I think that Federico has a point in the approach he suggested: in fact I'm wondering how IA got its license; we should ask them. Do we have any contact with the Internet Archive? I know we could use IA directly for uploading PDFs (we already do it for getting DjVus), but still it's not the most usable way to deal with institutions or ordinary users...

Aubrey

2 2

Re: [Wikisource-l] [Commons-l] [cultural-partners] ABBYY Finereader 11 on Toolserver: do we like it?
by Lars Aronsson 01 Dec '11

On 11/30/2011 09:55 PM, Eugene Zelenko wrote:
> ABBYY has own online OCR service http://finereader.abbyyonline.com

This is very interesting, OCR as a cloud service. I didn't know they were doing this. They charge EUR 7 per 200 pages, or US$ 0.05 per page, which I guess can be (almost) reasonable for the Wikimedia Foundation to pay.

I sometimes feel bad because I have OCRed so many tens of thousands of pages with a single EUR 129 license of Finereader. Here, EUR 129 would buy us 3700 pages.

All languages of Wikisource together are proofreading slightly less than 900 pages/day, for which OCR would cost EUR 32/day or US$ 43/day. With good OCR, proofreading is more fun, and these numbers may increase. But then again, we wouldn't need the service for all pages, as some books already have OCR.

The most interesting feature of a cloud-based OCR service would be the ability to accumulate improvements in font training (?) and dictionaries from a large number of users over time. With Wikisource, they can of course get direct access to the page after proofreading.

So, is the service any good? They even promise to do Fraktur (blackletter). Does it work well?

--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
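
The cost figures in the message above can be re-derived with a few lines of arithmetic. This is a small sketch using only the numbers quoted there (EUR 7 per 200 pages, a EUR 129 license, roughly 900 pages/day); the constant and function names are illustrative, and the US$ figures are left out because they depend on an exchange rate the message does not state.

```python
# Figures quoted in the message above.
EUR_PER_BATCH = 7.0      # price of one batch in EUR
PAGES_PER_BATCH = 200    # pages included in one batch
LICENSE_EUR = 129.0      # price of a single desktop Finereader license
PAGES_PER_DAY = 900      # approximate pages proofread per day, all Wikisources


def eur_per_page():
    """Price of OCR-ing a single page through the online service."""
    return EUR_PER_BATCH / PAGES_PER_BATCH


def pages_for_budget(budget_eur):
    """How many pages a given budget buys at the per-page price."""
    return budget_eur / eur_per_page()


def daily_cost_eur(pages_per_day):
    """Daily cost of OCR-ing every proofread page through the service."""
    return pages_per_day * eur_per_page()


print(f"{eur_per_page():.3f} EUR per page")
print(f"{pages_for_budget(LICENSE_EUR):.0f} pages for the price of one license")
print(f"{daily_cost_eur(PAGES_PER_DAY):.2f} EUR per day")
```

The results (about 0.035 EUR per page, about 3686 pages, and 31.50 EUR per day) agree with the rounded figures of 3700 pages and EUR 32/day given in the message.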

1 1

Re: [Wikisource-l] [cultural-partners] ABBYY Finereader 11 on Toolserver: do we like it?
by Lars Aronsson 30 Nov '11

On 11/28/2011 01:59 PM, Mathias Schindler wrote:
> I recommend sticking and supporting open source technology that has
> been made available by third parties, such as
> http://code.google.com/p/ocropus/ /
> http://code.google.com/p/tesseract-ocr/

Do you recommend this based on experience, or based on free software ideology? Apparently the Internet Archive tried and gave up, because Finereader was far better. Are there any good examples where free software has been used for good OCR quality?

Wikisource does provide feedback on quality: After OCR, when a page has been proofread, the OCR software could learn from the diff. But is there any OCR software that can take this kind of input?

When running OCR as an engine/server/API, what do we do when it misinterprets columns in a page, and reads long lines across the page? Is there a way to manually indicate where columns are, and resubmit the page for new OCR?

--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
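
The "learn from the diff" idea in the message above can be illustrated with a word-level diff between the raw OCR text and the proofread text. This is a minimal sketch using Python's standard `difflib`; the function name `ocr_corrections` and the sample strings are assumptions made for illustration, and whether any OCR engine can actually consume such pairs is exactly the open question raised in the message.

```python
import difflib


def ocr_corrections(ocr_text, proofread_text):
    """Return (ocr_fragment, corrected_fragment) pairs for replaced words.

    Both texts are split on whitespace and compared word by word; only
    'replace' operations are reported, insertions and deletions are ignored.
    """
    ocr_words = ocr_text.split()
    good_words = proofread_text.split()
    matcher = difflib.SequenceMatcher(a=ocr_words, b=good_words, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append((" ".join(ocr_words[i1:i2]), " ".join(good_words[j1:j2])))
    return pairs


# Made-up example: raw OCR output versus the same line after proofreading.
ocr = "Tbe quick brown fox jumps ovcr the lazy dog"
fixed = "The quick brown fox jumps over the lazy dog"

print(ocr_corrections(ocr, fixed))
```

Pairs collected this way over many proofread pages would amount to a list of frequent misreadings, which is the kind of feedback the message suggests Wikisource could provide.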

2 3

Re: [Wikisource-l] [cultural-partners] ABBYY Finereader 11 on Toolserver: do we like it?
by Andrea Zanni 28 Nov '11

2011/11/28 Gerard Meijssen <gerard.meijssen(a)gmail.com>
> Hoi,
> What scripts does it support ?
> Thanks,
> GerardM

Hi Gerard, could you be more explicit? I don't understand the question.

Aubrey

2 1

ABBYY Finereader 11 on Toolserver: do we like it?
by Andrea Zanni 28 Nov '11

Dear all,

for a long time I've been thinking about a project that could help Wikisource (and some GLAMs too). The idea is simply to install ABBYY Finereader 11 on toolserver, as a tool for all Wikisource users. For those who don't know, ABBYY Finereader is OCR software: it is proprietary and fairly expensive, but it is accurate and works really, really well. Plus, version 11 can save files in DjVu.

Now, in my mind, having such software on toolserver would let us:
- restore our beloved OCR button, with a much more accurate OCR
- use Finereader for transforming PDF/TIFF/JPG from Commons directly into OCRed DjVus
- other things I haven't thought of yet

There are many issues too:
- Cost: I don't know how much this could cost. Many WM chapters do give money to toolserver, and the status of the thing is a bit fuzzy at the moment, but, for example, Wikimedia Italy has frozen 5000 euros for the toolserver, and maybe we can use that money for the license (I'm on the WMI Board, and I've asked; they say it's OK);
- Technical: afaik, toolserver runs Solaris, and apparently Finereader is Windows only (I think we can easily solve this if we want, though)
- Ethics: this is proprietary software, and I don't know if we *want* to use it on Wikimedia projects...
- Resources: I think this is probably the main issue: we need skilled people to set this up technically, and at least one toolserver operator (Phe, maybe?)

Below is the mail I sent to ABBYY Europe, to see if the thing was feasible. They simply replied that they want a phone call. Of course, if it turns out to be too expensive the project collapses immediately, but I think it's worth discussing. If nobody wants it, I can drop it right now.

Please forward this mail to everyone possibly interested; I don't think it's a good idea to scatter discussions across every ws Village Pump.

Cheers
Aubrey

*From:* Andrea Zanni <andrea.zanni(a)wikimedia.it>
*Sent:* Friday, November 25, 2011 10:20 AM
*To:* support_eu(a)abbyy.com
*Subject:* Questions about server licenses

Dear ABBYY Europe,

my name is Andrea Zanni, and I'm a Board member of Wikimedia Italy, the Italian chapter of the Wikimedia movement. We are a non-profit association which promotes and supports Wikimedia projects, such as the online encyclopedia Wikipedia.

I'm writing to you because I'm interested in knowing about "server licences" for your new Finereader 11. As far as I know, your product saves files in DjVu, and this is an interesting feature that could help some of our projects. Maybe you know Wikisource, a multilingual digital library in which the community uploads, transcribes and proofreads books. This is the English version (http://en.wikisource.org/wiki/Main_Page). In each page of each book (which are uploaded in DjVu), we have a little "OCR" button which used to call a tesseract bot and OCR the page. Right now, the bot doesn't work for lack of maintenance.

My idea would be to replace tesseract with Finereader, and also to have the possibility of using other features, such as taking a PDF/JPEG file and saving it as an OCRed DjVu, or choosing the language of the OCR from project to project.

Now, I do not have an estimate of how much this engine would be used (I understand this is a crucial factor for the price of a server license). I would count a few hundred pages OCRed per day (maybe more, if this thing works), and a few dozen file conversions (any to DjVu) per day.

So, my questions are:
- do you have a rough idea how much this license would cost?
- do you know if it is possible to run FR11 on an OS other than Windows (we actually run Solaris)?
- do you know if it is possible to have all these features via an API or something?

Thank you for your time,
regards

Andrea Zanni
--
Wikimedia Italia Board
Support culture, donate to Wikimedia Italia.
http://sostienilacultura.it

2 1
