Forwarding to missing sister projects' lists.
-------- Original Message --------
Subject: [Foundation-l] Call for nominations! Steward Elections 2012
Date: Sun, 15 Jan 2012 06:10:59 +0600
From: Tanvir Rahman
Reply-To: Wikimedia Foundation Mailing List
<foundation-l(a)lists.wikimedia.org>
To: Wikimedia Foundation Mailing List <foundation-l(a)lists.wikimedia.org>,
Wikimedia India list <wikimediaindia-l(a)lists.wikimedia.org>,
Wikimedia BD List <wikimedia-bd(a)lists.wikimedia.org>, Wikimedia Commons
Discussion List <commons-l(a)lists.wikimedia.org>, Wikimedia Translators
<translators-l(a)lists.wikimedia.org>
Hello everyone,
I am pleased to announce the beginning of Steward Election 2012 [1]. We are
now taking nominations from eligible candidates. Interested candidates can
check their eligibility and procedure to submit the nomination on the
guidelines page [2]. We are open to candidate submissions till January 28,
2012, 23:59 (UTC). Questions to the candidates can be submitted until
February 6, 2012, 23:59 (UTC). Guidelines about questioning can also be
found on our guidelines page [2].
This time we are also arranging the confirmation of existing stewards [3].
But the confirmation will begin on February 8, and will finish on February
27, 2012.
Please remember, the voting has not yet begun and will not begin until
February 8, 2012, 00:00 (UTC). We will poke you once again when the voting
starts.
As you all know, the steward election is a global event, so we need help from
volunteers to translate necessary pages into languages they speak. For
those who want to help us out with translation, please see our translation
portal [4]. And if you have any queries related to translation or anything
related to the election, you can ask us on the talk page [5]. Alternatively
you are free to poke us on the IRC channel #wikimedia-stewards-elections.
For those who are a little bit surprised to see another election within a few
months, I want to let you know that the previous one (in September-October of
2011) was a special one which we arranged out of necessity. This is the regular one
to elect new stewards and to confirm existing ones.
Please feel free to forward this e-mail to any list if you think it will be
useful. :-)
Regards,
Tanvir Rahman
Wikitanvir on Wikimedia
(On behalf of the Election Committee.)
[1] http://meta.wikimedia.org/wiki/Stewards/Elections_2012
[2] http://meta.wikimedia.org/wiki/Stewards/Elections_2012/Guidelines
[3] http://meta.wikimedia.org/wiki/Stewards/Confirm/2012/en
[4] http://meta.wikimedia.org/wiki/Stewards/Elections_2012/Translation
[5] http://meta.wikimedia.org/wiki/Talk:Stewards/Elections_2012
_______________________________________________
foundation-l mailing list
foundation-l(a)lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
This looks like very good news. We can but wait, and maybe gently ask
when it will be incorporated into WMF's MediaWiki rollout.
Regards, Andrew
-------- Original Message --------
Subject: [Bug 21653] Creating a PDF with collection extension does not
render the <pages> tag hook from proofread page extension
Date: Wed, 11 Jan 2012 21:54:13 +0000
From: bugzilla-daemon(a)wikimedia.org
To: billinghurst(a)gmail.com
https://bugzilla.wikimedia.org/show_bug.cgi?id=21653
Ralf Schmitt <ralf(a)brainbot.com> changed:
What       |Removed |Added
----------------------------------------------------------------------------
Status     |NEW     |RESOLVED
Resolution |        |FIXED
--- Comment #16 from Ralf Schmitt <ralf(a)brainbot.com> 2012-01-11 21:54:13
UTC ---
implemented in mwlib 0.13.2, which is already live on pdf cluster.
please report further issues on github:
https://github.com/pediapress/mwlib/issues/new
--
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You voted for the bug.
You are on the CC list for the bug.
[X-posting also on Wikitech-l]
Dear all,
I hope your season's holidays are going well.
I'm writing to inform you that I have released on GitHub a first (0.1)
version of wikicaptcha[1], a ReCAPTCHA-like program for Wiki*. The
project grew out of an initial observation by Alex brollo[2], which
we have discussed at length both on the WMI mailing list and on
Wikisource-l[3].
For starters, the code there is a rewrite of Alex's scripts and
nothing more. Here is what the program does for now: 1) gets a
DjVu, 2) extracts the text layer, 3) identifies unrecognized words
and 4) produces a TIFF image of them.
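Step 3 above can be illustrated with a minimal sketch. This is not the actual wikicaptcha code: the function name `find_unrecognized`, the `(word, bounding box)` layout of the text layer and the tiny word list are all assumptions made for illustration only.

```python
import re

def find_unrecognized(words, known_words):
    """Return the text-layer entries whose word is not in known_words.

    `words` is a list of (text, bbox) pairs; this layout is hypothetical.
    """
    unrecognized = []
    for text, bbox in words:
        # Strip surrounding punctuation and lowercase before the lookup.
        token = re.sub(r"^\W+|\W+$", "", text).lower()
        if token and token not in known_words:
            unrecognized.append((text, bbox))
    return unrecognized

# Hypothetical text layer: OCR output with one misread word.
text_layer = [
    ("The", (10, 10, 40, 25)),
    ("qnick", (45, 10, 90, 25)),
    ("brown", (95, 10, 140, 25)),
    ("fox.", (145, 10, 175, 25)),
]
dictionary = {"the", "quick", "brown", "fox"}

candidates = find_unrecognized(text_layer, dictionary)
print(candidates)
```

The bounding boxes of the flagged words are what a later step would need in order to crop the corresponding image regions.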
But I would like to write a "proof of concept" of the whole process,
from getting OCR-ed DjVus from Commons to producing challenges,
serving them, collecting answers and then using those answers in some
useful way. That said, as you can see, there's still a long way to go.
Obviously in the long run we could use this system as a backup for the
current one, which has demonstrated some limitations[4], but I'm sure
there are many aspects of the problem which go beyond my knowledge
(I'm a physicist not a computer scientist, you know) so any help
and/or advice is welcome.
Thanks for your time.
Cristian
[1]https://github.com/CristianCantoro/wikicaptcha
[2]http://it.wikisource.org/wiki/Utente:Alex_brollo
[3]http://lists.wikimedia.org/pipermail/wikisource-l/2011-February/000939.ht…
[4]http://lists.wikimedia.org/pipermail/wikitech-l/2011-November/056078.html
Wow, thank you all for the quick responses.
I'll try to reply in-line.
2011/11/28 Mathias Schindler <mathias.schindler(a)gmail.com>
> I recommend sticking and supporting open source technology that has
> been made available by third parties, such as
> http://code.google.com/p/ocropus/ /
> http://code.google.com/p/tesseract-ocr/
>
This is true, and this would be the optimal way, but apparently it failed.
I don't know why the OCR button is not running anymore; it seems to me that
when ThomasV left, things were not updated or something like that.
From my experience (I have used this software for professional projects)
the quality and the usability of the two kinds of software are very different.
Of course, having Tesseract is better than having nothing.
2011/11/28 Lars Aronsson <mathias.schindler(a)gmail.com>
> I think this is what the Internet Archive uses, as well as
> several European libraries. We could look into establishing
> a cooperation with the Internet Archive or perhaps with
> Europeana in this area. Maybe the Internet Archive can
> open up an API for OCR-ing a single page at a time?
>
This would be awesome :-)
I don't have a clue about the technicalities here; if you want to ask them, be
my guest :-)
@Tomasz
I think that Federico has a point in the approach he suggested:
in fact I'm wondering how IA got its license; we should ask them.
Do we have any contact with the Internet Archive?
I know we could use IA directly for uploading PDFs (we do it already for
getting DjVus),
but it's still not the most usable way to deal with institutions or
ordinary users...
Aubrey
On 11/30/2011 09:55 PM, Eugene Zelenko wrote:
> ABBYY has own online OCR service http://finereader.abbyyonline.com
This is very interesting, OCR as a cloud service. I didn't know they
were doing this. They charge EUR 7 per 200 pages, or US$ 0.05
per page, which I guess can be (almost) reasonable for the
Wikimedia Foundation to pay. I sometimes feel bad because I have
OCRed so many tens of thousands of pages with a single EUR 129
license of Finereader. Here, EUR 129 would buy us 3700 pages.
All languages of Wikisource together are proofreading slightly
less than 900 pages/day, for which OCR would cost EUR 32/day
or US$ 43/day. With good OCR, proofreading is more fun, and
these numbers may increase. But then again, we wouldn't need
the service for all pages, as some books already have OCR.
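The figures above can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in this message (EUR 7 per 200 pages, a EUR 129 license, roughly 900 pages/day); the variable names are made up for illustration.

```python
# Figures quoted in the message above.
EUR_PER_BATCH = 7.0      # price of one batch
PAGES_PER_BATCH = 200    # pages included in one batch
LICENSE_EUR = 129.0      # price of a single desktop license
PAGES_PER_DAY = 900      # approximate proofreading rate, all languages

eur_per_page = EUR_PER_BATCH / PAGES_PER_BATCH
pages_for_license = LICENSE_EUR / eur_per_page
daily_cost_eur = PAGES_PER_DAY * eur_per_page

print(f"Cost per page:          EUR {eur_per_page:.3f}")
print(f"Pages for EUR 129:      {pages_for_license:.0f}")
print(f"Cost at 900 pages/day:  EUR {daily_cost_eur:.2f}")
```

The US$ figures depend on an exchange rate that the message does not state, so they are left out of the sketch.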
The most interesting feature of a cloud-based OCR service is
if they can accumulate improvements in font training (?) and
dictionaries from a large number of users over time. With
Wikisource, they can of course get direct access to the page
after proofreading.
So, is the service any good? They even promise to do Fraktur
(blackletter). Does it work well?
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
On 11/28/2011 01:59 PM, Mathias Schindler wrote:
> I recommend sticking and supporting open source technology that has
> been made available by third parties, such as
> http://code.google.com/p/ocropus/ /
> http://code.google.com/p/tesseract-ocr/
Do you recommend this based on experience, or based on free software
ideology? Apparently the Internet Archive tried and gave up, because
Finereader was far better. Are there any good examples where free
software has been used for good OCR quality?
Wikisource does provide feedback on quality: After OCR, when a page
has been proofread, the OCR software could learn from the diff.
But is there any OCR software that can take this kind of input?
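As a rough illustration of what "learning from the diff" could start from, the sketch below pairs misread words with their proofread corrections using Python's standard `difflib`. It says nothing about whether any OCR engine accepts such input; the function name `ocr_corrections` and the sample strings are assumptions for illustration.

```python
import difflib

def ocr_corrections(ocr_text, proofread_text):
    """Return (ocr_words, corrected_words) pairs where the two texts differ."""
    ocr_words = ocr_text.split()
    good_words = proofread_text.split()
    matcher = difflib.SequenceMatcher(a=ocr_words, b=good_words, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only substitutions; insertions and deletions are ignored here.
        if tag == "replace":
            pairs.append((" ".join(ocr_words[i1:i2]),
                          " ".join(good_words[j1:j2])))
    return pairs

# Hypothetical raw OCR output and the same line after proofreading.
ocr = "Tbe quick hrown fox jumps"
fixed = "The quick brown fox jumps"
print(ocr_corrections(ocr, fixed))
```

Word-level pairs like these are the simplest form of feedback; a real training pipeline would presumably also need the matching image regions.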
When running OCR as an engine/server/API, what do we do when it
misinterprets columns in a page, and reads long lines across the
page? Is there a way to manually indicate where columns are, and
resubmit the page for new OCR?
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
2011/11/28 Gerard Meijssen <gerard.meijssen(a)gmail.com>
> Hoi,
> What scripts does it support ?
> Thanks,
> GerardM
>
Hi Gerard,
could you be more explicit?
I don't understand the question.
Aubrey
Dear all,
for a long time I have been thinking about a project that could help Wikisource
(and some GLAMs too): the idea is simply to install ABBYY Finereader 11 on
the Toolserver,
as a tool for all Wikisource users.
For those who don't know, ABBYY Finereader is OCR software: it is
proprietary and fairly expensive,
but it is accurate and works really, really well. Plus, version 11 can
save files in DjVu.
Now, in my mind, having such software on the Toolserver could let us:
- restore our beloved OCR button, with a much more accurate OCR
- use Finereader for transforming PDF/TIFF/JPG files from Commons directly into
OCRed DjVus.
- do other things I haven't thought of yet
There are many issues too:
- Cost: I don't know how much this could cost. Many WM chapters do give
money to the Toolserver, and the status of the thing is a bit fuzzy at the
moment, but, for example,
Wikimedia Italy has set aside 5000 euros for the Toolserver, and maybe we can
use that money for the license (I'm on the WMI Board, and I've asked; they say
it's OK);
- Technical: afaik, the Toolserver runs Solaris, and apparently Finereader is
Windows only (I think we can solve this easily if we want, though)
- Ethics: this is proprietary software, and I don't know if we *want* to use
it on Wikimedia projects...
- Resources: I think this is probably the main issue: we need skilled
people to set this up technically, and at least one Toolserver operator
(Phe, maybe?)
Below is the mail I sent to ABBYY Europe, to see if the thing was feasible.
They simply replied that they want a phone call. Of course, if it turns out to
be too expensive, the project collapses immediately,
but I think it's worth discussing. If nobody wants it, I can drop it right
now.
Please forward this mail to everyone possibly interested;
I don't think it's a good idea to scatter discussions across every ws Village
Pump.
Cheers
Aubrey
*From:* Andrea Zanni <andrea.zanni(a)wikimedia.it>
*Sent:* Friday, November 25, 2011 10:20 AM
*To:* support_eu(a)abbyy.com
*Subject:* Questions about server licenses
Dear ABBYY Europe,
my name is Andrea Zanni, and I'm a Board member of Wikimedia Italy,
the Italian chapter of Wikimedia movement.
We are a non-profit association which promotes and sustains Wikimedia
projects,
such as the online encyclopedia Wikipedia.
I'm writing you because I'm interested in knowing
about "server licences" of your new Finereader 11.
As far as I know, your product saves files in DjVu, and this is an
interesting feature that
could help some of our projects.
Maybe you know Wikisource, a multilingual digital library in which the
community uploads, transcribes and proofreads books.
This is the English version (http://en.wikisource.org/wiki/Main_Page).
On each page of each book (which are uploaded in DjVu), we have a little
button "OCR"
which used to call a Tesseract bot to OCR the page.
Right now, the bot doesn't work for lack of maintenance.
My idea would be to replace Tesseract with Finereader, and also have
the possibility to
use other features, such as taking a PDF/JPEG file and saving it as an OCRed
DjVu, or choosing the language of the OCR from project to project.
Now, I do not have an estimate of how much this engine could be used (I
understand this is a crucial factor for the price of a server license).
I would count a few hundred pages OCRed per day (maybe more, if this
thing works), and a few dozen file conversions (any to DjVu) per day.
So, my questions are:
- do you have a rough idea how much this license would cost?
- do you know if it is possible to run FR11 on an OS other than Windows (we
actually run Solaris)?
- do you know if it is possible to access all these features via an API or
something similar?
Thank you for your time,
regards
Andrea Zanni
--
Wikimedia Italia Board
Support culture, donate to Wikimedia Italia.
http://sostienilacultura.it