I just wanted to make sure there will be some wikisourcerers at this
event. Is that so?
-------- Forwarded Message --------
Subject: [Wikimedia-l] Wikimedia Conference 2018 - Registration closes
in one month!
Date: Fri, 15 Dec 2017 14:23:18 +0100
From: Michelle Poltier <michelle.poltier(a)wikimedia.de>
Reply-To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
To: wikimedia-l(a)lists.wikimedia.org
Thanks to those of you who have already registered for the Wikimedia
Conference 2018 – we are already very excited to host the conference
from April 20 to April 22, 2018 in Berlin!
== Registration information ==
Haven't registered for the conference yet? If your affiliation is
eligible to attend the conference and you have been selected to
represent your affiliation in Berlin, this is a kind reminder to
register by Monday, January 15, 2018. To register for the
conference, please use the link to the registration form below.
== Holiday break of the organizing team ==
In view of the upcoming holidays, we would like to inform you that the
WMCON18 organizing team will be out of the office from December 22,
2017 until January 3, 2018.
If you need assistance or have any questions, we recommend that you
contact us before December 22. Otherwise, we will respond to your
emails as soon as possible upon our return.
== Visa information ==
We strongly advise all of you who need a visa to register by
Monday, December 18, 2017.
We will do our best to send the documents needed for the visa
application process (letter of invitation, foreign travel health
insurance, copy of WMDE's registration as an association) by
December 22, 2017, to everyone who registers by Monday, December 18,
2017. If you register after this date, we will prepare and send the
documents to you at the beginning of January.
Further information on the visa process and assistance can be found on Meta. 
Should you have any questions, please do not hesitate to contact us.
Daniela & Michelle on behalf of the organizing team
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l(a)lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:email@example.com?subject=unsubscribe>
A Wikisource workshop will be held on 18 May at Assam University,
Silchar, with the support of the university's Department of Bengali.
Details here: https://bn.wikisource.org/s/fw5o
The West Bengal User Group is covering the necessary expenses for this workshop.
Good to know. I checked the ABBYY website, and it says one option is
an "Open license for local use on workstations", but I guess it's not a
FLOSS license, unfortunately.
By the way, what is the state of affairs regarding Indic languages?
Do we have a central page documenting the existing OCR pipelines used by the
What should I tell a contributor who comes to me asking, "I have this
old public-domain book in my personal library that I would like to digitize,
share and proofread on Wikisource; where should I start?" Is there an
online service, for example on Tool Labs, that lets you either upload
a facsimile or simply enter its URL, and then runs OCR on it, for example
with Tesseract as the backend?
Shouldn't we update our roadmap, or is there a more up-to-date
On 13/04/2018 at 08:28, Nahum Wengrov wrote:
> I use ABBYY FineReader; I don't remember the exact version (probably 12
> or 11). I bought it a few years ago and it works perfectly for my
> language (Hebrew).
> On Fri, Apr 13, 2018 at 2:22 AM, mathieu stumpf guntz
> <psychoslave(a)culture-libre.org> wrote:
> Thank you Nahum,
> Could you indicate which OCR solution you are using?
> On 26/03/2018 at 17:27, Nahum Wengrov wrote:
>> I frequently work offline on he.wikisource. I download the entire
>> PDF file from Commons to my hard drive and OCR the page I need
>> myself. One could also use the Wikisource OCR and download the text,
>> I guess, page by page. Then I proofread the text in a Word
>> document open on the lower half of my screen, with the PDF open
>> on the upper half in Acrobat Reader, where I go to the page I need,
>> and I scroll both windows up or down as needed.
>> On Mon, Mar 26, 2018 at 11:21 AM, mathieu stumpf guntz wrote:
>> On 24/03/2018 at 16:22, billinghurst wrote:
>>> Though that would defeat the purpose of online proofreading
>>> with account verification. Some of the true value of our
>>> online process is that contribution builds a level of trust
>>> and knowledge and that is reflected in both our patrolling
>>> and the allocation of autopatrolled status.
>> How would providing tools for offline batch work interfere
>> with that in any way? Once the work is done, it can be
>> uploaded to Wikisource with whichever account the user wants.
>> Actually, to my mind, the main benefit of the online aspect
>> is the peer-to-peer production model. Also, there is no need
>> for a central node holding accounts to take into account the
>> trust given to a particular contributor: there are digital
>> signature technologies, such as GPG, for that. Having a
>> central node with a web interface just makes things easier
>> for most users; it doesn't improve the trustworthiness of the
>> environment. On the contrary, with a single point of failure,
>> we actually rely on a weaker solution in this regard.
>>> Also, how would you have access to templates and components
>>> like that offline?
>> Well, that just shows how inefficient these tools are for
>> continuing to contribute while offline. It's always
>> possible to install MediaWiki and download the required
>> templates, but currently this process seems way too
>> complicated, doesn't it?
>>> Also we generally cannot download the images separately as
>>> that is usually part of the later clean-up where people have
>>> the technical skills.
>> I'm afraid the term "image" misled your answer. It seems
>> you interpreted it as picture elements within files, while I
>> was talking about the files themselves.
>>> So yes, there is the capacity to have the text and proofread
>>> the text, but actually checking the text against the image is
>>> not the sole component of proofreading, and further it would
>>> not be at all helpful for validation.
>> There is nothing magic about working directly in a browser.
>> People download and upload all the required material
>> anyway, just on a page-by-page basis. The result would be
>> just as valid if the transactions were carried out at the
>> file repository level.
>> Wikisource-l mailing list
A person at a local Wikisource workshop asked me if we could download
all the material for a specific work in order to proofread it offline, that is,
download both the page images and the OCRed text. Additionally, I think it would be
good to provide a tool that at least shows the plain text and the images side by side.
So, are you aware of anything close to such a tool? :)
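As a rough illustration of the side-by-side idea (not an existing Wikisource tool), the sketch below builds a single local HTML page that shows each downloaded page image next to its OCR text. It assumes the scans and OCR text have already been downloaded; the function name `side_by_side_html` and its `(image_path, ocr_text)` input format are hypothetical.

```python
import html


def side_by_side_html(pages, title="Offline proofreading"):
    """Return an HTML page that shows each page image next to its OCR text.

    `pages` is a list of (image_path, ocr_text) pairs. The function name and
    input format are hypothetical, chosen only for this illustration.
    """
    rows = []
    for number, (image_path, ocr_text) in enumerate(pages, start=1):
        # Escape both values so OCR text containing "<" or "&" displays literally.
        src = html.escape(image_path, quote=True)
        rows.append(
            f'<tr><td><img src="{src}" alt="Page {number}" width="600"></td>'
            f"<td><pre>{html.escape(ocr_text)}</pre></td></tr>"
        )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        "<body><table>\n" + "\n".join(rows) + "\n</table></body></html>\n"
    )


if __name__ == "__main__":
    # Hypothetical local file: one scan image per page plus its OCR text.
    pages = [("page-001.png", "First page of the OCR text")]
    print(side_by_side_html(pages))
```

Opening the generated file in any browser gives the scan on the left and the text on the right; the corrected text would still have to be pasted back into the Page: namespace by hand.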
We are testing a trick that is useful for Internet Archive (IA) items that have
no DjVu file but do have a _djvu.xml file.
The _djvu.xml file is split into pages, and each page is uploaded "as is" as page text.
A jQuery script can then parse the XML and convert it into excellent plain text.
The same trick works for both DjVu-based and PDF-based Index pages. Another
advantage is that the mapped text is saved as the first revision of the page content,
so it can be recovered and used without any external tool.
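A minimal Python sketch of the page-splitting and plain-text step is shown below. It is not the jQuery script mentioned above, just an offline approximation of the same idea. It assumes the usual IA _djvu.xml layout (one OBJECT element per page, with HIDDENTEXT/PAGECOLUMN/REGION/PARAGRAPH/LINE/WORD nested inside), and the function name `djvu_xml_pages_to_text` is hypothetical.

```python
import xml.etree.ElementTree as ET


def djvu_xml_pages_to_text(xml_source):
    """Split an IA-style _djvu.xml document into pages and return one
    plain-text string per page.

    Assumed layout: one OBJECT element per page, with
    HIDDENTEXT/PAGECOLUMN/REGION/PARAGRAPH/LINE/WORD nested inside it.
    """
    root = ET.fromstring(xml_source)
    pages = []
    for page in root.iter("OBJECT"):  # document order = page order
        paragraphs = []
        for paragraph in page.iter("PARAGRAPH"):
            lines = []
            for line in paragraph.iter("LINE"):
                # Keep the OCR engine's word order within each line.
                words = [(w.text or "").strip() for w in line.iter("WORD")]
                lines.append(" ".join(w for w in words if w))
            paragraphs.append("\n".join(lines))
        # A blank line between paragraphs, as in wiki page text.
        pages.append("\n\n".join(paragraphs))
    return pages
```

Each string in the returned list corresponds to one page, so it could be pasted or uploaded as the first revision of the matching Page: entry.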
While parsing the XML, the same script can also use word coordinates to fix some
severe FineReader mistakes caused by a wrong analysis of the text layout (incorrect
splitting of the text into columns/regions).