Scripto is an alternative to the ProofreadPage extension used
by Wikisource. It is based on MediaWiki but also on OpenLayers,
the software used to zoom and pan in OpenStreetMap.
The only website I have seen that uses Scripto is the U.S.
War Department papers, and in many ways it is clumsier
than ProofreadPage. But there might be a few ideas
worth picking up. Take a look.
The software is described at http://scripto.org/
As for reference installations, they mention
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
Forwarding to this list because Wikisource makes heavy use of
(forwarding from my personal address because the other isn't
subscribed to this list)
---------- Forwarded message ----------
From: Guillaume Paumier <gpaumier(a)wikimedia.org>
Date: Fri, Nov 30, 2012 at 1:09 PM
Subject: Re: [Wikitech-l] LabeledSectionTransclusion performance problems
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
On Fri, Nov 30, 2012 at 10:07 AM, Merlijn van Deen <valhallasw(a)arctus.nl> wrote:
> After the new version of LabeledSectionTransclusion (LST) was deployed on
> itwikisource, performance issues popped up. itwikisource's main page makes
> heavy use of LST, and the new version is clearly heavier than the old one.
As a side note: because of the performance issues, the most recent
changes to the LST extension will probably be reverted today (Friday,
If you made changes to articles or templates to accommodate the new
version or benefit from new features, you may want to revert those
FYI, proposal for a new Wikimedia project or Wikibooks/Wikisource inclusion
---------- Forwarded message ----------
From: "Mathieu Stumpf" <psychoslave(a)culture-libre.org>
Date: 27/11/2012 14:37
Subject: [Wikimedia-l] Creating Wikikultur
Hello, I just added this proposal: http://meta.wikimedia.org/**
Here is the summary:
Currently there are still some digital works which cannot be published on
any Wikimedia project, because none of them has the right editorial
guidelines to host them. Things like original poetry, songs, essays, theses,
novels, etc., whether they were previously published or not.
It would be great to see Wikimedia launch a project to host this kind of
The no-original-research rule for Wikipedia is understandable and very
important, of course. But what is relevant for an encyclopedia may not fit
other projects, which need different editorial guidelines. Among existing
projects, Wikibooks is too oriented toward pedagogical works, and Wikisource
won't accept works which were not previously published elsewhere.
Moreover, one may argue that it would be a good way to gently move
original works out of Wikipedia, with a message like "your contribution
contains original claims, Wikipedia is not the place to publish this kind
of content, but you could share your original work on Wikikultur". Then,
possibly, the Wikipedia article could use Wikikultur articles as
references. This would give access to authorship information (possibly
anonymous/IP claims), and all the advantages of a free/libre work on a wiki.
For example, we may have statistics on articles, so we can check that one is
not used as a reference for an overrepresented point of view in a Wikipedia
On artistic topics, I think it would really help boost the free/libre
culture movement to have a place where every artist can directly
experience what it means to share and build together, with an audience
intertwined with other MediaWiki projects.
It looks like other proposed projects would be included in such a project,
like Wikiessay (I just discovered it and I'm going to read the other project
descriptions, sorry), but other aspects don't seem, at first glance, to be
covered, like artistic topics.
Is the "pdftotext" program used when extracting
the OCR text layer from a PDF file?
In this book,
it seems that using "pdftotext -raw" would produce
a better result than the current one.
If you download the source PDF file and try to run
pdftotext with and without the -raw option, you
will see a difference in how some very boldface
words are produced: H e l l o (without -raw) and
Hello (with -raw), respectively;
and also in the column separation of some pages,
e.g. page 81 (De Roster--Herborn), where Dyck
is followed by E (with -raw) or G (without -raw).
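
For anyone who wants to repeat this comparison, here is a minimal Python sketch. It assumes the pdftotext command-line tool is installed and on the PATH. The file name "book.pdf" and the helper names are placeholders of mine, not part of any Wikisource tooling, and the check for letter-spaced words is only a rough heuristic.

```python
import re
import subprocess


def pdftotext_cmd(pdf_path, raw=False, page=None):
    """Build the pdftotext command line; "-" sends the text to stdout."""
    cmd = ["pdftotext"]
    if raw:
        cmd.append("-raw")
    if page is not None:
        # Restrict extraction to a single page (first page = last page).
        cmd += ["-f", str(page), "-l", str(page)]
    cmd += [pdf_path, "-"]
    return cmd


def extract(pdf_path, raw=False, page=None):
    """Run pdftotext and return the extracted text."""
    result = subprocess.run(
        pdftotext_cmd(pdf_path, raw, page),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Heuristic: three or more single characters separated by single spaces,
# as in "H e l l o".
SPACED = re.compile(r"\b(?:\w ){2,}\w\b")


def count_spaced_words(text):
    """Count words that look letter-spaced."""
    return len(SPACED.findall(text))


if __name__ == "__main__":
    pdf = "book.pdf"  # placeholder path, replace with the downloaded PDF
    for raw in (False, True):
        text = extract(pdf, raw=raw)
        print("raw" if raw else "default", count_spaced_words(text))
```

A lower count with -raw would agree with the observation above, but it says nothing about column order, which still has to be checked by eye.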
The man page for pdftotext says -raw is deprecated,
but I don't understand why, as it produces the
Lars Aronsson (lars(a)aronsson.se)
Projekt Runeberg - fri nordisk litteratur - http://runeberg.org/
Wikisource is using WebFonts (in some subdomains): how are the language
codes added there?
Nick White, 19/11/2012 10:51:
> In case the previous description was unclear, below is an example of
> what the extension does:
> original wikitext:
> Hello this is English, τηφσ ισ θοδδω, and back to English again.
> processed to become this:
> Hello this is English, <span lang="grc">τηφσ ισ θοδδω,</span> and back to English again.
> The span attributes can be configured with a global configuration
> variable, as can the range of characters that should be enclosed in
> Again, any response would be very welcome.
> Mediawiki-i18n mailing list
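
To make the quoted example concrete, here is a rough Python sketch of the same idea. It is not the extension's code: the character ranges (the Greek and Greek Extended Unicode blocks), the punctuation handling and the function name are my own assumptions, chosen only so that the output matches the example above.

```python
import re

# Assumed character range: Greek and Greek Extended blocks.
GREEK = r"\u0370-\u03FF\u1F00-\u1FFF"

# A run of Greek words, joined by spaces or basic punctuation,
# optionally followed by one trailing punctuation mark.
RUN = re.compile(rf"[{GREEK}]+(?:[ ,.;]+[{GREEK}]+)*[,.;]?")


def wrap_greek(text, lang="grc"):
    """Enclose each run of Greek text in a span with a lang attribute."""
    return RUN.sub(
        lambda m: f'<span lang="{lang}">{m.group(0)}</span>', text
    )


if __name__ == "__main__":
    print(wrap_greek(
        "Hello this is English, τηφσ ισ θοδδω, and back to English again."
    ))
```

In the extension described above, both the span attributes and the character range are configurable; here they are simply hard-coded defaults.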
Andrew Gray, 16/11/2012 16:43:
> On 16 November 2012 15:30, Yaroslav M. Blanter <putevod(a)mccme.ru> wrote:
>> But I think the point (at least originally) was not so much to have the
>> global discussion forum or the global village pump, but to have a common
>> place for Wikivoyage discussions, which so far were held on the old
>> Wikivoyage, but now are stale since the old Wikivoyage is locked for
>> editing, and anyway it is not a WMF project.
> Wikisource has a multilingual central project: among other things, one
> of the goals is centralised cross-language coordination, through eg/
> It doesn't seem to be very heavily used, but the precedent is still
> there. I don't see any reason that Wikivoyage couldn't have a
> centralised wiki as well...
wikisource.org is there for historical reasons because originally
language subdomains for wikisource weren't planned.
Its daily "core business" is 1) being the Wikisource portal, 2) hosting
languages without a subdomain.
When proofreading pages in it.wikisource, we are plagued by this message:
Change not allowed - You are not allowed to change the proofreading status
of this page
Going back to the failed page, we noted that:
1. the quality level is reverted to 1;
2. anything stored in the header has been lost;
3. proofreadpage_username is set to an empty string.
The bug is unpredictable: it occurs with different frequencies for different
users when trying to edit different pages. Something wrong seems to occur
when building the page code merging variables, header, body and footer
contents. We wrote an "emergency fixing script" but it can only be called
after the edit turns out to be wrong.
Is something similar happening in any other Wikisource project? Can you
give us a link to understand what's happening? And where can we find the
JS script which merges variables, header, body and footer into the page code
to send it to the server?
On 11/08/2012 01:53 PM, Ole Palnatoke Andersen wrote:
> The book has been digitized now. I can see it at
Looking at this file, pdfinfo says it was created
by Finereader Recognition Server and that the
page size is 595.44 × 841.68 pts. If "pts" is 1/72
of an inch, this would mean 8.27 × 11.69 inches,
which is A4, close to letter size, and clearly unrealistic.
I would guess that the pages of the physical
book are roughly half of that or 4-5 × 6-7 inches.
The images are 1170 × 1873 pixels, and I would
estimate the scanning resolution to be in the
range 250 to 300 dpi. That's good enough.
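
The arithmetic behind these estimates can be checked with a few lines of Python. This is only a sketch: the 4-5 × 6-7 inch page size is my guess from above, not a measurement, and the function names are made up for illustration.

```python
POINTS_PER_INCH = 72  # one PDF point is 1/72 of an inch


def points_to_inches(points):
    """Convert a length in PDF points to inches."""
    return points / POINTS_PER_INCH


def dpi_range(pixels, min_inches, max_inches):
    """Return (lowest, highest) resolution implied by a size range."""
    return pixels / max_inches, pixels / min_inches


# Page size reported by pdfinfo.
width_in = points_to_inches(595.44)
height_in = points_to_inches(841.68)

# Image size in pixels against the guessed physical page size.
horizontal = dpi_range(1170, 4, 5)
vertical = dpi_range(1873, 6, 7)

if __name__ == "__main__":
    print(f"page: {width_in:.2f} x {height_in:.2f} inches")
    print(f"horizontal: {horizontal[0]:.0f}-{horizontal[1]:.0f} dpi")
    print(f"vertical: {vertical[0]:.0f}-{vertical[1]:.0f} dpi")
```

The two ranges overlap between roughly 270 and 290 dpi, which fits the 250 to 300 dpi estimate.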
The included OCR text looks like this for a text
page (page 20 of the PDF, paginated -12-):
ventede den, men meente jeg forstoed ncr-
sten alt Norsk, fisen jeg t mine yngre Aar
ogsaa havde varet her nogen tort Tltd,
og da langt lcrttere kom til rette t daglig
Samtale. Imidlertid blev jkg snart vaec
at Adflilligheven beroede paa den Kster
og Vesterlandske äisleÄcs heel store og
It's not bad that it read "forstoed", with
the long "s" and the old spelling "oe".
But on the second line "siden" was read
as "fisen", which is incorrect. Not a
single "æ" is correct, which is odd for software
trying to recognize Danish, while "ä"
erroneously appears on the last line
of this excerpt. This indicates that it
really tries to recognize the German
alphabet, albeit with a Danish dictionary.
This is "the usual quality" for OCR of
blackletter (fraktur), and not radically
I uploaded the work to
with the OCR text provided.
It's now ready for your proofreading.
Lars Aronsson (lars(a)aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org/