Scripto is an alternative to the ProofreadPage extension used
by Wikisource. It is based on MediaWiki but also on OpenLayers,
the software used to zoom and pan in OpenStreetMap.
The only website I have seen that uses Scripto is the U.S.
War Department papers, and in many ways it is clumsier
than ProofreadPage. But there might be a few ideas worth
picking up. Take a look.
The software is described at http://scripto.org/
As for reference installations, they mention
http://wardepartmentpapers.org/transcribe.php
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
As a member of the "Wikisource users group", I feel the need for a common
set of templates, Lua modules, editing tools and conventions, and I think
that the only way to go from theory to practice is to find a "neutral"
source project, upload some works there, and start a "multi-project test".
IMHO, the ideal "neutral" project is oldwikisource (a very quiet
environment...), so I uploaded there Index:Labi 1996.djvu (coming from
it.source and already, in part, proofread) and Index:Labi 1997.djvu (the
real test work: nothing has been done on it yet).
Aubrey, Micru, what do you think about this? Can this support, in practice,
the activity of the Wikisource users group?
Alex
Yesterday a blog post announced a new feature
for editing Wikipedia from mobile devices.
http://blog.wikimedia.org/2013/07/25/edit-wikipedia-on-the-go/
Via Twitter they confirmed that Wikisource and all the other projects can
be edited too:
do you confirm that? How is the user experience?
Maybe we could file some bugs (if any) to improve Wikisource editing...
I don't have a smartphone, so I don't know if it is useful for us or not
(what about the ProofreadPage extension?)
We can contact them at @WikimediaMobile on Twitter.
Aubrey
Hi Aubrey,
Thanks for the heads-up, I have CC'ed Sébastien from fr-ws, he worked on
the djvu text extraction/merging and he was interested in following-up on
that. Maybe he has some fresh ideas about it.
Micru
On Tue, Jul 16, 2013 at 10:24 AM, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:
> Hi David, Aarti, thibaud and Tpt,
> please look at this thread:
>
> http://en.wikisource.org/wiki/Wikisource:Scriptorium#EPUB.2FHTML_to_Wikitext
> especially the last message.
>
> It seems George Orwell III knows his stuff about DjVu and the ProofreadPage
> extension,
> and it's probably worth digging into this "text layer" DjVu thing.
>
> Even if I might dream of an ideal solution (a "layered structure" for
> Wikisource, in which text can be marked up several times in different layers),
> that is probably very far away.
>
> But it's still important to pave the way for further improvements, I guess:
> losing all the information from a formatted, mapped IA DjVu is not a
> good thing to do, IMHO.
> And the Visual Editor could help us, in the future, to keep some of that
> information (italics, bold, etc.)
>
> I know Aarti spoke with Alex about abbyy.xml: is it possible to do
> something with it?
>
> Aubrey
>
--
Etiamsi omnes, ego non
Dear all,
DPLA* is planning the first annual DPLAFest, in Boston this October 24-25.
It would be great to have groups in attendance from Wikisource,
Wikidata, and Commons. There is already some collaboration underway
on Commons:
http://commons.wikimedia.org/wiki/Commons:Digital_Public_Library_of_America
If groups from each project would be interested in organizing a tent
or sessions at the festival, space (both physical and on the agenda)
could be provided. This is a good opportunity to connect with the
heads of GLAM institutions, and the more technical curators and
archivists (who should be recruited to the generative side :-)
Sam.
* The Digital Public Library of America - a digital platform for
sharing digital collections, and metadata about physical collections,
of all sizes. Started in America, aiming to contribute to shared
standards for similar work everywhere in the world. Focused on
free-software toolchains, CC-0 metadata, and data APIs.
There's a real need for a shared, standard set of templates,
modules, and JS scripts for the Wikisource projects - all projects share a
common, identical goal, face the same, identical issues, and
need the same set of international, standard metadata. Nevertheless, it's
very difficult to synchronize efforts while working inside different
"boiling" projects; and I personally found it very frustrating to admit that
some painful efforts to solve specific issues turned out to be simply
"reinventing the wheel". :-(
Oldwikisource, given its "neutral" character, could IMHO be the perfect
project to share the best of the source projects, and there's a perfect kind
of work that could be uploaded to oldwikisource and proofread using common,
shared styles & tools: multilingual works.
Presently, we are going to upload and proofread a trilingual (French,
German, and Italian, with some English too) magazine: Histoire des Alpes -
Storia delle Alpi - Geschichte der
Alpen<http://it.wikisource.org/wiki/Histoire_des_Alpes_-_Storia_delle_Alpi_-_Gesc…>.
It's released under a CC-BY-SA-2.0 license. My idea is to upload it to
oldwikisource, transcluding it via Iwpage into any interested project - the
proofreading/formatting job being done on oldwikisource, with common
tools, common templates, common modules, common "styles". What do you think
about this?
Alex (from it.wikisource)
Just to let you know what I'm doing: I'm exploring abbyy.xml (the _abbyy.gz
file in the Internet Archive file list).
The abbyy.xml file contains a lot of data for going much further with
"self-formatting" of text - with details that can't be found in the text
layer of DjVu files. It contains the XCA_Extended version of the XML output
of the OCR (http://www.abbyy-developers.com/en:tech:features:xml), and this
is a brief list of its useful features:
1. l,t,r,b coordinates of any element (from page down to character);
2. three main "blockType" values: text, table, picture;
3. four levels of detail for text areas: region, paragraph, line, character
(and a fifth one, word, can be calculated);
4. data about indentation, font size, and word/character recognition
confidence.
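To make the list above concrete, here is a minimal, stdlib-only sketch of reading those fields. The tag and attribute names (block, blockType, l/t/r/b, charParams, charConfidence) follow the ABBYY FineReader XML output; the sample document is a made-up miniature for illustration, and the namespace handling is deliberately loose so it works across schema versions.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for a real abbyy.xml page (real files are much larger).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader6-schema-v1.xml">
  <page width="2500" height="3300" resolution="300">
    <block blockType="Picture" l="100" t="120" r="900" b="700"/>
    <block blockType="Text" l="100" t="800" r="2300" b="3100">
      <text><par><line l="100" t="800" r="600" b="840">
        <formatting lang="Italian">
          <charParams l="100" t="800" r="130" b="840" charConfidence="95">A</charParams>
          <charParams l="132" t="800" r="160" b="840" charConfidence="40">l</charParams>
        </formatting>
      </line></par></text>
    </block>
  </page>
</document>"""

def local(tag):
    """Strip the XML namespace, so the code works across schema versions."""
    return tag.rsplit('}', 1)[-1]

def iter_blocks(xml_text):
    """Yield (blockType, (l, t, r, b)) for every <block> in the document."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if local(el.tag) == 'block':
            box = tuple(int(el.get(k)) for k in ('l', 't', 'r', 'b'))
            yield el.get('blockType'), box

def low_confidence_chars(xml_text, threshold=60):
    """Yield (char, confidence, box) for characters the OCR was unsure about -
    exactly the input a "wikiReCaptcha"-style tool would need."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if local(el.tag) == 'charParams':
            conf = int(el.get('charConfidence', '100'))
            if conf < threshold:
                box = tuple(int(el.get(k)) for k in ('l', 't', 'r', 'b'))
                yield el.text, conf, box

for btype, box in iter_blocks(SAMPLE):
    print(btype, box)
for ch, conf, box in low_confidence_chars(SAMPLE):
    print(ch, conf, box)
```

On the sample above this lists one Picture block and one Text block with their pixel boxes, and flags the single character whose confidence falls below the threshold.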
Using the coordinates and the original images, it's possible to extract
pieces of the original page image; this could be useful both for a
"wikiReCaptcha" engine (extracting doubtful words' text and their images)
and to extract (or show without extracting) pictures. The latter can be done
by showing a clone of the existing thumbnail of the page as the background
of a div, and setting the div dimensions and overflow appropriately, with a
very low server load.
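The div trick described above can be sketched as a small helper that takes a block's l,t,r,b coordinates (full-resolution pixels) plus the thumbnail width, scales the box down, and emits a div whose hidden overflow shows only the picture area of the background thumbnail. The function name and the thumbnail URL are hypothetical.

```python
def picture_div(box, page_width, thumb_width, thumb_url):
    """Show one picture block of a page without extracting a new image file:
    a div sized to the scaled block, with the whole page thumbnail as a
    background shifted so only the block area is visible."""
    l, t, r, b = box
    scale = thumb_width / page_width          # thumbnail / full-page ratio
    w, h = round((r - l) * scale), round((b - t) * scale)
    x, y = round(l * scale), round(t * scale)  # how far to shift the background
    return ('<div style="width:{w}px;height:{h}px;overflow:hidden;'
            'background:url({u}) -{x}px -{y}px no-repeat;"></div>'
            ).format(w=w, h=h, x=x, y=y, u=thumb_url)

# A picture block from a 2500px-wide scan, shown on a 500px-wide thumbnail:
html = picture_div((100, 120, 900, 700), page_width=2500,
                   thumb_width=500, thumb_url='page12.jpg')
print(html)
```

The server only ever serves the one thumbnail it already has; all the cropping happens in the browser via the negative background offsets, which is why the server load stays so low.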
In brief: all this stuff is extremely exciting, and I'm going ahead with my
bold experiments, but the matter deserves, IMHO, the attention of the best
Wikisource geeks - I'm only playing, with very limited skill and a rough
layman's programming style.
Alex brollo (from it.wikisource)