Yeah, Linus is kind of an asshole too. I don't see that as something to
emulate.
-- brion
On Wed, Jul 17, 2013 at 1:10 PM, David Cuenca <dacuetu(a)gmail.com> wrote:
Now that you mention it...
http://linux.slashdot.org/story/13/07/15/2316219/kernel-dev-tells-linus-tor…
Micru
On Wed, Jul 17, 2013 at 11:36 AM, Brion Vibber <brion(a)pobox.com> wrote:
I'm not sure his attitude will encourage people to work with him to his
specifications.
-- brion
On Wed, Jul 17, 2013 at 8:12 AM, David Cuenca <dacuetu(a)gmail.com> wrote:
> I'm forwarding this message by George Orwell III on en-ws [1]. I think it
> is extremely important, as it offers insight into what is wrong with DjVu
> handling on Wikisource.
>
>
> "We/you are losing the X-min, Y-min, X-max & Y-max (mapping coordinates)
> because the original PHP-contributing a-hole for the DjVu routine on our
> servers never bothered to finish the part where the internal DjVu text
> layer is converted to a (coordinate-rich) XML file using the existing
> DjVuLibre software package, because, at the time, the software had issues.
>
> "That faulty DjVuLibre version was the equivalent of 4,317 versions ago,
> and the issue has long since been fixed, EXCEPT that the .DTD file needed
> to base the plain-text-to-XML conversion on still has the wrong 'folder
> path' on local DjVuLibre installs (whether this is true on server installs
> as well, I cannot say for sure). Once I copied the folder to the [wrong]
> folder path, I was able to generate the XMLs all day long. These XMLs are
> just like the ones IA generates during their process (in addition to the
> XML that ABBYY generates for them).
>
> "So it's not that we as a community decided not to follow through with
> (coordinate-rich) XML generation; we got stuck with the plain-text dump
> workaround due to a DjVuLibre problem that no longer exists. Plus, the guy
> who created the beginnings of this fabulous disaster was like a tick with
> an attention-span deficit and moved on to conjuring up some other blasted
> thing or another instead of following up on his own workaround and
> finishing the XML-coding portion once the DjVuLibre glitch was fixed.
> -- 15:16, 15 July 2013 (UTC)
[1]
http://en.wikisource.org/wiki/Wikisource:Scriptorium#EPUB.2FHTML_to_Wikitext
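As a sketch of the pipeline George describes: DjVuLibre ships a `djvutoxml`
tool that emits the hidden text layer as coordinate-rich XML, with each WORD
element carrying the xmin/ymin/xmax/ymax box he mentions. A minimal,
hypothetical example of invoking it and pulling out the word boxes (the
element and attribute names follow DjVuLibre's hidden-text output; treat the
exact layout, and the sample document, as assumptions):

```python
# Sketch: extract per-word bounding boxes from DjVuLibre's XML output.
# The subprocess call assumes djvutoxml (from DjVuLibre) is on the PATH;
# the WORD/coords layout follows DjVuLibre's hidden-text format.
import subprocess
import xml.etree.ElementTree as ET

def djvu_to_xml(djvu_path):
    """Run djvutoxml on a .djvu file and return the XML as a string."""
    return subprocess.run(
        ["djvutoxml", djvu_path],
        check=True, capture_output=True, text=True,
    ).stdout

def extract_word_boxes(xml_text):
    """Yield (word, (x1, y1, x2, y2)) pairs from djvutoxml-style output."""
    root = ET.fromstring(xml_text)
    for word in root.iter("WORD"):
        coords = tuple(int(c) for c in word.get("coords").split(","))
        yield word.text, coords

# Tiny hypothetical sample in the same shape djvutoxml produces:
sample = """<DjVuXML><BODY><OBJECT>
<HIDDENTEXT><PAGECOLUMN><REGION><PARAGRAPH><LINE>
<WORD coords="100,220,180,200">Wikisource</WORD>
</LINE></PARAGRAPH></REGION></PAGECOLUMN></HIDDENTEXT>
</OBJECT></BODY></DjVuXML>"""

print(list(extract_word_boxes(sample)))
# → [('Wikisource', (100, 220, 180, 200))]
```

This is only a local sketch; whether the server-side conversion can use the
same tool is exactly the open question in the message above.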
On Wed, Jul 17, 2013 at 6:57 AM, Alex Brollo <alex.brollo(a)gmail.com>
wrote:
> Just a brief comment about the DjVu text layer, using IA files to dig
> deeper into the topic.
>
> FineReader OCR stores incredibly detailed information in a proprietary
> format; then, various FineReader versions export some of this extremely
> rich set of information into different outputs, one of them being the
> DjVu text layer. It's worth noting that even though any information
> stored in the DjVu text layer can be extracted and used, the set of
> information wrapped into the DjVu text layer (in both the lisp-like
> format and the XML format) is only a minor subset of the original OCR
> information.
>
> If someone is interested in getting much more information, they can find
> it in the abbyy.xml output; the Internet Archive offers it as abbyy.gz in
> its list of exportable files. It's a very heavy and complex XML structure,
> but it is possible to parse it and to extract from it any information
> wrapped into the DjVu text layer, and much more -- most interestingly,
> wordPenalty: that is, word by word, a summary of the degree of
> uncertainty of the OCR recognition of the whole word.
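A rough sketch of how one might flag doubtful words from that output. The
element and attribute names below mirror ABBYY FineReader's XML export
(charParams elements carrying a wordPenalty attribute), but the exact schema,
the namespace handling, the threshold, and the span class are all assumptions
for illustration, not the method Alex and Aarti actually used:

```python
# Sketch: pull words and their wordPenalty out of ABBYY-style XML and
# wrap doubtful ones in a span, roughly as described above.
# Schema, threshold, and class name are assumptions, not a fixed API.
import gzip
import xml.etree.ElementTree as ET

PENALTY_THRESHOLD = 50  # hypothetical cut-off: higher penalty = more doubtful

def load_abbyy(path):
    """Read an abbyy.gz file (as exported by IA) and return its XML text."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return f.read()

def words_with_penalty(xml_text):
    """Group charParams characters into words, keeping the max wordPenalty."""
    words, current, penalty = [], [], 0
    for cp in ET.fromstring(xml_text).iter("charParams"):
        ch = cp.text or " "
        if ch.isspace():
            if current:
                words.append(("".join(current), penalty))
                current, penalty = [], 0
        else:
            current.append(ch)
            penalty = max(penalty, int(cp.get("wordPenalty", "0")))
    if current:
        words.append(("".join(current), penalty))
    return words

def to_wikitext(words):
    """Wrap doubtful words in a span so an editor can render them red."""
    return " ".join(
        f'<span class="ocr-doubtful">{w}</span>' if p > PENALTY_THRESHOLD else w
        for w, p in words
    )

# Tiny hypothetical sample in the same shape as the FineReader export
# (real abbyy.xml files carry an XML namespace, omitted here for brevity):
sample = """<document><page><line>
<charParams wordPenalty="0">o</charParams>
<charParams wordPenalty="0">k</charParams>
<charParams> </charParams>
<charParams wordPenalty="90">b</charParams>
<charParams wordPenalty="90">a</charParams>
<charParams wordPenalty="90">d</charParams>
</line></page></document>"""

print(to_wikitext(words_with_penalty(sample)))
# → ok <span class="ocr-doubtful">bad</span>
```

The span output is the same "simple span tag" mechanism Alex mentions below,
which VisualEditor could then style or edit in place.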
>
> We (Aarti and I) are digging into this mess, with fast preliminary
> results; at [[it:w:Utente:Alex brollo/Sandbox]] you can see some brief
> pieces of text extracted from abbyy.gz, where doubtful words (in the
> opinion of the OCR software) are red. They can be easily managed by
> VisualEditor, coming simply from a simple span tag.
>
> Now I'm waiting for Aarti's work; as soon as VisualEditor for nsPage
> runs, it will be possible to extract text by bot from abbyy.gz (if the
> work comes from IA) and to upload such text as the OCR.
>
> Alex
> >
> >
> >
> 2013/7/16 David Cuenca <dacuetu(a)gmail.com>
>
>> Hi Aubrey,
>> Thanks for the heads-up. I have CC'ed Sébastien from fr-ws; he worked on
>> the djvu text extraction/merging, and he was interested in following up
>> on that. Maybe he has some fresh ideas about it.
>
> Micru
>
> On Tue, Jul 16, 2013 at 10:24 AM, Andrea Zanni
> <zanni.andrea84(a)gmail.com> wrote:
>
>> Hi David, Aarti, Thibaud and Tpt,
>> please look at this thread:
>>
>> http://en.wikisource.org/wiki/Wikisource:Scriptorium#EPUB.2FHTML_to_Wikitext
>>
>> especially the last message.
>>
>> It seems George Orwell III knows his stuff about DjVu and the Proofread
>> extension, and it's probably worth digging into this "text layer" djvu
>> thing.
>>
>> Even if I might dream of an ideal solution (a "layered structure" for
>> Wikisource, in which text can be marked up several times in different
>> layers), that is probably very far away.
>>
>> But it's still important to pave the way for further improvements, I
>> guess: losing all the information from a formatted, mapped IA djvu is
>> not a good thing to do, IMHO.
>>
>> And VisualEditor could help us, in the future, to keep some of that
>> information (italics, bold, etc.)
>>
>> I know Aarti spoke with Alex about abbyy.xml: is it possible to do
>> something with it?
>>
>> Aubrey
>>
>
>
>
> --
> Etiamsi omnes, ego non
> _______________________________________________
> Wikisource-l mailing list
> Wikisource-l(a)lists.wikimedia.org
>
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
_______________________________________________
MediaWiki-l mailing list
MediaWiki-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l