On Thu, Feb 24, 2011 at 10:55 AM, Moka Pantages <mpantages(a)wikimedia.org> wrote:
> Thanks for looping me in, Casey.
> Not sure who Dr. Nate is. We can try to recover.
> Let me know what you want to do. The info@ team at Twitter is pretty responsive.
I have spent a few hours trying to work out who it is, and I am pretty
sure they are not a wikisourcerer.
If you could recover this name, that would be good. It isn't urgent.
I found the new syntax
<ref name="...">...</ref> ...... <ref follow="...">...</ref> ....
A smart solution to a difficult problem (notes split across different
pages). But I couldn't find a way to use effectively, in nsPage, the syntax
<ref name="..." /> .... <references><ref name="...."></references>
It would be great to use it in nsPage, to let "in place" references
enhance text readability when proofreading and validating texts, and to keep
the original text structure for easier script comparison with the djvu text layer.
Am I missing something? Can I find docs or examples of the latter syntax
in a proofread source work?
While exploring the djvu text layer, the it.source community found interesting
features that are worth sharing (SPOILER ALERT: Wikisource reCAPTCHA).
(I added the technicalities in the footnotes, please look at them if you're
We discovered that when the text layer is extracted with the DjVuLibre
djvused.exe tool,
a text file is obtained, containing words and their absolute coordinates
within the image of the page.
Here are some example rows of such a txt file from a running test:
(line 402 2686 2424 2757
(word 402 2699 576 2756 "State.")
(word 679 2698 892 2757 "Effects")
(word 919 2698 991 2756 "of")
(word 1007 2697 1467 2755 "Domestication")
(word 1493 2698 1607 2755 "and")
(word 1637 2697 1910 2757 "Climate.")
(word 2000 2698 2132 2756 "The")
(word 2155 2686 2424 2754 "Persians^"))
As you can see, the last word contains a ^ character, which indicates a
doubtful character that the OCR software could not recognize.
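
To make the selection step concrete, here is a minimal Python sketch (not the
it.source test script itself) that scans djvused output like the rows above and
keeps only the words containing the ^ marker. The names WORD_RE, doubtful_words
and sample are made up for this illustration, and the regular expression
assumes every word entry sits on one line in the form shown above.

```python
import re

# Assumed shape of one entry: (word xmin ymin xmax ymax "text")
WORD_RE = re.compile(r'\(word (\d+) (\d+) (\d+) (\d+) "((?:[^"\\]|\\.)*)"\)')


def doubtful_words(text_layer, marker="^"):
    """Return (xmin, ymin, xmax, ymax, text) for each word containing marker."""
    found = []
    for m in WORD_RE.finditer(text_layer):
        xmin, ymin, xmax, ymax = (int(g) for g in m.groups()[:4])
        word = m.group(5)
        if marker in word:
            found.append((xmin, ymin, xmax, ymax, word))
    return found


# A few of the rows quoted above, used as sample input.
sample = '''(line 402 2686 2424 2757
 (word 402 2699 576 2756 "State.")
 (word 2000 2698 2132 2756 "The")
 (word 2155 2686 2424 2754 "Persians^"))'''

print(doubtful_words(sample))
```

Only the "Persians^" entry survives the filter, together with the four
coordinates needed for the next step.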
What's really interesting is that a python script can select these words using
the ^ character and automatically produce a file with the image of the word,
since all the parameters needed for a ddjvu.exe call can be obtained (please
consider that this code comes from a rough, but *running* test script).
So, in our it.source test script, a tiff image has been automatically
produced, containing exactly the image of the doubtful OCR output "Persians^".
Its name is built as name-of-djvu-file+page number+coordinates within the
page, which is all that is needed to link unambiguously the image and the
specific word on a specific page of a djvu file.
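
As a rough illustration of the naming and of the ddjvu call, the sketch below
builds both from one word's coordinates. It is a sketch under assumptions, not
our test script: the function word_image_job, the file name book.djvu, the page
number 15 and the exact file name layout are hypothetical. It also assumes
that the ddjvu -segment geometry (WxH+X+Y) uses the same origin and scale as
the djvused coordinates, which should be checked against the ddjvu manual
before relying on it. The command is only built, not executed.

```python
def word_image_job(djvu_file, page, xmin, ymin, xmax, ymax):
    """Build a tiff file name and a ddjvu argument list for one word box."""
    width, height = xmax - xmin, ymax - ymin
    # Assumption: -segment takes WxH+X+Y in the same coordinate system
    # as the djvused text layer.
    segment = f"{width}x{height}+{xmin}+{ymin}"
    stem = djvu_file.rsplit(".", 1)[0]
    # Hypothetical naming scheme: file name + page number + coordinates.
    tiff_name = f"{stem}_p{page}_{xmin}_{ymin}_{xmax}_{ymax}.tiff"
    command = [
        "ddjvu",
        "-format=tiff",
        f"-page={page}",
        f"-segment={segment}",
        djvu_file,
        tiff_name,
    ]
    return tiff_name, command


# Hypothetical file and page; coordinates are those of "Persians^" above.
tiff_name, command = word_image_job("book.djvu", 15, 2155, 2686, 2424, 2754)
print(tiff_name)
print(" ".join(command))
```

Because the file name carries the djvu name, the page and the box, the image
can later be mapped back to the exact word it came from.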
The image has been uploaded into Commons as
As you can easily imagine, this could be the core of a "wikicaptcha" project
(as John Vandenberg called it), enabling us to produce our own Wikisource
A djvu file could be uploaded to a server (into an "incubator"); a
database of doubtful word images could be built; images could be presented
to wiki users (either as a voluntary task or as a formal reCAPTCHA to confirm
edits by logged-out contributors); the resulting human interpretations could be
validated somehow (e.g. by n matching interpretations from different
users) and then used to edit the text layer of the djvu file. Finally the
edited djvu file could be uploaded to Commons for formal source managing.
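
One possible reading of the validation step, sketched in Python: a word is
accepted only once the same interpretation has been given n times. This is an
assumption about how "validated somehow" could work, not a rule we have agreed
on; the function accepted_reading and its default n=3 are made up for
illustration.

```python
from collections import Counter


def accepted_reading(answers, n=3):
    """Return the reading given at least n times, or None if there is none."""
    # Ignore empty answers and surrounding whitespace.
    counts = Counter(a.strip() for a in answers if a.strip())
    if not counts:
        return None
    reading, votes = counts.most_common(1)[0]
    return reading if votes >= n else None


# Three users agree, one reads the last character differently.
print(accepted_reading(["Persians,", "Persians,", "Persians.", "Persians,"]))
```

Words that never reach n matching answers would simply stay in the database of
doubtful images until more users have seen them.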
Please contact us if you would like a copy of the running test scripts.
There is also a shared Dropbox folder with the complete environment where we
are testing the scripts.
Opinions, feedback or thoughts are more than welcome.
 command='djvused name-of.file.djvu -e "select page-number;
 if "^" in word:
command="ddjvu "+fileDjvu+" -page="+pag+" -format=tiff "+segment+" "+filetiff
As you may know, the Wikimedia tech team has started to upgrade
MediaWiki on some wikis. MediaWiki is the software that runs all
The most visible change for Wikimedia users will be the deployment of
its delivery by compressing it sometimes, and cutting down on the amount
The installation of ResourceLoader may cause compatibility issues with
Trevor Parscal and Roan Kattouw, the main developers of ResourceLoader,
will be available on IRC on Monday, February 14th, at 18:00 (UTC),
to answer questions and help fix issues related to ResourceLoader.
Please spread this information as widely as possible; it's critical to
Logs of the session will be published publicly.
 All timezones: http://ur1.ca/3819u
Product manager - Wikimedia Foundation
Support free knowledge: http://donate.wikimedia.org