user Alex Brollo has developed a very interesting tool which I think many
of you will find very useful.
(afaiu he developed this from a previous tool, but he can explain it better)
To test it, log in to your oldwikisource account
and copy this into User:YOU/vector.js:
The tool will appear under Tools, and it's called "Ritaglio immagini"
(crop images). It opens a window which allows you to select and crop an
image directly on the page scan. It is extremely useful for placing
illustrations directly on the page.
It is usable both in View and Edit mode.
On the Italian Wikisource we have made it a gadget with a button, so
it's active for everyone.
The volcanic User:Alex brollo is also working on "dictionaries", i.e.
generating lists of the words used in Page: pages and works, such as a
word list for the book Il cavallarizzo.
I bet that in the next few years (with more books, more users, Wikidata,
and the world domination led by Wikisource) we will have more and more
of these. A list of the words used in an ancient book could help
customize OCR and typo-correction tools, for example.
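
Just to make the idea concrete, here is a minimal sketch of how such a
word list could be built from a plain-text dump of a work (the file name
text.txt and the whole pipeline are my assumptions, not Alex's actual
tool):

  tr -cs '[:alpha:]' '\n' < text.txt |       # one word per line
  tr '[:upper:]' '[:lower:]' |               # fold case
  sort | uniq -c | sort -rn > wordlist.txt   # frequency-sorted list

The output pairs each word with its frequency, which is exactly the kind
of data an OCR or typo-correction tool could be tuned against.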
Moreover, we will have Wikidata, and maybe we will need to store some
metadata (e.g. page numbers, or metadata about images and scans)
somewhere structured. Lua could help us build tools for creating
automatic indexes, or textual versions in ns0 (e.g. precompiling the
pagelist tag...)
The question is: do we, the Wikisource communities, want a new Data
namespace?
How do you like the idea? Would you want to have the Wikibase extension in
it, or just a normal namespace?
I'm sure you will find this mail confusing, but I think we are in need
of something here. I just don't know what it is :-)
I'd like to announce the US National Archives' new virtual internship
program for Wikipedians. We are offering unpaid internships at the
National Archives for experienced Wikipedians with technical or
community skills.
This is intended to be a way for Wikipedians interested in working on
NARA's GLAM efforts to formalize their affiliation with NARA, and receive
academic credit, work experience, and a reference. The interns will have a
staff mentor (me) to guide their work, and the chance to have a real impact
on the state of Wikipedia and public access to cultural heritage.
We are initially offering internships for Wikipedians with technical
skills, who would help us with Commons image uploads, analytics, etc., and
those skilled at organizing the Wikimedia community, to help coordinate our
WikiProject and communicate our activities. There is no required time
commitment or start date, and these sorts of details can be negotiated. I
would encourage anyone on this list with interest to apply, or to share
the posting with other members of the Wikimedia community who might be a
good fit.
Please feel free to reply here or contact me personally if you have any
questions. More information and instructions for applying can be found
in the posting.
Digital Content Specialist, Wikipedian in Residence
National Archives and Records Administration
On 12/20/2013 10:23 PM, Lars Aronsson wrote:
> where some fine print is no longer legible. What I
> want is one that has only been reduced down
> to 300 dpi or so. How can I get that?
With a little help from okfn-labs (Open Knowledge
Foundation), here is a script that works for my book:
pag=1
while [ $pag -le 400 ]    # 400 is a placeholder for the last page number
do
  hex=`printf "%04X" $pag`
  dec=`printf "%04d" $pag`
  if [ ! -s $dec.jpg ]
  then
    echo -n .
    # the tile URL is omitted in this archive; it is built from $hex
    wget -q -O $dec.jpg "$tileurl"
    echo -n :
  fi
  pag=`expr $pag + 1`
done
That is the URL for one tile, but the tile that I
request starts at 0,0 and is 10000 pixels wide,
so it contains the full page 1800x2400 pixels,
in full (pct:100) = 300 dpi resolution.
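
If I read those parameters right, the URL follows the IIIF Image API
pattern, where region, size, rotation and quality are path segments:

  {server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
  e.g. .../0,0,10000,10000/pct:100/0/native.jpg

so requesting an oversized region at pct:100 simply returns the whole
page at full resolution.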
This was faster than waiting for BL's webmaster's
response on Monday.
In my case, I want the JPEGs. But if you want to
use a book in Wikisource, you might want to
create a DjVu or PDF bundle of all the JPEGs for
the entire book.
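
For instance, here is a minimal sketch of that bundling step, assuming
DjVuLibre (c44, djvm) or ImageMagick is installed and the JPEGs are
named as above:

  # DjVu: encode each JPEG, then bundle the pages (DjVuLibre)
  for f in [0-9][0-9][0-9][0-9].jpg; do
    c44 "$f" "${f%.jpg}.djvu"
  done
  djvm -c book.djvu [0-9][0-9][0-9][0-9].djvu
  # PDF alternative (ImageMagick)
  convert [0-9][0-9][0-9][0-9].jpg book.pdf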
Lars Aronsson (lars(a)aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org/
I'm really sorry for all the issues affecting the Wikisources. A fix for
most of them has just been deployed (it has been hard work) and I'll try
to fix all the remaining ones for next Tuesday's deployment.
Here is a list of the fixed issues:
* The image didn't appear for Page: pages that belong to a multipage
file but whose Index: page is not called Index:NAME_OF_THE_DJVU (this
mostly affected pl.wikisource)
* Part of the body content of some Page: pages appeared in their footer
in edit mode when those pages contain a <noinclude> tag
* The link to a Page: page appeared red in the Index: page when its
proofreading level category name contained a whitespace (to apply this
fix, a purge of each affected Page: page is required; see the sketch
right after this list)
* The creation of a Page: page through the API caused a fatal error
* An unwanted indentation was displayed on the first paragraph of a
Page: page
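
Since the red link fix above requires a purge of every affected Page:
page, here is a minimal sketch of how one could do it with the MediaWiki
API (the wiki URL and the pages.txt file of affected titles are
assumptions):

  # purge each title listed in pages.txt (one per line)
  while read -r title; do
    curl -s -d "action=purge" -d "format=json" \
      --data-urlencode "titles=$title" \
      "https://pl.wikisource.org/w/api.php" > /dev/null
  done < pages.txt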
Here are the remaining known issues (please report any that aren't
listed here):
* The Page: page edit summary contains the proofreading level change tag
twice. A fix is under review that will make the software add the tag
when the Page: page is saved (the tag won't be visible during the
editing process).
* The addition of default header and footer content adds a strange
string instead of tags like <references />. A fix is under review.
* The body textarea when editing Page: pages is too big. I'm working on
a fix that will use the size defined in the user preferences instead.
* Fatal error on submit for a very few pages (a fix is under review).
* It is no longer possible to zoom in with the mouse. I'm working on a
fix.
* The gadget issue that Aubrey reported here (I think the solution lies
more on the gadget side than on the extension side).
* It's not possible to edit only the body of a page through the API.
I'm going to work on automated tests in the coming weeks in order to
avoid such a large number of bugs next time.
PS: For those who don't know it, the maintenance work on the
ProofreadPage extension is mostly done by volunteers like you. So,
please be kind.
On Mon, Dec 9, 2013 at 12:18 PM, Tom Morris <tfmorris(a)gmail.com> wrote:
> I'm not sure I agree. There's a lot of good data in OpenLibrary, but
> there's also a lot of junk. Freebase imported a bunch of OpenLibrary data,
> after winnowing it to what they thought was the good stuff, and still ended
> up deleting a bunch of the supposedly "good" stuff later because they found
> their goodness criteria hadn't been strict enough.
> One of the reasons OpenLibrary is such a mess is because *they*
> arbitrarily imported junky data (e.g. Amazon scraped records). The last
> thing the world needs is more duplicate copies of random junk. We've
> already got the DPLA for that. :-)
> Another issue with the OpenLibrary metadata is that there's no clear
> license associated with it. IA's position is that they got it from
> wherever they got it from and you're on your own if you want to reuse it,
> which isn't very helpful. The provenance for major chunks of it is
> traceable and new stuff by users is nominally being contributed under CC0,
> so they could probably be sorted out with enough effort (although the same
> thing is true of the data quality issues too).
Gosh, I withdraw my support for fully reusing Open Library data.
That was probably the best effort they could make in past years, before
well-known library catalogs started releasing their data dumps en masse,
but now we are in a very different scenario.
Even a simple mass import from the already mentioned datahub into the
openlibrary engine (open source software), without further editing,
would generate better quality data.
 - http://datahub.io/group/bibliographic
Edward Summers, 09/12/2013 12:18:
> If OpenLibrary gets active again, [...]
Definition of active? The fact that there's no software
development/investment doesn't mean it's inactive. Are there stats on
user activity there, and can it be compared in some way to ours as
regards that kind of data?