There's been some discussion at
about the @wikisource Twitter account (and others).
I've just been made a team member of @wikisource, and we should add some
more people too.
The idea is that @wikisource is the cross-Wikisource account, and each
language Wikisource can also have its own account (e.g. @wikisource_fr,
@wikisource_de, etc.). Here's a list I started this morning:
https://twitter.com/wikisource/lists/wikisources/members are there any
So, who wants to be added, and what are your Twitter usernames?
We'll also have @wikisource_en later today or tomorrow; same deal.
The passwords for these accounts are held by Aubrie Johnson, the WMF
social media person.
There is a huge collection of rare books in different languages on the
Endangered Archives Programme website of the British Library (http://eap.bl.uk/).
1. The books are split into pages.
2. Almost every page has a scale/ruler beside the scanned page, so we
need to crop each and every page (see attachment).
Could anyone please develop a tool (an enhanced version of BUB) to process the
books and transfer them to Commons?
The Editing team at the Wikimedia Foundation is planning (in the fairly long term) to clean up some features of the wikitext markup language to make it easier to parse and more compatible with current standards such as HTML5. This will make it much easier to improve tools such as VisualEditor and Wsexport (the EPUB export tool).
The main project is to replace Tidy, the current and very outdated HTML cleanup software, with a cleaner tool that follows the HTML5 specification. However, this will require fixing some wiki pages so that they continue to be parsed correctly, because the behaviour of the cleanup tool will change in some edge cases.
An introduction and tools to update the wiki pages are available at  and more details at . There is no hurry to update all pages soon, but it would be good to make sure that the list of things to fix does not grow in the future.
FADGI is the Federal Agencies Digitization Guidelines Initiative:
The technical guideline has a number of sound suggestions:
(it also links to dozens of English Wikipedia articles).
The verification software is available for free (only for Windows):
(Kakadu is not free software, sadly.)
-------- Forwarded Message --------
Subject: Using Kakadu JPEG2000 Compression to Meet FADGI Standards
Date: Mon, 31 Jul 2017 21:00:46 +0000
From: jeff kaplan
Using Kakadu JPEG2000 Compression to Meet FADGI Standards
The Internet Archive is grateful to the folks at Kakadu Software for
contributing to Universal Access to Knowledge by providing the world’s
leading implementation of the JPEG2000 standard, used in the Archive’s
image processing systems.
Here at the Archive, we digitize over a thousand books a day. JPEG2000,
an image coding system that uses compression techniques based on wavelet
technology, is a preferred file format for storing these images
efficiently, while also providing advantages for presentation quality
and metadata richness. The Library of Congress has documented its
adoption of the JPEG2000 file format for a number of digitization
projects, including its text collections on archive.org.
Recently we started using their SDK to apply some color corrections to
the images coming from our cameras. This has helped us achieve FADGI
standards in our work with the Library of Congress.
Thank you, Kakadu, for helping make it possible for millions of books to
be digitized, stored, and made available with high quality on
If you are interested in finding out more about Kakadu Software’s
powerful software toolkit for JPEG2000 developers, visit
kakadusoftware.com <http://kakadusoftware.com> or email
Just to let you know, some it.source contributors are using a handy
gadget to manage diacritics: with a single click it can delete, replace or
add any of a fairly large list of diacritical marks on any character.
It uses the .normalize() string method to decompose and (when possible)
recompose Unicode characters, which makes it possible to handle the
diacritics independently from the base ASCII character.
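To illustrate the decompose/recompose idea, here is a minimal sketch in Python. It is not the gadget's code: the gadget relies on JavaScript's .normalize(), while this uses the analogous unicodedata.normalize() from the Python standard library. The function names and the ACUTE/GRAVE constants are made up for the example.

```python
import unicodedata

# Combining marks used in the examples (hypothetical constant names).
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT
GRAVE = "\u0300"  # COMBINING GRAVE ACCENT


def strip_diacritics(text: str) -> str:
    """Delete every combining mark, keeping only the base characters."""
    # NFD splits e.g. U+00E9 into "e" + U+0301, so marks can be filtered out.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)


def add_diacritic(char: str, mark: str) -> str:
    """Append a combining mark and recompose when a precomposed form exists."""
    return unicodedata.normalize("NFC", char + mark)


def replace_diacritic(char: str, old_mark: str, new_mark: str) -> str:
    """Swap one combining mark for another on an already accented character."""
    decomposed = unicodedata.normalize("NFD", char)
    return unicodedata.normalize("NFC", decomposed.replace(old_mark, new_mark))


if __name__ == "__main__":
    print(strip_diacritics("perch\u00e9"))
    print(add_diacritic("e", ACUTE))
    print(replace_diacritic("\u00e9", ACUTE, GRAVE))
```

When no precomposed character exists (for instance "q" with an acute accent), NFC leaves the base letter and the combining mark as two separate code points, which is the "when possible" caveat mentioned above.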
Perhaps this gadget is "reinventing the wheel"? Anyway, the code is