In a separate thread (sorry--digest mode bit me), Dominic wrote:
Many cultural institutions are developing their own crowdsourced transcription
projects. I think Wikisource can be a much more robust platform than these
one-off projects, with a more well-developed community that aggregates the
transcription efforts of texts from many institutions in a single place
with a proven process.
I'm a big fan of Wikisource, and have recommended it, but I don't think
that data extraction is the biggest barrier to adoption the GLAM sector
faces. Branding is a much, much bigger deal. I talked about this at the ALA
this summer (
http://manuscripttranscription.blogspot.com/2014/07/collaborative-digitizat…
-- see the slide with a screenshot of Wikisource next to one of Letters
1916, which uses DIY History/Scripto as its platform):
"The first one is the French-language version of Wikisource. Wikisource
is a sister project to Wikipedia that was spun off around 2003 that allows
people to transcribe documents and do OCR correction both. This is being
used by the Departmental Archives of Alpes-Maritimes to transcribe a
set of journals
of episcopal visits
<http://fr.wikisource.org/wiki/Livre:FRAD006_001J201.pdf>. The bishop in
the sixteenth century would go around and report on all the villages [in
his diocese], so there's all this local history, but it's also got some
difficult paleography.
"So they're using Wikisource
<http://manuscripttranscription.blogspot.com/2012/04/french-departmental-arc…>,
which is a great tool! It has all kinds of version control. It has ways to
track proofreading. It does an elegant job of putting together individual
pages into larger documents. But, do you see "Departmental Archives of
Alpes-Maritimes" on this page? No! You have no idea [who the institution
is]. Now, if they're using this internally, that may be fine -- it's a
powerful tool.
"By contrast, look at the Letters of 1916
<http://dh.tcd.ie/letters1916/diyhistory/>. [Three sentences inaudible.]
This is public engagement in a public-facing site. "
There were a lot of nods in the room, and even more when I revisited the
slide in a crowdsourcing workshop a month later.
If an institution were able to attach a custom stylesheet to pages
displaying its 'project', if it were able to send users to an attractive
homepage for its 'project', showing the project's materials, and recent
activity on them, with ways for admins to monitor their volunteers'
questions or discussions on talk pages, or announce news -- that would drop
that barrier to entry. At the moment, a GLAM that points its users to
Wikisource effectively 'loses' them -- it sends them off to a different
community and a different site that just happens to contain copies of the
institution's material, with no easy way for users to get back to the
institution.
That said, I think bulk export of transcripts would help, especially if
there were an easy way for the institution to match each transcript to the
identifier in its own system. Plaintext may be good enough for, say, a
library that's using a CMS and just wants its documents to be searchable.
I've seen TEI recommended in the past, and while I'm a big fan, I suspect
it's of secondary importance.
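For what it's worth, the matching step could be as small as a lookup table
the institution maintains itself; a hypothetical sketch (the CSV columns and
the local identifiers are invented for illustration, not an existing standard):

```python
import csv
import io

# Hypothetical mapping the institution would keep: its own identifier
# next to the Wikisource Page: title holding the transcript.
MAPPING_CSV = """local_id,wikisource_title
AD06-001J201-f005,Page:FRAD006_001J201.pdf/5
AD06-001J201-f006,Page:FRAD006_001J201.pdf/6
"""

def load_mapping(text):
    # Map Wikisource page title -> the institution's own identifier.
    return {row["wikisource_title"]: row["local_id"]
            for row in csv.DictReader(io.StringIO(text))}

def attach_ids(transcripts, mapping):
    # transcripts: {wikisource_title: plaintext}.
    # Returns {local_id: plaintext}, dropping pages with no local identifier.
    return {mapping[title]: body
            for title, body in transcripts.items()
            if title in mapping}

mapping = load_mapping(MAPPING_CSV)
export = attach_ids({"Page:FRAD006_001J201.pdf/5": "transcribed text"}, mapping)
```

With that in place, a bulk plaintext export keyed by `local_id` drops straight
into whatever CMS the library already runs.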
Ben
What do we see as the next components for Wikisource?
What are our major hurdles for system development?
If we were offered development help, where do people think we should be
making use of it? Is it incremental fixes, transactional changes, or are we
wanting transformational changes, completely new features, and new
opportunities?
Regards, Billinghurst
On Wed, Nov 19, 2014 at 10:18 AM, Emmanuel Engelhart <kelson(a)kiwix.org>
wrote:
> [...]
>
> We also plan to use this code base to aggregate other online PD/free books
> libraries. Wikisource is one of the first we would love to add, this might
> be done pretty easily as soon as an OPDS feed is available.
>
> How can we develop an OPDS feed for Wikisource?
Maybe this is the big chance for us to start ignoring some misunderstandings
we've inherited from Wikipedia ("local community autonomy") and remember
that in the library world, standardization of practices is a big plus?
I mean especially ways of delivering content to users by subject or author,
regardless of its language of production...
Hi,
The Kiwix team is happy to release the whole Project Gutenberg
(http://www.gutenberg.org/) library in a ZIM format:
http://download.kiwix.org/zim/gutenberg/gutenberg_mul_all_2014-11.zim.torre….
We also provide a few language specific versions here
http://download.kiwix.org/zim/gutenberg.
This file is intended for offline usage (no Internet connection) and is
readable with Kiwix (http://www.kiwix.org). This allows anybody with a
computer or a smartphone to own their own copy of this 50,000-book library.
You can also make it available for other people to read on your network;
they only need a web browser.
In this ZIM file, you will find all the books available in HTML (directly
readable), but also in EPUB (and from time to time in PDF). We have created
a custom user interface which is really simple to use: in a few clicks you
can find your book, read it or download it. What is also unique is that
Kiwix provides a full-text search engine over all the books' content. You
can see for yourself using this demonstration web site:
http://library.kiwix.org/gutenberg_mul_all_2014-11/
Most of the work was done during a week long hackathon in Lyon, France
by four Kiwix volunteer developers. This hackathon was funded by the
Fondation Orange with the administrative help of Framasoft and Wikimedia
CH. The Fondation Orange is the first beneficiary of this work and already
uses it for its own deployments in Africa.
The solution to build this ZIM file is 100% free software and is available
here: https://github.com/kiwix/gutenberg. It makes it easy to release new,
up-to-date versions. This is not a "one shot" project; we will periodically
release new versions of this offline edition of the Project Gutenberg
library.
We also plan to use this code base to aggregate other online PD/free book
libraries. Wikisource is one of the first we would love to add; this might
be done pretty easily as soon as an OPDS feed is available.
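For anyone wondering what such a feed involves: OPDS catalogs are Atom-based,
so each book is an Atom <entry> carrying an acquisition link. A minimal
sketch of one entry built with the standard library (the title and URL are
placeholders, and a real feed also needs the surrounding <feed>, ids, and
updated timestamps):

```python
from xml.etree import ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# OPDS link relation for freely downloadable books.
OPEN_ACCESS = "http://opds-spec.org/acquisition/open-access"

def opds_entry(title, epub_url):
    """Build one OPDS catalog entry (an Atom <entry>) for a free book."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    link = ET.SubElement(entry, f"{{{ATOM}}}link")
    link.set("rel", OPEN_ACCESS)
    link.set("type", "application/epub+zip")
    link.set("href", epub_url)
    return ET.tostring(entry, encoding="unicode")

print(opds_entry("Candide", "https://example.org/candide.epub"))
```

Since WSExport already produces EPUBs per work, a Wikisource feed would
mostly be a matter of enumerating works and emitting entries like this one.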
We hope to see this work deployed by other third-party organisations in
places where the Internet is not available, expensive or censored. We also
need more developers (mostly Python) for the next steps; a hackathon for
this purpose will hopefully be organised in 2015 (sponsor needed). Last but
not least: users, please report any problems here:
https://github.com/kiwix/gutenberg/issues
Regards
Emmanuel
--
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication
Hello guys,
are you preparing for the contest?
Can we use this mailing list to coordinate and understand how many contests
there will be?
It would be nice if we could coordinate, or at least use shared scripts and
tools.
Maybe it's better to have an overview of what is needed to run the contest.
WHAT DO YOU NEED
* a collection of books to be proofread
this is really easy :-)
* a Wikisource contest page
like this one:
https://it.wikisource.org/wiki/Wikisource:Undicesimo_compleanno_di_Wikisour…
* some social media coverage
You can use social media etc.; we always try to convince it.wikipedia to
use their SiteNotice... Of course, you must also use your own Wikisource
SiteNotice.
* some awards
If you have a national Wikimedia chapter, it's worth asking for a few bucks.
In Italy, we awarded 3 prizes with just 100 euros (a 50-euro book voucher
as 1st prize, 30 and 20 euros for 2nd and 3rd).
* a way to count validated and proofread pages.
If I'm not mistaken, the code is here: http://pastebin.com/Vk6ikCUg
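For reference, the counting step can be sketched from the ProofreadPage
markup itself: each Page: page carries a quality marker in its header, e.g.
<pagequality level="3" user="Alice" /> (level 3 = proofread, 4 = validated).
A minimal, hypothetical tally over raw wikitexts (real contest scripts would
also track *who* changed the level, via page history):

```python
import re

# Match the ProofreadPage quality marker in a Page: page's header.
QUALITY_RE = re.compile(r'<pagequality\s+level="(\d)"\s+user="([^"]*)"')

def tally(wikitexts):
    """Count proofread and validated pages per user from raw wikitexts."""
    scores = {}
    for text in wikitexts:
        m = QUALITY_RE.search(text)
        if not m:
            continue
        level, user = int(m.group(1)), m.group(2)
        s = scores.setdefault(user, {"proofread": 0, "validated": 0})
        if level == 3:
            s["proofread"] += 1
        elif level == 4:
            s["validated"] += 1
    return scores
```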
WHAT YOU NEED TO DECIDE
* time
In Italy, we wanted to go from 24 November at 00.01 till 1st December at
23.59.
* scoring
In it.source, we will probably award 3 points for every proofread page and
1 point for every validated page.
* awards
Last year, it.source only allowed users to validate pages (not to proofread
them). We awarded the first prize to the user who validated the most pages.
The second and third prizes were instead drawn randomly from the other
participants: but the more pages a user had validated, the more chances they
had. Every validated page (or point) counts as a lottery ticket: the more
tickets I have, the more chances I have.
Cristian Cantoro also made this awesome tool to pick the 3 winners; we
should adapt it for every contest: http://balist.es/wscontest/
I believe everything can be easily adapted if you allow both proofreading
and validating at the same time.
I really hope many Wikisources will be present this year :-)
Aubrey
I'd like to get the text layer of a djvu page, just as the Proofread
extension does, via an API call or any exotic trick, in settings different
from the usual trigger condition (the creation of a new page).
Is this possible? I browsed the API docs but failed.
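(One possible workaround, if the API really doesn't expose the layer: the
OCR text lives in the djvu file itself, so DjVuLibre's djvutxt can extract
it offline. A hedged sketch, assuming djvutxt is on the PATH:)

```python
import shutil
import subprocess

def djvu_text_layer(path, page):
    """Extract the raw OCR text layer of one page of a djvu file using
    DjVuLibre's djvutxt -- the same layer ProofreadPage preloads when a
    new Page: page is created.

    Returns the page text, or None when djvutxt is not installed.
    """
    if shutil.which("djvutxt") is None:
        return None
    cmd = ["djvutxt", f"--page={page}", path]
    return subprocess.run(cmd, capture_output=True, text=True).stdout
```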
Alex brollo
Dear all,
on 26 October I made a little test on the Italian Wikisource.
I've always been enthusiastic about WSExport, Tpt's tool for EPUB
conversion. But I always thought the link was not visible enough...
So I've been bold and added the link to the converter directly in the
Header template.
The result, I think, is quite stunning:
on the 26th we had 4738 downloads; now we are at 8700!
In ten days we doubled the number of downloads we had accumulated in a few
years...
I think it is a simple but powerful edit. You just have to put something
like this in your header:
<div class="noprint" style="text-align: left;">
<small>{{epub|{{PAGENAME}}|testo=Scarica questo testo come
EPUB}}</small></div>
("Scarica questo testo come EPUB" is Italian for "Download this text as
EPUB".)
I suggest you keep track of the stats *before* and *after* the edit here
http://wsexport.wmflabs.org/tool/stat.php
Let me know how it goes! :-)
Aubrey
The Fifth International Conference on Digital Information and
Communication Technology and its Applications (DICTAP2015)
Faculty of Engineering - Lebanese University, Beirut, Lebanon
April 29 – May 01, 2015
http://sdiwc.net/conferences/dictap2015/
The conference is technically co-sponsored by IEEE Lebanon Section. All
registered papers will be submitted to IEEE for inclusion to IEEE Xplore
as well as other Abstracting and Indexing (A&I) databases.
=================================================================
You are invited to submit your papers to the conference. The DICTAP2015
welcomes submissions on any topic in the field of digital information,
communications technology and any related topics:
- Security in Information and Telecommunication System
- Network Systems and Devices
- Wireless and Optical Communications
- Algorithms, Architecture, and Infrastructures
- Information Content Security
- Cloud Computing and Computer Networks
- Sensor Networks and Embedded System
- E-Learning, E-Commerce, E-Business and E-Government
- Data Exchange Issues and Supply Chain
- Information Retrieval
- Web Services, Web based Application
- Data Grids, Data and Information Quality
- Data Warehouses and Data Mining
- Image Analysis and Image Processing
- Management and Diffusion of Multimedia Applications
- Mobile, Ad Hoc and Sensor Network Security
- Video Search and Video Mining
- Enterprise Computing
- Web Mining including Web Intelligence and Web 3.0
- Knowledge Management
- Compression and Coding
- XML and other extensible languages
- Intelligent and Robust System
- ICT for Social and Humanity
- Security and Access Control
- Constraint Programming
- Ubiquitous Systems
- Semantic Web, Ontologies and Rules
- Communication Protocols, Communication Systems
- Network Management Techniques
- Telecommunication Business & Regulation
- Modeling, Algorithm, and Optimization
- Information Theory, System, and Technology
- Scientific Computing and Multimedia Processing
- Transmission, Antenna & Propagation
- Artificial Intelligence and Decision Support Systems
- Data Life Cycle in Products and Processes
- Information Visualization
- Web Metrics and its Applications
- Data Models for Production Systems and Services
- Data, Text, and Web Content Mining
- Multimedia and Interactive Multimedia
- Case Studies on Data Management, Monitoring and Analysis
- Mobile Data Management
- Computer Graphics
- Soft Computing
- Networks Security, Encryption and Cryptography
- Peer to Peer Data Management
- Natural Language Processing
- Human-Computer Interaction
- Distributed Information Systems
- Temporal and Spatial Databases
- Digital Rights Management
- Quality of Service Issues
- Interoperability
Papers should be submitted electronically in PDF format without the
author(s)' names. You can submit your research paper at
http://sdiwc.net/conferences/dictap2015/paper-submission/
IMPORTANT DATES
===============
Submission Deadline: March 1, 2015
Notification of Acceptance: March 22, 2015
Camera Ready Submission: March 30, 2015
Registration: March 30, 2015
Conference Dates: April 29 – May 01, 2015