On Fri, Apr 15, 2016 at 1:29 AM, Mpaa <mpaa.wiki(a)gmail.com> wrote:
> @Alex
> since IA is not using djvu any longer
Oh? Where can I read more about this policy change by IA?
--
John Vandenberg
Hi,
I added a new tool, https://tools.wmflabs.org/phetools/not_transcluded/, which
provides a list of Index pages containing corrected or validated pages that are
not transcluded from main:; see the README.txt.
--
phe
Hi,
I don't know where things stand with OCR for non-Latin scripts, so maybe
this is not relevant anymore. Last time I looked into it, there was a
limitation with the Google service, which was a problem notably for
Indic languages. Well, yesterday we had a contribution day around
Alsatian and Franconian dialects
<https://fr.wikipedia.org/wiki/Discussion_Projet:Alsace#Journ.C3.A9e_contrib…>
where I had the opportunity to talk with some linguists. One of them
told me that Google was in fact using Tesseract
<https://github.com/tesseract-ocr>, which is open source, for its OCR
service. According to what she told me (or at least what I remember of
it), it works by training on transcribed samples: you define the
matching between image samples and characters, and off it goes. Looking
quickly at the langdata repository, I see that there is material on
Devanagari, which I believe is a script used for at least some Indic
texts, isn't it?
Hope that may help,
mathieu
Applications are now open for an expert workshop to be held in Kraków,
Poland, on 12 July 2016, 9:30am - 4:00pm, as part of the Digital Humanities
2016 conference (http://dh2016.adho.org/).
[UPDATE: The Wellcome Library is looking to explore specific questions
around crowdsourcing as part of the DH2016 workshop "*Beyond The Basics:
What Next For Crowdsourcing?*" In order to encourage wide participation in
this event, the Wellcome Library has funds to support travel by scholars
outside of Europe. Participants applying for funding should note this on
the workshop application form. If you have already applied and would like
funding you can contact Christy Henshaw at c.henshaw(a)wellcome.ac.uk. ]
[UPDATE: Despite the "sold out" message on the DH2016 registration page,
there are currently 15 open positions for the workshop. We apologize for
the mis-communication.]
We welcome applications from all, but please note that we will aim to balance
expertise, disciplinary backgrounds, experience with different types of
projects, and institutional and project affiliations when finalising our
list of participants. This workshop is not suitable for beginners.
Participants should have some practical knowledge of running crowdsourcing
projects or expertise in human computation, machine learning or other
relevant topics. You can apply to attend at
https://docs.google.com/forms/d/1l05Rba3EqMyy-X4UVmU9z7hQ-jlK2x2kLGvNtJfgtg…
Beyond The Basics: What Next For Crowdsourcing?
Crowdsourcing - asking the public to help with inherently rewarding tasks
that contribute to a shared, significant goal or research interest related
to cultural heritage collections or knowledge - is reasonably well
established in the humanities and cultural heritage sector. The success of
projects such as Transcribe Bentham, Old Weather and the Smithsonian
Transcription Center in processing content and engaging participants, and
the subsequent development of crowdsourcing platforms that make launching a
project easier, have increased interest in this area. While emerging best
practices have been documented in a growing body of scholarship, including
a recent report from the Crowd Consortium for Libraries and Archives
symposium, this workshop looks to the next 5 - 10 years of crowdsourcing in
the humanities, the sciences and in cultural heritage. The workshop will
gather international experts and senior project staff to document the
lessons to be learnt from projects to date and to discuss issues we expect
to be important in the future.
Topics for discussion will be grouped by participants in an
unconference-style opening session in which topics are proposed and voted
on by participants. They are likely to include the following:
Public humanities, education and audiences:
- The use of crowdsourcing in formal education
- Designing research questions that encourage participation and create
space for informal education, the social production of knowledge and
collaborative problem solving
- The intersection of crowdsourcing, public humanities and engagement
with cultural heritage and academic goals
- Resolving tensions between encouraging participants to follow
opportunities for informal learning and skills development, and focusing
on project productivity
Organisational and project management issues:
- Collaborative partnerships/funding to develop community platform(s)
based on open source software
- The state of focused research into interface design, engagement
methods, and end-user impact studies
- Design tensions between techniques that can improve the productivity
of projects (such as handwritten text recognition and algorithmic
classification) and optimising the user experience
- Workflows for crowdsourced data and the ingestion of crowdsourced data
into collections management systems
- Challenges to institutional and expert authority
- The compromises, pros and cons involved in specifying and selecting
crowdsourcing software and platforms
- The impact of crowdsourcing on organisational structures and resources
- Inter-institutional cooperation or competition for crowdsourcing
participants
Future challenges:
- The integration of machine learning and other computational techniques
with human computation
- Lessons to be learnt from the long histories of grassroots and
community history projects
- Sharing lessons learnt and planning peer outreach to ensure that
academics and cultural heritage professionals can benefit from collective
best practice
- The ethics of new and emerging forms of crowdsourcing
The timetable will include a brief round of introductions, a shared
agenda-setting exercise, four or so discussion sessions, and a final
session for closing remarks and to agree next steps.
The discussion and emergent guidelines documented during the workshop would
help future projects benefit from the collective experiences of
participants. Outcomes from the workshop might include a whitepaper and/or
the further development of or support for a peer network for humanities
crowdsourcing.
The workshop is organised by Mia Ridge (British Library), Meghan Ferriter
(Smithsonian Transcription Center), Christy Henshaw (Wellcome Library) and
Ben Brumfield (FromThePage).
We anticipate accepting 30 participants. You can apply to attend at
https://docs.google.com/forms/d/1l05Rba3EqMyy-X4UVmU9z7hQ-jlK2x2kLGvNtJfgtg…
On notification of acceptance, we will send detailed instructions for
formal registration.
For more information, please contact benwbrum(a)gmail.com and mia.ridge(a)bl.uk,
who will be in contact with the rest of the organisers.
Regards,
Ben W. Brumfield
http://fromthepage.com/
http://manuscripttranscription.blogspot.com
You probably just saw it, but we now have the possibility of trying the VE in
the main namespace on Wikisources :-)
Here is a guide:
https://www.mediawiki.org/wiki/Help:VisualEditor/The_visual_editor_at_Wikis…
.
It would be awesome if every one of you could
* spread the word
* spread the guide
* translate it :-)
Let me know if there are any issues.
Aubrey