Classic Wikisource issues
-------- Forwarded Message --------
Subject: Can You Help us Make the 19th Century Searchable?
Date: Fri, 21 Aug 2020 20:32:17 +0000
From: Brewster Kahle
<http://blog.archive.org/2020/08/21/can-you-help-us-make-the-19th-century-se…>
Can You Help us Make the 19th Century Searchable?
In 1847, Frederick Douglass started a newspaper
<https://archive.org/details/pub_frederick-douglass-paper> advocating
the abolition of slavery that ran until 1851. After the Civil War,
there was a newspaper for freed slaves, the Freedmen’s Record
<https://archive.org/details/pub_freedmens-record>. The Internet
Archive is bringing these and many more works online for free public
access. But there’s a problem:
Our Optical Character Recognition (OCR), while the best commercially
available OCR technology, is not very good at identifying text from
older documents.
Take, for example, this newspaper from 1847. The images
<https://archive.org/details/sim_frederick-douglass-paper_1847-12-03_1_1>
are not that great, but a person can read them.
The problem is our computers’ optical character recognition tech gets
it wrong
<https://archive.org/stream/sim_frederick-douglass-paper_1847-12-03_1_1/sim_…>,
and the columns get confused.
What we need is “Culture Tech” (a riff on fintech, or biotech) and
Culture Techies to work on important and useful projects: the things we
need but that are unlikely to attract gushers of private equity funding.
There are thousands of professionals taking on similar challenges in the
field of digital humanities, and we want to complement their work with
industrial-scale tech that we can apply to cultural heritage materials.
One such project would be to work on technologies to make 19th-century
documents fully digital. We need to improve OCR to enable full-text
search, but we also need help segmenting documents into columns and
articles. The Internet Archive has lots of test materials, and thousands
of people are uploading more documents all the time.
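Column segmentation is one of the problems named above. A common first step (not necessarily what the Internet Archive uses) is a vertical projection profile: count the ink pixels in each pixel column of a binarized page image and split wherever a sufficiently wide blank run appears. The sketch below assumes the page has already been binarized into a list of 0/1 rows; the function name `column_spans` and the `min_gap` threshold are hypothetical. Real 19th-century scans are skewed and noisy, so this is a starting point only.

```python
from typing import List, Optional, Tuple


def column_spans(page: List[List[int]], min_gap: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) x-ranges, end exclusive, of text columns.

    `page` is a list of rows where 1 = ink pixel and 0 = background
    (ASSUMPTION: binarization has already been done elsewhere).
    A run of at least `min_gap` ink-free x positions separates columns;
    shorter blank runs (between letters or words) stay inside a column.
    """
    width = len(page[0]) if page else 0
    # Vertical projection profile: number of ink pixels at each x position.
    profile = [sum(row[x] for row in page) for x in range(width)]

    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None  # x where the current column began
    gap = 0                      # length of the current ink-free run
    for x, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The column ended just before this blank run began.
                spans.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close a column that runs to the right edge, trimming trailing blanks.
        spans.append((start, width - gap))
    return spans


# Two narrow "columns" separated by a three-pixel gutter.
page = [
    [1, 1, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0],
]
print(column_spans(page))
```

Each span could then be cropped and passed to OCR separately, so text from neighbouring columns is not merged into one line. Splitting columns into articles would additionally need a horizontal analysis, which this sketch does not attempt.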
What we do not have is a good way to integrate work on these projects
with the Internet Archive’s processing flow. So we need help and ideas
there as well.
Maybe we can host an “Archive Summer of CultureTech” or something. Just
ideas. Maybe we could work with a university department that would want to
build programs and classes around Culture Tech. If you have ideas or
skills to contribute, please post a comment here or send an email to
info(a)archive.org with some of this information.
The post Can You Help us Make the 19th Century Searchable?
<http://blog.archive.org/2020/08/21/can-you-help-us-make-the-19th-century-se…>
appeared first on Internet Archive Blogs <http://blog.archive.org>.
Hello everyone,
I hope everyone is safe and sound amid the ongoing pandemic. I would like
to share an update on the progress of the Movement Strategy and the design
of the Transition events.
The Transition Design Group [1] has been meeting over the past month to
discuss the design of the virtual events that will aim to facilitate an
inclusive Transition process, through which the movement will start
implementing the Movement Strategy recommendations [2]. The Design Group's
discussions have centred largely on people and their ease of participation,
the processes for the Transition events, the legitimacy of the process, the
resources needed, and communications [3]. Transition will be a turning point
for our movement: it will create an 18-month collaborative implementation
work plan that will empower our community and affiliates.
The virtual Transition events are being planned for September to December
2020. We invite your feedback on the draft outline of the Transition events
to help make them as inclusive, participatory, and engaging as possible [4].
The draft outline offers both a brief overview and detailed information
about the events. The Transition events aim to be easy to join, whether for
a single event or several. They are for everyone, whether a newcomer or a
seasoned strategy enthusiast. They are being designed for diverse
participation across time zones and regions in order to create a
movement-wide implementation plan. The review period for the draft outline
runs until August 20. After receiving your feedback, the Design Group will
finalize the plan, and the Wikimedia Foundation will ensure the events are
delivered according to the design.
Please comment on the Meta talk page [4] and feel free to use the questions
below as an orientation:
1. How can the plan be improved? In your opinion, what are some barriers
to entry that must be lowered so everyone can take part in Transition?
2. How can we make sure that you and your community have what you need
to participate in the Transition events?
3. If you have attended other virtual events, what has your experience
been like and what lessons can be applied in this case?
You are also encouraged to send individual feedback, suggestions, or ideas
directly by email to strategy2030(a)wikimedia.org.
On behalf of the Design Team,
Rupika
[1] https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2018-20/Transit…
[2] https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2018-20/Recomme…
[3] https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2018-20/Transit…
[4] https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2018-20/Transit…
Hello everyone,
I am pleased to announce that Wikisource Pagelist Widget
<https://meta.wikimedia.org/wiki/Wikisource_Pagelist_Widget> is now
available on Beta Wikisource. We need your help in testing the widget and
providing feedback.
You can test the widget by editing the following Index page on Beta
Wikisource:
https://en.wikisource.beta.wmflabs.org/wiki/Index:War_and_Peace.djvu
There is a new ‘Preview pagelist’ button under the pagelist field. Click it
to get a preview, and then click on any page number in the preview to open
the widget.
You can also create a new Index page for any other PDF or DjVu file from
Wikimedia Commons and test the widget there.
We need your feedback on the following questions:
- What is your general opinion of the Pagelist Widget?
- Is it obvious how to use the widget? If not, what is difficult to
  understand?
- What other changes would you suggest to improve the widget further?
Please provide your feedback on the Project Talk Page on Meta-Wiki
<https://meta.wikimedia.org/wiki/Talk:Wikisource_Pagelist_Widget>.
P.S. The widget does not currently work with local uploads on Beta
Wikisource (due to T257807).
Regards,
Sohom Datta.
Hello everyone,
I am delighted to share that the improved IA-Upload tool
<https://ia-upload.toolforge.org/> is now available for use. The tool now
provides a drop-down menu for choosing whether to upload PDFs directly from
IA to Commons or to upload DjVu files using the existing workflow.
The tool also checks (using iwbacklinks) whether the requested IA item has
already been uploaded under a different name, and if so it shows an error.
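The duplicate check above relies on the MediaWiki Action API's `list=iwbacklinks` module, which lists pages that link to a given interwiki target (`iwbltitle` must be combined with `iwblprefix`). Below is a minimal Python sketch of such a check. It assumes that Commons file pages link to IA items through an `iarchive:` interwiki prefix; that prefix, the helper names, and the User-Agent string are all hypothetical, and the actual IA-Upload implementation may work differently.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import Request, urlopen

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def iwbacklinks_url(ia_identifier: str, prefix: str = "iarchive") -> str:
    """Build an API query for pages linking to `prefix:ia_identifier`.

    ASSUMPTION: IA items are linked via the `iarchive` interwiki prefix.
    """
    params = {
        "action": "query",
        "list": "iwbacklinks",
        "iwblprefix": prefix,
        "iwbltitle": ia_identifier,
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)


def linked_titles(response: dict) -> List[str]:
    """Extract page titles from a decoded iwbacklinks JSON response."""
    return [hit["title"] for hit in response.get("query", {}).get("iwbacklinks", [])]


def already_uploaded(ia_identifier: str) -> List[str]:
    """Return Commons titles linking to the IA item (performs a network call).

    Only the first batch of results is read; for a duplicate check, any
    non-empty result is enough, so continuation is not followed.
    """
    # Wikimedia asks API clients to send a descriptive User-Agent (hypothetical value).
    req = Request(iwbacklinks_url(ia_identifier),
                  headers={"User-Agent": "ia-duplicate-check-sketch/0.1"})
    with urlopen(req) as resp:
        return linked_titles(json.load(resp))
```

A non-empty list from `already_uploaded(...)` would correspond to the error case described above: the item already exists on Commons under some other file name.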
The improvements were made by User:Lautgesetz from PanLex as part of the
following Project Grant: Grants:Project/PanLex/Balinese palm-leaf
transcription platform on Wikisource
<https://meta.wikimedia.org/wiki/Grants:Project/PanLex/Balinese_palm-leaf_tr…>
The tool includes a few more improvements beyond those mentioned above;
you can read more about them here:
https://github.com/wikisource/ia-upload/pull/42
Thank you!
Satdeep
--
Satdeep Gill (pronouns - he, him)
Program Officer
GLAM and Underrepresented Knowledge
Wikimedia Foundation <https://wikimediafoundation.org/>