Scripto is an alternative to the ProofreadPage extension used
by Wikisource. It is based on MediaWiki but also on OpenLayers,
the software used to zoom and pan in OpenStreetMap.
The only website I have seen that uses Scripto is the U.S.
War Department papers, and in many ways it is clumsier
than ProofreadPage. But there might be a few ideas worth
picking up. Take a look.
The software is described at http://scripto.org/
As for reference installations, they mention
http://wardepartmentpapers.org/transcribe.php
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
As a member of the "Wikisource users group", I feel the need for a common
set of templates, Lua modules, editing tools and conventions, and I think
that the only way to go from theory to practice is to find a "neutral"
source project, upload some works there, and start a "multi-project test".
IMHO, the ideal "neutral" project is oldwikisource (a very quiet
environment...), so I uploaded there Index:Labi 1996.djvu (coming from
it.source and already, in part, proofread) and Index:Labi 1997.djvu (the
real test work: nothing has been done on it yet).
Aubrey, Micru, what do you think about this? Can this support, in practice,
the activity of the Wikisource users group?
Alex
Yesterday a blog post announced a new feature
for editing Wikipedia from mobile devices.
http://blog.wikimedia.org/2013/07/25/edit-wikipedia-on-the-go/
Via Twitter they confirmed that Wikisource and all the other projects can
be edited too:
do you confirm that? How is the user experience?
Maybe we could file some bugs (if any) to improve Wikisource editing...
I don't have a smartphone, so I don't know if it is useful for us or not
(what about the ProofreadPage extension?)
We can contact them at @WikimediaMobile on Twitter.
Aubrey
Hi Aubrey,
Thanks for the heads-up, I have CC'ed Sébastien from fr-ws, he worked on
the djvu text extraction/merging and he was interested in following-up on
that. Maybe he has some fresh ideas about it.
Micru
On Tue, Jul 16, 2013 at 10:24 AM, Andrea Zanni <zanni.andrea84(a)gmail.com> wrote:
> Hi David, Aarti, thibaud and Tpt,
> please look at this thread:
>
> http://en.wikisource.org/wiki/Wikisource:Scriptorium#EPUB.2FHTML_to_Wikitext
> especially the last message.
>
> It seems George Orwell III knows his stuff about DjVu and the ProofreadPage
> extension,
> and it's probably worth digging into this "text layer" DjVu thing.
>
> Even if I might dream of an ideal solution (a "layered structure" for
> Wikisource, in which text can be marked up several times in different layers),
> that is probably very far away.
>
> But it's still important to pave the way for further improvements, I guess:
> losing all the information from a formatted, mapped IA DjVu is not a
> good thing to do, IMHO.
> And the Visual Editor could help us, in the future, to keep some of that
> information (italics, bold, etc.)
>
> I know Aarti spoke with Alex about abbyy.xml: is it possible to do
> something with it?
>
> Aubrey
>
--
Etiamsi omnes, ego non
Dear all,
DPLA* is planning the first annual DPLAFest, in Boston this October 24-25.
It would be great to have groups in attendance from Wikisource,
Wikidata, and Commons. There is already some collaboration underway
on Commons:
http://commons.wikimedia.org/wiki/Commons:Digital_Public_Library_of_America
If groups from each project would be interested in organizing a tent
or sessions at the festival, space (both physical and on the agenda)
could be provided. This is a good opportunity to connect with the
heads of GLAM institutions, and the more technical curators and
archivists (who should be recruited to the generative side :-)
Sam.
* The Digital Public Library of America - a digital platform for
sharing digital collections, and metadata about physical collections,
of all sizes. Started in America, aiming to contribute to shared
standards for similar work everywhere in the world. Focused on
free-software toolchains, CC-0 metadata, and data APIs.
There's a real need for a shared, standard set of templates,
modules, and JS scripts for the Wikisource projects - all projects share a
common, identical goal, face the same, identical issues, and
need the same set of international, standard metadata. Nevertheless, it's
very difficult to synchronize efforts while working inside different
"boiling" projects; and I personally found it very frustrating to admit that
some painful efforts to solve specific issues turned out to be simply
"reinventing the wheel". :-(
Oldwikisource, given its "neutral" character, could IMHO be the perfect
project to share the best of the source projects, and there's a perfect kind
of work that could be uploaded to oldwikisource and proofread using common,
shared styles & tools: multilingual works.
Presently, we are going to upload and proofread a trilingual (French,
German, and Italian, with some English too) magazine: Histoire des Alpes -
Storia delle Alpi - Geschichte der
Alpen<http://it.wikisource.org/wiki/Histoire_des_Alpes_-_Storia_delle_Alpi_-_Gesc…>.
It's released under a CC-BY-SA-2.0 license. My idea is to upload it to
oldwikisource, transcluding it via Iwpage into any interested project - the
proofreading/formatting job being done on oldwikisource, with common
tools, common templates, common modules, common "styles". What do you think
about this?
Alex (from it.wikisource)
Just to let you know what I'm doing: I'm exploring abbyy.xml (the _abbyy.gz
file in the Internet Archive file list).
The abbyy.xml file contains a lot of data for going much further with
"self-formatting" of text - with details that can't be found in the text
layer of DjVu files. It contains the XCA_Extended version of the XML output
of the OCR (http://www.abbyy-developers.com/en:tech:features:xml), and this
is a brief list of its useful features:
1. l,t,r,b coordinates of any element (from page down to character);
2. three main "blockType" values: text, table, picture;
3. four levels of detail for text areas: region, paragraph, line, character
(and a fifth one, word, can be calculated);
4. data about indentation, font size, and word/character recognition
confidence.
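To make the list above concrete, here is a minimal, stdlib-only sketch of reading those fields. The tag and attribute names (block, blockType, l/t/r/b, charParams, charConfidence) follow the ABBYY FineReader XML output; the sample document is a made-up miniature for illustration, and the namespace handling is deliberately loose so it works across schema versions.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for a real abbyy.xml page (real files are much larger).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader6-schema-v1.xml">
  <page width="2500" height="3300" resolution="300">
    <block blockType="Picture" l="100" t="120" r="900" b="700"/>
    <block blockType="Text" l="100" t="800" r="2300" b="3100">
      <text><par><line l="100" t="800" r="600" b="840">
        <formatting lang="Italian">
          <charParams l="100" t="800" r="130" b="840" charConfidence="95">A</charParams>
          <charParams l="132" t="800" r="160" b="840" charConfidence="40">l</charParams>
        </formatting>
      </line></par></text>
    </block>
  </page>
</document>"""

def local(tag):
    """Strip the XML namespace, so the code works across schema versions."""
    return tag.rsplit('}', 1)[-1]

def iter_blocks(xml_text):
    """Yield (blockType, (l, t, r, b)) for every <block> in the document."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if local(el.tag) == 'block':
            box = tuple(int(el.get(k)) for k in ('l', 't', 'r', 'b'))
            yield el.get('blockType'), box

def low_confidence_chars(xml_text, threshold=60):
    """Yield (char, confidence, box) for characters the OCR was unsure about -
    exactly the input a "wikiReCaptcha"-style tool would need."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if local(el.tag) == 'charParams':
            conf = int(el.get('charConfidence', '100'))
            if conf < threshold:
                box = tuple(int(el.get(k)) for k in ('l', 't', 'r', 'b'))
                yield el.text, conf, box

for btype, box in iter_blocks(SAMPLE):
    print(btype, box)
for ch, conf, box in low_confidence_chars(SAMPLE):
    print(ch, conf, box)
```

On the sample above this lists one Picture block and one Text block with their pixel boxes, and flags the single character whose confidence falls below the threshold.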
Using the coordinates and the original images, it's possible to extract
pieces of the original page image; this could be useful both for a
"wikiReCaptcha" engine (extracting doubtful words' text and their images)
and to extract (or show without extracting) pictures. The latter can be done
by showing a clone of the existing thumbnail of the page as the background
of a div, and setting the div dimensions and overflow appropriately, with a
very low server load.
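The div trick described above can be sketched as a small helper that takes a block's l,t,r,b coordinates (full-resolution pixels) plus the thumbnail width, scales the box down, and emits a div whose hidden overflow shows only the picture area of the background thumbnail. The function name and the thumbnail URL are hypothetical.

```python
def picture_div(box, page_width, thumb_width, thumb_url):
    """Show one picture block of a page without extracting a new image file:
    a div sized to the scaled block, with the whole page thumbnail as a
    background shifted so only the block area is visible."""
    l, t, r, b = box
    scale = thumb_width / page_width          # thumbnail / full-page ratio
    w, h = round((r - l) * scale), round((b - t) * scale)
    x, y = round(l * scale), round(t * scale)  # how far to shift the background
    return ('<div style="width:{w}px;height:{h}px;overflow:hidden;'
            'background:url({u}) -{x}px -{y}px no-repeat;"></div>'
            ).format(w=w, h=h, x=x, y=y, u=thumb_url)

# A picture block from a 2500px-wide scan, shown on a 500px-wide thumbnail:
html = picture_div((100, 120, 900, 700), page_width=2500,
                   thumb_width=500, thumb_url='page12.jpg')
print(html)
```

The server only ever serves the one thumbnail it already has; all the cropping happens in the browser via the negative background offsets, which is why the server load stays so low.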
In brief: all this stuff is extremely exciting, and I'm going ahead with my
bold experiments, but the matter deserves, IMHO, the attention of the best
Wikisource geeks - I'm only playing, with very limited skill and a rough
layman's programming style.
Alex brollo (from it.wikisource)