Wikisource-l November 2012

wikisource-l@lists.wikimedia.org

11 participants
14 discussions

Scripto, free software for transcribing documents

by Lars Aronsson

Scripto is an alternative to the ProofreadPage extension used by Wikisource. It is based on MediaWiki but also on OpenLayers, the software used to zoom and pan in OpenStreetMap. The only website I have seen that uses Scripto is the U.S. War Department papers, and in many ways it is clumsier than ProofreadPage. But there might be a few ideas worth picking up. Take a look. The software is described at http://scripto.org/ As for reference installations, they mention http://wardepartmentpapers.org/transcribe.php -- Lars Aronsson (lars(a)aronsson.se) Aronsson Datateknik - http://aronsson.se

Fwd: [Wikitech-l] LabeledSectionTransclusion performance problems

by Guillaume Paumier

Forwarding to this list because Wikisource makes heavy use of LabeledSectionTransclusion. (Forwarding from my personal address because the other one isn't subscribed to this list.)

---------- Forwarded message ----------
From: Guillaume Paumier <gpaumier(a)wikimedia.org>
Date: Fri, Nov 30, 2012 at 1:09 PM
Subject: Re: [Wikitech-l] LabeledSectionTransclusion performance problems
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>

On Fri, Nov 30, 2012 at 10:07 AM, Merlijn van Deen <valhallasw(a)arctus.nl> wrote:
>
> After the new version of LabeledSectionTransclusion (LST) was deployed on
> itwikisource, performance issues popped up. itwikisource's main page makes
> heavy use of LST, and the new version is clearly heavier than the old one.

As a side note: because of the performance issues, the most recent changes to the LST extension will probably be reverted today (Friday, November 30). If you made changes to articles or templates to accommodate the new version or to benefit from its new features, you may want to revert those changes temporarily.

--
Guillaume Paumier

New LabeledSectionTransclusion code breaks Wikisource

by Federico Leva (Nemo)

See <http://lists.wikimedia.org/pipermail/wikitech-l/2012-November/064741.html>. The new version was disabled on at least some Wikisources and, apparently, the changes will also be reverted in the code until they are fixed, but they may come back. <https://wikisource.org/w/index.php?title=Wikisource:Scriptorium&diff=339889…> Nemo

[Fwd: Adam Hyde] Booktype project at WMF for PediaPress etc.

by Federico Leva (Nemo)

See http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/65398 This is relevant for all projects heavily using the Collection extension, I guess. Nemo

Fwd: [Wikimedia-l] Creating Wikikultur

by Luiz Augusto

FYI, proposal for a new Wikimedia project or Wikibooks/Wikisource inclusion criteria

---------- Forwarded message ----------
From: "Mathieu Stumpf" <psychoslave(a)culture-libre.org>
Date: 27/11/2012 14:37
Subject: [Wikimedia-l] Creating Wikikultur
To: <Wikimedia-l(a)lists.wikimedia.org>

Hello,

I just added this proposal: http://meta.wikimedia.org/wiki/Wikikultur

Here is the summary:

Currently there are still some digital works which cannot be published on any Wikimedia project, because none of them has the right editorial guidelines to host them: things like original poetry, songs, essays, theses, novels, etc., whether or not they were previously published. It would be great to see Wikimedia launch a project to host this kind of work.

The no-original-research policy of Wikipedia is understandable and very important, of course. But what is relevant for an encyclopedia may not fit other projects, which need different editorial guidelines. Among existing projects, Wikibooks is too oriented toward pedagogical works, and Wikisource won't accept works which were not previously published elsewhere.

Moreover, one may argue that it would be a good way to gently move original works out of Wikipedia, with a message like "your contribution contains original claims; Wikipedia is not the place to publish this kind of content, but you could share your original work on Wikikultur". Eventually, Wikipedia articles could then use Wikikultur articles as references. This would give access to authorship information (possibly anonymous/IP claims), and all the advantages of a free/libre work on a wiki. For example, we could have statistics on articles, so we could check whether one is being used as a reference for an over-represented point of view in a Wikipedia article.

On artistic topics, I think it would really help boost the free/libre culture movement to have a place where every artist can directly experience what it means to share and build together, with an audience intertwined with other MediaWiki projects.

----

It looks like other proposed projects, like Wikiessay (which I just discovered; I'm going to read the other projects' descriptions, sorry), would be included in such a project, but other aspects, like artistic topics, don't seem, at first glance, to be covered.

Kind regards,
mathieu

--
Association Culture-Libre
http://www.culture-libre.org/

pdftotext

by Lars Aronsson

Is the "pdftotext" program used when extracting the OCR text layer from a PDF file? In this book, http://fr.wikisource.org/wiki/Livre:Liste_provisoire_des_noms_destines.pdf, it seems that using "pdftotext -raw" would produce a better result than the current one. If you download the source PDF file and run pdftotext with and without the -raw option, you will see a difference in how some very bold words are rendered: "H e l l o" without -raw and "Hello" with -raw. There is also a difference in the column separation on some pages, e.g. page 81 (De Roster--Herborn), where Dyck is followed by E with -raw but by G without -raw. The man page for pdftotext says -raw is deprecated, but I don't understand why, as it produces the best result. -- Lars Aronsson (lars(a)aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org/
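
A minimal sketch for reproducing the comparison above, assuming Poppler's pdftotext is on the PATH and that "page 81" is the PDF's own page index (the message doesn't say which numbering it uses). It extracts one page with and without -raw, counts letter-spaced runs such as "H e l l o" with a rough heuristic regex, and prints a line diff. The function names and the file name in the usage line are hypothetical.

```python
import difflib
import re
import subprocess

# Rough heuristic: three or more single word characters separated by single
# spaces, e.g. "H e l l o". It can also match runs of genuine one-letter words.
LETTER_SPACED = re.compile(r"\b(?:\w ){2,}\w\b")


def pdftotext_cmd(pdf_path: str, page: int, raw: bool) -> list[str]:
    """Build a pdftotext command that writes a single page to stdout ("-")."""
    cmd = ["pdftotext", "-enc", "UTF-8", "-f", str(page), "-l", str(page)]
    if raw:
        cmd.append("-raw")  # keep text in content-stream order
    return cmd + [pdf_path, "-"]


def extract(pdf_path: str, page: int, raw: bool) -> str:
    """Run pdftotext and return the extracted text (raises if it fails)."""
    result = subprocess.run(
        pdftotext_cmd(pdf_path, page, raw),
        capture_output=True, encoding="utf-8", check=True,
    )
    return result.stdout


def letter_spaced_runs(text: str) -> list[str]:
    return LETTER_SPACED.findall(text)


def compare(pdf_path: str, page: int) -> None:
    """Print letter-spaced run counts and a unified diff of both modes."""
    default = extract(pdf_path, page, raw=False)
    raw = extract(pdf_path, page, raw=True)
    print(f"letter-spaced runs: default={len(letter_spaced_runs(default))}, "
          f"-raw={len(letter_spaced_runs(raw))}")
    diff = difflib.unified_diff(
        default.splitlines(), raw.splitlines(),
        fromfile="default", tofile="-raw", lineterm="",
    )
    print("\n".join(diff))
```

Usage, with a locally downloaded copy of the book: `compare("Liste_provisoire_des_noms_destines.pdf", 81)`. A lower letter-spaced count on the -raw side would match the "Hello" observation, while the diff shows any column-order differences such as the Dyck/E versus Dyck/G case.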

Re: [Wikisource-l] [Mediawiki-i18n] Extension to wrap certain characters in a span

by Federico Leva (Nemo)

Wikisource is using WebFonts (in some subdomains): how are the language codes added there?

Nemo

Nick White, 19/11/2012 10:51:
> In case the previous description was unclear, below is an example of
> what the extension does:
>
> original wikitext:
>
> Hello this is English, τηφσ ισ θοδδω, and back to English again.
>
> processed to become this:
>
> Hello this is English, <span lang="grc">τηφσ ισ θοδδω,</span> and back to English again.
>
> The span attributes can be configured with a global configuration
> variable, as can the range of characters that should be enclosed in
> it.
>
> Again, any response would be very welcome.
>
> Nick
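
To make the described transformation concrete, here is a minimal Python sketch that reproduces Nick's example output. It is not the extension's code, which would be PHP running inside MediaWiki's parser. The character ranges, the punctuation set, and the rule that a run may contain inner spaces and end on punctuation (so the comma lands inside the span, as in the example) are assumptions. The constant and function names are hypothetical stand-ins for the extension's configuration variables.

```python
import re

# Hypothetical configuration standing in for the extension's global
# configuration variables: which characters to wrap, and the lang value.
GREEK_RANGES = "\u0370-\u03ff\u1f00-\u1fff"  # Greek and Coptic, Greek Extended
INNER_PUNCT = ",.;:\u00b7"  # assumed punctuation allowed inside a run


def make_pattern(ranges: str, punct: str = INNER_PUNCT) -> re.Pattern:
    p = re.escape(punct)
    # A run starts with a character in the range, may contain whitespace and
    # punctuation, and ends on a range character or punctuation, so trailing
    # spaces stay outside the span.
    return re.compile(f"[{ranges}](?:[{ranges}{p}\\s]*[{ranges}{p}])?")


def wrap_runs(text: str, ranges: str = GREEK_RANGES, lang: str = "grc") -> str:
    """Wrap every run of characters from `ranges` in <span lang="...">."""
    pattern = make_pattern(ranges)
    return pattern.sub(lambda m: f'<span lang="{lang}">{m.group(0)}</span>', text)
```

Calling `wrap_runs("Hello this is English, τηφσ ισ θοδδω, and back to English again.")` gives the quoted output. Passing a different range and code, e.g. `wrap_runs(text, ranges="\u0400-\u04ff", lang="ru")`, would wrap Cyrillic runs instead. The `lang` value is inserted verbatim, so it is assumed to come from trusted configuration.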

Re: [Wikisource-l] [Wikimedia-l] A place for project wide discussions

by Federico Leva (Nemo)

Andrew Gray, 16/11/2012 16:43:
> On 16 November 2012 15:30, Yaroslav M. Blanter <putevod(a)mccme.ru> wrote:
>
>> But I think the point (at least originally) was not so much to have the
>> global discussion forum or the global village pump, but to have a common
>> place for Wikivoyage discussions, which so far were held on the old
>> Wikivoyage, but now are stale since the old Wikivoyage is locked for
>> editing, and anyway it is not a WMF project.
>
> Wikisource has a multilingual central project: among other things, one
> of the goals is centralised cross-language coordination, through e.g.
>
> http://wikisource.org/wiki/Wikisource:Scriptorium
>
> It doesn't seem to be very heavily used, but the precedent is still
> there. I don't see any reason that Wikivoyage couldn't have a
> centralised wiki as well...

wikisource.org exists for historical reasons: originally, language subdomains for Wikisource weren't planned. Its daily "core business" is 1) being the Wikisource portal and 2) hosting languages that don't have their own subdomain.

Nemo

Proofreading bug: Change not allowed

by Alex Brollo

When proofreading pages on it.wikisource, we are plagued by this message:

Change not allowed - You are not allowed to change the proofreading status of this page

Going back to the failed page, we noticed that:
1. the quality level is reverted to 1;
2. anything stored in the header has been lost;
3. proofreadpage_username is set to an empty string.

The bug is unpredictable: it occurs with different frequencies for different users when trying to edit different pages. Something seems to go wrong when the page code is built by merging the variables, header, body and footer contents. We wrote an "emergency fixing script", but it can only be run after an edit has turned out to be wrong.

Is something similar happening in any other Wikisource project? Can you give us a link that would help us understand what's happening? And where can we find the JS script that merges the variables, header, body and footer into the page code sent to the server?

Alex
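
As a rough illustration of what merging "variables, header, body and footer" into page code involves, the sketch below assembles Page: namespace text in an assumed Proofread Page layout of the time. That layout places a `<pagequality>` tag carrying the level and user, plus the header, inside a first `<noinclude>` block, then the body, then the footer in a second `<noinclude>` block. The layout, the `PageParts` dataclass, and both function names are assumptions, not the extension's JavaScript. The `symptoms` helper flags the three failure signs listed above. A page can legitimately be at level 1 with an empty header, so flagged pages still need a human check.

```python
import re
from dataclasses import dataclass


@dataclass
class PageParts:
    """Hypothetical container for the pieces of a Page: namespace edit."""
    level: int
    user: str
    header: str
    body: str
    footer: str


def build_page_code(p: PageParts) -> str:
    # Assumed layout: quality tag and header in the first <noinclude>,
    # the transcluded body in the middle, the footer in the last <noinclude>.
    return (
        f'<noinclude><pagequality level="{p.level}" user="{p.user}" />'
        f'<div class="pagetext">{p.header}</noinclude>'
        f"{p.body}"
        f"<noinclude>{p.footer}</div></noinclude>"
    )


QUALITY_RE = re.compile(r'<pagequality level="(\d)" user="([^"]*)" />')
HEADER_RE = re.compile(r'<div class="pagetext">(.*?)</noinclude>', re.DOTALL)


def symptoms(page_code: str) -> list[str]:
    """Return which of the reported failure signs this page code shows."""
    quality = QUALITY_RE.search(page_code)
    if quality is None:
        return ["missing pagequality tag"]
    found = []
    if quality.group(1) == "1":
        found.append("quality level is 1")
    if quality.group(2) == "":
        found.append("empty user")
    header = HEADER_RE.search(page_code)
    if header is None or not header.group(1).strip():
        found.append("empty header")
    return found
```

Running `symptoms` over saved page texts would only catch the damage after the fact, like the emergency script mentioned above. It does not locate the faulty merge step.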

Re: [Wikisource-l] [WikiDA-l] [GLAM] Library e-books on demand

by Lars Aronsson

On 11/08/2012 01:53 PM, Ole Palnatoke Andersen wrote:
> The book has been digitized now. I can see it at
> http://www.kb.dk/e-mat/dod/130019427200.pdf

Looking at this file, pdfinfo says it was created by Finereader Recognition Server and that the page size is 595.44 × 841.68 pts. If "pts" is 1/72 of an inch, this would mean 8.27 × 11.69 inches, which is A4 size and clearly unrealistic. I would guess that the pages of the physical book are roughly half of that, or 4-5 × 6-7 inches. The images are 1170 × 1873 pixels, and I would estimate the scanning resolution to be in the range 250 to 300 dpi. That's good enough.

The included OCR text looks like this for a text page (page 20 of the PDF, paginated -12-):

ventede den, men meente jeg forstoed ncr- sten alt Norsk, fisen jeg t mine yngre Aar ogsaa havde varet her nogen tort Tltd, og da langt lcrttere kom til rette t daglig Samtale. Imidlertid blev jkg snart vaec at Adflilligheven beroede paa den Kster og Vesterlandske äisleÄcs heel store og

It's not bad that it read "forstoed", with the long "s" and the old spelling "oe". But on the second line "siden" was read as "fisen", which is incorrect. Not a single "æ" is correct, which is odd for software that is supposed to recognize Danish, while "ä" erroneously appears on the last line of this excerpt. This indicates that it really tries to recognize the German alphabet, albeit with a Danish dictionary. This is "the usual quality" for OCR of blackletter (fraktur), and not particularly good.

I uploaded the work to http://runeberg.org/glossnor/ with the OCR text provided. It's now ready for your proofreading.

-- Lars Aronsson (lars(a)aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org/
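
The size and resolution estimates above are simple arithmetic, sketched here in Python. The sketch assumes PDF points are 1/72 inch. The 4-5 and 6-7 inch physical dimensions are the guesses from the message, not measurements. The conversion to millimetres shows that the declared page size is A4 (210 × 297 mm).

```python
PT_PER_INCH = 72     # assumes the PDF default user unit of 1/72 inch
MM_PER_INCH = 25.4


def page_size_inches(width_pt: float, height_pt: float) -> tuple[float, float]:
    """Convert a PDF page size in points to inches."""
    return width_pt / PT_PER_INCH, height_pt / PT_PER_INCH


def inches_to_mm(inches: float) -> float:
    return inches * MM_PER_INCH


def dpi_range(pixels: int, min_inches: float, max_inches: float) -> tuple[float, float]:
    """Resolution bounds if `pixels` span between min_inches and max_inches."""
    return pixels / max_inches, pixels / min_inches


w_in, h_in = page_size_inches(595.44, 841.68)
# prints "8.27 x 11.69 in = 210 x 297 mm"
print(f"{w_in:.2f} x {h_in:.2f} in = {inches_to_mm(w_in):.0f} x {inches_to_mm(h_in):.0f} mm")
lo_w, hi_w = dpi_range(1170, 4, 5)   # guessed physical width: 4-5 in
lo_h, hi_h = dpi_range(1873, 6, 7)   # guessed physical height: 6-7 in
print(f"width: {lo_w:.0f}-{hi_w:.0f} dpi, height: {lo_h:.0f}-{hi_h:.0f} dpi")
```

The two ranges come out at roughly 234 to 293 dpi from the width and 268 to 312 dpi from the height. Their overlap is consistent with the 250 to 300 dpi estimate.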