Re: [Wikisource-l] Two requests for MediaViewer

1 Oct 2014

I have seen many messy text-image mixes on Google Books, especially in older
texts from the days of manual typesetting. That's why I was wondering whether
it would be possible to have a tool that stores pages as you go, so you can
step in and adjust the result on a per-page basis. I am not familiar with
abbyy.xml files, but this may be the way to go.

On Wed, Oct 1, 2014 at 2:18 PM, Alex Brollo <alex.brollo(a)gmail.com> wrote:

...
  2014-10-01 9:18 GMT+02:00 Jane Darnell <jane023(a)gmail.com>:

  Actually, I would rather have a tool that pulls apart djvu files as they
  are uploaded, keeping the text in WS and the pics in Commons

 This is very interesting, since abbyy.xml files contain both a full
 (character-by-character) record of text mapping and format, and the
 coordinates of any non-textual content (illustrations) on the scanned page.
 Using such data appropriately, it would be possible to extract illustrations
 and other graphical elements of pages automatically. Nevertheless, I have
 seen that such "self-cropping" of illustrations sometimes fails, and is often
 confused by unusual formats of illustrations or graphical elements, so that
 many "illustrations" are nonsense or have to be cropped again. Unfortunately,
 djvu files have no such "illustration coordinates" inside.
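
 As a rough illustration of the idea above, here is a minimal sketch of
 reading illustration coordinates out of an abbyy.xml file. It assumes the
 usual ABBYY FineReader XML layout, where each page holds block elements
 with a blockType attribute and l/t/r/b pixel coordinates; the function
 names, the sample document and its namespace URI are only assumptions for
 the example, and a real file may differ in detail.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}block' down to 'block'."""
    return tag.rsplit("}", 1)[-1]


def picture_boxes(xml_text):
    """Return (page_index, left, top, right, bottom) for every Picture block.

    Assumes blocks carry blockType="Picture" and integer l/t/r/b attributes.
    """
    root = ET.fromstring(xml_text)
    pages = [el for el in root.iter() if local_name(el.tag) == "page"]
    boxes = []
    for page_index, page in enumerate(pages):
        for el in page.iter():
            if local_name(el.tag) == "block" and el.get("blockType") == "Picture":
                boxes.append((
                    page_index,
                    int(el.get("l")),
                    int(el.get("t")),
                    int(el.get("r")),
                    int(el.get("b")),
                ))
    return boxes


# Hypothetical one-page sample, not taken from a real scan.
SAMPLE = """<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader6-schema-v1.xml">
  <page width="1000" height="1500" resolution="300">
    <block blockType="Text" l="100" t="100" r="900" b="400"/>
    <block blockType="Picture" l="150" t="450" r="850" b="1200"/>
  </page>
</document>"""

if __name__ == "__main__":
    for box in picture_boxes(SAMPLE):
        print(box)
```

 The boxes found this way would still need a manual check, since the
 automatic detection of illustrations is not always reliable.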

 Alex

 _______________________________________________
 Wikisource-l mailing list
 Wikisource-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikisource-l
