Re: [Wikisource-l] On linking Wikisource with page images

22 Jan 2008

Hi Gregory and everyone,

On Jan 22, 2008 11:22 AM, Gregory Maxwell <gmaxwell(a)gmail.com> wrote:
...
  On Jan 21, 2008 7:16 PM, Jesse Martin (Pathoschild) <pathoschild(a)gmail.com> wrote:
  That's a good point. How about a much cleaner syntax that can be used to generate the OCR markup? With your example text:
 {{ocr line| The first experiments were made on the absorption of carbonic }}
 {{ocr line| acid gas by water: and here a singular disagreement was observed }}
 {{ocr line| in the first trials made under exactly the same circumstances. It }}

 This is much easier to read, you know where the line breaks go, and it's immediately clear even to someone stumbling across the text that we're specifically keeping track of lines (so they don't helpfully remove unneeded line breaks). Since single line breaks are ignored by MediaWiki, we can just use the same line width so the template syntax lines up for easier ignoring.

 Oh, that gets it most of the way there... but could I still smuggle in the coords? ;) Like:

 {{ocr line|551-4202-2666-4278-1|The first experiments were made on the absorption of carbonic}}

 I suppose I could also make the coords base 60 or so, so they would be shorter.
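To put a number on the base-60 idea, here is a minimal sketch in Python. The 60-symbol alphabet and the function names are my own assumptions, and since the thread doesn't say what the five fields mean, each dash-separated field is simply treated as a non-negative decimal integer:

```python
# Hypothetical 60-symbol alphabet: 0-9, A-Z, then a-x (10 + 26 + 24 = 60).
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwx"
BASE = len(DIGITS)  # 60


def to_base60(n: int) -> str:
    """Encode a non-negative integer with DIGITS, most significant digit first."""
    if n < 0:
        raise ValueError("coordinates are assumed to be non-negative")
    if n == 0:
        return DIGITS[0]
    digits = []
    while n:
        n, r = divmod(n, BASE)
        digits.append(DIGITS[r])
    return "".join(reversed(digits))


def from_base60(s: str) -> int:
    """Decode a string produced by to_base60 back into an integer."""
    n = 0
    for ch in s:
        n = n * BASE + DIGITS.index(ch)
    return n


def shorten(coords: str) -> str:
    """Re-encode each dash-separated decimal field in base 60."""
    return "-".join(to_base60(int(field)) for field in coords.split("-"))


def expand(coords: str) -> str:
    """Inverse of shorten: back to dash-separated decimal fields."""
    return "-".join(str(from_base60(field)) for field in coords.split("-"))


print(shorten("551-4202-2666-4278-1"))  # 9B-1A2-iQ-1BI-1
```

With that alphabet the example goes from 20 characters to 15. Mixed-case digits make the coords harder to read or type by hand, which may not matter if only bots ever touch them.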
I don't understand why the HTML output needs to have the DJVU markers; they could be in the raw text. Would it be acceptable to have one line per printed line, with hidden comments as required? I.e.:

---
The first experiments were made on the absorption of carbonic <!-- DJVU position: 551-4202-2666-4278-1 -->
acid gas by water: and here a singular disagreement was observed <!-- DJVU position: ... -->
in the first trials made under exactly the same circumstances. It <!-- DJVU position: ... -->
---
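If it helps, here is a rough Python sketch of how a bot could pull the text and positions back out of that layout, and roughly what a reader would see once the comments are dropped and MediaWiki shows single line breaks as spaces. The function names are made up for illustration, and the position string is kept opaque since we haven't settled what its fields mean:

```python
import re
from typing import List, Optional, Tuple

# One printed line, optionally followed by a hidden comment in the proposed
# form "<!-- DJVU position: ... -->". The position string is kept as-is.
LINE_RE = re.compile(
    r"^(?P<text>.*?)\s*(?:<!--\s*DJVU position:\s*(?P<pos>.*?)\s*-->)?\s*$"
)


def parse_page(raw: str) -> List[Tuple[str, Optional[str]]]:
    """Return (printed line, position or None) for each non-blank line."""
    lines = []
    for line in raw.splitlines():
        if not line.strip():
            continue  # blank lines carry no printed text
        m = LINE_RE.match(line)  # always matches: every part is optional
        lines.append((m.group("text"), m.group("pos")))
    return lines


def reader_text(raw: str) -> str:
    """Approximate rendered text: comments dropped, lines joined by spaces."""
    return " ".join(text for text, _ in parse_page(raw))


page = (
    "The first experiments were made on the absorption of carbonic "
    "<!-- DJVU position: 551-4202-2666-4278-1 -->\n"
    "acid gas by water: and here a singular disagreement was observed "
    "<!-- DJVU position: ... -->\n"
)
print(parse_page(page)[0])
# ('The first experiments were made on the absorption of carbonic', '551-4202-2666-4278-1')
```

Keeping the comment on the same line as its text means a line-by-line diff against a new OCR pass stays aligned with the printed page.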

How will words that are broken across two lines be handled?

I understand that these DJVU files will probably need a lot of corrections initially. Are you planning on updating the DJVU file on Commons incrementally, or after the entire DJVU file has been proofread?

--
John
