Dispenser kindly made a list of DjVu files on Commons linking to an IA item, with some information like global usage: https://toolserver.org/~dispenser/temp/djvu2archive.org.txt (just change the extension to .csv to open it as a spreadsheet; it's tab-separated). It's about 5000 books with 6-200 global usages and 5000 outside that range (which probably means they are completely unused apart from some talk pages or whatever, or that most of the text already lives on wiki pages). If I manage to convince a "slash-admin", I'll get those 5000 re-OCR'd; otherwise I need to do it manually, so suggestions on priorities are welcome. :)
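
In case it helps with picking priorities, here is a minimal sketch (Python; it assumes the .txt keeps the tab-separated layout with a header row matching the column aliases of the query in the postscript, which I haven't verified) to sort the list by global usage:

# Minimal sketch: rank the DjVu list by global usage, most-used first.
# Assumes the file is tab-separated with a header row using the aliases
# from the query below (e.g. "File", "Usage", "Archive_Name").
import csv
import urllib.request

URL = "https://toolserver.org/~dispenser/temp/djvu2archive.org.txt"

with urllib.request.urlopen(URL) as response:
    text = response.read().decode("utf-8", errors="replace")

rows = list(csv.DictReader(text.splitlines(), delimiter="\t"))

# Most-used files first: presumably the ones where a better text layer pays off soonest.
rows.sort(key=lambda r: int(r.get("Usage", "0") or 0), reverse=True)

for row in rows[:50]:
    print(row["Usage"], row["File"], row["Archive_Name"])
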
Nemo
P.s.: The query used is http://pastebin.com/L4EXDY5F, and another one is http://pastebin.com/avg3LYti
-- 2> /dev/null; date; echo '
SELECT /*SLOW_OK*/
  CONCAT("[[:File:", REPLACE(img_name, "_", " "), "]]") AS "File",
  el_from,
  img_size,
  REPLACE(img_user_text, "_", " ") AS "Uploader",
  img_timestamp AS "Timestamp",
  el_to,
  (SELECT COUNT(*) FROM globalimagelinks WHERE gil_to=page_title) AS "Usage",
  (SELECT COUNT(*) FROM oldimage WHERE oi_name=page_title) AS "Reuploads",
  SUBSTRING_INDEX(el_to, "/", -1) AS "Archive_Name"
FROM image
JOIN page ON page_namespace=6 AND page_title=CONVERT(img_name USING latin1)
JOIN externallinks ON el_from=page_id
WHERE img_media_type="BITMAP"
  AND img_major_mime="image"
  AND img_minor_mime="vnd.djvu"
  AND (el_index LIKE "http://org.archive.%/details/_%"
    OR el_index LIKE "https://org.archive.%/details/_%");
-- ' | sql -r commonswiki_p > ~/public_html/temp/djvu2archive.org.txt; date;
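
For the manual route, a small sketch of checking what a single item still offers on archive.org, using only the public metadata API at https://archive.org/metadata/; the identifier below is a made-up placeholder, real ones come from the Archive_Name column:

# Minimal sketch: for one Archive_Name from the list, ask the archive.org
# metadata API which DjVu derivative(s) the item has and print download URLs.
import json
import urllib.request

identifier = "exampleitem00archgoog"  # hypothetical Archive_Name

with urllib.request.urlopen("https://archive.org/metadata/" + identifier) as r:
    meta = json.load(r)

if not meta:
    # the metadata API returns an empty object for items that no longer exist
    print(identifier, "no longer exists on archive.org")
else:
    for f in meta.get("files", []):
        if f["name"].endswith(".djvu"):
            print("https://archive.org/download/%s/%s" % (identifier, f["name"]))
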