Dispenser kindly made a list of DjVu files on Commons linking to an IA
item, with some information such as global usage:
https://toolserver.org/~dispenser/temp/djvu2archive.org.txt (just change
the extension to .csv to open it in a spreadsheet; it's tab-separated).
It's about 5000 books with 6-200 global usages and 5000 outside that
range (which probably means they are completely unused apart from some
talk pages or the like, or that most of their text already lives on wiki
pages).
If I manage to convince a "slash-admin", I'll get those 5000 re-OCR'd;
otherwise I'll need to do it manually, so suggestions on priorities are
welcome. :)
Nemo
P.s.:
The query used:
http://pastebin.com/L4EXDY5F and another one:
http://pastebin.com/avg3LYti
-- 2> /dev/null; date; echo '
SELECT /*SLOW_OK*/
  CONCAT("[[:File:", REPLACE(img_name, "_", " "), "]]") AS "File",
  el_from,
  img_size,
  REPLACE(img_user_text, "_", " ") AS "Uploader",
  img_timestamp AS "Timestamp",
  el_to,
  (SELECT COUNT(*) FROM globalimagelinks WHERE gil_to=page_title) AS "Usage",
  (SELECT COUNT(*) FROM oldimage WHERE oi_name=page_title) AS "Reuploads",
  SUBSTRING_INDEX(el_to, "/", -1) AS "Archive_Name"
FROM image
JOIN page ON page_namespace=6 AND page_title=CONVERT(img_name USING latin1)
JOIN externallinks ON el_from=page_id
WHERE img_media_type="BITMAP"
  AND img_major_mime="image" AND img_minor_mime="vnd.djvu"
  AND (el_index LIKE "http://org.archive.%/details/_%"
       OR el_index LIKE "https://org.archive.%/details/_%")
;-- ' | sql -r commonswiki_p > ~/public_html/temp/djvu2archive.org.txt;
date;