It is increasingly common to add books to Wikisource by finding a PDF or DjVu file, uploading it to Commons, and then creating an Index: page on Wikisource for proofreading.
But this would be much easier if:
1) The fields (author, title, etc.) of the Index: page were pre-filled from the data already given on Commons. (Yes, those values could be wrong or incomplete, but they could always be corrected afterwards.)
2) The <pagelist/> tag were already placed in the "Pages" box.
3) All pages were created automatically with the OCR text from the file, instead of leaving a long list of red links. (This would require the text of each page to be extracted, something that pdftotext can do locally in seconds, whereas waiting for Commons to provide the text layer can take weeks. A rough sketch of this step follows the list.)
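For point 3, here is a minimal sketch of what such a bot might look like, using pywikibot and pdftotext (from poppler-utils). The file path, index title, and page count are placeholders, and a real bot would also need to fill in the ProofreadPage header/footer and proofreading status rather than saving bare text:

```python
# Rough sketch only: assumes a configured pywikibot install and a local copy
# of the PDF; all names below are placeholders, not real pages.
import subprocess
import pywikibot

PDF_PATH = "local_copy.pdf"        # local copy of the file uploaded to Commons (placeholder)
INDEX_TITLE = "Example_Book.pdf"   # base name of the Index:/Page: pages (placeholder)
NUM_PAGES = 10                     # number of pages to create (placeholder)

site = pywikibot.Site("en", "wikisource")

def extract_page_text(pdf_path, page_number):
    """Extract the embedded text layer of one page with pdftotext."""
    # "pdftotext -f N -l N file.pdf -" writes page N's text to stdout
    result = subprocess.run(
        ["pdftotext", "-f", str(page_number), "-l", str(page_number), pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

for n in range(1, NUM_PAGES + 1):
    page = pywikibot.Page(site, f"Page:{INDEX_TITLE}/{n}")
    if page.exists():
        continue  # never touch pages that already exist (possibly proofread)
    page.text = extract_page_text(PDF_PATH, n)
    page.save(summary="Bot: pre-filling page with OCR text layer from the PDF")
```

This is only meant to show that the text extraction itself is cheap; the harder parts are the Index: field mapping from the Commons {{Book}} template and getting community approval for a bot run.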
Could this be automated? Is there already some tool or bot that does this?