Hi.
Is there any preference for a Python PDF library, in case one would like to add PDF file processing to pywikibot? Or is it good enough, where possible, to rely on pdfinfo (which I guess is Linux-only)?
Suggestions/comments appreciated.
Mpaa
I sometimes use xpdf binaries for Windows. What kind of PDF processing do you need? To be honest, I convert PDF files into DjVu as soon as possible.
Alex
2016-04-13 21:18 GMT+02:00 Mpaa mpaa.wiki@gmail.com:
pywikibot mailing list pywikibot@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikibot
On 14 Apr 2016 02:18, "Mpaa" mpaa.wiki@gmail.com wrote:
Is there any preference for a Python PDF library, in case one would like to add PDF file processing to pywikibot?
I have no preference or experience. PyPDF2 seems to be the most commonly used, though it has a few worrying Python 3 encoding bugs.
Or is it good enough, if possible, to rely on pdfinfo (which I guess is Linux-only)?
For a script, using pdfinfo is definitely good enough IMO; reflinks already uses pdfinfo.
Precompiled binaries for Windows are available at http://www.foolabs.com/xpdf/download.html
-- John
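As a rough illustration of the pdfinfo approach, a script can shell out to pdfinfo and parse its "Key:   value" output lines. This is a minimal sketch assuming Python 3.7+ and a pdfinfo binary (from xpdf or poppler-utils) on PATH; parse_pdfinfo, pdf_info and page_count are hypothetical helper names, not existing pywikibot API, and this is not necessarily how reflinks calls pdfinfo.

```python
import subprocess


# Hypothetical helpers, not part of pywikibot.
def parse_pdfinfo(output):
    """Parse the 'Key:   value' lines printed by pdfinfo into a dict."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only, so values such as
        # 'Thu Apr 14 19:35:00 2016' keep their own colons.
        key, sep, value = line.partition(':')
        if sep:  # ignore lines without a colon
            info[key.strip()] = value.strip()
    return info


def pdf_info(path):
    """Run pdfinfo on *path* (assumed to be on PATH) and parse its output.

    Raises subprocess.CalledProcessError if pdfinfo exits with an error.
    """
    result = subprocess.run(['pdfinfo', path], capture_output=True,
                            text=True, check=True)
    return parse_pdfinfo(result.stdout)


def page_count(path):
    """Return the number of pages reported by pdfinfo."""
    return int(pdf_info(path)['Pages'])
```

For example, page_count('book.pdf') would return the integer from pdfinfo's "Pages:" line.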
@Alex: since IA is no longer producing DjVu files, there is demand on en.wikisource for a script similar to djvutxt.py for PDF (or a single script handling both formats...).
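One possible basis for a djvutxt.py-like script for PDF is pdftotext (also part of xpdf/poppler), which separates pages with a form feed character, so splitting its output gives per-page text. This is a minimal sketch assuming Python 3.7+ and pdftotext on PATH; split_pages and pdf_page_texts are hypothetical names, and the result is only as good as the PDF's own text layer.

```python
import subprocess


# Hypothetical helpers, not part of pywikibot.
def split_pages(text):
    """Split pdftotext output into one string per page.

    pdftotext separates pages with a form feed unless -nopgbrk is given.
    """
    pages = text.split('\f')
    # The output usually ends with a form feed, leaving an empty tail.
    if pages and pages[-1] == '':
        pages.pop()
    return pages


def pdf_page_texts(path, first=None, last=None):
    """Return the text layer of *path* as a list of per-page strings.

    Assumes pdftotext is on PATH. '-enc UTF-8' requests UTF-8 output and
    '-' as the output file sends the text to stdout.
    """
    cmd = ['pdftotext', '-enc', 'UTF-8']
    if first is not None:
        cmd += ['-f', str(first)]  # first page to convert
    if last is not None:
        cmd += ['-l', str(last)]   # last page to convert
    cmd += [path, '-']
    result = subprocess.run(cmd, capture_output=True,
                            encoding='utf-8', check=True)
    return split_pages(result.stdout)
```

Splitting on the page separator keeps one pdftotext call per file rather than one per page, which matters for scans with hundreds of pages.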
On Thu, Apr 14, 2016 at 7:35 PM, John Mark Vandenberg jayvdb@gmail.com wrote:
My way: pdf2djvu converts IA PDFs perfectly into excellent DjVu files, so I download the PDF, convert it to DjVu and upload it to Commons. IA Upload bot will probably soon go back to uploading DjVu files from IA as before, simply grabbing the PDF and converting it to DjVu before uploading it to Commons.
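The download, convert and upload workflow above could be scripted roughly as follows for the conversion step. This is a sketch assuming the pdf2djvu binary is on PATH and takes the output file via -o; pdf2djvu_command and convert_pdf_to_djvu are hypothetical helper names, and the Commons upload step is deliberately left out.

```python
import os
import subprocess


# Hypothetical helpers, not part of pywikibot.
def pdf2djvu_command(pdf_path, djvu_path=None):
    """Build the pdf2djvu command line for converting *pdf_path*.

    If no output path is given, reuse the PDF's name with a .djvu suffix.
    """
    if djvu_path is None:
        djvu_path = os.path.splitext(pdf_path)[0] + '.djvu'
    return ['pdf2djvu', '-o', djvu_path, pdf_path]


def convert_pdf_to_djvu(pdf_path, djvu_path=None):
    """Run pdf2djvu (assumed to be on PATH) and return the DjVu path."""
    cmd = pdf2djvu_command(pdf_path, djvu_path)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd[2]  # the argument given to -o
```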
pdftotext (one of the xpdf tools) has some problems caused by the different text layer of PDF files; e.g., I found that in many PDF files soft hyphens are removed. This is a big problem while proofreading.
IMHO it would be a pity to follow IA's policy and shift from DjVu to PDF: DjVu files are simply an "open & free PDF", and wikis love freedom.
Alex
2016-04-14 20:29 GMT+02:00 Mpaa mpaa.wiki@gmail.com:
On Fri, Apr 15, 2016 at 1:29 AM, Mpaa mpaa.wiki@gmail.com wrote:
@Alex since IA is not using djvu any longer
Oh? Where can I read more about this policy change by IA?
John Mark Vandenberg, 15/04/2016 02:19:
https://lists.wikimedia.org/pipermail/wikisource-l/2016-March/002735.html
Nemo