Re: [Wikisource-l] Converting pdf files into wiki markup

14 Jun 2013


      2013/6/12 David Cuenca dacuetu@gmail.com:
...
It is not a trivial matter. The best bet would be to take an existing pdf
import tool for a word processor, and try to write a similar tool for
wikitext.
There is the Oracle PDF Import Extension for OpenOffice; its code can be
browsed, and maybe it can give you some ideas:
http://extensions.services.openoffice.org/project/pdfimport
PDF scraping is a technique that is gaining more and more attention,
since a lot of data on the web is locked up in PDFs, so several libraries
for this task are under development.
Since my favorite language is Python, I suggest this blog post:
http://blog.scraperwiki.com/2010/12/17/scraping-pdfs-now-26-less-unpleasant-...
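Here is a minimal sketch of that kind of pipeline in Python. It is not the approach from the blog post or from the OpenOffice extension. It assumes the third-party pdfminer.six package, whose pdfminer.high_level.extract_text() returns the text of a PDF with pages separated by form feeds. The helpers join_lines, text_to_wikitext and pdf_to_wikitext are hypothetical names. The "short all-caps block becomes a == heading ==" rule is likewise only an assumed heuristic. The script handles plain paragraphs only; tables, footnotes and multi-column layouts would need much more work.

```python
import re


def join_lines(lines):
    """Join the wrapped lines of one paragraph into a single line."""
    out = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if out.endswith("-"):
            # Word broken across lines: drop the hyphen before a lowercase
            # continuation ("triv-" + "ial"), keep it otherwise ("Anglo-" + "Saxon").
            out = (out[:-1] if line[:1].islower() else out) + line
        elif out:
            out += " " + line
        else:
            out = line
    return out


def text_to_wikitext(text):
    """Turn plain text extracted from a PDF into simple wikitext paragraphs."""
    blocks = []
    for page in text.split("\f"):  # pdfminer.six separates pages with form feeds
        for raw in re.split(r"\n\s*\n", page):  # blank lines separate paragraphs
            para = join_lines(raw.splitlines())
            if not para:
                continue
            # Assumed heuristic: a short all-caps block is a section title.
            if para.isupper() and len(para) < 80:
                blocks.append("== " + para.title() + " ==")
            else:
                blocks.append(para)
    # MediaWiki starts a new paragraph after a blank line.
    return "\n\n".join(blocks)


def pdf_to_wikitext(path):
    """Extract text with pdfminer.six (pip install pdfminer.six), then convert it."""
    from pdfminer.high_level import extract_text  # third-party dependency

    return text_to_wikitext(extract_text(path))


if __name__ == "__main__":
    sample = "CHAPTER I\n\nIt is not a triv-\nial matter.\n\fSecond page."
    print(text_to_wikitext(sample))
```

Splitting on page breaks means a paragraph that runs across two pages comes out as two paragraphs. A real tool would also have to strip running headers and page numbers before converting.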
C
