It is not a trivial matter. The best bet would be to take an existing PDF import tool for a word processor and try to write a similar tool for wikitext.
There is the Oracle PDF Import Extension for OpenOffice; its code can be browsed, and maybe it can give you some ideas:
http://extensions.services.openoffice.org/project/pdfimport
Micru

On Wed, Jun 12, 2013 at 12:38 PM, Alex Brollo <alex.brollo@gmail.com> wrote:
When we tried to convert a PDF file, which is an opaque, closed format, into wiki code (a needed step to add links and to turn files into a "wiki hypertext"), the work turned into a nightmare. If we simply load free PDF books "as they are", I don't see any advantage other than to "feed Wikisource numbers/statistics", and this is presently far from my personal interest.

As you guess, I'm one of the users who don't support Aubrey's enthusiasm about texts born digital, even if free. :-)

Alex

2013/6/12 David Cuenca <dacuetu@gmail.com>

Nobody is saying anything about using copyrighted works; there are many books with an open license that would allow them to be included in Wikisource.
For instance in ca-ws we have this translation from 2009:
http://ca.wikisource.org/wiki/Llibre:El_secret_de_l%E2%80%99or_que_creix_%282009%29.djvu
The original is in the public domain, and the translator gave away his rights. It would have been much easier to work directly with the PDF instead of converting it to DjVu.
Micru
Etiamsi omnes, ego non

On Wed, Jun 12, 2013 at 10:47 AM, Aarti K. Dwivedi <ellydwivedi2093@gmail.com> wrote:
If I am not wrong, as of today most books that were born digital are still under copyright. Of course, they are freely available on the internet, but we can't use pirated copies. How would we go about procuring these books?

If we procure these copyrighted books, then the only thing we would have to do is check for proper formatting. Isn't it?

--

On Wed, Jun 12, 2013 at 7:58 PM, Lars Aronsson <lars@aronsson.se> wrote:
On 06/12/2013 02:48 PM, Andrea Zanni wrote:
We could define some tasks as
* corrected the page
* OPTIONAL added optional templates/links/annotations
*...
Geotagged all the photos, ...
The list doesn't end. You need a generic mechanism
for any new feature you can invent. But aren't our
existing templates and categories the best way to
do this? You could just add to each page:
{{done|proofread=user1|validated=user2|geotagged=user4|...}}
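As a rough illustration of how a tool could read such a template back out of a page, here is a minimal sketch in Python. The function name parse_done_template is hypothetical, and the sketch assumes the template is flat: named parameters only, no nested templates and no pipes inside values. Real wikitext would need a proper parser.

```python
def parse_done_template(wikitext):
    """Return {task: user} from the first flat {{done|...}} call, or {} if none.

    Assumption: parameters are simple name=value pairs separated by "|",
    with no nested templates or links inside the values.
    """
    opener = "{{done|"
    start = wikitext.find(opener)
    if start == -1:
        return {}
    end = wikitext.find("}}", start)
    if end == -1:
        return {}
    body = wikitext[start + len(opener):end]

    tasks = {}
    for part in body.split("|"):
        name, sep, value = part.partition("=")
        # Skip anything that is not a complete name=value pair (e.g. "...").
        if sep and name.strip() and value.strip():
            tasks[name.strip()] = value.strip()
    return tasks


page = "Page text. {{done|proofread=user1|validated=user2|geotagged=user4}}"
status = parse_done_template(page)
print(status)
```

A new task then only needs a new parameter name in the template; the reading code does not change.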
--
Lars Aronsson (lars@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org/
_______________________________________________
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
Aarti K. Dwivedi