I have identified an organization which is willing to spend up to about EUR 10,000 on adding support for exporting MediaWiki pages as PDF files, and improving document management for documents consisting of multiple pages.
My current thinking is that the functionality implemented, as a minimum, would be as follows:
a) Using an extension, integrate a "PDF link" on every wiki page which calls an external library such as HTMLDOC on that single page.
b) Support filters on the rendered HTML (replacing image thumbnails with high-resolution images, filtering content by regular expression, etc.) and revision filters (export the last revision edited by a user on whitelist Y, or the revision approximating currentdate-Z).
c) Create a "PDF basket" UI which makes it easy to compile a PDF from multiple pages (and to rearrange the pages in a hierarchy). The resulting structures could potentially also be stored as wikitext, using a new <structure> extension tag, so that they can be used both by individuals compiling PDFs for personal use and by groups collaborating on complex documents.
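To make items a) and b) concrete, here is a rough Python sketch of the filter-then-convert step. It is only an illustration, not a proposed implementation: the function names are made up, the thumbnail rewrite assumes MediaWiki's default hashed upload layout, and the HTMLDOC call is only built, not executed (running it requires HTMLDOC to be installed).

```python
import re

# Assumption: default hashed upload layout, where a thumbnail lives at
# .../thumb/<h>/<hh>/<Name>/<width>px-<Name> and the original at .../<h>/<hh>/<Name>.
THUMB_RE = re.compile(r'/thumb/([0-9a-f])/([0-9a-f]{2})/([^/"]+)/\d+px-[^/"]+')


def use_full_resolution(html: str) -> str:
    """Point thumbnail URLs at the original (full-resolution) upload."""
    return THUMB_RE.sub(r"/\1/\2/\3", html)


def strip_matching(html: str, pattern: str) -> str:
    """Remove every fragment that matches a caller-supplied regular expression."""
    return re.sub(pattern, "", html, flags=re.DOTALL)


def htmldoc_command(html_path: str, pdf_path: str) -> list:
    """Build (but do not run) an HTMLDOC call that converts one page to PDF."""
    return ["htmldoc", "--webpage", "-t", "pdf", "-f", pdf_path, html_path]
```

The filtered HTML would be written to a temporary file and the command list handed to something like subprocess.run(). Revision filters are not covered here; they would have to select the revision before the page is rendered.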
Possibly some budget could also be allocated for improving the external PDF library used, especially if we can allocate additional funds for this project.
I'd like to request comments on this approach, specifically:
- Besides HTMLDOC, do you know a good (X)HTML-to-PDF library which could be used for this purpose?
- Within this budget, do you believe an alternative approach which utilizes an intermediate format is viable (e.g. wiki-to-Docbook-to-PDF), given the complexity of the MediaWiki syntax, its various extensions, and the need to keep up with parser changes?
- If you are a developer, would you be interested in working on this project, and available to do so? (If so, please contact me privately.)
Any other comments would also be appreciated.
On 1/4/07, Erik Moeller erik@wikimedia.org wrote:
I have identified an organization which is willing to spend up to about EUR 10,000 on adding support for exporting MediaWiki pages as PDF files, and improving document management for documents consisting of multiple pages.
I once implemented something somewhat like this. Personal project, highly unpolished, not suitable for mainstream use - but it worked for what I needed it to.
I did it by writing a wikitext -> LaTeX converter.
Now, my stuff didn't handle images or tables, and those are the bits that are the hardest to convert, so this may be pointless. However, it might at least be worth looking into. LaTeX does make very nice-looking output with not a whole lot of fuss.
Just my two cents.
-- Josh
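For readers wondering what a wikitext -> LaTeX converter looks like at its simplest, here is a minimal Python sketch. It is not Josh's code (which is not shown in this thread); it is an assumed, regex-based toy that handles only headings, bold and italics, and, like the converter described above, does nothing for images or tables. It also does not escape LaTeX special characters.

```python
import re

# Order matters: === must be tried before ==, and ''' before '',
# otherwise the shorter marker would consume part of the longer one.
RULES = [
    (re.compile(r"^===[ \t]*(.+?)[ \t]*===[ \t]*$", re.M), r"\\subsection{\1}"),
    (re.compile(r"^==[ \t]*(.+?)[ \t]*==[ \t]*$", re.M), r"\\section{\1}"),
    (re.compile(r"'''(.+?)'''"), r"\\textbf{\1}"),
    (re.compile(r"''(.+?)''"), r"\\textit{\1}"),
]


def wikitext_to_latex(text: str) -> str:
    """Apply each rewrite rule in turn to the whole text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

A regex pass like this breaks down quickly on templates, nested markup and extension tags, which is the maintenance concern raised in the original RfC about intermediate formats.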
Erik Moeller wrote:
c) Create a "PDF basket"
Between a) and this one: the ability to automatically put the articles belonging to a category into this "basket" (recursively or not, with or without the category page itself, or perhaps with the category as a hyperlinked summary).
All in all, it's quite an interesting move. Hopefully it will be released under the GNU GPL, so that anyone can improve it over the years.
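The basket hierarchy and the category suggestion above could be modelled roughly as follows. This is a Python sketch under stated assumptions: BasketNode, add_category and flatten are hypothetical names, and category membership is passed in as a plain dictionary rather than fetched from a wiki.

```python
from dataclasses import dataclass, field


@dataclass
class BasketNode:
    """One entry in the basket: a page or a grouping with child entries."""
    title: str
    children: list = field(default_factory=list)


def add_category(parent, category, members, recursive=True, _seen=None):
    """Append `category` and its pages under `parent`.

    `members` maps a category title to its member titles; titles that start
    with "Category:" are treated as subcategories.
    """
    seen = _seen if _seen is not None else set()
    if category in seen:  # categories may contain each other; avoid loops
        return
    seen.add(category)
    node = BasketNode(category)
    parent.children.append(node)
    for title in members.get(category, []):
        if title.startswith("Category:"):
            if recursive:
                add_category(node, title, members, recursive, seen)
        else:
            node.children.append(BasketNode(title))


def flatten(node, depth=0):
    """Yield (depth, title) pairs depth-first, i.e. in export order."""
    yield depth, node.title
    for child in node.children:
        yield from flatten(child, depth + 1)
```

The depth values could drive heading levels in the compiled PDF. How such a tree would be serialized inside a <structure> tag is left open here, since the RfC does not define that syntax.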
-----Original Message-----
From: mediawiki-l-bounces@Wikimedia.org [mailto:mediawiki-l-bounces@Wikimedia.org] On Behalf Of Erik Moeller
Sent: 05 January 2007 04:02
To: Wikimedia developers; MediaWiki announcements and site admin list
Subject: [Mediawiki-l] RfC: PDF and document management for MediaWiki
I'd like to request comments on this approach, specifically:
- Besides HTMLDOC, do you know a good (X)HTML-to-PDF library which could be used for this purpose?
Transform the (X)HTML to XSL-FO with an XSL stylesheet, then use Apache FOP to generate the PDF.
- Within this budget, do you believe an alternative approach which utilizes an intermediate format is viable (e.g. wiki-to-Docbook-to-PDF), given the complexity of the MediaWiki syntax, its various extensions, and the need to keep up with parser changes?
The standard DocBook tools follow the same process, but start from DocBook, which is transformed with XSL to XSL-FO.
Jared
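As an illustration of the two-step route described in this reply, the following Python sketch builds the command lines for an XSLT step followed by an Apache FOP step. The choice of xsltproc as the XSLT processor and all file names are assumptions made for the example; the stylesheet that maps (X)HTML or DocBook to XSL-FO has to be supplied separately. The commands are only constructed, not run.

```python
def xml_to_pdf_commands(source, stylesheet, fo_path, pdf_path):
    """Return the two commands for source -> XSL-FO -> PDF.

    `source` is the (X)HTML or DocBook input and `stylesheet` the XSL file
    that turns it into XSL-FO; both are caller-supplied (hypothetical) paths.
    """
    return [
        # Step 1: apply the stylesheet to produce an XSL-FO file.
        ["xsltproc", "--output", fo_path, stylesheet, source],
        # Step 2: let Apache FOP render the XSL-FO file to PDF.
        ["fop", "-fo", fo_path, "-pdf", pdf_path],
    ]
```

Each list can be passed to subprocess.run() once both tools are installed. FOP can also apply the stylesheet itself (its -xml and -xsl options), which would fold the two steps into one.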
Question: does anything like this already exist for RTF or DOC formats?
peace, ted
Question: does anything like this already exist for RTF or DOC formats?
I'd be very interested in this as well.
Has there been any development since the original RfC (as in, a suitable programmer was found, the project was started, ETA is tomorrow at noon)?
-- Frederik