[Mediawiki-l] RfC: PDF and document management for MediaWiki
jared.williams1 at ntlworld.com
Fri Jan 5 13:54:07 UTC 2007
> -----Original Message-----
> From: mediawiki-l-bounces at Wikimedia.org
> [mailto:mediawiki-l-bounces at Wikimedia.org] On Behalf Of Erik Moeller
> Sent: 05 January 2007 04:02
> To: Wikimedia developers; MediaWiki announcements and site admin list
> Subject: [Mediawiki-l] RfC: PDF and document management for MediaWiki
> I have identified an organization which is willing to spend up to
> about EUR 10,000 on adding support for exporting MediaWiki pages as
> PDF files, and improving document management for documents consisting
> of multiple pages.
> My current thinking is that the functionality implemented, as a
> minimum, would be as follows:
> a) Using an extension, integrate a "PDF link" into any wiki page
> which would call an external library like HTMLDOC on a single wiki page
> b) Support filters on the rendered HTML (replacing image thumbnails
> with high-resolution images, filtering content by regular expression,
> etc.), and revision filters (export the last revision edited by a user
> on whitelist Y, or the revision closest to the current date minus Z)
> c) Create a "PDF basket" UI which makes it possible to compile a PDF
> from multiple pages easily (and rearrange the pages in a hierarchy).
> The resulting structures could potentially also be stored as wikitext,
> using a new <structure> extension tag, so that they can be used both
> by individuals compiling PDFs for personal use, and by groups
> collaborating on complex documents.
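For item a), here is a minimal sketch of the glue such an extension might need. It assumes the page body is fetched through MediaWiki's `action=render` view (article HTML without the skin) and converted with the HTMLDOC command-line tool. The wiki URL is a placeholder. `render_url` and `htmldoc_command` are hypothetical helpers that only build the URL and the argument list; they do not run anything.

```python
from urllib.parse import urlencode

# Placeholder: replace with the real index.php path of the wiki.
WIKI_INDEX = "https://wiki.example.org/w/index.php"


def render_url(title: str) -> str:
    """URL returning only the rendered article HTML (action=render)."""
    return WIKI_INDEX + "?" + urlencode({"title": title, "action": "render"})


def htmldoc_command(html_file: str, pdf_file: str) -> list[str]:
    """Argument list converting one HTML file into one PDF with HTMLDOC.

    --webpage treats the input as a single web page (no title page or
    table of contents); -f names the output file.
    """
    return ["htmldoc", "--webpage", "-f", pdf_file, html_file]
```

After saving the HTML from `render_url("Main_Page")` to `Main_Page.html`, the conversion would be `subprocess.run(htmldoc_command("Main_Page.html", "Main_Page.pdf"), check=True)`.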
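For item b), here is a rough sketch of both kinds of filter. The `Revision` record and the names `last_by_whitelist`, `as_of` and `full_resolution` are hypothetical. "Approximating currentdate-Z" is read here as "the newest revision saved at or before the current date minus Z days", which is an assumption. The thumbnail regex assumes MediaWiki's default hashed upload layout, where `/images/thumb/a/ab/Foo.png/180px-Foo.png` is a thumbnail of `/images/a/ab/Foo.png`.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Revision:
    # Hypothetical record of one page revision.
    rev_id: int
    user: str
    timestamp: datetime


def last_by_whitelist(revisions, whitelist):
    """Newest revision whose author is on whitelist Y, or None if there is none."""
    candidates = [r for r in revisions if r.user in whitelist]
    return max(candidates, key=lambda r: r.timestamp, default=None)


def as_of(revisions, now, days):
    """Newest revision saved at or before now minus Z days, or None."""
    cutoff = now - timedelta(days=days)
    candidates = [r for r in revisions if r.timestamp <= cutoff]
    return max(candidates, key=lambda r: r.timestamp, default=None)


# Assumes the default hashed layout: /thumb/a/ab/File.png/180px-File.png
THUMB = re.compile(r'/thumb/([0-9a-f]/[0-9a-f]{2}/[^/"]+)/\d+px-[^/"]+')


def full_resolution(html):
    """Point thumbnail image URLs at the original, full-size upload."""
    return THUMB.sub(r"/\1", html)
```

Other regular-expression content filters would follow the same `re.sub` pattern on the rendered HTML before it is passed to the PDF library.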
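For item c), the proposal does not define what goes inside a `<structure>` tag. This sketch assumes, purely for illustration, a wikitext-style nested list of page titles (`*` for a chapter, `**` for a section). The hypothetical `outline` function turns that list into the ordered, numbered page sequence a PDF basket would render.

```python
import re

# One list item: leading '*' characters give the depth, the rest is a page title.
LINE = re.compile(r"^(\*+)\s*(.+?)\s*$")


def outline(structure_text):
    """Return (section number, page title) pairs in document order."""
    counters = []  # counters[i] is the running number at depth i + 1
    result = []
    for line in structure_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip the tag lines, blank lines and non-list text
        depth = len(m.group(1))
        if depth > len(counters) + 1:
            raise ValueError(f"level skipped at: {line!r}")
        del counters[depth:]  # going back up resets the deeper levels
        if depth > len(counters):
            counters.append(0)  # entering a new, deeper level
        counters[depth - 1] += 1
        result.append((".".join(str(n) for n in counters), m.group(2)))
    return result
```

For example, `outline("<structure>\n* Introduction\n** History\n* Usage\n</structure>")` gives `[("1", "Introduction"), ("1.1", "History"), ("2", "Usage")]`. Because the structure is stored as plain wikitext, it can be edited and versioned like any other page.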
> Possibly some budget could also be allocated for improving the
> external PDF library used, especially if we can allocate additional
> funds for this project.
> I'd like to request comments on this approach, specifically:
> - Besides HTMLDOC, do you know a good (X)HTML-to-PDF library which
> could be used for this purpose?
Transform the (X)HTML with XSL into XSL-FO, then use Apache FOP to render the PDF.
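Here is a minimal sketch of what the XSL step in that pipeline has to produce. Python's standard library has no XSLT processor, so this stand-in uses ElementTree to map a few XHTML elements (`h1`, `h2`, `p`) to XSL-FO. A real setup would express the same mapping as XSL templates. The page size, the font sizes and the `xhtml_to_fo` name are assumptions.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO)

# Block-level XHTML elements handled by this sketch and the font size each gets.
SIZES = {"h1": "18pt", "h2": "14pt", "p": "11pt"}


def fo(name):
    """Qualified tag name in the XSL-FO namespace."""
    return f"{{{FO}}}{name}"


def xhtml_to_fo(xhtml):
    """Map h1/h2/p elements of an XHTML document to fo:block elements."""
    html = ET.fromstring(xhtml)
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4", "page-width": "210mm",
        "page-height": "297mm", "margin": "20mm"})
    ET.SubElement(page, fo("region-body"))  # required child of the page master
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    for el in html.iter():
        local = el.tag.rsplit("}", 1)[-1]  # strip the XHTML namespace, if any
        if local in SIZES:
            block = ET.SubElement(flow, fo("block"), {"font-size": SIZES[local]})
            # Inline markup (<b>, <a>, ...) is flattened to plain text here.
            block.text = "".join(el.itertext())
    if len(flow) == 0:
        ET.SubElement(flow, fo("block"))  # fo:flow must not be empty
    return root
```

The result can be saved with `ET.ElementTree(xhtml_to_fo(doc)).write("page.fo", encoding="utf-8", xml_declaration=True)` and rendered with `fop -fo page.fo -pdf page.pdf`.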
> - Within this budget, do you believe an alternative approach which
> utilizes an intermediate format is viable (e.g.
> wiki-to-DocBook-to-PDF), given the complexity of the MediaWiki syntax,
> its various extensions, and the need to keep up with parser changes?
The standard DocBook tools follow the same process, except that they start
from DocBook, which is transformed with XSL into XSL-FO.
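Here is the DocBook route as a sketch. The standard DocBook XSL stylesheets produce XSL-FO (here through `xsltproc`), and FOP then renders the PDF. The stylesheet install path is an assumption that varies between systems. The hypothetical `docbook_pdf_commands` function only assembles the two commands.

```python
from pathlib import PurePosixPath

# Assumed location of the DocBook XSL stylesheets; it differs between systems.
DOCBOOK_XSL = PurePosixPath("/usr/share/xml/docbook/stylesheet/docbook-xsl")


def docbook_pdf_commands(docbook_file, pdf_file):
    """Commands for DocBook -> XSL-FO (XSLT) -> PDF (FOP)."""
    fo_file = str(PurePosixPath(docbook_file).with_suffix(".fo"))
    stylesheet = str(DOCBOOK_XSL / "fo" / "docbook.xsl")
    return [
        ["xsltproc", "-o", fo_file, stylesheet, docbook_file],  # DocBook -> XSL-FO
        ["fop", "-fo", fo_file, "-pdf", pdf_file],              # XSL-FO -> PDF
    ]
```

Each command can be run in turn with `subprocess.run(cmd, check=True)`. FOP can also apply the stylesheet itself in one step (`fop -xml book.xml -xsl <docbook-xsl>/fo/docbook.xsl -pdf book.pdf`). For MediaWiki, the hard part stays the wiki-to-DocBook conversion in front of this pipeline.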