Hi *,
New to the list, so please tell me to RTFM if I've missed anything or this is the incorrect place for my question.
We've been using and hacking 2 different extensions (PdfBook and PdfExport) to render our wiki pages to PDF. Both extensions have been hacked enough to work quite well, but now we have a requirement to include CSS so that the rendered PDF looks EXACTLY like the wiki. Since htmldoc doesn't support CSS as of yet, I've been looking at some different approaches to rendering wikis to PDF.
The first option I came across was PrinceXML. It does quite a nice job, but at a license cost of roughly $6k USD. Unfortunately that's a deal breaker due to the nature of the projects (some of which are funded and others are completely open and not funded except for my time). I then ran across wkhtmltopdf, which utilizes QtWebKit. After a few scripts to test it externally, this seems to be a viable option that I would like to put some cycles towards.
I've been using MediaWiki for quite some time and know PHP, so writing an extension should not be too terribly hard. I've been reading through the manual on extension development (hooking, parsers, etc.) and kinda get the idea of how to do it. I'm sure that after a little trial and error and looking at others' source, I should be able to get what I need done and give the extension back if anyone would be interested.
Has anyone gone down this road before with trying to render wikis with CSS? (We currently have custom Common.css and Print.css files on many wiki sites.) Obviously I don't want to have to re-invent the wheel, but I have yet to see any extensions that support CSS.
Any feedback would be greatly appreciated!!
TIA, max
Also sprach N. Max Pierson:
Has anyone gone down this road before with trying to render wikis with CSS? (We currently have custom Common.css and Print.css files on many wiki sites.) Obviously I don't want to have to re-invent the wheel, but I have yet to see any extensions that support CSS.
There's a set of case studies here:
http://www.princexml.com/samples/#wiki
You can use Prince for free for non-commercial purposes.
Wikipedia's HTML markup is suboptimal for printing, often due to the use of the 'style' attribute, which hardcodes presentation for screens.
http://www.princexml.com/bb/viewtopic.php?f=2&t=3823
In Norway, we have started a project to exterminate the 'style' attribute. Here's a description (in Norwegian):
http://no.wikipedia.org/wiki/Wikipedia:Underprosjekter/Utryddelse_av_%C2%ABs...
Good progress has been made in the templates. Most of the remaining issues are in the Mediawiki software itself. For example, in markup like this:
<div class="thumbinner" style="width:222px;"><a href="/wiki/Fil:FeleHel_(2).jpg" class="image"><img alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bf/FeleHel_%282%29.jpg/220px-FeleHel_%282%29.jpg" width="220" height="431" class="thumbimage" /></a>
Perhaps one could create classes for the most common sizes? (220px seems quite common)
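A rough sketch of that idea, in Python for brevity: it rewrites an inline pixel width into a class. The `w-<N>` class name is hypothetical (nothing in MediaWiki defines it), and the pattern assumes the attribute order and quoting of the thumbnail markup quoted above.

```python
import re

# Matches class="..." immediately followed by style="width:NNNpx;".
# Assumption: attributes appear in this order, as in the thumbnail markup above.
THUMB_RE = re.compile(r'class="([^"]*)"\s+style="width:\s*(\d+)px;?"')


def width_style_to_class(html):
    """Replace an inline pixel width with a hypothetical 'w-<N>' class."""
    return THUMB_RE.sub(
        lambda m: 'class="{} w-{}"'.format(m.group(1), m.group(2)), html
    )


snippet = '<div class="thumbinner" style="width:222px;">'
print(width_style_to_class(snippet))
```

The stylesheet would then need a matching rule such as `.w-222 { width: 222px; }`, which a print stylesheet could override.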
Then there's these:
<div id="mw-js-message" style="display:none;"></div>
<div id="p-logo"><a style="background-image: url(http://upload.wikimedia.org/wikipedia/no/b/bc/Wiki.png);" href="/wiki/Por..." title="Hovedside"></a></div>
<div style="clear:both"></div>
Efforts to remove these -- by turning them into classes -- will be much appreciated.
Cheers,
-h&kon
http://people.opera.com/howcome http://www.princexml.com/howcome
On Fri, Aug 5, 2011 at 9:56 PM, N. Max Pierson <nmaxpierson@gmail.com> wrote:
New to the list, so please tell me to RTFM if I've missed anything or this is the incorrect place for my question.
We've been using and hacking 2 different extensions (PdfBook and PdfExport) to render our wiki pages to PDF. Both extensions have been hacked enough to work quite well, but now we have a requirement to include CSS so that the rendered PDF looks EXACTLY like the wiki.
A couple years ago I did some brief experiments using this tool:
http://code.google.com/p/wkhtmltopdf/
It simply uses the WebKit HTML renderer implementation and PDF output implementation available in the common Qt framework library to render any given web page to PDF, just as if you had printed / saved to PDF from a browser.
If you only need to render out individual pages (as opposed to bundling collections of pages for book-style publishing with additional credit & license information), this sort of thing is probably a far better option than anything that tries to work with the low-level wiki markup (necessitating reimplementation of all of MediaWiki's parser, any plugins used, and of course... an HTML renderer.)
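A minimal sketch of driving the tool from a script, in Python for brevity. It assumes the `wkhtmltopdf` binary is on the PATH; the wiki URL and output filename are placeholders, and the helper names are made up for illustration. The `--print-media-type` option asks the renderer to apply the print stylesheet rather than the screen one.

```python
import subprocess


def build_command(url, output_pdf, use_print_css=True):
    """Assemble the wkhtmltopdf command line: options, then input, then output."""
    cmd = ["wkhtmltopdf"]
    if use_print_css:
        # Render with the print media type instead of screen.
        cmd.append("--print-media-type")
    cmd += [url, output_pdf]
    return cmd


def render_page(url, output_pdf):
    """Run wkhtmltopdf; raises CalledProcessError if the tool exits non-zero."""
    subprocess.run(build_command(url, output_pdf), check=True)


if __name__ == "__main__":
    # Placeholder URL and filename; only prints the command, does not run it.
    print(build_command("http://wiki.example.org/wiki/Main_Page", "Main_Page.pdf"))
```

Because the page is fetched over HTTP like any browser request, the wiki's own CSS is applied without touching the wikitext at all.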
-- brion
Thanks for the replies.
Håkon,
First, I would like to thank you for the effort you and your colleagues have put into PrinceXML (and Opera as well). I read through the Norwegian wiki page and I must say that it would be a great approach to fixing the issue. It would be quite an undertaking to move all of the "style" attributes into classes, however, and a complete wiki fix would take far more time than I have. Since the most important wiki I maintain very rarely has any back-end changes made to it, it would not be that difficult to simply rip out the markup before it is sent to the browser, clean it up with some regular expressions, and then send it to the PDF renderer.
For my current projects, I could use Prince on 2 of the non-commercial wikis I maintain; however, my other wiki site is commercial and there simply isn't enough in my budget for the license. It is a wonderful binary and did the job exactly as expected during my testing, but I just do not have any say-so over the money for this project. Now that I see that this seems to be a pretty common problem, I may be able to start a new side project that could help remedy it, but as stated before, I have several projects in line before I could even begin thinking about helping out with the changes to MediaWiki itself. I see that Jon Harald Søby is the lead on this project and that it is still in the draft process, but I am very interested in the idea and will keep up with its progress even though I cannot contribute at this moment in time.
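The regular-expression cleanup could look roughly like the following, sketched in Python rather than PHP. The function name is made up for illustration, and the pattern is deliberately naive: it would also match text that merely looks like a `style` attribute, so it is a starting point rather than a robust HTML filter.

```python
import re

# Matches a whitespace-led style attribute with a single- or double-quoted value.
STYLE_ATTR_RE = re.compile(
    r'''\s+style\s*=\s*(?:"[^"]*"|'[^']*')''', re.IGNORECASE
)


def strip_inline_styles(html):
    """Remove inline style attributes so only stylesheet rules apply."""
    return STYLE_ATTR_RE.sub("", html)


print(strip_inline_styles('<div class="thumbinner" style="width:222px;">'))
print(strip_inline_styles('<div style="clear:both"></div>'))
```

Note that stripping `style="display:none;"` this way makes the element visible again unless a stylesheet rule hides it.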
Brion,
This is exactly what I'm looking to do. At the moment, I just need to render one wiki article at a time, and wkhtmltopdf worked perfectly when I tested it with some simple scripting. I need to learn a little more about the global object variables, but I believe I've read through the development manuals enough and have ripped apart a few similar extensions to give me a great place to start. If this turns out to be something useful across all three of my wiki sites, I will probably register it on the MediaWiki site so that others may benefit from it, since there seems to be more than just myself in need of rendering CSS to PDF.
Once again, thanks for the replies, all.
Regards, max
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l