Hello all. First post to the mailing list(!)
I found the post about the toolserver PDF generator in the archives. Instead of leaving a lengthy post on-wiki, I wanted to formulate something a bit more formal. Matt touched on this, but I don't think anything came of it at the time.
The PDF generator is a tool by Magnus which automatically generates PDFs - that much is self-evident. But for Wikibooks, I think it has much more potential in the realm of quality control than that (seemingly-)simple function suggests. Since it creates PDFs (mostly) on a per-page (= module) basis, it can be used to keep a stable version up-to-date very easily. As you may have noticed, I uploaded a PDF version of my pet project [[First Aid]], which is massive and will be difficult to keep up-to-date. Although the method I used (PDFLaTeX) allows me much more flexibility in terms of output and customization, manual creation isn't a practical way to maintain a stable but up-to-date version.
Enter stage right the PDF generator. As (modules|chapters|books) get to a point where they're in a state which is ready to be preserved in PDF, just trundle off, enter the appropriate URL, and out pops wiki.pdf. It's not perfect; there will be errors: it messes up diacritical marks (as mentioned below), and it can't handle some/most/all? templates (take a look at http://tools.wikimedia.de/~magnus/pdf.php?title=First_Aid/Suspected_Spinal_Injury&project=wikibooks) though, surprisingly, it handles images quite well (but not their captions): http://tools.wikimedia.de/~magnus/pdf.php?title=First_Aid/Diabetes&proje... ikibooks
All that said, the output ''is'' readable, and it's quick and easy to do. The keyword for the quality control aspect is ''quick''. This is something that's not hard to do, so just about any user can decide that content is ready to have a stable version. But it's just as easy to create a new stable version if there are changes you want to have reflected in the PDF.
I'd like to see this tool improved, of course, but I think it's at a state where we can advertise it to users who want to create a (semi-)permanent PDF version, and also for more-frequently-replaced stable versions.
The question becomes, then, "Do we want stable versions of one kind or another?" Further, if we do, then are PDF versions the way we want to do it? In my view, at least, stable versions are a must (and for Wikibooks, essential if only for the ability to print them easily) and until such time as a stable version feature is integrated with the MediaWiki software, this scheme is a surprisingly-close-to-ideal method to achieve that goal.
-Mike
PS: Sorry for the length, both of the post and of the title. I just finished a paper, and we use verbose titles in neuroscience especially; I don't have an excuse for the length of the post ;)
==Previous related posts==
From: Magnus Manske <magnusmanske at googlemail.com>
Date: 13 Oct 2007 09:16
Subject: [Wikipedia-l] PDF generation on the toolserver
To: wikipedia-l at lists.wikimedia.org
On a note by [[User:Korrigan]], I have adapted (rather, rewritten) the PDF Export extension for the toolserver. You can now get a single or multiple Wiki(m|p)edia pages as a PDF, by entering/linking to a URL. Like the extension, I am using HTMLDOC, so the output is as good (or as bad) as that package. Don't blame me.
I shall demonstrate using our new meme overlord, [[Horse-ripping]], and the related article [[Zoosadism]] from en.wikipedia: http://tools.wikimedia.de/~magnus/pdf.php?title=Horse-ripping%7CZoosadism
Additional parameters (add them to the URL):
&language=XX (XX being the language code; en is default)
&project=wikibooks (default: wikipedia)
&nogfdl (3 pages of GFDL are appended by default; add this parameter to prevent that)
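As a rough illustration of the parameters listed above, here is a minimal Python sketch that assembles such a URL. The base address and the parameter names (title, language, project, nogfdl) come from Magnus's post; the function name `build_pdf_url`, its arguments, and the choice to turn spaces into underscores are assumptions made for this example, not part of the tool.

```python
from urllib.parse import quote

# Base address as given in the post above.
BASE_URL = "http://tools.wikimedia.de/~magnus/pdf.php"


def build_pdf_url(titles, language="en", project="wikipedia", include_gfdl=True):
    """Assemble a request URL for one or more page titles (hypothetical helper)."""
    # Multiple pages are separated by "|", which is percent-encoded as %7C.
    # Replacing spaces with underscores is an assumption based on the URLs
    # shown in this thread.
    joined = "|".join(title.replace(" ", "_") for title in titles)
    parts = ["title=" + quote(joined, safe="/")]
    # Only add parameters that differ from the documented defaults.
    if language != "en":
        parts.append("language=" + quote(language))
    if project != "wikipedia":
        parts.append("project=" + quote(project))
    if not include_gfdl:
        parts.append("nogfdl")
    return BASE_URL + "?" + "&".join(parts)


if __name__ == "__main__":
    print(build_pdf_url(["Horse-ripping", "Zoosadism"]))
    print(build_pdf_url(["First Aid/Diabetes"], project="wikibooks", include_gfdl=False))
```

Leaving out parameters that match the defaults keeps the generated links as short as the ones quoted in the thread.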
I haven't figured out how to make HTMLDOC generate a TOC. Maybe tomorrow.
Cheers, Magnus
Not so good with the UTF-8, as demonstrated with http://tools.wikimedia.de/~magnus/pdf.php?title=Alain_Lefèvre&language=fr.
Well, it found the article, but the title page was a bit mangled.
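Since the article was found, the request itself evidently got through, and the trouble seems to lie in how the title is rendered. The thread does not say what goes wrong inside the tool, so the following Python sketch is only a guess at a common cause: it shows how the title is percent-encoded as UTF-8 in a URL, and what the same bytes look like if something downstream reads them as Latin-1. Both function names are made up for this illustration.

```python
from urllib.parse import quote


def encode_title(title):
    """Percent-encode a page title as UTF-8 for use in a URL."""
    return quote(title, safe="/")


def misread_utf8_as_latin1(text):
    """Show what UTF-8 bytes look like when wrongly decoded as Latin-1.

    This is an assumed failure mode for illustration, not a confirmed
    diagnosis of the toolserver tool.
    """
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    print(encode_title("Alain_Lefèvre"))
    print(misread_utf8_as_latin1("Lefèvre"))
```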
Hey all,
Sorry, but I don't have much in the way of positive encouragement today. Just a couple of wait-is-this-really-what-we-should-be-doing-with-our-time sort of comments.
On 11/24/07, mike.lifeguard mike.lifeguard@gmail.com wrote:
The question becomes, then, "Do we want stable versions of one kind or another?"
Do we need stable versions? I don't really see the need. Perhaps someone could give examples of stable books that have deteriorated due to the ease of editing. I understand the motivation of wanting to have some type of "stable" stamp to give books a stamp of validity, the absence of which a wiki might imply.
I have nothing seriously against stable versions, though I think the stable/unstable branching is unhelpful (what was that policy that had the unstable version ... <dripping sarcasm>boy was that a good idea</dripping sarcasm>). I think like-minded people wouldn't really jump out and oppose this feature as it seems it's purely optional (I'll oppose it in my corner of Wikibooks) but I'd like to ask proponents of the feature to consider the possibility that, though noble and interesting, this little project may not be the most useful or needed one and that there may be other areas that could use their time and skill better.
That said, it's a volunteer project and in the spirit of wiki, I'm glad for anyone's bold contributions to it.
Further, if we do, then are PDF versions the way we want to do it?
They wouldn't be my first choice. First of all, as a transition technology until the server software solution arrives, I'm still doubtful that there really is such a pressing need. The dead-easy method is, of course, just to copy a version to /Stable_001/ (this would of course be as easily editable as a PDF version would be reuploadable).
The first time I saw the idea of PDF versions, they were posed in the context of creating printable versions. Is this possibly a solution in search of a problem?
If we want printable versions, I suggest we go for CSS print versions. They're simple, flexible and made for electronic media. They have low overhead in typesetting which is valuable since I doubt many Wikibooks will make it to print.
Not that they /shouldn't/ get distributed, but rather that most won't get distributed on paper. Yes, I prefer reading off paper but that comes at a price and the loss of interlinking. With ever more mobile devices and technologies like the (currently awfully expensive) Sony Reader the time is coming for paper to exit ... stage left.
I have Adobe Acrobat Pro, so when I've done PDFs in the past I use that and the results are typically very good and very accurate. I know that most other people don't have access to it (I only have a license through school). The PDF generator on the toolserver is a great idea, and though there are some problems, I think it's a great start. A few points/retorts:
Do we need stable versions? I don't really see the need. Perhaps someone could give examples of stable books that have deteriorated due to the ease of editing.
It has nothing to do with deterioration over time; it has more to do with the dependability that teachers and students rely on over a semester. A teacher needs to be able to say "The homework is on page 95", and have all the faith in the world that page 95 today is the same as page 95 tomorrow. With that said, "stable versions" is basically a misnomer, because they aren't stable in a general sense, only with respect to a particular audience. That is, we could get a request that says "I'm teaching X class, and I want Y book to be stable for the duration of the semester." We could then create a stabilized version of that book for use in the class, while continuing to develop the book on the wiki in the background.
In short, the editing never stops, and students/teachers have the stability that they need in order to use our books in their classrooms.
Further, if we do, then are PDF versions the way we want to do it?
I think so, at least in part. PDF versions have the benefit that they can be set up the way we want them (with the GFDL text automatically included where it needs to be), etc. Also, PDFs can be downloaded by people who don't have guaranteed internet access, they can be distributed on CD, etc. Generating a PDF can be a pain in the ass too, but I think that having to make copies of pages, protect them, and tag pages with a notice that "there is a protected version at..." can be just as big a pain in the ass. If we have a "good" PDF generator tool on the toolserver that we can use automagically, then it might actually be a better option, at least in terms of effort involved.
I like the method we have now, where books have printable versions (on wiki) and PDF versions. If we also had stable versions (especially if we had a tool for automagically creating such versions without requiring lots of copying, page protecting, etc), that would just be a bonus. In short: we should have many methods.
--Andrew Whitworth
This is a general comment about "automated" PDF files that are created with something like the toolserver. While I am not necessarily against somebody spending the effort trying to make an automated process directly from the MediaWiki database of Wikibooks, I would have to consider any such effort created in this manner to be nothing more than a rough draft, and a considerable amount of additional effort would have to be done in order to create something that is book-like.
More explicit and to the point, anything that is automated from a bunch of web pages is simply going to be ugly. Now I admit this is subjective, but web pages simply aren't books, as hard as you try. Wikibooks via a simple web interface is an important first step, and it does get us to gather the raw content which can be transformed into a book, but it is only about half-way to the goal of publishing a textbook even when you have some beautiful web pages and the written content is letter-perfect.
So far, there isn't anything on Wikibooks resembling what professional editors do in commercial publishing circles. About the best example I've seen anywhere on a large-scale volunteer project is what the Distributed Proofreaders do with the Project Gutenberg draft texts, where there is a completely separate team of volunteers who review the formatting of the document after the actual grammar of the text has been considered in the final form.
I'd also argue that a book, a really good book, is in many ways a work of art unto itself. Particularly a good textbook. If we want to make textbooks that really revolutionize the publishing industry, and have them used in actual classrooms, we need to treat them as art forms and not as something on which massive shortcuts have been taken.
Is it possible to also include the unique book-only markup tags within the Wiki as well? Yes, and I'll admit that. But it isn't going to be easy to get going either. I expect that a development team, working very closely with Wikibooks participants, would take years if not a full decade to perfect a very good combination of markup tags for book publishing that would generate quality PDF files. On this I am staking my experience and knowledge of software development cycles I've gained over several decades of writing computer software, and years of being involved with Wikibooks.
By far and away the current best "free" method of generating a PDF file is to import the text into Open Office, formatting the content to be much more book-like, and exporting the PDF file. Acrobat Pro certainly does better (Adobe wrote the spec, so they understand it better), but as you said, not everybody on Wikibooks can afford that software.
-- Robert Horning
By far and away the current best "free" method of generating a PDF file is to import the text into Open Office, formatting the content to be much more book-like, and exporting the PDF file. Acrobat Pro certainly does better (Adobe wrote the spec, so they understand it better), but as you said, not everybody on Wikibooks can afford that software.
I think htmldoc and scribus are two other interesting ways to export to PDF. htmldoc is great for converting HTML -> PDF, either on the command line or with a GUI. Scribus is more for taking content and formatting it for publication à la Desktop Publishing. Scribus does a great job of exporting to PDF.
htmldoc and scribus are both GPL
adam
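For anyone who wants to try the command-line route mentioned above, here is a minimal Python sketch that builds and runs an htmldoc call. To the best of my knowledge `--webpage` (treat the input as a plain web page rather than a structured book) and `-f` (output file) are standard htmldoc options, but check them against your installed version. The function names and the file names are placeholders invented for this example.

```python
import shutil
import subprocess


def htmldoc_command(html_path, pdf_path):
    """Build the argument list for a simple HTML -> PDF conversion."""
    # --webpage: treat the input as an unstructured web page.
    # -f: write the result to the given file.
    return ["htmldoc", "--webpage", "-f", pdf_path, html_path]


def convert(html_path, pdf_path):
    """Run htmldoc if it is installed; return True on success."""
    if shutil.which("htmldoc") is None:
        # htmldoc is not on the PATH, so there is nothing to run.
        return False
    result = subprocess.run(htmldoc_command(html_path, pdf_path))
    return result.returncode == 0


if __name__ == "__main__":
    # "module.html" and "module.pdf" are hypothetical file names.
    print(htmldoc_command("module.html", "module.pdf"))
```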
Textbook-l mailing list Textbook-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/textbook-l
As I said, doing something like this is an excellent first draft if you are trying to design a book. But if you think this is going to be something acceptable in a classroom compared to commercially published textbooks, I think the results are going to be absolutely horrid from an aesthetic viewpoint. It simply can't be automated without either an incredible amount of additional effort and significant advances in artificial intelligence, or involving some grey matter from volunteers to make it look nice. Perhaps some other software applications can be used, and if you like them, have fun! I'm not suggesting that Open Office is the only one to use, but it is something that many individuals may already have or be using, is available under multiple operating systems, has a very simple installation path, and is also open source. That is a tough combination to try and beat.
BTW, thanks for suggesting Scribus. I'll have to download it and try it out.
--Robert Horning
@Adam: HTMLDOC is the basis for the toolserver PDF generator, I think. Thanks for the recommendation on Scribus as well; I'll be looking into that.
@Everyone: I do recognize that this has (big) limitations; that's why I think its place (for now) is to a) allow easy "stable versions" (Andrew got it right; also, keep in mind that these are just as easy to update) until we get some consistent system we like in the software, but b) not replace hand-done work for creating a publishable PDF book. These *are* rough drafts, and aren't meant to be sent to the printer. Instead this tool is a middle ground between having only a (potentially) unstable version on-wiki and taking lots of time and effort to make a publishable PDF book. Reducing the effort threshold for PDF creation will probably make our books more usable. -Mike
I make this distinction about the rough drafts mainly because whenever I see a suggestion to automate PDF files, there is a presumption that somehow you can take an edit you just put in 30 seconds ago and somehow have it show up "magically" in a textbook that can be printed by simply pressing a "print book" button somewhere. Or to completely ignore the problems of trying to format HTML into something that can be rendered into a PDF version of the book.
As far as quality control is concerned, I'd love to see something akin to a "published page" tab and "revised" or "draft" page that could be edited, with some sort of admin-type tool that could be used to select a specific version of the page edits to be noted as the "published page" that could be used by casual readers to the wiki. The decision for what page would actually be selected is something to be decided by the participants of that page. I don't believe this would take too much additional programming in terms of adding the extension to MediaWiki, and it would only require the creation of one additional and smallish table to keep track of: what the current "published version" of the page is at the moment. It might even help with caching issues as the published version would be the one to cache, not necessarily the one with the latest edit. Just a thought here, and something on my personal wish list of future Wiki tools I'd love to see.
-- Robert Horning
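To make the "one additional and smallish table" idea above a little more concrete, here is a minimal Python sketch using an in-memory SQLite database. It is not MediaWiki code and does not reflect any real MediaWiki schema; the table name `published_revision`, its columns, and the three functions are all invented for illustration.

```python
import sqlite3


def open_store():
    """Create an in-memory database with one small tracking table."""
    conn = sqlite3.connect(":memory:")
    # One row per page: which revision is currently the "published" one.
    conn.execute(
        "CREATE TABLE published_revision ("
        "page_title TEXT PRIMARY KEY, "
        "rev_id INTEGER NOT NULL)"
    )
    return conn


def mark_published(conn, page_title, rev_id):
    """Record (or replace) the published revision chosen for a page."""
    conn.execute(
        "INSERT OR REPLACE INTO published_revision (page_title, rev_id) "
        "VALUES (?, ?)",
        (page_title, rev_id),
    )
    conn.commit()


def revision_to_serve(conn, page_title, latest_rev_id):
    """Return the published revision, or the latest one if none is marked."""
    row = conn.execute(
        "SELECT rev_id FROM published_revision WHERE page_title = ?",
        (page_title,),
    ).fetchone()
    return row[0] if row else latest_rev_id


if __name__ == "__main__":
    conn = open_store()
    print(revision_to_serve(conn, "First Aid", 120))  # nothing marked yet
    mark_published(conn, "First Aid", 100)
    print(revision_to_serve(conn, "First Aid", 120))  # published revision
```

Falling back to the latest revision when nothing is marked keeps the feature optional: pages nobody has chosen to stabilize simply behave as they do now.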
Or to completely ignore the problems of trying to format HTML into something that can be rendered into a PDF version of the book.
I think that a lot of the problems involved in this process stem from book authors who are trying to be overly creative with their formatting, and are not conscious of the fact that things don't render the same on different display media. I think there are plenty of ways to make books that will render just as well on screen, in print, or in a PDF file.
As far as quality control is concerned, I'd love to see something akin to a "published page" tab and "revised" or "draft" page that could be edited, with some sort of admin-type tool that could be used to select a specific version of the page edits to be noted as the "published page" that could be used by casual readers to the wiki.
I love this idea! Tabs at the top of the page for "Printable Version", "PDF version", "Stable Version" (especially once the FlaggedRevs extension is activated) would be much better than the myriad of templates we try to use for the purpose now. I believe that there are many user-interface changes and enhancements that Wikibooks needs, so many that perhaps we would do well to build our own skin from the ground up. I perhaps would like to put this off until after we have a new logo, but some things like adding new tabs could be done immediately.
The decision for what page would actually be selected is something to be decided by the participants of that page.
And if no particular version needs to be stabilized, we likely don't need to have one at all. I really envision only stabilizing a particular version for the use of particular classes, and allowing things to be dynamic otherwise.
--Andrew Whitworth
And if no particular version needs to be stabilized, we likely don't need to have one at all. I really envision only stabilizing a particular version for the use of particular classes, and allowing things to be dynamic otherwise.
Absolutely! This is the '''only''' reason I bothered to make a PDF of [[First Aid]] - because I was going to be teaching a class, and I wanted to see how the book fared. In the end the class was cancelled, but the point remains.
I don't think people should be overly concerned with "book authors who are trying to be overly creative with their formatting, and are not conscious of the fact that things don't render the same on different display media" though. If formatting doesn't render well, then that book will just have bad PDFs. (For that reason we may want to say "this page is hopeless with respect to the automatic PDFs, so don't give it the PDF tab" on a per-page basis. Not sure how that might work though.)
I love this idea! Tabs at the top of the page for "Printable Version", "PDF version", "Stable Version" (especially once the FlaggedRevs extension is activated) would be much better than the myriad of templates we try to use for the purpose now.
This is actually something I hadn't thought of before. I don't know how server-intensive the PDF tool is (and that should be checked by someone to whom the answers will make sense) but if it'd work, a PDF tab at the top could be awesome. For pages that have a cached PDF version (i.e. from the toolserver), that would get served. For pages that don't, one could perhaps get automatically made and cached (i.e. just link to the toolserver). Then we need to have some rule for when the PDFs expire. This would become much easier once FlaggedRevs is in place - whenever a page is reviewed, a PDF is automatically generated and cached (and has the same lifespan as that top reviewed revision). Once the page gets reviewed again, the PDF is replaced. Being that I'm not a programmer, this may or may not work exactly as described.
-Mike
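The caching rule described above (serve the cached PDF until the page is reviewed again, then replace it) could look roughly like the following Python sketch. Nothing here is tied to FlaggedRevs or the toolserver: the class name, the `generate_pdf` callable and the revision numbers are all assumptions for illustration, and a real setup would store files rather than keep bytes in memory.

```python
class ReviewedPdfCache:
    """Keep one PDF per page, tied to the revision that was last reviewed."""

    def __init__(self, generate_pdf):
        # generate_pdf(page_title, rev_id) -> bytes is supplied by the caller;
        # it stands in for whatever actually produces the PDF.
        self._generate_pdf = generate_pdf
        self._entries = {}  # page_title -> (rev_id, pdf_bytes)

    def get(self, page_title, reviewed_rev_id):
        """Return the PDF for the reviewed revision, regenerating if stale."""
        cached = self._entries.get(page_title)
        if cached is not None and cached[0] == reviewed_rev_id:
            # Same reviewed revision as before: serve the cached copy.
            return cached[1]
        # No copy yet, or the page has been reviewed again: replace it.
        pdf = self._generate_pdf(page_title, reviewed_rev_id)
        self._entries[page_title] = (reviewed_rev_id, pdf)
        return pdf


if __name__ == "__main__":
    calls = []

    def fake_generator(title, rev_id):
        # Stand-in generator that records how often it is invoked.
        calls.append((title, rev_id))
        return ("%s@%d" % (title, rev_id)).encode("utf-8")

    cache = ReviewedPdfCache(fake_generator)
    cache.get("First Aid", 100)
    cache.get("First Aid", 100)  # served from cache, generator not called
    cache.get("First Aid", 101)  # page reviewed again, PDF regenerated
    print(len(calls))
```

Keying the cache on the reviewed revision means no separate expiry rule is needed: a PDF lives exactly as long as the review it was generated from.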
I don't think that we absolutely need to use the PDF generator for every single page view of the PDF file, and I can't imagine that we need to take great pains to keep every single PDF for every single book completely up-to-date. It may be good enough to simply create a new PDF when one is specifically needed, and then we don't need to worry about the drain on the toolserver.
--Andrew Whitworth
As far as quality control is concerned, I'd love to see something akin to a "published page" tab and "revised" or "draft" page that could be edited, with some sort of admin-type tool that could be used to select a specific version of the page edits to be noted as the "published page" that could be used by casual readers to the wiki. The decision for what page would actually be selected is something to be decided by the participants of that page. I don't believe this would take too much additional programming in terms of adding the extension to MediaWiki, and it would only require the creation of one additional and smallish table to keep track of: what the current "published version" of the page is at the moment. It might even help with caching issues as the published version would be the one to cache, not necessarily the one with the latest edit. Just a thought here, and something on my personal wish list of future Wiki tools I'd love to see.
just fyi, you may want to look at FLOSS Manuals (http://www.flossmanuals.net)
we have exactly this structure so maybe browse around it and see if this is the kind of idea that you like.
there are published pages: http://www.flossmanuals.net/read
and the 'backend' where manuals are written: http://www.flossmanuals.net/write
the 'published' pages are static, and the backend holds pages that can be dynamically edited.
the PDFs that are linked from the 'published' pages are also static
adam