PDF/Collection feature live on de.wikibooks


Erik Moeller

10 Oct 2008

3:46 a.m.

The collection tool, PDF export, and print on demand features are now live on the German Wikibooks edition. PediaPress (who have developed these open source features) are a German company, and want to demonstrate the features at the Frankfurt Book Fair, so it made sense to start in this language. We hope to add the other Wikibooks languages really soon. Next stop: Wikipedia.

We'll also be adding OpenDocument and DocBook export once we've tested them a bit on Wikimedia Labs.

Here's an example full length book rendered with the PDF tool: http://de.wikibooks.org/wiki/Spezial:Sammlung/load_collection/?colltitle=Ben...

(You'll have to click the PDF download button.) As you can see, there are still some hardcoded English texts to get rid of. In terms of output quality, the main weakness is content formatted with raw HTML in the wiki source: the PDF generator uses wiki-text as its source and gets a bit confused when it encounters HTML. But it should generally ignore what it doesn't understand. If you find cases where it dies, please report them, ideally through the bug tracker at code.pediapress.com (you have to register).

This feature will make it possible to maintain the hierarchical structure of wiki-books through dedicated collection meta-files that are stored in the wiki. The underlying meta-file in the case above is this one:

http://de.wikibooks.org/wiki/Benutzer:Eloquence/Kollektionen/Beispiel-Sammlu...

As you can see, it's a very simple format. These pages can exist either in the user namespace or in the project namespace, and will be automatically detected as "collections" that can then be loaded and exported via the collection toolbox in the sidebar. But for user-friendly PDF download, it's probably easiest to integrate links (in the above format) to ready-made collections into templates, like the existing "printable version" templates.
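A minimal sketch of how such a link could be generated for a template, assuming the `load_collection` URL pattern from the example link above stays stable. The function name and the page title are hypothetical; only the base URL and the `colltitle` parameter come from the link in this message.

```python
from urllib.parse import urlencode

# Base URL taken from the example link above (German Wikibooks).
BASE_URL = "http://de.wikibooks.org/wiki/Spezial:Sammlung/load_collection/"


def collection_link(collection_page: str) -> str:
    """Return a link that loads the collection stored on `collection_page`.

    Assumption: `colltitle` is the only parameter needed, as in the example link.
    """
    return BASE_URL + "?" + urlencode({"colltitle": collection_page})


# Hypothetical collection page in the project namespace.
print(collection_link("Wikibooks:Sammlungen/Beispiel"))
```

The title is percent-encoded so that page names containing spaces or special characters survive as a query parameter.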

One of the nicer aspects of this approach is that you can easily have multiple views on the same Wikibook, or create a book pulling from multiple sources. But I also see the collection meta-files as potentially useful for other purposes in the future, such as Wikibooks statistics.

When this is available on all projects, I'll write a bit more. If you want to play with an English language version, there's still a demo running at:

http://en.labs.wikimedia.org/wiki/Main_Page

with a full English Wikibooks snapshot database.

Have fun, Erik

-- Erik Möller Deputy Director, Wikimedia Foundation Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate


Derbeth

10 Oct

11:15 a.m.

I wonder about the legal aspects. In my opinion, when you create a ready-to-print version, you have to attach the text of the GFDL license to it - directly, not as a link. Like it is done in http://en.wikibooks.org/wiki/Image:LaTeX.pdf.

Secondly, the current version of the tool commits plagiarism, because it does not mention image authors and does not provide any means (like making images clickable) to check these authors. It could generate a list of images with their licenses (as images can have different licenses, not only CC & GFDL, but also FAL), like on page 213 of http://pl.wikibooks.org/wiki/Grafika:C.pdf.

-- http://pl.wikipedia.org - the open encyclopedia http://pl.wikinews.org - the open source of information http://pl.wikibooks.org - open textbooks Opera - the fastest browser on Earth!

Erik Moeller

9:22 p.m.

2008/10/10 Derbeth derbeth@wp.pl:

> I wonder about the legal aspects. In my opinion, when you create a ready-to-print version, you have to attach the text of the GFDL license to it - directly, not as a link. Like it is done in http://en.wikibooks.org/wiki/Image:LaTeX.pdf.

Yes, I agree, and I've already noted this. The code accepts any text insertion here, so that should be reasonably straightforward. (Localization is trickier.)

> Secondly, the current version of the tool commits plagiarism, because it does not mention image authors and does not provide any means (like making images clickable) to check these authors.

Ouch, thanks for pointing that out. Tricky to do this automatically since it's all wiki-text with templates, but we'll investigate a solution here.


Derbeth

9:27 p.m.

On Fri, 10 Oct 2008 21:22:49 +0200, Erik Moeller wrote:

>> Secondly, the current version of the tool commits plagiarism, because it does not mention image authors and does not provide any means (like making images clickable) to check these authors.
>
> Ouch, thanks for pointing that out. Tricky to do this automatically since it's all wiki-text with templates, but we'll investigate a solution here.

Fortunately, most images on Commons use the {{Information}} template; in other cases it would be quite reasonable to simply assume that names from links to the User: namespace in the image description are the names of the authors.

That's a good example of why it's so important to follow standards on Commons.
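The heuristic described above could look roughly like this. It is only a sketch under simplifying assumptions: the `Author` field of {{Information}} is assumed to sit on a single line, nested templates are not handled, and the function and pattern names are made up for illustration. It is not how the PDF tool works.

```python
import re
from typing import List

# "Author" field of an {{Information}} template (assumed to fit on one line).
AUTHOR_FIELD = re.compile(r"^[ \t]*\|[ \t]*[Aa]uthor[ \t]*=[ \t]*(.+?)[ \t]*$", re.MULTILINE)
# Links to the User: namespace, e.g. [[User:Example|Example]].
USER_LINK = re.compile(r"\[\[\s*User:([^\]|]+)")


def user_names(wikitext: str) -> List[str]:
    """Return the distinct user names linked in `wikitext`, in order of appearance."""
    names: List[str] = []
    for name in USER_LINK.findall(wikitext):
        name = name.strip()
        if name not in names:
            names.append(name)
    return names


def guess_authors(description_wikitext: str) -> List[str]:
    """Guess image authors from the wikitext of an image description page."""
    if "{{information" in description_wikitext.lower():
        match = AUTHOR_FIELD.search(description_wikitext)
        if match:
            value = match.group(1)
            # Prefer linked user names inside the field, else keep the raw value.
            return user_names(value) or [value]
    # Fallback: assume that linked user names are the authors.
    return user_names(description_wikitext)
```

A page without the template and without any User: links yields an empty list, which a caller would have to treat as "author unknown".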


Mike.lifeguard

9:53 p.m.

And also why free content is preferred to be on Commons in the first place.


-- Mike.lifeguard mikelifeguard@fastmail.fm

Andrew Whitworth

11 Oct

12:16 a.m.

Maybe now is a good time to revive the "move all our free images to Commons" crusade. I'll grab my pitchfork...

--Andrew Whitworth


John-N@gmx.de

12:54 a.m.

Hi all.

I'd like to introduce three new topics to discuss.

First, I don't like the current payment options of WikiPress. I don't know if I'm the only one without a credit card, but I think it would be useful to accept PayPal as a payment option.

My second idea is to link to a free or inexpensive PDF editor, or maybe even implement one, so that minor changes (e.g. size of letters and images) can be made.

Thirdly, I think that the images would look better without the frame. The frames of course have a certain recognition value, so one could argue that they should stay because they symbolise wikis. I'd prefer images without frames though, or with a frame that doesn't stick out too much.

Best wishes,

John


Erik Moeller

12:58 a.m.

2008/10/10 John-N@gmx.de:

> Thirdly I think that the images would look better without the frame. The frames have of course a certain recognition value, so one could argue that they should stay, because it would symbolise Wikis. I'd prefer images without frames though, or with some frame that doesn't stick out too much.

It would be nice if a lot of the layout options could be customized through a tab on the collection page - that's definitely high on my wishlist.


Johannes Beigel

14 Oct

10:38 a.m.

On 10 Oct 2008 at 21:22, Erik Moeller wrote:

> (Localization is trickier.)

We're already working on a solution, but we have to balance elegance/maintainability, complexity and practicality for the community/translators.

For the latter point, I guess some i18n.php file that is made available on Betawiki would be preferable. But from the code side, the obvious (easier) solution would be using gettext.

Note: The PDF generation code (mwlib and mwlib.rl) is almost completely separate from the Collection extension: It's Python code running on another server whose releases/commits are not necessarily synchronized with those of the extension. So sending messages from the extension to the render server could result in untranslated strings. But so does a less maintained gettext .po file :-)
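To illustrate the last point: with Python's standard `gettext` module, a missing or incomplete catalog silently yields the untranslated source string. This is only a sketch. The domain name and locale directory are hypothetical and are not taken from mwlib.

```python
import gettext

# Hypothetical domain name and locale directory; neither is taken from mwlib.
DOMAIN = "pdf_render_demo"
LOCALE_DIR = "/nonexistent/locale"


def load_translator(language: str) -> gettext.NullTranslations:
    """Return a translator; fall back to the source strings when no catalog exists."""
    return gettext.translation(
        DOMAIN, localedir=LOCALE_DIR, languages=[language], fallback=True
    )


translator = load_translator("de")
# No German catalog exists at LOCALE_DIR, so the English source string comes back unchanged.
print(translator.gettext("Table of Contents"))
```

With `fallback=True` the renderer keeps working when a translation is missing, at the cost of English text showing up in the output.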

-- Johannes Beigel

Erik Moeller

15 Oct

4:40 a.m.

2008/10/14 Johannes Beigel johannes.beigel@pediapress.com:

> For the latter point, I guess some i18n.php file, which is made available on Betawiki would be preferable. But from the code side, the obvious (easier) solution would be using gettext.

Understood. I believe there are plans (?) to support gettext in Betawiki; I've made a separate email introduction regarding this.

That said, I can imagine that in the future, the Collection extension might support a set of advanced export options (layout options, etc.), and we might have to do a version check against the mw-pdf server anyway to see whether both are in sync. So if you feel that passing the i18n strings on to the mw-pdf server would be viable, it would probably be my preferred solution.
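A rough sketch of what I have in mind, with everything in it being an assumption: the function names, the request layout, the message key, and the rule that equal major versions count as "in sync" are invented for illustration and do not describe the actual extension or the mw-pdf server.

```python
from typing import Dict, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as "1.2" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def build_render_request(messages: Dict[str, str], extension_version: str,
                         server_version: str) -> dict:
    """Bundle localized strings with a render request if both sides are in sync.

    Assumed rule (not from this thread): equal major versions are compatible.
    """
    if parse_version(extension_version)[0] != parse_version(server_version)[0]:
        raise ValueError(
            f"extension {extension_version} and render server {server_version} are out of sync"
        )
    return {"extension_version": extension_version, "messages": dict(messages)}


# Hypothetical message key and version numbers, for illustration only.
request = build_render_request({"toc_title": "Inhaltsverzeichnis"}, "1.2", "1.5")
print(request["messages"]["toc_title"])
```

The point of shipping the strings with each request is that the wiki stays the single source of translations, and the version check catches the case where the server expects strings the extension does not yet send.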


Johannes Beigel

14 Oct

12:12 p.m.

New subject: License information (was: PDF/Collection feature live on de.wikibooks)

On 10 Oct 2008 at 21:22, Erik Moeller wrote:

> 2008/10/10 Derbeth derbeth@wp.pl:
>
>> I wonder about the legal aspects. In my opinion, when you create a ready-to-print version, you have to attach the text of the GFDL license to it - directly, not as a link. Like it is done in http://en.wikibooks.org/wiki/Image:LaTeX.pdf.

As Erik wrote: This is already implemented (either a title of an article or a URL to some license text can be set in LocalSettings.php), but it's currently not configured.

>> Secondly, the current version of the tool commits plagiarism, because it does not mention image authors and does not provide any means (like making images clickable) to check these authors.
>
> Ouch, thanks for pointing that out. Tricky to do this automatically since it's all wiki-text with templates, but we'll investigate a solution here.

We'd highly appreciate input from the community regarding this topic!

The printed books from PediaPress contain a list of figures where the license of each image is listed, together with the URL to the image description page. As some kind of "hotfix" this solution could be implemented in the PDF export of the Collection extension, too. But this doesn't really solve the problem.

We think it's more of a technical/software thing, so I cross-posted (and set Reply-To) to Wikitech-l.

In our opinion, license management/handling must be a core feature of MediaWiki, because the software is explicitly developed for the collaborative distribution of free content. Licenses of the contained articles and images should not be represented via some agreed-upon convention but via structured (and machine-readable) information, available for each relevant object in the wiki.

Some information that would be desired:

- Full (official) name of the license(s).
- Whether the full text of the license has to be included or a reference is sufficient.
- Reference to the full text of the license(s) (in some rigidly defined format like wikitext).
- Whether attribution is required. If so: The list of required attributions.

So, basically all the information that's required to check if it's possible to take some part of the MediaWiki and use it somewhere else and all the information that has to be included in that other place. This information could be made accessible via MediaWiki API, but ideally it's contained in the wikitext and/or XHTML, too.
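As a sketch of what such a structured record could look like, here is one possible shape with one field per item in the list above, plus a helper that renders an entry for a list of figures. All names, the example image and the example values are hypothetical. This is not an existing MediaWiki API or data format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LicenseInfo:
    """Machine-readable license record for one article or image (hypothetical shape)."""
    full_name: str               # full (official) name of the license
    full_text_required: bool     # must the full license text be included?
    full_text_reference: str     # where the full license text can be found
    attribution_required: bool   # is attribution required?
    attributions: List[str] = field(default_factory=list)  # required attributions


def credit_line(title: str, description_url: str, info: LicenseInfo) -> str:
    """Render one entry for a list of figures: title, license, URL, authors."""
    line = f"{title} ({info.full_name}), {description_url}"
    if info.attribution_required and info.attributions:
        line += ", by " + ", ".join(info.attributions)
    return line


# Illustrative values only; the page title and author are made up.
example = LicenseInfo(
    full_name="GNU Free Documentation License",
    full_text_required=True,
    full_text_reference="Wikibooks:GFDL",
    attribution_required=True,
    attributions=["Example Author"],
)
print(credit_line("Example.png", "http://commons.wikimedia.org/wiki/Image:Example.png", example))
```

With records like this available per object, the "list of figures" hotfix and the decision whether to append the full license text could both be driven by data instead of template conventions.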

All this could be handled via microformats, even inside templates, but the main point is that any new technique has to be enforced, ideally by the MediaWiki software itself. In the Commons wikis there are some conventions that can be used in software by people and companies like us (although we have to work with hacks and workarounds), but in wikis with smaller communities this information often doesn't exist at all.

-- Johannes Beigel

Johannes Beigel

1:56 p.m.

New subject: [Wikitech-l] License information (was: PDF/Collection feature live on de.wikibooks)

BTW: PediaPress has a stand at the Frankfurter Buchmesse (Frankfurt Book Fair), booth E427 in hall 4.2. We'd be really happy to meet people from the community to talk about all kinds of MediaWiki-related stuff.

So, if some of you are there and can make it... we're looking forward to meeting you!

-- Johannes Beigel
