I publish books (mainly public domain) on CD and DVD. You can see my offerings at http://store.yahoo.com/samizdat
I would like to publish a DVD that includes the full text of Wikipedia, preferably as a set of interlinked HTML pages (easy for a novice to use, and with no need for a database). The DVD would also include dozens of other reference books.
I understand (thanks to Lars Aronsson) that Directmedia Publishing in Berlin (www.directmedia.de) put the German Wikipedia on DVD (ISBN 3-86640-001-2). It sells for $10 on www.amazon.de, and $1 of that goes to the German branch of the Wikimedia Foundation.
I would like to do something similar for the English version. I would sell it for $12 since it includes other works as well and also since I provide free shipping inside the US. I could also contribute $1 for each DVD sold to the Wikimedia Foundation (or whatever branch of that is appropriate). In addition, I would like to update the DVD about once a month.
I downloaded 20051105_pages_articles.xml.bz2 (823 Mbytes) and uncompressed that file (using ZipZag) to 20051105_pages_articles.xml (3.5 Gigabytes).
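For readers who want to script the decompression step rather than use a desktop archiver, the following is a minimal sketch in Python. It assumes only the standard-library bz2 module; the function name decompress_bz2 and the chunk size are illustrative choices, not anything the dump format prescribes. It copies the data in fixed-size chunks, so the multi-gigabyte output never has to fit in memory.

```python
import bz2


def decompress_bz2(src_path, dst_path, chunk_size=1024 * 1024):
    """Decompress src_path into dst_path chunk by chunk.

    Returns the number of uncompressed bytes written.
    """
    written = 0
    with bz2.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty read means end of the compressed stream
                break
            dst.write(chunk)
            written += len(chunk)
    return written
```

A call such as decompress_bz2("20051105_pages_articles.xml.bz2", "20051105_pages_articles.xml") would reproduce the manual step described above.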
But what can I/should I do next?
The XML file is too big to open in my IE browser (and too big for any customer to open either). I was hoping to get a set of inter-linked files (similar to the way downloads of the CIA World Factbook are presented), so anyone could open a home page and navigate to the rest.
Is there anything I can do (with my Windows PC) to convert the downloadable files to this format? Or could anyone out there in the Wikipedia community help me?
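One possible starting point, sketched here in Python under stated assumptions: a file too large to open at once can still be read one page at a time with a streaming XML parser. The sketch assumes the dump wraps each article in a page element containing a title and a revision text, which is how MediaWiki exports are generally laid out; the helper names local_name and iter_pages are made up for illustration. It uses the standard-library xml.etree.ElementTree.iterparse and discards each page after use so memory stays roughly constant.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for every <page> in a dump.

    `source` may be a file name or a binary file object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    title, text = "", ""
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = "", ""
            root.clear()  # drop finished pages so memory does not grow
```

Because iterparse accepts file objects, the compressed dump could probably be read directly, for example iter_pages(bz2.open("20051105_pages_articles.xml.bz2")) after importing bz2, without writing the 3.5-gigabyte file to disk first.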
Thanks very much.
Best wishes.
Richard
Richard Seltzer, seltzer@samizdat.com, 617-469-2269, http://www.samizdat.com
A library for the price of a book http://store.yahoo.com/samizdat
A summary of our book publishing projects http://www.samizdat.com/orientation.html
On 5 Jan 2006, at 18:55, Richard Seltzer wrote:
Is there anything I can do (with my Windows PC) to convert the downloadable files to this format? Or could anyone out there in the Wikipedia community help me?
You need to install the MediaWiki software, import the XML into the database, and then write a script to generate the HTML pages. You probably want to find a tame hacker to help set it up for the first time (you will need some customisation, such as setting up a look and feel; also, if you are not going to have the pictures you may want to strip out the captions, and you need to work out what to do with red links, etc.).
I suggest you mail the technical list to find someone who can do the detailed work. It's not a huge amount of work, although getting it right may take a while as there is a lot to check.
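To give a feel for what the page-generating script involves, here is a rough Python sketch. It is not the MediaWiki renderer and does not replace the install-and-import route described above: it only turns [[Target]] and [[Target|label]] links into relative links between files, and renders red links (links to pages not in the set) as plain text. The file-naming scheme and the function names page_filename, render_links and write_page are assumptions made for illustration; templates, images, section anchors and file-name collisions are all ignored.

```python
import html
import os
import re

# Matches [[Target]] and [[Target|label]]; nested brackets are not handled.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def page_filename(title):
    """Map a title to a file name (hypothetical scheme; collisions possible)."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", title.strip()) + ".html"


def render_links(wikitext, known_titles):
    """Escape the text, then replace wiki links with relative HTML links."""
    def replace(match):
        target = html.unescape(match.group(1)).strip()
        label = match.group(2) or match.group(1)  # already escaped
        if target in known_titles:
            return '<a href="%s">%s</a>' % (page_filename(target), label)
        return label  # red link: keep the words, drop the link

    escaped = html.escape(wikitext, quote=False)
    return LINK_RE.sub(replace, escaped)


def write_page(out_dir, title, wikitext, known_titles):
    """Write one static page and return its path."""
    page = (
        '<html><head><meta charset="utf-8"><title>{t}</title></head>'
        "<body><h1>{t}</h1><pre>{b}</pre></body></html>"
    ).format(t=html.escape(title), b=render_links(wikitext, known_titles))
    path = os.path.join(out_dir, page_filename(title))
    with open(path, "w", encoding="utf-8") as f:
        f.write(page)
    return path
```

Dropping red links to plain text is only one of several possible policies; a real run would also need two passes, one to collect the set of known titles and one to write the pages.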
Best wishes
Justinc
On 05/01/06, Richard Seltzer seltzer@samizdat.com wrote:
I understand (thanks to Lars Aronsson) that Directmedia Publishing in Berlin (www.directmedia.de) put the German Wikipedia on DVD (ISBN 3-86640-001-2). It sells for $10 on www.amazon.de, and $1 of that goes to the German branch of the Wikimedia Foundation.
Euro not dollars, but yeah. Handily, ten euro is almost exactly the $12 you mention below.
You might be interested to read http://en.wikipedia.org/wiki/German_Wikipedia, where Axel Boldt has given a short explanation of the various .de publishing projects.
It may be worth noting that the German publication wasn't a straight copy of the "live dump", but rather was a slightly-filtered version - a largish team had vetted the whole set of articles and then put it on DVD. I'm not sure what level of quality you're looking at for this DVD, but without a step like this you'd be leaving a lot of dross (and a lot of potential copyright problems) in place. They also had some home-brewed searching software, rather than just a web browser, IIRC.
As to producing static HTML versions, I believe Tim Starling was working on tools to do just this, but I can't offhand find anything. Searching the archives of this list might be worthwhile.
One final point - you mention donations to Wikimedia. I assume (since it makes good business sense) you'd want to publicise this, and thus use the Wikipedia/Wikimedia name - this may well be preaching to the choir, but please contact the Foundation, to avoid running into trademark issues and any associated unpleasantness. (I certainly don't know what the status of using the name was in Germany...)
But all that aside, the donations are of course appreciated :-)
Best of luck,
--
- Andrew Gray
andrew.gray@dunelm.org.uk
Thanks for your response.
Sounds like this could be very difficult. It's hard to imagine "filtering" the entire Wikipedia. I was hoping that the "articles only" version (without comments) would be clean and reliable and free of potential copyright problems. And I would not want to include search software. The basic search capabilities within Windows could be used. And, of course, a customer could install any other search program as well. But, ideally, it should be possible to simply navigate with links (as one does with the CIA World Factbook).
Do you have an email address for Tim Starling?
I was hoping that there would be a way to convert regularly updated Wikipedia files to a format appropriate for DVD, so it would be possible for me to offer regular frequent updates.
And is there an email address to officially contact the Foundation?
Thanks again.
Richard
Richard Seltzer, seltzer@samizdat.com, 617-469-2269, http://www.samizdat.com
A library for the price of a book http://store.yahoo.com/samizdat
A summary of our book publishing projects http://www.samizdat.com/orientation.html
Also, don't forget that to comply with the GFDL, you need to provide a list of contributors for each article, or at the very least link to the online history of the article in question.
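The linking option could be automated when the pages are generated. The Python sketch below builds a link to an article's online history using the standard MediaWiki index.php?title=...&action=history URL form. The function names and the footer wording are illustrative assumptions, and the sketch says nothing about whether such a link is sufficient for GFDL purposes.

```python
import html
from urllib.parse import quote

HISTORY_BASE = "http://en.wikipedia.org/w/index.php"


def history_url(title):
    """Return the online history URL for an article title."""
    page = title.strip().replace(" ", "_")  # MediaWiki uses underscores
    return "%s?title=%s&action=history" % (HISTORY_BASE, quote(page, safe=""))


def attribution_footer(title):
    """Return an HTML footer pointing at the article's contributor history."""
    return (
        '<p>Contributors: see the <a href="%s">online page history</a>.</p>'
        % html.escape(history_url(title))
    )
```

The footer string could be appended to each generated page just before the closing body tag.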
Mgm
WikiEN-l mailing list WikiEN-l@Wikipedia.org To unsubscribe from this mailing list, visit: http://mail.wikipedia.org/mailman/listinfo/wikien-l
On 1/6/06, Richard Seltzer seltzer@samizdat.com wrote:
Do you have an email address for Tim Starling?
It's on his user page: http://en.wikipedia.org/wiki/User:Tim_Starling.
And is there an email address to officially contact the Foundation?
board at wikimedia.org is an address you can use if you need to contact the board or people involved with the Foundation. For non-board issues, the public mailing lists (like http://mail.wikipedia.org/mailman/listinfo/foundation-l) are preferable.
Angela.