Hi everyone, we need your help.
We are from Python Argentina, and we are working on adapting our
cdpedia project, together with educ.ar and the Wikimedia Foundation,
to build a DVD holding the entire Spanish Wikipedia, which will soon
be sent to Argentinian schools.
Hernán and Diego are the two interns tasked with updating the data
that cdpedia uses to build the CD (it currently uses a static HTML
dump dated June 2008), but they are running into problems while
trying to produce an up-to-date static HTML dump of es-wikipedia.
I'm CCing this list of people because I'm sure you've faced similar
issues when building your offline Wikipedias, or because you may know
someone who can help us.
Following is an email from Hernán describing the problems he's found.
Thanks!
--
alecu - Python Argentina
2010/4/30 Hernan Olivera <lholivera(a)gmail.com>:
Hi everybody,
I've been working on producing an up-to-date static HTML dump of the
Spanish Wikipedia, to use as a basis for the DVD.
I followed the procedure detailed in the pages below, which was used
to generate the current (and out-of-date) static HTML dumps:
1) installing and setting up a MediaWiki instance
2) importing the XML dump from [6] with mwdumper
3) exporting the static HTML with MediaWiki's dump tool
The procedure finishes without throwing any errors, but the XML
import produces malformed HTML pages with visible wiki markup.
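For reference, step 2 was roughly the standard mwdumper pipeline
sketched below; the dump filename matches [6], while the database
name and credentials are placeholders for whatever the local install
uses:

    # Convert the XML dump to SQL and stream it into MySQL.
    # "wikiuser" and "wikidb" are placeholder names for the local wiki.
    java -jar mwdumper.jar --format=sql:1.5 \
        eswiki-20100331-pages-articles.xml.bz2 \
      | mysql -u wikiuser -p wikidb

One thing worth checking with a setup like this: visible wiki markup
(e.g. raw {{#if:...}} calls) often comes from parser extensions such
as ParserFunctions missing on the local wiki, rather than from the
import itself.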
What we really need is a successful import of the Spanish XML dumps
into a MediaWiki instance, so that we can produce the up-to-date
static HTML dump.
Links to the info I used:
[0] http://www.mediawiki.org/wiki/Manual:Installation_guide/es
[1] http://www.mediawiki.org/wiki/Manual:Running_MediaWiki_on_Ubuntu
[2] http://en.wikipedia.org/wiki/Wikipedia_database
[3] http://www.mediawiki.org/wiki/Manual:Importing_XML_dumps
[4] http://meta.wikimedia.org/wiki/Importing_a_Wikipedia_database_dump_into_Med…
[5] http://meta.wikimedia.org/wiki/Data_dumps
[6] http://dumps.wikimedia.org/eswiki/20100331/
[7] http://www.mediawiki.org/wiki/Alternative_parsers
(among others)
Cheers,
--
Hernan Olivera
PS: unluckily I didn't write down every step in detail, and I ran a
lot more tests than what I describe here. To make a detailed report
I'd like to go through the procedure again, writing down every option
(and checking whether I missed something). I'm finishing setting up a
server just for this, because these processes take forever and they
blocked other tasks while I was running these tests.
2009/10/23 Samuel Klein <meta.sj(a)gmail.com>:
> Jimbo - thanks for the spur to clean up the existing work.
>
> All - Let's start by cleaning up the mailing lists and setting a few
> short-term goals :-) It's a good sign that we have both charity and love
> converging to make something happen.
>
> * For all-platform all-purpose wikireaders, let's use
> offline-l(a)lists.wikimedia, as we discussed a month ago in the aftermath of
> Wikimania (Erik, were you going to set this up? I think we agreed to
> deprecate wiki-offline-reader-l and replace it with offline-l.)
>
> * For wikireaders such as WikiBrowse and Infoslicer on the XO, please
> continue to use wikireader(a)lists.laptop
>
>
> I would like to see WikiBrowse become the 'sugarized' version of a reader
> that combines the best of that and the openZim work. A standalone DVD or
> USB drive that comes with its own search tools would be another version of
> the same. As far as merging codebases goes, I don't think the WikiBrowse
> developers are invested in the name.
>
> I think we have a good first cut at selecting articles, weeding out stubs,
> and including thumbnail images. Maybe someone working on openZim can
> suggest how to merge the search processes, and that file format seems
> unambiguously better.
>
> Kul - perhaps part of the work you've been helping along on standalone
> USB-key snapshots would be useful here.
>
>
> Please continue to update this page with your thoughts and progress!
> http://meta.wikimedia.org/wiki/Offline_readers
>
> SJ
>
>
> 2009/10/23 Iris Fernández <irisfernandez(a)gmail.com>
>>
>> On Fri, Oct 23, 2009 at 1:37 PM, Jimmy Wales <jwales(a)wikia-inc.com> wrote:
>> >
>> > My dream is quite simple: a DVD that can be shipped to millions of
>> > people with an all-free-software solution for reading Wikipedia in Spanish.
>> > It should have a decent search solution; it doesn't have to be
>> > perfect, but it should be full-text. It should be reasonably fast,
>> > but super-perfect is not a consideration.
>> >
>>
>> Hello! I am an educator, not a programmer. I can help by selecting
>> articles or developing categories related to school topics.
>
> Iris - you know the main page of WikiBrowse that you see when the reader
> first loads? You could help with a new version of that page. Madeleine
> (copied here) worked on the first one, but your thoughts on improving it
> would be welcome.
>
>
>
On Wed 07/07/10 09:35, "Hernan Olivera" lholivera(a)gmail.com wrote:
> Hi everyone. We need your help again.
>
> We finally have a working mirror for generating the static HTML
> version of ESWIKI that we need for Cd-Pedia, using the DumpHTML
> extension. But it seems that the process will take about 3000 hours
> of processing on our little Sempron server (4 months!).
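For reference, a DumpHTML run of this sort looks roughly like the
sketch below; the destination path is a placeholder, and the -s/-e
page-id bounds (assuming the standard DumpHTML options) are one way
to split the work across several processes:

    # Produce the static HTML tree from the local MediaWiki mirror.
    # /var/static/eswiki is a placeholder destination directory;
    # -s/-e bound the page ids, so disjoint ranges can run in parallel.
    cd extensions/DumpHTML
    php dumpHTML.php -d /var/static/eswiki -s 1 -e 500000

3000 hours in a single process is about 125 days, which matches the
4-month estimate; splitting the id range across a few machines is
probably the only practical way to bring the wall-clock time down.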
I have in the past prepared a full ZIM file (with thumbnails) of WPIT, and it took ~3 weeks (that covered not only the DumpHTML part of the process but also mirroring, optimizing, etc.).
http://tmp.kiwix.org/zim/0.9/wikipedia_it_all_kiwix_03_alpha2.zim
I'm currently preparing something similar in French and Spanish. I'm doing that on a new server, so I think this will again take a few weeks. I should release a first version for both Wikipedias during August.
Emmanuel