Hi!
I'm trying to set up a Wikipedia mirror from the dumps and have already written to this list about it: most of the images are missing. Marcus Buck suggested that I also import Wikimedia Commons. I downloaded both dumps and tried to merge them into one database, but some pages exist in both wikis, so I get duplicate key errors. How is this supposed to be done? What is the setup of the official Wikipedias? Thanks for your help!
Kind regards, Christian Reitwießner
Instead of downloading the Commons dump, have you considered using $wgForeignFileRepos rather than the dumps? This would need a web connection for it to work properly, though. There is an example of how to set it up to use files from Commons: http://www.mediawiki.org/wiki/Manual:%24wgForeignFileRepos#Using_files_from_...
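A minimal sketch of that setup in LocalSettings.php, assuming the ForeignAPIRepo class and the Commons api.php endpoint as described on that page (the exact keys may vary between MediaWiki versions):

$wgForeignFileRepos[] = array(
    'class'            => 'ForeignAPIRepo',   // fetch files over the Commons API
    'name'             => 'commonswiki',
    'apibase'          => 'http://commons.wikimedia.org/w/api.php',
    'fetchDescription' => true,               // also pull the file description pages
);

With that in place, any file that is not found locally is looked up on Commons instead.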
K. Peachey wrote:
Instead of downloading the Commons dump, have you considered using $wgForeignFileRepos rather than the dumps? This would need a web connection for it to work properly, though. There is an example of how to set it up to use files from Commons: http://www.mediawiki.org/wiki/Manual:%24wgForeignFileRepos#Using_files_from_...
Thanks for the hint! Perhaps I should explain a bit more. I have modified the DumpHTML extension and want to use it to create a compressed Wikipedia dump for offline use. This dump will not contain images, but it would be good if it at least contained the links to the images so that they can be displayed when there is an internet connection. I think using Commons as an external file repository would solve all these problems.
So if I use Commons as an external file repository, it will be queried for every file in Wikipedia during the dump process, right? Is that acceptable for the Wikimedia servers?
Kind regards, Christian Reitwießner
On Mon, May 25, 2009 at 6:18 AM, Christian Reitwießner christian@reitwiessner.de wrote:
K. Peachey wrote:
Instead of downloading the Commons dump, have you considered using $wgForeignFileRepos rather than the dumps? This would need a web connection for it to work properly, though. There is an example of how to set it up to use files from Commons: http://www.mediawiki.org/wiki/Manual:%24wgForeignFileRepos#Using_files_from_...
Thanks for the hint! Perhaps I should explain a bit more. I have modified the DumpHTML extension and want to use it to create a compressed Wikipedia dump for offline use. This dump will not contain images, but it would be good if it at least contained the links to the images so that they can be displayed when there is an internet connection. I think using Commons as an external file repository would solve all these problems.
So if I use Commons as an external file repository, it will be queried for every file in Wikipedia during the dump process, right? Is that acceptable for the Wikimedia servers?
Kind regards, Christian Reitwießner
Seeing as pages are parsed on import, it would process the images and hit the Commons repo for each page you're importing. I'm not the one to answer whether that is acceptable, though.
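If that load turns out to be a concern, one thing to look at is how long the foreign repo caches what it has fetched. The key names below are an assumption on my part, taken from the ForeignAPIRepo/FileRepo options rather than anything dump-specific, so double-check them against your MediaWiki version:

$wgForeignFileRepos[] = array(
    'class'                  => 'ForeignAPIRepo',
    'name'                   => 'commonswiki',
    'apibase'                => 'http://commons.wikimedia.org/w/api.php',
    // assumption: longer cache lifetimes (in seconds) mean fewer repeated API
    // hits for the same files while a long import or dump run is in progress
    'apiThumbCacheExpiry'    => 86400,
    'descriptionCacheExpiry' => 86400,
);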
-Chad
On Monday, 25 May 2009 12:18:32, Christian Reitwießner wrote:
Thanks for the hint! Perhaps I should explain a bit more. I have modified the DumpHTML extension and want to use it to create a compressed Wikipedia dump for offline use. This dump will not contain images, but it would be good if it at least contained the links to the images so that they can be displayed when there is an internet connection. I think using Commons as an external file repository would solve all these problems.
If you want to have dumps of Wikipedia for offline usage you might want to subscribe to dev-l (at) openzim.org.
The openZIM team has made several dumps and provides software for creating them as well as reader applications. Wikimedia is currently working on the dumping process so that ZIM files can be provided on a regular basis, as part of the same process that produces the XML and SQL dumps.
If you ask on the mailing list you can get the URLs of several dumps which are already available; there is at least a full dump of the German Wikipedia (1.3 GB) and selected articles from the English, Spanish and Italian Wikipedias, with images.
Greets,
Manuel
Manuel Schneider wrote:
On Monday, 25 May 2009 12:18:32, Christian Reitwießner wrote:
Thanks for the hint! Perhaps I should explain a bit more. I have modified the DumpHTML extension and want to use it to create a compressed Wikipedia dump for offline use. This dump will not contain images, but it would be good if it at least contained the links to the images so that they can be displayed when there is an internet connection. I think using Commons as an external file repository would solve all these problems.
If you want to have dumps of Wikipedia for offline usage you might want to subscribe to dev-l (at) openzim.org.
The openZIM team has made several dumps and provides software for creating them as well as reader applications. Wikimedia is currently working on the dumping process so that ZIM files can be provided on a regular basis, as part of the same process that produces the XML and SQL dumps.
If you ask on the mailing list you can get the URLs of several dumps which are already available; there is at least a full dump of the German Wikipedia (1.3 GB) and selected articles from the English, Spanish and Italian Wikipedias, with images.
Thank you for the hint, I'll check if it can help in my project.
I now have a different problem when setting up the mirror:
The page
http://vitels.cs.uni-tuebingen.de:20204/index.php/Special:Version
shows some strange information: the ParserFunctions extension does not appear to be installed correctly, although it was before I imported the German content into the text, revision and page tables (that's all I did, and I did it directly via INSERT INTOs without touching the other tables). Is it possible that I have to update some other tables for the extension to work correctly?
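For reference, the extension is included in LocalSettings.php with the usual require_once line for this MediaWiki generation (sketched here, the exact path on my install may differ slightly):

// standard ParserFunctions setup; the extension itself ships no database tables
require_once( "$IP/extensions/ParserFunctions/ParserFunctions.php" );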
Kind regards, Christian Reitwießner