Hi!
I'm trying to set up a Wikipedia mirror from the dumps and have already written to this list about it: most of the images are missing. Marcus Buck suggested that I also import Wikimedia Commons. I downloaded both dumps and tried to merge them into one database, but some pages exist in both wikis, so I get duplicate key errors. How is this supposed to be done? What is the setup of the official Wikipedias? Thanks for your help!
Kind regards, Christian Reitwießner
Instead of downloading the Commons dump, have you considered using $wgForeignFileRepos rather than the dumps? This would need a web connection for it to work properly, though. There is an example of how to set it up to use files from Commons: http://www.mediawiki.org/wiki/Manual:%24wgForeignFileRepos#Using_files_from_...
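A minimal sketch of that setup in LocalSettings.php, assuming the ForeignAPIRepo class and the Commons api.php endpoint as described on that page (the exact keys may vary between MediaWiki versions):

$wgForeignFileRepos[] = array(
    'class'            => 'ForeignAPIRepo',   // fetch files over the Commons API
    'name'             => 'commonswiki',
    'apibase'          => 'http://commons.wikimedia.org/w/api.php',
    'fetchDescription' => true,               // also pull the file description pages
);

With that in place, any file that is not found locally is looked up on Commons instead.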
K. Peachey wrote:
Instead of downloading the Commons dump, have you considered using $wgForeignFileRepos rather than the dumps? This would need a web connection for it to work properly, though. There is an example of how to set it up to use files from Commons: http://www.mediawiki.org/wiki/Manual:%24wgForeignFileRepos#Using_files_from_...
Thanks for the hint! Perhaps I should explain a bit more. I have modified the DumpHTML extension and want to use it to create a compressed Wikipedia dump for offline use. This dump will not contain images, but it would be good if it at least contained the links to the images so that they can be displayed when there is an internet connection. I think using Commons as an external file repository would solve all these problems.
So if I use Commons as an external file repository, it will be queried for every file in Wikipedia during the dump process, right? Is that acceptable for the Wikimedia servers?
Kind regards, Christian Reitwießner
On Mon, May 25, 2009 at 6:18 AM, Christian Reitwießner christian@reitwiessner.de wrote:
K. Peachey wrote:
Instead of downloading the Commons dump, have you considered using $wgForeignFileRepos rather than the dumps? This would need a web connection for it to work properly, though. There is an example of how to set it up to use files from Commons: http://www.mediawiki.org/wiki/Manual:%24wgForeignFileRepos#Using_files_from_...
Thanks for the hint! Perhaps I should explain a bit more. I have modified the DumpHTML extension and want to use it to create a compressed Wikipedia dump for offline use. This dump will not contain images, but it would be good if it at least contained the links to the images so that they can be displayed when there is an internet connection. I think using Commons as an external file repository would solve all these problems.
So if I use Commons as an external file repository, it will be queried for every file in Wikipedia during the dump process, right? Is that acceptable for the Wikimedia servers?
Kind regards, Christian Reitwießner
Seeing as pages are parsed on import, it would process the images and hit the Commons repo for each page you're importing. I'm not the one to answer whether that is acceptable, though.
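If that load turns out to be a concern, one thing to look at is how long the foreign repo caches what it has fetched. The key names below are an assumption on my part, taken from the ForeignAPIRepo/FileRepo options rather than anything dump-specific, so double-check them against your MediaWiki version:

$wgForeignFileRepos[] = array(
    'class'                  => 'ForeignAPIRepo',
    'name'                   => 'commonswiki',
    'apibase'                => 'http://commons.wikimedia.org/w/api.php',
    // assumption: longer cache lifetimes (in seconds) mean fewer repeated API
    // hits for the same files while a long import or dump run is in progress
    'apiThumbCacheExpiry'    => 86400,
    'descriptionCacheExpiry' => 86400,
);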
-Chad
On Monday, 25 May 2009 12:18:32, Christian Reitwießner wrote:
Thanks for the hint! Perhaps I should explain a bit more. I have modified the DumpHTML extension and want to use it to create a compressed Wikipedia dump for offline use. This dump will not contain images, but it would be good if it at least contained the links to the images so that they can be displayed when there is an internet connection. I think using Commons as an external file repository would solve all these problems.
If you want to have dumps of Wikipedia for offline usage you might want to subscribe to dev-l (at) openzim.org.
The openZIM team has made several dumps and provides software for creating them as well as reader applications. Wikimedia is currently working on the dumping process so that ZIM files can be provided on a regular basis, as part of the same process that produces the XML and SQL dumps.
If you ask on the mailing list you can get the URLs of several dumps which are already available; there is at least a full dump of the German Wikipedia (1.3 GB) and selected articles from the English, Spanish and Italian Wikipedias, with images.
Greets,
Manuel
Manuel Schneider wrote:
On Monday, 25 May 2009 12:18:32, Christian Reitwießner wrote:
Thanks for the hint! Perhaps I should explain a bit more. I have modified the DumpHTML extension and want to use it to create a compressed Wikipedia dump for offline use. This dump will not contain images, but it would be good if it at least contained the links to the images so that they can be displayed when there is an internet connection. I think using Commons as an external file repository would solve all these problems.
If you want to have dumps of Wikipedia for offline usage you might want to subscribe to dev-l (at) openzim.org.
The openZIM team has made several dumps and provides software for creating them as well as reader applications. Wikimedia is currently working on the dumping process so that ZIM files can be provided on a regular basis, as part of the same process that produces the XML and SQL dumps.
If you ask on the mailing list you can get the URLs of several dumps which are already available; there is at least a full dump of the German Wikipedia (1.3 GB) and selected articles from the English, Spanish and Italian Wikipedias, with images.
Thank you for the hint, I'll check if it can help in my project.
I now have a different problem when setting up the mirror:
The page
http://vitels.cs.uni-tuebingen.de:20204/index.php/Special:Version
shows some strange information: the ParserFunctions extension does not appear to be installed correctly, although it was before I imported the German content into the text, revision and page tables (that's all I did, and I did it directly via INSERT INTOs without touching the other tables). Is it possible that I have to update some other tables for the extension to work correctly?
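For reference, the extension is included in LocalSettings.php with the usual require_once line for this MediaWiki generation (sketched here, the exact path on my install may differ slightly):

// standard ParserFunctions setup; the extension itself ships no database tables
require_once( "$IP/extensions/ParserFunctions/ParserFunctions.php" );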
Kind regards, Christian Reitwießner