Xmldatadumps-l June 2011

xmldatadumps-l@lists.wikimedia.org

13 participants
6 discussions

by emijrp

Hi. I am forwarding this e-mail; I hope some people here are interested in this map.

---------- Forwarded message ----------
From: emijrp <emijrp(a)gmail.com>
Date: 2011/6/11
Subject: Wikis around Europe!
To: wikiteam-discuss(a)googlegroups.com

Hi all;

A friend of mine has sent me this link about wikis (locapedias) around Europe.[1] I'm very surprised by the huge number of wikis available. Time to archive all of them.[2]

I have been working on the Spanish ones. If you want to help archive one country, please reply to this message to coordinate. If not, I will try to archive all of Europe!

Regards,
emijrp

[1] http://maps.google.com/maps/ms?ie=UTF8&t=h&msa=0&msid=115570622864617231547…
[2] http://code.google.com/p/wikiteam/

12 years, 10 months

Re: [Xmldatadumps-l] [Wiki-research-l] Wikipedia dumps downloader

by emijrp

Hi;

@Derrick: I don't trust Amazon. Really, I don't trust the Wikimedia Foundation either. They can't and/or they don't want to provide image dumps (which is worse?). The community donates images to Commons, the community donates money every year, and now the community needs to develop software to extract all the images, pack them and, of course, host them permanently. Crazy, right?

@Milos: Instead of splitting the image dump by the first letter of the filenames, I thought about splitting by upload date (YYYY-MM-DD). So the first chunks (2005-01-01) will be tiny, and recent ones will be several GB each (for a single day).

Regards,
emijrp

2011/6/28 Derrick Coetzee <dcoetzee(a)eecs.berkeley.edu>

> As a Commons admin I've thought a lot about the problem of
> distributing Commons dumps. As for distribution, I believe BitTorrent
> is absolutely the way to go, but the Torrent will require a small
> network of dedicated permaseeds (servers that seed indefinitely).
> These can easily be set up at low cost on Amazon EC2 "small" instances
> - the disk storage for the archives is free, since small instances
> include a large (~120 GB) ephemeral storage volume at no additional
> cost, and the cost of bandwidth can be controlled by configuring the
> BitTorrent client with either a bandwidth throttle or a transfer cap
> (or both). In fact, I think all Wikimedia dumps should be available
> through such a distribution solution, just as all Linux installation
> media are today.
>
> Additionally, it will be necessary to construct (and maintain) useful
> subsets of Commons media, such as "all media used on the English
> Wikipedia", or "thumbnails of all images on Wikimedia Commons", of
> particular interest to certain content reusers, since the full set is
> far too large to be of interest to most reusers. It's on this latter
> point that I want your feedback: what useful subsets of Wikimedia
> Commons does the research community want? Thanks for your feedback.
>
> --
> Derrick Coetzee
> User:Dcoetzee, English Wikipedia and Wikimedia Commons administrator
> http://www.eecs.berkeley.edu/~dcoetzee/
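The upload-date split proposed in the message above could be sketched roughly as follows. This is only an illustration, not WikiTeam's code: the function name `chunk_by_upload_date` is hypothetical, and the input is assumed to be (filename, timestamp) pairs with timestamps in the ISO 8601 form `YYYY-MM-DDTHH:MM:SSZ`.

```python
from collections import defaultdict
from datetime import datetime


def chunk_by_upload_date(uploads):
    """Group filenames into one chunk per upload day (YYYY-MM-DD).

    `uploads` is an iterable of (filename, timestamp) pairs. The timestamp
    format "%Y-%m-%dT%H:%M:%SZ" is an assumption made for this sketch.
    """
    chunks = defaultdict(list)
    for filename, timestamp in uploads:
        parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
        day = parsed.strftime("%Y-%m-%d")
        chunks[day].append(filename)
    # Return a plain dict so callers do not create keys by accident.
    return dict(chunks)
```

Each key would then name one archive (for example `2005-01-01`), so early days yield small archives and recent days large ones, as the message expects.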

12 years, 10 months

Wikipedia dumps downloader

by emijrp

Hi all;

Can you imagine a day when Wikipedia is added to this list?[1]

WikiTeam has developed a script[2] to download all the dumps of Wikipedia (and its sister projects) from dumps.wikimedia.org. It sorts them into folders and checks the md5sum. It only works on Linux (it uses wget).

You will need about 100GB to download all the 7z files.

Save our memory.

Regards,
emijrp

[1] http://en.wikipedia.org/wiki/Destruction_of_libraries
[2] http://code.google.com/p/wikiteam/source/browse/trunk/wikipediadownloader.py
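The md5sum check mentioned above can be illustrated with a small sketch. It is not taken from wikipediadownloader.py: the function names are hypothetical, and the checksum list is assumed to use the `md5sum` output layout (hex digest, whitespace, filename).

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sums(text):
    """Parse lines of the assumed form '<hex digest>  <filename>' into a dict."""
    expected = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            checksum, name = parts
            # md5sum marks binary mode with a leading '*' on the filename.
            expected[name.strip().lstrip("*")] = checksum.lower()
    return expected


def verify_dump(path, expected):
    """True if the file's name is listed in `expected` and its digest matches."""
    name = Path(path).name
    return name in expected and md5_of_file(path) == expected[name]
```

A downloader could call `verify_dump` after each file finishes and fetch the file again when it returns False.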

12 years, 10 months

Re: [Xmldatadumps-l] Wikipedia dumps downloader

by emijrp

Hi Richard;

Yes, a distributed project would probably be the best solution, but it is not easy to develop unless you use something like BitTorrent and have many peers. However, most people don't seed files for long, so sometimes it is better to depend on a few committed people than on a big but ephemeral crowd.

Regards,
emijrp

2011/6/26 Richard Farmbrough <richard(a)farmbrough.co.uk>

> It would be useful to have an archive of archives. I have to delete my
> old data dumps as time passes, for space reasons, however a team could,
> between them, maintain multiple copies of every data dump. This would make a
> nice distributed project.
>
> On 26/06/2011 13:53, emijrp wrote:
>
> Hi all;
>
> Can you imagine a day when Wikipedia is added to this list?[1]
>
> WikiTeam have developed a script[2] to download all the Wikipedia dumps
> (and her sister projects) from dumps.wikimedia.org. It sorts in folders
> and checks md5sum. It only works on Linux (it uses wget).
>
> You will need about 100GB to download all the 7z files.
>
> Save our memory.
>
> Regards,
> emijrp
>
> [1] http://en.wikipedia.org/wiki/Destruction_of_libraries
> [2] http://code.google.com/p/wikiteam/source/browse/trunk/wikipediadownloader.py
>
> _______________________________________________
> Xmldatadumps-l mailing list
> Xmldatadumps-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l

12 years, 10 months

zh dump has stopped

by Andreas Meier

see http://dumps.wikimedia.org/zhwiki/20110521/ Best regards, Andreas

12 years, 10 months

Errors in mirror

by 蔡超

Hi,

I want to set up a zh-wiki mirror. I have tried several dumps, and there are always some errors.

Error 1: When I visit any wiki page, the browser says "'mw.util.addPortletLink' is null or not an object".
Error 2: Templates are not converted into proper HTML, as shown in the attached picture.
Error 3: Conversion between Simplified Chinese and Traditional Chinese does not work.

Here are my steps to build the mirror:

1. Download the latest dumps (zhwiki-20110521), the mwdumper Java source and mediawiki 1.7.0.beta.
2. Install MediaWiki with the name "Wikipedia"; the LocalSettings.php is attached.
3. Alter the database tables "page", "revision" and "text": remove the key constraints, auto_increment and indexes.
4. Import page-article.xml with mwdumper.jar.
5. Alter the database tables "page", "revision" and "text": add back the key constraints, auto_increment and indexes.
6. Import zhwiki-20110521-image.sql, imagelinks.sql, interwiki.sql, iwlinks.sql, langlinks.sql, pagelinks.sql, redirect.sql, templatelinks.sql.

Is there anything wrong with my procedure?

Best regards,
Toppi

12 years, 10 months
