By the way, each version I need is the whole English Wikipedia dump, about 11 to 13 GB.
E.g., on this page for the 201411 data, http://dumps.wikimedia.org/enwiki/20141106/, it should be this one:
- enwiki-20141106-pages-articles-multistream.xml.bz2 http://dumps.wikimedia.org/enwiki/20141106/enwiki-20141106-pages-articles-multistream.xml.bz2 11.3 GB
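For reference, the file name and URL above follow a regular pattern based on the snapshot date. A minimal sketch, assuming the same layout holds for other snapshots (the function name `dump_url` is only illustrative, and it does not check whether a given snapshot still exists on the server):

```python
import re

# Base path taken from the example URL above.
BASE = "http://dumps.wikimedia.org/enwiki"


def dump_url(snapshot: str) -> str:
    """Build the pages-articles-multistream URL for a snapshot date (YYYYMMDD).

    Assumption: other snapshots use the same directory and file naming
    as the 20141106 example. This only builds the string; it does not
    verify that the file is still available.
    """
    if not re.fullmatch(r"\d{8}", snapshot):
        raise ValueError(f"expected a date like 20141106, got {snapshot!r}")
    name = f"enwiki-{snapshot}-pages-articles-multistream.xml.bz2"
    return f"{BASE}/{snapshot}/{name}"


if __name__ == "__main__":
    print(dump_url("20141106"))
```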
Best, Xin
On Tue, Jun 9, 2015 at 10:48 AM, Xin Jin xin.jin1020@gmail.com wrote:
Hi Richard,
Thanks very much for your fast reply!
The ones I need are the 13 versions from 201304 to 201404. May I ask if you have all 13 versions?
Also, I understand your network is slow, so we can treat yours as a backup plan. I just want to confirm that you have the 13 versions, right?
In the meantime, I will wait to see if anyone else has the 13 versions :)
Thanks! Best, Xin
On Mon, Jun 8, 2015 at 11:10 PM, Richard Jelinek rj@petamem.com wrote:
On Mon, Jun 08, 2015 at 09:15:39PM -0700, Xin Jin wrote:
Thanks for your information Hydriz.
If anyone has a copy of them, could I copy them please?
I think we might have something. Unfortunately it is on an archive server with poor upload capacity (from the server's perspective), so I'd like you to treat this as a last resort if no one else seems to have the data.
Wikipedia
-rw-r--r-- 1 root root 9.0G Dec 26 00:52 eng-20130102.xml.bz2
-rw-r--r-- 1 root root  11G Dec 31 10:07 eng-20141208.xml.bz2
-rw-r--r-- 1 root root  11G Jan 28 22:12 eng-20150112.xml.bz2
-rw-r--r-- 1 root root  11G Feb 24 06:58 eng-20150205.xml.bz2
-rw-r--r-- 1 root root  12G May 12 10:59 eng-20150403.xml.bz2
Wiktionary
-rw-r--r-- 1 root root 233M Dec 26 00:24 eng-20120426.xml.bz2
-rw-r--r-- 1 root root 419M Dec 26 00:24 eng-20140728.xml.bz2
-rw-r--r-- 1 root root 433M Jan 10 01:22 eng-20150102.xml.bz2
-rw-r--r-- 1 root root 440M Mar 11 01:33 eng-20150224.xml.bz2
-rw-r--r-- 1 root root 446M May 12 07:18 eng-20150413.xml.bz2
Wikinews
-rw-r--r-- 1 root root  35M Dec 26 00:22 eng-20121222.xml.bz2
-rw-r--r-- 1 root root  37M Dec 26 00:22 eng-20141119.xml.bz2
-rw-r--r-- 1 root root  37M Dec 31 04:10 eng-20141218.xml.bz2
-rw-r--r-- 1 root root  37M Feb 15 18:11 eng-20150214.xml.bz2
-rw-r--r-- 1 root root  37M May 12 04:38 eng-20150426.xml.bz2
etc.
Wikipedia still keeps the old data of the Wikipedia English Text snapshots? From this website http://dumps.wikimedia.org/enwiki/, it seems that they keep the English snapshot of Wikipedia for the last 10 months.
Does anyone know if they have the snapshot of the ones from 201304 to 201404?
regards,
Dipl.-Inf. Univ. Richard C. Jelinek
PetaMem GmbH - www.petamem.com        Managing Director: Richard Jelinek
Language Technology - We Mean IT!     Registered office: Fürth
2.58921 * 10^8 Mind Units             Register court: AG Fürth, HRB-9201
-- Xin Jin,
PhD Candidate, Computer Science Department, University of California, Santa Barbara