By the way, each version I need is the whole English Wikipedia dump, about
11 to 13 GB each.
For example, on the page for the 201411 data, it would be this file:
- enwiki-20141106-pages-articles-multistream.xml.bz2
<http://dumps.wikimedia.org/enwiki/20141106/enwiki-20141106-pages-articles-multistream.xml.bz2>
11.3 GB
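In case it helps anyone checking for other dump dates, here is a minimal Python sketch that rebuilds the link above from a dump date. It assumes other dates follow the same directory and file naming as the 20141106 link, which is not verified here, and older dates may no longer be hosted. The names BASE and dump_url are hypothetical and only for illustration.

```python
# Hypothetical helper: rebuilds the URL pattern of the 20141106 link above.
# Assumption: other dump dates use the same directory and file naming.
BASE = "http://dumps.wikimedia.org/enwiki"


def dump_url(date: str) -> str:
    """Return the multistream articles dump URL for a YYYYMMDD dump date."""
    if len(date) != 8 or not date.isdigit():
        raise ValueError("date must be 8 digits, e.g. '20141106'")
    name = f"enwiki-{date}-pages-articles-multistream.xml.bz2"
    return f"{BASE}/{date}/{name}"


if __name__ == "__main__":
    # Prints the candidate URL; it does not check that the file exists.
    print(dump_url("20141106"))
```

The function only builds a string, so whether a given date is still available has to be checked separately.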
Best,
Xin
On Tue, Jun 9, 2015 at 10:48 AM, Xin Jin <xin.jin1020(a)gmail.com> wrote:
Hi Richard,
Thanks very much for your fast reply!
The ones I need are the 13 versions from 201304 to 201404. May I ask whether
you have all 13 versions?
Also, I understand your network is slow, so we can treat yours as a backup
plan. I just want to confirm that you have the 13 versions, right?
In the meantime, I will wait to see if others have the 13 versions :)
Thanks!
Best,
Xin
On Mon, Jun 8, 2015 at 11:10 PM, Richard Jelinek <rj(a)petamem.com> wrote:
On Mon, Jun 08, 2015 at 09:15:39PM -0700, Xin Jin wrote:
Thanks for your information, Hydriz.
If anyone has a copy of them, could I copy them please?
I think we might have something. Unfortunately it is on an archive
server with poor upload capacity (from the server's perspective), so I'd
like you to see this as a last resort if no one else seems to have
the data.
Wikipedia
-rw-r--r-- 1 root root 9.0G Dec 26 00:52 eng-20130102.xml.bz2
-rw-r--r-- 1 root root 11G Dec 31 10:07 eng-20141208.xml.bz2
-rw-r--r-- 1 root root 11G Jan 28 22:12 eng-20150112.xml.bz2
-rw-r--r-- 1 root root 11G Feb 24 06:58 eng-20150205.xml.bz2
-rw-r--r-- 1 root root 12G May 12 10:59 eng-20150403.xml.bz2
Wiktionary
-rw-r--r-- 1 root root 233M Dec 26 00:24 eng-20120426.xml.bz2
-rw-r--r-- 1 root root 419M Dec 26 00:24 eng-20140728.xml.bz2
-rw-r--r-- 1 root root 433M Jan 10 01:22 eng-20150102.xml.bz2
-rw-r--r-- 1 root root 440M Mar 11 01:33 eng-20150224.xml.bz2
-rw-r--r-- 1 root root 446M May 12 07:18 eng-20150413.xml.bz2
Wikinews
-rw-r--r-- 1 root root 35M Dec 26 00:22 eng-20121222.xml.bz2
-rw-r--r-- 1 root root 37M Dec 26 00:22 eng-20141119.xml.bz2
-rw-r--r-- 1 root root 37M Dec 31 04:10 eng-20141218.xml.bz2
-rw-r--r-- 1 root root 37M Feb 15 18:11 eng-20150214.xml.bz2
-rw-r--r-- 1 root root 37M May 12 04:38 eng-20150426.xml.bz2
etc.
Wikipedia still keeps the old data of the Wikipedia English Text
seems that they keep the English snapshot of Wikipedia for the last 10
months.
Does anyone know if they have the snapshot of the ones from 201304
to 201404?
regards,
--
Dipl.-Inf. Univ. Richard C. Jelinek
PetaMem GmbH - www.petamem.com       Managing Director: Richard Jelinek
Language Technology - We Mean IT!    Registered office: Fürth
2.58921 * 10^8 Mind Units            Register court: AG Fürth, HRB-9201
--
Xin Jin,
PhD Candidate,
Computer Science Department,
University of California, Santa Barbara