The MediaWiki Core team has opened a discussion about getting more involved
in, and perhaps redoing, the dumps infrastructure. A good starting point is
to understand how folks already use the dumps, or want to use them but
can't, and some questions about that are listed here:
https://www.mediawiki.org/wiki/Wikimedia_MediaWiki_Core_Team/Backlog/Improv…
I've added some notes, but please go weigh in. Don't be shy about what you
do and what you need; this is the time to get it all on the table.
Ariel
Hi,
> I got stuck with an open source project that calls for enwiki-latest-pages-articles.xml.bz2,
> while I only have enwiki-latest-pages-articles-multistream.xml.bz2. My network connection is
> too poor to download another large file, so I wondered: what is the difference between these
> two files? I have read the descriptions at https://dumps.wikimedia.org/ ; however, I am
> confused by the phrase 'in multiple bz2 streams, 100 pages per stream'. Could
> anyone explain it to me? Thanks!
This file contains multiple bz2 streams - that is, it is actually a
concatenation of several independently bz2-compressed files. The file
enwiki-latest-pages-articles-multistream-index.txt.bz2 contains the byte
offsets of the individual streams within the big multistream file, so you
can seek to a stream and decompress just that piece without reading the
whole dump. Just make sure you have both files from the same dump
version/date.
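For illustration, here is a minimal Python sketch of one way to use the
index to pull a single stream out of the multistream file (this assumes the
index has been decompressed to a plain .txt file first; the file names
follow the enwiki examples above):

    import bz2

    DUMP = "enwiki-latest-pages-articles-multistream.xml.bz2"
    INDEX = "enwiki-latest-pages-articles-multistream-index.txt"  # decompressed

    # Each index line reads "offset:page_id:page_title"; many lines share
    # an offset, since each stream holds roughly 100 pages. Collect the
    # distinct offsets in order.
    offsets = []
    with open(INDEX, encoding="utf-8") as f:
        for line in f:
            off = int(line.split(":", 1)[0])
            if not offsets or off != offsets[-1]:
                offsets.append(off)

    # Slice out the first page stream; each stream is a complete bz2 file
    # in its own right, so it can be decompressed independently.
    start, end = offsets[0], offsets[1]
    with open(DUMP, "rb") as f:
        f.seek(start)
        xml_fragment = bz2.decompress(f.read(end - start)).decode("utf-8")

    print(xml_fragment[:500])

Also worth noting: any decompressor that handles concatenated bz2 streams
(the bzip2 command-line tool does, as does Python's bz2 module since 3.3)
can read the multistream file sequentially just like the regular one, so it
may well work as a drop-in replacement for enwiki-latest-pages-articles.xml.bz2.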
Best,
Marcin Osowski
Hi there,
I got stuck with an open source project that calls for enwiki-latest-pages-articles.xml.bz2, while I only have enwiki-latest-pages-articles-multistream.xml.bz2. My network connection is too poor to download another large file, so I wondered: what is the difference between these two files? I have read the descriptions at https://dumps.wikimedia.org/ ; however, I am confused by the phrase 'in multiple bz2 streams, 100 pages per stream'. Could anyone explain it to me? Thanks!
Dear all,
Wikivoyage data dumps used to be generated every 2 weeks, but since July
2014 the frequency has dropped dramatically. The most recent dump is now
42 days old.
This is disappointing: some users spend a lot of time improving content in
preparation for their next trip, hoping to use that content with offline
Wikivoyage browsers like Kiwix, only to find that the available data is
badly outdated.
The Wikivoyage community loves the data dumps and reuses them in many
applications, for instance offline guides, GPS navigation apps, interactive
maps, and data validation.
Could the data dumps please be generated about every 2 weeks like before?
Thanks a lot for your consideration!
Nicolas Raoul
https://dumps.wikimedia.org/enwikivoyage
Hello,
the latest daily dump was created at 28-Jan-2015 01:40.
No new dumps have appeared on http://dumps.wikimedia.org/other/incr/ since then.
Is this a known issue?
Sincerely,
Ivan A. Krestinin
Hi,
we are basically mirroring all the generated dumps, extracting them,
harvesting data, etc. Lately I came to examine some of the more exotic
languages, and to my surprise they were even more exotic than I
thought. I propose to ditch them.
Afar (aa) Wikipedia
the latest on our servers is aar-20141223.xml.bz2, at 22974 bytes
(we convert language codes to ISO 639-3)
It seems the wiki has been closed or moved into the incubator:
http://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Af…
Nevertheless, this wiki keeps showing up in the xmldumps, pretending
something is there. I believe we would all be better off if dumps of it
ceased.
---
Basically the same applies to the Ndonga Wikipedia:
http://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Nd…
But the xmldumps keep pouring in:
ndo-20141223.xml.bz2
etc. The same story holds for several other Wikimedia projects in other
languages.
So, in general: could we stop dumping closed projects?
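As an illustration only (not how the dump scripts actually work), here is a
minimal Python sketch of how closed wikis could be detected: Meta-Wiki's
sitematrix API marks closed wikis with a "closed" flag, so a dump scheduler
could in principle consult it and skip them.

    import json
    import urllib.request

    # Ask Meta-Wiki's sitematrix API for the full list of wikis; closed
    # ones carry a "closed" flag in the response.
    URL = ("https://meta.wikimedia.org/w/api.php"
           "?action=sitematrix&format=json")

    with urllib.request.urlopen(URL) as resp:
        matrix = json.load(resp)["sitematrix"]

    closed = []
    for key, group in matrix.items():
        if key in ("count", "specials"):  # bookkeeping keys, not language groups
            continue
        for site in group.get("site", []):
            if "closed" in site:
                closed.append(site["dbname"])

    print(len(closed), "closed wikis, e.g.", sorted(closed)[:5])

A dump run could consult such a list and simply not schedule the closed
wikis.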
kind regards,
--
Dipl.-Inf. Univ. Richard C. Jelinek
PetaMem GmbH - www.petamem.com        Managing Director: Richard Jelinek
Language Technology - We Mean IT!     Registered office: Fürth
2.58921 * 10^8 Mind Units             Commercial register: AG Fürth, HRB-9201