On Sun, Jan 10, 2016 at 4:05 PM, Bernardo Sulzbach < mafagafogigante@gmail.com> wrote:
On Sun, Jan 10, 2016 at 9:55 PM, Neil Harris neil@tonal.clara.co.uk wrote:
Hello! I've noticed that no enwiki dump seems to have been generated so far this month. Is this by design, or has there been some sort of dump failure? Does anyone know when the next enwiki dump might happen?
I would also be interested.
-- Bernardo Sulzbach
CCing the Xmldatadumps mailing list https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l, where someone has already posted https://lists.wikimedia.org/pipermail/xmldatadumps-l/2016-January/001214.html about what might be the same issue.
yep, same here!
Also, another question about the consistency of IDs over time. I was working with an older Wikipedia dump, testing some data models I built on it, using a few topics as pivots. I might have corrupted data on my side, but just to be sure: are article IDs *persistent* over time, or are they subject to change?
Could it happen that, due to a rollback or merge in an article's history, the ID changes? E.g. the test article "Mars" would first point to _ID "4285430" and then change to "14640471".
I need to ensure the IDs will persist. Thank you!
Basically, the xml dumps have 2 IDs: page_id and revision_id.
The page_id points to the article. In this case, 14640471 is the page_id for Mars (https://en.wikipedia.org/wiki/Mars).
The revision_id points to the latest revision for the article. For Mars, the latest revision_id is 699008434 which was generated on 2016-01-09 ( https://en.wikipedia.org/w/index.php?title=Mars&oldid=699008434). Note that a revision_id is generated every time a page is edited.
So, to answer your question, the IDs never change. 14640471 will always point to Mars, while 699008434 points to the 2016-01-09 revision for Mars.
That said, different dumps will have different revision_ids, because an article may be updated. If Mars gets updated tomorrow, and the English Wikipedia dump is generated afterwards, then that dump will list Mars with a new revision_id (something higher than 699008434). However, that dump will still show Mars with a page_id of 14640471. You're probably better off using the page_id.
Finally, you can also reference the Wikimedia API to get a view similar to the dump's. For example: https://en.wikipedia.org/w/api.php?action=query&prop=revisions&title...
Hope this helps.
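To make the distinction concrete, here is a quick command-line check (a minimal sketch, not from the original mail; the page_id 14640471 for Mars is the one mentioned above, the rest are standard query-module parameters):

$ # Ask for the latest revision id and timestamp of page_id 14640471 (Mars).
$ # The page_id stays constant; the returned revid changes with every edit.
$ curl -s "https://en.wikipedia.org/w/api.php?action=query&prop=revisions&pageids=14640471&rvprop=ids|timestamp&format=json"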
Oh I see, I may have used a revision ID by mistake then. I'm interested in fetching the latest HTML version of an article by page_id, not in accessing a particular revision. I thought that querying a revision must be expensive: could you please tell me whether the following query is OK for my purpose, or whether it could be made less costly?
I am currently using these parameters:
action: 'query',
prop: 'revisions',
rvprop: 'content',
rvparse: 1,
redirects: 'true'
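For reference, those parameters correspond roughly to a request like the one below (a sketch, assuming the English Wikipedia endpoint and the Mars page_id from earlier in the thread; rvparse asks the API to return the revision content already parsed to HTML):

$ curl -s "https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&rvparse=1&redirects=1&pageids=14640471&format=json"

A possibly cheaper route to "latest HTML by page_id" (an assumption on my part, not something confirmed in this thread) is the parse module, which returns the rendered page directly:

$ curl -s "https://en.wikipedia.org/w/api.php?action=parse&pageid=14640471&prop=text&format=json"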
I have problems bunzip2ing pages-articles files. WinRAR fails at 37 GB, and bunzip2 fails somewhere past 14 GB, though it "helpfully" cleans up after itself.
Bunzip2 v 1.0.6
bunzip2 enwiki-20151201-pages-articles.xml.bz2
bunzip2: I/O or other error, bailing out. Possible reason follows.
bunzip2: Permission denied
Input file = enwiki-20151201-pages-articles.xml.bz2, output file = enwiki-20151201-pages-articles.xml
bunzip2: Deleting output file enwiki-20151201-pages-articles.xml, if it exists.
Any better tool?
Have you tried 7zip?
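If the command-line version is installed, something like this should do it (a sketch; I haven't tested it against this particular file):

$ 7z x enwiki-20151201-pages-articles.xml.bz2   # extracts enwiki-20151201-pages-articles.xml into the current directory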
Hello, Richard.
I've had some problems with the December dumps myself. I can't guarantee it for you, but if you download and test the September dump, it should work. Good luck. It's not about the tool; the problem seems to be the dump file.
Thanks for the responses; they've given me a few things to try.
Various people wrote...
On 16/01/16 02:30, Richard Farmbrough wrote:
I have problems bunzip2ing pages-articles files. (...) Any better tool?
a) Did you start by verifying the checksum of the downloaded file?
b) The "Permission denied" message looks like a filesystem problem, and not one of the file. What fs are you using? I would guess you did that on a directory you didn't have write access to, but then it wouldn't process 14G.
c) There are some tricks to avoid the cleanup (like using bzcat), and there's also bzip2recover, but if the original file is damaged, there's no point in attempting to recover, when a new one can be produced.
I'll check if that file uncompresses for me.
Best regards
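For anyone hitting the same wall, a minimal sketch of points a) and c) above (the file name is the one from Richard's report; the published md5/sha1 sums are on the dump's download page):

$ md5sum enwiki-20151201-pages-articles.xml.bz2
# a) compare the result against the checksum published on the dump page

$ bzip2 -tv enwiki-20151201-pages-articles.xml.bz2
# test the archive's integrity without writing any output file

$ bzcat enwiki-20151201-pages-articles.xml.bz2 > enwiki-20151201-pages-articles.xml
# c) stream-decompress; unlike bunzip2, a failure here leaves whatever was already written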
On 16/01/16 02:44, Platonides wrote:
(...) I'll check if that file uncompresses for me.
I downloaded the file enwiki-20160113-pages-articles.xml.bz2 (see hash below), and it decompressed without errors:
$ time sha256sum enwiki-20160113-pages-articles.xml.bz2
560537c3c41397856c7108287de2a8f917ad8ee2586d1d5e43a0edd4c5bc28d5  enwiki-20160113-pages-articles.xml.bz2

$ time bzip2 -kd enwiki-20160113-pages-articles.xml.bz2
real    26m48.517s
user    23m5.264s
sys     0m33.960s

$ tail enwiki-20160113-pages-articles.xml; echo
[[:Category:Articles created via the Article Wizard]]
http://www.npg.org.uk/collections/search/use-this-image.php?mkey=mw87648
http://www.npg.org.uk/collections/search/use-this-image.php?email=Jonathanai...</text>
      <sha1>ejicf3kiesjaya50u6fwwziz5incohv</sha1>
    </revision>
  </page>
</mediawiki>
I can provide partial hashes, or an rdiff(1) patch for whatever file you ended up with, so you don't have to redownload the full 12 GB again.
Best regards
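In case the rdiff(1) route is useful to anyone: with librsync's rdiff the exchange would go roughly like this (a sketch with made-up file names; the person with the bad copy sends a signature, gets back a small delta, and applies it locally):

$ rdiff signature broken-enwiki.xml.bz2 broken.sig
# run by the person with the damaged download; broken.sig is small enough to send

$ rdiff delta broken.sig good-enwiki.xml.bz2 fix.delta
# run by whoever holds a good copy; fix.delta contains only the differing blocks

$ rdiff patch broken-enwiki.xml.bz2 fix.delta repaired-enwiki.xml.bz2
# applied by the downloader to reconstruct the good file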
Platonides, we were both talking about 20151201 and you tested 20160113, am I correct?
I said I reproduced the mentioned problem with that file, not that all files were problematic.
Whoops. Sorry :(
It seems I didn't notice and just clicked to download the latest file. Excuse my dumbness.
Still… may I suggest using the newer file as a solution? :)
Regards
Sure, I did suggest September, a dump I never had problems with. However, I did not test the latest English dump myself.
However, if _anyone could confirm the issue with the English December dump_, could we **remove it from the page**?
This **matters** for developers who rely on these dumps, and it would reduce server load (if the developer has not given up, they will download another dump). This is an honest request.
I did not have problems unzipping the file enwiki-20151201-pages-articles.xml.bz2.
This was done last month, on Linux, on my home computer. My program has run over the file multiple times and looks at each entry, so there are no obvious signs of corruption.
Bryan
OK. I will assume I was unlucky enough to get a corrupted file both times I downloaded it.
Did you verify the checksum of the downloaded file?
Tim
I strongly believe so, but I didn't keep the files, so I can't check now to confirm. Also, that Linux installation is gone.
I have seen this problem [with 20151201] occur three times: once on this list (Richard F.), once on my local machine (Bernardo S.), and once in an issue tracker (was it JIRA?). So I **strongly** believe there is something wrong with that specific file.
However, I wouldn't be surprised if it were an issue with all three of these machines.
If you could decompress it correctly (20151201), don't mind my report. I just would want it removed if it was confirmed to be a problematic file.
The file that I downloaded from dumps.wikimedia.org (208.80.154.11) earlier today was uncompressed by bunzip2 without any problem. Below are the sizes and sha256 checksum.
HTH and regards, GG
--- 8< ---
dumps.wikimedia.org is an alias for dataset1001.wikimedia.org.
dataset1001.wikimedia.org has address 208.80.154.11
dataset1001.wikimedia.org has IPv6 address 2620:0:861:1:208:80:154:11

12224724 -rw-rw-r-- 1 user user 12518110596 Jan 23 11:55 enwiki-20151201-pages-articles.xml.bz2

$ sha256sum enwiki-20151201-pages-articles.xml.bz2
02547de56314c5243f834605214d99560f05b1aad787e4db25f804a861946a86  enwiki-20151201-pages-articles.xml.bz2

$ bunzip2 -v enwiki-20151201-pages-articles.xml.bz2
  enwiki-20151201-pages-articles.xml.bz2: done

$ ls
...
54204496 -rw-rw-r-- 1 user user 55505383577 Jan 23 11:55 enwiki-20151201-pages-articles.xml

$ tail -n 10 enwiki-20151201-pages-articles.xml
[[Category:Grade II* listed buildings in Hertfordshire]]
[[Category:Grade II* listed houses]]
[[Category:Country houses in Hertfordshire]]
[[Category:Works of Edwin Lutyens]]
[[Category:Arts and Crafts gardens]]
[[Category:Gardens in Hertfordshire]]</text>
      <sha1>cpk3vrjwvzk1r1lnllemz4tmd2ks2ly</sha1>
    </revision>
  </page>
</mediawiki>
--- >8 ---
Hi G. Gonter,
Apparently there was an issue with the multistream one (although I am quite sure that I had problems with the "other one" myself, twice).
You are the second (maybe third) person to report that they could decompress it OK, so it possibly was a problem with my machine. I don't even have any logs there to see which of the files was the problem (the relevant volumes have been formatted and I won't bother trying to recover that).
Thanks for your efforts on this; for more, please see https://phabricator.wikimedia.org/T121348
Bernardo Sulzbach, 20/01/2016 19:10:
If you could decompress it correctly (20151201), don't mind my report. I just would want it removed if it was confirmed to be a problematic file.
This will soon be moot, as new dumps are being generated: https://dumps.wikimedia.org/enwiki/20160113/
The issue was identified and fixed (https://phabricator.wikimedia.org/T121348), so it's expected that the new dump will be OK.
Nemo
P.s.: Going back to the original subject line, as this is not about bz2 tools.
In fact the multistream dump from December was regenerated and put in place today with new md5 and sha1 sums, based on the new code. Enjoy!
Ariel
Thank you all very much for the good work.