http://dumps.wikimedia.org/enwiki/20110115/
Hi, does anyone have plans to create individual torrents for "All pages with
complete page edit history (.bz2)"? I downloaded them, and it turns out that
several of the files seem to be corrupted. I am unable to re-download them, but
I think a torrent would be able to repair the corrupted parts. All of the
individual parts of the dump except the 1st, 8th, 9th, and 10th are complete.
I need these dumps because I will analyse revisions in the hope of better
identifying vandalism on the wikis through machine learning. However, I need
the database soon so that I can process it, as my assignment is due in about a
month.
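To confirm which parts are actually damaged before fetching them again, a
minimal sketch along these lines may help. It assumes Python 3 and uses only
the standard-library bz2 module; the function name bz2_is_intact and the file
name in the usage note are hypothetical, not part of any dump tooling.

```python
import bz2


def bz2_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole file at `path` decompresses without error.

    The file is read in chunks, so memory use stays small even for
    multi-gigabyte dump parts.
    """
    try:
        with bz2.open(path, "rb") as stream:
            # Discard the decompressed data; we only care whether
            # decompression reaches the end of the stream cleanly.
            while stream.read(chunk_size):
                pass
    except (OSError, EOFError, ValueError):
        # OSError: invalid or corrupted compressed data.
        # EOFError: the file ends before the end-of-stream marker.
        return False
    return True
```

Calling bz2_is_intact("part1.xml.bz2") on each downloaded part (the file name
here is only a placeholder) should single out the truncated or corrupted ones.
A full pass decompresses every byte, so expect it to take a while on the
complete-history files.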
I have done a small amount of testing, and the tests look good. Accordingly
I have started up one process to do dumps; please get your eyeballs on
them and let me know thumbs up or down. I'd like to start up the rest
of the processes by tomorrow at this time so if you can squeeze in some
time to look at them sooner rather than later that would be awesome.
Thanks!
Ariel
p.s. Yes this means I am done travelling for a while, thank goodness. I
think I am sick of airplanes. And *very* sick of jet lag.
Hello,
As you know, Chinese has two similar written forms, "Traditional Chinese"
and "Simplified Chinese", but it is hard to translate between them
correctly. I know the wiki can do this conversion properly. So why not
release a "Traditional Chinese" dump and a "Simplified Chinese" dump
separately, rather than one combined dump? This could save Chinese language
researchers a lot of time.
Thanks. Just a serious suggestion!
A little bit before the scheduled deployment of the 1.17 branch on our
production servers, I will be halting production of XML dumps.
Deployment is set for Tuesday Feb 8 at 07:00 UTC, so a few hours before
that I'll start shutting down processes.
This is a precautionary measure; after the deployment and any hasty
fixes that may be needed, I will be doing some testing to ensure that
dumps are not impacted, before we restart them. Barring some bizarre
problem, we should be back up and running within a day or two.
Ariel
Hello,
There seems to be a problem with the current jawiki dump. The complete history dump is only 4.3 GB, but the previous dump was 19 GB.
Another issue: According to http://wikitech.wikimedia.org/view/Dumps#Worker_nodes there should be 3 threads for the large dumps, but for the past few days only 2 threads have been running.
Best regards,
Andreas