Hi, I don't know if this is the correct forum for these questions. If not, please point me to the right place.
1 - There seems to be a problem with the .bz2 dump file Wikipedia provides. I tried to download it twice, and it aborted on an error after running for about 7 hours, at about 2 GB. Does anyone know how I can find out when the next dump will be available? Does Wikipedia provide a new dump every so many days?
2 - Is there anyone working with MWDumper.jar?
3 - Does the dump include disambiguation pages, categories, and the different types of lists? I have been trying to download it for two days, so I am curious to know what I'll find there.
Thanks Osnat
On 10/25/07, Osnat Etgar osnate@relegence.co.il wrote:
> Hi, I don't know if this is the correct forum for these questions. If not, please point me to the right place.
> 1 - There seems to be a problem with the .bz2 dump file Wikipedia provides. I tried to download it twice, and it aborted on an error after running for about 7 hours, at about 2 GB.
Does your download software support large files? Sounds like it doesn't.
> 1 - There seems to be a problem with the .bz2 dump file Wikipedia provides. I tried to download it twice, and it aborted on an error after running for about 7 hours, at about 2 GB.
Generally speaking, the answer to a problem like that is usually in the error message (although not always in a language human beings can understand, so if you don't understand the message, email it to this list and someone might be able to help).
> 1 - There seems to be a problem with the .bz2 dump file Wikipedia provides. I tried to download it twice, and it aborted on an error after running for about 7 hours, at about 2 GB.
The dump I pulled of enwiki from 2007-09-10 seems to work great here.
47483ed164a363f97595714afb929b0f  enwiki-latest-pages-articles.xml.bz2

$ du -sch enwiki-latest-pages-articles.xml.bz2
2.9G    enwiki-latest-pages-articles.xml.bz2
2.9G    total
> Does anyone know how I can find out when the next dump will be available?
Just use the latest one available. Do your needs really _require_ that your copy stay in lockstep with the current Wikipedia public data?
> Does Wikipedia provide a new dump every so many days?
By the looks of download.wikimedia.org, it's roughly every two weeks.
> 2 - Is there anyone working with MWDumper.jar?
I've been using mwdumper.jar exclusively for unpacking prior to importing into my db; it works great. My version is slightly patched to add table prefixes, plus some other MySQL tweaks to speed up imports.
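For reference, the basic shape of what I run is something like this (a sketch, not my exact patched setup; the dump filename, database name, and user are placeholders, and it assumes the target database and tables already exist):

java -jar mwdumper.jar --format=sql:1.5 enwiki-latest-pages-articles.xml.bz2 | mysql -u wikiuser -p wikidb

mwdumper reads the .bz2 directly, so there is no need to decompress the dump first.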
> 3 - Does the dump include disambiguation pages, categories, and the different types of lists? I have been trying to download it for two days, so I am curious to know what I'll find there.
Which dump are you trying to download? What filesystem are you saving the dump to? What tool are you using to retrieve that dump?
Osnat Etgar wrote:
> Hi, I don't know if this is the correct forum for these questions. If not, please point me to the right place.
> 1 - There seems to be a problem with the .bz2 dump file Wikipedia provides. I tried to download it twice, and it aborted on an error after running for about 7 hours, at about 2 GB. Does anyone know how I can find out when the next dump will be available? Does Wikipedia provide a new dump every so many days?
Are you downloading to a filesystem which allows files greater than 2 GB?
PS: Downloads can be resumed.
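For example, with wget (the URL here assumes the enwiki pages-articles dump; adjust it for whichever dump you actually want):

wget -c http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2

The -c flag continues a partial download instead of starting over.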
On 10/25/07, Platonides Platonides@gmail.com wrote:
> Are you downloading to a filesystem which allows files greater than 2 GB?
And, as others have mentioned, does your download agent allow files greater than 2 GB?
Thanks to all.
I am now trying to download it again with a download tool.
Wish me luck :)
I feel clueless about how to open these files. I downloaded a small .bz2 file to see how it works.
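From what I can tell so far, you can peek inside one of these without fully decompressing it (assuming the standard bzip2 command-line tools; the filename is just my test file):

bzcat test-sample.xml.bz2 | head -n 40

And bunzip2 -k test-sample.xml.bz2 decompresses to the plain .xml while keeping the compressed original.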
On the MediaWiki MWDumper page
http://www.mediawiki.org/wiki/MWDumper#Example_of_using_mwdumper_with_a_direct_connection_to_MySQL_on_WindowsXP
I tried to run the following commands:
set class=mwdumper.jar;mysql-connector-java-3.1.12/mysql-connector-java-3.1.12-bin.jar
set data="fullpath_to_myfile.bz2"
java -client -classpath %class% org.mediawiki.dumper.Dumper "--output=mysql://127.0.0.1/myWikipediaDatabase?user=<myusername>&password=<mypassword>" "--format=sql:1.5" %data%
The angle-bracketed values are the ones I replaced with my own.
Exception in thread "main" java.io.IOException: com.mysql.jdbc.Driver
at org.mediawiki.dumper.Dumper.connectMySql(Unknown Source)
at org.mediawiki.dumper.Dumper.openOutputFile(Unknown Source)
at org.mediawiki.dumper.Dumper.main(Unknown Source)
I installed mysql-connector-java-5.0.4.
What am I doing wrong?
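(Rereading my own commands, one thing I notice: the classpath above points at mysql-connector-java-3.1.12, but what I actually installed is 5.0.4, so the driver jar named in the classpath may not exist on disk at all. A sketch of the classpath matching the installed version, assuming the connector unpacked into a mysql-connector-java-5.0.4 directory next to mwdumper.jar:

set class=mwdumper.jar;mysql-connector-java-5.0.4/mysql-connector-java-5.0.4-bin.jar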
What is org.mediawiki.dumper.Dumper? Where do I get this file from? Am I supposed to have it after installing something?
I tried to run
java -server -classpath mysql-connector-java-3.1.11/mysql-connector-java-3.1.11-bin.jar:mwdumper.jar \
  org.mediawiki.dumper.Dumper --output=mysql://127.0.0.1/testwiki?user=wiki&password=wiki \
  --format=sql:1.4 20051020_pages_articles.xml.bz2
I don't have a Java server installed at the moment, so I tried to run without it.
I got:
Exception in thread "main" java.lang.NoClassDefFoundError: org/mediawiki/dumper/Dumper
Is this something to do with the server? If it's necessary I will install it.
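(A guess while I wait: the example I copied uses ':' between the classpath entries, which is the Unix separator; on Windows the separator is ';', so the whole string may be treated as a single nonexistent path, which would explain the NoClassDefFoundError for the Dumper class. A sketch with the Windows separator, keeping everything else from the example:

java -classpath "mysql-connector-java-3.1.11/mysql-connector-java-3.1.11-bin.jar;mwdumper.jar" org.mediawiki.dumper.Dumper "--output=mysql://127.0.0.1/testwiki?user=wiki&password=wiki" "--format=sql:1.4" 20051020_pages_articles.xml.bz2

And from what I've read, the -server flag only selects a JVM variant, so it shouldn't require installing anything extra.)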
Thanks a lot
Osnat