Hello.
A few hours ago, a new public repository was created to host WikiXRay database dumps, containing information extracted from the public Wikipedia database dumps. The mirror is hosted by RedIRIS (in short, the Spanish equivalent of Kennisnet in the Netherlands).
http://sunsite.rediris.es/mirror/WKP_research
ftp://ftp.rediris.es/mirror/WKP_research
These new dumps are intended to save other researchers time and effort, since they won't need to parse the complete XML dumps to extract the relevant activity metadata. We used mysqldump to create the dumps from our databases.
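For reference, each dump was produced with an invocation roughly like the following (the database and file names here are placeholders, not the actual names on our servers):
$> mysqldump -u user -p wkx_enwiki > enwiki_research.sql   # hypothetical database/file names
$> gzip enwiki_research.sql                                # compress for distribution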
As of today, only some of the biggest Wikipedias are available. However, over the following days the full set of available languages will be ready for download. The files will be updated regularly.
The procedure is as follows:
1. Find the research dump you are interested in, then download and decompress it on your local system.
2. Create a local DB to import the information.
3. Load the dump file using a MySQL user with INSERT privileges (-p will prompt for the password; if you pass it inline, it must follow -p with no space):
$> mysql -u user -p myDB < dumpfile.sql
And you're done.
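Putting the three steps together, a typical session might look like this (the file name is illustrative; pick the actual dump file from the mirror):
$> wget ftp://ftp.rediris.es/mirror/WKP_research/enwiki_research.sql.gz   # hypothetical file name
$> gunzip enwiki_research.sql.gz                                          # step 1: decompress
$> mysql -u user -p -e "CREATE DATABASE myDB"                             # step 2: create the local DB
$> mysql -u user -p myDB < enwiki_research.sql                            # step 3: load the dump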
One final warning: three fields in the revision table are not reliable yet:
rev_num_inlinks, rev_num_outlinks, rev_num_trans
All remaining fields are reliable (in particular rev_len, rev_num_words, and so on).
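For example, per-page activity can be aggregated using only the reliable fields. A minimal query sketch (the rev_page column is assumed from the standard MediaWiki revision schema; check the actual table definition in your dump):
$> mysql -u user -p myDB -e "SELECT rev_page, COUNT(*) AS num_revs, AVG(rev_len) AS avg_len, AVG(rev_num_words) AS avg_words FROM revision GROUP BY rev_page"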
Regards,
Felipe.
Dear Felipe,
Is there a newer version of the English dump?
bilal
On Tue, Jun 23, 2009 at 12:26 PM, Felipe Ortega glimmer_phoenix@yahoo.es wrote:
Hi Felipe,
Thank you for your work.
As far as I understand from the project page, the WikiXRay dumps contain only page revisions, not other logged actions such as protections, user deletions, etc. Am I right about that?
Best regards,
Marc