Hi!
I'd like to do some statistics based on the database dumps, so I'd need
a lot more disk quota (50GB would be fine, or at least 2GB for the
cur-dumps :-). It's better to mirror the dump files in only one place, so I
created the directory
/u01/u/voj/dumps
that is also accessible via
http://tools.wikimedia.de/~voj/dumps/
Unfortunately I have not been able to add a header and footer to the
directory listing. I made .htaccess and readme readable and writable
for the whole users group, so maybe someone can help? We can also move the
dumps directory to another location.
Since you can generate a huge amount of data when analysing the dumps, I
wrote a lot of scripts to pipe the data from one task to another. Here
is a first example:
/u01/u/voj/dumps/tools/catlinks.pl wp eo 20051009 | wc -l
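For anyone who wants to add a step of their own to such a pipeline, here is
a minimal Python sketch of a streaming filter. It assumes the upstream script
emits one tab-separated record per line, which may not match what catlinks.pl
really prints; the function names count_first_column and main are made up
for this illustration.

```python
from collections import Counter


def count_first_column(lines):
    """Count how often each value occurs in the first tab-separated column.

    Assumption: one record per line, fields separated by tabs.
    Lines are consumed one at a time, so the input is never held
    in memory as a whole; only the per-key counters are.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key = line.split("\t", 1)[0]
        counts[key] += 1
    return counts


def main(stdin, stdout):
    """Read records from stdin, write 'key<TAB>count' lines to stdout."""
    for key, n in count_first_column(stdin).most_common():
        stdout.write(f"{key}\t{n}\n")
```

Calling main(sys.stdin, sys.stdout) at the end of the script (after an
"import sys") would let it sit between two other commands in a pipe.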
By the way, it's much faster to bulk import tab-separated data into
a MySQL database than to use INSERT statements.
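A rough sketch of how such tab-separated data could be produced from Python,
in case it helps. It targets the default LOAD DATA format of MySQL (fields
separated by tabs, lines ended by a newline, backslash as escape character,
\N for NULL). The helper names tsv_field, tsv_line and write_tsv are
hypothetical and not part of the scripts mentioned above.

```python
def tsv_field(value):
    """Encode one value for MySQL's default LOAD DATA text format."""
    if value is None:
        return r"\N"  # LOAD DATA reads \N as NULL
    text = str(value)
    # Escape the backslash first, then the characters that would
    # otherwise be read as field or line separators.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
    )


def tsv_line(row):
    """Join the encoded fields of one row into a single output line."""
    return "\t".join(tsv_field(v) for v in row) + "\n"


def write_tsv(rows, out):
    """Write all rows to an already opened, writable text stream."""
    for row in rows:
        out.write(tsv_line(row))
```

The resulting file can then be imported with a single statement such as
LOAD DATA LOCAL INFILE 'catlinks.tsv' INTO TABLE catlinks;
(file and table name are only placeholders here).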
Greetings,
Jakob
I'd like to start the Wikipedia Signpost RSS feed [1] back up. It's
written in Python, runs once a day, and requires three modules: PyRSS2Gen,
BeautifulSoup, and Scrape 'N' Feed.
Will simply posting a request to this list be the standard way for handling
these things in the future?
[1] http://en.wikipedia.org/wiki/User:Alterego/signpost.py
--
Cheers,
Brian Mingus
AIM, Skype: BReflection
Google talk: reflection(a)gmail.com
Phone: (720) 771-2599