On Fri, 2 Apr 2004, Brion Vibber wrote:
> On Apr 2, 2004, at 00:35, Timwi wrote:
>> $sql = "SELECT cur_title as title from cur where cur_namespace=0";
>> This query sucks big time. Do you know what this does? This retrieves
>> the titles of ALL ARTICLES in Wikipedia.
> That's kinda the point, yeah. It might be better to skip redirects,
> though; otherwise they should be handled in some distinct way.
>> It seems that getPageData() retrieves the text of a page. In other
>> words, it performs yet another database query. And you're calling
>> that FOR EVERY ARTICLE in Wikipedia!
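The pattern being criticized is the classic "N+1 query" shape: one query for the titles, then one more query per article for its text. A minimal sketch of the difference, using Python's sqlite3 and a toy stand-in for the cur table (the schema and data here are illustrative, not MediaWiki's actual code):

```python
import sqlite3

# Toy stand-in for the cur table (hypothetical, mirroring the
# cur_namespace/cur_title columns mentioned above plus cur_text).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cur (cur_namespace INT, cur_title TEXT, cur_text TEXT)")
conn.executemany("INSERT INTO cur VALUES (0, ?, ?)",
                 [("Foo", "text of Foo"), ("Bar", "text of Bar")])

# N+1 pattern: fetch every title, then issue one extra query per article.
titles = [row[0] for row in conn.execute(
    "SELECT cur_title FROM cur WHERE cur_namespace=0")]
pages_slow = {}
for t in titles:
    pages_slow[t] = conn.execute(
        "SELECT cur_text FROM cur WHERE cur_namespace=0 AND cur_title=?",
        (t,)).fetchone()[0]

# Single streaming query: title and text arrive together in one pass,
# so the per-article round trip disappears.
pages_fast = dict(conn.execute(
    "SELECT cur_title, cur_text FROM cur WHERE cur_namespace=0"))

assert pages_slow == pages_fast
```

Both produce the same mapping; the difference is one query versus one-plus-N, which matters when N is every article in Wikipedia.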
Is it possible to leverage the already existing periodic database dump, for example by importing it into some machine not live on the web, and generating the necessary XML dumps and diffs there? If it only works on the cur table, it shouldn't even be too heavy on memory usage.
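Generating the XML offline could then be a simple streaming pass over the rows, never holding more than one article in memory. A rough sketch in Python (the element names and the (title, text) row shape are assumptions for illustration, not an existing MediaWiki format):

```python
import io
from xml.sax.saxutils import escape

def dump_to_xml(rows, out):
    """Stream (title, text) pairs as XML, one row at a time.

    rows: any iterable of (title, text) tuples, e.g. a cursor over an
    offline copy of the cur table. Only one row is in memory at once.
    """
    out.write("<pages>\n")
    for title, text in rows:
        # escape() handles &, < and > so arbitrary wikitext stays well-formed.
        out.write("  <page>\n")
        out.write("    <title>%s</title>\n" % escape(title))
        out.write("    <text>%s</text>\n" % escape(text))
        out.write("  </page>\n")
    out.write("</pages>\n")

buf = io.StringIO()
dump_to_xml([("Foo", "a & b")], buf)
```

Since the writer only ever sees one row at a time, the same function works whether the rows come from a live cursor or from replaying the periodic dump on an offline machine.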
Alfio