MySQL has a facility for distributing databases: replication. It sounds as though you all are trying to solve at the application level what is really a data distribution, caching and integrity problem, and that is exactly the kind of problem databases are built to solve.
If the wiki database is replicated out, clients of that replica can do the work for Yahoo or whatever else they want.
MySQL handles all this "changed since" checking and such. It also does it more efficiently than a PHP script. So, what problem are you all trying to solve?
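For what it's worth, the slave-side setup is only a few statements. This is just a sketch (the host name, account name and password below are placeholders, and the master additionally needs binary logging enabled in its my.cnf):

-- on the master: create an account the slave can replicate through
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'slave.example.org' IDENTIFIED BY 'secret';

-- on the slave: point it at the master and start replicating
CHANGE MASTER TO
    MASTER_HOST = 'master.example.org',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret';
START SLAVE;

-- from here on the slave tracks every change to cur and old on its own,
-- and a feed generator can read from it without touching the live server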
- ray
On Apr 2, 2004, at 1:19 AM, Alfio Puglisi wrote:
On Fri, 2 Apr 2004, Brion Vibber wrote:
On Apr 2, 2004, at 00:35, Timwi wrote:
$sql = "SELECT cur_title as title from cur where cur_namespace=0";
This query sucks big time.
Do you know what this does? This retrieves the titles of ALL ARTICLES in Wikipedia.
That's kinda the point, yeah. It might be better to skip redirects, though; otherwise they should be handled in some distinct way.
It seems that getPageData() retrieves the text of a page. In other words, it performs yet another database query. And you're calling that FOR EVERY ARTICLE in Wikipedia!
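One way to avoid the per-page lookups would be to pull the text along with the titles in a single pass, and skip redirects right in the WHERE clause. Rough sketch, untested: it uses plain mysql_* calls rather than whatever wrapper the script actually goes through, and writeXmlEntry() is just a made-up stand-in for the existing output code.

// assumes the script has already opened its MySQL connection
$sql = "SELECT cur_title, cur_text, cur_timestamp
          FROM cur
         WHERE cur_namespace = 0
           AND cur_is_redirect = 0";
$res = mysql_query( $sql );
while ( $row = mysql_fetch_object( $res ) ) {
    // writeXmlEntry() is hypothetical: whatever the script currently
    // does with getPageData()'s result goes here
    writeXmlEntry( $row->cur_title, $row->cur_text, $row->cur_timestamp );
}

That's one query and one result set for the whole namespace, instead of one extra query per article.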
Is it possible to leverage the already existing periodic database dump, for example by importing it into some machine that isn't live on the web and generating the necessary XML dumps and diffs there? If it only works on the cur table, it's not even that heavy on memory usage.
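Something along these lines on the offline box, I imagine (the database and file names here are made up, I haven't checked what the dump downloads are actually called):

-- from the mysql client prompt on the offline machine
CREATE DATABASE wikidump;
USE wikidump;
-- load the periodic cur-table dump (file name is a placeholder)
SOURCE cur_table.sql
-- then point the export script at this copy instead of the live servers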
Alfio