Jason Richey wrote:
> Someone would look at it (I attached it) and say "this sucks because..."
OK, since you asked for it... :)
$sql = "SELECT cur_title as title from cur where cur_namespace=0";
This query sucks big time.
Do you know what this does? This retrieves the titles of ALL ARTICLES in Wikipedia. Do you know how many there are? ...
The Main Page states 239,180, and that's counting only the articles that meet certain criteria...
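Incidentally, you can ask the database directly how many rows that is. A minimal sketch, assuming the script's existing mysql connection is still open:

$res = mysql_query("SELECT COUNT(*) AS n FROM cur WHERE cur_namespace=0");
$row = mysql_fetch_object($res);
echo "Articles in the main namespace: {$row->n}\n";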
$data = getPageData($s->title);
It seems that getPageData() retrieves the text of a page. In other words, it performs yet another database query. And you're calling that FOR EVERY ARTICLE in Wikipedia!
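If you really do need the text of every article, then at the very least fetch it in the same query instead of issuing one extra query per row. A rough sketch of what I mean, assuming the standard cur schema (cur_text holds the article text) and the script's existing connection; getPageData() disappears entirely, and writeArticle() is a made-up placeholder for whatever output the script produces:

$sql = "SELECT cur_title, cur_text FROM cur WHERE cur_namespace=0";
// Unbuffered, so PHP doesn't try to hold a quarter of a million rows in memory
$res = mysql_unbuffered_query($sql);
while ($s = mysql_fetch_object($res)) {
    writeArticle($s->cur_title, $s->cur_text); // hypothetical output helper
}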
I'm afraid I don't understand the purpose of the script. It seems to me that it is generating one ridiculously huge file that contains all of Wikipedia. What use would such a file be to anyone, even Yahoo?
I stress that I don't really understand the purpose of the script, nor do I know exactly what Yahoo!'s (or anyone else's) requirements are, but it would seem far more sensible to me to generate several smaller files, each containing maybe at most 100 articles or at most 1 MB of data or so. Each file would also record the cur_ids of the articles in it, so that on each update you can easily check, per file, whether any of its articles have changed since the last run and regenerate only those files.
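For concreteness, a rough sketch of that approach; the chunk size, the file naming and the formatArticle() helper are all made up, and cur_timestamp (the time of the last edit) is what a subsequent run would compare against:

$chunkSize = 100; // made-up limit; could equally be a byte limit per file
$res = mysql_unbuffered_query(
    "SELECT cur_id, cur_title, cur_text, cur_timestamp
     FROM cur WHERE cur_namespace=0 ORDER BY cur_id");
$n = 0; $chunk = 0; $fp = null;
while ($s = mysql_fetch_object($res)) {
    if ($n % $chunkSize == 0) {  // start a new file every $chunkSize articles
        if ($fp) fclose($fp);
        $fp = fopen(sprintf("wiki-%05d.txt", ++$chunk), "w"); // hypothetical naming
    }
    // record cur_id and cur_timestamp in the file so the next run can
    // tell whether anything in this chunk has changed since last time
    fwrite($fp, formatArticle($s)); // formatArticle() is whatever format Yahoo! wants
    $n++;
}
if ($fp) fclose($fp);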
Of course, that's just a suggestion.
Greetings, Timwi