2006/9/7, Platonides platonides@gmail.com:
What about doing it locally with a dump? It seems much more efficient to me.
Good idea, but I think that dump should be placed outside my account: 1. other users could use it for tasks that don't require complex SQL queries; 2. I have a 256 MB disk quota, while the ruwiki dump is about 400 MB.
2006/9/7, Gregory Maxwell gmaxwell@gmail.com
Are you talking about a query that will be run once, or a query that will be executed from a CGI script?
No, it will be run manually (or via cron, once per day).
select page_namespace, page_title from page; on ruwiki_p takes under a second... I wouldn't call that a long query.
Not all rows of the result are fetched right after the query is executed. The normal 'mysql' client receives all rows, prints them, and exits. My application needs, after fetching each row of the result, to:
1. run one more SQL query to fetch the page text:
   SELECT old_text, old_flags FROM text
   WHERE old_id = (SELECT rev_text_id FROM revision WHERE rev_id = ?)
   (where '?' is page_latest from the first query);
2. uncompress the text if 'gzip' appears in old_flags;
3. analyze the text (that's fast, we can ignore this step).
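The per-row lookup and decompression described above can be sketched roughly as follows. This is a hypothetical illustration, not code from the mail: the query string mirrors the one quoted above, and the decompression assumes MediaWiki's 'gzip' flag means PHP's gzdeflate(), i.e. a raw DEFLATE stream without a zlib/gzip header, which is why a negative window size is passed to zlib.decompress().

```python
import zlib

# Per-row query from the mail; the placeholder is filled with page_latest
# taken from the initial page listing.
TEXT_QUERY = """
SELECT old_text, old_flags FROM text
WHERE old_id = (SELECT rev_text_id FROM revision WHERE rev_id = %s)
"""

def maybe_decompress(old_text: bytes, old_flags: str) -> bytes:
    """Return the page text, undoing the 'gzip' flag if present.

    old_flags is a comma-separated list such as "utf-8,gzip".
    Assumption: 'gzip' here denotes raw DEFLATE data (PHP gzdeflate),
    so we decompress with a negative window size.
    """
    if "gzip" in old_flags.split(","):
        return zlib.decompress(old_text, -zlib.MAX_WBITS)
    return old_text
```

In the real script each row fetched from the first query would be fed through TEXT_QUERY (via a second cursor) and then through maybe_decompress() before the analysis step.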
As you can see, there is a small pause between fetching consecutive rows of the first query's result. If this pause is only 0.05 seconds, the first query will finish only after ~83 minutes (for the 100,000 articles of ruwiki). For all that time the first query will show up as still in progress, even though it is not actually consuming resources.
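The ~83-minute estimate is just the per-row pause multiplied by the row count; a quick check of the arithmetic (figures taken from the mail):

```python
# Back-of-the-envelope check of the estimate above.
per_row_pause = 0.05        # seconds of work between fetching two rows
rows = 100_000              # approximate article count of ruwiki at the time

total_minutes = per_row_pause * rows / 60
print(round(total_minutes))  # -> 83
```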