Hello all,
there is a project on the Russian Wikipedia to check all articles
against strict quality requirements (e.g. at least 500 characters, or
at least 1500 characters for {{stub}}s, or at least 3 internal links
and one external, or at least one section subheader, etc.; a sketch of
the length check follows the list below). The result should be:
* Total number of "normal articles"
* Lists of articles filtered by each requirement
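For the length requirement, a rough first pass (just a sketch) can be
done against the page table alone. Note that page_len stores bytes,
not characters, so UTF-8 Cyrillic text counts roughly two bytes per
letter; an exact character count would require fetching the text
itself:

SELECT page_title
FROM page
WHERE page_namespace = 0     -- articles only
  AND page_is_redirect = 0   -- skip redirects
  AND page_len < 500;        -- below the (approximate) minimum length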
To do so, I should either run one long query that iterates through all
articles, or run many small queries like 'SELECT page_title,
page_latest FROM page WHERE page_title > ? ORDER BY page_title LIMIT
1' (substituting the last page_title fetched into each next query).
The problem is that the first way is much more efficient, but I'm not
sure that someone won't kill such a long-running query.
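A possible middle ground (a sketch, with an arbitrary batch size) is
the same keyset pagination, but fetching a few hundred titles per
query, so that each query finishes quickly while the total number of
round-trips stays manageable:

SELECT page_title, page_latest
FROM page
WHERE page_namespace = 0
  AND page_title > ?   -- last title from the previous batch
ORDER BY page_title
LIMIT 500;             -- batch size chosen arbitrarily

Each batch should be a short range scan on the (page_namespace,
page_title) index, so it seems unlikely to run long enough to be
killed.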
--
Edward Chernenko <edwardspec(a)gmail.com>
It would appear that the dewiki replica on toolserver is experiencing
significant corruption.
SELECT count(el_to) FROM externallinks JOIN page ON el_from=page_id
WHERE page_title='Fabrixx' AND page_namespace=0;
returns a count of 1346.
I just loaded the recent dumps locally, and the same query returns a
count of 13, which appears to be the correct result.
This is just an example; the condition occurs on many pages. Either
*links rows have not been deleted, or they have been inserted under
the wrong page id, or there is some other problem that would cause a
relational integrity violation with respect to the page table.
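One way to test that hypothesis (a sketch against the standard
MediaWiki schema) is an anti-join counting externallinks rows whose
el_from no longer matches any page row:

SELECT COUNT(*)
FROM externallinks
LEFT JOIN page ON el_from = page_id
WHERE page_id IS NULL;   -- orphaned rows with no parent page

A non-zero count would confirm orphaned rows; duplicate rows inserted
under a valid page id would need a different check, e.g. comparing
per-page counts against the dump.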
I would be surprised if this corruption were limited to externallinks.