I've implemented a bot, which for Ruwiki solves the problem of lonelypages and even isolated articles. The query you've just cited (an analogue to be honest) is the key point there as well and anyway there is a possibility to make it faster than one you cited on the TS. The problem here is that we deal with views, not the db itself, so this blocks further optimization somehow.
If you are interested in the implementation, refer to https://fisheye.toolserver.org/browse/golem/isolated/namespacer.sql, especially the place where pl table is created and where p0 table is created (cache_namespace_pages function).
Mashiah
2008/11/26 Ilmari Karonen nospam@vyznev.net
Ja Ga wrote:
I was investigating a better version of Special:Lonelypages (ignore disambig pages, exclude the already tagged, identify 1- and 2- link orphans as well as no-link orphans, etc.) but when I run the original Lonelypages SQL query on nightshade against enwiki_p it times out after 10 minutes.
SELECT page_namespace, page_title FROM page LEFT JOIN pagelinks ON page_namespace=pl_namespace AND page_title=pl_title WHERE pl_namespace IS NULL AND page_namespace=".NS_MAIN." AND page_is_redirect=0;
I think the problem is simply that, no matter how you optimize it, this is a slow query: it has to go through every page in the main namespace and check if it has incoming links or not. Adding a limit might help, but not much: unlinked pages are fairly rare, so it still has to search through a _lot_ of pages just to find a few dozen unlinked ones.
Anyway, I just tried it with "LIMIT 50", and it finished in two minutes. So it certainly works, it just takes time. I'd suggest just running it with /* SLOW_OK */ and session transaction isolation level set to "READ UNCOMMITTED".
-- Ilmari Karonen
Toolserver-l mailing list Toolserver-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/toolserver-l