I've implemented a bot that solves the problem of lonely pages, and even of
isolated articles, for ruwiki. The query you've just cited (an analogue of
it, to be honest) is the key point there as well, and there is a way to
make it faster than the one you cited on the Toolserver. The catch is that
we deal with views, not the database itself, which limits how far the
optimization can go.
If you are interested in the implementation, see
https://fisheye.toolserver.org/browse/golem/isolated/namespacer.sql,
especially the places where the pl table and the p0 table are created
(the cache_namespace_pages function).
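For readers who don't want to dig through namespacer.sql, the idea seems to
be to materialize namespace-filtered copies of pagelinks and page before
joining. The following is only a hypothetical sketch of that approach, not
the actual code behind cache_namespace_pages; the temporary-table
definitions are guesses based on the table names mentioned above:

```sql
-- Hypothetical sketch: pre-filter both tables to the main namespace
-- so the anti-join only touches relevant rows.
CREATE TEMPORARY TABLE pl AS
  SELECT pl_title
  FROM pagelinks
  WHERE pl_namespace = 0;

CREATE TEMPORARY TABLE p0 AS
  SELECT page_title
  FROM page
  WHERE page_namespace = 0 AND page_is_redirect = 0;

-- Lonely pages are then the rows of p0 with no match in pl.
SELECT page_title
FROM p0
LEFT JOIN pl ON page_title = pl_title
WHERE pl_title IS NULL;
```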
Mashiah
2008/11/26 Ilmari Karonen <nospam(a)vyznev.net>
Ja Ga wrote:
I was investigating a better version of Special:Lonelypages (ignore
disambig pages, exclude the already tagged, identify 1- and 2-link
orphans as well as no-link orphans, etc.), but when I run the original
Lonelypages SQL query on nightshade against enwiki_p, it times out after
10 minutes.
SELECT page_namespace, page_title
FROM page
LEFT JOIN pagelinks
  ON page_namespace = pl_namespace AND page_title = pl_title
WHERE pl_namespace IS NULL
  AND page_namespace = ".NS_MAIN."
  AND page_is_redirect = 0;
I think the problem is simply that, no matter how you optimize it, this
is a slow query: it has to go through every page in the main namespace
and check if it has incoming links or not. Adding a limit might help,
but not much: unlinked pages are fairly rare, so it still has to search
through a _lot_ of pages just to find a few dozen unlinked ones.
Anyway, I just tried it with "LIMIT 50", and it finished in two minutes.
So it certainly works, it just takes time. I'd suggest just running
it with /* SLOW_OK */ and session transaction isolation level set to
"READ UNCOMMITTED".
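Putting those suggestions together, the run would look roughly like this
(same query as above, with NS_MAIN written out as 0; the /* SLOW_OK */
marker and the isolation level are the Toolserver conventions mentioned
above):

```sql
-- Relax the isolation level so the long-running read doesn't
-- block on replication, per the suggestion above.
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT /* SLOW_OK */ page_namespace, page_title
FROM page
LEFT JOIN pagelinks
  ON page_namespace = pl_namespace AND page_title = pl_title
WHERE pl_namespace IS NULL
  AND page_namespace = 0        -- NS_MAIN
  AND page_is_redirect = 0
LIMIT 50;
```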
--
Ilmari Karonen
_______________________________________________
Toolserver-l mailing list
Toolserver-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/toolserver-l