One of the things I found was that the present query for Wanted pages counts only the distinct pages with a broken link to the wanted page, even if that page has two or more broken links to the same title. It seems to me that's not important--I'd just as soon have it count all links so I know how many to fix, and that's just as good a metric of "wantedness", I think. And it's not very different in any case-- multiple broken links to the same title on one page are rare.
Changing it to count all links speeds it up quite a bit (from 30-40 seconds to 6-7). Also, I'm throwing away all wanted pages with only a single link--that reduces the size of the temp file needed for sorting by number of links. If we ever get to the point where those will be useful, we'll make a feature for them.
At any rate, tell me if either of those changes is a real problem.
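To make the two counting strategies concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the actual Wikipedia query. The column names bl_from and bl_to, the helper wanted_pages, and the sample rows are assumptions made for illustration, and the sketch assumes the 'brokenlinks' table can hold duplicate rows for the same page and title, which the thread does not confirm.

```python
import sqlite3

def wanted_pages(rows, distinct_pages=False, min_links=2):
    """Return [(title, count)] for wanted titles, most-linked first.

    rows: (source_page_id, wanted_title) pairs, one per broken link.
    distinct_pages: if True, count each linking page once per title;
                    if False, count every broken link.
    min_links: titles with fewer links than this are thrown away.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema; the column names are assumptions.
    conn.execute("CREATE TABLE brokenlinks (bl_from INTEGER, bl_to TEXT)")
    conn.executemany("INSERT INTO brokenlinks VALUES (?, ?)", rows)
    counter = "COUNT(DISTINCT bl_from)" if distinct_pages else "COUNT(*)"
    query = (
        f"SELECT bl_to, {counter} FROM brokenlinks "
        f"GROUP BY bl_to HAVING {counter} >= ? "
        f"ORDER BY {counter} DESC, bl_to"
    )
    result = conn.execute(query, (min_links,)).fetchall()
    conn.close()
    return result

rows = [
    (1, "Foo"), (1, "Foo"),  # page 1 has two broken links to Foo
    (2, "Foo"),
    (2, "Bar"), (3, "Bar"),
    (3, "Baz"),              # only one link, so it is dropped
]

print(wanted_pages(rows))                       # counts all links
print(wanted_pages(rows, distinct_pages=True))  # counts linking pages
```

With these sample rows, counting all links gives Foo a count of 3 and Bar a count of 2, while counting distinct pages gives both a count of 2. Baz is dropped either way because it has a single link.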
On Mon, Jul 08, 2002 at 01:03:45PM -0700, lcrocker@nupedia.com wrote:
One of the things I found was that the present query for Wanted pages counts only the distinct pages with a broken link to the wanted page, even if that page has two or more broken links to the same title. It seems to me that's not important--I'd just as soon have it count all links so I know how many to fix, and that's just as good a metric of "wantedness", I think. And it's not very different in any case-- multiple broken links to the same title on one page are rare.
Changing it to count all links speeds it up quite a bit (from 30-40 seconds to 6-7).
There are two things I don't understand here. At the test site this query now seems to take around 7 seconds. So did you already implement this there? Secondly, I don't see why counting links should be easier unless you are doing something really weird like allowing duplicates in the table 'brokenlinks'. You are still using this table, aren't you?
Also, I'm throwing away all wanted pages with only a single link--that reduces the size of the temp file needed for sorting by number of links.
Temp file? Are you now doing the sorting yourself?
-- Jan Hidders
On 8 Jul 2002, at 13:03, lcrocker@nupedia.com wrote:
One of the things I found was that the present query for Wanted pages counts only the distinct pages with a broken link to the wanted page, even if that page has two or more broken links to the same title. It seems to me that's not important--I'd just as soon have it count all links so I know how many to fix, and that's just as good a metric of "wantedness", I think. And it's not very different in any case-- multiple broken links to the same title on one page are rare.
Incidentally, does the system count broken links from #redirects once, or does it count all the links to the page with the #redirect on it?
Imran
wikipedia-l@lists.wikimedia.org