[Wikipedia-l] "Wanted pages" fix

Jan.Hidders hidders at uia.ua.ac.be
Mon Jul 8 21:43:03 UTC 2002


On Mon, Jul 08, 2002 at 01:03:45PM -0700, lcrocker at nupedia.com wrote:
> One of the things I found was that the present query for Wanted pages 
> counts only the distinct pages with a broken link to the wanted page, 
> even if that page has two or more broken links to the same title.  It 
> seems to me that's not important--I'd just as soon have it count all 
> links so I know how many to fix, and that's just as good a metric 
> of "wantedness", I think.  And it's not very different in any case--
> multiple broken links to the same title on one page are rare.
> 
> Changing it to count all links speeds it up quite a bit (from 30-40 
> seconds to 6-7).

There are two things I don't understand here. On the test site this query now
seems to take around 7 seconds, so have you already implemented this there?
Secondly, I don't see why counting links should be any easier than counting
pages, unless you are doing something really weird like allowing duplicates
in the table 'brokenlinks'. You are still using this table, aren't you?
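Here is a minimal sketch of the distinction, using Python's built-in sqlite3 module rather than the wiki's real database. The column names bl_from (id of the linking page) and bl_to (missing title) are assumptions, not the actual schema. COUNT(DISTINCT bl_from) counts linking pages and COUNT(*) counts link rows. The two give different answers only when the table can hold the same (page, title) pair more than once:

```python
import sqlite3

def wanted_counts(conn, distinct):
    """Return {title: count} for every broken-link target in brokenlinks."""
    # COUNT(DISTINCT bl_from) counts linking pages; COUNT(*) counts link rows.
    agg = "COUNT(DISTINCT bl_from)" if distinct else "COUNT(*)"
    sql = f"SELECT bl_to, {agg} FROM brokenlinks GROUP BY bl_to"
    return dict(conn.execute(sql).fetchall())

def make_db(allow_duplicates):
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: bl_from = linking page id, bl_to = missing title.
    unique = "" if allow_duplicates else ", UNIQUE (bl_from, bl_to)"
    conn.execute(f"CREATE TABLE brokenlinks (bl_from INTEGER, bl_to TEXT{unique})")
    # Page 1 links to "Foo" twice; with UNIQUE, INSERT OR IGNORE drops the repeat.
    rows = [(1, "Foo"), (1, "Foo"), (2, "Foo"), (3, "Bar")]
    conn.executemany("INSERT OR IGNORE INTO brokenlinks VALUES (?, ?)", rows)
    return conn

with_dups = make_db(allow_duplicates=True)
no_dups = make_db(allow_duplicates=False)
print(wanted_counts(with_dups, distinct=True), wanted_counts(with_dups, distinct=False))
print(wanted_counts(no_dups, distinct=True), wanted_counts(no_dups, distinct=False))
```

With one row per (page, title) pair, both queries return the same numbers. Only a table that stores duplicates makes "count all links" a different metric.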

>  Also, I'm throwing away all wanted pages with only 
> a single link--that reduces the size of the temp file needed for 
> sorting by number of links.

Temp file? Are you now doing the sorting yourself?
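As a point of comparison, single-link titles can also be dropped inside the query itself. This is again a sketch with sqlite3 and the same hypothetical bl_from/bl_to columns. A HAVING clause discards those groups before ORDER BY runs, so the database, not the application, sorts the smaller result. The function name wanted_pages and its min_links and limit parameters are illustrative only:

```python
import sqlite3

def wanted_pages(conn, min_links=2, limit=50):
    """Most-wanted titles, skipping any with fewer than min_links links."""
    sql = """
        SELECT bl_to, COUNT(*) AS n
        FROM brokenlinks
        GROUP BY bl_to
        HAVING COUNT(*) >= ?   -- drop single-link titles before sorting
        ORDER BY n DESC, bl_to
        LIMIT ?
    """
    return conn.execute(sql, (min_links, limit)).fetchall()

conn = sqlite3.connect(":memory:")
# Hypothetical schema, as above.
conn.execute("CREATE TABLE brokenlinks (bl_from INTEGER, bl_to TEXT)")
conn.executemany(
    "INSERT INTO brokenlinks VALUES (?, ?)",
    [(1, "Foo"), (2, "Foo"), (3, "Foo"), (1, "Bar"), (4, "Bar"), (5, "Baz")],
)
print(wanted_pages(conn))
```

Here "Baz" has a single link and is filtered out before the sort.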

-- Jan Hidders


