On Dec 3, 2003, at 04:29, Brion Vibber wrote:
On Dec 3, 2003, at 04:07, Brion Vibber wrote:
I'm running a quick script to take out the bad brokenlinks entries (and
clear linkscc). Obviously some problems in rebuildlinks remain...
'remove-brokenlinks.php' in maintenance subdirectory in unstable cvs.
Output (I forgot to add headers saying which database is which, but
you can pretty much tell):
http://download.wikipedia.org/archives/brokenlinks.out.gz
Altogether 13528 bad brokenlinks items turned up. The script didn't
check which were or weren't also in the links table...
I've gone over rebuildlinks.inc and found what I think is the problem.
It goes through each page looking for links, then checks every link it
hasn't seen before to see whether the target page exists. That check had
a bug: it discarded the namespace data on the rows returned from the
database. This could both put incorrect l_to target IDs into the link
tables and skip legitimate links, leaving them to get stuffed into the
brokenlinks table.
I've fixed this in CVS to take the namespace into account. If people
(in particular E23) could look it over and confirm it looks correct, we
can rebuild the live tables again.
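
For anyone reviewing, here is a rough sketch of the kind of failure
described above. It is illustrative Python, not the PHP in
rebuildlinks.inc. The page rows, the function names and the
(namespace, title) link tuples are all made up for the example. It only
shows the wrong-target case, where a lookup keyed on title alone lets a
page in one namespace answer for a link into another. How the real code
came to skip legitimate links depends on details not reproduced here.

```python
# Hypothetical rows as a page-existence query might return them:
# (page_id, namespace, title). Namespace 0 = article, 1 = talk (assumed).
PAGES = [
    (1, 0, "Apple"),
    (2, 1, "Apple"),  # same title, different namespace
    (3, 0, "Pear"),
]


def resolve_ignoring_namespace(links, pages):
    """Buggy variant: existing pages are indexed by title only."""
    by_title = {title: page_id for page_id, _ns, title in pages}
    good, broken = [], []
    for ns, title in links:
        if title in by_title:
            # May be the ID of a page in a different namespace.
            good.append(((ns, title), by_title[title]))
        else:
            broken.append((ns, title))
    return good, broken


def resolve_with_namespace(links, pages):
    """Fixed variant: the lookup key is (namespace, title)."""
    by_key = {(ns, title): page_id for page_id, ns, title in pages}
    good, broken = [], []
    for key in links:
        if key in by_key:
            good.append((key, by_key[key]))
        else:
            broken.append(key)
    return good, broken


LINKS = [(0, "Apple"), (1, "Pear"), (0, "Plum")]

buggy_good, buggy_broken = resolve_ignoring_namespace(LINKS, PAGES)
fixed_good, fixed_broken = resolve_with_namespace(LINKS, PAGES)
```

With the title-only index, the link to the article "Apple" picks up the
talk page's ID, and the link to a nonexistent talk page "Pear" is counted
as good. Keyed on (namespace, title), the first resolves to the right
page and the second is reported as broken.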
-- brion vibber (brion @ pobox.com)