1 million redirects missing from redirect.sql file? - Wikitech-l

23 Oct 2007


It looks to me like a large number of redirects (as many as 1 million) are
missing from the redirect.sql file.
My script extracts redirects from the redirect.sql file and looks up their
page IDs using the page.sql file.  Most of these can be resolved (about 1
million).  However, when I scan the page.sql file for pages that are
redirects but were never matched to any entry in the redirect.sql file,
I find about 1 million more.
Here are some examples derived from the 20070908 dump (the pages on the left
are missing from redirect.sql).  I believe the problem is not limited to
this date:
  Alstrom's syndrome -> Alstrom syndrome
  Tito's Handmade Vodka -> Tito's Vodka
  Titov_Drvar -> Drvar
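
The cross-check described above can be sketched roughly as follows. This is only an illustration, not the actual script: it assumes the rows of page.sql and redirect.sql have already been parsed into tuples following the MediaWiki column order (page_id, page_namespace, page_title, page_is_redirect) and (rd_from, rd_namespace, rd_title). The function name and the sample rows, including all page IDs, are hypothetical.

```python
def find_unresolved_redirects(page_rows, redirect_rows):
    """Return titles of pages flagged as redirects that have no redirect row.

    page_rows:     iterable of (page_id, page_namespace, page_title, page_is_redirect)
    redirect_rows: iterable of (rd_from, rd_namespace, rd_title)
    """
    # Every page ID that appears as the source of some redirect row.
    resolved_ids = {rd_from for rd_from, _ns, _title in redirect_rows}
    return [
        title
        for page_id, _ns, title, is_redirect in page_rows
        if is_redirect and page_id not in resolved_ids
    ]


# Hypothetical sample rows; the IDs are made up for illustration.
page_rows = [
    (1, 0, "Alstrom's_syndrome", 1),
    (2, 0, "Alstrom_syndrome", 0),
    (3, 0, "Hypothetical_redirect", 1),
    (4, 0, "Hypothetical_target", 0),
]
redirect_rows = [(3, 0, "Hypothetical_target")]

missing = find_unresolved_redirects(page_rows, redirect_rows)
print(missing)
```

With these sample rows, the only page reported is the one flagged as a redirect with no matching rd_from.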
Another experiment seems to confirm this: I can extract 2.4 million
redirects from the page-articles.xml file, which is approximately the
number of redirects I get from redirect.sql plus the number that seem to be
missing according to page.sql.
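
For comparison, redirects can be recognised in the XML dump by the "#REDIRECT [[Target]]" line at the start of a page's wikitext. The sketch below assumes the (title, text) pairs have already been pulled out of the XML, and it uses a simplified pattern that may not cover every form MediaWiki accepts. The names REDIRECT_RE and redirect_target and the sample page texts are hypothetical.

```python
import re

# Simplified pattern: "#REDIRECT [[Target]]" at the start of the text,
# case-insensitive, ignoring any "#section" or "|label" part of the link.
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*\[\[\s*([^\]|#]+)", re.IGNORECASE)


def redirect_target(wikitext):
    """Return the redirect target title, or None if the text is not a redirect."""
    match = REDIRECT_RE.match(wikitext)
    return match.group(1).strip() if match else None


# Hypothetical (title, wikitext) pairs standing in for pages read from the dump.
pages = [
    ("Alstrom's syndrome", "#REDIRECT [[Alstrom syndrome]]"),
    ("Alstrom syndrome", "Article text goes here."),
    ("Titov_Drvar", "#redirect [[Drvar]]"),
]

redirects = {}
for title, text in pages:
    target = redirect_target(text)
    if target is not None:
        redirects[title] = target

print(len(redirects), redirects)
```

Counting the entries collected this way gives a figure that can be compared against the number of rows in redirect.sql.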
Am I misunderstanding something?
A related question: why does the redirect.sql file store the destination
link as a string and not as a page ID?  The category-links.sql file does
this also.  Is this just for readability?  It takes more effort to construct
linked databases this way.
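
The extra effort amounts to a lookup from (namespace, title) to page ID built from page.sql. A minimal sketch, again assuming rows already parsed into tuples in the MediaWiki column order; the function name, titles, and IDs are hypothetical:

```python
def resolve_redirect_targets(page_rows, redirect_rows):
    """Map each redirect's string target to a page ID (None if no page matches).

    page_rows:     iterable of (page_id, page_namespace, page_title, page_is_redirect)
    redirect_rows: iterable of (rd_from, rd_namespace, rd_title)
    """
    # Index pages by (namespace, title) so string targets can be looked up.
    id_by_title = {
        (ns, title): page_id for page_id, ns, title, _is_redirect in page_rows
    }
    return [
        (rd_from, id_by_title.get((rd_ns, rd_title)))
        for rd_from, rd_ns, rd_title in redirect_rows
    ]


# Hypothetical sample rows; the IDs are made up for illustration.
page_rows = [
    (10, 0, "Hypothetical_redirect", 1),
    (11, 0, "Hypothetical_target", 0),
]
redirect_rows = [
    (10, 0, "Hypothetical_target"),
    (12, 0, "No_such_page"),
]

resolved = resolve_redirect_targets(page_rows, redirect_rows)
print(resolved)
```

A target title with no matching page comes back as None, which is a case a page-ID column alone could not represent.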
I hope I have posted this in the right place.
thanks!!
John