[replies to several messages here] On 10/24/07, GerardM gerard.meijssen@gmail.com wrote:
I am afraid that as the number of articles grows, the existence of redirects becomes increasingly problematic because more and more disambiguation will be needed. Existing redirects are not considered when disambiguation is implemented. Redirects ARE problematic and by automagically creating a vast number of more redirects it becomes even more of a nightmare.
I assume you're talking about the possibility of implementing aliases as genuine redirects: yes, that would cause problems. However, if they were "automagically created", presumably they can be "automagically destroyed". I haven't really thought through this idea much - it scares me.
If not, there is no problem, and the existence of aliases will in fact tend to reduce the number of redirects. A given article with 5 redirects will probably be replaced by just the article and 5 aliases, which must be much less expensive to store - a maximum of 6 table entries in my scheme, with no article text to consider.
Brianna wrote:
For anyone who is considering implementing something like this, please give some thought to how it could work in a multilingual context (defining the equivalent name in other languages)
I don't quite understand - are you talking about interwiki links? Or do you mean non-Latin character sets? Could you be more specific, perhaps with an example problem?
and also for categories (that might be pushing it...).
Do you mean, aliases to categories? Would be nice, but even redirects to categories don't work properly yet. One problem at a time, I think :)
Andrew wrote:
No need for the complex setup you envisage. For MySQL, at least, we could create a new table 'article_aliases', and "select aa_page from article_aliases where 'my_title' like aa_alias". Of course, we'd need to do some built-in, potentially expensive checking on the aliases that would be originally introduced, like checking if any other pages match the regex (if so, block the alias), and if the article title itself matches the regex (if not, block the alias).
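A minimal sketch of the lookup and the two checks described above, using Python's sqlite3 as a stand-in for MySQL. The `page` table, the `add_alias` and `resolve` helpers, and the choice of full-string regular expressions as the alias syntax are all assumptions made for illustration; only `article_aliases`, `aa_alias` and `aa_page` come from the message.

```python
import re
import sqlite3


def make_db(titles):
    """Build an in-memory database holding the given page titles."""
    conn = sqlite3.connect(":memory:")
    # SQLite has no built-in REGEXP; "X REGEXP Y" calls regexp(Y, X),
    # so the pattern arrives as the first argument.
    conn.create_function(
        "regexp", 2,
        lambda pattern, value: int(re.fullmatch(pattern, value) is not None),
    )
    conn.execute("CREATE TABLE page (page_title TEXT PRIMARY KEY)")  # hypothetical table
    conn.execute("CREATE TABLE article_aliases (aa_alias TEXT, aa_page TEXT)")
    conn.executemany("INSERT INTO page VALUES (?)", [(t,) for t in titles])
    return conn


def add_alias(conn, pattern, page):
    """Store the alias only if it passes both checks; return True on success."""
    # Check 1: the article title itself must match the pattern.
    if re.fullmatch(pattern, page) is None:
        return False
    # Check 2: no other existing page may match the pattern.
    clash = conn.execute(
        "SELECT 1 FROM page WHERE page_title REGEXP ? AND page_title <> ? LIMIT 1",
        (pattern, page),
    ).fetchone()
    if clash:
        return False
    conn.execute("INSERT INTO article_aliases VALUES (?, ?)", (pattern, page))
    return True


def resolve(conn, query):
    """Return the page whose alias pattern matches the query, or None."""
    row = conn.execute(
        "SELECT aa_page FROM article_aliases WHERE ? REGEXP aa_alias LIMIT 1",
        (query,),
    ).fetchone()
    return row[0] if row else None
```

For example, with pages "Colour" and "Color (film)", the pattern `Colou?r` would be accepted for "Colour", while `Col.*` would be blocked because it also matches the other page. Note that the check on the `page` table is a full scan, which is the "potentially expensive" part.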
MySQL supports regexp-based matches? If so, cool. I only know SQL Server which, last time I checked, only supports wildcards, which wouldn't be strong enough. The main reason for my complex scheme is that the two endpoints seem expensive:
Zero-expansion endpoint: every incoming query that doesn't match any real article title has to be compared against a very large number of aliases - expensive on query time.
Complete-expansion endpoint: every alias pattern has to be fully expanded into all the possible matching queries. Say there were 3 million pages (en wiki including non-articles?) with an average of 5 aliases each (presumably there will be more aliases than there are currently redirects, because they're so easy to make), that's 15 million entries in a table. That seems expensive, but perhaps not?
Then again, the table would have to be somewhere between 3 million and 15 million entries in that case anyway, so....
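To make the two endpoints concrete, here is a rough sketch in Python. The `{a|b}` alternation syntax is purely hypothetical (it is not the alias syntax proposed in this thread), and the function names are made up for illustration. `resolve_by_scan` is the zero-expansion end: nothing is precomputed and every miss is tested against every pattern. `build_expanded_table` is the complete-expansion end: every pattern is unrolled into literal titles up front, so a lookup is a single key probe but the table grows with the number of expansions.

```python
import itertools
import re

_GROUP = re.compile(r"\{([^{}]*)\}")


def expand(pattern):
    """Unroll a hypothetical '{a|b}' pattern into every literal title it matches."""
    parts = _GROUP.split(pattern)
    # Even indices are literal text, odd indices are alternation groups.
    choices = [
        part.split("|") if i % 2 else [part]
        for i, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


def to_regex(pattern):
    """Compile the same hypothetical syntax into a regular expression."""
    out = []
    for i, part in enumerate(_GROUP.split(pattern)):
        if i % 2:
            alternatives = "|".join(re.escape(a) for a in part.split("|"))
            out.append("(?:" + alternatives + ")")
        else:
            out.append(re.escape(part))
    return re.compile("".join(out))


def resolve_by_scan(query, aliases):
    """Zero expansion: compare the query against every alias pattern in turn."""
    for pattern, page in aliases:
        if to_regex(pattern).fullmatch(query):
            return page
    return None


def build_expanded_table(aliases):
    """Complete expansion: one table entry per literal title; first alias wins."""
    table = {}
    for pattern, page in aliases:
        for title in expand(pattern):
            table.setdefault(title, page)
    return table
```

With the figures above, the expanded table would hold at least 3 million x 5 = 15 million rows if each alias expanded to a single title, and more if patterns expand to several titles each.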
My lack of experience with large databases on serious hardware forces me to shut up now and revert to my original appeal for someone more knowledgeable to handle the feasibility side of things :)
Steve