[replies to several messages here]
On 10/24/07, GerardM <gerard.meijssen(a)gmail.com> wrote:
I am afraid that as the number of articles grows, the existence of redirects
becomes increasingly problematic because more and more disambiguation will
be needed. Existing redirects are not considered when disambiguation is
implemented. Redirects ARE problematic and by automagically creating a vast
number of more redirects it becomes even more of a nightmare.
I assume you're talking about the possibility of implementing aliases as
genuine redirects: yes, that would cause problems. However, if they were
"automagically created", presumably they can be "automagically destroyed".
I haven't really thought through this idea much - it scares me.
If aliases are not implemented as genuine redirects, there is no problem,
and the existence of aliases will in fact tend to reduce the number of
redirects. A given article with 5 redirects will probably be replaced by
just the article and 5 aliases, which must be much less expensive to store -
a maximum of 6 table entries in my scheme, with no article text to consider.
Brianna wrote:
For anyone who is considering implementing something like this, please
give some thought to how it could work in a multilingual context
(defining the equivalent name in other languages)
I don't quite understand - are you talking about interwiki links? Or do you
mean non-Latin character sets? Could you be more specific, perhaps with an
example problem?
and also for categories (that might be pushing it...).
Do you mean, aliases to categories? Would be nice, but even redirects to
categories don't work properly yet. One problem at a time, I think :)
Andrew wrote:
No need for the complex setup you envisage. For mysql, at least, we
could create a new table 'article_aliases', and "select aa_page from
article_aliases where 'my_title' like aa_alias". Of course, we'd need
to do some built-in, potentially expensive checking on the aliases
that would be originally introduced, like checking if any other pages
match the regex (if so, block the alias), and if the article title
itself matches the regex (if not, block the alias).
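
The two checks described above could be sketched roughly as follows. This is only an illustration in Python rather than SQL: the function name `validate_alias` and the in-memory list of titles are assumptions made for the example, not part of any proposal in this thread. It also shows why the check is potentially expensive, since every existing title has to be tested against the new pattern.

```python
import re

def validate_alias(pattern, article_title, existing_titles):
    """Return (ok, reason) for a proposed alias regex on article_title.

    Hypothetical helper: existing_titles stands in for the page table.
    """
    try:
        regex = re.compile(pattern)
    except re.error as exc:
        return False, f"invalid regex: {exc}"
    # The article's own title must match its alias pattern.
    if not regex.fullmatch(article_title):
        return False, "article title does not match the pattern"
    # No other existing page may match, or the alias would shadow it.
    # This scans every title, which is the expensive part.
    clashes = [t for t in existing_titles
               if t != article_title and regex.fullmatch(t)]
    if clashes:
        return False, f"pattern also matches: {clashes[0]}"
    return True, "ok"

titles = ["Color", "Colorado", "Colour (film)"]
print(validate_alias(r"Colou?r", "Color", titles))
print(validate_alias(r"Colo.*", "Color", titles))
print(validate_alias(r"Colour", "Color", titles))
```

The first call is accepted; the second is blocked because the pattern also matches another page; the third is blocked because the article's own title does not match.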
MySQL supports regexp-based matches? If so, cool. I only know SQL Server,
which, last time I checked, only supported wildcards, which wouldn't be
strong enough. The main reason for my complex scheme is that the two
endpoints seem expensive:
Zero-expansion endpoint: every incoming query that doesn't match any real
article titles has to be compared against a very large number of aliases -
expensive on query time.
Complete-expansion endpoint: every alias pattern has to be fully expanded
into all the possible matching queries. Say there were 3 million pages
(en wiki including non-articles?) with an average of 5 aliases each
(presumably there will be more aliases than there currently are redirects,
because they're so easy to make): that's 15 million entries in a table.
That seems expensive, but perhaps not?
Then again, the table would have to be somewhere between 3 million and 15
million entries in that case anyway, so....
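
To make the two endpoints concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the segment-based pattern format, the `ALIASES` data and the function names are invented for the example and are not the scheme proposed earlier in the thread. Complete expansion pays in table size and gets exact-key lookups; zero expansion stores one pattern per alias and pays by testing every pattern on each miss.

```python
import itertools
import re

# Hypothetical alias patterns: each is a list of segments, and each segment
# lists the alternatives allowed at that position.
ALIASES = {
    "Color blindness": [["Color", "Colour"], [" ", "-"], ["blindness", "blind"]],
}

def expand(segments):
    """Complete expansion: every literal title the pattern can match."""
    return ["".join(parts) for parts in itertools.product(*segments)]

def build_expanded_table(aliases):
    """One entry per literal alias, so a lookup is an exact-key match."""
    table = {}
    for page, segments in aliases.items():
        for literal in expand(segments):
            table[literal] = page
    return table

def to_regex(segments):
    """Turn a segment list into an equivalent compiled regex."""
    return re.compile("".join(
        "(?:" + "|".join(re.escape(alt) for alt in seg) + ")"
        for seg in segments))

def lookup_zero_expansion(title, aliases):
    """Zero expansion: test the query against every stored pattern in turn."""
    for page, segments in aliases.items():
        if to_regex(segments).fullmatch(title):
            return page
    return None

table = build_expanded_table(ALIASES)
print(len(table))                                      # 2 * 2 * 2 = 8
print(table.get("Colour-blind"))                       # Color blindness
print(lookup_zero_expansion("Colour-blind", ALIASES))  # Color blindness
print(3_000_000 * 5)                                   # 15000000
```

One pattern here already expands to 8 literal entries, which is why the number of rows under complete expansion depends heavily on how expressive the patterns are allowed to be.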
My lack of experience with large databases on serious hardware forces me to
shut up now and revert to my original appeal for someone more knowledgeable
to handle the feasibility side of things :)
Steve