On 10/22/07, William Pietri <william(a)scissor.com> wrote:
You know that these things only change on save, so at that point you
look at the difference between the old aliases and the new and update
the master set. Computationally, it's only a smidgen more expensive than
our current approach. And given that we're such a read-heavy
environment, unnoticeably so.
Yeah, I did some more thinking about this in bed and came up with a
possible implementation.
The wiki code would be a single line, like:
#ALIASES [City of ][Greater ]Melbourne[, Victoria| (Australia)]
Two tables would store all the aliases: one would store the raw patterns,
and another would expand them, possibly only partially. You could expand,
say, the first 5 characters:
"City ","of [Greater ]Melbourne[, Victoria| (Australia)]"
"Great", "er Melbourne[, Victoria| (Australia)]"
"Melbo", "urne[, Victoria| (Australia)]
Three entries. That way, once a user types an actual request (say, "Greater
Melbourne"), you just look up the first 5 characters ("Great") and iterate
over the matches stored under that prefix. There are lots of algorithms and
data structures that would help here.
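To make the idea concrete, here's a small Python sketch of the two steps: expanding a pattern into its aliases, then grouping them by a 5-character prefix. Two caveats: the thread stores *partially* expanded remainders, while this simplification expands fully before grouping; and the bracket semantics assumed here (a group matches any one of its alternatives, or nothing at all) is a guess the thread never pins down.

```python
from collections import defaultdict

def expand(pattern):
    """Expand an alias pattern into every concrete string it covers.
    Assumed semantics: '[a|b|...]' matches one alternative or nothing."""
    results = ['']
    i = 0
    while i < len(pattern):
        if pattern[i] == '[':
            close = pattern.index(']', i)
            # The trailing '' makes every bracket group optional (an assumption).
            alternatives = pattern[i + 1:close].split('|') + ['']
            results = [r + a for r in results for a in alternatives]
            i = close + 1
        else:
            results = [r + pattern[i] for r in results]
            i += 1
    return results

def build_index(pattern, prefix_len=5):
    """Group the expanded aliases by their first prefix_len characters,
    mapping each prefix to the set of possible remainders."""
    index = defaultdict(set)
    for alias in expand(pattern):
        index[alias[:prefix_len]].add(alias[prefix_len:])
    return index

idx = build_index('[City of ][Greater ]Melbourne[, Victoria| (Australia)]')
# A lookup for "Greater Melbourne" checks idx["Great"] for "er Melbourne".
```

The Melbourne pattern expands to 12 aliases, which the index collapses into a handful of prefix buckets, so lookups stay cheap even as patterns grow.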
(And that's without addressing the issue of duplicates. A redirect can only
point to one article, but a given string can match the regexps in many
articles. Automated disambig pages, perhaps?)
That'd be a great way to solve that. And the main
bit could be done as
automatically updating our redirect pages. As a first pass, anyhow.
Omg, automated disambig pages. Yes please! Maintaining disambiguation pages
is horribly time consuming. You could conceive of another keyword like
"{{disambigtext|Second largest city in Australia.}}" that would be shown
where necessary. But I'm getting ahead of myself.
From a user experience perspective, I'd be a little
worried about
putting more mysterious Wiki markup at the top of a page. On another
wiki I'm working on, we're moving more of this metadata outside the
markup and to specialized UIs, so that it doesn't clutter the edit box.
So put it at the bottom, next to {{DEFAULTSORT}}. I do agree that
location-independent metadata should
be separated from content though. Categories and interwikis fall into
that category too.
I think the only real abuse potential comes from either
putting in a
giant list or trying to redirect in a bunch of existing articles. But
one you can catch with a size limit, and the other you could fix by
refusing to mess with real articles.
Most likely from having a redirect which expands to too many possibilities,
like [A|b|c|d|e][A|b|c|d|e][A|b|c|d|e][A|b|c|d|e], which is 5^4 = 625
expansions from a single line. But that would be easily catchable. The
trouble is what to do about it, besides failing silently.
Perhaps reject it into a special page that admins can browse from time to
time?
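Catching that case wouldn't even require expanding the pattern: the expansion count is just the product of the bracket-group sizes. A sketch, where the cap of 100 is a made-up number and the optional_groups flag covers the open question of whether a group may also match nothing:

```python
def expansion_count(pattern, optional_groups=False):
    """Predict how many aliases a pattern would expand to, without
    expanding it: multiply together the sizes of the bracket groups.
    If groups may also match nothing, each group contributes one more."""
    count, i = 1, 0
    while i < len(pattern):
        if pattern[i] == '[':
            close = pattern.index(']', i)
            size = pattern.count('|', i, close) + 1  # alternatives in group
            count *= size + (1 if optional_groups else 0)
            i = close + 1
        else:
            i += 1
    return count

ALIAS_LIMIT = 100  # hypothetical cap; the right number is a policy question

def validate_pattern(pattern):
    """Reject runaway patterns up front instead of failing silently."""
    n = expansion_count(pattern)
    if n > ALIAS_LIMIT:
        raise ValueError(f"pattern expands to {n} aliases (limit {ALIAS_LIMIT})")
```

A rejected pattern could then be logged to the special page suggested above rather than silently dropped.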
So Steve, I'd say it's a great idea. However,
I'd want to do some user
testing. Since I've been doing regular expressions for so long, they
make instant sense to me, but even this limited version might be too
mysterious for most of our editors. Perhaps the special UI would show
them the list of generated alternatives as they edit?
Well, the great thing with such a limited expression language is that there's
very little to learn and very little to stuff up. And even better, users
can just use the most naive approach imaginable. So, while a CS major would
readily write an expression like:
#ALIASES [Dr] Grace [Smith|Jones]
A beginner user might simply write:
#ALIASES [Dr Grace Smith|Dr Grace Jones|Grace Smith|Grace Jones]
or even:
#ALIASES Dr Grace Smith
#ALIASES Dr Grace Jones
#ALIASES Grace Smith
#ALIASES Grace Jones
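That naive style falls out for free if the collector simply unions every #ALIASES line on the page. A sketch (the line-anchored parsing is an assumption about how the directive would be recognised):

```python
def collect_aliases(wikitext):
    """Gather the raw alias patterns from a page's wikitext. Each
    '#ALIASES ...' line contributes one pattern, and repeated lines
    union together, so one-alias-per-line works with no brackets at all."""
    patterns = set()
    for line in wikitext.splitlines():
        if line.startswith('#ALIASES '):
            patterns.add(line[len('#ALIASES '):].strip())
    return patterns

page = """Grace Smith is a fictional physician.
#ALIASES Dr Grace Smith
#ALIASES Dr Grace Jones
#ALIASES Grace Smith
#ALIASES Grace Jones"""
# collect_aliases(page) yields the same four aliases as the compact form.
```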
A UI tool would obviously help, but that would be a slight departure for
MediaWiki. There's nothing else like that at the moment (as far as I know),
so it's hard to picture how it would fit in exactly.
Steve