On 10/22/07, William Pietri <william@scissor.com> wrote:
You know that these things only change on save, so at that point you look at the difference between the old aliases and the new and update the master set. Computationally, it's only a smidgen more expensive than our current approach. And given that we're such a read-heavy environment, unnoticeably so.
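A rough sketch of that save-time diff (Python, with made-up names; "db" is a placeholder sqlite3-style handle and the "alias" table is hypothetical, not MediaWiki's actual schema):

def update_aliases(db, page_id, old_aliases, new_aliases):
    # Recompute only on save, as described above: apply the set difference
    # rather than rebuilding the whole master table.
    for alias in old_aliases - new_aliases:
        db.execute("DELETE FROM alias WHERE page_id = ? AND alias = ?",
                   (page_id, alias))
    for alias in new_aliases - old_aliases:
        db.execute("INSERT INTO alias (page_id, alias) VALUES (?, ?)",
                   (page_id, alias))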
Yeah, I did some more thinking about this in bed and came up with a possible implementation.
The wiki code would be a single line, like:

#ALIASES [City of ][Greater ]Melbourne[, Victoria| (Australia)]

Two tables would store all the aliases: one would store the raw patterns, and another would store partial expansions. You could expand, say, the first 5 characters:

"City ", "of [Greater ]Melbourne[, Victoria| (Australia)]"
"Great", "er Melbourne[, Victoria| (Australia)]"
"Melbo", "urne[, Victoria| (Australia)]"

Three entries. That way, once a user types an actual request (say, "Greater Melbourne"), you just look up the first 5 characters ("Great"), then iterate over the matches there. There are lots of algorithms and data structures that would help here.
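To make that concrete, here's a rough Python sketch (the names and exact semantics are guesses, not a spec). It assumes each bracket group is optional, so [a|b] can contribute "a", "b", or nothing, which is what lets plain "Melbourne" fall out of the example above. It also fully expands before splitting off the 5-character key; the partial expansion described above would keep the residues in pattern form instead, but the lookup works the same way.

import itertools
import re

def parse(pattern):
    # Split a pattern into groups of alternatives. A literal run becomes a
    # one-alternative group; [a|b] becomes ['a', 'b', ''] (the '' makes the
    # group optional -- an assumption, since the proposal doesn't pin it down).
    groups = []
    for literal, bracketed in re.findall(r'([^\[\]]+)|\[([^\]]*)\]', pattern):
        if bracketed:
            groups.append(bracketed.split('|') + [''])
        else:
            groups.append([literal])
    return groups

def expand(pattern):
    # Every alias the pattern denotes, with whitespace normalized so that
    # omitted groups don't leave stray spaces behind.
    return {' '.join(''.join(parts).split())
            for parts in itertools.product(*parse(pattern))}

def prefix_index(pattern, n=5):
    # (first n chars, remainder) pairs for the two-table lookup scheme.
    return {(alias[:n], alias[n:]) for alias in expand(pattern)}

Under those assumptions, expand("[City of ][Greater ]Melbourne[, Victoria| (Australia)]") yields twelve strings, from "Melbourne" up to "City of Greater Melbourne (Australia)", and a request for "Greater Melbourne" only has to scan the entries keyed "Great".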
(And that's without addressing the issue of duplicates. A redirect can only point to one article, but a given string can match the regexps in many articles. Automated disambig pages, perhaps?)
That'd be a great way to solve that. And the main bit could be done as automatically updating our redirect pages. As a first pass, anyhow.
Omg, automated disambig pages. Yes please! Maintaining disambiguation pages is horribly time-consuming. You could conceive of another keyword like "{{disambigtext|Second largest city in Australia.}}" that would be shown where necessary. But I'm getting ahead of myself.
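The collision check itself would be cheap once the expansions exist. A sketch (again Python, with an invented alias_tables mapping standing in for whatever the real storage looks like):

from collections import defaultdict

def find_collisions(alias_tables):
    # alias_tables: hypothetical {page_id: set of expanded aliases}.
    # Any alias claimed by more than one page is a candidate for an
    # auto-generated disambig page listing the claimants.
    owners = defaultdict(set)
    for page_id, aliases in alias_tables.items():
        for alias in aliases:
            owners[alias].add(page_id)
    return {alias: ids for alias, ids in owners.items() if len(ids) > 1}

Each colliding alias could then render as a disambig page built from the claimants' {{disambigtext|...}} blurbs, while unique aliases stay plain redirects.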
From a user experience perspective, I'd be a little worried about putting more mysterious wiki markup at the top of a page. On another wiki I'm working on, we're moving more of this metadata out of the markup and into specialized UIs, so that it doesn't clutter the edit box.
So put it at the bottom, next to {{DEFAULTSORT}}. I do agree that location-independent metadata should be separated from content though. Categories and interwikis fall into that category too.
I think the only real abuse potential comes from either putting in a giant list or trying to redirect in a bunch of existing articles. But one you can catch with a size limit, and the other you could fix by refusing to mess with real articles.
Most likely from having a redirect that expands to too many possibilities, like [A|b|c|d|e][A|b|c|d|e][A|b|c|d|e][A|b|c|d|e]. But that would be easy to catch. The trouble is what to do about it, besides failing silently. Perhaps reject it into a special page that admins can browse from time to time?
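The nice thing is the size can be checked before expanding anything, since it's just the product of the group sizes. A sketch of that guard, with an arbitrary cap and a made-up review queue standing in for the special page:

import math
import re

MAX_EXPANSIONS = 100  # arbitrary; a real limit would be a config setting

def expansion_count(pattern):
    # Product of group sizes, computed without expanding. Assumes the
    # optional-group semantics sketched earlier, so [a|b] contributes 2 + 1.
    sizes = [len(bracketed.split('|')) + 1 if bracketed else 1
             for literal, bracketed in re.findall(r'([^\[\]]+)|\[([^\]]*)\]',
                                                  pattern)]
    return math.prod(sizes)

def check_pattern(pattern, review_queue):
    # Reject oversized patterns loudly: queue them for admin review
    # rather than failing silently.
    if expansion_count(pattern) > MAX_EXPANSIONS:
        review_queue.append(pattern)
        return False
    return True

Under those semantics the [A|b|c|d|e] x 4 example counts as 6^4 = 1296 expansions and gets bounced to the review queue without ever being expanded.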
So Steve, I'd say it's a great idea. However, I'd want to do some user testing. Since I've been doing regular expressions for so long, they make instant sense to me, but even this limited version might be too mysterious for most of our editors. Perhaps the special UI would show them the list of generated alternatives as they edit?
Well, the great thing with such a limited expression language is that there's very little to learn, and very little to stuff up. And even better, users can just use the most naive approach imaginable. So, while a CS major would readily write an expression like:
#ALIASES [Dr] Grace [Smith|Jones]
A beginner user might simply write:
#ALIASES [Dr Grace Smith|Dr Grace Jones|Grace Smith|Grace Jones]
or even:
#ALIASES Dr Grace Smith
#ALIASES Dr Grace Jones
#ALIASES Grace Smith
#ALIASES Grace Jones
A UI tool would obviously help, but that would be a slight departure for MediaWiki. There's nothing else like that atm (afaik), so it's hard to picture how it would fit in exactly.
Steve