On 10/25/07, Simetrical <Simetrical+wikilist@gmail.com> wrote:
> This would essentially be like regexes, but defined *without* the operation of iteration: only catenation and union are allowed. This is a large benefit because it means there are a finite number of possible patterns, and so they can be stored in enumerated form.
Yes, I'm undecided whether nesting (aka iteration) is a good idea or not. Quite possibly it's a good idea to force people to explicitly state all the variations they intend. If iteration/nesting is not allowed, then multiple #ALIASES statements *should* be allowed, imho, for readability.
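
Just to make that concrete, here's a rough Python sketch of enumerating such a pattern -- the [bracket] syntax and function name are invented for illustration, not a spec:

import itertools
import re

def expand_alias(pattern):
    """Expand an alias pattern with optional [bracketed] words into every
    concrete title it matches. Union (keep or drop each bracket) plus
    catenation only -- no iteration -- so the result is always finite."""
    tokens = re.split(r'(\[[^\]]*\])', pattern)
    choices = []
    for tok in tokens:
        if tok.startswith('[') and tok.endswith(']'):
            choices.append(['', tok[1:-1]])  # optional word: present or absent
        else:
            choices.append([tok])            # literal text: always present
    variants = set()
    for combo in itertools.product(*choices):
        # Collapse the whitespace left behind by omitted words.
        variants.add(' '.join(''.join(combo).split()))
    return variants

print(sorted(expand_alias("Boo [Foo] [Moo] Woo")))
# ['Boo Foo Moo Woo', 'Boo Foo Woo', 'Boo Moo Woo', 'Boo Woo']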
> > All whitespace is equivalent to a single space. So "Boo [Foo] [Moo] Woo" matches "Boo Woo", rather than "Boo<space><space><space>Woo", for instance.
> Generally speaking I would like to see titles that differ only up to compression of whitespace to be considered identical. If this were the case, the searchable forms of all titles would be whitespace-normalized, and this point would be resolved automatically. Until then, I suggest that this aspect of it be brushed under the carpet for aliases as for anything.
I think that's what I was trying to say. :)
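
For what it's worth, the normalization I have in mind is nothing fancier than collapse-and-trim (a sketch, assuming plain whitespace semantics):

import re

def normalize_title(title):
    """Collapse runs of whitespace to a single space and trim the ends,
    so titles differing only in whitespace compare as identical."""
    return re.sub(r'\s+', ' ', title).strip()

assert normalize_title("Boo   Woo") == normalize_title("Boo Woo")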
> > - Search term matches one real page, some aliases: takes you to the real page.
> >   (Arguably gives you a "did you mean...?" banner, but not critical.)
> > - Search term matches one alias, no real page: takes you to the page.
> > - Search term matches several aliases, no real page: either an automatically generated disambiguation page, or shows you search results with the matching aliases shown first.
> I see. Possibly this is better than having the aliases be unique, yes.
Yeah. Ultimately, it's helpful for the reader if they *can* search for "J Smith". Obviously they don't expect it to be unique, but if that's all they have to go on, it's better than nothing.
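
Something like this lookup rule, in Python pseudocode -- assuming we can already fetch exact-title and alias matches, and that everything has been whitespace-normalized first:

def resolve(term, pages, aliases):
    """pages: set of real titles; aliases: alias string -> set of targets."""
    if term in pages:
        return ('page', term)                 # a real page wins outright
    targets = aliases.get(term, set())
    if len(targets) == 1:
        return ('page', next(iter(targets)))  # one alias: just follow it
    if len(targets) > 1:
        # Several aliases: auto-disambiguate, or rank these first in search.
        return ('disambiguate', sorted(targets))
    return ('search', term)                   # no match at all: plain search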
> It can create exponential database rows in the length of the alias string, yes, so that needs to be dealt with -- if we're doing explicit storage, anyway. I think 20 is probably too low.
The right number is probably easy to come up with if someone can decide how big the table can be. I just don't have a feel for whether 1 million, 10 million, or 100 million rows is "too many".
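
Whatever the cap turns out to be, enforcing it is trivial, since each optional bracket doubles the row count (assuming well-formed, non-nested brackets; the limit here is a placeholder):

MAX_EXPANSIONS = 64  # placeholder; the right value depends on the table budget

def check_alias(pattern):
    """k optional [brackets] expand to 2**k stored rows."""
    n = 2 ** pattern.count('[')
    if n > MAX_EXPANSIONS:
        raise ValueError("%r expands to %d rows (limit %d)"
                         % (pattern, n, MAX_EXPANSIONS))
    return n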
> > - The role of redirects once this system is in place. One possible implementation would simply create and destroy redirects as required. In any case, they would still be needed for some licensing issues.
> Why?
Because when articles get merged, one is turned into a redirect with the history of all the edits that were made. If we kill that redirect, we lose that history, including attribution. Ergo, non-compliance with GFDL.
> aliases into account. (Actually, you seem to have caught on to this point in your last post, written after I wrote that.)
Heh, yeah. I don't do much DB programming these days.
> Of course, that wouldn't be quite enough. There would be all sorts of things expecting particular behavior of redirects, and so this would create a fair amount of backwards incompatibility, and generally confuse things. Ideally I would like to see a proposal that merges redirects and aliases altogether: do we want them to have a corresponding page entry or not? They shouldn't be treated as distinct.
That would be even better, but I wasn't that ambitious. Do you have any ideas? Better still would be something that redefines the concept of disambiguation, which, again, takes a huge amount of manpower to set up and maintain.
One problem that just occurred to me is what happens when one query matches two aliases *and* a disambiguation page. Every possible outcome looks bad:
- Just show the disambiguation page (with two missing entries)
- Show a list of aliased pages plus the disambiguation page (what, I have to choose whether I want a real page or a disambiguation page?)
- Attempt to jam the alias links somewhere in the disambiguation page (possibly duplicating actual links, or possibly requiring every disambiguation page to be updated with an <aliases> section).
Just like with the category/list dilemma, it doesn't seem possible to create a fully dynamic disambiguation page that will be "as good as" a hand-edited one. But long term, it would be very valuable if we could come close.
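
The third option, at least, is mechanical -- something like this, assuming we can list a disambiguation page's existing links:

def merged_disambiguation(existing_links, alias_targets):
    """Append alias targets to a disambiguation page's entries,
    skipping anything it already links to by hand."""
    merged = list(existing_links)
    for target in sorted(alias_targets):
        if target not in existing_links:  # don't duplicate hand-made entries
            merged.append(target)
    return merged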
> What we're looking for is a way to easily create and maintain redirects, not some totally new feature, and despite my suggestions above and below, I think that's how the problem should be posed. A special page to easily manage all redirects to a page, including to batch-create and -delete* them, is probably the best way to handle this. Grouping on this redirects page by category would be a good feature to have, for instance, and category management from it as well. But to start with, reversible batch creation and deletion is all that's needed.
Are you thinking in terms of a special GUI, or a wikitext language feature? Say you used the #ALIASES idea, but it constructed actual pages with #REDIRECT text. Those pages could be marked with an "automatically generated" flag, so they would be killed when the corresponding #ALIASES text was modified.
Now, however, you have a different problem with ambiguous redirects: the user adds an #ALIASES tag pointing at the current page, but the redirect already exists and points somewhere else. What happens?
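
My inclination would be to surface the conflict rather than silently overwrite -- a sketch, with the flag and names invented:

def sync_aliases(page, desired, redirects, auto_generated):
    """Reconcile #ALIASES output with stored redirects.
    redirects: redirect title -> target page.
    auto_generated: redirects we previously created for this page.
    Returns (to_create, to_delete, conflicts)."""
    to_create, conflicts = [], []
    for title in desired:
        target = redirects.get(title)
        if target is None:
            to_create.append(title)            # free: safe to auto-generate
        elif target != page:
            conflicts.append((title, target))  # points elsewhere: ask a human
    # Auto-generated redirects dropped from #ALIASES get cleaned up;
    # hand-made ones are left alone.
    to_delete = [t for t in auto_generated if t not in desired]
    return to_create, to_delete, conflicts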
> *(Unprivileged users should indeed ideally be allowed to delete redirects in general if they have no substantial content, as they currently can during moves. However, history and easy reversibility need to be built into this before it can be deployed, needless to say.)
Steve