On 10/25/07, Simetrical <Simetrical+wikilist(a)gmail.com> wrote:
> This would essentially be like regexes, but defined *without* the
> operation of iteration: only catenation and union are allowed. This
> is a large benefit because it means there are a finite number of
> possible patterns, and so they can be stored in enumerated form.
Yes, I'm undecided whether nesting (aka iteration) is a good idea or
not. Quite possibly it's a good idea to force people to explicitly
state all the variations they intend. If iteration/nesting is not
allowed, then multiple #ALIASES statements *should* be allowed, imho,
for readability.
> > All whitespace is equivalent to a single space. So "Boo [Foo]
> > [Moo] Woo" matches "Boo Woo", rather than
> > "Boo<space><space><space>Woo", for instance.
> Generally speaking I would like to see titles that differ only up to
> compression of whitespace to be considered identical. If this were
> the case, the searchable forms of all titles would be
> whitespace-normalized, and this point would be resolved
> automatically. Until then, I suggest that this aspect of it be
> brushed under the carpet for aliases as for anything.
I think that's what I was trying to say. :)
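To make the intended semantics concrete, here's a minimal sketch in
Python (the `[...]`-means-optional syntax is taken from the examples
above; the function name and everything else are my own assumptions,
not an existing implementation). It enumerates the finite variant set
of a pattern built from catenation and union only, and collapses
whitespace runs as discussed:

```python
import itertools
import re

def expand_alias(pattern):
    """Enumerate every variant of an alias pattern in which each
    [bracketed] segment is independently present or absent.  Only
    catenation and union (present/absent) are used -- no iteration --
    so the result set is always finite and can be stored enumerated."""
    # Split into optional "[...]" segments and literal text between them.
    parts = re.split(r'(\[[^\]]*\])', pattern)
    choices = []
    for part in parts:
        if part.startswith('[') and part.endswith(']'):
            choices.append([part[1:-1], ''])   # present or absent
        else:
            choices.append([part])             # literal, always present
    variants = set()
    for combo in itertools.product(*choices):
        # Collapse all whitespace runs to a single space, so that
        # "Boo [Foo] [Moo] Woo" yields "Boo Woo" and not
        # "Boo<space><space><space>Woo".
        variants.add(' '.join(''.join(combo).split()))
    return variants
```

For example, `expand_alias("Boo [Foo] [Moo] Woo")` yields the four
variants "Boo Woo", "Boo Foo Woo", "Boo Moo Woo", and
"Boo Foo Moo Woo".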
> > - Search term matches one real page, some aliases: takes you to
> >   the real page. (Arguably gives you a "did you mean...?" banner,
> >   but not critical.)
> > - Search term matches one alias, no real page: takes you to the page.
> > - Search term matches several aliases, no real page: either an
> >   automatically generated disambiguation page, or shows you search
> >   results with the matching aliases shown first.
> I see. Possibly this is better than having the aliases be unique, yes.
Yeah. Ultimately, it's helpful for the reader if they *can* search
for "J Smith". Obviously they don't expect it to be unique, but if
that's all they have to go on, it's better than nothing.
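Assuming the resolution rules listed above, the lookup might be
sketched roughly like this (Python; the data structures and function
name are hypothetical stand-ins, not an existing MediaWiki API):

```python
def resolve(term, real_pages, aliases):
    """Sketch of the search-resolution rules discussed above.
    real_pages: set of canonical titles.
    aliases: dict mapping an alias string to the set of titles it
    points at.  Returns an (action, payload) pair."""
    targets = aliases.get(term, set())
    if term in real_pages:
        # A real page always wins over aliases; arguably show a
        # "did you mean...?" banner too, but that's not critical.
        return ('show_page', term)
    if len(targets) == 1:
        # Exactly one alias, no real page: go straight to the target.
        return ('show_page', next(iter(targets)))
    if len(targets) > 1:
        # Several aliases, no real page: an auto-generated
        # disambiguation page, or search results with the matching
        # alias targets listed first.
        return ('disambiguate', sorted(targets))
    return ('search_results', term)   # nothing matched at all
```

Note that the first branch is what makes non-unique aliases workable:
a canonical title can never be shadowed by someone else's alias.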
> It can create exponential database rows in the length of the alias
> string, yes, so that needs to be dealt with -- if we're doing
> explicit storage, anyway. I think 20 is probably too low.
The right number is probably easy to come up with if someone can
decide how big the table can be. I just don't have a feel for whether
1 million, 10 million, or 100 million rows is "too many".
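For a sense of scale: with explicit storage, a pattern with n
independent optional segments expands to 2**n rows, so a per-pattern
cap falls directly out of whatever row budget is chosen. A tiny
illustration (my own framing of the trade-off, not a proposal):

```python
def max_optional_segments(row_budget):
    """Largest n such that one alias pattern with n independent
    optional segments (2**n expansions) fits in the row budget."""
    n = 0
    while 2 ** (n + 1) <= row_budget:
        n += 1
    return n

# 2**20 = 1,048,576: a single pattern with 20 optional segments
# already expands to about a million rows, which is why some cap
# is needed at all once storage is explicit.
```

So whether the cap should be 20 segments, or 20 total expansions, or
something else entirely really does come down to how many rows the
table is allowed to hold.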
> > * The role of redirects once this system is in place. One
> >   possible implementation would simply create and destroy
> >   redirects as required. In any case, they would still be needed
> >   for some licensing issues.
> Why?
Because when articles get merged, one is turned into a redirect with
the history of all the edits that were made. If we kill that
redirect, we lose that history, including attribution. Ergo,
non-compliance with the GFDL.
> aliases into account. (Actually, you seem to have caught on to this
> point in your last post, written after I wrote that.)
Heh, yeah. I don't do much DB programming these days.
> Of course, that wouldn't be quite enough. There would be all sorts
> of things expecting particular behavior of redirects, and so this
> would create a fair amount of backwards incompatibility, and
> generally confuse things. Ideally I would like to see a proposal
> that merges redirects and aliases altogether: do we want them to
> have a corresponding page entry or not? They shouldn't be treated
> as distinct.
That would be even better, but I wasn't that ambitious. Do you have
any ideas? Even better would be something that redefines the concept
of disambiguation, which is, again, a huge amount of manpower to set
up and maintain.
One problem that just occurred to me is what happens when one query
matches two aliases *and* a disambiguation page. Every possible
outcome looks bad:
- Just show the disambiguation page (with two missing entries).
- Show a list of aliased pages plus the disambiguation page (what, I
  have to choose whether I want a real page or a disambiguation page?).
- Attempt to jam the alias links somewhere into the disambiguation
  page (possibly duplicating actual links, or possibly requiring every
  disambiguation page to be updated with an <aliases> section).

Just like with the category/list dilemma, it doesn't seem possible to
create a fully dynamic disambiguation page that will be "as good as" a
hand-edited one. But long term, it would be a very valuable thing if
we could come close.
> What we're looking for is a way to easily create and maintain
> redirects, not some totally new feature, and despite my suggestions
> above and below, I think that's how the problem should be posed. A
> special page to easily manage all redirects to a page, including to
> batch-create and -delete* them, is probably the best way to handle
> this. Grouping on this redirects page by category would be a good
> feature to have, for instance, and category management from it as
> well. But to start with, reversible batch creation and deletion is
> all that's needed.
Are you thinking in terms of a special GUI, or a wikitext language
feature? Say you used the #ALIASES idea, but it constructed actual
pages with #REDIRECT text. Those pages could be marked with an
"automatically generated" flag, so they would be killed when the
corresponding #ALIASES text was modified.

Now, however, you have a different problem with ambiguous redirects:
the user adds an #ALIASES tag pointing at the current page, but the
redirect already exists and points somewhere else. What happens?
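One way to make that lifecycle concrete (a sketch only: #ALIASES
generating flagged #REDIRECT pages is the idea above, but the
function, data structures, and conflict handling here are all my own
assumptions). Conflicting titles are simply reported, not overwritten:

```python
def sync_redirects(page, wanted_aliases, redirects):
    """Reconcile auto-generated redirects with a page's #ALIASES list.
    redirects: dict mapping title -> (target, auto_generated flag),
    mutated in place.  Returns (created, deleted, conflicts);
    redirects that already point somewhere else are left untouched
    for a human to sort out."""
    created, deleted, conflicts = [], [], []
    # Kill auto-generated redirects no longer listed in #ALIASES.
    for title, (target, auto) in list(redirects.items()):
        if auto and target == page and title not in wanted_aliases:
            del redirects[title]
            deleted.append(title)
    for title in sorted(wanted_aliases):
        existing = redirects.get(title)
        if existing is None:
            redirects[title] = (page, True)   # flagged auto-generated
            created.append(title)
        elif existing[0] != page:
            conflicts.append(title)           # the ambiguous case above
    return created, deleted, conflicts
```

Hand-made redirects (flag False) and redirects pointing elsewhere both
survive a sync untouched, which at least makes the operation
reversible and keeps the ambiguous case visible rather than silently
resolved.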
> *(Unprivileged users should indeed ideally be allowed to delete
> redirects in general if they have no substantial content, as
> currently they can during moves. However, history and easy
> reversibility need to be built into this before it can be deployed,
> needless to say.)
Steve