I have an idea for an improvement to the system of redirects, by using pattern-based aliases. We've discussed it a bit on wikien-l where it has some support, so I'm posting here to find out:
a) If it's feasible (i.e., not computationally too expensive)
b) How much work is required to implement it
c) If it were implemented, whether it would be enabled at Wikipedia
d) If anyone is interested in actually implementing it. If not, I may have a go myself.
The problem: Many pages require a largeish number of redirects, to cope with differences in spelling, optional words, accented characters etc. It's a surprising amount of work to create and maintain these, when the value of each individual redirect is so low. For example, [[Thomas-François Dalibard]] might be spelt four ways, three of which require a redirect: Thomas-Francois Dalibard, Thomas François Dalibard, Thomas Francois Dalibard.
General solution: Instead of having redirects that point to a page, have the page itself specify aliases which can be used to find it. This is specified as a pattern, like a very cut-down regexp: #ALIASES Thomas[-]Fran[ç|c]ois Dalibard
The proposed syntax would be as follows (but is debatable):
Foo - matches Foo
[Foo] - matches Foo or blank.
[Foo|Moo] - matches Foo or Moo.
[Foo|Moo|] or [|Foo|Moo] - matches Foo or Moo or blank.
Foo[Moo - matches the literal string Foo[Moo
All whitespace is equivalent to a single space. So "Boo [Foo] [Moo] Woo" matches "Boo Woo", rather than "Boo<space><space><space>Woo" for instance.
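To make the proposed semantics concrete, here is a minimal sketch of the expansion in Python. It assumes non-nested brackets and treats an unmatched "[" as a literal, per the rules above; the function name and approach are mine, not part of the proposal:

```python
import itertools
import re

def expand_pattern(pattern):
    """Expand a cut-down alias pattern into every title it matches."""
    # Split into literal text and [a|b|...] alternation groups.
    parts = re.split(r'(\[[^\[\]]*\])', pattern)
    choices = []
    for part in parts:
        if part.startswith('[') and part.endswith(']'):
            alts = part[1:-1].split('|')
            if len(alts) == 1:
                # A lone [Foo] matches Foo or blank.
                alts.append('')
            choices.append(alts)
        else:
            # Literal text, including any unmatched '['.
            choices.append([part])
    titles = set()
    for combo in itertools.product(*choices):
        # All whitespace is equivalent to a single space.
        titles.add(re.sub(r'\s+', ' ', ''.join(combo)).strip())
    return titles
```

For instance, expand_pattern("[Greater ]Melbourne[, Victoria]") yields the four titles "Greater Melbourne, Victoria", "Greater Melbourne", "Melbourne, Victoria" and "Melbourne", and "Boo [Foo] [Moo] Woo" with both groups blank collapses to "Boo Woo".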
When a user searches for a term (using "Go"), MediaWiki would perform a normal query first, and if that fails, do an alias-based search. Thus:
- Search term matches no real pages, no aliases: takes you to some search results.
- Search term matches one real page, no aliases: takes you to the real page.
- Search term matches one real page, some aliases: takes you to the real page. (Arguably gives you a "did you mean...?" banner, but not critical.)
- Search term matches one alias, no real page: takes you to that page.
- Search term matches several aliases, no real page: either an automatically generated disambiguation page, or shows you search results with the matching aliases shown first.
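Those rules amount to a small decision function. A sketch in Python, where pages and alias_index are hypothetical stand-ins for the real title and alias tables:

```python
def resolve_go(query, pages, alias_index):
    """Resolve a "Go" query per the rules above.

    pages: set of real page titles.
    alias_index: dict mapping a search term to the list of page
    titles whose alias pattern matches it.
    """
    if query in pages:
        return ("page", query)           # a real page always wins
    hits = alias_index.get(query, [])
    if len(hits) == 1:
        return ("page", hits[0])         # one alias: go straight there
    if len(hits) > 1:
        return ("disambiguation", hits)  # several aliases: disambiguate
    return ("search_results", None)      # nothing matched
```

So resolve_go("Greater Melbourne", {"Melbourne"}, {"Greater Melbourne": ["Melbourne"]}) lands on the Melbourne page, while a query matching several aliases falls through to disambiguation.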
An automatically generated disambiguation page could make use of some other hypothetical keyword like {{disambig|A 19th century novelist best known for ...}}. So embedding in search results might be simpler, and would work well if it could be forced to show the first sentence or two from the article.
Unresolved issues:
* Since pattern matching is prone to abuse, the total number of matching aliases should be restricted in some way, perhaps to 10 or 20. The best way to handle an excessively broad query (eg, [A|b|c|d|e][A|b|c|d|e] etc) is left as an open question. Possibilities include silently failing, noisily failing (with an error message in the rendered text), a special page for bad aliases...
* Whether there should be just one #ALIASES statement, or whether multiple would be allowed. Allowing several would be much more beginner-friendly - editors could simply state all the intended redirects explicitly.
* The role of redirects once this system is in place. One possible implementation would simply create and destroy redirects as required. In any case, they would still be needed for some licensing issues.
Possible implementation: Without knowing the MediaWiki DB schema at all, I speculated on a possible implementation that would be a good tradeoff between size and speed. Two new tables are needed:
AliasesRaw would contain a constantly updated list of the actual alias patterns used in articles. Each time an article is saved, this would possibly be updated. AliasesExpanded would contain expansions of these aliases, either fully or partially. So an expansion of #ALIASES [City of ][Greater ]Melbourne[, Victoria| (Australia)] to 5 characters would lead to three rows:
"City ", "of [Greater ]Melbourne[, Victoria| (Australia)]"
"Great", "er Melbourne[, Victoria| (Australia)]"
"Melbo", "urne[, Victoria| (Australia)]"
This means that if a user searches for "Greater Melbourne", the search process would go something like:
- Look for an article called Greater Melbourne, GREATER MELBOURNE, greater melbourne (as at present) - assume this fails.
- Look up "Great" in the AliasesExpanded table, then iterate over the matching results, finding one that matches.
Obviously the number of characters stored in the expanded aliases could be tuned.
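A simplified sketch of that prefix-keyed table and lookup, in Python. For brevity it stores fully expanded titles rather than the partially expanded remainders described above, and the sorted list stands in for a DB index; names are mine:

```python
from bisect import bisect_left

N = 5  # number of stored prefix characters; tunable, as noted above

def build_rows(expanded_titles):
    """Build AliasesExpanded-style rows keyed on the first N chars."""
    return sorted((title[:N], title) for title in expanded_titles)

def lookup(rows, query):
    """Binary-search the prefix, then check each candidate in full
    (the 'iterate over the matching results' step)."""
    key = query[:N]
    i = bisect_left(rows, (key, ""))
    hits = []
    while i < len(rows) and rows[i][0] == key:
        if rows[i][1] == query:
            hits.append(rows[i][1])
        i += 1
    return hits
```

With rows built from ["City of Greater Melbourne", "Greater Melbourne", "Melbourne"], lookup(rows, "Greater Melbourne") only ever inspects the rows keyed "Great", which is the point of the scheme.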
I look forward to any comments, Steve
On 10/24/07, Steve Bennett stevagewp@gmail.com wrote:
Possible implementation: Without knowing the MediaWiki DB schema at all, I speculated on a possible implementation that would be a good tradeoff between size and speed. Two new tables are needed:
<snip>
No need for the complex setup you envisage. For mysql, at least, we could create a new table 'article_aliases', and "select aa_page from article_aliases where 'my_title' like aa_alias". Of course, we'd need to do some built-in, potentially expensive checking on the aliases that would be originally introduced, like checking if any other pages match the regex (if so, block the alias), and if the article title itself matches the regex (if not, block the alias).
I have no idea how portable this would be to postgres and other database engines, but it could potentially work as an extension.
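A minimal runnable sketch of that single-table idea, using SQLite in place of MySQL. The table and column names (article_aliases, aa_page, aa_alias) come from the message above; note the stored patterns here use SQL LIKE wildcards, not the bracket syntax from the proposal:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article_aliases (aa_page TEXT, aa_alias TEXT)")
conn.executemany(
    "INSERT INTO article_aliases VALUES (?, ?)",
    [("Melbourne", "%Greater Melbourne%"),
     ("Melbourne", "Melbourne, Victoria")])

# The search: match the incoming title against every stored pattern.
rows = conn.execute(
    "SELECT aa_page FROM article_aliases WHERE ? LIKE aa_alias",
    ("City of Greater Melbourne",),
).fetchall()
print(rows)  # [('Melbourne',)]
```

This also illustrates the cost concern: with no usable index, the LIKE comparison scans every alias row for each failed title lookup.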
Hoi, I am afraid that as the number of articles grows, the existence of redirects becomes increasingly problematic, because more and more disambiguation will be needed. Existing redirects are not considered when disambiguation is implemented. Redirects ARE problematic, and by automagically creating a vast number of additional redirects it becomes even more of a nightmare. Thanks, GerardM
On 10/24/07, Andrew Garrett andrew@epstone.net wrote:
On 10/24/07, Steve Bennett stevagewp@gmail.com wrote:
Possible implementation: Without knowing the MediaWiki DB schema at all, I speculated on a possible implementation that would be a good tradeoff between size and speed. Two new tables are needed:
<snip>
No need for the complex setup you envisage. For mysql, at least, we could create a new table 'article_aliases', and "select aa_page from article_aliases where 'my_title' like aa_alias". Of course, we'd need to do some built-in, potentially expensive checking on the aliases that would be originally introduced, like checking if any other pages match the regex (if so, block the alias), and if the article title itself matches the regex (if not, block the alias).
I have no idea how portable this would be to postgres and other database engines, but it could potentially work as an extension.
-- Andrew Garrett
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
[replies to several messages here] On 10/24/07, GerardM gerard.meijssen@gmail.com wrote:
I am afraid that as the number of articles grows, the existence of redirects becomes increasingly problematic, because more and more disambiguation will be needed. Existing redirects are not considered when disambiguation is implemented. Redirects ARE problematic, and by automagically creating a vast number of additional redirects it becomes even more of a nightmare.
I assume you're talking about the possibility of implementing aliases as genuine redirects: yes, that would cause problems. However, if they were "automagically created", presumably they can be "automagically destroyed". I haven't really thought through this idea much - it scares me.
If not, there is no problem, and the existence of aliases will in fact tend to reduce the number of redirects. A given article with 5 redirects would probably be replaced by just the article and 5 aliases, which must be much less expensive to store - a maximum of 6 table entries in my scheme, with no article text to consider.
Brianna wrote:
For anyone who is considering implementing something like this, please give some thought to how it could work in a multilingual context (defining the equivalent name in other languages)
I don't quite understand - are you talking about interwiki links? Or do you mean non-Latin character sets? Could you be more specific, perhaps with an example problem?
and also for categories (that might be pushing it...).
Do you mean, aliases to categories? Would be nice, but even redirects to categories don't work properly yet. One problem at a time, I think :)
Andrew wrote:
No need for the complex setup you envisage. For mysql, at least, we could create a new table 'article_aliases', and "select aa_page from article_aliases where 'my_title' like aa_alias". Of course, we'd need to do some built-in, potentially expensive checking on the aliases that would be originally introduced, like checking if any other pages match the regex (if so, block the alias), and if the article title itself matches the regex (if not, block the alias).
Mysql supports regexp-based matches? If so, cool. I only know SQL Server which, last time I checked, only supports wildcards, which wouldn't be strong enough. The main reason for my complex scheme is that the two endpoints seem expensive:
- Zero-expansion endpoint: every incoming query that doesn't match any real article titles has to be compared against a very large number of aliases - expensive on query time.
- Complete-expansion endpoint: every alias pattern has to be fully expanded into all the possible matching queries. Say there were 3 million pages (en wiki including non-articles?) with an average of 5 aliases (presumably there will be more aliases than there currently are redirects, because they're so easy to make), that's 15 million entries in a table. That seems expensive, but perhaps not?
Then again, the table would have to be somewhere between 3 million and 15 million entries in that case anyway, so....
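The back-of-envelope numbers, spelled out (the figures are the assumed ones from above, not measurements):

```python
# Complete-expansion endpoint: rows in the expanded table.
pages = 3_000_000
aliases_per_page = 5
full_expansion_rows = pages * aliases_per_page
print(full_expansion_rows)  # 15000000

# Why a cap on matching aliases is needed: one abusive group like
# [A|b|c|d|e] repeated k times multiplies out geometrically.
alternatives, repeats = 5, 6
print(alternatives ** repeats)  # 15625 expansions from a single line
```

A single six-group pattern already dwarfs the proposed 10-20 match limit, which is the argument for rejecting overly broad patterns at save time.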
My lack of experience with large databases on serious hardware forces me to shut up now and revert to my original appeal for someone more knowledgeable to handle the feasibility side of things :)
Steve
On 24/10/2007, Steve Bennett stevagewp@gmail.com wrote:
Brianna wrote:
For anyone who is considering implementing something like this, please give some thought to how it could work in a multilingual context (defining the equivalent name in other languages)
I don't quite understand - are you talking about interwiki links? Or do you mean, non-latin character sets? Could you be more specific, perhaps with a example problem?
No not interwiki links. Consider Commons galleries. I think so far you have considered it in relation to aliases within a single language only, right?
and also for categories (that might be pushing it...).
Do you mean, aliases to categories? Would be nice, but even redirects to categories don't work properly yet. One problem at a time, I think :)
Yes, well... :) perhaps: consider it for category namespaces as well. or consider it for all (content-ish) namespaces.
cheers Brianna
On 10/24/07, Brianna Laugher brianna.laugher@gmail.com wrote:
No not interwiki links. Consider Commons galleries. I think so far you have considered it in relation to aliases within a single language only, right?
Oh, I see. Yes I had, but it should work equally across languages, particularly if nested patterns are supported. The same goes for names of foreign things on Wikipedia, like Turin/Torino, and even concepts like Mulled wine/Vin chaud/Glühwein. This latter one could be implemented either as:
#ALIASES [Mulled wine|Vin chaud|Glühwein|Glu[e]hwein]
or #ALIASES Mulled wine #ALIASES Vin chaud #ALIASES [Glühwein|Glu[e]hwein]
Convention will probably dictate which is preferable.
IMHO with the number of second-language speakers of English, it would make sense to have even more foreign language aliases to English words, but that's a separate issue.
Is there a benefit, perhaps, in explicitly marking which language the redirect applies to? Not sure. In an ideal world, you could have a situation where a Commons user with an English GUI would have "Worms" auto-disambiguated to the animal, whereas a German user would be taken to the town...but that's probably a bit further down the track.
Steve
On Wed, Oct 24, 2007 at 05:06:31PM +1000, Steve Bennett wrote:
[replies to several messages here] On 10/24/07, GerardM gerard.meijssen@gmail.com wrote:
I am afraid that as the number of articles grows, the existence of redirects becomes increasingly problematic, because more and more disambiguation will be needed. Existing redirects are not considered when disambiguation is implemented. Redirects ARE problematic, and by automagically creating a vast number of additional redirects it becomes even more of a nightmare.
I assume you're talking about the possibility of implementing aliases as genuine redirects: yes, that would cause problems. However, if they were "automagically created", presumably they can be "automagically destroyed". I haven't really thought through this idea much - it scares me.
If not, there is no problem, and the existence of aliases will in fact tend to reduce the number of redirects. A given article with 5 redirects would probably be replaced by just the article and 5 aliases, which must be much less expensive to store - a maximum of 6 table entries in my scheme, with no article text to consider.
I'd like to recommend that anyone who thinks aliases are a Pretty Neat Idea go locate some coverage of the Come From statement in the programming language Intercal, and muse upon the potential similarities.
Cheers, -- jra
On 10/25/07, Jay R. Ashworth jra@baylink.com wrote:
I'd like to recommend that anyone who thinks aliases are a Pretty Neat Idea go locate some coverage of the Come From statement in the programming language Intercal, and muse upon the potential similarities.
Heh, the thought occurred to me too. However, in Intercal, the chaos arises from a program in execution unexpectedly and suddenly leaping from any point in the code to an arbitrary and unconnected function. In this proposal, the leaping always takes place at exactly the same point: when a user has entered a query, and that query has not matched the name of an actual page in the database.
Then again, I hadn't thought through the behaviour of what happens when you [[link]] to an aliased term. Logically the behaviour ought to be:
- If a real page exists, link to that.
- Otherwise, if a single alias matches, link to that.
- Otherwise, link to an automatic disambiguation page.
This actually presents a few complexities, as links themselves are stored in a links table, and would have to be updated if the aliases change. It's also not clear whether the third case above should be a red or blue link. Similarly, if a user links to [[John X Smith]], but the actual page is [[John Xavier Smith]] with an alias, what should happen exactly?
Some other issues that also occur to me:
- does template transclusion work on an alias? If not, why not?
- does it work in other namespaces? Which namespace does the alias apply in? Does this mean that [[Wikipedia:What Wikipedia is not]] cannot have a line like #ALIASES WP:NOT? If aliases are applied in the main namespace, how do we stop people using them in user pages etc?
Suggestions welcome.
Steve
On Thu, Oct 25, 2007 at 11:00:39AM +1000, Steve Bennett wrote:
On 10/25/07, Jay R. Ashworth jra@baylink.com wrote:
I'd like to recommend that anyone who thinks aliases are a Pretty Neat Idea go locate some coverage of the Come From statement in the programming language Intercal, and muse upon the potential similarities.
Heh, the thought occurred to me too. However, in Intercal, the chaos arises from a program in execution unexpectedly and suddenly leaping from any point in the code to an arbitrary and unconnected function. In this proposal, the leaping always takes place at exactly the same point: when a user has entered a query, and that query has not matched the name of an actual page in the database.
Sure, but my point is more that "It's Magic!".
Magic violates the Principle of Least Astonishment.
Then again, I hadn't thought through the behaviour of what happens when you [[link]] to an aliased term. Logically the behaviour ought to be:
- If a real page exists, link to that
- Otherwise, if a single alias matches, link to that.
- Otherwise, link to an automatic disambiguation page.
Will those be extensible, as category pages are? Based on the disambigs *I've* seen, assuming you can *do* that automatically may not be all that safe.
This actually presents a few complexities, as links themselves are stored in a links table, and would have to be updated if the aliases change. It's also not clear whether the third case above should be a red or blue link.
How is that handled with :Category:?
Similarly, if a user links to [[John X Smith]], but the actual page is [[John Xavier Smith]] with an alias, what should happen exactly?
Well, that's the same question as "What happens with redirects now", which I'm always on the wrong side of; right?
Some other issues that also occur to me:
- does template transclusion work on an alias? If not, why not?
Would it work on a redirect? If so, why shouldn't it work on an alias?
- does it work in other namespaces? Which namespace does the alias apply in? Does this mean that [[Wikipedia:What Wikipedia is not]] cannot have a line like #ALIASES WP:NOT? If aliases are applied in the main namespace, how do we stop people using them in user pages etc?
Yeah; there are *lots* of potential pitfalls, aren't there?
Are they design? Or merely implementation? Given that they don't seem to be problems for redirects, I suspect they're implementation.
Is there a way to get the good parts of this idea while sticking with redirects as the actual implementation?
Cheers, -- jra
On 10/25/07, Jay R. Ashworth jra@baylink.com wrote:
Sure, but my point is more that "It's Magic!".
Magic violates the Principle of Least Astonishment.
What I've been proposing honestly doesn't strike me as particularly magical or astonishing. But creating bona-fide redirects now looks like it has other advantages.
Then again, I hadn't thought through the behaviour of what happens when you
[[link]] to an aliased term. Logically the behaviour ought to be:
- If a real page exists, link to that
- Otherwise, if a single alias matches, link to that.
- Otherwise, link to an automatic disambiguation page.
Will those be extensible, as category pages are? Based on the disambigs *I've* seen, assuming you can *do* that automatically may not be all that safe.
In what way are category pages extensible? You mean in the brief text at the top? I was envisaging automatic (probably better called dynamic) disambiguation pages as being completely generated on the fly. If you wanted to tweak something, you would replace it with a real disambiguation page. There are problems with this proposal.
This actually presents a few complexities, as links themselves are stored in a links table, and would have to be updated if the aliases change. It's also not clear whether the third case above should be a red or blue link.
How is that handled with :Category:?
I'm not sure what analogy you're making exactly, but an interesting, weird and possibly relevant thing does happen with categories: linking to a category which contains articles, but does not itself exist as a "page" shows as a red, but functional link.
Some other issues that also occur to me:
- does template transclusion work on an alias? If not, why not?
Would it work on a redirect? If so, why shouldn't it work on an alias?
Yeah, there's no problem transcluding {{clr}} which redirects to {{-}}. Why not? Perhaps because the potential for damage (malicious or otherwise) is greater.
Yeah; there are *lots* of potential pitfalls, aren't there?
Are they design? Or merely implementation? Given that they don't seem to be problems for redirects, I suspect they're implementation.
Is there a way to get the good parts of this idea while sticking with redirects as the actual implementation?
I'll put my thinking cap on. There's a bit of a problem in terms of trying to make whatever feature "fit in" with the existing MediaWiki feature set and general look and feel, behaviour etc. Is it ok to break that by using lots of javascript to list and edit redirects? Is it ok to write to a page other than the one the user is looking at? Is it ok to pop open a new window to facilitate the user editing multiple pages at once? Is it ok to generate code for a disambiguation page and ask the user to review it?
All of these things would be novel.
Steve
On Thu, Oct 25, 2007 at 02:28:07PM +1000, Steve Bennett wrote:
Will those be extensible, as category pages are? Based on the disambigs *I've* seen, assuming you can *do* that automatically may not be all that safe.
In what way are category pages extensible? You mean in the brief text at the top? I was envisaging automatic (probably better called dynamic) disambiguation pages as being completely generated on the fly. If you wanted to tweak something, you would replace it with a real disambiguation page. There are problems with this proposal.
This actually presents a few complexities, as links themselves are stored in a links table, and would have to be updated if the aliases change. It's also not clear whether the third case above should be a red or blue link.
How is that handled with :Category:?
I'm not sure what analogy you're making exactly, but an interesting, weird and possibly relevant thing does happen with categories: linking to a category which contains articles, but does not itself exist as a "page" shows as a red, but functional link.
Yep, that was what I was talking about. It's red unless someone's made it "actually be a page" by putting content on it... even if there are items there which you will see when you click the redlink.
Weirded me out the first time I noticed it.
Some other issues that also occur to me:
- does template transclusion work on an alias? If not, why not?
Would it work on a redirect? If so, why shouldn't it work on an alias?
Yeah, there's no problem transcluding {{clr}} which redirects to {{-}}. Why not? Perhaps because the potential for damage (malicious or otherwise) is greater.
Yeah; there are *lots* of potential pitfalls, aren't there?
Such was my instinct, yes. Takes me a while to back up those instincts, sometimes, though...
Are they design? Or merely implementation? Given that they don't seem to be problems for redirects, I suspect they're implementation.
Is there a way to get the good parts of this idea while sticking with redirects as the actual implementation?
I'll put my thinking cap on. There's a bit of a problem in terms of trying to make whatever feature "fit in" with the existing MediaWiki feature set and general look and feel, behaviour etc. Is it ok to break that by using lots of javascript to list and edit redirects? Is it ok to write to a page other than the one the user is looking at? Is it ok to pop open a new window to facilitate the user editing multiple pages at once? Is it ok to generate code for a disambiguation page and ask the user to review it?
My instinct on this one is "Installed Base". The basic structure of MW is well known on a sufficiently wide scale that fundamental changes to it -- which I feel this is -- merit fairly deep study.
Cheers, -- jra
On 10/25/07, Steve Bennett stevagewp@gmail.com wrote:
Is it ok to break that by using lots of javascript to list and edit redirects?
Not unless there's a good fallback.
Is it ok to write to a page other than the one the user is looking at?
Sure, why not? I'm not sure you'd want to, though.
Is it ok to pop open a new window to facilitate the user editing multiple pages at once?
Absolutely not. Allow the user to open new windows by Shift-click, middle-click, etc. exactly if they choose to.
Is it ok to generate code for a disambiguation page and ask the user to review it?
Yes, but that strikes me as not the best way to go about it. I'm thinking that disambiguation and redirect pages should work more like category pages than content pages.
On 10/26/07, Simetrical Simetrical+wikilist@gmail.com wrote:
On 10/25/07, Steve Bennett stevagewp@gmail.com wrote:
Is it ok to break that by using lots of javascript to list and edit redirects?
Not unless there's a good fallback.
What if the fallback is "edit redirects by hand, as is done currently"? In other words, is it ok to add a new feature that requires javascript, if it is effectively an optional feature?
Is it ok to write to a page other than the one the user is looking at?
Sure, why not? I'm not sure you'd want to, though.
Well for instance, creating or updating a redirect would require modifying a page to add the #REDIRECT text...
Yes, but that strikes me as not the best way to go about it. I'm thinking that disambiguation and redirect pages should work more like category pages than content pages.
How would this work? Any ideas? I can't see that dynamic disambiguation pages could ever fully replace manual disambiguation pages, in the same way that categories don't fully supplant lists. Do we want a mechanism whereby both manual and dynamic disambiguation could take place for the same query?
Steve
On 10/26/07, Steve Bennett stevagewp@gmail.com wrote:
What if the fallback is "edit redirects by hand, as is done currently"? In other words, is it ok to add a new feature that requires javascript, if it is effectively an optional feature?
I would say no: currently just about every MW feature that exists with JavaScript exists without it as well, to the extent reasonably possible, and that's good. JavaScript should only be required where it absolutely must be, namely dynamic calculations or adjustments of page elements, and things in that vein.
But it's not my decision. Ask Brion, if you want to know.
Is it ok to write to a page other than the one the user is looking at?
Sure, why not? I'm not sure you'd want to, though.
Well for instance, creating or updating a redirect would require modifying a page to add the #REDIRECT text...
I suppose so, if that's the way you're going to implement it. It would be the simplest, yes, in certain respects.
How would this work? Any ideas? I can't see that dynamic disambiguation pages could ever fully replace manual disambiguation pages, in the same way that categories don't fully supplant lists. Do we want a mechanism whereby both manual and dynamic disambiguation could take place for the same query?
Well, as with categories, we could allow arbitrary article text in an introduction sort of thing. If, unlike with categories, we also allowed arbitrary text to accompany each disambig item, and possibly custom ordering of some kind, I see no reason at all why manual disambig pages would need to exist.
On Thu, 25 Oct 2007 11:00:39 +1000, Steve Bennett wrote:
Some other issues that also occur to me:
- does template transclusion work on an alias? If not, why not?
- does it work in other namespaces? Which namespace does the alias apply in? Does this mean that [[Wikipedia:What Wikipedia is not]] cannot have a line like #ALIASES WP:NOT? If aliases are applied in the main namespace, how do we stop people using them in user pages etc?
Suggestions welcome.
Steve
I'd suggest that it not work on transclusion. I don't see much benefit to it.
Actually, if it did work with transclusion, then you'd probably need the aliases list to load the dumps. Otherwise, you couldn't find the templates if you don't have that list to resolve them with, since you can't generate the complete list unless you have the templates to parse them from!
I think it would be ideal if this could be made to work on the search/go form, without affecting things like links & transclusion, which would potentially make the data so much harder to work with. -Steve
On Thu, Oct 25, 2007 at 05:46:30PM -0400, Steve Sanbeg wrote:
I'd suggest that it not work on transclusion. I don't see much benefit to it.
Actually, if it did work with transclusion, then you'd probably need the aliases list to load the dumps. Otherwise, you couldn't find the templates if you don't have that list to resolve them with; since you can't generate the complete list unless have the templates to parse them from!
I think it would be ideal if this could be made to work on the search/go form, without affecting things like links & transclusion, which would potentially make the data so much harder to work with.
I can't say quite why, but this response feels to me like it takes this idea even closer to "we don't rewrite the URL on redirects, and there's a really good reason why" (which I don't remember right now, even though Brion or Tim have explained it to me at least twice). :-)
Cheers, -- jra
On 10/24/07, Steve Bennett stevagewp@gmail.com wrote:
Andrew wrote:
No need for the complex setup you envisage. For mysql, at least, we could create a new table 'article_aliases', and "select aa_page from article_aliases where 'my_title' like aa_alias". Of course, we'd need to do some built-in, potentially expensive checking on the aliases that would be originally introduced, like checking if any other pages match the regex (if so, block the alias), and if the article title itself matches the regex (if not, block the alias).
On thinking about this some more, a single table should do it, with fields page_id, alias_pattern, alias_expanded. Then saving a page is conceptually:
DELETE * from aliases where page_id = @page_id
INSERT aliases (page_id, alias_pattern, alias_expanded) SELECT @page_id, pattern, expanded FROM #temp_aliases
And searching is just: SELECT page_id FROM aliases WHERE alias_pattern = @query
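A runnable sketch of that save/search cycle, using SQLite and the table and column names above (the pattern-expansion step itself is assumed to happen elsewhere):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aliases (page_id INTEGER, "
             "alias_pattern TEXT, alias_expanded TEXT)")

def save_page(page_id, pattern, expanded_aliases):
    # On save: drop the page's old rows, then insert the fresh
    # expansion (the DELETE + INSERT pair above).
    conn.execute("DELETE FROM aliases WHERE page_id = ?", (page_id,))
    conn.executemany(
        "INSERT INTO aliases VALUES (?, ?, ?)",
        [(page_id, pattern, exp) for exp in expanded_aliases])

def search(query):
    # Exact match against the expanded form (the final SELECT above).
    return [row[0] for row in conn.execute(
        "SELECT page_id FROM aliases WHERE alias_expanded = ?", (query,))]

save_page(42, "[Greater ]Melbourne", ["Greater Melbourne", "Melbourne"])
print(search("Greater Melbourne"))  # [42]
```

Because the search is an exact equality on alias_expanded, an ordinary index on that column makes lookups cheap, which is the advantage over matching raw patterns at query time.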
I guess the pattern itself might not even have to be stored.
Steve
For anyone who is considering implementing something like this, please give some thought to how it could work in a multilingual context (defining the equivalent name in other languages) and also for categories (that might be pushing it...).
thanks, Brianna
On 24/10/2007, Steve Bennett stevagewp@gmail.com wrote:
On Wed, 24 Oct 2007 15:30:44 +1000, Steve Bennett wrote:
General solution: Instead of having redirects that point to a page, have the page itself specify aliases which can be used to find it. This is specified as a pattern, like a very cut-down regexp: #ALIASES Thomas[-]Fran[ç|c]ois Dalibard
The syntax looks ambiguous with a numbered list. It should have braces, i.e.:
{{#ALIASES:Thomas[-]Fran[ç|c]ois Dalibard}}
XML should also work: <aliases>Thomas[-]Fran[ç|c]ois Dalibard</aliases>
It should be feasible to implement an extension that would store the aliases, maybe by generating a new kind of redirect if the page doesn't exist.
On 10/25/07, Steve Sanbeg ssanbeg@ask.com wrote:
The syntax looks ambiguous with a numbered list. It should have braces, i.e.:
In the same way that #REDIRECT foo is "ambiguous". That's the analogy I was thinking of when I suggested this syntax. I'm not familiar with the reasons for our different types of syntax: #REDIRECT, {{defaultsort}}, __NOTOC__, <ref>...
{{#ALIASES:Thomas[-]Fran[ç|c]ois Dalibard}}
If that's the best way, then sure. I was slightly concerned that the | would be misinterpreted.
XML should also work:
<aliases>Thomas[-]Fran[ç|c]ois Dalibard</aliases>
I guess. It's a bit wordier.
Steve
On Thu, 25 Oct 2007 11:04:02 +1000, Steve Bennett wrote:
On 10/25/07, Steve Sanbeg ssanbeg@ask.com wrote:
The syntax looks ambiguous with a numbered list. It should have braces, i.e.:
In the same way that #REDIRECT foo is "ambiguous". That's the analogy I was thinking of when I suggested this syntax. I'm not familiar with the reasons for our different types of syntax: #REDIRECT, {{defaultsort}}, __NOTOC__, <ref>...
#REDIRECT is disambiguated with a flag in the DB, which separates redirects from real articles. That way, if you see #REDIRECT in an ordinary article, it's unambiguously a list item. This won't work if you want to put the alias in some random place in the text.
__NOTOC__ is a magic word in the parser. Things like {{defaultsort}} and <ref> can be implemented as extensions, without hacking the core code, which would certainly make life easier when implementing this.
{{#ALIASES:Thomas[-]Fran[ç|c]ois Dalibard}}
If that's the best way, then sure. I was slightly concerned that the | would be misinterpreted.
True, you'd need to find a way around that; either turn off that splitting, use a different character, or reassemble the string, and live without the missing whitespace.
XML should also work:
<aliases>Thomas[-]Fran[ç|c]ois Dalibard</aliases>
I guess. It's a bit wordier.
For a single alias, yes. For multiple aliases, it would seem clearer to just list them one per line in an <aliases> block than the alternatives with parser syntax. And this would give you a block of unbroken text, so no need to worry about the parser messing it up before you see it.
I'm not sure which would end up being the best, but it seems like either could work. It's definitely better to implement it as an extension than to hack completely new syntax into the parser.
-Steve
Steve
On 10/25/07, Steve Sanbeg ssanbeg@ask.com wrote:
__NOTOC__ is a magic word in the parser. Things like {{defaultsort}} and <ref> can be implemented as extensions, without hacking the core code, which would certainly make life easier when implementing this.
You really think this is plausibly implementable as an extension? Tons of core code will have to be modified to take into account two possible types of redirects. That's aside from the fact that again, I see no reason at all to make two separate systems to handle what amounts to the same problem.
Simetrical skrev:
Tons of core code will have to be modified to take into account two possible types of redirects. That's aside from the fact that again, I see no reason at all to make two separate systems to handle what amounts to the same problem.
I think that the only reasonable approach would be to have one *technical* implementation (the existing one) of the individual redirects themselves.
That is, the existing redirect links table, including the corresponding page/revision/text-article is good enough.
One of the main points with the Aliases/Synonyms idea is how these redirects are created/maintained (on Edit-Save), and from this perspective (the user's POV) it is not exactly the same problem.
But introducing several implementations of the lower-level (technical) concept of the redirects themselves really should be avoided. It probably wouldn't be needed anyway, not even for a very sophisticated Alias/Synonym solution (as seen from the user's perspective).
Regards,
// Rolf Lampa
On Fri, 26 Oct 2007 14:28:30 +0200, Rolf Lampa wrote:
Simetrical skrev:
Tons of core code will have to be modified to take into account two possible types of redirects. That's aside from the fact that again, I see no reason at all to make two separate systems to handle what amounts to the same problem.
I think that the only reasonable approach would be to have one *technical* implementation (the existing one) of the individual redirects themselves.
That is, the existing redirect links table, including the corresponding page/revision/text-article is good enough.
One of the main points with the Aliases/Synonyms idea is how these redirects are created/maintained (on Edit-Save), and from this perspective (the user's POV) it is not exactly the same problem.
But introducing several implementations of the lower-level (technical) concept of the redirects themselves really should be avoided. It probably wouldn't be needed anyway, not even for a very sophisticated Alias/Synonym solution (as seen from the user's perspective).
Regards,
// Rolf Lampa
Yes, that's how I see it as well. It would likely need a few tweaks to the existing redirect system, etc. And an alias extension that works with that.
-Steve
Simetrical skrev:
Tons of core code will have to be modified to take into account two possible types of redirects. That's aside from the fact that again, I see no reason at all to make two separate systems to handle what amounts to the same problem.
Again, I fully agree, from the technical side of things.
At the lowest level of abstraction, that is, in the form of a UML class diagram one could more clearly point out what, really, the Alias/Synonym idea "is", technically speaking.
Given the following symbols for associations:
------->  : Navigable only one way
<>------  : "Has"
<.>-----  : "Owns" (filled diamond)
[class]   : Class (name)
== #1 - Existing Redirects ==
We could describe the existing Redirect concept, an article with the redirect clause: '#REDIRECT [[RedirectsTo]]' like so:
[ Page ]                      [ Page ]
[      ]<-1-RedirectsTo------ [      ]
This diagram shows a "single link", that is, a one way relation (we can disregard the internal 'redirect' table in the db for now), of a separate page (the redirect page) pointing at a target page.
It is important to note that, from a users perspective, the association is "navigable", that is, it goes only one way in that you cannot manage the link from inside the *target* page - which is not always very convenient for the users.
== #2 - Owned Aliases/Synonyms ==
If the association had been an Alias/Synonym instead, the first change we need to make in the diagram is that the association would be made "Owned" (managed) from the target page (but technically it would still remain being a "pure" redirect), like so:
[ Page ]                      [ Page ]
[      ]<.><-1-RedirectsTo--- [      ]
Of course it should be possible to have several Aliases/Synonyms, and thus the multiplicity of the relation should be changed from one (1) to many (*):
== #3 - Multiple Owned Aliases/Synonyms ==
[ Page ]                        [ Page ]
[      ]<.><-*-OwnedRedirects-- [      ]
In this way an article can have many Aliases/Synonyms (= Redirects) which are *Owned* (managed) by the target page. That would be an improvement of the existing concept. And there's no problem with also keeping the existing solution (#1) alongside the Owned list of Aliases/Synonyms. The final diagram, capturing both concepts, could look like this:
[ Page ]                        [ Page ]
[      ]<-1-RedirectsTo-------- [      ]
[      ]                        [ Page ]
[      ]<.><-*-OwnedRedirects-- [      ]
Meaning: A redirect can be a freestanding redirect (current solution) with a link "RedirectsTo---1->" to one other page. A redirect can also be owned by the target page, thus serving as a "listItem" in the OwnedRedirects-list.
The redirect itself could carry information about whether it is "owned" or not.
The class-association Redirect (= existing table) could hold the (additional) attribute IsOwned to distinguish which links can be manually edited (the current solution) and which links can be modified only from the page on which they're defined (perhaps a back link from the redirect page would be good).
[ Page ]-1-rd_from---------[ Redirect ]
[      ]                   [ IsOwned: ]
[      ]-1-rd_title--------[          ]
When attempts are made to create multiple similar Redirects, the system simply refuses to (auto)create duplicate entries in the Redirect table, in order to maintain "one redirect = one target".
This would be a significant difference/improvement for the users, but (probably) a rather small change technically, I think.
Regards,
// Rolf Lampa
On Thu, 25 Oct 2007 20:29:44 -0400, Simetrical wrote:
On 10/25/07, Steve Sanbeg ssanbeg@ask.com wrote:
__NOTOC__ is a magic word in the parser. Things like {{defaultsort}} and <ref> can be implemented as extensions, without hacking the core code, which would certainly make life easier when implementing this.
You really think this is plausibly implementable as an extension? Tons of core code will have to be modified to take into account two possible types of redirects. That's aside from the fact that again, I see no reason at all to make two separate systems to handle what amounts to the same problem.
At this point, I don't see why not. That's not to say that it won't require some kind of modifications to the core, but that I don't see the need to add a new ambiguity to the markup when it could be implemented with a consistent syntax.
Basically, I see the need for:
1) an article save/delete hook to extract and create/remove the auto-redirects
2) a hook to render the alias markup - presumably a noop, which would make our concerns about the | being misinterpreted in a parser function moot.
3) a way to access the aliases; either a hook when getting the titles or doing the search. Maybe also some way to disable transclusion of, or linking to, the alias.
I see redirects and aliases as solving distinct problems; a way to rename pages with minimal disruption, and a way to augment search. Sure, there's some overlap, and they'd share a lot of their internals, but that doesn't make them the same.
Most significantly, I don't see the advantage to template aliasing, and the worst case of potentially needing to parse millions of records to find (or determine nonexistence of) a template makes me uneasy. If aliases are called frequently in the data, that would lock the data more tightly into the software.
On 10/24/07, Steve Bennett stevagewp@gmail.com wrote:
a) If it's feasible (ie, is not computationally too expensive)
It looks so. Ultimately I'm not seeing it as much different from current redirects, implementationally.
b) How much work is required to implement it
Probably a reasonable amount.
c) If it was implemented, whether it would be enabled at Wikipedia
I don't see why not.
Instead of having redirects that point to a page, have the page itself specify aliases which can be used to find it. This is specified as a pattern, like a very cut-down regexp: #ALIASES Thomas[-]Fran[ç|c]ois Dalibard
The proposed syntax would be as follows (but is debatable):
Foo - matches Foo
[Foo] - matches Foo or blank.
[Foo|Moo] - matches Foo or Moo.
[Foo|Moo|] or [|Foo|Moo] - matches Foo or Moo or blank.
Foo[Moo - matches the literal string Foo[Moo
This would essentially be like regexes, but defined *without* the operation of iteration: only catenation and union are allowed. This is a large benefit because it means there are a finite number of possible patterns, and so they can be stored in enumerated form.
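To make the "enumerated form" idea concrete, here is a minimal Python sketch (not MediaWiki code; the function name `expand` and all details are assumptions) that expands a cut-down pattern with only catenation and union into the full set of titles it matches, normalizing whitespace as proposed:

```python
import itertools
import re

def expand(pattern):
    """Expand a cut-down alias pattern (no nesting, no repetition)
    into every concrete title it matches."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "[":
            close = pattern.find("]", i)
            if close == -1:               # unmatched '[' is a literal, e.g. Foo[Moo
                parts.append([pattern[i:]])
                break
            body = pattern[i + 1:close]
            # [Foo] means Foo or blank; [Foo|Moo] means exactly those alternatives
            alts = body.split("|") if "|" in body else [body, ""]
            parts.append(alts)
            i = close + 1
        else:
            nxt = pattern.find("[", i)
            if nxt == -1:
                nxt = len(pattern)
            parts.append([pattern[i:nxt]])
            i = nxt
    titles = ("".join(combo) for combo in itertools.product(*parts))
    # all whitespace is equivalent to a single space
    return sorted({re.sub(r"\s+", " ", t).strip() for t in titles})
```

For example, expanding the Melbourne pattern from later in the thread, `[City of ][Greater ]Melbourne[, Victoria| (Australia)]`, yields eight titles; since there is no * operator, the set is always finite.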
All whitespace is equivalent to a single space. So "Boo [Foo] [Moo] Woo" matches "Boo Woo", rather than "Boo<space><space><space>Woo" for instance.
Generally speaking I would like to see titles that differ only up to compression of whitespace to be considered identical. If this were the case, the searchable forms of all titles would be whitespace-normalized, and this point would be resolved automatically. Until then, I suggest that this aspect of it be brushed under the carpet for aliases as for anything.
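The whitespace-compression rule itself is tiny; a sketch of the "searchable form" idea (the helper name is hypothetical):

```python
import re

def normalize_title(title):
    """Collapse runs of whitespace to a single space so that titles
    differing only in spacing compare equal."""
    return re.sub(r"\s+", " ", title).strip()
```

With this applied to both stored titles and queries, "Boo [Foo] [Moo] Woo" with both options blank matches "Boo Woo" rather than "Boo&lt;space&gt;&lt;space&gt;&lt;space&gt;Woo", with no special casing in the alias code.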
- Search term matches one real page, some aliases: takes you to real page.
(Arguably gives you a "did you mean...?" banner, but not critical)
- Search term matches one alias, no real page: takes you to page.
- Search term matches several aliases, no real
page: either an automatically generated disambiguation page, or shows you search results with the matching aliases shown first.
I see. Possibly this is better than having the aliases be unique, yes.
Unresolved issues:
- Since pattern matching is prone to abuse, the total number of matching
aliases should be restricted in some way, perhaps to 10 or 20. The best way to handle an excessively broad query (eg, [A|b|c|d|e][A|b|c|d|e] etc) is left as an open question. Possibilities include silently failing, noisily failing (with error message in rendered text), a special page for bad aliases...
It can create exponential database rows in the length of the alias string, yes, so that needs to be dealt with -- if we're doing explicit storage, anyway. I think 20 is probably too low.
- The role of redirects once this system is in place. One possible
implementation would simply create and destroy redirects as required. In any case, they would still be needed for some licensing issues.
Why?
Possible implementation: Without knowing the MediaWiki DB schema at all, I speculated on a possible implementation that would be a good tradeoff between size and speed. Two new tables are needed:
AliasesRaw would contain a constantly updated list of the actual alias patterns used in articles. Each time an article is saved, this would possibly be updated. AliasesExpanded would contain expansions of these aliases, either fully or partially. So an expansion of #ALIASES [City of ][Greater ]Melbourne[, Victoria| (Australia)] to 5 characters would lead to three rows:
"City ", "of [Greater ]Melbourne[, Victoria| (Australia)]"
"Great", "er Melbourne[, Victoria| (Australia)]"
"Melbo", "urne[, Victoria| (Australia)]"
This means that if a user searches for "Greater Melbourne", then the search process would go something like:
- Look for an article called Greater Melbourne, GREATER MELBOURNE, greater
melbourne (as present) - assume this fails.
- Look up "Great" in the AliasesExpanded table. Now iterate over the
matching results, finding one that matches.
Obviously the number of characters stored in the expanded aliases could be tuned.
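To illustrate the tuning idea, here is a rough Python sketch of that two-step lookup: the first 5 characters of the query select candidate rows, and the unexpanded remainder of each pattern is checked against the rest of the query. The dict stands in for the AliasesExpanded table, and all names here are assumptions, not proposed schema:

```python
import re

PREFIX_LEN = 5

def matches(rest_pattern, rest_query):
    """Check the un-expanded remainder of an alias pattern against the
    remainder of the query, translating the cut-down syntax to a regex.
    Assumes no nesting in patterns."""
    def option(m):
        body = m.group(1)
        alts = body.split("|") if "|" in body else [body, ""]
        return "(?:" + "|".join(re.escape(a) for a in alts) + ")"
    regex = ""
    pos = 0
    for m in re.finditer(r"\[([^\[\]]*)\]", rest_pattern):
        regex += re.escape(rest_pattern[pos:m.start()]) + option(m)
        pos = m.end()
    regex += re.escape(rest_pattern[pos:])
    return re.fullmatch(regex, rest_query) is not None

# Stand-in for the AliasesExpanded table: prefix -> [(remainder, page), ...]
aliases_expanded = {
    "Great": [("er Melbourne[, Victoria| (Australia)]", "Melbourne")],
    "Melbo": [("urne[, Victoria| (Australia)]", "Melbourne")],
}

def lookup(query):
    prefix, rest = query[:PREFIX_LEN], query[PREFIX_LEN:]
    return [page for (remainder, page) in aliases_expanded.get(prefix, ())
            if matches(remainder, rest)]
```

The prefix keeps the candidate set small (and indexable) while the full-pattern check stays cheap, since only the handful of rows sharing the prefix ever get matched in detail.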
I don't understand this. Why don't you simply create an alias table that mirrors the redirect table, like
alias_to
alias_namespace
alias_title
and every time a set of aliases is created for an article, just add the appropriate rows to that table? Then some special-case logic would be added to appropriate classes and methods to deal with aliases, and in particular, any method of the form "create an object corresponding to the named article, following redirects" would take aliases into account. (Actually, you seem to have caught on to this point in your last post, written after I wrote that.)
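A sketch of that simpler design, using sqlite3 purely as a stand-in for MediaWiki's MySQL schema (only the alias_* column names come from the post; the table layout, function names, and the PRIMARY KEY enforcing "one alias = one target" are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE article_alias (
        alias_namespace INTEGER NOT NULL,
        alias_title     TEXT    NOT NULL,
        alias_to        TEXT    NOT NULL,
        PRIMARY KEY (alias_namespace, alias_title)
    )""")

def save_aliases(page, titles, ns=0):
    """On article save: replace the page's alias rows with the freshly
    expanded set (pattern expansion itself happens elsewhere)."""
    conn.execute("DELETE FROM article_alias WHERE alias_to = ?", (page,))
    conn.executemany(
        "INSERT OR IGNORE INTO article_alias VALUES (?, ?, ?)",
        [(ns, t, page) for t in titles])

def resolve(title, ns=0):
    """Exact-match, indexed lookup -- no table scan, unlike a LIKE query."""
    row = conn.execute(
        "SELECT alias_to FROM article_alias "
        "WHERE alias_namespace = ? AND alias_title = ?", (ns, title)).fetchone()
    return row[0] if row else None
```

Because the expanded titles are stored rather than the patterns, resolving a "Go" query is a single primary-key lookup, the same cost as resolving a normal redirect.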
Of course, that wouldn't be quite enough. There would be all sorts of things expecting particular behavior of redirects, and so this would create a fair amount of backwards incompatibility, and generally confuse things. Ideally I would like to see a proposal that merges redirects and aliases altogether: do we want them to have a corresponding page entry or not? They shouldn't be treated as distinct.
What we're looking for is a way to easily create and maintain redirects, not some totally new feature, and despite my suggestions above and below, I think that's how the problem should be posed. A special page to easily manage all redirects to a page, including to batch-create and -delete* them, is probably the best way to handle this. Grouping on this redirects page by category would be a good feature to have, for instance, and category management from it as well. But to start with, reversible batch creation and deletion is all that's needed.
*(Unprivileged users should indeed ideally be allowed to delete redirects in general if they have no substantial content, as currently they can during moves. However, history and easy reversibility needs to be built into this before it can be deployed, needless to say.)
On 10/24/07, Andrew Garrett andrew@epstone.net wrote:
No need for the complex setup you envisage. For mysql, at least, we could create a new table 'article_aliases', and "select aa_page from article_aliases where 'my_title' like aa_alias". Of course, we'd need to do some built-in, potentially expensive checking on the aliases as they are originally introduced, like checking if any other pages match the regex (if so, block the alias), and if the article title itself matches the regex (if not, block the alias).
And you'd have to scan the table every time you want to check if an alias exists for a given string. Probably not a great idea.
On 10/25/07, Simetrical Simetrical+wikilist@gmail.com wrote:
This would essentially be like regexes, but defined *without* the operation of iteration: only catenation and union are allowed. This is a large benefit because it means there are a finite number of possible patterns, and so they can be stored in enumerated form.
Yes, I'm undecided whether nesting (aka iteration) is a good idea or not. Quite possibly it's a good idea to force people to explicitly state all the variations they intend. If iteration/nesting is not allowed, then multiple #ALIASES statements *should* be allowed, imho, for readability.
All whitespace is equivalent to a single space. So "Boo [Foo]
[Moo] Woo" matches "Boo Woo", rather than "Boo<space><space><space>Woo" for instance.
Generally speaking I would like to see titles that differ only up to compression of whitespace to be considered identical. If this were the case, the searchable forms of all titles would be whitespace-normalized, and this point would be resolved automatically. Until then, I suggest that this aspect of it be brushed under the carpet for aliases as for anything.
I think that's what I was trying to say. :)
- Search term matches one real page, some aliases: takes you to real page.
(Arguably gives you a "did you mean...?" banner, but not critical)
- Search term matches one alias, no real page: takes you to page.
- Search term matches several aliases, no real
page: either an automatically generated disambiguation page, or shows
you
search results with the matching aliases shown first.
I see. Possibly this is better than having the aliases be unique, yes.
Yeah. Ultimately, it's helpful for the reader if they *can* search for "J Smith". Obviously they don't expect it to be unique, but if that's all they have to go on, it's better than nothing.
It can create exponential database rows in the length of the alias
string, yes, so that needs to be dealt with -- if we're doing explicit storage, anyway. I think 20 is probably too low.
The right number is probably easy to come up with if someone can decide how big the table can be. I just don't have a feel for whether 1 million, 10 million, 100 million rows is "too many".
- The role of redirects once this system is in place. One possible
implementation would simply create and destroy redirects as required. In
any
case, they would still be needed for some licensing issues.
Why?
Because when articles get merged, one is turned into a redirect with the history of all the edits that were made. If we kill that redirect, we lose that history, including attribution. Ergo, non-compliance with GFDL.
aliases into account. (Actually, you seem to have caught on to this
point in your last post, written after I wrote that.)
Heh, yeah. I don't do much DB programming these days.
Of course, that wouldn't be quite enough. There would be all sorts of
things expecting particular behavior of redirects, and so this would create a fair amount of backwards incompatibility, and generally confuse things. Ideally I would like to see a proposal that merges redirects and aliases altogether: do we want them to have a corresponding page entry or not? They shouldn't be treated as distinct.
That would be even better, but I wasn't that ambitious. Do you have any ideas? Even better would be something that redefines the concept of disambiguation, which is again, a huge amount of manpower to set up and maintain.
One problem that just occurred to me is what happens when one query matches two aliases *and* a disambiguation page. Every possible outcome looks bad:
- Just show the disambiguation page (with two missing entries)
- Show a list of aliased pages plus the disambiguation page (what, I have to choose whether I want a real page or a disambiguation page?)
- Attempt to jam the alias links somewhere in the disambiguation page (possibly duplicating actual links, or possibly requiring every disambiguation page to be updated with an <aliases> section).
Just like with the category/list dilemma, it doesn't seem possible to create a fully dynamic disambiguation page that will be "as good as" a hand-edited one. But long term, it would be a very valuable thing if we could come close.
What we're looking for is a way to easily create and maintain
redirects, not some totally new feature, and despite my suggestions above and below, I think that's how the problem should be posed. A special page to easily manage all redirects to a page, including to batch-create and -delete* them, is probably the best way to handle this. Grouping on this redirects page by category would be a good feature to have, for instance, and category management from it as well. But to start with, reversible batch creation and deletion is all that's needed.
Are you thinking in terms of a special GUI, or a wikitext language feature? Say you used the #ALIASES idea, but it constructed actual pages with #REDIRECT text. Those pages could be marked with an "automatically generated" flag, so they would be killed when the corresponding #ALIASES text was modified.
Now, however, you have a different problem with ambiguous redirects: the user adds an #ALIASES tag pointing at the current page, but the redirect already exists and points somewhere else. What happens?
*(Unprivileged users should indeed ideally be allowed to delete
redirects in general if they have no substantial content, as currently they can during moves. However, history and easy reversibility needs to be built into this before it can be deployed, needless to say.)
Steve
On 10/24/07, Steve Bennett stevagewp@gmail.com wrote:
Yes, I'm undecided whether nesting (aka iteration) is a good idea or not. Quite possibly it's a good idea to force people to explicitly state all the variations they intend. If iteration/nesting is not allowed, then multiple #ALIASES statements *should* be allowed, imho, for readability.
I was assuming nesting would be allowed (although it might quickly run into alias number limits, of course). Iteration is the term my formal languages book uses for the * operator, indicating "repeat the preceding item zero or more times". Next time I'll just say "the * operator" instead of trying to be fancy. :)
Anyway, we surely don't want the * operator, as I remarked, so we don't need full regular expressions, is the point.
Because when articles get merged, one is turned into a redirect with the history of all the edits that were made. If we kill that redirect, we lose that history, including attribution. Ergo, non-compliance with GFDL.
Of course, the correct way to fix this is to actually implement real merging. For now, why don't people just do a history merge, by deleting one page, moving another on top of it, and undeleting? Is that viewed as too confusing in terms of the history display?
That would be even better, but I wasn't that ambitious. Do you have any ideas? Even better would be something that redefines the concept of disambiguation, which is again, a huge amount of manpower to set up and maintain.
Well, I'm not considering disambiguations for now. They're conceptually related, but I'd view it as a different discussion. One thing at a time.
Are you thinking in terms of a special GUI, or a wikitext language feature?
As I said, a special page.
On 10/25/07, Simetrical Simetrical+wikilist@gmail.com wrote:
I was assuming nesting would be allowed (although it might quickly run into alias number limits, of course). Iteration is the term my formal languages book uses for the * operator, indicating "repeat the preceding item zero or more times". Next time I'll just say "the * operator" instead of trying to be fancy. :)
Heh, ok.
Of course, the correct way to fix this is to actually implement real
merging. For now, why don't people just do a history merge, by deleting one page, moving another on top of it, and undeleting? Is that viewed as too confusing in terms of the history display?
That requires admin rights, which most people don't have - whereas replacing a page with a redirect can be done by anyone. It's also presumably virtually impossible to undo.
As I said, a special page.
Perhaps another tab, "aliases"? We should get away from calling them "redirects" in any case.
Steve
_______________________________________________
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 10/24/07, Simetrical Simetrical+wikilist@gmail.com wrote:
Of course, the correct way to fix this is to actually implement real merging. For now, why don't people just do a history merge, by deleting one page, moving another on top of it, and undeleting? Is that viewed as too confusing in terms of the history display?
Yes, especially if the topic of the article being "merged" is different than that of the article it's being merged to, and especially if anybody wants to "unmerge" it at a later date (in the face of new information about the narrower topic).
—C.W.
Simetrical & Steve Bennett quotes intermixed:
Foo[Moo - matches the literal string Foo[Moo
You don't need to. [ will never appear in a title, as [ is reserved for [[links]]. It also avoids the need for \ as an escape character (see the long thread about an escape char).
I'd prefer the syntax to be
[Foo] matches Foo or nothing (also match blank?)
{Foo|Bar} matches Foo or Bar.
[Foo|Bar] matches Foo, Bar or nothing (or blank?)
Which is an already existing syntax on program parameters. Making [] also match blank could be an acceptable extension.
a) If it's feasible (ie, is not computationally too expensive)
It looks so. Ultimately I'm not seeing it as much different from current redirects, implementationally.
b) How much work is required to implement it
Probably a reasonable amount.
c) If it was implemented, whether it would be enabled at Wikipedia
I don't see why not.
Instead of having redirects that point to a page, have the page itself specify aliases which can be used to find it. This is specified as a pattern, like a very cut-down regexp: #ALIASES Thomas[-]Fran[ç|c]ois Dalibard
The proposed syntax would be as follows (but is debatable):
Foo - matches Foo
[Foo] - matches Foo or blank.
[Foo|Moo] - matches Foo or Moo.
[Foo|Moo|] or [|Foo|Moo] - matches Foo or Moo or blank.
This would essentially be like regexes, but defined *without* the operation of iteration: only catenation and union are allowed. This is a large benefit because it means there are a finite number of possible patterns, and so they can be stored in enumerated form.
All whitespace is equivalent to a single space. So "Boo [Foo] [Moo] Woo" matches "Boo Woo", rather than "Boo<space><space><space>Woo" for instance.
Generally speaking I would like to see titles that differ only up to compression of whitespace to be considered identical. If this were the case, the searchable forms of all titles would be whitespace-normalized, and this point would be resolved automatically. Until then, I suggest that this aspect of it be brushed under the carpet for aliases as for anything.
Agree. Are multiple spaces really used? If not, there could be a flag to join them.
- Search term matches one real page, some aliases: takes you to real page.
(Arguably gives you a "did you mean...?" banner, but not critical)
- Search term matches one alias, no real page: takes you to page.
- Search term matches several aliases, no real
page: either an automatically generated disambiguation page, or shows you search results with the matching aliases shown first.
I see. Possibly this is better than having the aliases be unique, yes.
- Since pattern matching is prone to abuse, the total number of matching
aliases should be restricted in some way, perhaps to 10 or 20. The best way to handle an excessively broad query (eg, [A|b|c|d|e][A|b|c|d|e] etc) is left as an open question. Possibilities include silently failing, noisily failing (with error message in rendered text), a special page for bad aliases...
It can create exponential database rows in the length of the alias string, yes, so that needs to be dealt with -- if we're doing explicit storage, anyway. I think 20 is probably too low.
If we're going to store them enumerated, it's trivial to stop at $wgMaxNumberOfAliases.
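Stopping at a limit is easy because the number of expansions can be computed without enumerating them: it is simply the product of each bracket group's alternative count. A sketch (the limit constant is a stand-in for the proposed $wgMaxNumberOfAliases setting):

```python
def count_expansions(pattern):
    """Number of titles a cut-down alias pattern expands to,
    computed without expanding it."""
    total = 1
    i = 0
    while i < len(pattern):
        if pattern[i] == "[":
            close = pattern.find("]", i)
            if close == -1:
                break                 # unmatched '[' is literal to end of string
            body = pattern[i + 1:close]
            # [Foo] means Foo or blank (2 options); otherwise count the alternatives
            total *= len(body.split("|")) if "|" in body else 2
            i = close + 1
        else:
            i += 1
    return total

MAX_ALIASES = 20  # stand-in for $wgMaxNumberOfAliases

def check(pattern):
    """Reject excessively broad patterns before any rows are written."""
    n = count_expansions(pattern)
    if n > MAX_ALIASES:
        raise ValueError(f"pattern expands to {n} aliases, limit is {MAX_ALIASES}")
    return n
```

For instance, the abusive example upthread, [A|b|c|d|e][A|b|c|d|e], counts to 25 and is rejected before a single row is generated, so the check is O(pattern length) regardless of how large the expansion would be.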
I don't understand this. Why don't you simply create an alias table that mirrors the redirect table, like
alias_to
alias_namespace
alias_title
What about adding a new field to the redirect table to differentiate between aliases and redirects?
Why not auto delete redirects?
Because when articles get merged, one is turned into a redirect with the history of all the edits that were made. If we kill that redirect, we lose that history, including attribution. Ergo, non-compliance with GFDL.
You could autodelete them if they were autocreated. The creator would be preserved somewhere on the history page. The redirects you talk about are "hard" redirects, created by placing a #REDIRECT statement on a page.
We could create with this system "soft redirects", which are shown as such when checking the redirect table, with fake history (autoredirect from Foo). If one is edited, the #REDIRECT is automatically preloaded and saving then makes it hard. Problem: Complicates the process for little benefit.
What we're looking for is a way to easily create and maintain redirects, not some totally new feature, and despite my suggestions above and below, I think that's how the problem should be posed. A special page to easily manage all redirects to a page, including to batch-create and -delete* them, is probably the best way to handle this. Grouping on this redirects page by category would be a good feature to have, for instance, and category management from it as well. But to start with, reversible batch creation and deletion is all that's needed.
A Special page changing the tables seems better than changing the page, which only has the benefit of the integrated history. A special page allows rejecting pages, showing (readable) lists of aliases, selecting based on language... But splitting from traditional redirects also means more work for integration.
"Platonides" Platonides@gmail.com wrote in message news:ffq2a2$gr$1@ger.gmane.org...
Simetrical & Steve Bennett quotes intermixed:
Foo[Moo - matches the literal string Foo[Moo
You don't need to. [ will never appear in a title, as [ is reserved for [[links]]. It also avoids the need for \ as an escape character (see the long thread about an escape char).
I'd prefer the syntax to be
[Foo] matches Foo or nothing (also match blank?)
{Foo|Bar} matches Foo or Bar.
[Foo|Bar] matches Foo, Bar or nothing (or blank?)
Why not use the following:
[Foo] matches Foo
[Foo|] matches Foo or nothing (or blank?)
[Foo|Bar] matches Foo or Bar.
[Foo|Bar|] matches Foo, Bar or nothing (or blank?)
Then there is only one syntax.
- Mark Clements (HappyDog)
Steve Bennett skrev:
I have an idea for an improvement to the system of redirects, by using pattern-based aliases. <...>
The problem: Many pages require a largeish number of redirects, to cope with differences in spelling, optional words, accented characters etc. It's a surprising amount of work to create and maintain these, when the value of each individual redirect is so low. For example, [[Thomas-François Dalibard]] might be spelt four ways, each requiring a redirect: Thomas-Francois Dalibart, Thomas François Dalibard, Thomas Francois Dalibard.
General solution:
Soundex.
Title variants, very often due to differences in spelling, are an old problem which was solved a long time ago, long before computers came about. The (old) solution was based on the fact that sound subsumes differences in spelling etc, hence "Soundex":
http://en.wikipedia.org/wiki/Soundex#History
Different spellings can be automatically translated to and from a Soundex scheme, on the fly. Soundex is "machine readable" as well as writeable, and can be computed by search engines instead of being stored (Redirects would still take care of "conceptual" differences in titles, which require human interaction).
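For reference, classic American Soundex is short enough to sketch here (a simplified Python version, assuming an ASCII alphabetic name; note that it maps both Robert and Rupert to R163, the collision pointed out downthread):

```python
def soundex(name):
    """Classic American Soundex: first letter plus three digits.
    Vowels separate codes; H and W are transparent."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit
    name = name.upper()
    first = name[0]
    result = []
    prev = codes.get(first)
    for ch in name[1:]:
        if ch in "HW":
            continue                  # transparent: equal codes across H/W merge
        code = codes.get(ch)
        if code is None:              # vowel (A, E, I, O, U, Y) separates codes
            prev = None
            continue
        if code != prev:
            result.append(code)
        prev = code
    return (first + "".join(result) + "000")[:4]
```

Since the code is computed from the name alone, a search engine can derive it at query time rather than storing anything, which is the "computed instead of stored" property mentioned above.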
Soundex is different and it would not exactly be creating a YARR system (Yet Another Redirect Redirect).
Regards,
// Rolf Lampa
Rolf Lampa wrote:
Soundex.
Title variants, very often due to differences in spelling, are an old problem which was solved a long time ago, long before computers came about. The (old) solution was based on the fact that sound subsumes differences in spelling etc, hence "Soundex":
http://en.wikipedia.org/wiki/Soundex#History
Different spellings can be automatically translated to and from a Soundex scheme, on the fly. Soundex is "machine readable" as well as writeable, and can be computed by search engines instead of being stored (Redirects would still take care of "conceptual" differences in titles, which require human interaction).
Soundex is different and it would not exactly be creating a YARR system (Yet Another Redirect Redirect).
Soundex is not exact, though; according to the example in the article, both Robert and Rupert reduce to the same Soundex string, but [[Robert Everett]] and [[Rupert Everett]] are two very different people. Also, Soundex makes no provision for handling accented letters vs. their plain equivalents, or a common foreign spelling or name for an article titled with the English word (deviled eggs and oeufs mimosa was an example given earlier in the thread).
I agree that perhaps a system of phonetically similar searches needs to be implemented for the searchbox or MediaWiki search in general, but there are more advanced algorithms than Soundex available, and that still doesn't address all the issues covered by the alias proposal.
--en.wp Darkwind
RLS wrote:
Rolf Lampa wrote:
Soundex.
<...>
(Redirects would still take care of "conceptual" differences in titles, which require human interaction).
Soundex is different and it would not exactly be creating a YARR system (Yet Another Redirect Redirect).
Soundex is not exact, though;
Exactly, not exact. Soundex would deal with phonetics, and redirects with synonyms.
I agree that perhaps a system of phonetically similar searches needs to be implemented for the searchbox or MediaWiki search in general, but there are more advanced algorithms than Soundex available,
Probably. I mention Soundex just to identify the basic idea.
and that still doesn't address all the issues covered by the alias proposal.
Not all of them, no. Redirects would still be essential.
What doesn't seem very meaningful, though, is to invent another term (Alias) for essentially the same concept as redirects, especially when the only difference may end up being how, who and when they are created. It is that part I mean by YARR: yet another redirect system on top of the existing redirect system.
If one wants to deal with aliases in an essentially different way than redirects, then I'd rather see something like "Synonyms", which would also be placed at the very start of a text, since it could be very useful both for human reading and for search indexing!
If Synonyms are explicitly tagged or marked up (in the article text, as opposed to Redirects, which are defined outside of the article), then they could even be picked up by the HTML parser, which could include them (the synonyms) in the keywords etc. Also, the internal indexer could return these among the results for an entirely different search word!
That would be powerful.
Regards,
On 10/26/07, Rolf Lampa rolf.lampa@rilnet.com wrote:
What doesn't seem very meaningful, though, is to invent another term (Alias) for essentially the same concept as redirects, especially when the only difference may end up being how, who and when they are created. It is that part I mean by YARR: yet another redirect system on top of the existing redirect system.
I think I have come to agree with this.
If one want to deal with aliases in essentially a different way than
redirects then I'd rather see something like "Synonyms", which also would be placed at the very start of a text, since it could be very useful both for human reading and for search indexing!
Right. Except I call "synonyms" "aliases" :) The point, I think, is to get away from manually having to set up "term X is a hard link to term Y", and towards something softer like "if the user searches for something like X, Y or Z, the software will help them find A, if that's what they're really looking for".
If Synonyms are explicitly tagged or marked up (in the article text,
as opposed to Redirects, which are defined outside of the article), then they could be regarded even by the HTML-parser, including them (the synonyms) in the keywords etc. Also the internal Indexer could return these among the results for an entirely different search word!
Yeah, that's a nice advantage I hadn't thought of. With a bit of luck we might even distinguish concepts like "This article is about X" and "This article has a section about Y".
Steve
On 10/26/07, Rolf Lampa rolf.lampa@rilnet.com wrote:
Soundex.
Title variants, very often due to differences in spelling, are an old problem which was solved a long time ago, long before computers came about. The (old) solution was based on the fact that sound abstracts over differences in spelling etc., hence "Soundex":
Heh. No. Soundex is awful. There might be something better by now, but not Soundex. Anything but that. In a previous job I briefly flirted with it to perform name matching but it (or the SQL Server implementation at least) is useless - it collapses any name down to 4 consonants, making Steve and Stove identical, for instance.
Anyway a Soundex-like tool might be useful to complement or improve searching, but the situation I'm describing here is when you know exactly what search terms you want to reach, but it's a lot of effort to create all those redirects.
Steve
Steve Bennett wrote:
On 10/26/07, Rolf Lampa rolf.lampa@rilnet.com wrote:
Soundex.
Title variants, very often due to differences in spelling, are an old problem which was solved a long time ago, long before computers came about. The (old) solution was based on the fact that sound abstracts over differences in spelling etc., hence "Soundex":
Heh. No. Soundex is awful. There might be something better by now,
Probably.
but not Soundex. Anything but that. In a previous job I briefly flirted with it to perform name matching but it (or the SQL Server implementation at least) is useless - it collapses any name down to 4 consonants, making Steve and Stove identical, for instance.
Soundex is of course not a replacement for either Redirects or Aliases. Apart from that, Soundex and its derivatives keep getting better and better.
Anyway a Soundex-like tool might be useful to complement or improve searching,
Correct. And this is why I think it's a bit unfortunate that the entire WP is saturated with phonetic redirects (which seem to be a big part of the redirects). The phonetic part should have been taken care of "at the root of the tree", that is, in the search mechanism.
but the situation I'm describing here is when you know exactly what search terms you want to reach, but it's a lot of effort to create all those redirects.
Aliases are at risk of only creating another YARR, since an Alias is just that: a Redirect. Moreover, when you say that you "know exactly" what terms you would like to be associated with an article, then that alias cannot, in principle, be automagically created; instead an alias will always require your explicit definition. Which IS a good idea, but technically that is already supported through the existing redirects.
However, there is a difference: the Aliases would, as opposed to the existing redirects, be defined inside the article instead of outside, and that opens up interesting perspectives, especially if changing the term to *Synonyms* instead of Aliases. I like the term "Synonyms" better because it implies also supporting human reading with more information (more than "aliases" does).
Synonyms should (for the same reasons as you have given for Aliases - and redirects) have their own unique markup. That would make machine reading possible, which means that the HTML parser could autogenerate keywords, and other text indexers could prepare for presenting search results based on these synonyms as well.
Therefore, in summary, I suggest Soundex (or modern derivatives thereof, perhaps as part of the search mechanism - entirely automated, though), and the concept of Synonyms to support a wider range of applications than Aliases implies (the term "alias" is rather abstract and not very meaningful to most people). With an appropriate implementation* of a Synonyms concept, parsers and both internal and external indexers could benefit from this information, while at the same time it would potentially increase the informational value for human reading as well, especially if displayed** near the top of the article.
Finally, Synonyms, and Soundex-like solutions for the search mechanism, are different enough compared to Redirects that they would not make for just YARR, as I pointed out in the previous post.
Regards,
// Rolf Lampa
* Synonyms could still be stored as Redirects, in the same table, perhaps with an extra state field identifying them as "InlineSynonyms".
** Perhaps special rendering for Synonyms, kind of like the Category rendering at the bottom of the pages, but near the top instead.
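As a sketch of how footnote * and the indexing idea might fit together: a hypothetical `#SYNONYMS` line (the markup name and the pipe-separated format are both invented here, by analogy with `#REDIRECT`) could be parsed out of the article text and fed both to the redirect table and to the HTML head as keywords:

```python
import re

def extract_synonyms(article_text):
    """Pull terms from a hypothetical '#SYNONYMS a | b | c' line
    in the article wikitext; returns [] if no such line exists."""
    m = re.search(r"^#SYNONYMS\s+(.+)$", article_text, re.M)
    return [t.strip() for t in m.group(1).split("|")] if m else []

def keywords_meta(title, article_text):
    """Emit the HTML keywords tag the parser could add to the page,
    combining the article title with its declared synonyms."""
    terms = [title] + extract_synonyms(article_text)
    return '<meta name="keywords" content="%s">' % ", ".join(terms)
```

Because the synonyms live inline, the same parse could also populate an "InlineSynonyms"-flagged row in the existing redirect table, as footnote * suggests, so nothing else in the software would need to change.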
On 10/26/07, Rolf Lampa rolf.lampa@rilnet.com wrote:
Correct. And this is why I think it's a bit unfortunate that the entire WP is saturated with phonetic redirects (which seem to be a big part of the redirects). The phonetic part should have been taken care of "at the root of the tree", that is, in the search mechanism.
You can't solve everything in search, at the moment, because links require actual destinations. Arguably, it would be better if the linking process went something like:
- Write "...[[Gary Smith]]..." in some wikitext.
- Press preview (or perhaps even save).
- All ambiguous links (anything that doesn't point to an actual, non-dab page) are highlighted somehow.
- For each link, choose from amongst a small number of possible real locations ("Did you mean ''Gary Smith (footballer), Scottish footballer'', or ''Gary Smith (Kittian footballer), footballer from St Kitts and Nevis''...?"). Links that point to redirects would be automatically updated but shown for approval.
- Save with perfect, non-ambiguous links.
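A rough sketch of the preview-time check described above, with a toy page table standing in for the real database (the `pages` dict and all the names are invented for illustration; candidate lookup is a naive prefix match, where the real thing would use the search index):

```python
import re

# Toy stand-in for the page table: title -> page kind.
pages = {
    "Gary Smith": "disambig",
    "Gary Smith (footballer)": "article",
    "Gary Smith (Kittian footballer)": "article",
}

def is_ambiguous(title):
    """A link is ambiguous if its target is missing or a dab page."""
    kind = pages.get(title)
    return kind is None or kind == "disambig"

def candidates(title):
    """Suggest real articles for an ambiguous target (prefix match)."""
    return [t for t, k in pages.items()
            if k == "article" and t.startswith(title)]

def check_links(wikitext):
    """Return {link target: suggestions} for every ambiguous
    [[...]] link found in the wikitext."""
    report = {}
    for target in re.findall(r"\[\[([^\]|]+)", wikitext):
        if is_ambiguous(target):
            report[target] = candidates(target)
    return report
```

At preview time the editor would be shown `report` and asked to pick a concrete target for each entry, so only unambiguous links ever get saved.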
You know, if we assume that wikitext links always point to the actual page, then that does make life easier, because we can use the same searching mechanism at the time when a user searches for a page ("Hey, I'm looking for info on Gary Smith"), as when an editor is linking to an article. That's a massive plus.
Then our problem basically boils down to: how do we implement the best search ever, by using human-edited hints?
However, there is a difference, the Aliases would, as opposed to the existing redirects, be defined inside of the article instead of outside, and that opens up interesting perspectives, especially if
Yes. Redirects are painful partially because they're external. Centralised management is good.
changing the term to *Synonyms* instead of Aliases. I like the term
"Synonyms" better because it implies supporting also human reading with more info (more than aliases does).
It strikes me as a bit too fuzzy, personally, and could lead to people adding a lot of terms that no one would actually be searching for. Nicknames, epithets, insults, etc. But it will do for now, if you like.
Therefore, in summary, I suggest Soundex (or modern derivations thereof, perhaps as part of the search mechanism - entirely automated though), and the concept of Synonyms to support a wider range of application than Aliases implies (the term "alias" is rather abstract and not very meaningful to most people). With an appropriate
You keep bringing up Soundex. I'm not sure how it's useful, other than as a last ditch resort. "Uh, we don't have a page called John Barrnes. We don't have a disambiguation page called John Barrnes. We don't even have any pages with synonyms of John Barrnes. Any chance you meant John Barnes?" Let's just leave Soundex out for the moment.
implementation* of a Synonyms concept, parsers and both internal and
external Indexers could benefit from this info while at the same time it would potentially increase the informational value for human reading as well, especially if displayed** near the top of the article.
Yes, I like the idea of an article showing you explicitly "This article covers the following topics".
Though that itself raises the question of how to handle a topic which is dealt with in several places, such as when you have a summary on one page and a detailed account on another. A smart dynamic disambiguation system could deal with this: search for "US History" and get "United States (summary)", "History of the United States (detailed)" plus links to portals etc.
Steve
Steve Bennett wrote:
On 10/26/07, Rolf Lampa rolf.lampa@rilnet.com wrote:
<snip>
Then our problem basically boils down to: how do we implement the best search ever, by using human-edited hints?
You actually hint at the solution (or at least at the problem domain) when saying "using human-edited hints". Humans are the problem, or the potential. Good results can only be achieved by motivating people to do a good job, and the task must be understood.
This is also the actual reason why I'm a bit hesitant about the term "Alias" (since that abstract word doesn't mean much to most people; it doesn't give them a "guideline", so to speak). We can mean the same thing with a word, but now I'm talking about the pedagogical aspect: motivating and explaining the (intended) concept to editors. In that context Synonyms means something, it means more; it kind of intuitively gives most people at least some idea about what kind of /relevant/ keywords to add to a Synonym list.
And there you go: the best possible human-edited hints require that people understand the basic idea, and when it seems meaningful to them they
# do it
# eagerly
# & properly
:)
Yes. Redirects are painful partially because they're external. Centralised management is good.
Perhaps both the traditional Redirects and the "Inline Aliases/Synonyms" are useful.
<snip>
Therefore, in summary, I suggest Soundex (or modern derivations thereof, perhaps as part of the search mechanism - entirely automated though), and the concept of Synonyms to support a wider range of application than Aliases implies (the term "alias" is rather abstract and not very meaningful to most people). With an appropriate
You keep bringing up Soundex. I'm not sure how it's useful, other than as a last ditch resort.
Well, yes, let's drop Soundex for now. Alias/Synonyms is more interesting and relevant to your original post (apart from hinting at how/why the existing Redirect concept is "saturated" with phonetic problems, which could have been/can be solved with less human effort).
But now for the Alias/Synonyms idea. =)
Regards,
// Rolf Lampa
On Fri, Oct 26, 2007 at 01:29:45PM +0200, Rolf Lampa wrote:
This is also the actual reason why I'm a bit hesitant about the term "Alias" (since that abstract word doesn't mean much to most people; it doesn't give them a "guideline", so to speak). We can mean the same thing with a word, but now I'm talking about the pedagogical aspect: motivating and explaining the (intended) concept to editors. In that context Synonyms means something, it means more; it kind of intuitively gives most people at least some idea about what kind of /relevant/ keywords to add to a Synonym list.
Though I don't think the word synonym accurately reflects what's going on here: synonyms are words which mean much the same as other words.
We really are talking about aliases: also known as -- different spellings (usually) for the same words. At least, in the pattern-driven examples I've seen mentioned upthread.
Cheers, -- jra
Jay R. Ashworth wrote:
On Fri, Oct 26, 2007 at 01:29:45PM +0200, Rolf Lampa wrote:
This is also the actual reason why I'm a bit hesitant about the term "Alias" (since that abstract word doesn't mean much to most people; it doesn't give them a "guideline", so to speak). We can mean the same thing with a word, but now I'm talking about the pedagogical aspect: motivating and explaining the (intended) concept to editors. In that context Synonyms means something, it means more; it kind of intuitively gives most people at least some idea about what kind of /relevant/ keywords to add to a Synonym list.
Though I don't think the word synonym accurate reflects what's going on here:
No, not exactly the same; hence my attempt to focus on the possible drawbacks of aliases, by trying to narrow in on something positive, something that hopefully wouldn't be as risky as "aliases" sounds to my ears.
Synonyms is a "narrower" concept than aliases, I really didn't imply that they were the same.
synonyms are words which mean much the same as other words.
Exactly, that's what Synonyms are good for. "Same as" is your word for it, which is also my point here: Synonyms implies not giving the impression that an article covers everything, or something that doesn't really comply with the title.
Instead a Synonym (usually) covers just the concept someone is looking for, although with other, but relevant, words. In other words: same meaning, same semantics.
We really are talking about aliases: also known as --
"Also known as" is a Synonym (in the context of an article title).
Other uses of a broad alias concept would NOT add value, which is my point.
different spellings (usually) for the same words.
Different spellings should not, in general, be dealt with manually (as is done now, using Redirects), since that can be automated.
At least, in the pattern-driven examples I've seen mentioned upthread.
Yes, more ideas exist about what kind of information to define as Aliases, but some of those ideas really aren't good at all, in that they'd intend to manually define what's already in the text - namely the text. That part, presenting keywords from the text, should be handled by smart indexers and stemmers. As usual.
The text is already there, and it's indexed by more or less smart search mechanisms; section headers are also existing information which can be used, or given rank, by indexers. In short, automation could provide just the kind of stuff which people would tend to define, manually, as an alias... if alias is meant to be perceived as a B R O A D concept. Which would be really bad.
Hence my suggestion to go for Synonyms instead. Synonyms is not a broad concept, which is a good thing. Further, it is graspable, and it would provide just that which machines cannot provide: relevant and non-ambiguous semantics.
And this is where the whole idea of aliases (if a too-broad interpretation is allowed) is at risk of becoming what I just said, a redundant YARR, simply because adding alias definitions would only saturate the articles further, overlapping what's already in:
1. the text (indexed, more or less cleverly, using stemming etc.)
2. section headers
3. categories
4. existing redirects
(5.) keywords (a new approach to how to interpret categories?)
It does not add value to put manual effort into widening or diversifying the meaning of a title or topic, or of already existing subtitles, with aliases, without very strict guidelines.
It does not, for example, add any value to start manually listing any existing subtopics of an article as a list of aliases, because you will find such info using any silly search engine, with the main title/keywords having the highest rank and the rest, the body text, lower ranks, and the two presented together even on the same search result page (that is, the semantic connection is already made). And so on.
Synonyms, on the other hand, add value, because people usually search for concepts or "problem domains", just the thing machines aren't very good at figuring out.
Categories/keywords are for broadening the title/topic, and that already exists. And the Redirects should (preferably) not deal with misspellings; redirects should instead be Synonyms. Fix that first. And then add to it this new idea, to let users add/manage Synonyms/Redirects directly from the target page.
For more detailed & fine-grained *manual* specification of article semantics there are other existing solutions, like the Semantic Web/Wiki.
Regards,
// Rolf Lampa
"Rolf Lampa" rolf.lampa@rilnet.com wrote in message news:fftcic$ebu$1@ger.gmane.org...
Jay R. Ashworth wrote:
On Fri, Oct 26, 2007 at 01:29:45PM +0200, Rolf Lampa wrote:
We really are talking about aliases: also known as --
"Also known as" is a Synonym (in the context of an article title).
Other uses of a broad alias concept would NOT add value, which is my point.
different spellings (usually) for the same words.
Different spellings should not, in general, be dealt with manually (like it is dealt with now, using Redirects), since it can be automated.
At least, in the pattern-driven examples I've seen mentioned upthread.
Yes, more ideas exist about what kind of information to define as Aliases, but some of those ideas really aren't good at all, in that they'd intend to manually define what's already in the text - namely the text. That part, presenting keywords from the text, should be handled by smart indexers and stemmers. As usual.
Let's be honest here. To users of Wikipedia, the name you choose will not make any difference. If there is a perceived problem and a feature exists that solves that problem, then it will be used to solve that problem - even if it is not what the feature was intended for.
For example, I very much doubt that redirect pages would exist if page transclusion had been invented first.
If our search indexing is good enough to deal with sound-alikes then great, but if not (as is currently the case), then redirects/synonyms/aliases/whatever you call it, will be used to make these redirects manually (as is currently the case, and as will continue to be the case if the new feature is added first, whichever name you choose).
- Mark Clements (HappyDog)
Hoi, "Sounds alike" is a feature that will prove exceedingly problematic. Have an Irishman, a Brit, an Australian, someone from Louisiana and a Canadian pronounce the same words, and then determine whether the words still sound alike. The notion that the written word in a language like English defines the pronunciation is wrong; at best it gives an approximation.
Also, when you program something like this, it will at best give you some success within one language. When you compare across languages, the pronunciation of the individual characters and combinations changes even more.
Thanks, GerardM
On 10/29/07, Mark Clements gmane@kennel17.co.uk wrote:
"Rolf Lampa" rolf.lampa@rilnet.com wrote in message news:fftcic$ebu$1@ger.gmane.org...
Jay R. Ashworth wrote:
On Fri, Oct 26, 2007 at 01:29:45PM +0200, Rolf Lampa wrote:
We really are talking about aliases: also known as --
"Also known as" is a Synonym (in the context of an article title).
Other uses of a broad alias concept would NOT add value, which is my point.
different spellings (usually) for the same words.
Different spellings should not, in general, be dealt with manually (like it is dealt with now, using Redirects), since it can be automated.
At least, in the pattern-driven examples I've seen mentioned upthread.
Yes, more ideas exist about what kind of information to define as Aliases, but some of those ideas really aren't good at all, in that they'd intend to manually define what's already in the text - namely the text. That part, presenting keywords from the text, should be handled by smart indexers and stemmers. As usual.
Let's be honest here. To users of Wikipedia, the name you choose will not make any difference. If there is a perceived problem and a feature exists that solves that problem, then it will be used to solve that problem - even if it is not what the feature was intended for.
For example, I very much doubt that redirect pages would exist if page transclusion had been invented first.
If our search indexing is good enough to deal with sound-alikes then great, but if not (as is currently the case), then redirects/synonyms/aliases/whatever you call it, will be used to make these redirects manually (as is currently the case, and as will continue to be the case if the new feature is added first, whichever name you choose).
- Mark Clements (HappyDog)
On 10/29/07, GerardM gerard.meijssen@gmail.com wrote:
"Sounds alike" is a feature that will prove exceedingly problematic. Have an Irishman, a Brit, an Australian, someone from Louisiana and a Canadian pronounce the same words and then determine if the words still sound alike. The notion that the written word in a language like English defines the pronunciation is wrong, at best it gives an approximation.
I think you're exaggerating. Certainly, there's a difference between "r-ful" and "r-less" speech, and the character of vowels changes, and there are even slight differences in which vowels are distinguished (most Americans pronounce "pa" and "paw" identically, while most Australians pronounce "poor" and "pour" identically), but these aren't major issues. However, sound-matching just isn't the solution here: we're not primarily concerned with helping people find an article if they don't know how to spell it, we're more concerned with getting people to the right article when either they can't spell it on their keyboard, or there are many ways it could be spelt, or even different words corresponding to the same article.
Does sound matching help at all in the "Nice" case? No. Not unless we really think someone is going to type "Neece" when looking for the French city. Does it help in the mulled wine/vin chaud/gluehwein case? No, again, except in exceptional instances like someone desperately typing "van show" or "glue vine". It may be of some mild benefit in improving a good search algorithm even further, but it's certainly not the essence of a solution here.
However, I confess to being a bit stuck in my brainstorming here. To summarise the chain of reasoning so far:
* I started this thread with a suggestion for a way to augment manual redirects with lightweight pattern-based aliases.
* Then we realised that redirects are required to make existing articles work, not just for searching.
* Having both redirects and another system would be kludgy and complex.
* So I propose attempting to do away with almost all redirects, by making disambiguation happen at save time, and thus only saving real links to real, unambiguous pages.
However, this major paradigm shift will cause a lot of upheaval, development effort etc. What are the benefits? Is it worth it? What problem are we trying to solve exactly?
Steve
On 10/29/07, Steve Bennett stevagewp@gmail.com wrote:
However, I confess to being a bit stuck in my brainstorming here. To summarise the chain of reasoning so far:
- I started this thread with a suggestion for a way to augment manual
redirects with lightweight pattern-based aliases.
- Then we realised that redirects are required to make existing articles
work, not just for searching.
- Having both redirects and another system would be kludgy and complex.
- So I propose attempting to do away with almost all redirects, by making
disambiguation happen at save time, and thus only saving real links to real, unambiguous pages.
However, this major paradigm shift will cause a lot of upheaval, development effort etc. What are the benefits? Is it worth it? What problem are we trying to solve exactly?
Well, that would solve this problem
http://en.wikipedia.org/wiki/Wikipedia:Disambiguation_pages_with_links
Which is very real. I think this is a good solution. We shouldn't think about this in terms of making disambiguation pages and redirects easier. We should get rid of the problem completely if possible.
Steve Bennett wrote:
- Then we realised that redirects are required to make existing articles
work, not just for searching.
- Having both redirects and another system would be kludgy and complex.
Sorry for this long post, but perhaps part of the difficulty so far is that different concepts, from "different levels of abstraction", make it difficult to clear things up. Short version: Redirects are primarily a *technical* concept, while you suggested a use case (which may, or may not, cope, or interfere, with the aforementioned technical concept). So to speak.
In other words: don't make, at an early stage, any assumptions at all about Redirects' to-be-or-not-to-be, until the use case (from the user's perspective) is very well defined, and, optionally, until the Redirect concept is very clearly understood as a very good and generic abstraction, useful as a (technical) design concept (in any complex system, really).
It's always good with attempts, and attitudes, which aim to get rid of a problem entirely, preferably before it even arises. But I think that the existing Redirects really serve a purpose, and it is a technical concept rather than a "use case" that should be exposed to the users. This may cause some confusion if not kept apart.
Therefore, don't confuse Redirects per se, with any good ideas about use cases solving real problems for the editors.
In reality Redirects handle, under the hood, real problems, as "freestanding" helpful "guides", which will not go away very easily. This is because some of the problems they solve stem from the way we humans function. Example: people tend not to find the best title for a topic directly. Then, when a better title is found later, someone may already have linked to the first/old title. Then it is convenient to be able to move a page and automatically leave a (freestanding) Redirect behind, so as not to break existing links to the old name/title.
This use of Redirects simply is a good solution. It solves a real practical problem, in real use case(s) (both when moving a page, and in continually catching the now "bad links" in the system). It's just a charmingly good solution.
The Redirect concept actually abstracts a very basic concept in any complex system. The point is that basic concepts imply "useful for many things", which is exactly what *designers* look for when they search for, or design, their tools for solving things. And in such a complex web of information as WP really is, don't remove redirects, because the Redirect is a tool, a good technical "meta solution" to more than one problem. But such (technical) design tools are not necessarily best exposed "naked" to the users (which is currently the case with the present Redirect solution).
So what I am saying is that whether, and how, Redirects are exposed to the users, is quite another matter. You usually do NOT expose abstract "meta solutions" to end users of a system.
I think that this thread is, at least in part, about different use cases useful for editors for this and that. Even if the thread is about a use case, the underlying technical implementation c-a-n be discussed at the same time. But the technical implementation must not be confused with the essence of the Use Case! Well-designed systems often have abstract "meta solutions" which solve, or aid in solving, many *different* problems, meaning that multiple Use Cases may have use for the same meta solutions under the hood. The Redirect concept is such an abstract concept, which can serve many purposes and Use Cases.
Therefore: Define the use case(s) (of this thread)
1. very well,
2. unambiguously (... :),
3. with a name/names which catches the core idea v.e.r.y clearly (that's the purpose of a name, really. A bad name/broad meaning = sloppy specification; a good name = stringent specification).
4. And, do not bother too much about the redirects early in this process.
The only "problem" with redirects may well turn out to be that they are exposed to the users (!), not that they aren't doing a good job under the hood, even for several different purposes.
After having a concrete (Use Case kind of) specification of the need and functionality of the feature, it will also become easier to pick, or develop, the best technical implementation for just that well defined feature.
Regards,
// Rolf Lampa
On 10/30/07, Rolf Lampa rolf.lampa@rilnet.com wrote:
Sorry for this long post, but perhaps part of the difficulty so far is
Not at all. I haven't had time to fully read it and reflect, but will do so in about 4 days.
that different concepts, from "different levels of abstraction" makes
it difficult to clear things up. Short version: Redirects is primarily
Right. The reason is that it's much easier to get consensus (or lack thereof) for a real, concrete change. OTOH it's easy to discuss stuff at an abstract level, but nothing ends up coming of it.
In other words, don't make, in an early stage, any assumptions at all about Redirects' be or not to be, until the use case (from the users perspective) is very well defined, and optionally, till the Redirect concept is very clearly understood, as a very good and generic abstraction useful as a (technical) design concept (in any complex system really).
I have no idea how to go about having a good discussion to solve difficult problems, or to come up with excellent solutions for only moderately difficult discussions. It seems to me that most of the development effort on MediaWiki has come from individuals who themselves have come up with, and then implemented, solutions. Can an email list really suffice for analysing a problem and coming up with a solution?
Therefore, don't confuse Redirects per se, with any good ideas about use cases solving real problems for the editors.
Sure. Redirects are just there already, it's good to look at them to see if they're solving our problems or can be improved.
In reality Redirects handle, under the hood, real problems, as
"freestanding" helpful "guides", which will not go away very easily.
Yep. But I originally started by proposing a mechanism to enhance their use as "guides". We've called this concept "aliases", "synonyms", "hints"... obviously we need something like that. But redirects are clearly not a perfect solution. And as you point out, they're trying to solve more than one problem at once.
This is because some of the problems it solves stem from the way we
humans function. Example: People tend to not always find the best title for a topic directly. Then, when a better title is found later, someone may already have linked to the first/old title. Then it is convenient to be able to move a page, and automatically leave a (freestanding) Redirect behind so as not to break existing links to the old name/title.
There's no obvious reason why the original text couldn't simply be updated, instead of a redirect being created. This often slowly happens anyway, as people or bots replace redirects with their targets.
This use of Redirects simply is a good solution. It solves a real
practical problem, in real use case(s) (both when moving a page, and in continually catching the now "bad links" in the system). It's just a charmingly good solution.
It's charmingly good in the way a 7-year-old who can play violin is charmingly good. I'd still rather listen to a real virtuoso. It was easy to implement, it's conceptually simple... but why stop there?
The Redirect concept actually abstracts a very basic concept in any
complex system. The point is that basic concepts imply "useful for many things", which is exactly what *designers* look for when they search for or design their tools for solving things. And in such a complex web of information as WP really is, don't remove redirects,
I don't think I've ever suggested "removing redirects". I've suggested replacing the actual physical implementation of redirects, but the mechanism should live on, I think.
because Redirect is a tool, a good technical "meta solution" to more
than one problem. But such (technical) design tools are not necessarily best exposed "naked" to the users (which currently is the case with the present Redirect solution).
Yep.
So what I am saying is that whether, and how, Redirects are exposed to
the users, is quite another matter. You usually do NOT expose abstract "meta solutions" to end users of a system.
MediaWiki will always operate this way though. It sort of has to. It's so general purpose it's pretty hard to really hide the implementation details with some sort of abstraction layer. Maybe individual sites could do that, but I don't see how the software could generally do that. Though I would like it if it tried ;)
discussed at the same time. But the technical implementation must not
be confused with the essence of the Use Case! Well designed systems
Yes, you have repeated this point several times. Can you offer some more concrete guidance? Believe it or not, I have studied requirements analysis, and I have some understanding of use cases. I'm skeptical about the chances of performing a genuine use case -> specification -> design -> implementation lifecycle on a product like this though. Especially by email. :) So perhaps if you have some good ideas, you could state them directly.
The only "problem" with redirects may well turn out to be that they are exposed to the users (!),
Don't forget that MediaWiki really doesn't distinguish between the obvious types of users: readers and editors. That's in the nature of the wiki. I don't think it's conceivable to truly "hide" information from the user.
After having a concrete (Use Case kind of) specification of the need and functionality of the feature, it will also become easier to pick, or develop, the best technical implementation for just that well defined feature.
This would be a nice process if we were starting from scratch. But realistically, if we come up with an awesome solution to the general problem that requires masses of new code, it's just not going to happen. We need to focus on what smallish changes we could implement that would cause great benefit. Especially given how many other people would need to get involved to make it happen.
Steve
Steve Bennett skrev:
On 10/30/07, Rolf Lampa rolf.lampa@rilnet.com wrote:
Sorry for this long post, but perhaps part of the difficulty so far is
Not at all. I haven't had time to fully read it and reflect, but will do so in about 4 days.
I think I made the mistake of reading several posts but answering only one of them...
I found your posts dealing more with the essentials (mainly focusing on the Use Case) and less with the implementation under the hood. It was other recent posts which, in my opinion, paid a little too much attention to the Redirects before the actual feature/Use Case was very clearly defined, or for that part, understood.
But perhaps my post gave another impression, that I meant that *you* had the focus in the wrong place; if so, sorry for that, I really didn't mean it that way.
In other words, I think your line of thought is interesting, and it may well end up in something useful for the users, even without too much work under the hood (which is also why I wanted to point out some benefits of the existing generic Redirect concept, especially to those who didn't already realize them).
And yes, if my intention had been to criticize your basic idea, then it definitely would be proper time for me to give better suggestions! =)
Best Regards,
// Rolf Lampa
"GerardM" gerard.meijssen@gmail.com wrote in message news:41a006820710290311o6a7fde6fj94f425e30fd6bec5@mail.gmail.com...
On 10/29/07, Mark Clements gmane@kennel17.co.uk
wrote: "Rolf Lampa" rolf.lampa@rilnet.com wrote in message news:fftcic$ebu$1@ger.gmane.org...
Jay R. Ashworth skrev:
Yes, more ideas exist about what kind of information to define as Aliases, but some of those ideas really aren't good ideas at all, in that they intend to manually define what's already in the text - namely the text itself. That part, presenting keywords from the text, should be handled by smart indexers and stemmers. As usual.
Let's be honest here. To users of Wikipedia, the name you choose will not make any difference. If there is a perceived problem and a feature exists that solves that problem, then it will be used to solve that problem - even if it is not what the feature was intended for.
For example, I very much doubt that redirect pages would exist if page transclusion had been invented first.
If our search indexing is good enough to deal with sound-alikes then great, but if not (as is currently the case), then redirects/synonyms/aliases/whatever you call them will be used to make these redirects manually (as is currently the case, and as will continue to be the case if the new feature is added first, whichever name you choose).
"Sounds alike" is a feature that will prove exceedingly problematic. [rest of comment snipped]
Not really relevant to my point at all. I was saying that it is incorrect to assume that the name you choose will make any difference to the way people use the tool. If it can act like a hammer, and there is no hammer, then it will be used like a hammer.
- Mark Clements (HappyDog)
Steve Bennett wrote:
- Write "...[[Gary Smith]]..." in some wikitext.
- Press preview (or perhaps even save)
- All ambiguous links (anything that doesn't point to an actual, non-dab
page) are highlighted somehow
See bugs 4709 and 8339
Also, WYSIWYG would be helpful for it...
On Fri, Oct 26, 2007 at 10:59:53AM +0200, Rolf Lampa wrote:
Soundex is of course not a replacement for either Redirects or Aliases. Apart from that, Soundex, or its derivations, is getting better and better.
If you're working in English.
I don't believe Soundex per se (and Soundex is someone's trademark, I *think*) works in other character-based languages.
And of course...
Cheers, -- jr 'what about zh?' a
Jay R. Ashworth skrev:
On Fri, Oct 26, 2007 at 10:59:53AM +0200, Rolf Lampa wrote:
Soundex is of course not a replacement for either Redirects or Aliases. Apart from that, Soundex, or its derivations, is getting better and better.
If you're working in English.
I don't believe Soundex per se (and Soundex is someone's trademark, I *think*) works in other character-based languages.
I don't know any details about Soundex per se, but as with any technology, software dealing with phonetics has only a limited range of application. The value, though, is when a technology can help users do away with trivia, so that more (manual) time can be spent on value, the semantics.
Oh, but you already knew that. =)
Regards,
// Rolf Lampa
On 26/10/2007, Steve Bennett stevagewp@gmail.com wrote:
On 10/26/07, Rolf Lampa rolf.lampa@rilnet.com wrote:
Soundex.
Title variants, very often due to differences in spelling, are an old problem which was solved a long time ago, long before computers came about. The (old) solution was based on the fact that the sound stays the same across differences in spelling etc., hence "Soundex":
Heh. No. Soundex is awful. There might be something better by now, but not Soundex. Anything but that. In a previous job I briefly flirted with it to perform name matching but it (or the SQL Server implementation at least) is useless - it collapses any name down to 4 consonants, making Steve and Stove identical, for instance.
Anyway a Soundex-like tool might be useful to complement or improve searching, but the situation I'm describing here is when you know exactly what search terms you want to reach, but it's a lot of effort to create all those redirects.
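The "Steve"/"Stove" collision is easy to reproduce. Here is a compact sketch of classic American Soundex in Python, just for illustration (it assumes alphabetic input and makes no claims about the SQL Server implementation mentioned above):

```python
def soundex(name: str) -> str:
    """Classic American Soundex: keep the first letter, code the remaining
    consonants as digits, drop vowels, collapse adjacent duplicate codes
    (h and w are 'transparent' to the collapsing), then pad/truncate to 4."""
    codes = {c: d for cs, d in
             [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
              ("l", "4"), ("mn", "5"), ("r", "6")]
             for c in cs}
    name = name.lower()
    out = name[0].upper()
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        d = codes.get(ch, "")
        if d and d != prev:
            out += d
        if ch not in "hw":        # h/w don't reset the duplicate check
            prev = d
    return (out + "000")[:4]
```

Run against "Steve" and "Stove" it yields S310 for both, which is exactly the collapse complained about above: once the vowels go, the two names are indistinguishable.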
There's been a better alternative to Soundex for many years called Metaphone. I think there's even several variants of it these days.
I did some tests with Soundex or Metaphone when I was developing my DidYouMean extension. It's not too hard to use a different normalization algorithm. I also tried anagrams and textonyms.
Andrew Dunbar (hippietrail)
Steve

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitech-l
Andrew Dunbar wrote:
There's been a better alternative to Soundex for many years called Metaphone. I think there's even several variants of it these days.
I did some tests with Soundex or Metaphone when I was developing my DidYouMean extension.
Thanks for the hint!
Regards,
// Rolf Lampa
I haven't read the entire thread but it seems the extension I was working on for Wiktionary would be relevant here: http://www.mediawiki.org/wiki/Extension:DidYouMean
It normalizes article names and traps page creations, deletions, and moves to maintain a database table of normalized titles.
At every page view and search request this database is queried (unless already cached) and a list of similar titles is suggested to the user.
The English Wiktionary already has a way (using templates) to suggest similar article titles. DidYouMean combines this hand-edited list with its generated list and displays them in the manner expected on Wiktionary.
I had already considered adding a subset of pattern matching to the normalization for finding kinds of similar titles that normalization alone wouldn't find. This would be essential for a Wikipedia solution.
Currently DidYouMean normalizes accented characters to unaccented characters, strips Hebrew and Arabic vowels, normalizes Japanese fullwidth and halfwidth characters to normal width, etc. It also strips spaces, hyphens, apostrophes, periods etc.
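That kind of folding can be sketched in a few lines of Python. This is not the extension's actual code, and `normalize_title` is a hypothetical name; only the Latin-script steps (accent stripping, case folding, punctuation removal) are shown:

```python
import unicodedata

def normalize_title(title: str) -> str:
    """Fold a page title to a coarse lookup key: decompose accented
    characters, drop the combining marks, fold case, and strip spaces,
    hyphens, apostrophes and periods."""
    # NFKD separates base characters from their combining accents
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(c for c in decomposed
                         if not unicodedata.combining(c))
    return "".join(c for c in no_accents.casefold() if c not in " -'.")
```

With this, "Thomas-François Dalibard" and "Thomas Francois Dalibard" fold to the same key, so one stored row in a normalized-titles table would cover all four spellings from the example at the top of the thread.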
Obviously the matching heuristics for Wikipedia would be different. Possibly including word stemming and stoplists but possibly also hand-coded rules in a special page.
It might even be possible to automate page disambiguation to some degree using these methods.
Andrew Dunbar (hippietrail)
On 10/26/07, Andrew Dunbar hippytrail@gmail.com wrote:
I haven't read the entire thread but it seems the extension I was working on for Wiktionary would be relevant here: http://www.mediawiki.org/wiki/Extension:DidYouMean
This looks promising, and is yet another way to solve the general problem
we're slowly formulating. Do you have an installed working copy somewhere we can look at?
Steve
On 26/10/2007, Steve Bennett stevagewp@gmail.com wrote:
On 10/26/07, Andrew Dunbar hippytrail@gmail.com wrote:
I haven't read the entire thread but it seems the extension I was working on for Wiktionary would be relevant here: http://www.mediawiki.org/wiki/Extension:DidYouMean
This looks promising, and is yet another way to solve the general problem
we're slowly formulating. Do you have an installed working copy somewhere we can look at?
Yes: http://wiktionarydev.leuksman.com
Andrew Dunbar (hippietrail)
Steve
On 10/26/07, Andrew Dunbar hippytrail@gmail.com wrote:
Um, example page? I can't see anything...what is supposed to happen? What am I missing?
Steve
On 26/10/2007, Steve Bennett stevagewp@gmail.com wrote:
On 10/26/07, Andrew Dunbar hippytrail@gmail.com wrote:
Um, example page? I can't see anything...what is supposed to happen? What am I missing?
Make two pages that only differ by capitalisation, accents, spacing, hyphenation, apostrophes etc. You will see that both pages have a "See also" link at the top. Add more variations and you will see they all become interlinked... It's a wiki - you can edit it. And it's a test wiki so don't worry about creating meaningless pages - that's just what it's for (-:
Andrew Dunbar (hippietrail)
Steve
On 10/26/07, Andrew Dunbar hippytrail@gmail.com wrote:
Make two pages that only differ by capitalisation, accents, spacing, hyphenation, apostrophes etc. You will see that both pages have a "See also" link at the top. Add more variations and you will see they all become interlinked... It's a wiki - you can edit it. And it's a test wiki so don't worry about creating meaningless pages - that's just what it's for (-:
Ok, I see now. Cool!
However, I think this is only a half solution without disambiguation text. If we have some magic disambiguating text for each page, then the following things can be done automatically:
- Did you mean/see also/"For X, see Y" at the top of pages.
- Dynamically constructed disambiguation pages (though not quite as beautiful as present for big ones, because we wouldn't have a way of grouping thematically)
- Better search results
- Possibly better disambiguating on save, as I described earlier.
- Possibly other great new ideas.
The idea occurs that if we could massively improve search, we might not need hand-maintained disambiguation pages at all. Or at least, only for special cases.
I'm seriously liking the idea of disambiguating at page-save time. Links that point somewhere ambiguous should not be shown blue: maybe they should have a wiggly underline or something.
Ideally, also we would not have pages where the title is ambiguous but it's a real page (eg, [[Nice]]). Any link to [[Nice]] is inherently ambiguous: did the person really intend to link to the French city, or did they actually mean [[Nice (programming language)]]?
This is really a powerful solution: if every link is guaranteed to point to the right place, then we basically don't need redirects. Instead, we just need search hints. And if they're just hints, they can overlap, as I originally described. But the big bonus is that the different query entry points ("Go" button, link, typing URL) all behave the same: attempt to look up the page, if not, search using hints.
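That unified lookup could be sketched roughly like this, assuming hypothetical in-memory tables (the real thing would of course query the database); Python is used only for illustration:

```python
# Hypothetical page set and hint table; names are illustrative only.
PAGES = {"Alfred Deakin", "Nice", "Nice (programming language)"}
HINTS = {
    "alfred deakin": ["Alfred Deakin"],
    "nice": ["Nice", "Nice (programming language)"],
}

def resolve(term):
    """Mimic the proposed 'Go' behaviour: an exact title wins outright,
    a unique hint redirects silently, several hints produce a generated
    disambiguation, and no match falls through to search results."""
    if term in PAGES:
        return ("page", term)
    matches = [t for t in HINTS.get(term.casefold(), []) if t in PAGES]
    if len(matches) == 1:
        return ("page", matches[0])
    if matches:
        return ("disambiguate", matches)
    return ("search", term)
```

The point is that links, typed URLs and the "Go" button would all call the same `resolve` step, so "alfred deakin" behaves identically whether you search for it or link to it.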
The question is, is "no ambiguous links" achievable? Some issues I foresee:
- Obviously it will take a while to find and remove all the ambiguous links (and to train people)
- Currently we consider it acceptable to deliberately link to a redirect, eg for a subject which is currently part of another article, but which should one day be split off. We would need a way to indicate this desire.
- We would need a way to indicate that a link is a deliberate link to [[Nice]], rather than any of the homonyms. Any ideas for syntax?
That done, the three major changes would be:
- All links to redirects would be replaced by the target (some bots do this already)
- Links to pages that are invalid simple link targets (ie, dab pages and [[Nice]] pages) would be shown as "in need of attention"
- At save time, a reminder that there exist pages to be dabbed.
Steve
Steve Bennett wrote:
I'm seriously liking the idea of disambiguating at page-save time. Links that point somewhere ambiguous should not be shown blue: maybe they should have a wiggly underline or something.
Some work on this is long overdue. See http://en.wikipedia.org/wiki/Wikipedia:Disambiguation_pages_with_links for an idea of how bad the current state is. If you include pages like "Nice", as Steve Bennett mentioned, it gets much worse.
Soo Reams
Andrew Dunbar skrev:
It might even be possible to automate page disambiguation to some degree using these methods.
One could view a disambiguation page as a form of "HashListView" for humans.
A disambiguation page nearly corresponds to a hash bucket in hash lists, a slot in which you collect items which end up with the same ("conflicting") hash key generated by some given hash algorithm. And in this slot/bucket you perform your final, more fine-grained search/selection. The concept is generic and just as useful for both humans and computers.
Perhaps one could test how far one can get with automation of disambiguation pages by combining the existing pages with a handcrafted section plus an additional automagically created section below?
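The hash-bucket view is easy to sketch: group all titles by a normalization key and treat any bucket with more than one member as a candidate for an auto-generated disambiguation section (a rough illustration, not a proposed implementation; `str.casefold` stands in for whatever normalizer is chosen):

```python
from collections import defaultdict

def bucket_titles(titles, key):
    """Group titles by a normalization key (the 'hash') and keep only
    the buckets with collisions - the disambiguation candidates."""
    buckets = defaultdict(list)
    for title in titles:
        buckets[key(title)].append(title)
    return {k: v for k, v in buckets.items() if len(v) > 1}

collisions = bucket_titles(
    ["Nice", "NICE", "Nice (programming language)", "Alfred Deakin"],
    key=str.casefold)
```

Here only "Nice" and "NICE" collide; a coarser key (stripping parentheticals, accents, stopwords) would pull more titles into each bucket.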
Regards,
// Rolf Lampa
Actually I think this whole proposal violates the spirit of "don't fix redirects that aren't broken".
Regex searches? Well that's an interesting thought, but only if you mean searching for pages whose text matches a regex entered by the user, but the other way around... what... o.O
—C.W.
On 10/30/07, Charlotte Webb charlottethewebb@gmail.com wrote:
Actually I think this whole proposal violates the spirit of "don't fix redirects that aren't broken".
Redirects take a lot of non-creative, mind-numbing work to set up and
maintain. And there are still massive gaps that redirects don't cover. From that point of view, they're broken.
But anyway, my proposal seems to boil down to: "Only use redirects to find stuff, not to link to."
What would happen if we made redirects show up as red links? If you clicked the red link, you would be taken to the target of the redirect, but it would be sufficiently annoying that you would probably fix the original article. Again, we'd need some way to mark those few deliberate redirect links.
Steve
On 10/29/07, Steve Bennett stevagewp@gmail.com wrote:
But anyway, my proposal seems to boil down to: "Only use redirects to find stuff, not to link to."
What would this help, again?
On 10/30/07, Simetrical Simetrical+wikilist@gmail.com wrote:
On 10/29/07, Steve Bennett stevagewp@gmail.com wrote:
But anyway, my proposal seems to boil down to: "Only use redirects to find stuff, not to link to."
What would this help, again?
That was my question about 2 posts ago. The main benefits I see are these:
* Hard redirects can then be replaced by a lighter, dynamic, pattern-based system like I originally proposed, without breaking anything.
* We can make linking to something work the same way as searching* for it. Currently doing a search* for "alfred deakin" magically finds its way to "Alfred Deakin", but linking to it makes a red link. Linking to "harry potter" is ok though because there's an actual redirect. Searching* for the programming language under "Nice" lets you know instantly that you've made a mistake, but linking to it fails silently. All of this is easily fixed if the software helps/forces you to get your links right at the time you save them, and it can use the same rules as searching*.
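For reference, the pattern syntax from the original proposal needs very little machinery to expand into concrete titles. A rough Python sketch, assuming the cut-down syntax from the top of the thread (no nesting or escaping; the example pattern uses an explicit space alternative `[-| ]`, since a bare `[-]` expanding to blank would fuse the two words):

```python
import itertools
import re

def expand_alias(pattern):
    """Expand the proposed cut-down alias syntax into concrete titles:
    [Foo] matches Foo or blank; [Foo|Moo] matches Foo or Moo; a leading
    or trailing | inside brackets also allows blank."""
    # odd-indexed tokens are bracket contents, even-indexed are literals
    tokens = re.split(r"\[([^\]]*)\]", pattern)
    choices = []
    for i, tok in enumerate(tokens):
        if i % 2 == 0:
            choices.append([tok])
        else:
            alts = tok.split("|")
            if "|" not in tok:
                alts.append("")   # a lone [Foo] means Foo or blank
            choices.append(alts)
    titles = set()
    for combo in itertools.product(*choices):
        # all whitespace is equivalent to a single space
        titles.add(" ".join("".join(combo).split()))
    return titles
```

One alias line like `Thomas[-| ]Fran[ç|c]ois Dalibard` then covers the four spellings that today need four hand-made redirects; whether the expansion happens at save time or the patterns are matched at query time is exactly the implementation question this thread keeps circling.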
Other benefits?
Steve

*Searching - "typing something in the search box then pressing 'Go'."