Someone raised the possibility of a bot that would go through a category (of biographies), read all articles, find '''bold''' parts, check if these exist as a page, and create a redirect if not.
IMHO that should be a semi-automated process, not a fully automated one (creating redirects, that is). So, here's the tool:
http://tools.wikimedia.de/~magnus/name_redirects.php
Category suggestion : "XXXX deaths" :-)
Have fun redirecting, Magnus
On 10/18/07, Magnus Manske magnusmanske@googlemail.com wrote:
Someone raised the possibility of a bot that would go through a category (of biographies), read all articles, find '''bold''' parts, check if these exist as a page, and create a redirect if not.
IMHO that should be a semi-automated process, not a fully automated one (creating redirects, that is). So, here's the tool:
I have some javascript that will make the bold-faced strings (ones which differ from the page title) into clickable links which open an edit window for that title, insert a redirect back to the article title, and autoclick the save button.
Category suggestion : "XXXX deaths" :-)
Have fun redirecting, Magnus
Of course. Redirects (and merges) are cheap, fun, and a great way to... well, don't mind me... :S
—C.W.
On 18/10/2007, Magnus Manske magnusmanske@googlemail.com wrote:
Someone raised the possibility of a bot that would go through a category (of biographies), read all articles, find '''bold''' parts, check if these exist as a page, and create a redirect if not. IMHO that should be a semi-automated process, not a fully automated one (creating redirects, that is). So, here's the tool: http://tools.wikimedia.de/~magnus/name_redirects.php
Bug report! Put in Category:2007 deaths and you get the following:
Luigi_Filippo_D'Amico Luigi Filippo D'Amico : #REDIRECT [[Luigi Filippo D'Amico]] [do it!] (12 pages link here)
Note: it's trying to redirect the article to itself. And when you hit "do it!", it'll happily offer to wipe out the article with the self-redirect ...
- d.
On 10/18/07, David Gerard dgerard@gmail.com wrote:
On 18/10/2007, Magnus Manske magnusmanske@googlemail.com wrote:
Someone raised the possibility of a bot that would go through a category (of biographies), read all articles, find '''bold''' parts, check if these exist as a page, and create a redirect if not. IMHO that should be a semi-automated process, not a fully automated one (creating redirects, that is). So, here's the tool: http://tools.wikimedia.de/~magnus/name_redirects.php
Bug report! Put in Category:2007 deaths and you get the following:
Luigi_Filippo_D'Amico Luigi Filippo D'Amico : #REDIRECT [[Luigi Filippo D'Amico]] [do it!] (12 pages link here)
Note: it's trying to redirect the article to itself. And when you hit "do it!", it'll happily offer to wipe out the article with the self-redirect ...
Thanks, should be fixed now. Should there be a MediaWiki function that prevents one from saving a #REDIRECT to the same topic? ;-)
Magnus
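(Editor's note: the guard Magnus asks about is simple to state. Here is an illustrative Python sketch of the check a save hook might perform — the function name and the redirect-matching regex are my own, not anything in MediaWiki:)

```python
import re

# Hypothetical sketch of a save-time check; the function name and the
# redirect-matching regex are illustrative only, not MediaWiki's actual code.
def is_self_redirect(title, wikitext):
    """True if `wikitext` is a #REDIRECT pointing back at `title` itself."""
    m = re.match(r"#REDIRECT\s*\[\[([^\]|#]+)", wikitext, re.IGNORECASE)
    if not m:
        return False  # not a redirect at all
    # Normalize underscores vs. spaces and case before comparing titles.
    target = m.group(1).strip().replace("_", " ")
    return target.casefold() == title.replace("_", " ").casefold()
```

A save hook would refuse (or warn) whenever this returns True, which would have caught the Luigi Filippo D'Amico case above.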
On 2007.10.18 17:08:03 +0100, Magnus Manske magnusmanske@googlemail.com scribbled 26 lines:
On 10/18/07, David Gerard dgerard@gmail.com wrote:
On 18/10/2007, Magnus Manske magnusmanske@googlemail.com wrote:
Someone raised the possibility of a bot that would go through a category (of biographies), read all articles, find '''bold''' parts, check if these exist as a page, and create a redirect if not. IMHO that should be a semi-automated process, not a fully automated one (creating redirects, that is). So, here's the tool: http://tools.wikimedia.de/~magnus/name_redirects.php
Bug report! Put in Category:2007 deaths and you get the following:
Luigi_Filippo_D'Amico Luigi Filippo D'Amico : #REDIRECT [[Luigi Filippo D'Amico]] [do it!] (12 pages link here)
Note: it's trying to redirect the article to itself. And when you hit "do it!", it'll happily offer to wipe out the article with the self-redirect ...
Thanks, should be fixed now. Should there be a MediaWiki function that prevents one from saving a #REDIRECT to the same topic? ;-)
Magnus
It would be nice, but there are a lot more pressing issues to be working on. This problem really needs to just be handed over to someone with OCD or Asperger's equipped with Pywikipedia's selflink.py.
-- gwern 2A0120 767 LHI 3848 TRD rail Edens cypherpunk Horiuchi high
On 10/18/07, Magnus Manske magnusmanske@googlemail.com wrote:
Thanks, should be fixed now. Should there be a MediaWiki function that prevents one from saving a #REDIRECT to the same topic? ;-)
Yes. I'd guess 1 in 100 times that happens it's experimental vandalism of the "Hehehe, I wonder what would happen if..." sort, and the other 99 times it's an embarrassing paste error by an at least somewhat serious editor.
—C.W.
Charlotte Webb wrote:
On 10/18/07, Magnus Manske magnusmanske@googlemail.com wrote:
Thanks, should be fixed now. Should there be a MediaWiki function that prevents one from saving a #REDIRECT to the same topic? ;-)
Yes. I'd guess 1 in 100 times that happens it's experimental vandalism of the "Hehehe, I wonder what would happen if..." sort, and the other 99 times it's an embarrassing paste error by an at least somewhat serious editor.
I'd guess that about 5% of the times it happens it's some tremendously witty and clever person who's come up with the novel concept of redirecting [[Self-referential humor]] to itself.
On 2007.10.21 16:04:46 -0600, Bryan Derksen bryan.derksen@shaw.ca scribbled 0 lines:
Charlotte Webb wrote:
On 10/18/07, Magnus Manske magnusmanske@googlemail.com wrote:
Thanks, should be fixed now. Should there be a MediaWiki function that prevents one from saving a #REDIRECT to the same topic? ;-)
Yes. I'd guess 1 in 100 times that happens it's experimental vandalism of the "Hehehe, I wonder what would happen if..." sort, and the other 99 times it's an embarrassing paste error by an at least somewhat serious editor.
I'd guess that about 5% of the times it happens it's some tremendously witty and clever person who's come up with the novel concept of redirecting [[Self-referential humor]] to itself.
Honestly, I'd disagree. Webb's right that a lot of the time it *is* just a copy-paste error - I've seen quite a few hand-copied sort of template things.
But based on my experience delinking hundreds of self-redirects using the Pywikipedia script I've already mentioned, most of the uses are far more sordid and pathetic: they self-link to get *bolding* on the name.
Yes, that's right, instead of going '''Page name''', they go [[Page name]].
Oy gevalt.
-- gwern Nations SURSAT Abdurahmon ID Mountain shbangs MITM TRDL card b9
On 10/21/07, Gwern Branwen gwern0@gmail.com wrote:
Honestly, I'd disagree. Webb's right that a lot of the time it *is* just a copy-paste error - I've seen quite a few hand-copied sort of template things.
Yes, and a semi-automating tool, when used by people who create large numbers of redirects, would reduce the incidence of this error.
But based on my experience delinking hundreds of self-redirects using the Pywikipedia script I've already mentioned, most of the uses are far more sordid and pathetic: they self-link to get *bolding* on the name.
You're conflating two different issues here, and the latter has nothing to do with redirects.
Yes, that's right, instead of going '''Page name''', they go [[Page name]].
The matter being discussed is where instead of going "#redirect [[Title of correct target page]]" they go "#redirect [[Title of this redirect]]" (or redirecting [[Self-referential humor]] to itself as Bryan suggested — has somebody actually done that? hehehe)
It would be easy to run a bot to identify existing cases like this (a page redirecting to itself), but damn near impossible to determine editor intent. I think the best procedure for a bot would be something like:
"If a previous version exists (regardless of whether it appears to be a different redirect, or a proper article) revert to that version. Otherwise delete. Notify user in either case."
—C.W.
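(Editor's note: C.W.'s procedure reduces to a small decision function. A hedged Python sketch, with the history represented as a plain list and the actual edit/delete/notify calls left out — the function and its representation are hypothetical:)

```python
def handle_self_redirect(history):
    """Decide what a cleanup bot should do with a page that redirects to
    itself. `history` is the page's revisions as wikitext, newest first;
    history[0] is the offending self-redirect. Sketch of C.W.'s procedure."""
    previous = history[1:]
    if previous:
        # A previous version exists (another redirect, or a proper article):
        # revert to it, then notify the user.
        return ("revert", previous[0])
    # No previous version exists: delete the page, then notify the user.
    return ("delete", None)
```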
Charlotte Webb wrote:
The matter being discussed is where instead of going "#redirect [[Title of correct target page]]" they go "#redirect [[Title of this redirect]]" (or redirecting [[Self-referential humor]] to itself as Bryan suggested — has somebody actually done that? hehehe)
Repeatedly. It's like how our article [[Earth]] used to be routinely replaced with the text "mostly harmless" before semiprotection came to town. Cute for the first time, not so much for the seventh and subsequent times. :)
Bryan Derksen wrote:
Repeatedly. It's like how our article [[Earth]] used to be routinely replaced with the text "mostly harmless" before semiprotection came to town. Cute for the first time, not so much for the seventh and subsequent times. :)
Heh. Could we use a TiredJokeBot? It'd just look for these familiar patterns and automatically revert, leaving a little note on the user's page. Heck, it could even link to the first time it was done, making clear that their joke was hilarious in 2004.
William
Bryan Derksen wrote:
Repeatedly. It's like how our article [[Earth]] used to be routinely replaced with the text "mostly harmless" before semiprotection came to town. Cute for the first time, not so much for the seventh and subsequent times. :)
I hereby stake my claim to "cute" -- I'm reasonably sure I was the first to add "mostly harmless" to [[Earth]].
I was probably also the third or fourth, but I don't think I continued much beyond that. Definitely not the seventh.
See? Vandals can be rehabilitated....
Magnus Manske schreef:
IMHO that should be a semi-automated process, not a fully automated one (creating redirects, that is). So, here's the tool:
http://tools.wikimedia.de/~magnus/name_redirects.php
Category suggestion : "XXXX deaths" :-)
Thanks! I've started on 1991 deaths, and I've already racked up ~200 edits in barely an hour :-) And I'm only at the letter C...
But now I've come across a redirect [[Vance Colvig, Jr.]] => [[Vance Colvig]] which was created before, and deleted because "unnecessary redirect, unlikely to be used". Predictably, this name is in use now. And now I'm checking the deletion log of User:Tregoweth, and I'm depressed: he has deleted hundreds of reasonable redirects...
What's the use of contributing to Wikipedia if your work can be deleted by the first random idiot passing by?
Eugene
On 10/18/07, Eugene van der Pijll eugene@vanderpijll.nl wrote:
What's the use of contributing to Wikipedia if your work can be deleted by the first random idiot passing by?
That's a damn good question.
—C.W.
On 10/19/07, Charlotte Webb charlottethewebb@gmail.com wrote:
On 10/18/07, Eugene van der Pijll eugene@vanderpijll.nl wrote:
What's the use of contributing to Wikipedia if your work can be deleted by the first random idiot passing by?
That's a damn good question.
Indeed, given that people granted admin powers should not be random idiots, one hopes.
-Matt
On 10/18/07, Magnus Manske magnusmanske@googlemail.com wrote:
Someone raised the possibility of a bot that would go through a category (of biographies), read all articles, find '''bold''' parts, check if these exist as a page, and create a redirect if not.
It would be really nice if redirects were replaced by a system of aliases, which were part of the metadata to the article.
In particular, if you could do regular expressions, like:
Fr[ée]deri(que|c) (Fran[çc]ois)? (Leblanc|Gonzale[zs]) (Jr(.)?)?
Obviously you'd want to place restrictions to cut down the number of matches, but this would save a lot of tedious redirect creating.
In fact the expression language would only need one feature: multiple options for parts of a word, preferably nested. So the above example could actually be expressed:
Fr[é|e]deri[que|c] [Fran[ç|c]ois|] [Leblanc|Gonzale[z|s]] [Jr[.|]]
Not so abusable. Very useful. Any takers to implement it? :)
Steve
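(Editor's note: Steve's restricted expression language — nested bracket groups with |-separated alternatives, where a lone bracketed option such as [Jr] is optional — can be expanded with a short recursive sketch. Illustrative Python; function names are my own:)

```python
def _expand(pattern):
    """Recursively expand the first top-level [ ... ] group."""
    depth, start, end = 0, None, None
    for i, ch in enumerate(pattern):
        if ch == "[":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                end = i
                break
    if start is None or end is None:
        return [pattern]  # no bracket groups left
    head, group, tail = pattern[:start], pattern[start + 1:end], pattern[end + 1:]
    # Split the group on top-level pipes only (nested groups keep theirs).
    options, buf, d = [], "", 0
    for ch in group:
        if ch == "[":
            d += 1
        elif ch == "]":
            d -= 1
        if ch == "|" and d == 0:
            options.append(buf)
            buf = ""
        else:
            buf += ch
    options.append(buf)
    if len(options) == 1:  # a lone option, as in [Foo], is optional
        options.append("")
    results = []
    for opt in options:
        for middle in _expand(opt):
            for rest in _expand(tail):
                results.append(head + middle + rest)
    return results

def expand(pattern):
    """All titles matched by an alias pattern, with whitespace normalized."""
    return sorted({" ".join(t.split()) for t in _expand(pattern)})
```

For example, expand("[City of ][Greater ]Melbourne") yields the four titles Melbourne, Greater Melbourne, City of Melbourne, and City of Greater Melbourne.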
It would be really nice if redirects were replaced by a system of aliases, which were part of the metadata to the article.
In particular, if you could do regular expressions, like:
Fr[ée]deri(que|c) (Fran[çc]ois)? (Leblanc|Gonzale[zs]) (Jr(.)?)?
Obviously you'd want to place restrictions to cut down the number of matches, but this would save a lot of tedious redirect creating.
In fact the expression language would only need one feature: multiple options for parts of a word, preferably nested. So the above example could actually be expressed:
Fr[é|e]deri[que|c] [Fran[ç|c]ois|] [Leblanc|Gonzale[z|s]] [Jr[.|]]
Not so abusable. Very useful. Any takers to implement it? :)
Shouldn't be too hard to implement as long as you don't mind it taking half an hour to load each page as the code has to check every article in the database to see which have an alias which matches whatever you typed in/clicked on. (And that's without addressing the issue of duplicates. A redirect can only point to one article, a given string can match the regexps in many articles. Automated disambig pages, perhaps?)
Nice idea, though.
Thomas Dalton wrote:
It would be really nice if redirects were replaced by a system of aliases, which were part of the metadata to the article. [...] In fact the expression language would only need one feature: multiple options for parts of a word, preferably nested. So the above example could actually be expressed:
Fr[é|e]deri[que|c] [Fran[ç|c]ois|] [Leblanc|Gonzale[z|s]] [Jr[.|]]
Not so abusable. Very useful. Any takers to implement it? :)
Shouldn't be too hard to implement as long as you don't mind it taking half an hour to load each page as the code has to check every article in the database to see which have an alias which matches whatever you typed in/clicked on.
Technically, that's not necessary.
You know that these things only change on save, so at that point you look at the difference between the old aliases and the new and update the master set. Computationally, it's only a smidgen more expensive than our current approach. And given that we're such a read-heavy environment, unnoticeably so.
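(Editor's note: the diff-on-save update William describes could look something like this sketch — the table layout and names are hypothetical:)

```python
def update_alias_table(alias_table, page, old_aliases, new_aliases):
    """On save, diff the page's old and new alias sets and apply only the
    changes to the master alias -> article table. alias_table is mutated
    in place; all names here are illustrative."""
    old, new = set(old_aliases), set(new_aliases)
    for alias in old - new:  # aliases the edit removed
        if alias_table.get(alias) == page:
            del alias_table[alias]
    for alias in new - old:  # aliases the edit added
        # First page wins here; a real implementation would generate a
        # disambiguation entry on collision instead.
        alias_table.setdefault(alias, page)
    return alias_table
```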
(And that's without addressing the issue of duplicates. A redirect can only point to one article, a given string can match the regexps in many articles. Automated disambig pages, perhaps?)
That'd be a great way to solve that. And the main bit could be done as automatically updating our redirect pages. As a first pass, anyhow.
From a user experience perspective, I'd be a little worried about putting more mysterious Wiki markup at the top of a page. On another wiki I'm working on, we're moving more of this metadata outside the markup and to specialized UIs, so that it doesn't clutter the edit box.
I think the only real abuse potential comes from either putting in a giant list or trying to redirect in a bunch of existing articles. But one you can catch with a size limit, and the other you could fix by refusing to mess with real articles.
So Steve, I'd say it's a great idea. However, I'd want to do some user testing. Since I've been doing regular expressions for so long, they make instant sense to me, but even this limited version might be too mysterious for most of our editors. Perhaps the special UI would show them the list of generated alternatives as they edit?
William
Technically, that's not necessary.
You know that these things only change on save, so at that point you look at the difference between the old aliases and the new and update the master set. Computationally, it's only a smidgen more expensive than our current approach. And given that we're such a read-heavy environment, unnoticeably so.
Not using regular expressions, you can't. Regexps are good for telling if a string matches a pattern, they aren't good for producing a list of all strings that match a pattern (which could be infinitely long if you allow arbitrary patterns). A much simplified version of regexps could be used, but I'm not sure if it's worth it.
Thomas Dalton wrote:
Technically, that's not necessary.
You know that these things only change on save, so at that point you look at the difference between the old aliases and the new and update the master set. Computationally, it's only a smidgen more expensive than our current approach. And given that we're such a read-heavy environment, unnoticeably so.
Not using regular expressions, you can't. Regexps are good for telling if a string matches a pattern, they aren't good for producing a list of all strings that match a pattern (which could be infinitely long if you allow arbitrary patterns). A much simplified version of regexps could be used, but I'm not sure if it's worth it.
Actually, the limited form of regexes that Steve suggested should always produce a finite list. So I think it would work.
Even if you needed the full power of regular expressions, you could still build something pretty workable in our environment by keeping the list hot and doing some partitioning and caching. But we don't need that, so the simple solution would work.
William
On 10/22/07, William Pietri william@scissor.com wrote:
You know that these things only change on save, so at that point you look at the difference between the old aliases and the new and update the master set. Computationally, it's only a smidgen more expensive than our current approach. And given that we're such a read-heavy environment, unnoticeably so.
Yeah, I did some more thinking about this in bed and thought about a possible implementation.
The wiki code would be a single line, like #ALIASES [City of ][Greater ]Melbourne[, Victoria| (Australia)]
Two tables would store all the aliases: one would store the raw patterns, and another would expand them, possibly only partially. You could expand, say, the first 5 characters:
"City ", "of [Greater ]Melbourne[, Victoria| (Australia)]"
"Great", "er Melbourne[, Victoria| (Australia)]"
"Melbo", "urne[, Victoria| (Australia)]"
3 entries. That way, once a user types an actual request (say, "Greater Melbourne"), you just look up the first 5 characters ("Great"), then iterate over the matches there. There are lots of algorithms and data structures that would help here.
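(Editor's note: a minimal sketch of the bucket-by-prefix lookup. For simplicity it stores fully expanded alias strings rather than Steve's partially-expanded remainders; all names are hypothetical:)

```python
PREFIX_LEN = 5  # Steve's "first 5 characters"

def build_index(aliases):
    """aliases: iterable of (alias_title, target_article) pairs, already
    expanded. Bucket them by their first five characters so a lookup scans
    one small bucket instead of every alias in the database."""
    index = {}
    for alias, target in aliases:
        index.setdefault(alias[:PREFIX_LEN], []).append((alias, target))
    return index

def lookup(index, request):
    """Find the target article for a requested title, or None."""
    for alias, target in index.get(request[:PREFIX_LEN], []):
        if alias == request:
            return target
    return None
```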
(And that's without addressing the issue of duplicates. A redirect can only point to one article, a given string can match the regexps in many articles. Automated disambig pages, perhaps?)
That'd be a great way to solve that. And the main bit could be done as automatically updating our redirect pages. As a first pass, anyhow.
Omg, automated disambig pages. Yes please! Maintaining disambiguation pages is horribly time consuming. You could conceive of another keyword like "{{disambigtext|Second largest city in Australia.}}" that would be shown where necessary. But I'm getting ahead of myself.
From a user experience perspective, I'd be a little worried about putting more mysterious Wiki markup at the top of a page. On another wiki I'm working on, we're moving more of this metadata outside the markup and to specialized UIs, so that it doesn't clutter the edit box.
So put it at the bottom, next to {{DEFAULTSORT}}. I do agree that location-independent metadata should be separated from content though. Categories and interwikis fall into that category too.
I think the only real abuse potential comes from either putting in a giant list or trying to redirect in a bunch of existing articles. But one you can catch with a size limit, and the other you could fix by refusing to mess with real articles.
Most likely from having a redirect which expands to too many possibilities, like [A|b|c|d|e][A|b|c|d|e][A|b|c|d|e][A|b|c|d|e]. But that would be easily catchable. The trouble is what to do about it, besides failing silently. Perhaps reject it into a special page that admins can browse from time to time?
So Steve, I'd say it's a great idea. However, I'd want to do some user testing. Since I've been doing regular expressions for so long, they make instant sense to me, but even this limited version might be too mysterious for most of our editors. Perhaps the special UI would show them the list of generated alternatives as they edit?
Well the great thing with such a limited expression language is that there's very little to learn, and very little to stuff up. And even better, users can just use the most naive approach imaginable. So, while a CS major would readily write an expression like:
#ALIASES [Dr] Grace [Smith|Jones]
A beginner user might simply write:
#ALIASES [Dr Grace Smith|Dr Grace Jones|Grace Smith|Grace Jones]
or even:
#ALIASES Dr Grace Smith
#ALIASES Dr Grace Jones
#ALIASES Grace Smith
#ALIASES Grace Jones
A UI tool would obviously help, but that would be a slight departure for MediaWiki. There's nothing else like that atm (afaik), so it's hard to picture how it would fit in exactly.
Steve
Steve Bennett wrote:
Yeah, I did some more thinking about this in bed and thought about a possible implementation.
The wiki code would be a single line, like #ALIASES [City of ][Greater ]Melbourne[, Victoria| (Australia)]
Two tables would store all the aliases [...] just look up the first 5 characters ("Great"), then iterate over the matches there. There are lots of algorithms and data structures that would help here.
True. Although we could do it with no database changes if we just go through and update the existing redirects on page save. I think I'd only go for something more interesting if we wanted to get rid of redirects altogether in favor of aliases. I haven't looked at the request routing code, though, so maybe I'm just being a bit chicken.
From a user experience perspective, I'd be a little worried about
putting more mysterious Wiki markup at the top of a page. On another wiki I'm working on, we're moving more of this metadata outside the markup and to specialized UIs, so that it doesn't clutter the edit box.
So put it at the bottom, next to {{DEFAULTSORT}}. I do agree that location-independent metadata should be separated from content though. Categories and interwikis fall into that category too.
Hmmm... There I'd be worried about people not finding it. Probably better to do actual user testing, but my guess is that sticking it in at the top is the best easy implementation.
I think the only real abuse potential comes from either putting in a giant list or trying to redirect in a bunch of existing articles. But one you can catch with a size limit, and the other you could fix by refusing to mess with real articles.
Most likely from having a redirect which expands to too many possibilities, like [A|b|c|d|e][A|b|c|d|e][A|b|c|d|e][A|b|c|d|e]. But that would be easily catchable. The trouble is what to do about it, besides failing silently. Perhaps reject it into a special page that admins can browse from time to time?
I think both should just refuse to save with error messages, sort of like the behavior now if you ask it to nag you about edit comments.
Alternatively, it could behave like most other mangled wikimarkup and just render it as plain text if it doesn't like it. Although that's more consistent with the current model, I think the not-visible-on-the-page nature of redirects would make that approach wrong for #ALIASES.
Well the great thing with such a limited expression language is that there's very little to learn, and very little to stuff up. And even better, users can just use the most naive approach imaginable. So, while a CS major would readily write an expression like:
#ALIASES [Dr] Grace [Smith|Jones]
A beginner user might simply write:
#ALIASES [Dr Grace Smith|Dr Grace Jones|Grace Smith|Grace Jones]
or even:
#ALIASES Dr Grace Smith
#ALIASES Dr Grace Jones
#ALIASES Grace Smith
#ALIASES Grace Jones
That's a great point.
Would we need a little more in the syntax to suggest whether blocks could be optional?
A UI tool would obviously help, but that would be a slight departure for MediaWiki. There's nothing else like that atm (afaik), so it's hard to picture how it would fit in exactly.
Yeah, it would be a departure for sure. On the other Wiki-driven project I've been working on, which we coded from scratch, we've been adding more JavaScript UI for metadata and it has been a big hit, especially for things that have complicated structure.
William
On 10/22/07, William Pietri william@scissor.com wrote:
True. Although we could do it with no database changes if we just go through and update the existing redirects on page save. I think I'd only
My first reaction is that that's kludgy. I guess the downsides are that you end up with actual redirect pages, and lots of them, inflating our page count, and possibly other bad things.
go for something more interesting if we wanted to get rid of redirects
altogether in favor of aliases. I haven't looked at the request routing, code, though, so maybe I'm just being a bit chicken.
Oh, I meant to mention, I assume this behaviour would take place after all the other lookups:
Greater Melbourne -> Greater Melbourne (fails) -> greater melbourne (fails) -> GREATER MELBOURNE (fails) try the aliases... etc.
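(Editor's note: that fallback chain could be sketched as below. It follows Steve's described order literally; real MediaWiki only auto-capitalizes the first letter rather than trying full case variants, and the names here are hypothetical:)

```python
def resolve(title, pages, aliases):
    """Try the existing title lookups first, then fall back to the alias
    table, per Steve's description. `pages` is the set of real page titles;
    `aliases` maps alias title -> target article."""
    for candidate in (title, title.lower(), title.upper()):
        if candidate in pages:
            return candidate
    return aliases.get(title)  # None if no alias matches either
```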
I'm not sure what the long term role of redirects would be. You can't just delete them because you lose history...though IMHO the history isn't doing a lot of good sitting on a dusty redirect somewhere.
Hmmm... There I'd be worried about people not finding it. Probably
better to do actual user testing, but my guess is that sticking it in at the top is the best easy implementation.
Yeah, we'll have to see how it works out in practice.
I think both should just refuse to save with error messages, sort of
like the behavior now if you ask it to nag you about edit comments.
Well, it's unlike any behaviour - MediaWiki *never* refuses to save because of bad syntax.
Alternatively, it could behave like most other mangled wikimarkup and
just render it as plain text if it doesn't like it. Although that's more consistent with the current model, I think the not-visible-on-the-page nature of redirects would make that approach wrong for #ALIASES.
I think that would be ok. At least you know you've done something wrong and will fix it...or someone else will. It's more difficult if the #ALIASES line is syntactically valid, but illegal due to too many expansions.
That's a great point.
Would we need a little more in the syntax to suggest whether blocks could be optional?
IMHO:
A [Foo] Bar = A Foo Bar, A Bar
A [Foo|Moo] Bar = A Foo Bar, A Moo Bar
A [Foo|Moo|] Bar = A Foo Bar, A Moo Bar, A Bar
A [[Foo|Moo]] Bar = same as previous, but discouraged.
It keeps the total syntax down to three elements: [], |, and an escaping mechanism, presumably \. Fortunately [, ], | and \ are all extremely uncommon in (Wikipedia) page titles.
Yeah, it would be a departure for sure. On the other Wiki-driven project I've been working on, which we coded from scratch, we've been adding more JavaScript UI for metadata and it has been a big hit, especially for things that have complicated structure.
Hit in what sense?
Anyway, I'll propose this on Wikitech and see what they have to say. It's easy to say that it could be implemented, but I haven't hacked on the MediaWiki code.
Steve
On 10/22/07, Steve Bennett stevagewp@gmail.com wrote:
IMHO:
A [Foo] Bar = A Foo Bar, A Bar
A [Foo|Moo] Bar = A Foo Bar, A Moo Bar
A [Foo|Moo|] Bar = A Foo Bar, A Moo Bar, A Bar
A [[Foo|Moo]] Bar = same as previous, but discouraged.
It keeps the total syntax down to three elements: [], |, and an escaping mechanism, presumably \. Fortunately [, ], | and \ are all extremely uncommon in (Wikipedia) page titles.
For self-consistency, how about:
A [Foo] Bar = A Foo Bar, A Bar
A [Foo|Moo] Bar = A Foo Bar, A Moo Bar, A Bar
A [Foo|Moo|] Bar = A Foo Bar, A Moo Bar
That is, the final pipe/bracket combo would be used to make the bracketed choices a required part of the expansion. That's consistent with the use of [Foo] to mean Foo is optional (you could then do [Foo|] to mean Foo is required, although that's just a waste of space). It'd also probably be (logically) easier to parse.
Also, there's no reason to implement an optional (but already deprecated) syntax if we're creating something from scratch.
--Darkwind
On 10/22/07, RLS evendell@gmail.com wrote:
For self-consistency, how about:
A [Foo] Bar = A Foo Bar, A Bar
A [Foo|Moo] Bar = A Foo Bar, A Moo Bar, A Bar
A [Foo|Moo|] Bar = A Foo Bar, A Moo Bar
I find that very counterintuitive. [Foo|Moo|] to me reads "Foo, or Moo, or". In other words, Foo, or Moo, or blank - making the whole bracketed text optional.
That is, the final pipe/bracket combo would be used to make the
bracketed choices a required part of the expansion. That's consistent with the use of [Foo] to mean Foo is optional (you could then do [Foo|] to mean Foo is required, although that's just a waste of space). It'd also probably be (logically) easier to parse.
Hm. I don't quite agree, but that's not too important. I'm sure if this gets implemented we will come up with an expression syntax that appeals to everyone.
A [[Foo|Moo]] Bar = same as previous, but discouraged.
Also, there's no reason to implement an optional (but already deprecated) syntax if we're creating something from scratch.
It's not a special syntax, it just arises automatically if you can nest
expressions. By definition if [X] means X is optional, and [Y|Z] means you must have Y or Z, then [[Foo|Moo]] must mean that you can optionally have either a Foo or a Moo.
Steve
A [Foo] Bar = A Foo Bar, A Bar
A [Foo|Moo] Bar = A Foo Bar, A Moo Bar
A [Foo|Moo|] Bar = A Foo Bar, A Moo Bar, A Bar
A [[Foo|Moo]] Bar = same as previous, but discouraged.
That's the wrong way round. [Foo|Moo|] is, without any need to define new syntax: "Foo" or "Moo" or "" (the empty string, ie. nothing). Personally, I would put the extra pipe at the beginning, but either would work without any extra programming.
So:
A [Foo] Bar = A Foo Bar
A [|Foo] Bar = A Bar, A Foo Bar (the initial | could be considered implicit if only one option is given - there is no reason to put the word in brackets unless you intend it to be optional. Depends if the additional convenience is worth the additional confusion.)
A [Foo|Moo] Bar = A Foo Bar, A Moo Bar
A [|Foo|Moo] Bar = A Bar, A Foo Bar, A Moo Bar
Thomas Dalton wrote:
A [Foo] Bar = A Foo Bar, A Bar
A [Foo|Moo] Bar = A Foo Bar, A Moo Bar
A [Foo|Moo|] Bar = A Foo Bar, A Moo Bar, A Bar
A [[Foo|Moo]] Bar = same as previous, but discouraged.
That's the wrong way round. [Foo|Moo|] is, without any need to define new syntax: "Foo" or "Moo" or "" (the empty string, ie. nothing). Personally, I would put the extra pipe at the beginning, but either would work without any extra programming.
So:
A [Foo] Bar = A Foo Bar
A [|Foo] Bar = A Bar, A Foo Bar (the initial | could be considered implicit if only one option is given - there is no reason to put the word in brackets unless you intend it to be optional. Depends if the additional convenience is worth the additional confusion.)
A [Foo|Moo] Bar = A Foo Bar, A Moo Bar
A [|Foo|Moo] Bar = A Bar, A Foo Bar, A Moo Bar
What this proposal seems to ignore is the normal use of square brackets in texts. In a quotation like, "He [Bob] was found on the beach" the square brackets are used to add in the name to clarify the pronoun, even though it is not part of the actual quotation. The actual name would have occurred earlier in a part of the source text that is not useful to the quote. The proposal, when parsed, seems as though it would remove the brackets.
Ec
On 10/23/07, Ray Saintonge saintonge@telus.net wrote:
What this proposal seems to ignore is the normal use of square brackets in texts. In a quotation like, "He [Bob] was found on the beach" the square brackets are used to add in the name to clarify the pronoun, even though it is not part of the actual quotation. The actual name would have occurred earlier in a part of the source text that is not useful to the quote. The proposal, when parsed, seems as though it would remove the brackets.
Yes, this isn't natural language, this is a pattern matching construct. But let's not get bogged down on syntax until we determine whether such a thing is even feasible.
Steve
Hi, Steve. Looks like we're in broad agreement. A couple of minor things:
Steve Bennett wrote:
On 10/22/07, William Pietri william@scissor.com wrote:
True. Although we could do it with no database changes if we just go through and update the existing redirects on page save.
My first reaction is that that's kludgy. I guess the downsides are that you end up with actual redirect pages, and lots of them, inflating our page count, and possibly other bad things.
Yeah, I was going for the minimum change in suggesting that. It would also allow us to easily remove the feature, and reduces the need for immediately re-educating people and adjusting third-party tools that might deal with redirects. Ideally, though, later revisions would unify the two models.
If we were going for unconstrained rearchitecture, I've got plenty of notions, but none that I'd have time to code and support on my own nickel.
I think both should just refuse to save with error messages, sort of like the behavior now if you ask it to nag you about edit comments.
Well, it's unlike any behaviour - MediaWiki *never* refuses to save because of bad syntax.
Agreed, and that worries me some. Wikis are great for novices because errors are visible without error messages. Plus, the way Ward coded the original wiki, giving syntax error messages would have been drastically more work. (He didn't actually parse anything; it was just a bunch of regex magic. Having written an actual parser for wiki markup, it was probably ten times the effort for me.)
Here we're extending the user's power, so that the effect of their work is out of proportion to the effort. That's a classic opportunity for trouble. The effect is also almost invisible; you can see that a page is bad, but a possibly overlapping network of millions of aliases is hard to grasp. So I think the obvious choices are between noisy rejection (which is almost never done now) or silent failure (which would be painfully mysterious).
I guess we could also go for noisy failure, where an error message appears in the page itself. I think this happens when you don't substitute a template when you should, for example. And I think {{cite news}} does that if you leave the article title out. Perhaps that's the approach most consistent with the current model.
Taking the aliases to a separate UI would solve this, too, of course.
IMHO:
A [Foo] Bar = A Foo Bar, A Bar
A [Foo|Moo] Bar = A Foo Bar, A Moo Bar
A [Foo|Moo|] Bar = A Foo Bar, A Moo Bar, A Bar
A [[Foo|Moo]] Bar = same as previous, but discouraged.
It keeps the total syntax down to three elements: [], |, and an escaping mechanism, presumably a backslash (\). Fortunately [, ], | and \ are all extremely uncommon in (Wikipedia) page titles.
Perfect. People are already used to whitespace compression, so I think that is just fine.
Yeah, it would be a departure for sure. On the other Wiki-driven project I've been working on, which we coded from scratch, we've been adding more JavaScript UI for metadata and it has been a big hit, especially for things that have complicated structure.
Hit in what sense?
A hit in the sense that users now complain about other things. That's the most I think you can ask for. :-) Consider this page, for example:
http://www.sidereel.com/The_Office
We originally kept the related links in the wiki markup. But especially when there are hundreds of them, too many users had a hard time managing the links. Now they are stored outside the article text, and there is custom JavaScript UI for managing them. Participation is up, and grumbling is down. The only drawback is that wiki markup acts as a mild idiot filter, so I think vandalism went up some.
Now that we're adding things like user ratings and user reviews, we followed the same pattern. We feel like our experience there confirms that approach, at least for things that are metadata-ish.
William
On 10/23/07, William Pietri william@scissor.com wrote:
Yeah, I was going for the minimum change in suggesting that. It would also allow us to easily remove the feature, and reduces the need for immediately re-educating people and adjusting third-party tools that might deal with redirects. Ideally, though, later revisions would unify the two models.
I think I'm advocating having both redirects *and* aliases, but over time reducing the need for redirects to special cases. So there would be no immediate need to re-educate or modify third-party tools, as redirects would still exist, and the use of aliases would gradually creep in, as references did, for example.
If we were going for unconstrained rearchitecture, I've got plenty of
notions, but none that I'd have time to code and support on my own nickel.
That's really the biggest issue here, of course.
Agreed, and that worries me some. Wikis are great for novices because
errors are visible without error messages. Plus, the way Ward coded the original wiki, giving syntax error messages would have been drastically more work. (He didn't actually parse anything; it was just a bunch of regex magic. Having written an actual parser for wiki markup, it was probably ten times the effort for me.)
Ah, I didn't know that. Yeah, I've written "parsers" that way - very quick to get going, but quickly becomes diabolically complex to debug.
Here we're extending the user's power, so that the effect of their work
is out of proportion to the effort. That's a classic opportunity for
Yay for wikis.
trouble. The effect is also almost invisible; you can see that a page is
bad, but a possibly overlapping network of millions of aliases is hard to grasp. So I think the obvious choices are between noisy rejection (which is almost never done now) or silent failure (which would be painfully mysterious).
Can you give me a concrete example? What's the worst that can happen? Say you have two separate articles with two sets of aliases, but they happen to create an overlap. That's just a standard case of disambiguation, isn't it? A searching user just types "John Smith" and that matches 30 different aliases - so you get shown a page with those 30 different choices. Sounds like a good, useful behaviour, not a disaster?
Or are we still talking about the case of an overly broad aliases pattern, which I think we agreed should be rejected by the software?
I guess we could also go for noisy failure, where an error message
appears in the page itself. I think this happens when you don't substitute a template when you should, for example. And I think {{cite
That's actually the template itself producing that message, rather than the software.
news}} does that if you leave the article title out. Perhaps that's the
approach most consistent with the current model.
You're talking about Wikipedia. I'm really talking about MediaWiki. The only time I know that MediaWiki prints error messages midstream is when MathTeX fails (which is, of course, an extension). But there could be others.
Perfect. People are already used to whitespace compression, so I think
that is just fine.
I hadn't thought much about the whitespace issue. You would treat all whitespace as equivalent to a single space, except first or last? So I guess the behaviour of this:
"[Dr] John [Wilson] Smith"
is any of "John Smith", "Dr John Smith", "John Wilson Smith", "Dr John Wilson Smith" - ignoring the fact that you technically have "<space>John<space><space>Smith".
That seems good.
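The whitespace rule being agreed on here is simple enough to state as a one-liner (a minimal illustration, not actual MediaWiki code):

```python
def normalize(alias):
    """Treat any run of whitespace as a single space, and trim the ends."""
    return ' '.join(alias.split())

# Dropping the '[Dr]' and '[Wilson]' blocks from "[Dr] John [Wilson] Smith"
# leaves "<space>John<space><space>Smith" behind, which normalizes cleanly.
```

So an alias generated by omitting optional blocks compares equal to one typed by hand.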
Yeah, it would be a departure for sure. On the other Wiki-driven project
I've been working on, which we coded from scratch, we've been adding more JavaScript UI for metadata and it has been a big hit, especially for things that have complicated structure.
Hit in what sense?
A hit in the sense that users now complain about other things. That's
Oh! Heh, totally missed that meaning. Was thinking "database hit", "performance hit"...:)
We originally kept the related links in the wiki markup. But especially when there are hundreds of them, too many users had a hard time managing the links. Now they are stored outside the article text, and there is custom JavaScript UI for managing them. Participation is up, and grumbling is down. The only drawback is that wiki markup acts as a mild
Wikipedia could do with a lot more of this.
idiot filter, so I think vandalism went up some.
That's ok. Vandalism is easy.
Steve
I think you've pretty much got a solution, so a couple more minor followups.
Steve Bennett wrote:
Here we're extending the user's power, so that the effect of their work is out of proportion to the effort. That's a classic opportunity for trouble.
Yay for wikis.
Not in the sense I'm talking about here. Most Wikipedia edits have an effect proportional to the effort required. If I put a few naughty words in a page, I've disturbed one page out of millions. Because cleaning up is generally even easier than vandalizing, the balance of power stays on the side of the angels.
But when you amplify someone's power, you create opportunities for trouble. A guy with a backhoe can do a lot more damage than a guy with a shovel, often without even knowing it.
The effect is also almost invisible; you can see that a page is
bad, but a possibly overlapping network of millions of aliases is hard to grasp. So I think the obvious choices are between noisy rejection (which is almost never done now) or silent failure (which would be painfully mysterious).
Can you give me a concrete example? What's the worst that can happen? Say you have two separate articles with two sets of aliases, but they happen to create an overlap. That's just a standard case of disambiguation, isn't it? A searching user just types "John Smith" and that matches 30 different aliases - so you get shown a page with those 30 different choices. Sounds like a good, useful behaviour, not a disaster?
Well, I think you should let the alias creator know when they collide with existing articles, as touching the articles would be wrong, and silent, invisible failure is not so good either. I think you could auto-disambiguate when aliases cross, but you'd probably want a hint to write the needed text. You should also tell people when they encounter the technical limitations of Wikipedia (by violating WP:NCTR).
Or are we still talking about the case of an overly broad aliases pattern, which I think we agreed should be rejected by the software?
That's definitely another case. I think our options for all of these are
1. silent failure (accept #ALIASES line but do nothing or do it only partially)
2. noisy rejection (refuse to save, give error message)
3. noisy failure (accept save, but put an error message in the page)
4. special tool (JavaScript widget that gives continuous feedback)
And I can make cases for and against any of them. I'd think what really matters is what the core MediaWiki team prefers, but I'd lean towards #3.
Overall, though, it sounds like a great feature. I'd say go for it!
William
On 10/24/07, William Pietri william@scissor.com wrote:
Not in the sense I'm talking about here. Most Wikipedia edits have an effect proportional to the effort required. If I put a few naughty words in a page, I've disturbed one page out of millions. Because cleaning up is generally even easier than vandalizing, the balance of power stays on the side of the angels.
But when you amplify someone's power, you create opportunities for trouble. A guy with a backhoe can do a lot more damage than a guy with a shovel, often without even knowing it.
Agreed. I think it would be best to limit the number of possible aliases for a page, probably somewhere between 10 and 20.
Well, I think you should let the alias creator know when they collide with
existing articles, as touching the articles would be wrong, and silent, invisible failure is not so good either. I think you could auto-disambiguate when aliases cross, but you'd probably want a hint to write the needed text. You should also tell people when they encounter the technical limitations of Wikipedia (by violating WP:NCTR).
I'm not sure how much I stated before, but here was my assumption in terms of searching:
- Search term matches no real pages, no aliases: takes you to some search results.
- Search term matches one real page, no aliases: takes you to real page.
- Search term matches one real page, some aliases: takes you to real page. (Arguably gives you a "did you mean...?" banner, but not critical)
- Search term matches one alias, no real page: takes you to page.
- Search term matches several aliases, no real page: takes you to auto disambig page. Some other keyword could be used to specify the text that would appear here (and would be useful for other contexts too).
It suddenly occurs to me that you could actually do without the auto disambig page, as follows:
- Search term matches several aliases, no real page: shows you search results (as if you had matched nothing), but the top few pages are those matched by the aliases. The formatting could be tweaked to ensure the text comes from the first paragraph. That way it would almost function like a disambig page anyway.
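The dispatch logic behind those cases can be sketched as follows. The `alias_map` structure (alias string to set of target titles) is an assumption for illustration, not anything MediaWiki currently stores:

```python
def resolve(query, real_pages, alias_map):
    """Decide where a 'Go' query lands, per the cases listed above.

    real_pages: set of existing article titles
    alias_map: alias string -> set of target article titles (assumed structure)
    """
    targets = alias_map.get(query, set())
    if query in real_pages:
        return ('article', query)              # a real page always wins over aliases
    if len(targets) == 1:
        return ('article', next(iter(targets)))  # unique alias match
    if len(targets) > 1:
        # Auto-disambiguation: search results seeded with the alias targets.
        return ('search', sorted(targets))
    return ('search', [])                      # nothing matched; plain search
```

For a query matching several aliases and no real page, this returns the alias targets as the top search results, which is exactly the "do without the auto disambig page" behaviour.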
That's definitely another case. I think our options for all of these are
1. silent failure (accept #ALIASES line but do nothing or do it only partially)
That's not terrible, particularly if we can monitor these failures, and perhaps alert the user.
2. noisy rejection (refuse to save, give error message)
Very bad. What if they'd written 10 paragraphs plus the #ALIASES line, they're about to lose internet access, and they can't save? Bad.
3. noisy failure (accept save, but put an error message in the page)
Ok.
4. special tool (JavaScript widget that gives continuous feedback)
Yes, but obviously much more work.
And I can make cases for and against any of them. I'd think what really
matters is what the core MediaWiki team prefers, but I'd lean towards #3.
#3 and #1 seem equally ok to me.
Steve
Steve Bennett wrote:
Agreed. I think it would be best to limit the number of possible aliases for a page, probably somewhere between 10 and 20.
Do any of our resident data masters know what article has the most redirects?
I'd just take that number, round it up, and multiply it by 2-10. Somebody going a little crazy isn't a problem. It's just that we don't want somebody creating an alias for, say, all 3-8 letter words. :-)
I'm not sure how much I stated before, but here was my assumption in terms of searching: [snipped]
Nice. That's a lot more work than I was thinking, but it sounds like a better approach. When you say search, you're thinking direct URLs and the "Go" button in the search box, right? I presume real searches should still search.
That's definitely another case. I think our options for all of these are
1. silent failure (accept #ALIASES line but do nothing or do it only partially)
That's not terrible, particularly if we can monitor these failures, and perhaps alert the user.[...]
3. noisy failure (accept save, but put an error message in the page)
Ok.
[...]
#3 and #1 seem equally ok to me.
In that case, I'd strongly recommend #3. #1 is much harder to debug, and probably violates the [[principle of least astonishment]].
William
On 10/24/07, William Pietri william@scissor.com wrote:
I'd just take that number, round it up, and multiply it by 2-10. Somebody going a little crazy isn't a problem. It's just that we don't want somebody creating an alias for, say, all 3-8 letter words. :-)
Hmm, maybe the minimum alias length is more the issue then. A very short alias expansion is probably a mistake though. OTOH an article reachable by 200 aliases doesn't sound normal either.
Nice. That's a lot more work than I was thinking, but it sounds like a better approach. When you say search, you're thinking direct URLs and the "Go" button in the search box, right? I presume real searches should still search.
Yeah. Someday I'm hoping "go" and "search" will find harmony. Failing that, a verb for what happens when you type a query and press "go" would be good. I "goed" for it? Ugh.
Anyway, I've posted a proposal to Wikitech. We'll see.
Steve
William Pietri schreef:
Do any of our resident data masters know what article has the most redirects?
I'd just take that number, round it up, and multiply it by 2-10.
That would be 1800 to 9000 redirects per page.
The ten articles with the most redirects in the August database dump are, according to my count:
199 #REDIRECT [[Myrmica]]
200 #REDIRECT [[List of chess openings]]
212 #REDIRECT [[Enochian angels]]
221 #REDIRECT [[Main Page]]
230 #REDIRECT [[List of Hobbits]]
234 #REDIRECT [[List of Torres Strait Islands]]
281 #REDIRECT [[Consolidated Fund Act]]
287 #REDIRECT [[Appropriation Act]]
301 #REDIRECT [[Coalition military operations of the Iraq War]]
923 #REDIRECT [[Gospel of Matthew]]
Most of these are lists, with redirects from the individual entries. These wouldn't benefit much from the new syntax. More interesting are [[Benny Benassi]] (191 redirects) and [[Tsentralny District]] (167 redirects).
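A count like Eugene's can be made with a straightforward scan of redirect page texts. This is a rough sketch; the real dump format and MediaWiki's title normalization rules are considerably more involved:

```python
import re
from collections import Counter

# Matches '#REDIRECT [[Target]]', stopping at ']', '|', or a '#section' anchor.
REDIRECT_RE = re.compile(r'#REDIRECT\s*\[\[([^\]|#]+)', re.IGNORECASE)

def count_redirect_targets(page_texts):
    """Count redirects per target from an iterable of page wikitexts."""
    counts = Counter()
    for text in page_texts:
        m = REDIRECT_RE.match(text.strip())
        if m:
            # Minimal normalization: underscores to spaces, first letter upper.
            target = m.group(1).strip().replace('_', ' ')
            counts[target[:1].upper() + target[1:]] += 1
    return counts
```

Sorting the resulting Counter by value would reproduce a top-ten list like the one above.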
Eugene
On 10/24/07, Eugene van der Pijll eugene@vanderpijll.nl wrote:
230 #REDIRECT [[List of Hobbits]]
Most of these are lists, with redirects from the individual entries. These wouldn't benefit much from the new syntax. More interesting are [[Benny Benassi]] (191 redirects) and [[Tsentralny District]] (167 redirects).
This is an interesting question. At first glance, you would think "the redirects to List of Hobbits were made when someone tried to create a new article, Esmeralda Brandybuck and it got merged back into the list". No one would set out to create that many redirects, right?
But then, if it were easy to do, it would be pretty tempting to set up an alias for every entry on the list. Why not? Just as the tool that inspired this discussion looks for bolded text, it would be tempting to set up a template that did something like:
#ALIASES {{{1}}}
'''{{{1}}}'''
For each item in the list. So if someone searched for "Filibert Bolger", they would be guaranteed to end up on the most definitive page. Sure, there is no information on Filibert Bolger. But it *is* the most definitive page, and anyone wishing to add information about him/her should do it there, and not create some new [[Filibert Bolger]].
Steve
On 24/10/2007, Steve Bennett stevagewp@gmail.com wrote:
So if someone searched for "Filibert Bolger", they would be guaranteed to end up on the most definitive page. Sure, there is no information on Filibert Bolger. But it *is* the most definitive page, and anyone wishing to add information about hem/her should do it there, and not create some new [[Filibert Bolger]].
Redirects to lists for items not on lists is not uncommon - it usually means we've had the merger in the past and now that section's been wiped.
Incidentally, many many redirects to a person can easily occur with authors - quite often, as a makeshift, book titles get redirected to their writer's article. Take a prolific author, add the four or five permutations for a book title, those can rack up fast...
On 10/22/07, William Pietri william@scissor.com wrote:
Steve Bennett wrote:
#ALIASES [Dr] Grace [Smith|Jones]
A beginner user might simply write:
#ALIASES [Dr Grace Smith|Dr Grace Jones|Grace Smith|Grace Jones]
or even: #ALIASES Dr Grace Smith #ALIASES Dr Grace Jones #ALIASES Grace Smith #ALIASES Grace Jones
That's a great point.
Would we need a little more in the syntax to suggest whether blocks could be optional?
A UI tool would obviously help, but that would be a slight departure for MediaWiki. There's nothing else like that atm (afaik), so it's hard to picture how it would fit in exactly.
Yeah, it would be a departure for sure. On the other Wiki-driven project I've been working on, which we coded from scratch, we've been adding more JavaScript UI for metadata and it has been a big hit, especially for things that have complicated structure.
I agree with Mr. Dalton that the database load of scanning for "reverse redirects"/"aliases"/whatever would be outlandish.
A partial solution, to eliminate the need for a large percentage of existing and potential redirects, is quite simple. Make wikilinks case-insensitive by default. Except in cases where two titles exist with the same spelling (but different capitalization).
This way if somebody links to "[[least weasel]]" mid-sentence (because they don't realize it is our convention to capitalize the species name), the link would automatically point to [[Least Weasel]], whether a redirect existed or not. For lesser known species where no redirect exists yet, this would eliminate the duplicate effort associated with accidentally creating a redundant article. Icing on the cake? Have the link automatically capitalize itself when the page is saved, to ensure correct typography in article space.
—C.W.
On 10/23/07, Charlotte Webb charlottethewebb@gmail.com wrote:
I agree with Mr. Dalton that the database load of scanning for "reverse redirects"/"aliases"/whatever would be outlandish.
I don't think it's possible to say that without talking about a concrete implementation. I mentioned a possible data structure which allow a good tradeoff between size and speed.
A partial solution, to eliminate the need for a large percentage of
existing and potential redirects, is quite simple. Make wikilinks case-insensitive by default. Except in cases where two titles exist with the same spelling (but different capitalization).
I've raised a very similar issue before. Basically, these three use cases are very different:
- User types "least weasel" in the search box and presses "go" (MediaWiki attempts a couple of different capitalisations before giving up)
- User directly types /wiki/least%20weasel (MediaWiki converts %20 to space, attempts to match, fails, then takes you to a page letting you search)
- User follows a (red) link to [[least weasel]] - link actually goes directly to an edit page.
Having consistent behaviour for these three would be great.
This way if somebody links to "[[least weasel]]" mid-sentence (because
they don't realize it is our convention to capitalize the species name), the link would automatically point to [[Least Weasel]], whether
There are problems with this approach. Should the link be red or blue? What if the capitalisation is blatantly wrong? How should a link to [[OATS]] be treated? As a mis-capitalisation for "Oats", or as a non-existent acronym OATS?
It might be better to have the redlink return some search results by default, if they are highly pertinent (ie, extremely similar name), then show the "create a new page" screen.
a redirect existed or not. For lesser known species where no redirect
exists yet, this would eliminate the duplicate effort associated with accidentally creating a redundant article. Icing on the cake? Have the link automatically capitalize itself when the page is saved, to ensure correct typography in article space.
I really don't like the idea of respelling someone's edit without their knowledge. In my example above, they could really have meant to redlink to [[OATS]] and instead their wikitext is converted to [[Oats]] - very bad.
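The rule under discussion, case-insensitive fallback except where capitalization is ambiguous, can be sketched like this. This is an illustration of the proposal, not how MediaWiki actually resolves titles:

```python
def resolve_link(title, existing_titles):
    """Resolve a wikilink case-insensitively, unless capitalization is ambiguous.

    existing_titles: set of page titles that actually exist.
    """
    if title in existing_titles:
        return title                      # exact match always wins
    lowered = title.lower()
    matches = [t for t in existing_titles if t.lower() == lowered]
    if len(matches) == 1:
        return matches[0]                 # unambiguous case-insensitive match
    return None                           # red link: no match, or ambiguous
```

With pages "Least Weasel", "Oats" and "OATS" existing, a link to [[least weasel]] resolves, while [[oats]] stays red because it could mean either "Oats" or "OATS", exactly the ambiguity raised above.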
Steve
—C.W.
WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: http://lists.wikimedia.org/mailman/listinfo/wikien-l