search=steven+tyler gets Steven_tyler

jidanni＠jidanni.org

13 May 2011

12:21 p.m.

http://en.wikipedia.org/wiki/Special:Search?search=steven+tyler gets http://en.wikipedia.org/wiki/Steven_tyler . Wouldn't it be better to directly get http://en.wikipedia.org/wiki/Steven_Tyler , even though one could say they are the same page?

Jay Ashworth

13 May

12:25 p.m.

----- Original Message -----

...

From: jidanni(a)jidanni.org

...

They're not the same page. Wikipedia page titles are case sensitive -- except that the first character is forced to upper case by the engine. Does that search not return both? Why would we have both? Cheers, -- jra -- Jay R. Ashworth Baylink jra(a)baylink.com Designer The Things I Think RFC 2100 Ashworth & Associates http://baylink.pitas.com 2000 Land Rover DII St Petersburg FL USA http://photo.imageinc.us +1 727 647 1274

Carl (CBM)

12:34 p.m.

On Fri, May 13, 2011 at 12:25 AM, Jay Ashworth <jra(a)baylink.com> wrote:

...

They're not the same page. Wikipedia page titles are case sensitive -- except that the first character is forced to upper case by the engine. Does that search not return both? Why would we have both?

Like you said, the system is case sensitive. These redirects are created because the software doesn't handle case changes correctly otherwise. For example the following link leads to a "no such page" error because the appropriate redirect does not exist: http://en.wikipedia.org/wiki/Sterling_heights,_Michigan . It would be possible to code around this, so that the redirects would be simulated if they don't exist, but it hasn't happened. In practice, people like me like to type a title in all lower case, and so we have redirects to make it work. - Carl
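The behaviour Carl describes can be sketched in a few lines. This is a minimal illustration in Python, not MediaWiki's actual code: `normalize_title`, `resolve`, and the toy `pages` table are hypothetical names invented for the example, and the only rules modelled are the ones stated in this thread (first character forced to upper case, everything else case sensitive, redirects stored as ordinary pages).

```python
def normalize_title(raw):
    """Hypothetical 'first-letter' normalization: spaces become underscores
    and only the first character is upper-cased; the rest is left untouched."""
    key = raw.strip().replace(" ", "_")
    return key[:1].upper() + key[1:]


# Toy page table (an assumption for the example): a value of None marks a
# real article, a string marks a redirect to that title.
pages = {
    "Steven_Tyler": None,
    "Steven_tyler": "Steven_Tyler",      # redirect someone created by hand
    "Sterling_Heights,_Michigan": None,  # no lower-case redirect exists
}


def resolve(raw):
    """Return the article a request lands on, or None for 'no such page'."""
    key = normalize_title(raw)
    if key not in pages:
        return None
    # Follow a single redirect hop if the entry is a redirect.
    return pages[key] or key


print(resolve("steven tyler"))                # Steven_Tyler, via the redirect
print(resolve("sterling heights, Michigan"))  # None: no redirect to catch it
```

The second lookup fails for the same reason as the Sterling_heights link above: only the first letter is normalized, so every other casing needs its own redirect.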

M. Williamson

3:31 p.m.

I still don't think page titles should be case sensitive. Last time I asked how useful this really was, back in 2005 or so, I got a tersely-worded response that we need it to disambiguate certain pages. OK, but how many cases does that actually apply to? I would think that the increased usability from removing case sensitivity would far outweigh the benefit of natural disambiguation that only applies to a tiny minority of pages, and which could easily be replaced with disambiguation pages.

2011/5/12 Carl (CBM) <cbm.wikipedia(a)gmail.com>

...

On Fri, May 13, 2011 at 12:25 AM, Jay Ashworth <jra(a)baylink.com> wrote:

They're not the same page. Wikipedia page titles are case sensitive -- except that the first character is forced to upper case by the engine. Does that search not return both? Why would we have both?

Andrew Dunbar

3:32 p.m.

On 13 May 2011 14:34, Carl (CBM) <cbm.wikipedia(a)gmail.com> wrote:

...

On Fri, May 13, 2011 at 12:25 AM, Jay Ashworth <jra(a)baylink.com> wrote:

They're not the same page. Wikipedia page titles are case sensitive -- except that the first character is forced to upper case by the engine. Does that search not return both? Why would we have both?

Indeed on the English Wiktionary we do have some JavaScript which runs when on a page which would be a redlink. It checks all casing combinations of: all lowercase, all uppercase, first letter uppercase and the rest lowercase. If one of those exists it automatically redirects after a couple of seconds. With the different nature of Wikipedia titles you would probably want to check sentence case and title case but would still miss quite a few where only proper nouns within the title are capitalized. And some people would probably hate such a feature too (-: Andrew Dunbar (hippietrail)
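The check Andrew describes can be sketched as follows. The real script is JavaScript running on the English Wiktionary; this Python version only illustrates the idea, and `casing_candidates` and `find_existing` are hypothetical names, not taken from that script.

```python
def casing_candidates(title):
    """Return the three casings mentioned above, skipping the title itself
    and any duplicates: all lowercase, all uppercase, and first letter
    uppercase with the rest lowercase."""
    variants = [title.lower(), title.upper(), title.capitalize()]
    candidates = []
    for variant in variants:
        if variant != title and variant not in candidates:
            candidates.append(variant)
    return candidates


def find_existing(title, existing_titles):
    """Return the first candidate that exists as a page, or None."""
    for candidate in casing_candidates(title):
        if candidate in existing_titles:
            return candidate
    return None


print(find_existing("nato", {"NATO"}))                  # NATO
print(find_existing("steven tyler", {"Steven Tyler"}))  # None
```

The second call shows the limitation noted above: a title where only some inner words are capitalized matches none of the three simple casings.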

...

- Carl _______________________________________________ Wikitech-l mailing list Wikitech-l(a)lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Jay Ashworth

3:39 p.m.

----- Original Message -----

...

From: "Carl (CBM)" <cbm.wikipedia(a)gmail.com>

...

On Fri, May 13, 2011 at 12:25 AM, Jay Ashworth <jra(a)baylink.com> wrote:

They're not the same page. Wikipedia page titles are case sensitive -- except that the first character is forced to upper case by the engine. Does that search not return both? Why would we have both?

Ah; one is a redirect. Sorry; I hadn't actually looked.

...

It would be possible to code around this, so that the redirects would be simulated if they don't exist, but it hasn't happened. In practice, people like me like to type a title in all lower case, and so we have redirects to make it work.

I seem to remember having been on the wrong end of a protracted version of that exact argument, a couple years ago on this very mailing list. :-) No, on reflection, it was actually "why aren't redirects handled at HTTP level". Let's not open that one again, though. Cheers, -- jra

Andrew Dunbar

3:42 p.m.

On 13 May 2011 17:31, M. Williamson <node.ue(a)gmail.com> wrote:

...

There has been talk from time to time over the years of adding full "case folding", whereby page titles preserve the case of each letter as entered but ignore it for internal operations, a lot like the filesystem on Microsoft Windows. It would be a third setup option in MediaWiki alongside "case-sensitive" and "first-letter". But there's never been enough interest, it's never been important enough, and no developer has ever stepped up. It would take a bit of work to implement. Andrew Dunbar (hippietrail)

...

2011/5/12 Carl (CBM) <cbm.wikipedia(a)gmail.com>

On Fri, May 13, 2011 at 12:25 AM, Jay Ashworth <jra(a)baylink.com> wrote:

They're not the same page. Wikipedia page titles are case sensitive -- except that the first character is forced to upper case by the engine. Does that search not return both? Why would we have both?

Jay Ashworth

3:42 p.m.

----- Original Message -----

...

From: "Andrew Dunbar" <hippytrail(a)gmail.com>

...

On 13 May 2011 14:34, Carl (CBM) <cbm.wikipedia(a)gmail.com> wrote:

Repeat after me: "Not all Mediawikiae are Wikipedia; Wikipedia is merely *the most important* customer of the project, not the only one". No, it would *not* be good to make the base package page-title-case-folding. Cheers, -- jra

Daniel Friesen

3:59 p.m.

On 11-05-13 12:42 AM, Andrew Dunbar wrote:

...

On 13 May 2011 17:31, M. Williamson <node.ue(a)gmail.com> wrote:

http://svn.wikimedia.org/viewvc/mediawiki/branches/titlerewrite/phase3/ Never finished it... but this topic is why I got commit access in the first place. -- ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

David Gerard

4:03 p.m.

On 13 May 2011 08:42, Jay Ashworth <jra(a)baylink.com> wrote:

...

While your first sentence is the case, this says nothing about these other users. What is your actual basis for asserting that case-insensitivity is worse than case-sensitivity? What is your data? Where did your statement come from? Every MediaWiki I've ever had cause to set up would have benefited spectacularly from case-insensitivity, for example. - d.

Jay Ashworth

4:09 p.m.

----- Original Message -----

...

From: "David Gerard" <dgerard(a)gmail.com>

...

On 13 May 2011 08:42, Jay Ashworth <jra(a)baylink.com> wrote:

25 years experience in software design. Assuming case-folding is acceptable is imposing your assumptions about other people's use cases upon the software, and hence them. Not doing so, well, isn't.

...

Every MediaWiki I've ever had cause to set up would have benefited spectacularly from case-insensitivity, for example.

And having it available as an option's a great idea. But forcing it *absolutely* rules out some potential use-cases, which is rarely a good idea in software design; especially in the case of *tools* design, which is what Mediawiki is: it's a tool as much as it's an application. Ask someone who's tried to implement a parser that will deal with all of Wikipedia's templates if you don't believe me. :-) Cheers, -- jra

Paul Houle

9:06 p.m.

On 5/13/2011 3:31 AM, M. Williamson wrote:

...

Last time I looked there were about 10,000 pairs of pages in Wikipedia where two different pages had names that differ only by case. One example is http://en.wikipedia.org/wiki/Direct_Instruction and http://en.wikipedia.org/wiki/Direct_instruction . This is a particularly odd example because the two kinds of "direct instruction" are two concepts that are very close but not the same. Some people might argue for merging the two. You can find plenty of other types too, for instance, an all-caps acronym vs. an ordinary word.
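A count like the one Paul mentions could be reproduced from a list of page titles by grouping on the case-folded form. This is a minimal sketch assuming the titles are already available as an in-memory list; `case_collisions` is a hypothetical helper, and the three sample titles are simply ones that appear in this thread.

```python
from collections import defaultdict


def case_collisions(titles):
    """Group titles by their case-folded form and return every group that
    contains more than one distinct title."""
    groups = defaultdict(list)
    for title in titles:
        groups[title.casefold()].append(title)
    return [sorted(group) for group in groups.values() if len(group) > 1]


sample = ["Direct_Instruction", "Direct_instruction", "Steven_Tyler"]
print(case_collisions(sample))
# [['Direct_Instruction', 'Direct_instruction']]
```

On a real dump, redirects would need to be filtered out first; otherwise pairs such as Steven_tyler and Steven_Tyler would be counted as collisions too.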

Aryeh Gregor

11:48 p.m.

On Fri, May 13, 2011 at 3:31 AM, M. Williamson <node.ue(a)gmail.com> wrote:

...

From a software perspective, the way to do this would be to store a canonicalized version of each page's title, and require that to be unique instead of the title itself. This would be nice because we could allow underscores in page titles, for instance, in addition to being able to do case-folding.

Note that Unicode capitalization is locale-dependent, but case-folding is not. Thus we could use the same case-folding on all projects, including international projects like Commons. There's only one exception -- Turkish, with its dotless and dotted i's. But that's minor enough that we should be able to work around it without too much pain.

Some projects, like probably all Wiktionaries, would doubtless not want case-folding at all, so we should support different canonicalization algorithms. Even the ones that don't want case-folding could still benefit from allowing underscores in titles.

But all this would require a very intrusive rewrite. Assumptions like "replace spaces by underscores to get dbkey" are hardwired into MediaWiki all over the place, unfortunately. It's not clear that it's worth it, since there are downsides to case-folding too. It might make more sense to auto-generate redirects instead, which would be a much easier project that wouldn't have the downsides.
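The canonical-key idea can be sketched as a small index with a pluggable canonicalization function. Everything here is hypothetical (the `TitleIndex` class and the `canon_spaces` and `canon_casefold` functions are invented for illustration); it is not how MediaWiki stores titles, only a sketch of "require the canonical form to be unique, keep the display title as typed".

```python
def canon_spaces(title):
    """Treat underscores and runs of whitespace as a single space; keep case."""
    return " ".join(title.replace("_", " ").split())


def canon_casefold(title):
    """Same as canon_spaces, plus Unicode case folding."""
    return canon_spaces(title).casefold()


class TitleIndex:
    """Stores display titles, enforcing uniqueness on a canonical key."""

    def __init__(self, canonicalize):
        self.canonicalize = canonicalize  # per-wiki choice of algorithm
        self._by_key = {}

    def create(self, display_title):
        key = self.canonicalize(display_title)
        if key in self._by_key:
            raise ValueError(f"{display_title!r} collides with {self._by_key[key]!r}")
        self._by_key[key] = display_title  # display form kept exactly as typed
        return key

    def find(self, query):
        return self._by_key.get(self.canonicalize(query))


# A wiki that wants case folding:
folded = TitleIndex(canon_casefold)
folded.create("Steven Tyler")
print(folded.find("steven_tyler"))  # Steven Tyler

# A Wiktionary-style wiki that keeps case distinctions:
exact = TitleIndex(canon_spaces)
exact.create("Polish")
exact.create("polish")              # allowed: the keys differ
exact.create("snake_case")          # underscore preserved in the display title
print(exact.find("snake case"))     # snake_case
```

Because the display title is stored separately from the key, a title can contain a literal underscore, which is the side benefit mentioned above.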

Daniel Friesen

14 May

6:58 a.m.

On 11-05-13 08:48 AM, Aryeh Gregor wrote:

...

On Fri, May 13, 2011 at 3:31 AM, M. Williamson <node.ue(a)gmail.com> wrote:

From a software perspective, the way to do this would be to store a canonicalized version of each page's title, and require that to be unique instead of the title itself. This would be nice because we could allow underscores in page titles, for instance, in addition to being able to do case-folding. Note that Unicode capitalization is locale-dependent, but case-folding is not. Thus we could use the same case-folding on all projects, including international projects like Commons. There's only one exception -- Turkish, with its dotless and dotted i's. But that's minor enough that we should be able to work around it without too much pain. Some projects, like probably all Wiktionaries, would doubtless not want case-folding at all, so we should support different canonicalization algorithms. Even the ones that don't want case-folding could still benefit from allowing underscores in titles. But all this would require a very intrusive rewrite. Assumptions like "replace spaces by underscores to get dbkey" are hardwired into MediaWiki all over the place, unfortunately. It's not clear that it's worth it, since there are downsides to case-folding too. It might make more sense to auto-generate redirects instead, which would be a much easier project that wouldn't have the downsides.

Fortunately, I think most of the space/underscore switching done by code is actually isolated to a subset of Title and perhaps a few other core classes (probably ones like User and the filerepo stuff); most code should be using the Title interface. When I tried that first rewrite I had more of an issue with the wide use of $user->getName() to test if two user objects were the same. -- ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

Happy-melon

7:27 a.m.

"Daniel Friesen" <lists(a)nadir-seen-fire.com> wrote in message news:4DCDB7AF.7020909@nadir-seen-fire.com...

...

http://toolserver.org/~krinkle/wikimedia-svn-search/do-live.php?path=%2Ftru… Sure, keep telling yourself that... :-P --HM

Daniel Friesen

7:57 a.m.

On 11-05-13 04:27 PM, Happy-melon wrote:

...

"Daniel Friesen" <lists(a)nadir-seen-fire.com> wrote in message news:4DCDB7AF.7020909@nadir-seen-fire.com...

http://toolserver.org/~krinkle/wikimedia-svn-search/do-live.php?path=%2Ftru… Sure, keep telling yourself that... :-P --HM

Doesn't look that bad...
- Some arcane maintenance scripts.
- Some .js that can't interact with Title working with urls.
- The expected User, Title, Parser, file related, etc... core api stuff that's easy to tweak.
- Some hardcoded stuff for namespaces which could be improved, but actually isn't all that applicable to what we're trying to fix.
- Some special pages cleaning up inputs where we might want to provide something inside Title for that.

And plenty of '_' matches that aren't actually relevant, things like the Database doing SQL Like escapes, Xml stuff replacing spaces with _ in tag names, string concatenation of '_', the installer doing some Title like replacements to clean up the sitename but not doing anything that would affect our working with page related stuff.

-- ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

Andrew Dunbar

11:33 a.m.

On 14 May 2011 01:48, Aryeh Gregor <Simetrical+wikilist(a)gmail.com> wrote:

...

On Fri, May 13, 2011 at 3:31 AM, M. Williamson <node.ue(a)gmail.com> wrote:

I'm almost positive Azeri has the same dotless i issue and perhaps some of the other Turkic languages of Central Asia. One solution is to do accent/diacritic normalization too as part of the canonicalization. Andrew Dunbar (hippietrail)

...

Some projects, like probably all Wiktionaries, would doubtless not want case-folding at all, so we should support different canonicalization algorithms. Even the ones that don't want case-folding could still benefit from allowing underscores in titles. But all this would require a very intrusive rewrite. Assumptions like "replace spaces by underscores to get dbkey" are hardwired into MediaWiki all over the place, unfortunately. It's not clear that it's worth it, since there are downsides to case-folding too. It might make more sense to auto-generate redirects instead, which would be a much easier project that wouldn't have the downsides.

David Gerard

4 p.m.

On 14 May 2011 04:33, Andrew Dunbar <hippytrail(a)gmail.com> wrote:

...

On 14 May 2011 01:48, Aryeh Gregor <Simetrical+wikilist(a)gmail.com> wrote:
> On Fri, May 13, 2011 at 3:31 AM, M. Williamson <node.ue(a)gmail.com> wrote:

...

>> I still don't think page titles should be case sensitive. Last time I asked how useful this really was, back in 2005 or so, I got a tersely-worded response that we need it to disambiguate certain pages. OK, but how many cases does that actually apply to? I would think that the increased usability from removing case sensitivity would far outweigh the benefit of natural disambiguation that only applies to a tiny minority of pages, and which could easily be replaced with disambiguation pages.

...

> From a software perspective, the way to do this would be to store a canonicalized version of each page's title, and require that to be unique instead of the title itself. This would be nice because we could allow underscores in page titles, for instance, in addition to being able to do case-folding.
>
> Note that Unicode capitalization is locale-dependent, but case-folding is not. Thus we could use the same case-folding on all projects, including international projects like Commons. There's only one exception -- Turkish, with its dotless and dotted i's. But that's minor enough that we should be able to work around it without too much pain.

...

This is getting into "nirvana fallacy" territory - we can't have case-folding until every edge case works? Instead, I would ask first: What does it take in English? Then work out from there. - d.

Andrew Garrett

4:25 p.m.

On Sat, May 14, 2011 at 6:00 PM, David Gerard <dgerard(a)gmail.com> wrote:

...

This is getting into "nirvana fallacy" territory - we can't have case-folding until every edge case works? Instead, I would ask first: What does it take in English? Then work out from there.

Nobody's saying it can't be done. I get the feeling we need to feel out what issues might come up, and plan for them. I think there was a Title class rewrite with pluggable canonicalisation of page titles somewhere, perhaps by Dan Friesen? It would probably need to be reimplemented from scratch though, I think I last heard of it a couple of years ago. -- Andrew Garrett Wikimedia Foundation agarrett(a)wikimedia.org

Niklas Laxström

4:46 p.m.

On 14 May 2011 06:33, Andrew Dunbar <hippytrail(a)gmail.com> wrote:

...

It's a good thing to think about these beforehand. But we already do enough mindless killing of diacritics. It doesn't work across all languages. In Finnish saa and sää are different words and ä is not a letter "a" with something added to it. -Niklas -- Niklas Laxström
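Niklas's objection is easy to demonstrate. The sketch below uses one common way of stripping diacritics (decompose to NFD, then drop combining marks); `strip_diacritics` is a hypothetical helper name, and this is not a claim about what any MediaWiki component actually does.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose to NFD and drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Two different Finnish words end up with the same key.
print(strip_diacritics("sää") == strip_diacritics("saa"))  # True
```

If a key like this were required to be unique, one of the two words could not have a page of its own.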

David Gerard

4:52 p.m.

On 13 May 2011 09:09, Jay Ashworth <jra(a)baylink.com> wrote:

...

Ah, OK, we agree :-) I wouldn't make it the *default* unless and until it proved to be massively the actual desired use case. I think it'd be a great feature and I'd really like it for real-world use (intranet wikis in particular - it's amazing how many people who use a computer all day every day get away with claiming to be unable to use them). - d.

jidanni＠jidanni.org

6:33 p.m.

OK, then why can't http://en.wikipedia.org/wiki/Steven_tyler just do a browser redirect to http://en.wikipedia.org/wiki/Steven_Tyler ? The URL you leave in the user's browser location bar is the one he will tell all his friends to use. You can't expect him to be smart enough to dig the canonical one we would rather he use out of all those links on the page, even if it is "right there linked to 'Article'".

$ lynx -dump -listonly http://en.wikipedia.org/wiki/Steven_tyler |grep http://en.wikipedia.org/wiki/Steven_Tyler
737. http://en.wikipedia.org/wiki/Steven_Tyler
771. http://en.wikipedia.org/wiki/Steven_Tyler
774. http://en.wikipedia.org/wiki/Steven_Tyler

K. Peachey

6:37 p.m.

On Sat, May 14, 2011 at 8:33 PM, <jidanni(a)jidanni.org> wrote:

...

OK, then why can't http://en.wikipedia.org/wiki/Steven_tyler just do a browser redirect to http://en.wikipedia.org/wiki/Steven_Tyler

Because then we can't show the "(Redirected from X)" bar that accompanies the redirects

Andrew Dunbar

6:53 p.m.

On 14 May 2011 20:37, K. Peachey <p858snake(a)gmail.com> wrote:

...

On Sat, May 14, 2011 at 8:33 PM, <jidanni(a)jidanni.org> wrote:

OK, then why can't http://en.wikipedia.org/wiki/Steven_tyler just do a browser redirect to http://en.wikipedia.org/wiki/Steven_Tyler

Because then we can't show the "(Redirected from X)" bar that accompanies the redirects

The JavaScript we use on the English Wiktionary also makes a slightly different "(Automatically redirected from X)" bar, or something very similar. Andrew Dunbar (hippietrail)

...

Jay Ashworth

11:42 p.m.

----- Original Message -----

...

From: "David Gerard" <dgerard(a)gmail.com>

...

This is getting into "nirvana fallacy" territory - we can't have case-folding until every edge case works? Instead, I would ask first: What does it take in English? Then work out from there.

You appear to be suggesting, David, that giving thought to the problem space before working up an architectural design for the solution is a *bad* thing. Especially when putting fundamentally changing code into a giant, widely used, architecture like Mediawiki. I'm sure you can't mean that. Cheers, -- jra

Jay Ashworth

11:44 p.m.

----- Original Message -----

...

From: "David Gerard" <dgerard(a)gmail.com>

...

On 13 May 2011 09:09, Jay Ashworth <jra(a)baylink.com> wrote:

Ah, OK, we agree :-)

Wait'll you see my next reply. :-)

...

I wouldn't make it the *default* unless and until it proved to be massively the actual desired use case. I think it'd be a great feature and I'd really like it for real-world use (intranet wikis in particular - it's amazing how many people who use a computer all day every day get away with claiming to be unable to use them).

Who the hell would be typing internal URLs into an Intranet wiki? Cheers, -- jra

Jay Ashworth

11:45 p.m.

----- Original Message -----

...

From: "K. Peachey" <p858snake(a)gmail.com>

...

On Sat, May 14, 2011 at 8:33 PM, <jidanni(a)jidanni.org> wrote:
> OK, then why can't http://en.wikipedia.org/wiki/Steven_tyler just do a browser redirect to http://en.wikipedia.org/wiki/Steven_Tyler

...

Because then we can't show the "(Redirected from X)" bar that accompanies the redirects

I *did* say we shouldn't restart this argument, right? :-) Cheers, -- jra

Aryeh Gregor

15 May

11:02 p.m.

On Fri, May 13, 2011 at 7:57 PM, Daniel Friesen <lists(a)nadir-seen-fire.com> wrote:

...

Except that there are who knows how many other places in the code that make such assumptions but aren't so easily found by searching.

On Fri, May 13, 2011 at 11:33 PM, Andrew Dunbar <hippytrail(a)gmail.com> wrote:

...

The dotless-i issue affects "Turkic (Turkish/Azerbaijani)" text, according to <http://userguide.icu-project.org/transforms/casemappings>. This is a well-studied issue with existing standards, and we're not going to do better than the Unicode Consortium has come up with. You cannot fix the problem by doing accent/diacritic normalization. "i" and "I" are the same letter in English but different letters in Turkish. You cannot get around that. We'd need to have a separate case-folding algorithm for Turkish wikis, or make them use one that's incorrect for their language.
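The difference between default and Turkic case folding can be shown with Python's built-in `str.casefold()`, which applies Unicode's default (non-Turkic) folding. The `fold_turkic` function below is a hypothetical sketch: it applies only the two Turkic-specific mappings (I to dotless ı, and İ to i) before the default folding, which is an assumption about how a per-wiki algorithm might be layered, not an existing MediaWiki or ICU API.

```python
# The two mappings that differ for Turkish and Azerbaijani text.
TURKIC_MAP = {
    ord("I"): "\u0131",  # LATIN CAPITAL LETTER I -> dotless i
    ord("\u0130"): "i",  # CAPITAL I WITH DOT ABOVE -> i
}


def fold_default(title):
    """Default Unicode case folding, as used for every other language."""
    return title.casefold()


def fold_turkic(title):
    """Apply the Turkic-specific mappings first, then default folding."""
    return title.translate(TURKIC_MAP).casefold()


# Under default folding, "I" and "i" are the same letter:
print(fold_default("Isparta") == fold_default("isparta"))      # True
# Under Turkic folding they are different letters:
print(fold_turkic("Isparta") == fold_turkic("isparta"))        # False
print(fold_turkic("Isparta") == fold_turkic("\u0131sparta"))   # True
print(fold_turkic("\u0130stanbul") == fold_turkic("istanbul")) # True
```

The two functions disagree on whether "Isparta" and "isparta" are the same title, so no single algorithm can be correct for both an English and a Turkish wiki.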

Marco Schuster

16 May

8:11 a.m.

On Sun, May 15, 2011 at 5:02 PM, Aryeh Gregor <Simetrical+wikilist(a)gmail.com> wrote:

...

You cannot fix the problem by doing accent/diacritic normalization. "i" and "I" are the same letter in English but different letters in Turkish. You cannot get around that. We'd need to have a separate case-folding algorithm for Turkish wikis, or make them use one that's incorrect for their language.

Actually non-Turkish/Azerbaijan wikis have this problem too, if the wiki has articles or redirects using these characters... Marco

wikitech-l@lists.wikimedia.org

participants (14)

Andrew Dunbar
Andrew Garrett
Aryeh Gregor
Carl (CBM)
Daniel Friesen
David Gerard
Happy-melon
Jay Ashworth
jidanni＠jidanni.org
K. Peachey
M. Williamson
Marco Schuster
Niklas Laxström
Paul Houle