On Mon, Aug 25, 2008 at 2:50 AM, dantman@svn.wikimedia.org wrote:
Revision: 39938 Author: dantman Date: 2008-08-25 06:50:31 +0000 (Mon, 25 Aug 2008)
Log Message:
Revert 39936 and 39935; This 'fix' is merely a bad workaround and creates more issues rather than simply fixing. A) Part of the Title class is being /duplicated/ meaning more bugs are going to show up when someone improves stuff inside Title and doesn't know stuff is duplicated here. B) This change breaks cases as $wgCapitalLinks is now a per-namespace array, not a boolean.
No it isn't, that patch hasn't been committed. The correct function call (once it's committed) will be MWNamespace::isCapitalized( $index ).
FWIW: running a title through Title::newFromText() eventually runs it through $title->secureAndSplit(), which does all kinds of fun normalization (including forced first-letter capitalization). Personally, I'd rather see _more_ of the logic for first-letter-case-sensitivity go there, rather than have $wgCapitalLinks floating around everywhere.
As to the original bug: I say "invalid" myself. Trailing spaces and leading spaces are _always_ trimmed. Thoughts on this?
-Chad
The issue was on prefixes... "Test_" was showing things like "TestMan".
I noted a way to fix that on the bug's page. Append a single character to the title, and then strip it off once you have the key. Using '.' in this case.
substr( ...titleToKey($text.'.'), 0, -1 );
Because the . is appended to the title, "test_" will be normalized to the db key "Test_." and then we strip off the . and end up with "Test_". It's basically a placeholder character saying "Hey, I'm sitting here representing the rest of the title... don't strip what's beside me!", then we get rid of it when done.
Unless you want to start making complex logic for TitlePrefixes that'll cause me to raise hell around here when I get back to the TitleRewrite project and start complaining that it has become nearly impossible to tweak the normalization process.
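A minimal sketch of the placeholder trick described above, written in Python rather than MediaWiki's PHP. The `title_to_key` function here is a hypothetical stand-in that only collapses whitespace/underscores, trims both ends, and capitalizes the first letter; it is an assumption for illustration and not the real titleToKey().

```python
import re

def title_to_key(text):
    """Hypothetical stand-in for titleToKey(): NOT MediaWiki's real logic."""
    # Collapse runs of whitespace/underscores, then trim both ends.
    key = re.sub(r"[\s_]+", "_", text).strip("_")
    # Force first-letter capitalization.
    return key[:1].upper() + key[1:]

def prefix_to_key(prefix):
    """Normalize a title prefix without losing its trailing underscore."""
    # The '.' stands in for "the rest of the title", so the trim step
    # never sees the underscore as trailing; drop the '.' afterwards.
    return title_to_key(prefix + ".")[:-1]
```

With these assumed rules, `title_to_key("test_")` yields "Test" (the trailing underscore is lost), while `prefix_to_key("test_")` yields "Test_".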
Hmmm... on that note, perhaps rather than my array list of functions, I should create a list-based system for normalization... Then it can actually be output through the API in a format that other languages can make use of. (Though there's nothing wrong with creating an API module to normalize a list of titles.)
(text replace /[\s_]+/ with " ") (text rtrim) (dbkey setto text) (dbkey ltrim) (dbkey replace " " with "_")
^_^ Ok, ok... different syntax and overall idea... I just like to draft in lisp inspired syntaxes...
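One way the list-based idea might look, sketched in Python under the assumption that each step is plain data (target field, operation name, arguments) so the same list could be serialized for other languages. The operation names and the `apply_steps` helper are hypothetical; the five steps mirror the draft above.

```python
import re

# Each step is plain data: (target field, operation, *arguments).
STEPS = [
    ("text", "replace_regex", r"[\s_]+", " "),
    ("text", "rtrim"),
    ("dbkey", "setto", "text"),
    ("dbkey", "ltrim"),
    ("dbkey", "replace", " ", "_"),
]

def apply_steps(raw, steps=STEPS):
    """Run the normalization steps in order over a small state dict."""
    state = {"text": raw, "dbkey": ""}
    for target, op, *args in steps:
        value = state[target]
        if op == "replace_regex":
            value = re.sub(args[0], args[1], value)
        elif op == "replace":
            value = value.replace(args[0], args[1])
        elif op == "rtrim":
            value = value.rstrip()
        elif op == "ltrim":
            value = value.lstrip()
        elif op == "setto":
            value = state[args[0]]  # copy another field's current value
        else:
            raise ValueError(f"unknown operation: {op}")
        state[target] = value
    return state
```

Because the interpreter only knows a handful of generic operations, tweaking the normalization means editing the list, not the code that runs it.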
~Daniel Friesen(Dantman, Nadir-Seen-Fire) of: -The Nadir-Point Group (http://nadir-point.com) --It's Wiki-Tools subgroup (http://wiki-tools.com) --The ElectronicMe project (http://electronic-me.org) --Games-G.P.S. (http://ggps.org) -And Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG) --Animepedia (http://anime.wikia.com) --Narutopedia (http://naruto.wikia.com)
That feels a little icky to me. :)
What I might recommend is having a couple of steps to the normalization:
1) Normalization of partial titles
...may end with / or whitespace or otherwise not be quite 100% a valid title... for use in normalizing things to go into searches, prefix searches, etc.
2) Complete title normalization
Finish that off with right-side trims, enforce length limits, etc.
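The two steps above could be sketched roughly as follows, in Python. The function names, the exact normalization rules, and the 255-byte limit are assumptions for illustration only; nothing here is taken from the Title class.

```python
import re

MAX_TITLE_BYTES = 255  # assumed length limit, for illustration only

def normalize_partial(text):
    """Step 1: normalize a possibly incomplete title (e.g. a search prefix)."""
    # Collapse whitespace/underscore runs and trim the left side only,
    # so a trailing "_" or "/" survives for prefix matching.
    key = re.sub(r"[\s_]+", "_", text).lstrip("_")
    return key[:1].upper() + key[1:]

def normalize_complete(text):
    """Step 2: finish the job for a full title."""
    # Right-side trim, then enforce validity and length limits.
    key = normalize_partial(text).rstrip("_")
    if not key:
        raise ValueError("empty title")
    if len(key.encode("utf-8")) > MAX_TITLE_BYTES:
        raise ValueError("title too long")
    return key
```

Under these assumptions a prefix search would call only `normalize_partial`, while page lookups would go through `normalize_complete`, which builds on the first step instead of duplicating it.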
-- brion
Of course Brion's solution is the cleanest one and the best one in the long term, but until someone has done that split I'm just gonna use the hack Daniel suggested (although I'll put it *inside* the titleToKey() function, not in the call).
Roan Kattouw (Catrope)
Why inside of the function? That goes back to the whole issue of titles not being normalized right. Someone asking for "...&title=foobar &..." is going to get the dbkey "Foobar_" which won't be valid for the page.
~Daniel Friesen (Dantman, Nadir-Seen-Fire)
Yeah, I thought of that later too. Still, it's probably a good idea to have a separate function (like titlePartToKey() and keyToTitlePart() or something similar) that does all the substr() magic rather than duplicating it all over the place.
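A rough Python sketch of what such a helper pair might look like. The names follow the suggestion above, but the bodies, and the simplified `title_to_key` / `key_to_title` they wrap, are hypothetical stand-ins and not the API's actual code.

```python
import re

def title_to_key(title):
    """Hypothetical full-title normalizer: trims both ends as usual."""
    key = re.sub(r"[\s_]+", "_", title).strip("_")
    return key[:1].upper() + key[1:]

def key_to_title(key):
    """Hypothetical inverse: db key back to display text."""
    return title_to_key(key).replace("_", " ")

def title_part_to_key(part):
    """Prefix variant: keeps trailing whitespace as a trailing '_'."""
    # Append a placeholder, normalize, then strip the placeholder again.
    return title_to_key(part + ".")[:-1]

def key_to_title_part(key_part):
    """Prefix variant of key_to_title, using the same placeholder trick."""
    return key_to_title(key_part + ".")[:-1]
```

Keeping the placeholder logic in the `*_part` helpers leaves the full-title functions untouched, so in this sketch a request for "foobar " still normalizes to "Foobar", and only prefix callers get "Test_" from "test ".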
Roan Kattouw (Catrope)
Ya, that part would be a good idea. It would also make migrating to any new normalization system easier since you only need to change that one function, and don't need to worry about anyone calling titleToKey and keyToTitle when you have different code for prefixes.
~Daniel Friesen (Dantman, Nadir-Seen-Fire)
wikitech-l@lists.wikimedia.org