In r36253 [1] I made some changes to $linkTrail. Primarily, I replaced a-z with \p{L&}. According to [2], \p{L&} is the Unicode counterpart of [[:alpha:]], in other words all alphabetical characters. With this change, link trails accept all valid letters. As you'll see in [3], all the characters in Wikipedia's EditTools and a few more are now considered part of the linkTrail (previously, 99% of those would not have been part of the link).
In addition, I fixed an old complaint: in text like [[Bar]]'s, the 's was not treated as part of the link. (Don't worry, I made sure that things like ''[[Foo]]'' and '[[Foo]]' do not break.)
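To make the behaviour above concrete, here is a minimal Python sketch. It is not the r36253 code. Python's `re` module has no `\p{...}` classes, so `[^\W\d_]` is used as an approximate stand-in for a Unicode letter class, and the names `LETTER`, `LINK_TRAIL` and `split_trail` are hypothetical. One point worth double-checking against the PCRE documentation: there `\p{L&}` is described as the cased letters only (Lu, Ll, Lt), while `\p{L}` covers every letter category, so caseless scripts may behave differently from what this stand-in shows.

```python
import re
from typing import Tuple

# Approximate stand-in for a Unicode letter class: any word character
# that is neither a digit nor an underscore.
LETTER = r"[^\W\d_]"

# Hypothetical trail pattern: optional letters, then optionally an
# apostrophe followed by at least one letter. Group 2 is the remainder.
LINK_TRAIL = re.compile(rf"^({LETTER}*(?:'{LETTER}+)?)(.*)\Z", re.DOTALL)


def split_trail(after_link: str) -> Tuple[str, str]:
    """Split the text that follows ']]' into (trail, remainder)."""
    match = LINK_TRAIL.match(after_link)
    return match.group(1), match.group(2)
```

With this sketch, the text after [[Bar]] in "[[Bar]]'s house" yields the trail 's, while a closing quote as in '[[Foo]]' or ''[[Foo]]'' yields an empty trail because no letter follows the apostrophe.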
TimStarling pointed out that some other languages have their own punctuation characters. I need some translator help compiling a list of foreign language punctuation characters used similarly to ' which should become part of the link when they come immediately after the closing ]]. I can't compile a list like this myself because I can't identify what foreign languages do with certain punctuation characters.
When that list is created, we can add it to a [] character class inside the default $linkTrail. That way punctuation should be linked correctly for all languages no matter what locale is used. As TimStarling pointed out, any "ambiguity that can only be resolved ... at the language [level]" we can fix in individual languages. But for the most part, things should work for any language no matter what the current locale is. After all, there's nothing wrong with having Japanese text in an English wiki; we do it all over Wikia and the Anime and Manga WikiProject on Wikipedia.
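As a rough illustration of how such a list could slot into a single [] class, here is a Python sketch under the same approximation (`[^\W\d_]` standing in for the letter class). The contents of `TRAIL_PUNCT` are placeholders, not the list being requested: only the ASCII apostrophe comes from this thread, and U+2019 (the typographic apostrophe) is there purely as an example. `build_link_trail` is a hypothetical helper.

```python
import re

LETTER = r"[^\W\d_]"  # approximate stand-in for a Unicode letter class

# Placeholder list of apostrophe-like characters that may join a suffix
# to a link. Real entries would have to come from translators.
TRAIL_PUNCT = "'\u2019"


def build_link_trail(punct: str) -> "re.Pattern[str]":
    """Build a trail regex whose punctuation set is one [] class."""
    punct_class = "[" + re.escape(punct) + "]"
    return re.compile(
        rf"^({LETTER}*(?:{punct_class}{LETTER}+)?)(.*)\Z", re.DOTALL
    )
```

Keeping the punctuation in one string means a language that needs something different only has to override that list, not the whole pattern.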
[1] http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=36253
[2] http://www.regular-expressions.info/posixbrackets.html#class
[3] http://dev.wiki-tools.com/purge/Link_Codes
How about dashes?
______________________________ Danny B.
On Fri, Jun 13, 2008 at 12:55 AM, Danny B. Wikipedia.Danny.B@email.cz wrote:
How about dashes?
Em and en dashes should certainly not be included in the link. Hyphens probably shouldn't either. Remember that linktrail characters should be those that should *always* be included in the trail. Those that should only sometimes be included, on a case-by-case basis, can still be manually added with a pipe. A phrase like, say, "[[moth]]-eater" should certainly not become "[[moth|moth-eater]]".
This is a good development, but there are two major possible issues I see with this:
1) Languages that don't use spaces. This is probably not such a problem, since all such languages I know of use their own writing system, which can be specifically checked for. Make sure that links don't automatically cross a boundary between characters in a writing system that uses spaces and one from a writing system that does not. Of course, it should also not automatically include further characters within a writing system that doesn't use spaces.
2) Compound words. In English, a phrase like "moth-eater" is hyphenated; in other languages, it might be written as the equivalent of "motheater", for all I know. Some languages may go even further and use much more elaborate compound words: "agglutinative" or "polyagglutinative" languages. If this is correct, such languages should be exempted.
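The first point can be pictured with a rough Python sketch. The standard library has no Unicode script property, so the first word of the character name (LATIN, CYRILLIC, CJK, HIRAGANA and so on) is used here as a crude proxy for the script. `SPACED_SCRIPTS` is a hypothetical and deliberately incomplete allow-list, and `script_of` and `trail_length` are illustrative names only.

```python
import unicodedata

# Hypothetical, incomplete set of scripts whose languages put spaces
# between words.
SPACED_SCRIPTS = {"LATIN", "CYRILLIC", "GREEK"}


def script_of(ch: str) -> str:
    """Crude script guess: the first word of the Unicode character name."""
    return unicodedata.name(ch, "").split(" ")[0]


def trail_length(link_text: str, after_link: str) -> int:
    """Count how many leading characters of after_link may join the link."""
    if not link_text:
        return 0
    script = script_of(link_text[-1])
    if script not in SPACED_SCRIPTS:
        return 0  # writing system without spaces: never extend automatically
    count = 0
    for ch in after_link:
        # Stop at non-letters and at any change of writing system.
        if not ch.isalpha() or script_of(ch) != script:
            break
        count += 1
    return count
```

Under these assumptions a Latin link followed by kana gets no trail, and a link ending in a CJK character never extends, while "[[moth]]s" still picks up the s.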
Note that this change might be a regression for some languages, even if they didn't previously use a custom link trail. Some languages might have deliberately refrained from including their own alphabet in the linktrail, keeping the English default so that it still worked for English. Such languages will now incorrectly see their own writing system become part of link trails.
As many languages probably have a use for link prefixes as have a use for link trails. They should probably work symmetrically unless it causes problems for a particular language.
On Fri, Jun 13, 2008 at 9:42 AM, Simetrical <Simetrical+wikilist@gmail.com> wrote:
As many languages probably have a use for link prefixes as have a use for link trails. They should probably work symmetrically unless it causes problems for a particular language.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Any reason not to use $linkTrail = '/^(\p{L&}*\'?\p{L&}+)(.*)$/usD'; to allow things like [[Verb]]ing's?
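For anyone who wants to poke at this pattern, here is a Python rendering as a sketch. `[^\W\d_]` approximates `\p{L&}`, `\Z` plays the role of `$` under the D modifier, and `re.DOTALL` corresponds to s. `PROPOSED_TRAIL` and `proposed_trail` are hypothetical names.

```python
import re

LETTER = r"[^\W\d_]"  # approximate stand-in for \p{L&}

# Letters, an optional apostrophe, then at least one more letter.
PROPOSED_TRAIL = re.compile(rf"^({LETTER}*'?{LETTER}+)(.*)\Z", re.DOTALL)


def proposed_trail(after_link: str):
    """Return (trail, remainder), or None when the pattern does not match."""
    match = PROPOSED_TRAIL.match(after_link)
    return match.groups() if match else None
```

Because the pattern ends in a mandatory letter, text with no trail at all (a space or a bare closing quote right after the link) simply fails to match, so the caller has to treat "no match" as "no trail".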
Also, as Simetrical stated, it could be very valuable, if somewhat more complicated, to use something like a $linkWord instead of a $linkTrail, to allow us to do really cool stuff like this:
$linkWord = '/^(.*?)(\p{L&}*)Something that signifies the initial link was here(\p{L&}*(?:(?<!\'.+)\')?\p{L&}+)(.*?)$/usD';
Essentially, any letters before the link are included, as are any letters after the link, allowing an apostrophe if and only if there is not another apostrophe before the link. This is probably slow, might not even work, and is in short a bad idea, but it would be nice to be able to do things like this.
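The same idea can be sketched without a single large regex. The following Python is one possible reading of the proposal, not a tested design: it takes the text before the link, the link target and the text after it as separate strings, and it interprets "another apostrophe before the link" as an apostrophe directly in front of the link once any prefix letters are set aside. `expand_link` and the pattern names are hypothetical, and `[^\W\d_]` again approximates `\p{L&}`.

```python
import re

LETTER = r"[^\W\d_]"  # approximate stand-in for \p{L&}
PREFIX = re.compile(rf"{LETTER}*\Z")                   # letters ending `before`
PLAIN_TRAIL = re.compile(rf"{LETTER}*")                # letters only
APOS_TRAIL = re.compile(rf"{LETTER}*(?:'{LETTER}+)?")  # letters, maybe 's


def expand_link(before: str, target: str, after: str):
    """Return (text_before, link_label, text_after) for before[[target]]after."""
    prefix = PREFIX.search(before).group(0)
    rest_before = before[: len(before) - len(prefix)]
    # An apostrophe directly before the link suggests quoting, so in that
    # case an apostrophe after the link is not absorbed.
    trail_re = PLAIN_TRAIL if rest_before.endswith("'") else APOS_TRAIL
    trail = trail_re.match(after).group(0)
    return rest_before, prefix + target + trail, after[len(trail):]
```

For example, "a re[[write]]s it" would give the label "rewrites", and "see [[Bar]]'s page" would give "Bar's", while '[[Foo]]'s keeps the label "Foo".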
On Fri, Jun 13, 2008 at 2:36 PM, Dan Collins en.wp.st47@gmail.com wrote:
Also, as Simetrical stated, it could be very valuable, if somewhat more complicated, to use something like a $linkWord instead of a $linkTrail, to allow us to do really cool stuff like this:
There's already link prefix functionality, last I checked, which is just disabled in the case of English. I've never looked closely at it, so I don't know if it works exactly like link trails or what.
Yes, I was considering using a somewhat more complex regex at some point to improve it.
Don't worry about other languages in any case; local overrides are still possible if a language is too fatally different. For the most part we should remove all the legacy $linkTrails except for those languages which actually have an issue, and add $linkTrail = '/^()(.*)$/sD'; for any language which really needs to omit link trails.
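To show why that last pattern disables trails, here is a small Python rendering as a sketch. `\Z` stands in for `$` under the D modifier and `re.DOTALL` for s. `NO_TRAIL` and `split_without_trail` are hypothetical names.

```python
import re

# Group 1 is an empty pair of parentheses, so the trail can only ever be
# the empty string; group 2 takes everything that follows the link.
NO_TRAIL = re.compile(r"^()(.*)\Z", re.DOTALL)


def split_without_trail(after_link: str):
    """Always return an empty trail plus the untouched remainder."""
    match = NO_TRAIL.match(after_link)
    return match.group(1), match.group(2)
```

The pattern still matches every input, so code that expects two capture groups keeps working; it just never moves any text into the link.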
~Daniel Friesen(Dantman) of: -The Nadir-Point Group (http://nadir-point.com) --It's Wiki-Tools subgroup (http://wiki-tools.com) --Games-G.P.S. (http://ggps.org) -And Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)