On Fri, Jun 13, 2008 at 9:42 AM, Simetrical
<Simetrical+wikilist@gmail.com>
wrote:
On Fri, Jun 13, 2008 at 12:55 AM, Danny B.
<Wikipedia.Danny.B(a)email.cz>
wrote:
How about dashes?
Em and en dashes should certainly not be included in the link.
Hyphens probably shouldn't either. Remember that linktrail characters
should be those that should *always* be included in the trail. Those
that should only sometimes be included, on a case-by-case basis, can
still be manually added with a pipe. A phrase like, say,
"[[moth]]-eater" should certainly not become "[[moth|moth-eater]]".
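To make the hyphen case concrete, here is a minimal sketch. It assumes a letters-only trail pattern; the constant LETTERS_ONLY_TRAIL and the helper splitLettersOnlyTrail() are made up for illustration and are not the MediaWiki default. Because a hyphen or dash is not a letter, the link stops at "]]" and the rest of the text is left alone.

```php
<?php
// Illustrative trail pattern (an assumption, not the MediaWiki default):
// one or more cased letters, then everything else.
const LETTERS_ONLY_TRAIL = '/^(\p{L&}+)(.*)$/usD';

// Split the text that follows "]]" into [trail, rest].
function splitLettersOnlyTrail(string $after): array
{
    if (preg_match(LETTERS_ONLY_TRAIL, $after, $m) === 1) {
        return [$m[1], $m[2]];
    }
    // No letters directly after the link: nothing joins it.
    return ['', $after];
}

// [[moth]]s are pests -> "s" joins the link
// [[moth]]-eater      -> nothing joins the link (hyphen)
// [[moth]]–eater      -> nothing joins the link (en dash, U+2013)
foreach (['s are pests', '-eater', "\u{2013}eater"] as $after) {
    [$trail, $rest] = splitLettersOnlyTrail($after);
    echo json_encode([$trail, $rest], JSON_UNESCAPED_UNICODE), "\n";
}
```

Anyone who does want "moth-eater" as the link text can still write [[moth|moth-eater]] by hand.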
This is a good development, but there are two major possible issues I
see with this:
1) Languages that don't use spaces. This is probably not such a
problem, since all such languages I know of use their own writing
system, which can be specifically checked for. Make sure that links
don't automatically cross a boundary between characters in a writing
system that uses spaces and one from a writing system that does not.
Of course, it should also not automatically include further characters
within a writing system that doesn't use spaces.
2) Compound words. In English, a phrase like "moth-eater" is
hyphenated; in other languages, it might be written as the equivalent
of "motheater", for all I know. Some languages may go even further
and use much more elaborate compound words, as in "agglutinative" or
"polysynthetic" languages. If this is correct, such languages
should be exempted.
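The script check from point 1 could be sketched roughly as follows. This is only an illustration: the function name spacedScriptTrail() and the list of space-using scripts are assumptions, and a real list would need input from people who know the languages. The trail is extended only with letters of one whitelisted script, so it stops at a boundary with another writing system and never grows inside a script that is not on the list.

```php
<?php
// Hypothetical helper: return the trail made of letters from one script.
// $script is a PCRE script name such as 'Latin' or 'Cyrillic'.
function spacedScriptTrail(string $after, string $script): string
{
    // Assumed whitelist of scripts whose languages separate words with spaces.
    $spacedScripts = ['Latin', 'Cyrillic', 'Greek'];
    if (!in_array($script, $spacedScripts, true)) {
        // Scripts without word spacing never get an automatic trail.
        return '';
    }
    // Each character must be a letter (\p{L}) and belong to the script.
    $pattern = '/^(?:(?=\p{L})\p{' . $script . '})+/u';
    return preg_match($pattern, $after, $m) === 1 ? $m[0] : '';
}

// Latin "s" followed by Han characters: the trail stops at the boundary.
echo spacedScriptTrail("s\u{65E5}\u{672C}", 'Latin'), "\n"; // prints "s"
```

Compound-heavy languages from point 2 would simply be left off the list, or given an empty trail in their own configuration.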
Note that this change might be a regression for some languages, even
if they didn't previously use a custom link trail. Some languages
might have deliberately refrained from including their own alphabet in
the linktrail, keeping the English default so that it still worked for
English. Such languages will now incorrectly see their own writing
system become part of link trails.
Probably as many languages have a use for link prefixes as for link
trails. Prefixes and trails should work symmetrically unless that
causes problems for a particular language.
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Any reason not to use
$linkTrail = '/^(\p{L&}*\'?\p{L&}+)(.*)$/usD';
to allow things like
[[Verb]]ing's?
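For anyone who wants to try it, here is a small sketch of how that pattern splits the text following "]]". The wrapper function splitProposedTrail() is only there for the demonstration and is not part of any existing code.

```php
<?php
// The pattern proposed above, unchanged.
const PROPOSED_TRAIL = '/^(\p{L&}*\'?\p{L&}+)(.*)$/usD';

// Split the text that follows "]]" into [trail, rest].
function splitProposedTrail(string $after): array
{
    if (preg_match(PROPOSED_TRAIL, $after, $m) === 1) {
        return [$m[1], $m[2]];
    }
    return ['', $after];
}

$samples = [
    "ing's form",   // apostrophe between letters joins the trail
    "s.",           // plain plural still works
    "''italic''",   // wiki italics are left alone
    "'s",           // note: a bare possessive is pulled in too
];
foreach ($samples as $after) {
    echo json_encode(splitProposedTrail($after)), "\n";
}
```

One consequence worth noting is the last sample: [[Verb]]'s would put the possessive inside the link, which may or may not be wanted.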
Also, as Simetrical stated, it could be very valuable, if somewhat more
complicated, to use something like a $linkWord instead of a $linkTrail, to
allow us to do really cool stuff like this:
$linkWord = '/^(.*?)(\p{L&}*)Something that signifies the initial link was here(\p{L&}*(?:(?<!\'.+)\')?\p{L&}+)(.*?)$/usD';
Essentially, any letters before the link are included, as are any letters
after the link, allowing an apostrophe iff there is not another apostrophe
before the link. This is probably slow, might not even work, and is in short
a bad idea, but it would be nice to be able to do things like this.
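A rough sketch of the same idea without the single big regex, in case it helps the discussion. PCRE does not accept an unbounded ".+" inside a lookbehind, so this version does the apostrophe check in plain PHP instead. The function expandLinkWord() and its three-argument shape (text before the link, link target, text after) are assumptions for illustration, and "no apostrophe anywhere before the link" is my reading of the rule.

```php
<?php
// Hypothetical helper: grow a link to cover the whole word around it.
function expandLinkWord(string $before, string $target, string $after): string
{
    // Letters immediately before the link become a prefix.
    preg_match('/\p{L&}*\z/u', $before, $m);
    $prefix = $m[0];

    // Allow one apostrophe in the suffix only if none appears earlier.
    $allowApostrophe = strpos($before, "'") === false;
    $suffixPattern = $allowApostrophe
        ? '/^\p{L&}*(?:\'\p{L&}+)?/u'
        : '/^\p{L&}*/u';
    preg_match($suffixPattern, $after, $m);
    $suffix = $m[0];

    // Cut the absorbed letters off the surrounding text.
    $head = substr($before, 0, strlen($before) - strlen($prefix));
    $tail = substr($after, strlen($suffix));

    // Use a piped link only when the label differs from the target.
    $label = $prefix . $target . $suffix;
    $link = $label === $target
        ? '[[' . $target . ']]'
        : '[[' . $target . '|' . $label . ']]';
    return $head . $link . $tail;
}

// "an un[[do]]ing's effect" -> "an [[do|undoing's]] effect"
echo expandLinkWord('an un', 'do', "ing's effect"), "\n";
```

Splitting the work this way keeps each pattern anchored and short, though it still says nothing about how the parser would hand over the text on either side of the link.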
--
DCollins/ST47
Administrator,
en.wikipedia.org
Channel Operator,
irc.freenode.net/#wikipedia
Maintainer, Perlwikipedia module