Hi All,
Thank you all for the great feedback! It is appreciated and welcome.
I'll combine all the responses related to this into one email, to make it easier to
read:
One caveat: quite a lot of the suggested links were of random phrases
to pop-culture articles on e.g. album titles of that name. These are
not so useful for articles that aren't pop-culture.
Yes: films, songs, albums, plays, and books are all culprits, because artists do
sometimes enjoy misappropriating phrases from common usage.
Here's a hit parade of some of the worst offenders, showing the links that would be
suggested for a completely unlinked bit of text:
"Look [[To The West|to the west]]! That guy [[In the Middle|in the middle]]
[[In A Car|in a car]], [[he is]] [[A Single|a single]].
[[Of Course|of course]] [[this time]] he [[Has Been|has been]]
[[On the Cover|on the cover]] of Vogue."
The bad news is that every single one of those suggestions is completely useless, as
they're all about songs or albums or other gumpf. The good news, however, is that they
were all on the way out (and half of them had already been eliminated based on people's
votes), but to help speed things along I've now given the rest of them the heave-ho.
So, with progressive use the annoying pop-culture ones should drop out.
Also: it took a reload to get the PHP pages to load.
Hmmm, I thought I was the only one seeing that, and only intermittently. I'm really
not sure why that's happening, although it doesn't seem to be coming from my local PHP
or Apache, as Apache doesn't register anything in the error logs or access logs when it
happens. My best guess is that either the DNS is slightly screwy, or that my ISP is doing
something with a proxy that's interfering. They're big on transparent proxying, and
originally I wasn't sending any headers to prevent caching, and that error did happen
previously when I was setting it up, so _maybe_ something is caching that original
error.
I dread the misuse this tool will have... simply because there will be
some who use it to link everything that has a matching article.. which
is clearly a bad idea..
Well, like everything else on the Wikipedia it requires some degree of judgement as to
what's appropriate to include in an article.
If people are daft about it, then those edits should be reverted, just like any other bad
edit.
Don't use pipe links for [[plural word]]s.
Sounds good, that's been added now, and it should transform any proposed link variant
of the form [[X|Y]], where X is a substring of
Y, and where X and Y contain an equal number of spaces, into [[X]]rest-of-Y.
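As a rough illustration of that rule, here is a minimal sketch in Python (not the tool's actual PHP code). The function names are made up for the example, and "substring" is read here as "prefix", since the result is [[X]]rest-of-Y:

```python
import re

# Matches a piped wiki link such as [[plural word|plural words]].
PIPED_LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")


def simplify_piped_link(match):
    """Turn [[X|Y]] into [[X]]rest-of-Y when the rule applies."""
    target, label = match.group(1), match.group(2)
    # Assumption: X must be a prefix of Y, and both must contain
    # the same number of spaces.
    if label.startswith(target) and label.count(" ") == target.count(" "):
        return "[[" + target + "]]" + label[len(target):]
    # Otherwise leave the link exactly as it was.
    return match.group(0)


def simplify_piped_links(wikitext):
    """Apply the rule to every piped link in a piece of wikitext."""
    return PIPED_LINK.sub(simplify_piped_link, wikitext)
```

With this sketch, "[[plural word|plural words]]" becomes "[[plural word]]s", while a link like "[[C (programming language)|C programming language]]" is left alone because the label does not start with the target.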
Why not have "don't know" selected for each item when the page loads - it
will make things clearer, and then you only need to change the ones you are
sure about.
Good idea, that's more explicit about what's going on. It has been changed to this
now.
First of all, it seems to do a string search-and-replace for the *first*
instance of each string. That's no good: it tried replacing
"The '''[[C (programming language)|C programming language]]''' is a very widely used programming language"
with
"The '''[[C ([[programming language]])|C programming language]]''' is a very widely used programming language"
rather than
"The '''[[C (programming language)|C programming language]]''' is a very widely used [[programming language]]".
Guilty as charged. :-)
When it looks for things to link (suggester.php), it was finding and suggesting this:
"The '''[[C (programming language)|C programming language]]''' is a very widely used
programming language"
^^^^^^^^^^^^^^^^^^^^
... however the bit that actually does the linking (post.php) was just using a
replace-first-instance approach, which would link on this:
"The '''[[C (programming language)|C programming language]]'''"
             ^^^^^^^^^^^^^^^^^^^^
It now transmits the offset to start at, which avoids the above problem. That should be
fine as long as people don't get into rapid conflicting-edit situations involving
articles with duplicated bits of text that they said yes to linking, some of which are
enclosed in wiki syntax and some of which aren't. If this turns out to be a problem the
whole article can just be reparsed, but I'll start with the simpler approach and revise
if needed.
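To illustrate the difference between the two approaches, here is a small Python sketch (not the actual PHP from suggester.php or post.php; the function names and the error handling are hypothetical):

```python
def link_first_instance(text, phrase):
    """Old behaviour: link the first occurrence, wherever it happens to be."""
    return text.replace(phrase, "[[" + phrase + "]]", 1)


def link_at_offset(text, phrase, offset):
    """New behaviour: link the phrase only at the offset the suggester found."""
    end = offset + len(phrase)
    # If the text at that offset no longer matches (e.g. the article was
    # edited in the meantime), refuse rather than link the wrong spot.
    if text[offset:end] != phrase:
        raise ValueError("text at offset does not match the phrase")
    return text[:offset] + "[[" + phrase + "]]" + text[end:]


text = ("The '''[[C (programming language)|C programming language]]''' "
        "is a very widely used programming language")
phrase = "programming language"

# The suggester would report the offset of the unlinked occurrence; here
# that is simply the last occurrence in the sentence.
offset = text.rindex(phrase)

broken = link_first_instance(text, phrase)
fixed = link_at_offset(text, phrase, offset)
```

Here `broken` ends up with a link nested inside the existing link's target, while `fixed` links only the trailing "programming language". The mismatch check is where a full reparse of the article could be triggered if conflicting edits turn out to be a problem.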
It suggested linking "source code" to "[[Source Code|source code]]", for
instance, rather than just "[[source code]]".
Good catch, fixed now - Thank you.
One improvement would be to change the edit summary since external links
aren't rendered in edit summaries anyway - either just show the link as
plain text (instead of wikitext) or make a page on the wiki where you can
explain it since internal links will work.
Good idea, and done - I've added a user subpage ([[:en:User:Nickj/Can We Link It]])
with the overview information on it, and so now people can read about it and then get
to the tool's page with 2 clicks from the edit summary, rather than having to copy and
paste the URL.
Is the code for this available anywhere, and is it available under the GPL?
Yes and yes. It's at:
http://files.nickj.org/MediaWiki/suggest-links.zip
A couple of other smaller fixes/tweaks, in addition to the above:
* Section linking should work now if it's unhappy about wiki syntax (i.e. links to
"page#Section", rather than "page&section=section")
* In the page's <title>, replaced the underscores with spaces.
* On the suggester output page, added a header that says "<h1>Link Suggestions
for:<a href='page_name'>page name</a></h1>"
* Added a link back to the landing page when there are no suggestions, or the article
name given did not exist, or all suggestions were rejected.
* Updated some of the table columns from latin1 encoding to utf8 encoding to prevent a
MySQL "Illegal mix of collations" error from occurring.
* Made syntax checking ignore seemingly mismatched ''' and ''
occurring on the same line, as this can be valid syntax. E.g.:
''France'''s, or ''''77'''
* Updated TcpQuery backend to be running the new 0.44 version.
All the best,
Nick.