Bugs item #2986051, was opened at 2010-04-12 20:33
Message generated for change (Comment added) made by valhallasw
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2986051...
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: None
Group: None
Status: Open
Resolution: Rejected
Priority: 5
Private: No
Submitted By: Daniel Barrett (djbarrett)
Assigned to: xqt (xqt)
Summary: weblinkchecker should ignore URLs inside some tags, part 2
Initial Comment: This is a followup to [pywikipediabot-Bugs-1969051] "weblinkchecker should ignore URLs inside some tags"
The fix in pyrev:8076 by xqt is appreciated, but not an appropriate solution. The particular tag I listed in the ticket, "<sql>", was just an example. The fix by xqt simply hard-coded this example (bogus) tag into the Pywikipedia source code:
svn diff -c8076 http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia
A better fix would be to recognize when you are reading a tag attribute:
<AnyTagGoesHere ... attr='http://whatever' ...>
{{AnyTemplateOrParserFunction | attr=http://whatever
and ignore the URL in these situations.
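A minimal sketch of the proposal above, assuming a simple regex-based scan. The names URL_RE, ASSIGN_RE, is_attribute_value and urls_to_check are hypothetical and are not taken from weblinkchecker.py; the patterns are illustrative only and do not handle nested templates or "<" characters in ordinary text.

```python
import re

# Hypothetical URL pattern for illustration; not weblinkchecker's own regex.
URL_RE = re.compile(r"https?://[^\s'\"<>|}\]]+")
# Text directly before a URL that marks it as a value: name= or name=' or name="
ASSIGN_RE = re.compile(r"[\w-]+\s*=\s*['\"]?\Z")


def is_attribute_value(text, start):
    """Return True if the URL at `start` looks like name=URL inside <...> or {{...}}."""
    before = text[:start]
    if not ASSIGN_RE.search(before):
        return False
    # Inside a tag if the last "<" has not been closed by a ">" yet.
    in_tag = before.rfind("<") > before.rfind(">")
    # Inside a template if the last "{{" has not been closed by "}}" yet.
    in_template = before.rfind("{{") > before.rfind("}}")
    return in_tag or in_template


def urls_to_check(text):
    """Return URLs in `text`, skipping those used as attribute or parameter values."""
    return [m.group() for m in URL_RE.finditer(text)
            if not is_attribute_value(text, m.start())]


sample = ("<sql db='http://whatever' /> see http://example.org "
          "{{Foo | attr=http://x.example }} <ref>http://ref.example/page</ref>")
print(urls_to_check(sample))
```

With this sketch, a URL between <ref> and </ref> is still checked, because it is element content rather than an attribute value; a URL assigned to a template parameter is skipped.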
$ python version.py
Pywikipedia [http] trunk/pywikipedia (r8050, 2010/04/01, 15:43:14)
Python 2.4.3 (#1, Sep 3 2009, 15:37:37)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)]
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-04-13 08:28
Message: The <sql> tag is a non-standard tag, but it is used on the other bug reporter's wiki (as was clearly stated in his/her bug report).
----------------------------------------------------------------------
Comment By: xqt (xqt) Date: 2010-04-13 08:13
Message: 3rd done in pyrev:8086
----------------------------------------------------------------------
Comment By: Daniel Barrett (djbarrett) Date: 2010-04-12 21:39
Message: I see your point. Three notes:
1. Can this be an OPTION for weblinkchecker?
2. If not, can you at least strip off the trailing single quotes (shown in bug 1969051) so you don't get broken URLs? Single quotes are valid in tags but should not be part of the URL.
3. In any case, you should revert pyrev:8076 because there is no such tag as <sql>.
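Note 2 could be sketched roughly as follows. This is only an illustration: LOOSE_URL_RE and extract_urls are hypothetical names, and the deliberately loose pattern is an assumption made to reproduce the trailing-quote problem, not the regex weblinkchecker actually uses.

```python
import re

# Deliberately loose pattern (an assumption): it stops only at whitespace,
# "<" or ">", so a closing quote after the URL is captured along with it.
LOOSE_URL_RE = re.compile(r"https?://[^\s<>]+")


def extract_urls(text):
    """Return URLs found in `text` with trailing single or double quotes removed."""
    return [m.group().rstrip("'\"") for m in LOOSE_URL_RE.finditer(text)]


print(extract_urls("<sql db='http://whatever' />"))
```

Only quotes at the very end of the match are stripped, so a quote in the middle of a URL is left alone.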
----------------------------------------------------------------------
Comment By: xqt (xqt) Date: 2010-04-12 21:26
Message: I do not agree. It is legal to put URLs into <ref /> tags as well as others like <noinclude> etc., or to assign URLs to a template field, so such URLs normally shouldn't be ignored by the weblinkchecker but checked to see whether they are still valid.
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw) Date: 2010-04-12 21:23
Message: I disagree. It is entirely possible to have a sensible URL in a template (e.g. a reference). I'd suggest only adding 'exceptions', as has been done in r8076.
----------------------------------------------------------------------
pywikipedia-bugs@lists.wikimedia.org