https://bugzilla.wikimedia.org/show_bug.cgi?id=55276
Web browser: ---
Bug ID: 55276
Summary: weblinkchecker should ignore URLs inside some tags, part 2
Product: Pywikibot
Version: unspecified
Hardware: All
OS: All
Status: ASSIGNED
Severity: normal
Priority: Unprioritized
Component: General
Assignee: Pywikipedia-bugs@lists.wikimedia.org
Reporter: legoktm.wikipedia@gmail.com
Classification: Unclassified
Mobile Platform: ---

Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1164/
Reported by: djbarrett
Created on: 2010-04-12 18:33:11
Subject: weblinkchecker should ignore URLs inside some tags, part 2
Assigned to: xqt
Original description:
This is a follow-up to [pywikipediabot-Bugs-1969051] "weblinkchecker should ignore URLs inside some tags".
The fix in pyrev:8076 by xqt is appreciated, but it is not an appropriate solution. The particular tag I listed in the ticket, "<sql>", was just an example. The fix by xqt simply hard-coded this example (bogus) tag into the Pywikipedia source code:
svn diff -c8076 http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia
A better fix would be to recognize when you are reading a tag attribute:
<AnyTagGoesHere ... attr='http://whatever' ...>
{{AnyTemplateOrParserFunction | attr=http://whatever
and ignore the URL in these situations.
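A minimal sketch of what this proposal could look like, assuming a plain regex pass over the wikitext before link extraction. The names `urls_outside_attributes`, `TAG_RE`, `TEMPLATE_PARAM_RE` and `URL_RE` are hypothetical and are not taken from weblinkchecker.py; the patterns are deliberately simple and do not handle nested templates.

```python
import re

# Hypothetical patterns for illustration; not the ones used by weblinkchecker.py.
URL_RE = re.compile(r"""https?://[^\s<>"'|}\]]+""")
# Any opening tag, including its attributes: <AnyTagGoesHere ... attr='...' ...>
TAG_RE = re.compile(r"<[A-Za-z][^<>]*>")
# A named template or parser-function parameter whose value is a URL
TEMPLATE_PARAM_RE = re.compile(r"\|\s*[\w-]+\s*=\s*https?://[^\s|}]+")


def urls_outside_attributes(text):
    """Return URLs that are not tag attributes or template parameter values."""
    # Drop whole opening tags so that URLs inside their attributes disappear.
    stripped = TAG_RE.sub("", text)
    # Drop "| name=http://..." parameters, keeping the pipe separator.
    stripped = TEMPLATE_PARAM_RE.sub("|", stripped)
    return URL_RE.findall(stripped)


sample = (
    "<sql db='http://db.example/x'>select</sql> "
    "see http://example.org/page and "
    "{{cite | url=http://tmpl.example/a}}"
)
print(urls_outside_attributes(sample))
```

With the sample text, only the URL in running text survives; the one in the tag attribute and the one in the template parameter are skipped.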
$ python version.py
Pywikipedia [http] trunk/pywikipedia (r8050, 2010/04/01, 15:43:14)
Python 2.4.3 (#1, Sep 3 2009, 15:37:37)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)]
--- Comment #1 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- I disagree. It is entirely possible to have a sensible URL in a template (e.g. a reference). I'd suggest only adding 'exceptions', as was done in r8076.
--- Comment #2 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- I do not agree. It is legal to put URLs into <ref /> tags, as well as others like <noinclude>, or to assign URLs to a template field. weblinkchecker normally shouldn't ignore such URLs; it should check whether they are still valid.
--- Comment #3 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- - **status**: open --> open-rejected
--- Comment #4 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- - **assigned_to**: nobody --> xqt - **status**: open-rejected --> pending-rejected
--- Comment #5 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- I see your point. Three notes:
1. Can this be an OPTION for weblinkchecker?
2. If not, can you at least strip off the trailing single quotes (shown in bug 1969051) so you don't get broken URLs? Single quotes are valid in tags but should not be part of the URL.
3. In any case, you should revert pyrev:8076 because there is no such tag as <sql>.
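Note 2 could be sketched as a small clean-up step applied to each extracted URL. The helper name `strip_trailing_quotes` is hypothetical and not part of weblinkchecker.py. It assumes that a quote at the very end of a match is a tag-attribute delimiter rather than part of the address.

```python
def strip_trailing_quotes(url):
    """Remove quote characters left at the end of a URL taken from attr='...'."""
    # Assumption: a URL that legitimately ends in a quote is rare enough to
    # accept truncating it; quotes in the middle of the URL are left alone.
    return url.rstrip("'\"")


print(strip_trailing_quotes("http://whatever'"))
```

Only trailing quotes are removed, so a URL such as `http://example.org/it's` is returned unchanged.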
--- Comment #6 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- - **status**: pending-rejected --> open-rejected
--- Comment #7 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- Point 3 is done in pyrev:8086.
--- Comment #8 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- The <sql> tag is a non-standard tag, but it is used on the other bug reporter's wiki (as was clearly stated in his/her bug report).
--- Comment #9 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- valhallasw: Actually, I *am* the other bug reporter. :-) <sql> is a made-up tag for the example. We have 40 tags that exhibit the problem behavior.
Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com changed:
What    |Removed |Added
----------------------------------------------------------------------------
See Also|        |https://sourceforge.net/p/pywikipediabot/bugs/1164
xqt info@gno.de changed:
What  |Removed  |Added
----------------------------------------------------------------------------
Status|ASSIGNED |NEW
CC    |         |info@gno.de
Ricordisamoa ricordisamoa@openmailbox.org changed:
What     |Removed |Added
----------------------------------------------------------------------------
CC       |        |ricordisamoa@openmailbox.org
Component|General |weblinkchecker.py
pywikipedia-bugs@lists.wikimedia.org