https://bugzilla.wikimedia.org/show_bug.cgi?id=55276
Web browser: ---
Bug ID: 55276
Summary: weblinkchecker should ignore URLs inside some tags, part 2
Product: Pywikibot
Version: unspecified
Hardware: All
OS: All
Status: ASSIGNED
Severity: normal
Priority: Unprioritized
Component: General
Assignee: Pywikipedia-bugs@lists.wikimedia.org
Reporter: legoktm.wikipedia@gmail.com
Classification: Unclassified
Mobile Platform: ---

Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1164/
Reported by: djbarrett
Created on: 2010-04-12 18:33:11
Subject: weblinkchecker should ignore URLs inside some tags, part 2
Assigned to: xqt

Original description:
This is a followup to [pywikipediabot-Bugs-1969051] "weblinkchecker should ignore URLs inside some tags".
The fix in pyrev:8076 by xqt is appreciated, but not an appropriate solution. The particular tag I listed in the ticket, "<sql>", was just an example. The fix by xqt simply hard-coded this example (bogus) tag into the Pywikipedia source code:
svn diff -c8076 http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia
A better fix would be to recognize when you are reading a tag attribute:
<AnyTagGoesHere ... attr='http://whatever' ...>
{{AnyTemplateOrParserFunction | attr=http://whatever
and ignore the URL in these situations.
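A minimal sketch of that idea in Python follows. It is not the weblinkchecker implementation: the function name `urls_outside_attributes` and all of the regular expressions are hypothetical, chosen only for illustration. It assumes tags and templates are closed on the page, handles only the innermost level of nested templates, and treats a template parameter as an "attribute" only when it has the form `| name = value`.

```python
import re

# All names and patterns below are hypothetical, for illustration only;
# none of them come from the Pywikipedia source.

# Rough URL pattern: stops at whitespace, quotes, pipes, braces, brackets.
URL_RE = re.compile(r"https?://[^\s<>\"'|{}\]]+")

# Any opening tag, whatever its name: <AnyTagGoesHere ... attr='...' ...>
TAG_RE = re.compile(r"<[A-Za-z][^<>]*>")

# An innermost template or parser function call: {{ ... }} with no nesting.
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")

# A named parameter inside a template: | name = value
NAMED_PARAM_RE = re.compile(r"\|\s*[\w-]+\s*=[^|}]*")


def _mask_named_params(match):
    """Drop the named parameters (and their values) of one template call."""
    return NAMED_PARAM_RE.sub("|", match.group(0))


def urls_outside_attributes(text):
    """Return URLs that are not tag attributes or named template parameters."""
    # Blank out whole opening tags, so attribute values are never scanned.
    masked = TAG_RE.sub(" ", text)
    # Remove named parameters from templates; positional ones are kept.
    masked = TEMPLATE_RE.sub(_mask_named_params, masked)
    return URL_RE.findall(masked)


if __name__ == "__main__":
    sample = (
        "See http://example.org/page and "
        "<sql db='http://internal/db'>select 1</sql> "
        "{{Cite | url=http://example.com/ref | title=T}}"
    )
    print(urls_outside_attributes(sample))
```

Because the tag pattern does not depend on the tag name, no tag such as "<sql>" has to be hard-coded. A real fix would still need to decide what to do with unclosed templates and with URLs passed as positional parameters, which this sketch deliberately keeps.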
$ python version.py
Pywikipedia [http] trunk/pywikipedia (r8050, 2010/04/01, 15:43:14)
Python 2.4.3 (#1, Sep 3 2009, 15:37:37)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)]