Revision: 6380
Author: nicdumz
Date: 2009-02-20 02:37:42 +0000 (Fri, 20 Feb 2009)
Log Message:
-----------
Modifying linksearch(): when looking for a specific top-level domain, e.g. "-weblink:*.yu", we were retrieving both Linksearch/yu and Linksearch/*.yu. Now we only retrieve the pages that the user asked for, i.e. those linking to URLs matching *.yu.
Question for code reviewer: does anyone know why, when the user asks for -weblink:wikimedia.org, we feel the need to return pages containing links to http://wikimedia.org AND to every subsite http://*.wikimedia.org?
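The behaviour described in the log message can be sketched as a small standalone function. This is only an illustration of the new selection logic: the name `urls_to_retrieve` is hypothetical and does not exist in wikipedia.py, where the same logic is written inline in linksearch().

```python
def urls_to_retrieve(siteurl):
    """Return the Special:Linksearch queries to run for 'siteurl'.

    Hypothetical helper mirroring the logic added in r6380.
    """
    # Always query exactly what the user asked for.
    urls = [siteurl]
    # Only add the wildcard form when the user did not already give one.
    if not siteurl.startswith('*.'):
        urls.append('*.' + siteurl)
    return urls


print(urls_to_retrieve('*.yu'))           # ['*.yu']
print(urls_to_retrieve('wikimedia.org'))  # ['wikimedia.org', '*.wikimedia.org']
```

Before this revision, the leading '*.' was stripped first, so "-weblink:*.yu" ended up querying both 'yu' and '*.yu'.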
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py	2009-02-19 13:58:20 UTC (rev 6379)
+++ trunk/pywikipedia/wikipedia.py	2009-02-20 02:37:42 UTC (rev 6380)
@@ -5660,12 +5660,14 @@
 
     def linksearch(self, siteurl, limit=500):
         """Yield Pages from results of Special:Linksearch for 'siteurl'."""
-        if siteurl.startswith('*.'):
-            siteurl = siteurl[2:]
         output(u'Querying [[Special:Linksearch]]...')
         cache = []
         R = re.compile('title ?="([^<>]*?)">[^<>]*</a></li>')
-        for url in [siteurl, '*.' + siteurl]:
+
+        urlsToRetrieve = [siteurl]
+        if not siteurl.startswith('*.'):
+            urlsToRetrieve.append('*.' + siteurl)
+        for url in urlsToRetrieve:
             offset = 0
             while True:
                 path = self.linksearch_address(url, limit=limit, offset=offset)
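For readers unfamiliar with the unchanged regular expression R in the hunk above, here is a minimal sketch of what it captures. The HTML fragment is a made-up example in the rough shape of a result list item, not real Special:Linksearch output, and `sample_html` is a hypothetical name.

```python
import re

# Same pattern as R in linksearch().
R = re.compile('title ?="([^<>]*?)">[^<>]*</a></li>')

# Hypothetical list item: an external link followed by the linking page.
sample_html = (
    '<li><a href="http://example.yu">http://example.yu</a> linked from '
    '<a href="/wiki/Foo" title="Foo">Foo</a></li>'
)

# The capture group holds the title attribute of the last link in the item.
titles = R.findall(sample_html)
print(titles)  # ['Foo']
```

The pattern anchors on the closing `</a></li>`, so only the final link of each list item, the page title, is captured.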