[Pywikipedia-l] SVN: [6380] trunk/pywikipedia/wikipedia.py
nicdumz at svn.wikimedia.org
Fri Feb 20 02:37:42 UTC 2009
Revision: 6380
Author: nicdumz
Date: 2009-02-20 02:37:42 +0000 (Fri, 20 Feb 2009)
Log Message:
-----------
Modifying linksearch():
When looking for a specific top-level domain, e.g. "-weblink:*.yu", we were retrieving both Linksearch/yu and Linksearch/*.yu ... :s
Now we only retrieve the pages that the user asked for, i.e. pages with URLs matching *.yu
Question for code reviewers: does anyone know why, when the user asks for -weblink:wikimedia.org, we feel the need to return pages containing links to http://wikimedia.org AND to every subsite http://*.wikimedia.org?
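For reviewers, here is a minimal standalone sketch of the pattern selection before and after this revision. The helper names urls_to_retrieve and urls_to_retrieve_before are hypothetical and exist only for this illustration; in wikipedia.py the logic lives inline in linksearch().

```python
def urls_to_retrieve_before(siteurl):
    """Patterns queried before r6380 (hypothetical helper, for comparison)."""
    # The leading '*.' was stripped, so '*.yu' also triggered a query for 'yu'.
    if siteurl.startswith('*.'):
        siteurl = siteurl[2:]
    return [siteurl, '*.' + siteurl]


def urls_to_retrieve(siteurl):
    """Patterns queried as of r6380 (hypothetical helper)."""
    urls = [siteurl]
    # A pattern that already starts with '*.' is queried exactly as given;
    # otherwise the wildcard-subdomain form is still queried as well.
    if not siteurl.startswith('*.'):
        urls.append('*.' + siteurl)
    return urls


if __name__ == '__main__':
    for arg in ('*.yu', 'wikimedia.org'):
        print(arg, urls_to_retrieve_before(arg), urls_to_retrieve(arg))
```

Only the wildcard case changes: for a plain domain such as wikimedia.org, both versions query the bare domain and its subdomains, which is the behaviour the question above is about.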
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2009-02-19 13:58:20 UTC (rev 6379)
+++ trunk/pywikipedia/wikipedia.py 2009-02-20 02:37:42 UTC (rev 6380)
@@ -5660,12 +5660,14 @@
     def linksearch(self, siteurl, limit=500):
         """Yield Pages from results of Special:Linksearch for 'siteurl'."""
-        if siteurl.startswith('*.'):
-            siteurl = siteurl[2:]
         output(u'Querying [[Special:Linksearch]]...')
         cache = []
         R = re.compile('title ?=\"([^<>]*?)\">[^<>]*</a></li>')
-        for url in [siteurl, '*.' + siteurl]:
+
+        urlsToRetrieve = [siteurl]
+        if not siteurl.startswith('*.'):
+            urlsToRetrieve.append('*.' + siteurl)
+        for url in urlsToRetrieve:
             offset = 0
             while True:
                 path = self.linksearch_address(url, limit=limit, offset=offset)