[Pywikipedia-l] SVN: [6380] trunk/pywikipedia/wikipedia.py

nicdumz at svn.wikimedia.org
Fri Feb 20 02:37:42 UTC 2009


Revision: 6380
Author:   nicdumz
Date:     2009-02-20 02:37:42 +0000 (Fri, 20 Feb 2009)

Log Message:
-----------
Modifying linksearch():
When looking up a specific top-level domain, e.g. "-weblink:*.yu", we were retrieving Linksearch/yu and Linksearch/*.yu ... :s
 Now we only retrieve the pages that the user asked for, i.e. URLs matching *.yu

Question for code reviewers: Does anyone know why, when the user asks for -weblink:wikimedia.org, we feel the need to give them pages containing links to http://wikimedia.org AND to every subsite http://*.wikimedia.org ??

Modified Paths:
--------------
    trunk/pywikipedia/wikipedia.py

Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py	2009-02-19 13:58:20 UTC (rev 6379)
+++ trunk/pywikipedia/wikipedia.py	2009-02-20 02:37:42 UTC (rev 6380)
@@ -5660,12 +5660,14 @@
 
     def linksearch(self, siteurl, limit=500):
         """Yield Pages from results of Special:Linksearch for 'siteurl'."""
-        if siteurl.startswith('*.'):
-            siteurl = siteurl[2:]
         output(u'Querying [[Special:Linksearch]]...')
         cache = []
         R = re.compile('title ?=\"([^<>]*?)\">[^<>]*</a></li>')
-        for url in [siteurl, '*.' + siteurl]:
+
+        urlsToRetrieve = [siteurl]
+        if not siteurl.startswith('*.'):
+            urlsToRetrieve.append('*.' + siteurl)
+        for url in urlsToRetrieve:
             offset = 0
             while True:
                 path = self.linksearch_address(url, limit=limit, offset=offset)
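
For readers who want to check the new behaviour in isolation, here is a minimal sketch of the URL-selection step from the hunk above. The function name urls_to_retrieve is hypothetical (in wikipedia.py the logic is inline in linksearch()), and the sketch only builds the list of patterns; it does not query Special:Linksearch.

```python
def urls_to_retrieve(siteurl):
    """Return the Linksearch patterns that would be queried for siteurl.

    Hypothetical standalone helper mirroring the inline logic of rev 6380.
    """
    urls = [siteurl]
    # An explicit wildcard pattern such as '*.yu' is queried as given.
    # A bare domain additionally gets its '*.' subdomain variant.
    if not siteurl.startswith('*.'):
        urls.append('*.' + siteurl)
    return urls


if __name__ == '__main__':
    print(urls_to_retrieve('*.yu'))
    print(urls_to_retrieve('wikimedia.org'))
```

With rev 6379, '*.yu' was first stripped to 'yu' and then both 'yu' and '*.yu' were queried; under the sketch above only '*.yu' remains, while a bare domain still yields two patterns.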




