Revision: 6037
Author: wikipedian
Date: 2008-10-27 20:51:31 +0000 (Mon, 27 Oct 2008)
Log Message:
-----------
applied patch [ 2192349 ] Faroese (fo) translations for interwiki.py
Modified Paths:
--------------
trunk/pywikipedia/interwiki.py
Modified: trunk/pywikipedia/interwiki.py
===================================================================
--- trunk/pywikipedia/interwiki.py 2008-10-27 20:45:51 UTC (rev 6036)
+++ trunk/pywikipedia/interwiki.py 2008-10-27 20:51:31 UTC (rev 6037)
@@ -170,7 +170,7 @@
you are sure you have first gotten the interwiki on the
starting page exactly right).
(note: without ending colon)
-
+
-back only work on pages that have no backlink from any other
language; if a backlink is found, all work on the page
will be halted. (note: without ending colon)
@@ -359,6 +359,7 @@
'fa': (u'ربات ', u'افزودن', u'حذف', u'اصلاح'),
'fi': (u'Botti ', u'lisäsi', u'poisti', u'muokkasi'),
'fiu-vro': (u'robot ', u'manopandminõ', u'ärqvõtminõ', u'tävvendämine'),
+ 'fo': (u'bottur ', u'leggur aftrat', u'strikar', u'broytur'),
'fr': (u'robot ', u'Ajoute', u'Retire', u'Modifie'),
'frp': (u'robot ', u'Apond', u'Retire', u'Modifie'),
'fur': (u'Robot: ', u'o zonti', u'o cambii', u'o gjavi'),
Revision: 6032
Author: filnik
Date: 2008-10-27 14:07:30 +0000 (Mon, 27 Oct 2008)
Log Message:
-----------
Making the loading of the categories even faster ^_^ and a little bugfix in the smartDetection
Modified Paths:
--------------
trunk/pywikipedia/checkimages.py
Modified: trunk/pywikipedia/checkimages.py
===================================================================
--- trunk/pywikipedia/checkimages.py 2008-10-26 18:10:52 UTC (rev 6031)
+++ trunk/pywikipedia/checkimages.py 2008-10-27 14:07:30 UTC (rev 6032)
@@ -496,37 +496,16 @@
wikipedia.output(u'No data found.')
return False
-def categoryElementsNumber(CatName):
- #action=query&prop=categoryinfo&titles=Category:License_tags
- """
- """
- params = {
- 'action' :'query',
- 'prop' :'categoryinfo',
- 'titles' :CatName,
- }
-
- data = query.GetData(params,
- useAPI = True, encodeTitle = False)
- pageid = data['query']['pages'].keys()[0]
- elements = data['query']['pages'][pageid]['categoryinfo']['size']
- return elements
-
def categoryAllElements(CatName):
#action=query&list=categorymembers&cmlimit=500&cmtitle=Category:License_tags
"""
+ Category to load all the elements in a category. Limit: 5000 elements.
"""
wikipedia.output("Loading %s..." % CatName)
- elements = int(categoryElementsNumber(CatName))
- elements += 20 # better to be sure that all the elements are loaded
- if (elements - 20) > 5000:
- raise wikipedia.Error(u'The category selected as more than 5.000 elements, limit reached')
- elif elements > 5000: # if they are less then 5000, but for few elements
- elements = 5000
params = {
'action' :'query',
'list' :'categorymembers',
- 'cmlimit' :str(elements),
+ 'cmlimit' :'5000',
'cmtitle' :CatName,
}
@@ -534,6 +513,8 @@
useAPI = True, encodeTitle = False)
members = data['query']['categorymembers']
+ if len(members) == 5000:
+ raise wikipedia.Error(u'The category selected as >= 5.000 elements, limit reached.')
allmembers = members
results = list()
for subcat in members:
@@ -549,6 +530,9 @@
results.append(member)
return results
def categoryAllPageObjects(CatName):
+ """
+ From a list of dictionaries, return a list of page objects.
+ """
final = list()
for element in categoryAllElements(CatName):
final.append(wikipedia.Page(wikipedia.getSite(), element['title']))
@@ -1132,7 +1116,8 @@
for templateReal in self.licenses_found:
if self.convert_to_url(template_selected).lower().replace('template:', '') == \
self.convert_to_url(templateReal.title().lower().replace('template:', '')):
- allLicenses.append(templateReal)
+ if templateReal not in allLicenses: # don't put the same template, twice.
+ allLicenses.append(templateReal)
if self.licenses_found != []:
for template in self.licenses_found:
license_selected = template.title().replace('Template:', '')
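The revised categoryAllElements drops the pre-count round trip and simply asks for 'cmlimit': '5000', raising an error if the limit is reached. A more robust variant would follow the API's continuation token instead of erroring out. Here is a minimal sketch under stated assumptions: `fetch` is a stand-in for query.GetData, and the response shape mirrors the pre-1.21 MediaWiki categorymembers API ('query-continue' carrying a cmcontinue value); this is not the actual checkimages.py code.

```python
def category_all_elements(fetch, cat_name, limit=5000):
    """Collect category members batch by batch, stopping at `limit`.

    `fetch(params)` performs one API request and returns the decoded
    response dict, including an optional 'query-continue' key when
    more results remain.
    """
    members = []
    params = {
        'action': 'query',
        'list': 'categorymembers',
        'cmlimit': '500',
        'cmtitle': cat_name,
    }
    while True:
        data = fetch(dict(params))
        members.extend(data['query']['categorymembers'])
        if len(members) > limit:
            raise ValueError('category has more than %d elements' % limit)
        cont = data.get('query-continue', {}).get('categorymembers')
        if not cont:
            return members
        params.update(cont)  # carry cmcontinue into the next request
```

This keeps the hard 5000-element cap from the patch while letting smaller categories that span several requests load completely.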
Bugs item #2200214, was opened at 2008-10-27 09:12
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2200214&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: General
Group: v1.0 (example)
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jesse Martin (pathoschild)
Assigned to: Nobody/Anonymous (nobody)
Summary: yo.wikibooks incorrectly listed as obsolete
Initial Comment:
Yo.wikibooks is listed as obsolete, but is open. This can be fixed by removing it from the "self.obsolete" dictionary in pywikipedia/families/wikibooks_family.py on line 318.
Version at time of report:
Pywikipedia [http] trunk/pywikipedia (r6019, Oct 25 2008, 16:16:12)
Python 2.6 (r26:66721, Oct 2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)]
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2200214&group_…
Bugs item #2193942, was opened at 2008-10-25 13:10
Message generated for change (Comment added) made by smalacarne
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2193942&group_…
Category: category
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Simone Malacarne (smalacarne)
Assigned to: Nobody/Anonymous (nobody)
Summary: reading category: memory leak and slow down
Initial Comment:
I need to read a very big category (80,000+ articles), so I just do:

site = wikipedia.getSite()
cat = catlib.Category(site, 'category name')
gen = pagegenerators.PreloadingGenerator(cat.articles(), pageNumber=100)
for page in gen:
    do_something

The problem is that the program starts using more and more memory (near 2 GB of RAM by the end). CPU time also increases as it runs: if the first 10,000 articles are processed in 10 minutes, the second 10,000 take twice as long, and so on; it takes about 20 hours to read all the articles.
If I use:

gen = pagegenerators.CategorizedPageGenerator(cat, recurse=False, start=u'')

instead of PreloadingGenerator, I don't have memory or CPU leaks, but reading one article at a time is painfully slow (more than 24 hours to finish).
Pywikipedia [http] trunk/pywikipedia (r6015, Oct 24 2008, 18:29:39)
Python 2.5.2 (r252:60911, Oct 5 2008, 19:29:17)
[GCC 4.3.2]
----------------------------------------------------------------------
Comment By: Simone Malacarne (smalacarne)
Date: 2008-10-26 21:43
Message:
I tracked the problem down to catlib, in the Category._getContents function.
The function caches something, but with a lot of pages the memory and CPU
use is massive.
I tried commenting out 2 lines in this part:

else:
    print ('not Cached')
    for tag, page in self._parseCategory(purge, startFrom):
        if tag == ARTICLE:
            #self.articleCache.append(page)
            if not page in cache:
                #cache.append(page)
                yield ARTICLE, page

and all is fine now: memory use stays at a flat 20-30 MB and CPU usage
is normal.
I don't know what that cache is used for, but it caused me a lot of trouble.
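The quadratic slowdown reported here is exactly what list-based caching produces: `if not page in cache` scans a Python list, so each membership test costs O(n) and the total cost grows with the square of the category size. A minimal sketch of the usual fix, a set of seen keys inside the generator (`iter_unique` and its `key` parameter are hypothetical names for illustration, not catlib API):

```python
def iter_unique(items, key=lambda x: x):
    """Yield each item once, in order, using a set for O(1)
    membership tests instead of scanning a growing list."""
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item
```

For Page objects one would pass key=lambda p: p.title(). Memory still grows with the number of distinct titles (the reporter's fix of dropping the cache entirely trades duplicate suppression for flat memory), but the per-item cost stays constant, which is the difference between a 20-hour run and a linear one.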
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2193942&group_…
Bugs item #2198717, was opened at 2008-10-26 23:34
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2198717&group_…
Category: General
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: maksim j (maksimpp)
Assigned to: Nobody/Anonymous (nobody)
Summary: Cannot read AllPages
Initial Comment:
Cannot read all categories from it.wikipedia:

nsp = 14
start = u''
for page in mysite.allpages(start=start, namespace=nsp):
    wikipedia.output(page.title())

After Categoria:Progetto:Biografie/Tabella monitoraggio automatico - scrittura nc
it gets Categoria:Birmingham and then falls into an infinite loop.
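An infinite loop like this typically means the listing's continuation value stops advancing, so the same batch is fetched forever. A defensive wrapper can at least turn the hang into an error. This is a hypothetical sketch, not pywikipedia code: `fetch_batch` stands in for one allpages request, returning a batch of titles plus the next start title (or None at the end).

```python
def walk_allpages(fetch_batch, start=u''):
    """Yield titles batch by batch, failing loudly if the
    continuation title does not advance instead of looping forever."""
    while True:
        titles, next_start = fetch_batch(start)
        for title in titles:
            yield title
        if next_start is None:
            return
        if next_start == start:
            raise RuntimeError(
                'allpages continuation stuck at %r' % next_start)
        start = next_start
```

With such a guard the bug above would surface as an exception naming the stuck title rather than a silent 24-hour loop.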
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2198717&group_…
Revision: 6031
Author: misza13
Date: 2008-10-26 18:10:52 +0000 (Sun, 26 Oct 2008)
Log Message:
-----------
adding support for week number in archive page names ('week' variable; first day of the week is Monday; the variable is an integer (specify it as %(week)02d to pad with a zero))
Modified Paths:
--------------
trunk/pywikipedia/archivebot.py
Modified: trunk/pywikipedia/archivebot.py
===================================================================
--- trunk/pywikipedia/archivebot.py 2008-10-26 17:52:05 UTC (rev 6030)
+++ trunk/pywikipedia/archivebot.py 2008-10-26 18:10:52 UTC (rev 6031)
@@ -430,6 +430,7 @@
'month' : TStuple[1],
'monthname' : int2month(TStuple[1]),
'monthnameshort' : int2month_short(TStuple[1]),
+ 'week' : int(time.strftime('%W',TStuple)),
}
archive = archive % vars
if self.feedArchive(archive,t,maxArchSize,vars):
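The new 'week' variable is time.strftime('%W', ...) cast to int: the week of the year with Monday as the first day, which %(week)02d in an archive pattern then zero-pads. A quick standalone check of that formatting path (the archive pattern below is made up for illustration, not one from archivebot.py):

```python
import time

# %W: week of the year, Monday as first day; days before the first
# Monday of the year fall in week 0.  int() strips the zero padding
# so the pattern can re-pad it with %(week)02d.
ts = time.strptime('2008-10-26', '%Y-%m-%d')
week = int(time.strftime('%W', ts))
archive = 'Archive %(year)d/Week %(week)02d' % {'year': ts.tm_year, 'week': week}
```

For a single-digit week the pattern yields e.g. 'Week 07', so archive page names sort correctly.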