Pywikipedia-l October 2008

pywikipedia-l@lists.wikimedia.org

25 participants
195 discussions

[ pywikipediabot-Bugs-2136828 ] wiktionary_family.py - wrong sort order for fr.wikt.

by SourceForge.net

Bugs item #2136828, was opened at 2008-09-29 21:10 Message generated for change (Comment added) made by spacebirdy You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2136828&group_… Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: ulana merops (spacebirdy) Assigned to: Nobody/Anonymous (nobody) Summary: wiktionary_family.py - wrong sort order for fr.wikt. Initial Comment: Please see http://fr.wiktionary.org/wiki/Discussion_Wiktionnaire:Structure_des_article… and remove 'fr': self.alphabetic, in line 416 Syntax on fr.wikt: http://fr.wiktionary.org/wiki/Wiktionnaire:Structure_des_articles#Liens_int… I don't know who added this here but it seems wrong, thanks ---------------------------------------------------------------------- >Comment By: ulana merops (spacebirdy) Date: 2008-10-29 18:25 Message: There will be no response to that ever... I suggest to remove it, since I was notified by a fr.wikt user and this page http://fr.wiktionary.org/wiki/Wiktionnaire:Structure_des_articles#Liens_int… It is quite annoying that this issue is not solved by the fr.wikt community, I am unsure which sort order to use. Thanks for Your time. ---------------------------------------------------------------------- Comment By: NicDumZ — Nicolas Dumazet (nicdumz) Date: 2008-10-20 04:16 Message: I really don't understand that request ; that line has always been here, and used as a convention for all pywikipedia bot running on fr.wikt I left a note, in French, on the project page. ---------------------------------------------------------------------- Comment By: ulana merops (spacebirdy) Date: 2008-10-11 14:02 Message: Please, I would like to update the bot normally without having to remove that line all the time, thanks in advance. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2136828&group_…

15 years, 6 months

SVN: [6041] branches/rewrite/pywikibot/data/api.py

by nicdumz＠svn.wikimedia.org

Revision: 6041 Author: nicdumz Date: 2008-10-29 04:24:40 +0000 (Wed, 29 Oct 2008) Log Message: ----------- yes, query-continue can return ints ... but sometimes also some non-ascii characters :) Modified Paths: -------------- branches/rewrite/pywikibot/data/api.py Modified: branches/rewrite/pywikibot/data/api.py =================================================================== --- branches/rewrite/pywikibot/data/api.py 2008-10-28 22:15:22 UTC (rev 6040) +++ branches/rewrite/pywikibot/data/api.py 2008-10-29 04:24:40 UTC (rev 6041) @@ -409,8 +409,11 @@ raise Error("Missing '%s' key in ['query-continue'] value." % self.module) update = self.data["query-continue"][self.module] - for key in update: - self.request[key] = str(update[key]) + for key, value in update.iteritems(): + # query-continue can return ints + if isinstance(value, int): + value = str(value) + self.request[key] = value def result(self, data): """Process result data as needed for particular subclass."""

15 years, 6 months

SVN: [6040] trunk/pywikipedia/commonsdelinker/image_replacer.py

by btongminh＠svn.wikimedia.org

Revision: 6040 Author: btongminh Date: 2008-10-28 22:15:22 +0000 (Tue, 28 Oct 2008) Log Message: ----------- Made disallowed_replacement regex case insensitive Modified Paths: -------------- trunk/pywikipedia/commonsdelinker/image_replacer.py Modified: trunk/pywikipedia/commonsdelinker/image_replacer.py =================================================================== --- trunk/pywikipedia/commonsdelinker/image_replacer.py 2008-10-28 12:51:49 UTC (rev 6039) +++ trunk/pywikipedia/commonsdelinker/image_replacer.py 2008-10-28 22:15:22 UTC (rev 6040) @@ -43,7 +43,7 @@ self.config.update(getattr(config, 'Replacer', ())) self.template = re.compile(r'\{\{%s\|([^|]*?)\|([^|]*?)(?:(?:\|reason\=(.*?))?)\}\}' % \ self.config['replace_template']) - self.disallowed_replacements = [(re.compile(i[0]), re.compile(i[1])) + self.disallowed_replacements = [(re.compile(i[0], re.I), re.compile(i[1], re.I)) for i in self.config.get('disallowed_replacements', ())] self.site = wikipedia.getSite(persistent_http = True)

15 years, 6 months

[ pywikipediabot-Bugs-2200214 ] yo.wikibooks incorrectly listed as obsolete

by SourceForge.net

Bugs item #2200214, was opened at 2008-10-27 09:12 Message generated for change (Comment added) made by pathoschild You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2200214&group_… Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: General Group: v1.0 (example) Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jesse PW (pathoschild) Assigned to: Nobody/Anonymous (nobody) Summary: yo.wikibooks incorrectly listed as obsolete Initial Comment: Yo.wikibooks is listed as obsolete, but is open. This can be fixed by removing it from the "self.obsolete" dictionary in pywikipedia/families/wikibooks_family.py on line 318. Version at time of report: Pywikipedia [http] trunk/pywikipedia (r6019, Oct 25 2008, 16:16:12) Python 2.6 (r26:66721, Oct 2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] ---------------------------------------------------------------------- >Comment By: Jesse PW (pathoschild) Date: 2008-10-28 15:33 Message: There was a discussion to close it, but it was never done. The message on the main page is the default for all new wikis. ---------------------------------------------------------------------- Comment By: NicDumZ — Nicolas Dumazet (nicdumz) Date: 2008-10-28 11:08 Message: mmmm what about http://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Yo… then ? And http://yo.wikibooks.org/wiki/Oj%C3%BAew%C3%A9_%C3%80k%E1%BB%8D%CC%81k%E1%BB… which reads : "This subdomain is reserved for the creation of a Wikibooks in the Yoruba language. There are currently 4 pages in this Wikibooks." ?? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2200214&group_…

15 years, 6 months

SVN: [6039] branches/rewrite/pywikibot/comms/http.py

by russblau＠svn.wikimedia.org

Revision: 6039 Author: russblau Date: 2008-10-28 12:51:49 +0000 (Tue, 28 Oct 2008) Log Message: ----------- use default user-agent string only if user hasn't set one Modified Paths: -------------- branches/rewrite/pywikibot/comms/http.py Modified: branches/rewrite/pywikibot/comms/http.py =================================================================== --- branches/rewrite/pywikibot/comms/http.py 2008-10-28 10:23:45 UTC (rev 6038) +++ branches/rewrite/pywikibot/comms/http.py 2008-10-28 12:51:49 UTC (rev 6039) @@ -90,7 +90,9 @@ baseuri = "%s://%s/" % (site.protocol(), site.hostname()) uri = urlparse.urljoin(baseuri, uri) - kwargs.setdefault("headers", {})['user-agent'] = useragent + # set default user-agent string + kwargs.setdefault("headers", {}) + kwargs["headers"].setdefault("user-agent", useragent) request = threadedhttp.HttpRequest(uri, *args, **kwargs) http_queue.put(request) request.lock.acquire()

15 years, 6 months

[ pywikipediabot-Bugs-2200214 ] yo.wikibooks incorrectly listed as obsolete

by SourceForge.net

Bugs item #2200214, was opened at 2008-10-27 10:12 Message generated for change (Comment added) made by nicdumz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2200214&group_… Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: General Group: v1.0 (example) Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jesse PW (pathoschild) Assigned to: Nobody/Anonymous (nobody) Summary: yo.wikibooks incorrectly listed as obsolete Initial Comment: Yo.wikibooks is listed as obsolete, but is open. This can be fixed by removing it from the "self.obsolete" dictionary in pywikipedia/families/wikibooks_family.py on line 318. Version at time of report: Pywikipedia [http] trunk/pywikipedia (r6019, Oct 25 2008, 16:16:12) Python 2.6 (r26:66721, Oct 2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] ---------------------------------------------------------------------- Comment By: NicDumZ — Nicolas Dumazet (nicdumz) Date: 2008-10-28 12:08 Message: mmmm what about http://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Yo… then ? And http://yo.wikibooks.org/wiki/Oj%C3%BAew%C3%A9_%C3%80k%E1%BB%8D%CC%81k%E1%BB… which reads : "This subdomain is reserved for the creation of a Wikibooks in the Yoruba language. There are currently 4 pages in this Wikibooks." ?? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2200214&group_…

15 years, 6 months

[ pywikipediabot-Bugs-2193942 ] reading category: memory leak and slow down

by SourceForge.net

Bugs item #2193942, was opened at 2008-10-25 13:10 Message generated for change (Settings changed) made by nicdumz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2193942&group_… Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: category Group: None >Status: Pending >Resolution: Fixed Priority: 5 Private: No Submitted By: Simone Malacarne (smalacarne) Assigned to: Nobody/Anonymous (nobody) Summary: reading category: memory leak and slow down Initial Comment: I need to read a very big category (80.000+ articles). So i just do: site = wikipedia.getSite() cat = catlib.Category(site,'category name') gen = pagegenerators.PreloadingGenerator(cat.articles(), pageNumber=100) for page in gen: do_something problem is that the program start using more and more memory (at the end near 2giga ram). Even cpu time increase over time, if first 10.000 articles are processed in 10 min, second 10.000 double that time and so on... it takes about 20 hours to read all the articles. If i use: gen = pagegenerators.CategorizedPageGenerator(cat , recurse=False, start=u'') instead of PreloadingGenerator i dont have mem or cpu leaks but it's slow as hell to read and articles at the time (more than 24 hours to finish). Pywikipedia [http] trunk/pywikipedia (r6015, Oct 24 2008, 18:29:39) Python 2.5.2 (r252:60911, Oct 5 2008, 19:29:17) [GCC 4.3.2] ---------------------------------------------------------------------- Comment By: NicDumZ — Nicolas Dumazet (nicdumz) Date: 2008-10-28 11:30 Message: Well, guess what ? I have no idea why we would need to cache the content of a category... I guess someone assumed users would iterate through a category several times. Does anyone has a serious usage case of such a behavior ? I might be wrong but I think that you can always serialize in some way your code to avoid calling several times your generator function. Since r6038, the default generator now uses a naive content getter which does not cache anything. ---------------------------------------------------------------------- Comment By: Simone Malacarne (smalacarne) Date: 2008-10-26 21:43 Message: I track the problem to catlib in the category._getContents function. The function cache something but with a lot of pages memory and cpu use is massive. I try to comment 2 lines in this part: else: print ('not Cached') for tag, page in self._parseCategory(purge, startFrom): if tag == ARTICLE: #self.articleCache.append(page) if not page in cache: #cache.append(page) yield ARTICLE, page and all is fine now, memory use is about 20/30mbyte fix and cpu occupation is normal. Don't know what that cache is used for but it caused me lot of trouble. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2193942&group_…

15 years, 6 months

[ pywikipediabot-Bugs-2193942 ] reading category: memory leak and slow down

by SourceForge.net

Bugs item #2193942, was opened at 2008-10-25 13:10 Message generated for change (Comment added) made by nicdumz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2193942&group_… Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: category Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Simone Malacarne (smalacarne) Assigned to: Nobody/Anonymous (nobody) Summary: reading category: memory leak and slow down Initial Comment: I need to read a very big category (80.000+ articles). So i just do: site = wikipedia.getSite() cat = catlib.Category(site,'category name') gen = pagegenerators.PreloadingGenerator(cat.articles(), pageNumber=100) for page in gen: do_something problem is that the program start using more and more memory (at the end near 2giga ram). Even cpu time increase over time, if first 10.000 articles are processed in 10 min, second 10.000 double that time and so on... it takes about 20 hours to read all the articles. If i use: gen = pagegenerators.CategorizedPageGenerator(cat , recurse=False, start=u'') instead of PreloadingGenerator i dont have mem or cpu leaks but it's slow as hell to read and articles at the time (more than 24 hours to finish). Pywikipedia [http] trunk/pywikipedia (r6015, Oct 24 2008, 18:29:39) Python 2.5.2 (r252:60911, Oct 5 2008, 19:29:17) [GCC 4.3.2] ---------------------------------------------------------------------- Comment By: NicDumZ — Nicolas Dumazet (nicdumz) Date: 2008-10-28 11:30 Message: Well, guess what ? I have no idea why we would need to cache the content of a category... I guess someone assumed users would iterate through a category several times. Does anyone has a serious usage case of such a behavior ? I might be wrong but I think that you can always serialize in some way your code to avoid calling several times your generator function. Since r6038, the default generator now uses a naive content getter which does not cache anything. ---------------------------------------------------------------------- Comment By: Simone Malacarne (smalacarne) Date: 2008-10-26 21:43 Message: I track the problem to catlib in the category._getContents function. The function cache something but with a lot of pages memory and cpu use is massive. I try to comment 2 lines in this part: else: print ('not Cached') for tag, page in self._parseCategory(purge, startFrom): if tag == ARTICLE: #self.articleCache.append(page) if not page in cache: #cache.append(page) yield ARTICLE, page and all is fine now, memory use is about 20/30mbyte fix and cpu occupation is normal. Don't know what that cache is used for but it caused me lot of trouble. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2193942&group_…

15 years, 6 months

SVN: [6038] trunk/pywikipedia/catlib.py

by nicdumz＠svn.wikimedia.org

Revision: 6038 Author: nicdumz Date: 2008-10-28 10:23:45 +0000 (Tue, 28 Oct 2008) Log Message: ----------- Fix for [2193942 ] reading category: memory leak and slow down : Let's not assume users want to crawl several times a category, and make the default behavior NON CACHING. Why would a user iterate several times on a category content anyway ? Modified Paths: -------------- trunk/pywikipedia/catlib.py Modified: trunk/pywikipedia/catlib.py =================================================================== --- trunk/pywikipedia/catlib.py 2008-10-27 20:51:31 UTC (rev 6037) +++ trunk/pywikipedia/catlib.py 2008-10-28 10:23:45 UTC (rev 6038) @@ -93,7 +93,7 @@ else: return '[[%s]]' % titleWithSortKey - def _getContents(self, recurse=False, purge=False, startFrom=None, cache=None): + def _getAndCacheContents(self, recurse=False, purge=False, startFrom=None, cache=None): """ Cache results of _parseCategory for a second call. @@ -129,7 +129,7 @@ # contents of subcategory are cached by calling # this method recursively; therefore, do not cache # them again - for item in subcat._getContents(newrecurse, purge, cache=cache): + for item in subcat._getAndCacheContents(newrecurse, purge, cache=cache): yield item else: for tag, page in self._parseCategory(purge, startFrom): @@ -147,11 +147,22 @@ # contents of subcategory are cached by calling # this method recursively; therefore, do not cache # them again - for item in page._getContents(newrecurse, purge, cache=cache): + for item in page._getAndCacheContents(newrecurse, purge, cache=cache): yield item if not startFrom: self.completelyCached = True + def _getContentsNaive(self, recurse=False, startFrom=None): + """ + Simple category content yielder. Naive, do not attempts to + cache anything + """ + for tag, page in self._parseCategory(startFrom=startFrom): + yield tag, page + if tag == SUBCATEGORY and recurse: + for item in page._getContentsNaive(recurse=True): + yield item + def _parseCategory(self, purge=False, startFrom=None): """ Yields all articles and subcategories that are in this category. @@ -259,7 +270,7 @@ else: break - def subcategories(self, recurse=False, startFrom=None): + def subcategories(self, recurse=False, startFrom=None, cacheResults=False): """ Yields all subcategories of the current category. @@ -269,9 +280,18 @@ equivalent to recurse = False, recurse = 1 gives first-level subcategories of subcategories but no deeper, etcetera). + cacheResults - cache the category contents: useful if you need to + do several passes on the category members list. The simple cache + system is *not* meant to be memory or cpu efficient for large + categories + Results a sorted (as sorted by MediaWiki), but need not be unique. """ - for tag, subcat in self._getContents(recurse, startFrom=startFrom): + if cacheResults: + gen = self._getAndCacheContents + else: + gen = self._getContentsNaive + for tag, subcat in gen(recurse=recurse, startFrom=startFrom): if tag == SUBCATEGORY: yield subcat @@ -289,7 +309,7 @@ subcats.append(cat) return unique(subcats) - def articles(self, recurse=False, startFrom=None): + def articles(self, recurse=False, startFrom=None, cacheResults=False): """ Yields all articles of the current category. @@ -297,10 +317,19 @@ Recurse can be a number to restrict the depth at which subcategories are included. + cacheResults - cache the category contents: useful if you need to + do several passes on the category members list. The simple cache + system is *not* meant to be memory or cpu efficient for large + categories + Results are unsorted (except as sorted by MediaWiki), and need not be unique. """ - for tag, page in self._getContents(recurse, startFrom=startFrom): + if cacheResults: + gen = self._getAndCacheContents + else: + gen = self._getContentsNaive + for tag, page in gen(recurse=recurse, startFrom=startFrom): if tag == ARTICLE: yield page @@ -342,7 +371,7 @@ def isEmpty(self): # TODO: rename; naming conflict with Page.isEmpty - for tag, title in self._getContents(purge = True): + for tag, title in self._parseCategory(): return False return True

15 years, 6 months

[ pywikipediabot-Patches-2192349 ] Faroese (fo) translations for interwiki.py

by SourceForge.net

Patches item #2192349, was opened at 2008-10-24 19:20 Message generated for change (Settings changed) made by wikipedian You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2192349&group_… Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Translations Group: None >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Christoffer (roboticuskhan) Assigned to: Nobody/Anonymous (nobody) Summary: Faroese (fo) translations for interwiki.py Initial Comment: I have submitted a patch file containing the Faroese translations for interwiki.py provided by User:Quackor (http://fo.wikipedia.org/wiki/Br%C3%BAkari:Quackor), a native speaker of Faroese. Thanks. ---------------------------------------------------------------------- Comment By: Christoffer (roboticuskhan) Date: 2008-10-24 19:23 Message: I forgot; I got the translations here: http://fo.wikipedia.org/wiki/Wikipedia_kjak:%C3%81heitan_um_bott_st%C3%B8%C… ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2192349&group_…

15 years, 6 months

Jump to page:

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

Pywikipedia-l October 2008