Hi folks!
I create pages for Hungarian Wikipedia like http://hu.wikipedia.org/wiki/Wikip%C3%A9dia:K%C3%A9rt_cikkek/fr, http://hu.wikipedia.org/wiki/Wikip%C3%A9dia:K%C3%A9rt_cikkek/en etc. These collect Hungary-related articles from other Wikipedias that have no Hungarian interwiki. Either they need an interwiki link added, or they are good candidates for new Hungarian articles.
First I collect all the pages with replace.py, then I upload them and process the list with a newly developed script which I will soon offer for Pywikipedia because it can be used in other Wikipedias.
It ran successfully on the en, fr and ro wikis but stopped on eswiki. My command: *replace.py -catr:Hungría . @ -lang:es -excepttext:"[[hu:" -savenew:magyarok.txt -always*
The error message follows here. As far as I understand, it comes from Python rather than pywiki, but could we somehow handle it?

  File "C:\Program Files\Pywikipedia\catlib.py", line 167, in _getContentsNaive
    for item in page._getContentsNaive(recurse=True):
  [the frame above repeats many more times]
  File "C:\Program Files\Pywikipedia\catlib.py", line 164, in _getContentsNaive
    for tag, page in self._parseCategory(startFrom=startFrom):
  File "C:\Program Files\Pywikipedia\catlib.py", line 215, in _parseCategory
    data = query.GetData(params, self.site())
  File "C:\Program Files\Pywikipedia\query.py", line 132, in GetData
    jsontext = json.loads( jsontext )
  File "C:\Program Files\Pywikipedia\simplejson\__init__.py", line 262, in loads
    return _default_decoder.decode(s)
  File "C:\Program Files\Pywikipedia\simplejson\decoder.py", line 251, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Program Files\Pywikipedia\simplejson\decoder.py", line 268, in raw_decode
    obj, end = self._scanner.iterscan(s, **kw).next()
  File "C:\Program Files\Pywikipedia\simplejson\scanner.py", line 50, in iterscan
    rval, next_pos = action(m, context)
  File "C:\Program Files\Pywikipedia\simplejson\decoder.py", line 138, in JSONObject
    value, end = iterscan(s, idx=end, context=context).next()
  File "C:\Program Files\Pywikipedia\simplejson\scanner.py", line 50, in iterscan
    rval, next_pos = action(m, context)
  File "C:\Program Files\Pywikipedia\simplejson\decoder.py", line 138, in JSONObject
    value, end = iterscan(s, idx=end, context=context).next()
  File "C:\Program Files\Pywikipedia\simplejson\scanner.py", line 50, in iterscan
    rval, next_pos = action(m, context)
  File "C:\Program Files\Pywikipedia\simplejson\decoder.py", line 171, in JSONArray
    value, end = iterscan(s, idx=end, context=context).next()
  File "C:\Program Files\Pywikipedia\simplejson\scanner.py", line 50, in iterscan
    rval, next_pos = action(m, context)
  File "C:\Program Files\Pywikipedia\simplejson\decoder.py", line 138, in JSONObject
    value, end = iterscan(s, idx=end, context=context).next()
  File "C:\Program Files\Pywikipedia\simplejson\scanner.py", line 50, in iterscan
    rval, next_pos = action(m, context)
  File "C:\Program Files\Pywikipedia\simplejson\decoder.py", line 113, in JSONString
    return scanstring(match.string, match.end(), encoding)
  File "C:\Program Files\Pywikipedia\simplejson\decoder.py", line 85, in scanstring
    if terminator == '"':
RuntimeError: maximum recursion depth exceeded in cmp
maximum recursion depth exceeded in cmp
935 titles were saved.
I have only 256 MB RAM in my computer. Could this cause the above problem? Is there any connection?
No, you just ran into a category loop. If A is the main category and B is a subcategory of A, then a chain such as A > B > C > D > A is one example of a loop. This is fairly common in categories.
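To make the loop explanation concrete, here is a minimal sketch using a plain dict as a stand-in for the category tree. The names `naive_walk` and `hits_recursion_limit` and the toy data are assumptions for illustration only; they are not part of pywikipedia. It suggests that the failure depends on the interpreter's recursion limit and the shape of the graph, not on installed RAM.

```python
import sys


def naive_walk(graph, cat):
    """Yield cat and everything below it, with no protection against loops."""
    yield cat
    for sub in graph.get(cat, []):
        for item in naive_walk(graph, sub):
            yield item


def hits_recursion_limit(graph, start):
    """Return True if walking from start exhausts the recursion limit."""
    try:
        for _ in naive_walk(graph, start):
            pass
    except RuntimeError:  # RecursionError subclasses RuntimeError in Python 3
        return True
    return False


# Hypothetical data: D points back to A, so the walk never bottoms out.
loopy = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"]}
# The same chain without the back link.
tree = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}

print(sys.getrecursionlimit())  # a fixed interpreter setting
print(hits_recursion_limit(loopy, "A"))
print(hits_recursion_limit(tree, "A"))
```

Only the graph with the back link from D to A runs into the limit; the plain tree finishes normally.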
On Sun, Feb 27, 2011 at 7:50 AM, Bináris wikiposta@gmail.com wrote:
> I have only 256 MB RAM in my computer. Could this cause the above problem? Is there any connection?
> -- Bináris
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
On Sun, Feb 27, 2011 at 6:58 AM, John phoenixoverride@gmail.com wrote:
> No, you just ran into a category loop. If A is the main category and B is a subcategory of A, then a chain such as A > B > C > D > A is one example of a loop. This is fairly common in categories.
As a tip for future reference (e.g. in case someone googles this), I've had good success using Python's built-in set() objects for handling this problem. They're fast and memory efficient. So my code goes something like:

    seenCategories = set()
    # then in my loop:
    if cat.title() not in seenCategories:
        seenCategories.add(cat.title())
        # do something with this category
Cheers, Morten
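The seen-set idea can be sketched as a complete, runnable traversal. This is a minimal illustration under assumptions: a plain dict stands in for the category tree, and `walk_once` is a hypothetical helper, not a pywikipedia function. It also uses an explicit stack instead of recursion, so the recursion limit never comes into play.

```python
def walk_once(graph, start):
    """Yield each category reachable from start exactly once, loops or not."""
    seen = set()
    stack = [start]
    while stack:
        cat = stack.pop()
        if cat in seen:
            continue  # already handled; this is what breaks the loop
        seen.add(cat)
        yield cat
        stack.extend(graph.get(cat, []))


# Hypothetical data with a loop: D points back to A.
loopy = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"]}
print(sorted(walk_once(loopy, "A")))  # ['A', 'B', 'C', 'D']
```

Keying the set on titles, as in the snippet above, keeps the membership test cheap even for large category trees.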
2011/2/28 Morten Wang nettrom@gmail.com
> seenCategories = set()
> # then in my loop:
> if cat.title() not in seenCategories:
>     seenCategories.add(cat.title())
>     # do something with this category
I think the "if" line is not necessary here because of how sets are defined. Just add. I am trying to find this part in pagegenerators. Either the script should detect when a category comes up for processing a second time and tell it not to be so greedy, or it should detect the depth of recursion in an except RuntimeError: and step back one level.
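The second option, stepping back when the recursion gets too deep, could be sketched with an explicit depth cap rather than by catching RuntimeError after the fact. This swaps in a different technique from the one proposed above, and `walk_limited`, `max_depth` and the toy data are assumptions for illustration, not pywikipedia code.

```python
def walk_limited(graph, cat, max_depth, depth=0):
    """Yield (depth, category) pairs, refusing to descend past max_depth."""
    yield depth, cat
    if depth >= max_depth:
        return  # step back instead of recursing further
    for sub in graph.get(cat, []):
        for item in walk_limited(graph, sub, max_depth, depth + 1):
            yield item


# Hypothetical two-category loop.
loopy = {"A": ["B"], "B": ["A"]}
visits = list(walk_limited(loopy, "A", max_depth=3))
print(visits)  # [(0, 'A'), (1, 'B'), (2, 'A'), (3, 'B')]
```

The cap guarantees termination, but looped categories are still visited repeatedly until the cap is reached, so detecting repeats directly is likely the cleaner of the two options.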
On Mon, Feb 28, 2011 at 10:50 AM, Bináris wikiposta@gmail.com wrote:
> I think the "if" line is not necessary here because of definition of sets. Just add.
The if-line is there to make sure that the category is not used/traversed more than once, which then makes sure that loops are never a problem, sorry if that was not obvious from my code snippet. So the code translates to "if this category has not been seen before, add the category title to the set of seen categories, and continue processing this category".
You could of course also do the opposite, skip the category if it's already in the set.
Cheers, Morten
Yes, my remark was only half valid. The "if" is necessary for the actions, not for storing. Anyway, not a big difference in efficiency. :-) I mean, adding to a set works with a builtin "if" that is slightly faster than an interpreted if line, and anyway, it will be executed.

    isNew = cat.title() not in seenCategories
    seenCategories.add(cat.title())  # unconditional; a set ignores duplicates
    if isNew:
        # do something with this category
I think I HAVE FOUND THE BUG! (See the original problem below in quotation.)
First, let me show you my experiment. This script works on the category mentioned below, which possibly contains a loop.
    kat = u"Hungría"
    site = pywikibot.getSite('es')
    katpage = catlib.Category(site, kat)
    katlista = catlib.unique(list(katpage.subcategories(recurse=True, cacheResults=True)))  #1
    # katlista = katpage.subcategoriesList(recurse=True)#, cacheResults=True)  #2
With line #1 active and line #2 commented out, this script runs for 12 minutes and gathers 655 categories. With line #2 active and line #1 commented out, it runs for a very long time and then exits with the same Python (not pywiki) error as mentioned below -- maximum recursion depth exceeded. The only difference between them is that subcategoriesList has no *cacheResults* parameter. I use subcategories with cacheResults instead, then I make a unique list as subcategoriesList does.
Now, when I use the command *replace.py -catr:Hungría*, -catr in pagegenerators uses getCategoryGen, getCategoryGen calls CategorizedPageGenerator, and CategorizedPageGenerator calls Category.articles from catlib.py. Category.articles does have a parameter called cacheResults just as subcategories does, but getCategoryGen and getCategoryGen cannot handle it. So I think this should be patched so that -catr can call Category.articles with caching, and thus the generator could avoid infinite loops in categories.
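The proposal amounts to forwarding one keyword argument through the wrapper layers. The sketch below shows that shape with toy stand-ins only: `ToyCategory`, `categorized_page_generator`, `cache_results` and the data are hypothetical and do not reproduce the catlib or pagegenerators code. It assumes, as the message argues, that the cached path is what keeps an already-visited category from being expanded again.

```python
class ToyCategory(object):
    """Hypothetical stand-in for a category object; not the catlib API."""

    def __init__(self, graph, pages, title):
        self.graph = graph  # title -> list of subcategory titles
        self.pages = pages  # title -> list of article titles
        self.title = title

    def articles(self, recurse=False, cache_results=False, _seen=None):
        """Yield article titles; with cache_results, expand each category once."""
        if cache_results:
            _seen = set() if _seen is None else _seen
            if self.title in _seen:
                return
            _seen.add(self.title)
        for page in self.pages.get(self.title, []):
            yield page
        if recurse:
            for sub in self.graph.get(self.title, []):
                child = ToyCategory(self.graph, self.pages, sub)
                for page in child.articles(True, cache_results, _seen):
                    yield page


def categorized_page_generator(category, recurse=False, cache_results=False):
    """Wrapper that forwards cache_results instead of dropping it."""
    return category.articles(recurse=recurse, cache_results=cache_results)


# Hypothetical looped data: Sub lists Root as its own subcategory.
graph = {"Root": ["Sub"], "Sub": ["Root"]}
pages = {"Root": ["Page1"], "Sub": ["Page2"]}
root = ToyCategory(graph, pages, "Root")
print(list(categorized_page_generator(root, recurse=True, cache_results=True)))
```

In this toy, the loop is harmless once the flag reaches the method that does the recursion; whether the real generators behave the same way would need to be checked against the actual code.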
Errata:
2011/3/9 Bináris wikiposta@gmail.com
> Category.articles does have a parameter called cacheResults just as subcategories does, but getCategoryGen and getCategoryGen cannot handle it.
...but getCategoryGen and *CategorizedPageGenerator* cannot handle it, of course.