First, let me show you my experiment. This script works on the category
mentioned below, which possibly contains a loop.
kat = u"Hungría"
site = pywikibot.getSite('es')
katpage = catlib.Category(site, kat)
katlista = catlib.unique(list(katpage.subcategories(recurse=True, cacheResults=True)))  # 1
# katlista = katpage.subcategoriesList(recurse=True)  # , cacheResults=True)  # 2
With #1 active and #2 commented out, this script runs for 12 minutes and
gathers 655 categories.
With #2 active and #1 commented out, it runs for a very long time and then
exits with the same Python (not pywiki) error as mentioned below: maximum
recursion depth exceeded.
The only difference between them is that subcategoriesList has no
*cacheResults* parameter. So I use subcategories with cacheResults instead,
then I make the list unique, as subcategoriesList does.
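
To illustrate why remembering visited categories matters, here is a toy
sketch. It is not pywikipedia code: the dict SUBCATS and all three function
names are made up for the example, and it is written for Python 3. The naive
walker recurses the same way on every visit, so a loop in the graph drives it
into Python's recursion limit; the second walker skips anything it has already
seen and terminates.

```python
# Toy model only: a dict stands in for the wiki's category graph.
# All names here are hypothetical and are not part of catlib.
SUBCATS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": [],
    "D": ["B"],  # D lists B again, so B -> D -> B is a loop
}


def naive_subcategories(cat):
    """Recurse without remembering anything; cannot finish on a loop."""
    for sub in SUBCATS[cat]:
        yield sub
        for item in naive_subcategories(sub):
            yield item


def cached_subcategories(cat, seen=None):
    """Recurse, but skip every category that was already visited."""
    if seen is None:
        seen = set()
    for sub in SUBCATS[cat]:
        if sub in seen:
            continue
        seen.add(sub)
        yield sub
        for item in cached_subcategories(sub, seen):
            yield item


def naive_hits_recursion_limit(cat):
    """Return True if the naive walk dies with Python's recursion error."""
    try:
        for _ in naive_subcategories(cat):
            pass
    except RuntimeError:  # RecursionError is a subclass of this in Python 3
        return True
    return False
```

On this toy graph the cached walker yields each subcategory once, while the
naive one keeps descending through B and D until the interpreter gives up.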
Now, when I use the command *replace.py -catr:Hungría*, -catr in
pagegenerators uses getCategoryGen, getCategoryGen calls
CategorizedPageGenerator, and CategorizedPageGenerator calls
category.articles from catlib.py. Category.articles does have a
cacheResults parameter, just as subcategories does, but getCategoryGen and
CategorizedPageGenerator cannot handle it. So I think this should be hacked
so that -catr can use category.articles with the cache, and thus the
generator could avoid infinite loops in categories.
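
As a rough idea of the behaviour I would like from a recursive category
generator, here is another toy sketch (Python 3). Nothing in it is the real
pagegenerators or catlib API: SUBCATS, ARTICLES and articles_recursive are
hypothetical names. It walks the tree with a queue instead of recursion, so it
neither loops forever nor depends on the recursion limit, and it yields every
article only once.

```python
from collections import deque

# Hypothetical toy data; nothing here is read from a wiki.
SUBCATS = {"Root": ["X", "Y"], "X": ["Y"], "Y": ["X"]}  # X and Y form a loop
ARTICLES = {"Root": ["p1"], "X": ["p2", "p3"], "Y": ["p3", "p4"]}


def articles_recursive(root):
    """Yield each article under root once, visiting each category once."""
    seen_cats = {root}
    seen_pages = set()
    queue = deque([root])
    while queue:
        cat = queue.popleft()
        for page in ARTICLES.get(cat, []):
            if page not in seen_pages:
                seen_pages.add(page)
                yield page
        for sub in SUBCATS.get(cat, []):
            if sub not in seen_cats:  # a looping category is queued only once
                seen_cats.add(sub)
                queue.append(sub)
```

For example, list(articles_recursive("Root")) walks Root, X and Y exactly once
each, even though X and Y are subcategories of each other.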
--
Bináris
2011/2/27 Bináris
wikiposta@gmail.com
> Hi folks!
>
> I create pages for Hungarian Wikipedia like
> http://hu.wikipedia.org/wiki/Wikip%C3%A9dia:K%C3%A9rt_cikkek/fr,
> http://hu.wikipedia.org/wiki/Wikip%C3%A9dia:K%C3%A9rt_cikkek/en etc. These collect
> Hungary-related articles from other Wikipedias that have no Hungarian
> interwiki. Each of them either needs an interwiki link or is a good
> candidate for a new Hungarian article.
>
> First I collect all the pages with replace.py, then I upload them and
> process the list with a newly developed script which I will soon offer for
> Pywikipedia because it can be used in other Wikipedias.
>
> It ran successfully on the en, fr and ro wikis but stopped on eswiki.
> My command:
> *replace.py -catr:Hungría . @ -lang:es -excepttext:"[[hu:"
> -savenew:magyarok.txt -always*
>
> The error message follows here. As far as I understand it comes from Python
> rather than pywiki, but could we somehow handle it?
>   File "C:\Program Files\Pywikipedia\catlib.py", line 167, in _getContentsNaive
>     for item in page._getContentsNaive(recurse=True):
>   [Previous frame repeated 122 more times]
>   File "C:\Program Files\Pywikipedia\catlib.py", line 164, in _getContentsNaive
>     for tag, page in self._parseCategory(startFrom=startFrom):
>   File "C:\Program Files\Pywikipedia\catlib.py", line 215, in _parseCategory
>     data = query.GetData(params, self.site())
>   File "C:\Program Files\Pywikipedia\query.py", line 132, in GetData
>     jsontext = json.loads( jsontext )
>   File "C:\Program Files\Pywikipedia\simplejson\__init__.py", line 262, in loads
>     return _default_decoder.decode(s)
>   File "C:\Program Files\Pywikipedia\simplejson\decoder.py", line 251, in decode
>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>   File "C:\Program Files\Pywikipedia\simplejson\decoder.py", line 268, in raw_decode
>     obj, end = self._scanner.iterscan(s, **kw).next()
>   File "C:\Program Files\Pywikipedia\simplejson\scanner.py", line 50, in iterscan
>     rval, next_pos = action(m, context)
>   File "C:\Program Files\Pywikipedia\simplejson\decoder.py", line 138, in JSONObject
>     value, end = iterscan(s, idx=end, context=context).next()
>   File "C:\Program Files\Pywikipedia\simplejson\scanner.py", line 50, in iterscan
>     rval, next_pos = action(m, context)
>   File "C:\Program Files\Pywikipedia\simplejson\decoder.py", line 138, in JSONObject
>     value, end = iterscan(s, idx=end, context=context).next()
>   File "C:\Program Files\Pywikipedia\simplejson\scanner.py", line 50, in iterscan
>     rval, next_pos = action(m, context)
>   File "C:\Program Files\Pywikipedia\simplejson\decoder.py", line 171, in JSONArray
>     value, end = iterscan(s, idx=end, context=context).next()
>   File "C:\Program Files\Pywikipedia\simplejson\scanner.py", line 50, in iterscan
>     rval, next_pos = action(m, context)
>   File "C:\Program Files\Pywikipedia\simplejson\decoder.py", line 138, in JSONObject
>     value, end = iterscan(s, idx=end, context=context).next()
>   File "C:\Program Files\Pywikipedia\simplejson\scanner.py", line 50, in iterscan
>     rval, next_pos = action(m, context)
>   File "C:\Program Files\Pywikipedia\simplejson\decoder.py", line 113, in JSONString
>     return scanstring(match.string, match.end(), encoding)
>   File "C:\Program Files\Pywikipedia\simplejson\decoder.py", line 85, in scanstring
>     if terminator == '"':
> RuntimeError: maximum recursion depth exceeded in cmp
> maximum recursion depth exceeded in cmp
> 935 titles were saved.
>
>