Bugs item #2269688, was opened at 2008-11-12 15:03
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2269688&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Yann Forget (yannforget)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unicode error with djvutext.py
Initial Comment:
On fr.wikisource:
python djvutext.py -index:Livre:Le_Th%C3%A9%C3%A2tre_de_la_R%C3%A9volution._Le_Quatorze_Juillet._Danton._Les_Loups.djvu -djvu:Le_quatorze_juillet_Danton_Les_loups.djvu -pages:375
Checked for running processes. 1 processes currently running, including the current process.
Traceback (most recent call last):
  File "djvutext.py", line 249, in <module>
    main()
  File "djvutext.py", line 236, in main
    wikipedia.output("uploading text from %s to %s" % (djvu, index_page) )
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13: ordinal not in range(128)
python version.py
Pywikipedia [http] trunk/pywikipedia (r6084, Nov 11 2008, 21:51:31)
Python 2.5.2 (r252, Sep 13 2008, 22:55:01)
[GCC 4.1.2 (Gentoo 4.1.2 p1.1)]
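One plausible reading of the traceback above, offered as an assumption rather than a confirmed diagnosis: under Python 2, when a unicode value and a UTF-8 byte string are formatted with the same "%" operator, the interpreter decodes the bytes with the ASCII codec, which fails on the first non-ASCII byte. The sketch below imitates that implicit decode in Python 3. The names implicit_ascii_decode and safe_decode and the sample title are hypothetical, chosen for illustration; they are not part of djvutext.py.

```python
# Python 3 sketch of the implicit ASCII decode that Python 2 applies when a
# byte string and a unicode string meet in one "%" formatting expression.


def implicit_ascii_decode(raw: bytes) -> str:
    """Imitate Python 2's implicit str -> unicode coercion (ASCII codec)."""
    return raw.decode("ascii")


def safe_decode(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode explicitly with a known encoding before formatting."""
    return raw.decode(encoding)


# Hypothetical sample: a UTF-8 encoded page title, not taken from the script.
title_bytes = "[[Livre:Le_Théâtre_de_la_Révolution]]".encode("utf-8")

try:
    implicit_ascii_decode(title_bytes)
    failure = None
except UnicodeDecodeError as exc:
    failure = exc  # kept so the failing byte offset can be inspected

# Decoding first keeps the whole message in one text type.
message = "uploading text from %s to %s" % (
    "Le_quatorze_juillet_Danton_Les_loups.djvu",
    safe_decode(title_bytes),
)
```

In this sample the ASCII decode fails at offset 13, the first byte of the "é", which matches the position in the reported error if the page title was rendered with a leading "[[". That match is an inference from the traceback, not something the report states.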
----------------------------------------------------------------------
Feature Requests item #2269013, was opened at 2008-11-12 12:42
Message generated for change (Comment added) made by yannforget
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603141&aid=2269013&group_…
Category: None
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: Yann Forget (yannforget)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add options -cat and -file to fixing_redirects.py
Initial Comment:
Hello,
Please add the options -cat (changing all pages in a category) and -file (changing all pages listed in a file) to fixing_redirects.py
Thanks, Yann
----------------------------------------------------------------------
>Comment By: Yann Forget (yannforget)
Date: 2008-11-12 12:44
Message:
and also to movepages.py
----------------------------------------------------------------------
Bugs item #2193942, was opened at 2008-10-25 11:10
Message generated for change (Comment added) made by sf-robot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2193942&group_…
Category: category
Group: None
>Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Simone Malacarne (smalacarne)
Assigned to: Nobody/Anonymous (nobody)
Summary: reading category: memory leak and slow down
Initial Comment:
I need to read a very large category (80,000+ articles).
So I just do:
import wikipedia, catlib, pagegenerators

site = wikipedia.getSite()
cat = catlib.Category(site, 'category name')
gen = pagegenerators.PreloadingGenerator(cat.articles(), pageNumber=100)
for page in gen:
    do_something(page)  # placeholder for the per-page work
The problem is that the program uses more and more memory (close to 2 GB of RAM by the end). CPU time also increases over time: if the first 10,000 articles are processed in 10 minutes, the second 10,000 take double that time, and so on. It takes about 20 hours to read all the articles.
If I use:
gen = pagegenerators.CategorizedPageGenerator(cat, recurse=False, start=u'')
instead of PreloadingGenerator, I don't have memory or CPU leaks, but it is very slow because it reads one article at a time (more than 24 hours to finish).
Pywikipedia [http] trunk/pywikipedia (r6015, Oct 24 2008, 18:29:39)
Python 2.5.2 (r252:60911, Oct 5 2008, 19:29:17)
[GCC 4.3.2]
----------------------------------------------------------------------
>Comment By: SourceForge Robot (sf-robot)
Date: 2008-11-12 02:20
Message:
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).
----------------------------------------------------------------------
Comment By: NicDumZ Nicolas Dumazet (nicdumz)
Date: 2008-10-28 10:30
Message:
Well, guess what? I have no idea why we would need to cache the content of
a category... I guess someone assumed users would iterate through a
category several times. Does anyone have a serious use case for such
behavior? I might be wrong, but I think you can always serialize your code
in some way to avoid calling your generator function several times.
Since r6038, the default generator uses a naive content getter which
does not cache anything.
----------------------------------------------------------------------
Comment By: Simone Malacarne (smalacarne)
Date: 2008-10-26 20:43
Message:
I tracked the problem to catlib, in the category._getContents function.
The function caches something, but with a lot of pages the memory and CPU
use is massive.
I tried commenting out 2 lines in this part:
else:
    print ('not Cached')
    for tag, page in self._parseCategory(purge, startFrom):
        if tag == ARTICLE:
            #self.articleCache.append(page)
            if not page in cache:
                #cache.append(page)
                yield ARTICLE, page
and all is fine now: memory use stays fixed at about 20-30 MB and CPU usage
is normal.
I don't know what that cache is used for, but it caused me a lot of trouble.
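The slowdown described above is consistent with a membership test against a list that grows with every page, since each test scans everything seen so far. This is an inference from the quoted snippet, not a profiled result. The sketch below contrasts such a list-backed cache with a plain streaming generator. CountingTitle, cached_contents, naive_contents, and count_comparisons are hypothetical stand-ins written for illustration; none of them is the catlib code.

```python
# Sketch contrasting a list-backed cache with a plain streaming generator.
# Neither generator is the catlib implementation; both are simplified stand-ins.


class CountingTitle:
    """Hypothetical page stand-in that counts equality comparisons."""

    comparisons = 0

    def __init__(self, title):
        self.title = title

    def __eq__(self, other):
        CountingTitle.comparisons += 1
        return self.title == other.title


def cached_contents(pages):
    """Yield each page once, remembering every page in a growing list."""
    cache = []
    for page in pages:
        if page not in cache:  # linear scan of everything seen so far
            cache.append(page)
            yield page


def naive_contents(pages):
    """Yield pages as they arrive and keep nothing in memory."""
    for page in pages:
        yield page


def count_comparisons(generator_func, n):
    """Run generator_func over n distinct pages; return (yielded, comparisons)."""
    CountingTitle.comparisons = 0
    pages = (CountingTitle("Page %d" % i) for i in range(n))
    yielded = sum(1 for _ in generator_func(pages))
    return yielded, CountingTitle.comparisons
```

With 200 distinct pages the cached version performs 19,900 comparisons and with 400 pages it performs 79,800, so doubling the input roughly quadruples the work. The naive version performs none and holds no list in memory.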
----------------------------------------------------------------------