[Pywikipedia-l] [ pywikipediabot-Bugs-1842905 ] [patch] catlib _getContentsAndSupercats performance issue
SourceForge.net
noreply at sourceforge.net
Tue Dec 4 02:14:03 UTC 2007
Bugs item #1842905, was opened at 2007-12-02 21:39
Message generated for change (Settings changed) made by toobaz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1842905&group_id=93107
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: category
Group: None
>Status: Deleted
Resolution: None
Priority: 5
Private: No
Submitted By: Pietro Battiston (toobaz)
Assigned to: Nobody/Anonymous (nobody)
Summary: [patch] catlib _getContentsAndSupercats performance issue
Initial Comment:
catlib.py's _getContentsAndSupercats method has a performance issue that, in some cases, can greatly slow down recursively downloading all pages or subcategories of a category.
See this example (chosen because it is short to report, not because it is especially pathological):
###########ipython output###############
In [1]: import catlib
Checked for running processes. 1 processes currently running, including the current process.
In [2]: len(catlib.Category('it', 'Categoria:Geometria descrittiva').articlesList(recurse=True))
Getting [[Categoria:Geometria descrittiva]]...
Getting [[Categoria:Coperture a volta]]...
Getting [[Categoria:Corrispondenza biunivoca (geometria descrittiva)]]...
Getting [[Categoria:Curve piane]]...
Getting [[Categoria:Curve tridimensionali]]...
Getting [[Categoria:Glossario (geometria descrittiva)]]...
Getting [[Categoria:Metodi di rappresentazione]]...
Getting [[Categoria:Modellazione geometrica]]...
Getting [[Categoria:Tassellazioni]]...
Getting [[Categoria:Poliedri]]...
Getting [[Categoria:Tassellazioni]]...
Getting [[Categoria:Problemi di misura]]...
Getting [[Categoria:Stub geometria descrittiva]]...
Getting [[Categoria:Superfici]]...
Getting [[Categoria:Sviluppo di solidi]]...
Getting [[Categoria:Tangenza]]...
Out[2]: 393
###########end ipython output###############
As you can see, [[Categoria:Tassellazioni]] is downloaded twice. I can assure you that there are many much worse cases.
Anyway, I'm attaching a patch. With the patch applied, the same commands give:
###########ipython output###############
In [1]: import catlib
Checked for running processes. 1 processes currently running, including the current process.
In [2]: len(catlib.Category('it', 'Categoria:Geometria descrittiva').articlesList(recurse=True))
Getting [[Categoria:Geometria descrittiva]]...
Getting [[Categoria:Coperture a volta]]...
Getting [[Categoria:Corrispondenza biunivoca (geometria descrittiva)]]...
Getting [[Categoria:Curve piane]]...
Getting [[Categoria:Curve tridimensionali]]...
Getting [[Categoria:Glossario (geometria descrittiva)]]...
Getting [[Categoria:Metodi di rappresentazione]]...
Getting [[Categoria:Modellazione geometrica]]...
Getting [[Categoria:Tassellazioni]]...
Getting [[Categoria:Poliedri]]...
Getting [[Categoria:Problemi di misura]]...
Getting [[Categoria:Stub geometria descrittiva]]...
Getting [[Categoria:Superfici]]...
Getting [[Categoria:Sviluppo di solidi]]...
Getting [[Categoria:Tangenza]]...
Out[2]: 393
###########end ipython output###############
Note that this patch also solves the problem of possible loops in categories: catlib will no longer loop.
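The attached patch is not reproduced in this message. As a rough illustration only of the general idea (remember which categories have already been fetched, and skip them on later encounters), here is a minimal sketch. It does not use catlib: collect_articles, get_contents and the toy GRAPH data are hypothetical names invented for this example, and the real patch may work differently.

```python
def collect_articles(root, get_contents):
    """Return all articles under root, fetching each category at most once.

    get_contents is a hypothetical callable mapping a category name to a
    pair (articles, subcategories); it stands in for the network fetch.
    """
    seen = {root}      # categories already fetched or queued for fetching
    queue = [root]
    articles = set()
    while queue:
        cat = queue.pop()
        cat_articles, subcats = get_contents(cat)
        articles.update(cat_articles)
        for sub in subcats:
            if sub not in seen:   # skip repeats; this also breaks cycles
                seen.add(sub)
                queue.append(sub)
    return articles


# Toy category graph (invented data): C is reachable through both A and B,
# and C lists Root as a subcategory, which forms a loop.
GRAPH = {
    "Root": (["r1"], ["A", "B"]),
    "A": (["a1"], ["C"]),
    "B": (["b1"], ["C"]),
    "C": (["c1", "a1"], ["Root"]),
}

fetched = []  # records every simulated download


def fake_get_contents(cat):
    fetched.append(cat)
    return GRAPH[cat]


result = collect_articles("Root", fake_get_contents)
print(len(result), sorted(fetched))
```

In this toy run each of the four categories is fetched exactly once, even though C is reachable by two paths and links back to Root. Without the seen set, the same traversal would fetch C twice and then never terminate because of the loop.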