[Pywikipedia-l] [ pywikipediabot-Bugs-1842905 ] [patch] catlib _getContentsAndSupercats performance issue

SourceForge.net noreply at sourceforge.net
Sun Dec 2 20:39:27 UTC 2007


Bugs item #1842905, was opened at 2007-12-02 21:39
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1842905&group_id=93107

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: category
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Pietro Battiston (toobaz)
Assigned to: Nobody/Anonymous (nobody)
Summary: [patch] catlib _getContentsAndSupercats performance issue

Initial Comment:
catlib.py's _getContentsAndSupercats method has a performance issue that, in some cases, can greatly slow down recursively downloading all pages or subcategories of a category.

See this example (chosen because it is short to report, not because it is especially pathological):

###########ipython output###############
In [1]: import catlib
Checked for running processes. 1 processes currently running, including the current process.

In [2]: len(catlib.Category('it', 'Categoria:Geometria descrittiva').articlesList(recurse=True))
Getting [[Categoria:Geometria descrittiva]]...
Getting [[Categoria:Coperture a volta]]...
Getting [[Categoria:Corrispondenza biunivoca (geometria descrittiva)]]...
Getting [[Categoria:Curve piane]]...
Getting [[Categoria:Curve tridimensionali]]...
Getting [[Categoria:Glossario (geometria descrittiva)]]...
Getting [[Categoria:Metodi di rappresentazione]]...
Getting [[Categoria:Modellazione geometrica]]...
Getting [[Categoria:Tassellazioni]]...
Getting [[Categoria:Poliedri]]...
Getting [[Categoria:Tassellazioni]]...
Getting [[Categoria:Problemi di misura]]...
Getting [[Categoria:Stub geometria descrittiva]]...
Getting [[Categoria:Superfici]]...
Getting [[Categoria:Sviluppo di solidi]]...
Getting [[Categoria:Tangenza]]...
Out[2]: 393
###########end ipython output###############

As you can see, [[Categoria:Tassellazioni]] is downloaded twice, and I can assure you that there are much worse cases.

Anyway, I'm attaching a patch. After the patch, here are the same commands:

###########ipython output###############
In [1]: import catlib
Checked for running processes. 1 processes currently running, including the current process.

In [2]: len(catlib.Category('it', 'Categoria:Geometria descrittiva').articlesList(recurse=True))
Getting [[Categoria:Geometria descrittiva]]...
Getting [[Categoria:Coperture a volta]]...
Getting [[Categoria:Corrispondenza biunivoca (geometria descrittiva)]]...
Getting [[Categoria:Curve piane]]...
Getting [[Categoria:Curve tridimensionali]]...
Getting [[Categoria:Glossario (geometria descrittiva)]]...
Getting [[Categoria:Metodi di rappresentazione]]...
Getting [[Categoria:Modellazione geometrica]]...
Getting [[Categoria:Tassellazioni]]...
Getting [[Categoria:Poliedri]]...
Getting [[Categoria:Problemi di misura]]...
Getting [[Categoria:Stub geometria descrittiva]]...
Getting [[Categoria:Superfici]]...
Getting [[Categoria:Sviluppo di solidi]]...
Getting [[Categoria:Tangenza]]...
Out[2]: 393
###########end ipython output###############

Note that this patch also solves the problem of possible loops in the category graph: catlib won't loop.
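
The attached patch itself is not reproduced in this message, so the following is only a minimal sketch of the general technique (remembering which categories have already been fetched), not the actual change to catlib. The names collect_articles, toy_fetch and TOY_CATEGORIES are hypothetical, and the toy category graph stands in for real wiki requests.

```python
# Hypothetical toy data: category -> (articles, subcategories).
# "D" is reachable from both "B" and "C", and "C" links back to "A" (a loop).
TOY_CATEGORIES = {
    "A": (["a1"], ["B", "C"]),
    "B": (["b1"], ["D"]),
    "C": (["c1"], ["D", "A"]),
    "D": (["d1", "a1"], []),
}

fetch_log = []  # records every simulated download, in order


def toy_fetch(name):
    """Stand-in for a network request; returns (articles, subcategories)."""
    fetch_log.append(name)
    return TOY_CATEGORIES[name]


def collect_articles(root, fetch, seen=None):
    """Return the set of articles under root, fetching each category once.

    `seen` is shared across the whole recursion, so a category reached
    through several parents (or through a loop) is skipped the second time.
    """
    if seen is None:
        seen = set()
    if root in seen:
        return set()
    seen.add(root)

    articles, subcats = fetch(root)
    result = set(articles)
    for sub in subcats:
        result |= collect_articles(sub, fetch, seen)
    return result


articles = collect_articles("A", toy_fetch)
print(sorted(articles))  # ['a1', 'b1', 'c1', 'd1']
print(fetch_log)         # ['A', 'B', 'D', 'C']
```

In this sketch "D" is fetched only once even though two categories list it, and the back-link from "C" to "A" does not cause infinite recursion, because both are already in the shared set when they are reached again.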

----------------------------------------------------------------------
