Patches item #1799746, was opened at 2007-09-21 18:05
Message generated for change (Comment added) made by wikipedian
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=1799746&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Private: No
Submitted By: Pietrodn (pietrodn)
Assigned to: Nobody/Anonymous (nobody)
Summary: Added two namespaces for en.wikibooks in wikibooks_family.py
Initial Comment:
Hello, the bot gave me these warnings:
WARNING: Missing namespace in family file wikibooks: namespace['en'][112] (it is set to 'Subject')
WARNING: Missing namespace in family file wikibooks: namespace['en'][113] (it is set to 'Subject talk')
So I put the missing namespaces in wikibooks_family.py.
Here is the svn.diff patch file.
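For readers unfamiliar with the warning, a check of this kind can be sketched as follows. This is only an illustration in modern Python, not the framework's actual code: the function name missing_namespaces and both dictionaries are hypothetical, with the family table keyed by namespace number and then by language code as in the patch.

```python
def missing_namespaces(family_namespaces, site_namespaces, lang):
    """Return (number, name) pairs the live site reports but the family table lacks."""
    missing = []
    for number, name in sorted(site_namespaces.items()):
        known = family_namespaces.get(number, {})
        if lang not in known:
            missing.append((number, name))
    return missing


# Hypothetical data: what the family file knows vs. what the wiki reports.
family = {111: {'en': 'Wikijunior talk'}}
site = {111: 'Wikijunior talk', 112: 'Subject', 113: 'Subject talk'}

for number, name in missing_namespaces(family, site, 'en'):
    print("WARNING: Missing namespace: namespace['en'][%d] (it is set to '%s')"
          % (number, name))
```

Adding entries 112 and 113 to the family table, as the patch does, makes the sketch report nothing.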
----------------------------------------------------------------------
>Comment By: Daniel Herding (wikipedian)
Date: 2007-09-24 11:04
Message:
Logged In: YES
user_id=880694
Originator: NO
Thanks.
----------------------------------------------------------------------
Revision: 4350
Author: wikipedian
Date: 2007-09-24 09:04:17 +0000 (Mon, 24 Sep 2007)
Log Message:
-----------
applied patch [ 1799746 ] Added two namespaces for en.wikibooks in
wikibooks_family.py by Pietrodn - pietrodn
Modified Paths:
--------------
trunk/pywikipedia/families/wikibooks_family.py
Modified: trunk/pywikipedia/families/wikibooks_family.py
===================================================================
--- trunk/pywikipedia/families/wikibooks_family.py 2007-09-24 08:32:30 UTC (rev 4349)
+++ trunk/pywikipedia/families/wikibooks_family.py 2007-09-24 09:04:17 UTC (rev 4350)
@@ -217,6 +217,14 @@
self.namespaces[111] = {
'en': u'Wikijunior talk',
}
+
+ self.namespaces[112] = {
+ 'en': u'Subject',
+ }
+
+ self.namespaces[113] = {
+ 'en': u'Subject talk',
+ }
# Which languages have a special order for putting interlanguage links,
# and what order is it? If a language is not in interwiki_putfirst,
Feature Requests item #1775389, was opened at 2007-08-16 14:30
Message generated for change (Comment added) made by wikipedian
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603141&aid=1775389&group_…
Category: None
Group: None
>Status: Closed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: restriction for replace.py
Initial Comment:
==English== translated from the French original
Hello,
I would like to propose a restriction mode for the replace.py module so that it ignores all characters between [[ and ]] or between {{ and }}.
This would be controlled by a command-line parameter such as -ignorelink:(YES|no), with yes as the default.
This would have the advantage of leaving links and templates unmodified.
Thank you
----------------------------------------------------------------------
>Comment By: Daniel Herding (wikipedian)
Date: 2007-09-24 10:36
Message:
Logged In: YES
user_id=880694
Originator: NO
You can now skip templates and links by adding these parameters:
-exceptinsidetag:template -exceptinsidetag:link
Note that -exceptinsidetag:link also skips occurrences within categories,
interwikis, and images that are not in galleries.
To make sure all image links are skipped, add -exceptinsidetag:gallery
as well. External links can be skipped with
-exceptinsidetag:hyperlink.
More tags can be excepted; see replaceExcept() in
wikipedia.py. Also, replace.py can now skip pages based on their title
or on text they contain, and can skip occurrences not only because they
are inside templates, links, etc., but also because they are inside a region
that matches a regular expression you supply.
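The idea of skipping occurrences inside excepted regions can be sketched roughly as below. This is a simplified illustration in modern Python and not the replaceExcept() implementation from wikipedia.py: the function name replace_outside and the two patterns are assumptions, and the non-greedy patterns do not handle nested templates or links.

```python
import re

# Hypothetical, simplified patterns; the real ones are defined in replaceExcept().
EXCEPTION_PATTERNS = {
    'template': re.compile(r'\{\{.*?\}\}', re.DOTALL),
    'link': re.compile(r'\[\[.*?\]\]', re.DOTALL),
}


def replace_outside(text, old, new, tags):
    """Replace regex `old` with `new`, leaving matches inside excepted regions alone."""
    # Collect the (start, end) spans of every excepted region in the original text.
    protected = []
    for tag in tags:
        for m in EXCEPTION_PATTERNS[tag].finditer(text):
            protected.append((m.start(), m.end()))

    def substitute(match):
        # Keep the original text if the match overlaps any protected span.
        for start, end in protected:
            if match.start() < end and match.end() > start:
                return match.group(0)
        return match.expand(new)

    return re.sub(old, substitute, text)


sample = "cat [[cat]] {{cat}} cat"
print(replace_outside(sample, 'cat', 'dog', ['template', 'link']))
print(replace_outside(sample, 'cat', 'dog', ['link']))
```

Passing only 'link' mirrors using -exceptinsidetag:link on its own: text inside templates is then still replaced.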
----------------------------------------------------------------------
Comment By: Annabel (annabel)
Date: 2007-08-16 18:28
Message:
Logged In: YES
user_id=1868962
Originator: NO
I agree with russblau. Good as an option, but it should not be the default
setting. By extension, it would also be a good option to disallow text
replacements in external links and image links while still allowing
them in ordinary wiki links.
----------------------------------------------------------------------
Comment By: Russell Blau (russblau)
Date: 2007-08-16 14:51
Message:
Logged In: YES
user_id=855050
Originator: NO
I suppose this could have some value as an option, but the default should
be NO. Replacing links is one of the major uses of replace.py.
----------------------------------------------------------------------
Patches item #1800470, was opened at 2007-09-23 08:33
Message generated for change (Comment added) made by wikipedian
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=1800470&group_…
Category: None
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: John Vandenberg (zeroj)
Assigned to: Nobody/Anonymous (nobody)
Summary: yahoo page generator
Initial Comment:
For those of us without a Google key to the Google Search API, a Yahoo search page generator can be useful.
Patch attached.
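The core step of such a generator, turning search-result URLs back into page titles on the local wiki, might look roughly like this. It is a sketch in modern Python only: titles_from_urls and the sample URLs are hypothetical, the base URL is an assumed article-path form, and no search API is called.

```python
def titles_from_urls(urls, base):
    """Yield the page title for every URL that lies under the wiki's article base URL."""
    for url in urls:
        # Search engines may return URLs from other hosts or script paths; skip those.
        if url.startswith(base):
            yield url[len(base):]


# Hypothetical search results, for illustration only.
base = 'http://en.wikipedia.org/wiki/'
results = [
    'http://en.wikipedia.org/wiki/Python',
    'http://example.org/unrelated',
    'http://en.wikipedia.org/wiki/Main_Page',
]
print(list(titles_from_urls(results, base)))
```

The sketch does not decode percent-encoded characters in the extracted titles.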
----------------------------------------------------------------------
>Comment By: Daniel Herding (wikipedian)
Date: 2007-09-24 10:26
Message:
Logged In: YES
user_id=880694
Originator: NO
Accepted
----------------------------------------------------------------------
Revision: 4348
Author: wikipedian
Date: 2007-09-24 08:26:13 +0000 (Mon, 24 Sep 2007)
Log Message:
-----------
Applied patch [ 1800470 ] yahoo page generator by John Vandenberg -
zeroj
I didn't test this because I don't have a Yahoo AppID.
Modified Paths:
--------------
trunk/pywikipedia/pagegenerators.py
Modified: trunk/pywikipedia/pagegenerators.py
===================================================================
--- trunk/pywikipedia/pagegenerators.py 2007-09-24 08:21:58 UTC (rev 4347)
+++ trunk/pywikipedia/pagegenerators.py 2007-09-24 08:26:13 UTC (rev 4348)
@@ -32,6 +32,10 @@
-filelinks Work on all pages that use a certain image/media file.
Argument can also be given as "-file:filename".
+-yahoo Work on all pages that are found in a Yahoo search.
+ Depends on python module pYsearch. See yahoo_appid in
+ config.py for instructions.
+
-google Work on all pages that are found in a Google search.
You need a Google Web API license key. Note that Google
doesn't give out license keys anymore. See google_key in
@@ -286,6 +290,35 @@
yield wikipedia.Page(site, pagenameofthelink)
offset += step
+class YahooSearchPageGenerator:
+ '''
+ To use this generator, install pYsearch
+ '''
+ def __init__(self, query = None, count = 100): # values larger than 100 fail
+ self.query = query or wikipedia.input(u'Please enter the search query:')
+ self.count = count;
+
+ def queryYahoo(self, query):
+ from yahoo.search.web import WebSearch
+ srch = WebSearch(config.yahoo_appid, query=query, results=self.count)
+
+ dom = srch.get_results()
+ results = srch.parse_results(dom)
+ for res in results:
+ url = res.Url
+ yield url
+
+ def __iter__(self):
+ site = wikipedia.getSite()
+ # restrict query to local site
+ localQuery = '%s site:%s' % (self.query, site.hostname())
+ base = 'http://%s%s' % (site.hostname(), site.nice_get_address(''))
+ for url in self.queryYahoo(localQuery):
+ if url[:len(base)] == base:
+ title = url[len(base):]
+ page = wikipedia.Page(site, title)
+ yield page
+
class GoogleSearchPageGenerator:
'''
To use this generator, you must install the pyGoogle module from
@@ -707,6 +740,12 @@
else:
googleQuery = arg[8:]
gen = GoogleSearchPageGenerator(googleQuery)
+ elif arg.startswith('-yahoo'):
+ if len(arg) == 7:
+ query = wikipedia.input(u'What do you want to search for?')
+ else:
+ query = arg[7:]
+ gen = YahooSearchPageGenerator(query)
else:
return None
# make sure all yielded pages are unique
Patches item #1800492, was opened at 2007-09-23 10:23
Message generated for change (Comment added) made by wikipedian
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=1800492&group_…
Category: None
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: John Vandenberg (zeroj)
Assigned to: Nobody/Anonymous (nobody)
Summary: uncategorised page generators
Initial Comment:
The current page generators are not exposed as command line options, and there isn't a page generator for [[Special:Uncategorizedimages]].
This patch provides both.
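One detail worth noting when options are matched with startswith(): -uncat is a prefix of both -uncatcat and -uncatfiles, so the longer names have to be tested first. A minimal sketch of that ordering, in modern Python with hypothetical names (GENERATOR_PREFIXES, pick_generator) and plain strings standing in for the generators:

```python
# Longer option names come first because '-uncat' is a prefix of the other two.
GENERATOR_PREFIXES = [
    ('-uncatfiles', 'files'),
    ('-uncatcat', 'categories'),
    ('-uncat', 'pages'),
]


def pick_generator(arg):
    """Return which kind of uncategorised generator an argument selects, or None."""
    for prefix, kind in GENERATOR_PREFIXES:
        if arg.startswith(prefix):
            return kind
    return None


print(pick_generator('-uncatfiles'))
print(pick_generator('-uncat'))
```

If '-uncat' were listed first, both of the other options would be routed to the page generator.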
----------------------------------------------------------------------
>Comment By: Daniel Herding (wikipedian)
Date: 2007-09-24 10:22
Message:
Logged In: YES
user_id=880694
Originator: NO
Accepted, thanks.
----------------------------------------------------------------------
Revision: 4347
Author: wikipedian
Date: 2007-09-24 08:21:58 +0000 (Mon, 24 Sep 2007)
Log Message:
-----------
applied patch [ 1800492 ] uncategorised page generators by John
Vandenberg - zeroj
"The current page generators are not exposed as command line options,
and there isn't a page generator for [[Special:Uncategorizedimages]].
This patch provides both."
Modified Paths:
--------------
trunk/pywikipedia/family.py
trunk/pywikipedia/pagegenerators.py
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/family.py
===================================================================
--- trunk/pywikipedia/family.py 2007-09-24 08:19:23 UTC (rev 4346)
+++ trunk/pywikipedia/family.py 2007-09-24 08:21:58 UTC (rev 4347)
@@ -2617,6 +2617,9 @@
def uncategorizedcategories_address(self, code, limit=500):
return "%s?title=%s:Uncategorizedcategories&limit=%d" % (self.path(code), self.special_namespace_url(code), limit)
+ def uncategorizedimages_address(self, code, limit=500):
+ return "%s?title=%s:Uncategorizedimages&limit=%d" % (self.path(code), self.special_namespace_url(code), limit)
+
def uncategorizedpages_address(self, code, limit=500):
return "%s?title=%s:Uncategorizedpages&limit=%d" % (self.path(code), self.special_namespace_url(code), limit)
Modified: trunk/pywikipedia/pagegenerators.py
===================================================================
--- trunk/pywikipedia/pagegenerators.py 2007-09-24 08:19:23 UTC (rev 4346)
+++ trunk/pywikipedia/pagegenerators.py 2007-09-24 08:21:58 UTC (rev 4347)
@@ -19,6 +19,12 @@
-cat Work on all pages which are in a specific category.
Argument can also be given as "-cat:categoryname".
+-uncat Work on all pages which are not categorised.
+
+-uncatcat Work on all categories which are not categorised.
+
+-uncatfiles Work on all files which are not categorised.
+
-file Read a list of pages to treat from the named text file.
Page titles in the file must be enclosed with [[brackets]].
Argument can also be given as "-file:filename".
@@ -166,6 +172,18 @@
if page.title() >= start:
yield page
+def UnCategorizedCategoryGenerator(number = 100, repeat = False, site = None):
+ if site is None:
+ site = wikipedia.getSite()
+ for page in site.uncategorizedcategories(number=number, repeat=repeat):
+ yield page
+
+def UnCategorizedImageGenerator(number = 100, repeat = False, site = None):
+ if site is None:
+ site = wikipedia.getSite()
+ for page in site.uncategorizedimages(number=number, repeat=repeat):
+ yield page
+
def UnCategorizedPageGenerator(number = 100, repeat = False, site = None):
if site is None:
site = wikipedia.getSite()
@@ -635,6 +653,12 @@
gen = TextfilePageGenerator(textfilename)
elif arg.startswith('-cat'):
gen = self.setCategoryGen(arg, 4)
+ elif arg.startswith('-uncatfiles'):
+ gen = UnCategorizedImageGenerator()
+ elif arg.startswith('-uncatcat'):
+ gen = UnCategorizedCategoryGenerator()
+ elif arg.startswith('-uncat'):
+ gen = UnCategorizedPageGenerator()
elif arg.startswith('-subcat'):
gen = self.setCategoryGen(arg, 7, recurse = True)
elif arg.startswith('-ref'):
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2007-09-24 08:19:23 UTC (rev 4346)
+++ trunk/pywikipedia/wikipedia.py 2007-09-24 08:21:58 UTC (rev 4347)
@@ -69,6 +69,7 @@
lonelypages(): Special:Lonelypages
uncategorizedcategories(): Special:Uncategorizedcategories
uncategorizedpages(): Special:Uncategorizedpages
+ uncategorizedimages(): Special:Uncategorizedimages
unusedcategories(): Special:Unusuedcategories
Other functions:
@@ -3795,6 +3796,26 @@
if not repeat:
break
+ def uncategorizedimages(self, number = 10, repeat = False):
+ throttle = True
+ seen = set()
+ ns = self.image_namespace()
+ entryR = re.compile('<a href=".+?" title="(?P<title>%s:.+?)">.+?</a>' % ns)
+ while True:
+ path = self.uncategorizedimages_address(n=number)
+ get_throttle()
+ html = self.getUrl(path)
+ for m in entryR.finditer(html):
+ title = m.group('title')
+
+ if title not in seen:
+ seen.add(title)
+ page = Page(self, title)
+ yield page
+ if not repeat:
+ break
+
+
def uncategorizedpages(self, number = 10, repeat = False):
throttle = True
seen = set()
@@ -4166,6 +4187,9 @@
def uncategorizedcategories_address(self, n=500):
return self.family.uncategorizedcategories_address(self.lang, n)
+ def uncategorizedimages_address(self, n=500):
+ return self.family.uncategorizedimages_address(self.lang, n)
+
def uncategorizedpages_address(self, n=500):
return self.family.uncategorizedpages_address(self.lang, n)
Bugs item #1800873, was opened at 2007-09-24 07:23
Message generated for change (Comment added) made by wikipedian
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1800873&group_…
Category: None
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: John Vandenberg (zeroj)
Assigned to: Nobody/Anonymous (nobody)
Summary: replace.py -excepttext
Initial Comment:
replace.py -excepttext does not work.
----------------------------------------------------------------------
>Comment By: Daniel Herding (wikipedian)
Date: 2007-09-24 10:19
Message:
Logged In: YES
user_id=880694
Originator: NO
Thanks for the note, a "continue" was missing. Fixed in SVN.
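To illustrate why a single missing "continue" can disable such a filter, here is a small sketch in modern Python. It is not the code from replace.py; pages_to_edit and the sample data are hypothetical.

```python
def pages_to_edit(pages, except_texts):
    """Yield titles of pages whose text contains none of the excepted fragments."""
    for title, text in pages:
        if any(fragment in text for fragment in except_texts):
            # Without this 'continue', execution would fall through to the
            # yield below and the excepted page would be processed anyway.
            continue
        yield title


# Hypothetical (title, text) pairs.
pages = [('A', 'ordinary article text'), ('B', 'this page is a stub')]
print(list(pages_to_edit(pages, ['stub'])))
```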
----------------------------------------------------------------------