Bugs item #2136828, was opened at 2008-09-29 21:10
Message generated for change (Comment added) made by spacebirdy
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2136828&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: ulana merops (spacebirdy)
Assigned to: Nobody/Anonymous (nobody)
Summary: wiktionary_family.py - wrong sort order for fr.wikt.
Initial Comment:
Please see http://fr.wiktionary.org/wiki/Discussion_Wiktionnaire:Structure_des_article…
and remove 'fr': self.alphabetic,
in line 416
Syntax on fr.wikt:
http://fr.wiktionary.org/wiki/Wiktionnaire:Structure_des_articles#Liens_int…
I don't know who added this, but it seems wrong. Thanks!
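As a rough sketch of the requested change (the names below are illustrative, not the actual wiktionary_family.py code): the family file maps language codes to an interwiki sort routine, and the report asks for the 'fr' entry to be dropped so fr.wiktionary falls back to the default order.

```python
def alphabetic(code):
    # placeholder standing in for the framework's alphabetic sort routine
    return code

# illustrative family attribute: language code -> interwiki sort order
interwiki_sort = {
    'en': alphabetic,
    'fr': alphabetic,  # the entry this report asks to remove
}

# the requested fix: drop the 'fr' entry so fr.wikt uses the default order
interwiki_sort.pop('fr', None)
print('fr' in interwiki_sort)  # → False
```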
----------------------------------------------------------------------
>Comment By: ulana merops (spacebirdy)
Date: 2008-10-11 14:02
Message:
Please fix this so that I can update the bot normally, without having to
remove that line every time. Thanks in advance.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2136828&group_…
Bugs item #2158249, was opened at 2008-10-10 22:01
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2158249&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: other
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: weblinkchecker.py doesn't report archive.org links anymore
Initial Comment:
Weblinkchecker does not report archive.org links anymore. On my run on Sept 26 it still reported the archive links; on Oct 3 weblinkchecker did not report a single one (out of several hundred dead links found on that run).
For example, http://web.archive.org/web/*/http://www.gruene-muenchen.de/landesverband.64… is available, but it is not reported on http://de.wikipedia.org/wiki/Diskussion:Theresa_Schopper
During the run weblinkchecker gives the output:
Consulting the Internet Archive for http://www.gruene-muenchen.de/landesverband.6417.0.html
python version.py
Pywikipedia [http] trunk/pywikipedia (r5945, Oct 10 2008, 11:16:07)
Python 2.5.2 (r252:60911, Oct 5 2008, 19:24:49)
[GCC 4.3.2]
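The "Consulting the Internet Archive" step queries the Wayback Machine; as a minimal sketch (the function name is invented and this is not weblinkchecker's actual logic), the wildcard lookup URL shown in the report can be formed like this:

```python
def archive_lookup_url(dead_link):
    # build the wildcard Wayback Machine query URL seen in the report
    return 'http://web.archive.org/web/*/%s' % dead_link

print(archive_lookup_url('http://www.gruene-muenchen.de/landesverband.6417.0.html'))
# → http://web.archive.org/web/*/http://www.gruene-muenchen.de/landesverband.6417.0.html
```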
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2158249&group_…
Since I actually got a request for information about the rewrite project(!),
here's a summary of where things stand and what other developers can help
with.
For those who aren't aware, the goal of the rewrite branch is to convert the
entire bot framework to use the MediaWiki API instead of screen-scraping for
both reading from and writing to a wiki. Generally, the changes are to be
"behind the scenes," with the goal of maintaining backwards-compatibility
with the old framework as much as possible. Nonetheless, we are taking this
opportunity to clean up some warts in the old framework and add some new
capabilities, so old code won't "just run" without some conversion effort.
Why bother? Because the API is faster and more reliable than
screen-scraping, and we won't have to spend hours hunting and fixing bugs
every time the MediaWiki developers decide to change an HTML tag somewhere
in their page design. As Brion Vibber said, "Screen-scraping
constantly-changing UI is like repeatedly banging
yourself in the head with a bowling ball. It's painful and doesn't
accomplish much, but it feels SO GOOD when you stop!"
http://lists.wikimedia.org/pipermail/wikitech-l/2008-August/039076.html And
he's made it very clear that changes to the UI will be made regardless of
what effect they may have on bots.
* Where we stand
First of all, the code in the rewrite branch actually works; you can check
it out from SVN, run it, and experiment with it on the wiki of your choice.
Not all the functionality of the current framework has been replicated yet,
but you can instantiate a Site or a Page, get the page text, save the page,
and so forth. See the file 'README-conversion.txt' for a brief rundown of
how to convert from the old syntax to the new. You will need to create a
new user-config.py for the new framework, and tuck it away in a different
directory than the one you use for the old framework. (Preferably, this
should be ~/.pywikibot for Unix and similar systems, and C:\Documents and
Settings\USERNAME\Application Data\pywikibot for Windows systems.) Set
the environment key PYWIKIBOT2_DIR to the name of this directory.
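For Unix-like systems, the setup described above amounts to two commands (a sketch; adjust the path if you prefer a different location):

```shell
# Keep the new framework's user-config.py in its own directory and point
# the PYWIKIBOT2_DIR environment variable at it.
mkdir -p "$HOME/.pywikibot"
export PYWIKIBOT2_DIR="$HOME/.pywikibot"
```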
The design of the framework is based on the following layers:
- Communications (http request handling)
- Data (forming API requests and parsing the responses)
- Wiki (objects representing contents of a wiki, including Sites and Pages)
- Bot (the application programs)
Generally, each layer should only interact with the ones immediately above
and below it (although in practice there are a few exceptions).
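The layering can be illustrated with a toy sketch in which each layer only calls the one directly below it. All class names and the canned response here are invented for the example; the real rewrite branch is organized differently in detail.

```python
import json

class Comm:
    """Communications layer: raw HTTP request handling (stubbed here)."""
    def request(self, params):
        # canned response instead of a real network fetch
        return '{"query": {"pages": {}}}'

class Data:
    """Data layer: forms API requests and parses the responses."""
    def __init__(self):
        self.comm = Comm()
    def api_query(self, **params):
        return json.loads(self.comm.request(params))

class Site:
    """Wiki layer: objects representing the contents of a wiki."""
    def __init__(self):
        self.data = Data()
    def query_succeeded(self, title):
        return 'query' in self.data.api_query(titles=title)

# Bot layer: application code talks only to the Wiki layer below it
print(Site().query_succeeded('Sandbox'))  # → True
```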
Recently I have been working on testing the Site object's methods; this has
been exceedingly tedious but very useful, as it has uncovered a number of
bugs. I am hoping to complete this phase soon, as I find the time, then
move on to the Page object and its subclasses.
* How others can help
1. Test the new framework, and report (or, even better, fix) any bugs or
unclear documentation you find.
2. Develop and run unit tests for the Page object and its subclasses.
3. Port existing functions and methods that manipulate wiki text and return
a new text (from wikipedia.py, catlib.py, and so forth) into a new
textlib.py module.
4. Help identify any exceptions to backwards-compatibility, and if
appropriate add a new function/method to map the old framework's code to the
new one.
5. Start writing a new Bot class that can be subclassed by developers for
their bots; this should at a minimum provide the capabilities now in
wikipedia.handleArgs(), including help functionality, and the
pagegenerators.py module.
6. Identify what's missing from this list! ;)
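For item 2, a unit test might look like the sketch below. StubPage is an invented stand-in for the real Page class so the example stays self-contained; tests against the actual framework would import it and hit a live test wiki instead.

```python
import unittest

class StubPage:
    """Invented stand-in for the framework's Page class."""
    def __init__(self, site, title):
        self.site = site
        self.title = title
    def namespace(self):
        # naive namespace extraction, enough for the test sketch
        return self.title.split(':', 1)[0] if ':' in self.title else ''

class PageTestCase(unittest.TestCase):
    def test_talk_namespace(self):
        self.assertEqual(StubPage('en', 'Talk:Foo').namespace(), 'Talk')
    def test_main_namespace(self):
        self.assertEqual(StubPage('en', 'Foo').namespace(), '')

# run with: python -m unittest <thisfile>
```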
Thanks in advance to anyone who pitches in on this project. And don't
hesitate to bother me with questions!
Russ Blau
Revision: 5954
Author: filnik
Date: 2008-10-10 17:30:54 +0000 (Fri, 10 Oct 2008)
Log Message:
-----------
Minibugfix (?)
Modified Paths:
--------------
trunk/pywikipedia/checkimages.py
Modified: trunk/pywikipedia/checkimages.py
===================================================================
--- trunk/pywikipedia/checkimages.py 2008-10-10 16:48:56 UTC (rev 5953)
+++ trunk/pywikipedia/checkimages.py 2008-10-10 17:30:54 UTC (rev 5954)
@@ -1054,22 +1054,21 @@
except wikipedia.BadTitle:
# Template with wrong name, no need to report, simply skip
continue
- else:
- if template in self.list_licenses: # the list_licenses are loaded in the __init__ (not to load them multimple times)
- seems_ok = True
- exit_cicle = True
- license_found = license_selected # let the last "fake" license normally detected
- break
+ if template in self.list_licenses: # the list_licenses are loaded in the __init__ (not to load them multimple times)
+ seems_ok = True
+ exit_cicle = True
+ license_found = license_selected # let the last "fake" license normally detected
+ break
# previous block was unsuccessful? Try with the next one
for license_selected in licenses_found:
try:
template = self.giveMeTheTemplate(license_selected)
+ if template == None:
+ continue # ok, this template it's not ok, continue..
except wikipedia.BadTitle:
# Template with wrong name, no need to report, simply skip
continue
- try:
- if template == None:
- continue # ok, this template it's not ok, continue..
+ try:
template_text = template.get()
except wikipedia.NoPage:
continue # ok, this template it's not ok, continue..
Revision: 5952
Author: filnik
Date: 2008-10-10 16:47:17 +0000 (Fri, 10 Oct 2008)
Log Message:
-----------
Little fix in the checkImageOnCommons() function, add 'same name' if the images have.. the same name, yes, how do you guess? :)
Modified Paths:
--------------
trunk/pywikipedia/checkimages.py
Modified: trunk/pywikipedia/checkimages.py
===================================================================
--- trunk/pywikipedia/checkimages.py 2008-10-10 15:28:22 UTC (rev 5951)
+++ trunk/pywikipedia/checkimages.py 2008-10-10 16:47:17 UTC (rev 5952)
@@ -793,7 +793,7 @@
""" Checking if the image is on commons """
wikipedia.output(u'Checking if %s is on commons...' % self.image)
commons_site = wikipedia.getSite('commons', 'commons')
- regexOnCommons = r"\n\*\[\[:Image:%s\]\] is also on '''Commons''': \[\[commons:Image:.*?\]\]$" % self.image
+ regexOnCommons = r"\n\*\[\[:Image:%s\]\] is also on '''Commons''': \[\[commons:Image:.*?\]\](?: \(same name\)|)$" % self.image
imagePage = wikipedia.ImagePage(self.site, 'Image:%s' % self.image)
hash_found = imagePage.getHash()
if hash_found == None:
@@ -809,11 +809,14 @@
# Problems? Yes! We have to skip the check part for that image!
# Because it's on commons but someone has added something on your project.
return False
- elif 'stemma' in self.image.lower() and self.site.lang == 'it':
+ elif re.findall(r'\bstemma\b', self.image.lower()) != [] and self.site.lang == 'it':
wikipedia.output(u'%s has "stemma" inside, means that it\'s ok.' % self.image)
return True # Problems? No, it's only not on commons but the image needs a check
else:
- repme = "\n*[[:Image:%s]] is also on '''Commons''': [[commons:Image:%s]]" % (self.image, commons_image_with_this_hash[0])
+ if self.image == commons_image_with_this_hash[0]:
+ repme = "\n*[[:Image:%s]] is also on '''Commons''': [[commons:Image:%s]] (same name)" % (self.image, commons_image_with_this_hash[0])
+ else:
+ repme = "\n*[[:Image:%s]] is also on '''Commons''': [[commons:Image:%s]]" % (self.image, commons_image_with_this_hash[0])
self.report_image(self.image, self.rep_page, self.com, repme, addings = False, regex = regexOnCommons)
# Problems? No, return True
return True
Revision: 5951
Author: filnik
Date: 2008-10-10 15:28:22 +0000 (Fri, 10 Oct 2008)
Log Message:
-----------
Testing phase on commons gives a lot of things to think about.. continuing with the fixing phase
Modified Paths:
--------------
trunk/pywikipedia/checkimages.py
Modified: trunk/pywikipedia/checkimages.py
===================================================================
--- trunk/pywikipedia/checkimages.py 2008-10-10 14:33:40 UTC (rev 5950)
+++ trunk/pywikipedia/checkimages.py 2008-10-10 15:28:22 UTC (rev 5951)
@@ -1017,6 +1017,17 @@
list_licenses.append(pageLicense) # the list has wiki-pages
return list_licenses
+ def giveMeTheTemplate(self, license_selected):
+ #print template.exists()
+ template = wikipedia.Page(self.site, 'Template:%s' % license_selected)
+ if not template.exists():
+ template = wikipedia.Page(self.site, license_selected)
+ if not template.exists():
+ return None # break and exit
+ if template.isRedirectPage():
+ template = template.getRedirectTarget()
+ return template
+
def smartDetection(self, image_text):
seems_ok = False
license_found = None
@@ -1030,16 +1041,13 @@
break
if licenses_found != []:
for license_selected in licenses_found:
- #print template.exists()
- template = wikipedia.Page(self.site, 'Template:%s' % license_selected)
- if not template.exists():
- template = wikipedia.Page(self.site, license_selected)
- if not template.exists():
- exit_cicle = True
- break # break and report
+ # put the first, if there is problem, this will be reported in the log
+ if license_found == None:
+ license_found = license_selected
try:
- if template.isRedirectPage():
- template = template.getRedirectTarget()
+ template = self.giveMeTheTemplate(license_selected)
+ if template == None:
+ continue
except wikipedia.BadTitle:
# Template with wrong name, no need to report, simply skip
continue
@@ -1047,16 +1055,21 @@
if template in self.list_licenses: # the list_licenses are loaded in the __init__ (not to load them multimple times)
seems_ok = True
exit_cicle = True
+ license_found = license_selected # let the last "fake" license normally detected
break
- license_found = license_selected # let the last "fake" license normally detected
# previous block was unsuccessful? Try with the next one
for license_selected in licenses_found:
try:
+ template = self.giveMeTheTemplate(license_selected)
+ except wikipedia.BadTitle:
+ # Template with wrong name, no need to report, simply skip
+ continue
+ try:
template_text = template.get()
+ if template == None:
+ continue # ok, this template it's not ok, continue..
except wikipedia.NoPage:
- seems_ok = False # Empty template (maybe deleted while the script's running)
- exit_cicle = True
- break
+ continue # ok, this template it's not ok, continue..
regex_noinclude = re.compile(r'<noinclude>(.*?)</noinclude>', re.DOTALL)
template_text = regex_noinclude.sub('', template_text)
if second_round == False:
@@ -1065,7 +1078,6 @@
break # only exit from the for, not from the while
else:
exit_cicle = True
- license_found = license_selected # A good license? Ok, let's use it instead
break
if not seems_ok:
rep_text_license_fake = "\n*[[:Image:%s]] seems to have a ''fake license'', license detected: {{tl|%s}}." % (self.image, license_found)
Revision: 5950
Author: filnik
Date: 2008-10-10 14:33:40 +0000 (Fri, 10 Oct 2008)
Log Message:
-----------
Fixing again smartdetection, commons testing phase successful, let's see if there's anything else to add..
Modified Paths:
--------------
trunk/pywikipedia/checkimages.py
Modified: trunk/pywikipedia/checkimages.py
===================================================================
--- trunk/pywikipedia/checkimages.py 2008-10-10 14:14:04 UTC (rev 5949)
+++ trunk/pywikipedia/checkimages.py 2008-10-10 14:33:40 UTC (rev 5950)
@@ -342,7 +342,7 @@
'ta':[u'information'],
'zh':[u'information'],
}
-
+# A page where there's a list of template to skip.
PageWithHiddenTemplates = {
'commons': u'User:Filbot/White_templates#White_templates',
'en':None,
@@ -350,6 +350,14 @@
'ko': u'User:Kwjbot_IV/whitetemplates/list',
}
+# A page where there's a list of template to consider as licenses.
+PageWithAllowedTemplates = {
+ 'commons': u'User:Filbot/Allowed templates',
+ 'en':None,
+ 'it':u'Progetto:Coordinamento/Immagini/Bot/AllowedTemplates',
+ 'ko': u'User:Kwjbot_IV/whitetemplates/list',
+ }
+
# Template added when the bot finds only an hidden template and nothing else.
# Note: every __botnick__ will be repleaced with your bot's nickname (feel free not to use if you don't need it)
HiddenTemplateNotification = {
@@ -497,6 +505,7 @@
self.com = wikipedia.translate(self.site, comm10)
self.hiddentemplate = wikipedia.translate(self.site, HiddenTemplate)
self.pageHidden = wikipedia.translate(self.site, PageWithHiddenTemplates)
+ self.pageAllowed = wikipedia.translate(self.site, PageWithAllowedTemplates)
# Commento = Summary in italian
self.commento = wikipedia.translate(self.site, comm)
# Adding the bot's nickname at the notification text if needed.
@@ -992,6 +1001,20 @@
gen = pagegenerators.CategorizedPageGenerator(cat)
pages = [page for page in gen]
list_licenses.extend(pages)
+
+ # Add the licenses set in the default page as licenses
+ # to check
+ if self.pageAllowed != None:
+ try:
+ pageAllowedText = wikipedia.Page(self.site, self.pageAllowed).get()
+ except (wikipedia.NoPage, wikipedia.IsRedirectPage):
+ pageAllowedText = ''
+ for nameLicense in self.load(pageAllowedText):
+ if not 'template:' in nameLicense.lower():
+ nameLicense = 'Template:%s' % nameLicense
+ pageLicense = wikipedia.Page(self.site, nameLicense)
+ if pageLicense not in list_licenses:
+ list_licenses.append(pageLicense) # the list has wiki-pages
return list_licenses
def smartDetection(self, image_text):
@@ -1000,6 +1023,7 @@
regex_find_licenses = re.compile(r'\{\{(?:[Tt]emplate:|)(.*?)(?:[|\n].*?|)\}\}', re.DOTALL)
licenses_found = regex_find_licenses.findall(image_text)
second_round = False
+
exit_cicle = False # howTo exit from both the for and the while cicle
while 1:
if exit_cicle: # howTo exit from the while
@@ -1033,6 +1057,8 @@
seems_ok = False # Empty template (maybe deleted while the script's running)
exit_cicle = True
break
+ regex_noinclude = re.compile(r'<noinclude>(.*?)</noinclude>', re.DOTALL)
+ template_text = regex_noinclude.sub('', template_text)
if second_round == False:
licenses_found = regex_find_licenses.findall(template_text)
second_round = True