Bugs item #3604180, was opened at 2013-02-11 18:42
Message generated for change (Settings changed) made by
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3604180&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: redirect
Group: None
Status: Open
Resolution: None
>Priority: 7
Private: No
Submitted By: Riley ()
Assigned to: Nobody/Anonymous (nobody)
Summary: Unicode issue running redirect.py
Initial Comment:
Hello everyone, I am having a Unicode issue when running redirect.py on wikisource.org. When the script runs, pywikipediabot seems to try to change the page names into English.
----------------------------------------------------------------------
Comment By: Riley ()
Date: 2013-02-11 18:44
Message:
I clicked save before I was done writing -.-;
As can also be seen in the provided screenshot, the script doesn't save, nor does it give output of any kind.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3604180&group_…
Bugs item #3605062, was opened at 2013-02-17 06:02
Message generated for change (Comment added) made by tgr_
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3605062&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: other
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Tgr (tgr_)
Assigned to: Nobody/Anonymous (nobody)
Summary: pywikipediabot should use standard output
Initial Comment:
All pywikipedia messages go to standard error, which makes it very difficult to run such bots properly from cron. Normal messages should go to standard output, and only surprising messages (Python errors, block notifications, new-message notifications) should go to standard error.
----------------------------------------------------------------------
>Comment By: Tgr (tgr_)
Date: 2013-02-17 12:50
Message:
I disagree. Non-zero return value is for errors from which the application
could not recover. There might be errors or unexpected important events
which do not cause the bot to fail but should be reported nevertheless.
(For example if an interwikibot gets a talk page message on one of the many
wikis it visits, that should be reported, maybe the bot should stop working
on that wiki until the owner can check the message, but it certainly should
not stop working on all other wikis.)
If you want to show human-readable output to humans, and
machine-processable output to scripts, the proper solution for that is to
detect (via sys.stdout.isatty()) whether you are writing to a terminal, and
format accordingly (and allow overriding the behavior via a command line
switch). That is how sophisticated command line applications usually
operate; compare, for example, the output from 'ls' and 'ls | cat'.
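For illustration, a minimal Python sketch of the isatty() approach described above; the --format switch and the output text are hypothetical, not part of pywikipedia:

import argparse
import sys

def main():
    parser = argparse.ArgumentParser()
    # Hypothetical switch to override the automatic detection.
    parser.add_argument('--format', choices=['auto', 'human', 'machine'],
                        default='auto')
    args = parser.parse_args()
    if args.format == 'auto':
        # Pretty output when attached to a terminal, plain output when piped.
        human = sys.stdout.isatty()
    else:
        human = (args.format == 'human')
    pages = ['Foo', 'Bar']
    if human:
        print('Processed %d pages: %s' % (len(pages), ', '.join(pages)))
    else:
        for page in pages:
            print(page)  # one record per line, easy for scripts to consume

if __name__ == '__main__':
    main()

Run interactively, this prints a summary line; run as 'script.py | cat', it emits one page title per line, mirroring the 'ls' versus 'ls | cat' comparison.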
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2013-02-17 09:16
Message:
Data that can be used for further processing ('pipe') should be sent to
stdout. All messages that are only relevant for the user should be sent to
stderr. Errors should not be detected by checking if anything was written
to stderr, but by checking the return value (which will be non-zero if an
error occurred).
Basically, this is a well-known cron problem. See, for instance,
http://habilis.net/cronic/
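For illustration, a minimal Python sketch of this convention; process() and the page list are hypothetical:

import sys

def process(page):
    # Hypothetical worker; raises on failure.
    if not page:
        raise ValueError('empty page title')
    return page.upper()

def main():
    failed = False
    for page in ['Foo', '', 'Bar']:
        try:
            result = process(page)
        except ValueError as error:
            # User-facing diagnostics go to stderr.
            sys.stderr.write('skipping %r: %s\n' % (page, error))
            failed = True
        else:
            # Pipeable data goes to stdout.
            sys.stdout.write(result + '\n')
    # cron (or cronic) detects failure from the exit status,
    # not from the mere presence of stderr output.
    sys.exit(1 if failed else 0)

if __name__ == '__main__':
    main()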
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3605062&group_…
Bugs item #3605068, was opened at 2013-02-17 06:43
Message generated for change (Tracker Item Submitted) made by reza1615
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3605068&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: reza (reza1615)
Assigned to: Nobody/Anonymous (nobody)
Summary: please add math to txtlib.py
Initial Comment:
Hi,
Now cosmetic.py has a bug for pages that contain <math>: it cannot recognize the <math> tag, which is defined at line 718.
Please add math to txtlib.py to solve this bug.
Case: http://fa.wikipedia.org/w/index.php?diff=prev&oldid=9331137
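For illustration, a minimal sketch of the kind of fix being requested, assuming the library keeps a mapping from tag names to regular expressions for regions a cosmetic bot must not touch; the dict and helper below are hypothetical, not the actual txtlib.py code:

import re

# Hypothetical registry of protected regions.
PROTECTED_REGIONS = {
    'nowiki': re.compile(r'<nowiki>.*?</nowiki>', re.IGNORECASE | re.DOTALL),
    'pre':    re.compile(r'<pre>.*?</pre>', re.IGNORECASE | re.DOTALL),
    # The requested addition: treat <math>...</math> as protected too.
    'math':   re.compile(r'<math>.*?</math>', re.IGNORECASE | re.DOTALL),
}

def is_protected(text, index):
    """Return True if position `index` falls inside a protected region."""
    for regex in PROTECTED_REGIONS.values():
        for match in regex.finditer(text):
            if match.start() <= index < match.end():
                return True
    return False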
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3605068&group_…
Bugs item #3604077, was opened at 2013-02-11 03:41
Message generated for change (Comment added) made by xqt
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3604077&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: login
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: JAn (jandudik)
>Assigned to: xqt (xqt)
Summary: Assertion error
Initial Comment:
interwiki.py -async -cleanup -links:template:Daerah_Hradec_Králové -lang:ms -untranslated -initialredirect
...
Getting 15 pages from wikipedia:ms...
Dump ms (wikipedia) appended.
Traceback (most recent call last):
  File "D:\Py\interwiki.py", line 2603, in <module>
    main()
  File "D:\Py\interwiki.py", line 2577, in main
    bot.run()
  File "D:\Py\interwiki.py", line 2310, in run
    self.queryStep()
  File "D:\Py\interwiki.py", line 2283, in queryStep
    self.oneQuery()
  File "D:\Py\interwiki.py", line 2279, in oneQuery
    subject.batchLoaded(self)
  File "D:\Py\interwiki.py", line 1216, in batchLoaded
    self.done.add(page)
  File "D:\Py\interwiki.py", line 733, in add
    assert page not in self.tree[site]
AssertionError
D:\Py>version.py
Pywikipedia trunk/pywikipedia/ (r11072, 2013/02/10, 16:52:07, ok)
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]
config-settings:
use_api = True
use_api_login = True
unicode test: ok
----------------------------------------------------------------------
>Comment By: xqt (xqt)
Date: 2013-02-15 06:29
Message:
assert test removed in r11080
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3604077&group_…
Support Requests item #3019475, was opened at 2010-06-22 01:59
Message generated for change (Settings changed) made by xqt
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603139&aid=3019475&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Install Problem
Group: None
Status: Closed
Priority: 5
Private: No
Submitted By: https://www.google.com/accounts ()
Assigned to: Nobody/Anonymous (nobody)
Summary: No JSON object could be decoded [FIXED]
Initial Comment:
On Ubuntu 10.04, Karmic LAMP (PHP 5.2), Python 2.6.5, pywikipediabot from 2010-05-29 SVN, using identical server and bot configuration files as on a Mac setup (however, in this case, pywikipediabot reports an IP address, so I didn't need to hack httpd.conf), I get the following:
Logging into FamilyName:en as UserName via API
Error downloading data: No JSON object could be decoded
Request en:/scriptpath/api.php?
Retrying in x seconds
I changed this to milliseconds in order to see the final error message quickly, which is:
ERROR: ApiGetDataParse cause error No JSON object could be decoded
The program also creates a dump file containing the following:
Error reported: No JSON object could be decoded
127.0.0.1
/scriptpath/api.php?
<feff>{"login":{"result":"NeedToken","token":"[some md5-looking hash]"}}
Any ideas?
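For reference, a minimal Python 2 sketch of why the decode fails and the obvious workaround; this is illustrative, not pywikipedia's actual code path:

import json

# A BOM-prefixed API response, as in the dump file above.
raw = '\xef\xbb\xbf{"login": {"result": "NeedToken"}}'

try:
    json.loads(raw)
except ValueError as error:
    print('decode failed: %s' % error)  # "No JSON object could be decoded"

# Stripping the leading UTF-8 BOM before decoding avoids the error.
if raw.startswith('\xef\xbb\xbf'):
    raw = raw[3:]
print(json.loads(raw)['login']['result'])  # prints: NeedToken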
----------------------------------------------------------------------
Comment By: https://www.google.com/accounts ()
Date: 2012-04-13 01:34
Message:
Just to share :
I got the issue too. The UTF-16 BOM was inserted in LocalSettings.php
(edited by a MS Windows user).
I removed it and now everything works fine.
----------------------------------------------------------------------
Comment By: https://www.google.com/accounts ()
Date: 2010-08-05 23:05
Message:
Finally!!!! The problem is that the DynamicPageList extension had BOMs at the beginning of its initialization file. Because this is a "require_once" extension, it seems the BOM was getting inserted into the headers, and Ubuntu's version of PHP or Apache (not sure which) does not sanitize those, whereas the Mac (and, seemingly, everyone else's installation) DOES sanitize the BOMs before parsing. I am not sure why BeautifulSoup.py doesn't catch this, but for whatever reason it doesn't. Unless you're using UTF-16 files, you really shouldn't have a BOM anyway...
To check if you have any stray BOMs lying around, MediaWiki actually includes a handy script in the t/maint directory called "bom.t".
If you're curious, go to your main MediaWiki directory, then run "perl t/maint/bom.t", and it will tell you which files are problematic.
If you just want to blast away and fix the problem, a combination of two handy scripts took care of it for me. Put one or both in an executable path, but be sure to modify the shell script to refer to the absolute path to the Perl script.
This one I call "RecursiveBOMDefuse.sh":
#!/bin/sh
#
if [ "$1" = "" ] ; then
    echo "Usage: $0 directory"
    exit
fi
# Get the list of files in the directory
find "$1" -type f |
while read Name ; do
    # Based on the file name, perform the conversion
    case "$Name" in
    (*) # markup text
        NameTxt="${Name}"
        /absolute/path/to/BOMdefuse.plx "$NameTxt";
        # alternatively: perl /absolute/path/to/BOMdefuse.plx "$NameTxt";
        ;;
    esac
done
The next one I call BOMdefuse.plx; it's a Perl script I found on the W3C's website. I'm really not sure why they haven't made it operate recursively, but the shell script takes care of that. If I had the time, I'd fix the Perl script to handle everything, but I'm just so happy about getting the bot working again that I'm going back to work on editing/cleaning up content.
#!/usr/bin/perl
# program to remove a leading UTF-8 BOM from a file
# works both STDIN -> STDOUT and on the spot (with filename as argument)
# from http://people.w3.org/rishida/blog/?p=102
#
if ($#ARGV > 0) {
    print STDERR "Too many arguments!\n";
    exit;
}
my @file;       # file content
my $lineno = 0;
my $filename = $ARGV[0];
if ($filename) {
    open( BOMFILE, $filename ) || die "Could not open source file for reading.";
    while (<BOMFILE>) {
        if ($lineno++ == 0) {
            # Check for the UTF-8 BOM bytes at the start of the first line.
            if ( index( $_, "\xEF\xBB\xBF" ) == 0 ) {
                s/^\xEF\xBB\xBF//;
                print "BOM found and removed.\n";
            }
            else { print "No BOM found.\n"; }
        }
        push @file, $_;
    }
    close (BOMFILE) || die "Can't close source file after reading.";
    open (NOBOMFILE, ">$filename") || die "Could not open source file for writing.";
    foreach $line (@file) {
        print NOBOMFILE $line;
    }
    close (NOBOMFILE) || die "Can't close source file after writing.";
}
else { # STDIN -> STDOUT
    while (<>) {
        if (!$lineno++) {
            s/^\xEF\xBB\xBF//;
        }
        push @file, $_;
    }
    foreach $line (@file) {
        print $line;
    }
}
Obviously, run a chmod +x on both of these.
Then go to your main MediaWiki directory and run "RecursiveBOMDefuse.sh ." - it may take a minute or two, but it works!
Note: If you use symlinks anywhere in your installation, the script above does not seem to follow them, so you have to run the script from the actual directory. Although slightly annoying, this is probably a good thing, as a bad set of symlinks could send this script off to run through your entire drive (or, if you're on a system with NFS mounts, the whole network/cluster!!!).
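For comparison, the same cleanup can be sketched in a few lines of Python (hypothetical, not part of the original scripts); note that os.walk does not follow directory symlinks by default, which sidesteps the caveat above:

import os

# Walk the tree and strip a leading UTF-8 BOM from every file, in place.
for root, dirs, files in os.walk('.'):
    for name in files:
        path = os.path.join(root, name)
        with open(path, 'rb') as f:
            data = f.read()
        if data.startswith(b'\xef\xbb\xbf'):
            with open(path, 'wb') as f:
                f.write(data[3:])
            print('BOM removed from %s' % path)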
I hope this helps others, and Ubuntu or Pywikipediabot folks, please take a
look at your PHP/Apache and BeautifulSoup.py - stray BOMs should not be
getting through..... (Of course, extension authors should sanitize their
extensions first, but talk about herding cats).
-Alex
----------------------------------------------------------------------
Comment By: https://www.google.com/accounts ()
Date: 2010-06-29 08:01
Message:
Still doesn't work with
Pywikipediabot (r8335 (wikipedia.py), 2010/06/26, 10:07:01)
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3]
(or python 2.5.4)
----------------------------------------------------------------------
Comment By: https://www.google.com/accounts ()
Date: 2010-06-22 10:21
Message:
Thanks for the suggestions and thanks for taking a look.
I'm using the stock 3321-byte api.php from MediaWiki 1.15.4, downloaded
straight from mediawiki.org, dated 2009-05-05 (extracted from the tarball
via tar zxf). I am using a default (apt-get) install of python 2.6.4 on a
fresh install of Ubuntu 10.04, and I just checked out the latest
pywikipediabot from svn via svn co
http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia pywikipedia
several hours ago. I've disabled the confusing mess that is AppArmor, so
there should be no issues there. My terminal is set to UTF-8 encoding.
I get the same problem with python 2.5.4 (e.g., "python2.5 login.py"), but
only on this particular machine.
I have made no changes to urllib2, which is what login.py imports by
default, and I have made no changes to urllib, which is what a default
family file imports.
The family file I am using was created on a Mac in vim. As far as I know,
vim doesn't add UTF-16 BOMs unless explicitly asked to do so, and I have
not explicitly done that. Just in case, on the linux box, I created a new
file and copy-pasted the family file text into it, renamed the old one,
renamed the new one properly, deleted all .pyc files, and I still get this
error. I have changed urllib2 to urllib and vice versa in each, both, and
neither of login.py and the family file, all with the same result.
Here is some more error output, although I am not sure if it helps:
ERROR: ApiGetDataParse caused error No JSON object could be decoded
127.0.0.1
/scriptpath/api.php?. Dump
ApiGetDataParse_FamilyName_en__Tue_Jun_22_18-54-23_2010.dump created.
Traceback (most recent call last):
  File "login.py", line 437, in <module>
    main()
  File "login.py", line 433, in main
    loginMan.login()
  File "login.py", line 320, in login
    cookiedata = self.getCookie(api)
  File "login.py", line 182, in getCookie
    response, data = query.GetData(predata, self.site, sysop=self.sysop, back_response = True)
  File "/home/user/bots/pywikipedia/query.py", line 170, in GetData
    raise lastError
ValueError: No JSON object could be decoded
It looks like BeautifulSoup.py (starting at line 1828) should strip out any <feff> BOMs and replace them with null characters, but it doesn't seem to be doing that.
I'm using completely stock installs of everything, straight from svn,
repositories, and official websites. My family file is built straight from
the template, and it is identical to the one that works on the Mac and on
an Ubuntu 8.04 install of the same wiki.
I have tried
python login.py -v -clean
and I get the following when viewing the dumpfile via cat:
Error reported: No JSON object could be decoded
127.0.0.1
/hcrscript/api.php?action=logout&format=json
[]
and this, when viewing the dumpfile in vim:
Error reported: No JSON object could be decoded
127.0.0.1
/hcrscript/api.php?action=logout&format=json
<feff>[]
As for other potentially-relevant info, I am using short URLs via
httpd.conf aliases, but this should make no difference at all, as it works
on other systems running php 5.2 and apache 2.2.
alias /scriptpath /path/to/scriptpath
alias /wiki /path/to/scriptpath/index.php
I have /scriptpath set as the scriptpath in my family file, and my api.php call is to '%s/api.php' (I have also tried u'%s/api.php' to try to get BeautifulSoup to convert any errant Unicode - I still get the identical errors).
My syslog and /var/log/messages show no errors, and apache reports "POST
/hcrscript/api.php HTTP/1.1" 200".
I've tried uncommenting the "raise NotImplementedError" line in my family
file and commenting out use_api_login = True in my user-config.py file (or
leaving it as-is), but this just returns:
API disabled because this site does not support.
Retrying by ordinary way...
Logging in to Wiki:en as UserName
Login failed. Wrong password or CAPTCHA answer?
I'm completely stumped.
Thanks for any suggestions/advice you may have....
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2010-06-22 02:59
Message:
The <feff> is a UTF-16 BOM. Either urllib was changed, or you made some change to api.php that accidentally added it. Could you double-check that your api.php is unchanged from the original MediaWiki files (in other words: replace it with an original from SVN/release)?
----------------------------------------------------------------------
Comment By: https://www.google.com/accounts ()
Date: 2010-06-22 02:06
Message:
Looking at some earlier logs, I see that this problem first appeared when I
upgraded from Python 2.6.1 to 2.6.2 in May. I am surprised that I seem to
be the only person having this problem.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603139&aid=3019475&group_…
Bugs item #3604456, was opened at 2013-02-13 01:41
Message generated for change (Tracker Item Submitted) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3604456&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: General
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Recentchanges misbehave?
Initial Comment:
There are two bugs I found in recentchanges() from wikipedia.py.
First, I think the function should return every revision. However, it currently returns only pages it hasn't seen before. Should
    if i['pageid'] not in seen:
        seen.add(i['pageid'])
be replaced with
    if i['revid'] not in seen:
        seen.add(i['revid'])
? (A sketch of this fix follows the version information below.)
Second, what does the parameter 'includeredirects' stand for? It is useless because it isn't used anywhere in the function.
Pywikipedia trunk/pywikipedia/ (r11072, 2013/02/10, 16:52:07, ok)
Python 2.7.3 (default, Sep 26 2012, 21:53:58)
[GCC 4.7.2]
config-settings:
use_api = True
use_api_login = True
unicode test: ok
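For illustration, a minimal sketch of the suggested revid-based deduplication; the generator below is hypothetical, not the actual recentchanges() code:

def recentchanges_dedup(changes):
    """Yield every revision, deduplicating by revision rather than by page."""
    seen = set()
    for i in changes:
        # Keying on 'revid' lets multiple edits to the same page through;
        # keying on 'pageid' would drop all but the first edit per page.
        if i['revid'] not in seen:
            seen.add(i['revid'])
            yield i

# Two edits to the same page (pageid 1) both survive:
sample = [{'pageid': 1, 'revid': 10}, {'pageid': 1, 'revid': 11}]
assert len(list(recentchanges_dedup(sample))) == 2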
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3604456&group_…