Has anyone had success with submitting data that contains non-English characters via a bot? I'm currently working on some Perl scripts to extract and upload data from a number of external sources related to the subject of our wiki -- thanks, James Birkholz, for the extension code, but I decided to go another way -- and some of this material consists of translations of foreign (esp. French and Irish) terms that contain various accented characters.
The basic bot is working -- login, upload one or more articles, logout, produce some reports on what happened -- but I haven't had much success with uploading articles containing non-standard characters. For those familiar with Perl, I'm using the LWP modules (LWP::UserAgent and HTTP::Request::Common, mostly) and thought I might be able to handle this using the URI::Escape module, but no success.
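For what it's worth, the crux with form submissions like this is usually that each field has to be percent-encoded as UTF-8 *bytes* before it goes on the wire. Here is a minimal sketch of what a correctly encoded POST body looks like -- in Python for illustration, since the encoding behavior is easy to show there; the `title` and `text` field names are hypothetical:

```python
from urllib.parse import urlencode

# A wiki edit form with accented French and Irish terms; the field
# names here are placeholders, not MediaWiki's actual form fields.
fields = {"title": "café", "text": "Fáilte"}

# urlencode() percent-encodes each value as UTF-8 bytes, so 'é'
# (U+00E9) becomes the two bytes 0xC3 0xA9 -> "%C3%A9".
body = urlencode(fields)
print(body)  # title=caf%C3%A9&text=F%C3%A1ilte
```

If the bytes reaching the server are percent-encoded UTF-8 like this, a UTF-8 wiki should store them intact; garbage on the wiki usually means the string was encoded in some other charset (or double-encoded) before escaping.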
The best workaround I have so far is to replace the characters with HTML entities but that interferes with searching for the terms once they are uploaded.
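The entity workaround amounts to replacing each non-ASCII character with a numeric character reference. A quick sketch of the idea in Python (Perl's HTML::Entities does the analogous substitution):

```python
term = "Sídhe"  # an Irish term with an accented character

# Replace every non-ASCII character with a numeric character
# reference, which is roughly what the entity workaround does.
entified = term.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(entified)  # S&#237;dhe
```

The stored wikitext then contains `&#237;` rather than `í`, which is exactly why a search for the accented term no longer matches.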
Any suggestions, advice, or pointers to helpful resources would be appreciated.
John Blumel
On Apr 6, 2005 7:59 PM, John Blumel johnblumel@earthlink.net wrote:
Has anyone had success with submitting data that contains non-English characters via a bot?
I have no experience with it myself, but there's a bot framework written in Python at http://pywikipediabot.sf.net which has plenty of users on all sorts of wikis. At the very least, you might be able to find some hints as to how they dealt with that - there's also a mailing list for the project, at http://lists.sourceforge.net/lists/listinfo/pywikipediabot-users
HTH
On Apr 6, 2005, at 3:57pm, Rowan Collins wrote:
I have no experience with it myself, but there's a bot framework written in Python at http://pywikipediabot.sf.net which has plenty of users on all sorts of wikis. At the very least, you might be able to find some hints as to how they dealt with that - there's also a mailing list for the project, at http://lists.sourceforge.net/lists/listinfo/pywikipediabot-users
Thanks for reminding me about that. I had seen it before but, in the perhaps false hope of saving time, I thought I'd rather not get into learning Python at the moment. Perhaps I'll have to after all.
John Blumel
On Wed, Apr 06, 2005 at 02:59:14PM -0400, John Blumel wrote:
Has anyone had success with submitting data that contains non-English characters via a bot? I'm currently working on some Perl scripts to extract and upload data from a number of external sources related to the subject of our wiki -- thanks, James Birkholz, for the extension code, but I decided to go another way -- and some of this material consists of translations of foreign (esp. French and Irish) terms that contain various accented characters.
As Rowan said, use the PyWikipediaBot. It's the standard for modifying MediaWiki installations via bot, and its API is superior to any other. There are plenty of example scripts in the CVS (don't use the outdated packages, use the CVS) which should be a good starting point.
ciao, tom
On Apr 6, 2005, at 2:59pm, John Blumel wrote:
Has anyone had success with submitting data that contains non-English characters via a bot? I'm currently working on some Perl scripts to extract and upload data... but I haven't had much success with uploading articles containing non-standard characters.
Well, out of sheer stubbornness, and after a couple of days in character encoding hell, I finally figured out how to get this to work. It wasn't exactly intuitive, but if I ensure that the data files containing the GftP article are Unicode (UTF-8) encoded and then encode them as Latin-1 (ISO-8859-1) in the bot script, the submissions to the wiki, which is configured for UTF-8, go through without any data corruption. This also works for the page titles, which had not been getting escaped properly.
I'll leave it for someone else to explain why this works this way.
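One plausible explanation, sketched here in Python since the byte-level behavior is easy to demonstrate: Latin-1 maps the code points 0-255 one-to-one onto single bytes, so if the Perl string is actually holding the raw UTF-8 bytes of the file (i.e. it was read in without being decoded), "encoding it as Latin-1" just emits those same bytes unchanged, and the UTF-8 wiki receives valid UTF-8. This is an assumption about what the Perl layers are doing internally, not a verified trace:

```python
s = "café"
utf8_bytes = s.encode("utf-8")           # b'caf\xc3\xa9'

# Misinterpret those bytes as Latin-1 characters, the way a string
# read without a decoding layer would hold them:
mojibake = utf8_bytes.decode("latin-1")  # 'cafÃ©'

# "Encoding as Latin-1" maps each code point 0-255 back to the
# identical byte, so the original UTF-8 comes out unchanged:
assert mojibake.encode("latin-1") == utf8_bytes
```

On this reading, the Latin-1 step isn't converting anything; it's a no-op at the byte level that stops the HTTP layer from re-encoding the already-UTF-8 data a second time.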
John Blumel
mediawiki-l@lists.wikimedia.org