1. I'm testing my skills and I run my script under cron. The Python script begins with these lines (and it runs):
    # -*- coding: utf-8 -*-
    #!/usr/bin/python
    import os, sys
    if not sys.platform == "win32":
        sys.path.append('/home/alebot/pywikipedia')
        os.chdir("/home/alebot/scripts")
Then I tried to move to batch job scheduling, but... my script gives an error: now the server dislikes the sys.path line. Why? I obviously have to study more, but what should I study, and where? :-(
2. The script brings to life a Python bot that reads RecentChanges at 10-minute intervals via a cron routine. In your opinion, would an #irc bot listening to the it.wikisource #irc channel for recent changes be more efficient? Where can I find a good Python script to read #irc channels?
Thanks - I apologize for such banal questions.
Alex
On Thu, Dec 9, 2010 at 4:54 PM, Alex Brollo alex.brollo@gmail.com wrote:
Then I tried to move to batch job scheduling, but... my script gives an error: now the server dislikes the sys.path line. Why? I obviously have to study more, but what should I study, and where? :-(
Please give the specific error message. It is hard to believe that the error is "the server dislikes sys.path".
Bryan
:-) It gives an error for that line, precisely mentioning sys.path. I didn't save the message, but I can try to reproduce it.
Alex
irc listening with python is fairly easy; just use a socket
    import socket
    IRC = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    IRC.connect(('irc.freenode.net', 6667))
    while True:
        text = IRC.recv(1024)
        msgs = text.split('\n')
        for msg in msgs:
            if msg.split(' ', 1)[0] == "PING":
                pong = msg.split(' ', 1)[1]
                IRC.send("PONG %s" % pong)
            print msg
If you want to do things periodically, like writing the output to a file every 10 minutes, you have to set a timeout on the socket. Otherwise the script will wait at the recv line until it receives data.
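The timeout idea can be sketched roughly as follows. This is only an illustration in Python 3 (the code above is Python 2); the names recv_or_none, listen, handle_data and periodic_task are made up for the example, and the one-second poll interval is an arbitrary assumption.

```python
import socket
import time


def recv_or_none(sock, bufsize=4096):
    """Return received bytes, or None if the socket timeout expired first."""
    try:
        return sock.recv(bufsize)
    except socket.timeout:
        return None


def listen(sock, interval, handle_data, periodic_task, poll=1.0):
    """Read from sock and run periodic_task roughly every `interval` seconds.

    handle_data and periodic_task are hypothetical callbacks supplied by the
    caller; `poll` is how long a single recv may block.
    """
    sock.settimeout(poll)
    next_run = time.monotonic() + interval
    while True:
        data = recv_or_none(sock)
        if data == b"":
            break  # the other side closed the connection
        if data is not None:
            handle_data(data)
        if time.monotonic() >= next_run:
            periodic_task()
            next_run += interval
```

Because recv now returns control at least once per poll interval, the periodic task runs even when the channel is silent.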
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
Oops, forgot to put a return after the pongmsg, like this: IRC.send("PONG %s\n" % pong)
The IRC-server will try to process the line after it finds a \n in your msg
Sumurai8 (DD) wrote:
Oops, forgot to put a return after the pongmsg, like this: IRC.send("PONG %s\n" % pong)
The IRC-server will try to process the line after it finds a \n in your msg
According to the protocol, it should be a CRLF (\r\n). Although a bare \n seems to be commonly accepted as well.
On Thu, Dec 9, 2010 at 5:36 PM, Platonides platonides@gmail.com wrote:
According to the protocol, it should be a CRLF (\r\n). Although a bare \n seems to be commonly accepted as well.
In fact, some ircds only look at the first 4 chars, PONG, regardless of whether there is a newline at all.
Bryan
Well... you can actually send a PONG message every 3 minutes without listening to the IRC channel, and the server will gladly accept that ^_^ . That's what I did back when I didn't know about the timeout option of a socket :) But most of the time it is just better to follow the rules: end each line with \r\n (nice, I didn't know about that, so I changed it in my script :) ), send a PONG message followed by everything that was sent after the PING, etc.
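A minimal sketch of a PONG reply that follows those rules, in Python 3. The function name pong_reply is made up for the example; it echoes back whatever followed PING and terminates the line with CRLF.

```python
def pong_reply(ping_line):
    """Build the reply to one PING line as bytes ready for socket.sendall().

    Everything after the word PING is echoed back unchanged, and the reply
    ends with CRLF as the protocol asks.
    """
    _, _, token = ping_line.rstrip("\r\n").partition(" ")
    return ("PONG %s\r\n" % token).encode("utf-8")
```

In a listening loop this would be used as something like `IRC.sendall(pong_reply(msg))` whenever a line starts with PING.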
Long ago I noticed that the IRC server was kicking my bot out after some time, for some reason.
Then I looked closer and noticed there was a ping from the server around the time of those mishaps. All right, so I just added an ad-hoc pong:
    public void responsePing(String line) {
        try {
            out.println("PONG :" + line.substring(line.indexOf(":") + 1));
        } catch (Throwable th) {
            // ...
        }
    }
And told it to go to hell. Storytelling is not why I am writing this, though. I have a question. I was returning the server whatever it was sending to me as a ping. This is how it worked like two years ago. Has something changed?
M
You are probably missing a PING message while listening to IRC, and the server then closes the connection when it doesn't receive a PONG within about 180 seconds.
OK, I have my listening bot. At the beginning I was a little confused by the IRC color codes, but I realized that they can be used to parse #irc messages effectively.
I just need the time to build my "irc color-based parser", then I'll be ready to use the data... enough for now, I presume.
Then I'll read your posts again, to see if I can understand them (I only used some keywords to search the web for what I need...)
Thank you again.
Alex
Михајло Анђелковић wrote:
I was returning the server whatever it was sending to me as a ping. This is how it worked like two years ago. Has something changed?
No.
Михајло Анђелковић:
I was returning the server whatever it was sending to me as a ping. This is how it worked like two years ago. Has something changed?
This is the correct way to do it, but many IRC client implementations are lazy and are written by people who guess at the protocol and end up with an incomplete understanding, or simply don't care enough to implement it correctly since the incorrect solution works in most cases.
(As someone else pointed out, many IRC servers (especially hybrid-derived ones) use a "ping cookie" on connect; it was originally intended to defeat source address spoofing, but more recently has been used against other attacks. If you don't reply correctly to a ping from one of these servers, you won't be able to connect.)
- river.
PS: I cringe every time I see someone "parsing" IRC lines with things like strncmp(line, "PRIVMSG ", 8) or strstr(line, " :"). The IRC protocol is very simple, and tokenising it properly is really not that difficult. (Every argument is separated by a space; if the first byte of the argument is ":", remove it and stop splitting.)
River Tarnell wrote:
PS: I cringe every time I see someone "parsing" IRC lines with things like strncmp(line, "PRIVMSG ", 8) or strstr(line, " :"). The IRC protocol is very simple, and tokenising it properly is really not that difficult. (Every argument is separated by a space; if the first byte of the argument is ":", remove it and stop splitting.)
You forget the first argument, where a leading : means that it is the full name of the sender (the only form you will ever see as a client). Except for PING, which has no sender (it's always the local server), all commands follow the pattern:

    <sender> <action> <parameters>

The action can be performed by a "user" (joining a channel, sending a message...), in which case that user is the sender, or it may be a numeric, where the sender is a server. The number of parameters depends on the action, with the last one extending to the end of the line if it begins with :

Theoretically, some arguments should be counted from the beginning and others from the end, thus allowing new parameters to be added in the middle. In practice, the client format is fixed and that isn't really relevant.
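Putting the two descriptions together, a tokeniser might look roughly like this in Python 3. It is a sketch of the rules as stated in this thread (optional leading ":sender", space-separated arguments, a ":" argument swallowing the rest of the line), not a complete protocol implementation; the name parse_irc_line is made up for the example.

```python
def parse_irc_line(line):
    """Split one raw IRC line into (sender, action, parameters).

    sender is None when the line has no leading ":sender" part, as with PING.
    """
    line = line.rstrip("\r\n")
    sender = None
    if line.startswith(":"):
        # A leading ":" on the first token marks the sender.
        sender, _, line = line[1:].partition(" ")
    tokens = []
    while line:
        if line.startswith(":"):
            # A ":" at the start of an argument: take the rest of the line.
            tokens.append(line[1:])
            break
        head, _, line = line.partition(" ")
        if head:  # skip empty tokens caused by repeated spaces
            tokens.append(head)
    action = tokens.pop(0) if tokens else ""
    return sender, action, tokens
```

For a channel message this yields the sender, the action PRIVMSG, and a parameter list holding the channel name and the full message text.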
Far from being interested in the #irc protocol in general, I am focused on the read-only channels of irc.wikimedia.org, and on the parameter segment of specific lines in specific channels (but I see that all channels follow a similar pattern).
My idea is to use my very basic listening bot to select lines, parse their parameters and write them to a file. End of the irc bot's work. Another cron script will read (and delete) the output file and operate on the list of new/edited pages. Consider that the tasks I'm implementing do not require an immediate revision of pages by the bot; on the contrary, it's more efficient, IMHO, to wait some time after a human user's edit, since human editors often don't use Preview and find something to fix once they see the result of their edit.
So, by considering only the last edit of a page within an interval of 10-15 minutes, many useless bot edits can be avoided.
It's something very far from "async programming", I guess: a primitive approach, but IMHO it should work.
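The "only the last edit per page" step could be sketched like this in Python 3. The function names, the (title, timestamp) event shape and the 600-second quiet period are assumptions made for the illustration, not part of any existing script.

```python
def latest_edit_per_page(events):
    """Map each page title to the timestamp of its most recent edit.

    events is an iterable of (title, timestamp) pairs, e.g. read back from
    the file written by the listening bot.
    """
    latest = {}
    for title, timestamp in events:
        if title not in latest or timestamp >= latest[title]:
            latest[title] = timestamp
    return latest


def pages_ready(latest, now, quiet_seconds=600):
    """Return titles whose last edit is at least quiet_seconds old."""
    return sorted(
        title for title, timestamp in latest.items()
        if now - timestamp >= quiet_seconds
    )
```

The cron script would then process only the titles returned by pages_ready and leave the recently edited ones for its next run.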
Alex
Sumurai8 (DD) wrote:
Well... you can actually send a PONG message every 3 minutes without listening to the IRC channel, and the server will gladly accept that ^_^ . That's what I did back when I didn't know about the timeout option of a socket :) But most of the time it is just better to follow the rules: end each line with \r\n (nice, I didn't know about that, so I changed it in my script :) ), send a PONG message followed by everything that was sent after the PING, etc.
Some ircds will, with every right to do so, not complete your login into the network in that case. Strangely, I don't see that kind of protection in freenode's ircd-seven, despite it being allegedly protected from the JavaScript spam that plagued the last days of hyperion[1].
Sumurai8 (DD):
    text = IRC.recv(1024)
    msgs = text.split('\n')
This seems to have a bug: if there's more than 1024 bytes waiting, you could receive only part of the final message; so you will truncate that message, and the next recv will receive the other half (which will then be effectively junk).
- river.
It's just a plain idea of how you can make an IRC bot. Possible solutions are making the buffer bigger or preserving the last message if it doesn't end with a \n. For WikiLinkBot the first solution works just fine. (If reading the recent changes every 10 minutes works fine, a bigger buffer should do the job: with at most 500 edits in 600 seconds, just make the buffer a little bigger.)
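The second solution, keeping the unfinished tail until the next recv, might look like this in Python 3. It is a sketch only; the class name LineBuffer is made up for the example, and it works on bytes, as socket.recv returns them in Python 3.

```python
class LineBuffer:
    """Collect chunks from recv() and hand back only complete lines."""

    def __init__(self):
        self._pending = b""  # bytes received after the last newline

    def feed(self, data):
        """Add one recv() chunk and return the complete lines it finished."""
        self._pending += data
        # Everything before the last "\n" is complete; the remainder (possibly
        # empty) is kept until the next chunk arrives.
        *lines, self._pending = self._pending.split(b"\n")
        return [line.rstrip(b"\r") for line in lines]
```

With this, a message cut in half by the 1024-byte limit is returned once, whole, after the following recv delivers its second half.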
Sumurai8
Alex Brollo wrote:
2. The script brings to life a Python bot that reads RecentChanges at 10-minute intervals via a cron routine. In your opinion, would an #irc bot listening to the it.wikisource #irc channel for recent changes be more efficient?
Yes. Especially since you presumably want to get *all* RecentChanges, which makes the 10-minute value arbitrary.
Thanks to all of you. My 10-minute interval readings were only a trick to work around my lack of skill at "continuous listening". I'll study the socket stuff and your code a little, then - I guess - I'll ask you again about details/troubles. :-)
Consider that I'm VERY slow when learning new routines... and at present I have no idea what precisely "a socket" is. :-)
Alex
Alex Brollo wrote:
Then I tried to move to batch job scheduling, but... my script gives an error: now the server dislikes the sys.path line. Why? I obviously have to study more, but what should I study, and where? :-(
2. The script brings to life a Python bot that reads RecentChanges at 10-minute intervals via a cron routine. In your opinion, would an #irc bot listening to the it.wikisource #irc channel for recent changes be more efficient? Where can I find a good Python script to read #irc channels?
Gahhh, this list. Nobody suggested just using Python's Twisted?[1] So much easier than trying to write your own script in Python using sockets and manual pongs and all that jazz.
You're more than welcome to look around my home directory (check /home/mzmcbride/scripts/irc/) for some IRC bots. The bot I specifically use to relay irc.wikimedia.org to irc.freenode.net is on another server, but I'd be happy to post the code for you if you'd like. His name is snitch and he supports all Wikimedia wikis, multiple channels, and stalks per-page, per-user, or per-wiki.
MZMcBride
MZMcBride wrote:
Gahhh, this list. Nobody suggested just using Python's Twisted?[1] So much easier than trying to write your own script in Python using sockets and manual pongs and all that jazz.
The process of IRC listening is not that dramatic, regardless of language. It can easily be done by hand.
You're more than welcome to look around my home directory (check /home/mzmcbride/scripts/irc/) for some IRC bots. The bot I specifically use to relay irc.wikimedia.org to irc.freenode.net is on another server, but I'd be happy to post the code for you if you'd like. His name is snitch and he supports all Wikimedia wikis, multiple channels, and stalks per-page, per-user, or per-wiki.
Interesting.
Here’s my RE that parses the RC IRC message in all aspects I know of:
The first line splits the server line into the actual IRC message and the channel (i.e. wiki) it is coming from. The sending nick is ignored since no one is allowed to talk at all and because it may change.
The second splits the message into its 6 constituent parts. That works for every single line at the moment (sometimes a detail changes and we are left with a mess), even for a log entry rather than an ordinary edit, because the surrounding markup is present on every line. Sometimes the message is too long for the IRC format (which allows for 512 bytes including the final \r\n), so beware of cut-off lines.
The REs are in the re_syntax(n) Tcl-style format (since this is taken from my MediaWiki Tcl Library [~gifti/bot/irc.tcl]) but can easily be adapted to other languages, I assume. I use \003 and \002 instead of direct ASCII for better readability and portability. Consider that the color codes sometimes have leading zeros, sometimes not.
regexp {:[^ ]+ PRIVMSG #([^ ]+) :(.*?)} $line -> channel message
regexp {\00314\[\[\00307(.*)\00314\]\]\0034 (.*)\00310 \00302(.*)\003 \0035\*\003 \00303(.*)\003 \0035\*\003 \(*\002*\+*([^)]*)\002*\)* \00310(.*?)\003*} $message -> title action url user bytes comment
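For readers working in Python, the two expressions could be transcribed roughly as below (Python 3, re module). This is a sketch under assumptions: the brackets, asterisks and parentheses of the feed are treated as escaped literals, the colour codes 4 and 5 are allowed an optional leading zero, and the byte count is assumed to be an optionally signed number. The names LINE_RE, RC_RE and parse_rc_line are made up for the example.

```python
import re

# Server line -> channel (i.e. wiki) and message; the sending nick is ignored.
LINE_RE = re.compile(r":[^ ]+ PRIVMSG #(?P<channel>[^ ]+) :(?P<message>.*)")

# Message -> title, action, url, user, bytes, comment.
# \x03 is the colour control character, \x02 is bold.
RC_RE = re.compile(
    r"\x0314\[\[\x0307(?P<title>.*?)\x0314\]\]"     # [[title]]
    r"\x030?4 (?P<action>.*?)\x0310 "               # edit flags or log action
    r"\x0302(?P<url>.*?)\x03 \x030?5\*\x03 "        # url, then a "*" separator
    r"\x0303(?P<user>.*?)\x03 \x030?5\*\x03 "       # user, then a "*" separator
    r"\(?\x02?(?P<bytes>[+-]?\d*)\x02?\)? "         # size change, maybe bold
    r"\x0310(?P<comment>.*?)\x03?$"                 # edit summary
)


def parse_rc_line(line):
    """Return a dict of fields for one recent-changes line, or None."""
    outer = LINE_RE.match(line.rstrip("\r\n"))
    if outer is None:
        return None
    inner = RC_RE.match(outer.group("message"))
    if inner is None:
        return None  # e.g. a line cut off at the 512-byte limit
    fields = inner.groupdict()
    fields["channel"] = outer.group("channel")
    return fields
```

Lines that do not match, such as PING or a message truncated by the length limit, come back as None, so the caller can skip them.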
Giftpflanze
2010/12/10 Giftpflanze m.p.roppelt@web.de
Gahhh, this list. Nobody suggested just using Python's Twisted?[1] So much easier than trying to write your own script in Python using sockets and manual pongs and all that jazz.
I'm going to dig as deep as I can into http://krondo.com/?p=1209. Thanks for the suggestion. This will help me with the second step: now that I have my cleanly parsed #irc message... how can I use it for my tasks, sometimes simple, sometimes far from simple, while still listening for other messages? I'd try a DIY (do it yourself) way... but I guess that it's not such an exotic problem, and that it's much better to study a little.
Here’s my RE that parses the RC IRC message in all aspects I know of:
VERY interesting, thank you!
Alex
2010/12/10 MZMcBride z@mzmcbride.com
Gahhh, this list. Nobody suggested just using Python's Twisted?[1] So much easier than trying to write your own script in Python using sockets and manual pongs and all that jazz.
Once more, it's amazing to see how many different meanings the word "easier" can have. :-) Recently I tried to follow the suggestion of an expert, who encouraged me to use NetBeans for Jython as "easier" than my old beloved IDLE interface for basic Python... I barely survived. :-P
You are largely overestimating my abstract knowledge of such stuff... :-( Nevertheless, my rough, basic, DIY routines run, and they do "magic" jobs! :-) So I presume it is easier, for me, to go on step by step with rough, brief, banal scripts...
Alex
MZMcBride:
Gahhh, this list. Nobody suggested just using Python's Twisted?
Someone is suggesting it: you. That's pretty much the point of the list; there's more than one person on it.
- river.
toolserver-l@lists.wikimedia.org