2010/12/10 Giftpflanze m.p.roppelt@web.de
Gahhh, this list. Nobody suggested just using Python's Twisted?[1] So much easier than trying to write your own script in Python using sockets and manual pongs and all that jazz.
I'm going to dig as deep as I can into http://krondo.com/?p=1209. Thanks for the suggestion. This will help me with the second step: now that I have my cleanly parsed IRC message, how can I use it for my tasks, sometimes simple, sometimes far from simple, while still listening for other messages? I could try a DIY (do it yourself) approach, but I guess this is not such an exotic problem, and that it's much better to study a little bit first.
Here’s my RE that parses the RC IRC message in all aspects I know of:
The first line splits the server line into the actual IRC message and the channel (i.e. wiki) it is coming from. The sending nick is ignored, since no one else is allowed to talk in the channel and because the nick may change.
The second splits the message into its 6 constituent parts. That works for every single line at the moment (sometimes a detail changes and we are left with a mess), even for a log entry rather than an ordinary edit, because the surrounding markup is present on every line. Sometimes the message is too long for the IRC format (which allows for 512 bytes including the final \r\n), so beware of cut-off lines.
The REs are in the re_syntax(n) Tcl-style format (since this is taken from my MediaWiki Tcl Library [~gifti/bot/irc.tcl]) but can easily be adapted to other languages, I assume. I use \003 and \002 instead of the raw control characters for better readability and portability. Note that the color codes sometimes have leading zeros and sometimes do not.
regexp {:[^ ]+ PRIVMSG #([^ ]+) :(.*?)} $line -> channel message
regexp {\00314\[\[\00307(.*)\00314\]\]\0034 (.*)\00310 \00302(.*)\003 \0035\*\003 \00303(.*)\003 \0035\*\003 \(*\002*\+*([^)]*)\002*\)* \00310(.*?)\003*} $message -> title action url user bytes comment
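For anyone working in Python rather than Tcl, here is a rough port of the two REs, as a sketch only. It assumes that the literal brackets, asterisks, parentheses and the plus sign in the pattern are meant to be backslash-escaped. The function name parse_rc_line, the "truncated" flag and the sample line (nick, host, URL, user) are made up for illustration and are not part of the Tcl library.

```python
import re

# Server line -> (channel, message). The capture is greedy here: in Python a
# trailing non-greedy (.*?) with nothing after it would match the empty string.
SERVER_LINE = re.compile(r":[^ ]+ PRIVMSG #([^ ]+) :(.*)")

# \x03 introduces an IRC colour code, \x02 toggles bold.
RC_MESSAGE = re.compile(
    r"\x0314\[\[\x0307(.*)\x0314\]\]\x034 (.*)\x0310 "   # title, action
    r"\x0302(.*)\x03 \x035\*\x03 "                        # url
    r"\x0303(.*)\x03 \x035\*\x03 "                        # user
    r"\(*\x02*\+*([^)]*)\x02*\)* "                        # bytes (may be empty)
    r"\x0310(.*?)\x03*$"                                  # comment
)

FIELDS = ("title", "action", "url", "user", "bytes", "comment")


def parse_rc_line(line):
    """Return a dict with channel, the six fields and a truncation flag,
    or None if the line does not look like an RC message."""
    server = SERVER_LINE.match(line.rstrip("\r\n"))
    if server is None:
        return None
    channel, message = server.groups()
    rc = RC_MESSAGE.match(message)
    if rc is None:
        return None
    result = dict(zip(FIELDS, rc.groups()))
    result["channel"] = channel
    # Assumed heuristic: a complete message ends with the \x03 colour reset,
    # so a missing one hints at a line cut off by the 512-byte limit.
    result["truncated"] = not message.endswith("\x03")
    return result


# Made-up sample line in the shape the pattern expects.
sample = (
    ":rcbot!~rcbot@example.org PRIVMSG #en.wikipedia :"
    "\x0314[[\x0307Main Page\x0314]]\x034 M\x0310 "
    "\x0302http://en.wikipedia.org/w/index.php?diff=1&oldid=2\x03 \x035*\x03 "
    "\x0303ExampleUser\x03 \x035*\x03 (+12) \x0310fix typo\x03\r\n"
)

if __name__ == "__main__":
    print(parse_rc_line(sample))
```

As in the Tcl pattern, a leading plus sign is dropped from the bytes field while a minus sign is kept, and a line that does not fit the markup simply yields None, so the caller can decide what to do with the leftovers.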
VERY interesting, thank you!
Alex