Hey list,
I have an IRC bot, and I'm integrating MediaWiki's recent changes (RC)
announcements into it: the bot listens on a UDP IP/port, parses each
incoming message, then announces it on IRC.
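To make the setup concrete, here is a minimal sketch of the listening side. It is written in Python purely for illustration, not taken from the actual bot; the function names are hypothetical and `announce` is only a stand-in for the IRC part.

```python
import socket

def open_rc_listener(host="127.0.0.1", port=1223):
    """Bind a UDP socket that MediaWiki can send RC lines to (assumed address/port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

def receive_rc_line(sock, bufsize=4096):
    """Block until one datagram arrives and return its raw bytes, undecoded."""
    data, _sender = sock.recvfrom(bufsize)
    return data

def announce(line):
    # Placeholder: a real bot would send this to an IRC channel instead.
    print(line)

def run_forever():
    """Receive datagrams one at a time and hand each to announce()."""
    listener = open_rc_listener()
    while True:
        announce(receive_rc_line(listener))
```

Calling `run_forever()` would start the loop; the parsing step would sit between `receive_rc_line` and `announce`.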
I'm having trouble with the parsing step. The RC announcement
message sent by MW is very messy, with seemingly random numbers and
other characters mixed in. I've been trying to write a regex that
captures the username, the edit reason, the page edited, and the URL
of the diff, but I'm having quite a bit of trouble.
I have this in my LocalSettings.php:
$wgRC2UDPAddress = '127.0.0.1';
$wgRC2UDPPort = '1223';
$wgRC2UDPPrefix = 'Wiki: ';
The string sent to the socket is something like:
Wiki: 14[[07To Do14]]4 10
02http://domain.tld/wiki/index.php?diff=230&oldid=201 5* 03Username 5*
(-45) 10Removed IRC line; added something else
The regex I wrote works when I test it by putting the above text into
a string and applying the regex. However, it does NOT work when I
actually run the bot and parse the data coming over the socket. I'm
not sure why. I first thought it had to do with line endings, so I
tried removing the ^ and $ anchors and also setting the "m"
(multi-line) flag, but neither helped. (I may have the flags mixed up:
in PCRE, "m" only changes where ^ and $ match, and it is "s" that
makes . match line breaks.)
The regex I have now is:
/Wiki: [0-9]{2}\[\[[0-9]{2}(.+)[0-9]{2}\]\].*(http:\/\/domain.tld\/wiki\/index.php.+) [0-9]\* [0-9]{2}(.+) [0-9]\* .+ [0-9]{2}(.*)/
Note, this is a PCRE regex.
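For reference, the kind of offline test I mean looks roughly like the sketch below. It uses Python's `re` module rather than PCRE itself, assumes the wrapped lines of both the pattern and the sample are joined with single spaces, and uses hypothetical names (`RC_PATTERN`, `SAMPLE`, `parse_rc`).

```python
import re

# The pattern from this mail, with the wrapped lines joined by a single space
# (an assumption about how the mail client wrapped it).
RC_PATTERN = re.compile(
    r"Wiki: [0-9]{2}\[\[[0-9]{2}(.+)[0-9]{2}\]\].*"
    r"(http:\/\/domain.tld\/wiki\/index.php.+) "
    r"[0-9]\* [0-9]{2}(.+) [0-9]\* .+ [0-9]{2}(.*)"
)

# The sample message exactly as pasted above, i.e. plain printable text only.
SAMPLE = (
    "Wiki: 14[[07To Do14]]4 10 "
    "02http://domain.tld/wiki/index.php?diff=230&oldid=201 5* 03Username 5* "
    "(-45) 10Removed IRC line; added something else"
)

def parse_rc(text):
    """Return (page, url, user, summary), or None when the pattern does not match."""
    match = RC_PATTERN.search(text)
    return match.groups() if match else None
```

Against the pasted text, `parse_rc(SAMPLE)` returns the page title, diff URL, username, and edit summary, in that order.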
And again, it works fine when I'm testing it against a string of the
text, but not the actual data being sent over the socket. I have no
idea why.
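One way to narrow this down would be to compare the bytes that actually arrive on the socket with the string used in the test. The sketch below is only a diagnostic aid with hypothetical helper names; it makes non-printable bytes visible and reports where two byte strings first diverge, without assuming what the difference is.

```python
def show_raw(data):
    """Return the bytes in a printable form, with non-printable bytes escaped."""
    return repr(data)

def first_difference(expected, received):
    """Return the index of the first differing byte, or -1 if both are identical."""
    for index, (a, b) in enumerate(zip(expected, received)):
        if a != b:
            return index
    if len(expected) != len(received):
        # One input is a prefix of the other; they diverge where the shorter ends.
        return min(len(expected), len(received))
    return -1
```

For example, `show_raw(b"Wiki: x\n")` gives `b'Wiki: x\n'`, so a trailing newline or any other invisible byte in the socket data would show up as an escape sequence.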
Is there a generic regex available somewhere for parsing this text?
And why does MediaWiki use such a messy string for the announcement?
It seems odd to me, and very troublesome.
Thanks for the help.