Hi,
I have an idea for a new service we could implement on the Tools project that would save a lot of system resources. I would like some feedback.
Imagine a daemon similar to inetd.
It would watch the recent changes of ALL wikis we have @wm, and users could subscribe to this service (using a web browser or some terminal interface) so that on certain events (page X was modified), this bot dispatcher would do something (submit their bot on the grid, send some signal or TCP packet somewhere, insert data into Redis, etc.).
This way bot designers could very easily hook their bots to certain events without having to write their own "wiki-watchers". This would be extremely useful not just for bots that should be triggered on an event (someone edits some page) but also for bots that run periodically.
For example: the archiving bot currently checks ALL pages that carry the archiving template, no matter whether those talk pages have been dead for years. Using such a dispatcher, every time a talk page was modified, some script or process could be launched that would add it to a queue (Redis-like; the dispatcher could even handle this as a built-in event so that no extra process would need to be launched), and the archiving bot would only check the pages that are active, instead of thousands of dead ones.
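A minimal sketch of that flow, assuming a made-up queue name and the standard redis-py client (the archiving logic itself is stubbed out):

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)  # hypothetical Redis instance

# Dispatcher side: push an event whenever a watched talk page is edited.
def on_talk_page_edit(page, user):
    event = {"page": page, "user": user}
    r.rpush("archivebot:pending", json.dumps(event))  # made-up queue name

# Bot side: run periodically, drain only the pages that actually changed.
def run_archive_bot():
    while True:
        raw = r.lpop("archivebot:pending")
        if raw is None:
            break  # queue empty, nothing active to archive this run
        event = json.loads(raw)
        archive_if_needed(event["page"])

def archive_if_needed(page):
    print("would archive", page)  # stand-in for the bot's real archiving logic
```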
This way we could schedule bots very efficiently and save a ton of system resources (CPU, memory, I/O, network, even production server load). It would also make it far easier for bot operators to create new tasks and bots, as they would not need to program "wiki-watchers" themselves.
What do you think about it?
After some thinking and talking to YuviPanda, I decided to make it just an ordinary tool instead of a whole service, so that it would mostly consist of a daemon that, based on user subscriptions, inserts stuff into Redis queues.
This somewhat limits the possibility of spawning a bot on change, but on the other hand it shouldn't be hard to create your own launcher (I will probably create some templates for that) which you can cron yourself: a lightweight script that just runs, checks Redis, and launches the bot when there is work, as in the sketch below.
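A minimal sketch of such a launcher, assuming a made-up queue name; `jsub mybot.sh` stands in for whatever grid-submission command your setup actually uses:

```python
#!/usr/bin/env python
# Lightweight cron launcher: if the subscription queue has pending
# items, submit the real bot to the grid and exit immediately.
import subprocess
import redis

r = redis.Redis(host="localhost", port=6379)  # hypothetical Redis instance
QUEUE = "mybot:changes"                       # made-up queue name

if r.llen(QUEUE) > 0:
    subprocess.call(["jsub", "mybot.sh"])     # placeholder submit command
```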
On Sat, Jul 27, 2013 at 6:42 PM, Marc A. Pelletier marc@uberbox.org wrote:
On 07/27/2013 08:37 AM, Petr Bena wrote:
So that it would mostly consist of a daemon that, based on user subscriptions, inserts stuff into Redis queues.
Wouldn't it be much easier to implement it as, say, a dbus service, and allow people to subscribe to the feed instead?
Doesn't the 'd' in dbus stand for 'desktop'? :) I've never heard of dbus being used in server-side situations when there are plenty of alternatives available (Redis, RabbitMQ, etc.). gerrit-to-redis, grrrit-wm and suchabot already use this architecture for Gerrit streams; no reason it can't scale for wiki RC changes.
I think that, the way I'm currently developing it, it's going to be far easier for the target users to use than it could ever be if we were using dbus or the like.
On Sat, Jul 27, 2013 at 5:20 PM, Marc A. Pelletier marc@uberbox.org wrote:
On 07/27/2013 11:10 AM, Yuvi Panda wrote:
grrrit-wm and suchabot already use this architecture for Gerrit streams; no reason it can't scale for wiki RC changes.
That's actually a persuasive argument (i.e., let's not multiply mechanisms).
-- Marc
On Sat, Jul 27, 2013 at 8:37 AM, Petr Bena benapetr@gmail.com wrote:
After some thinking and talking to YuviPanda, I decided to make it just an ordinary tool instead of a whole service, so that it would mostly consist of a daemon that, based on user subscriptions, inserts stuff into Redis queues.
Do you mean that the bot owners themselves would be responsible for running this tool?
It would watch the recent changes of ALL wikis we have @wm, and users could subscribe to this service (using a web browser or some terminal interface) so that on certain events (page X was modified), this bot dispatcher would do something (submit their bot on the grid, send some signal or TCP packet somewhere, insert data into Redis, etc.).
This sounds like a nice idea, but it'd be far more difficult to design than the current bot solution, mainly because, unlike the current model (where UDP is just spammed as fast as possible), this would require filtering through a rule list and actually processing requests. The service would have to keep up with RC. It's not impossible, and if anything I think it's a pretty cool idea. It'd just require some thought as to how the service would handle overload, whether it could be spread across a server pool, how it would handle concurrency, etc.
--
Tyler Romeo
Stevens Institute of Technology, Class of 2016
Major in Computer Science
www.whizkidztech.com | tylerromeo@gmail.com
On Sat, Jul 27, 2013 at 10:25 PM, Tyler Romeo tylerromeo@gmail.com wrote:
Do you mean that the bot owners themselves would be responsible for running this tool?
Nope, they would just subscribe, and it would fill up the Redis queues for them.
At the moment it would likely connect to IRC itself and relay the current feed, just in a Redis format. The difference is that writing your own IRC parser is not only more complicated, it also requires a whole process to be up just to read the RC feed.
This way the bot doesn't need to keep a single process running; it can, for example, fetch the queue of pages and edits it needs to process from Redis after it starts.
I created a proposal with technical specifications; the most interesting part is probably the example usage:
https://wikitech.wikimedia.org/wiki/Bot_Dispatcher#Example_usage
On 27/07/13 12:34, Petr Bena wrote:
It would watch the recent changes of ALL wikis we have @wm, and users could subscribe to this service (using a web browser or some terminal interface) so that on certain events (page X was modified), this bot dispatcher would do something (submit their bot on the grid, send some signal or TCP packet somewhere, insert data into Redis, etc.).
We already have such a system! The recent changes entries are formatted as IRC colorized messages which are sent over UDP and relayed on irc.wikimedia.org (look at #en.wikipedia).
The related parameters are $wgRC2UDPAddress and $wgRC2UDPPrefix.
So a bot writer can hop on that channel and consume the feed provided there.
It has a few issues though:
1) the system is not resilient (machine dies, no more events)
2) messages are not machine-friendly
3) there is no schema description to ensure MediaWiki sends the format expected by bots
Ori Livneh has developed EventLogging, which sounds to me like a good replacement for the IRC stream described above. You basically write a JSON schema, then send a query with the key/values you want to have in the system; it validates them and sends them into a ZeroMQ system. From there, subsystems can subscribe to a stream of messages and do whatever they want with them (e.g. write to a database, send IRC notifications, pubsub, whatever).
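To make the validation step concrete, here is a rough sketch using the `jsonschema` Python package and a made-up event schema (this is only an illustration; real EventLogging schemas are defined on-wiki and validation happens inside EventLogging itself):

```python
import json
import jsonschema  # pip install jsonschema

# Made-up schema for a page-edit event, purely for illustration.
EDIT_SCHEMA = {
    "type": "object",
    "properties": {
        "wiki": {"type": "string"},
        "page": {"type": "string"},
        "user": {"type": "string"},
    },
    "required": ["wiki", "page", "user"],
}

event = {"wiki": "enwiki", "page": "Talk:Example", "user": "Example"}
jsonschema.validate(event, EDIT_SCHEMA)  # raises ValidationError on bad input
print(json.dumps(event))  # a valid event would then go onto the ZeroMQ stream
```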
The main advantages of EventLogging are:
a) it is already written
b) it is production grade (monitored, has incident documentation, etc.)
c) it works
d) WMF staff supports it
e) it has docs: https://www.mediawiki.org/wiki/Extension:EventLogging/Guide
:-)
I think you kind of misunderstood my proposal, hashar :) I know about the IRC feed; that's where the dispatcher is going to take its data from. The difference is that the dispatcher is a special service for bot operators that allows them to subscribe to selected pages or authors (even using regular expressions); it filters these out of the RC feed (currently the IRC version) for them and fills them into a Redis queue they specify, in a format they prefer.
This way bots need to run much less often, and bot operators need to do much less work watching the activity on the wikis. I don't know if people will like this or not, but it is surely going to be useful for at least one bot operator in the future, and that would be me :-)
And I really believe that once I create proper documentation for this so that people understand how it works, many others will find it useful. It is just a subscription service that lets you do /something/ (where "something" at the moment is an element of { "redis queue" }, but in the future it might be more than that).
It should be a flexible subscription system which works in a completely different way than the current RC feed does. The RC feed provides you with all changes in real time. This thing will provide you with filtered changes, even back in time (you pick them up from the Redis queue).
The simplest example would be a bot that should do something with every edit to pages Wikipedia:SomeProject/* (review, archive, whatever). The bot operator would just issue a command similar to this: https://wikitech.wikimedia.org/wiki/Bot_Dispatcher#Example_usage in order to create a Redis queue of edits matching the Wikipedia:SomeProject/.* regex.
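A rough sketch of the filtering the dispatcher would do for that subscription, with made-up subscription data (the real subscription syntax is the one described on the wikitech page above):

```python
import json
import re
import redis

r = redis.Redis(host="localhost", port=6379)  # hypothetical Redis instance

# Made-up subscriptions: (wiki, title regex, target Redis queue).
SUBSCRIPTIONS = [
    ("en.wikipedia", re.compile(r"^Wikipedia:SomeProject/.*"), "somebot:queue"),
]

def dispatch(change):
    """Push an RC event to every queue whose subscription matches it."""
    for wiki, pattern, queue in SUBSCRIPTIONS:
        if change["wiki"] == wiki and pattern.match(change["title"]):
            r.rpush(queue, json.dumps(change))

dispatch({"wiki": "en.wikipedia",
          "title": "Wikipedia:SomeProject/Tasks",
          "user": "Example"})
```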
I am very bad at explaining stuff, but I believe that once people understand what I am about to create, they will find it useful :-)
And in case someone was wondering why subscribing to changes is better than watching them in real time:
* No need to implement an IRC client in your bot, just simple Redis queue fetching
* Your bot doesn't need to run and wait for a change at all (which greatly saves resources); it can just start once there are items in a queue
* You don't need to bother inventing a parser for the current IRC messages; you can just pick a format that is easy to deserialize (like JSON)
* If your bot crashes, you will not miss any edits (on the other hand, if the dispatcher daemon crashes, you would :P but I hope we make it as stable as possible)
* No need to create any edit filtering, etc.; this can already be part of your subscription
* An easy way to distribute work in parallel across multi-instance bots: once a single bot fetches an item, it disappears from the Redis queue (see the sketch below)
* And many other reasons I just can't think of right now
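The parallel-distribution point deserves a quick illustration: because popping from a Redis list is atomic, any number of bot instances can share one queue with no extra coordination. A minimal sketch with a made-up queue name:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)  # hypothetical Redis instance

def worker():
    # Any number of these can run at once; BLPOP hands each queued edit
    # to exactly one worker, so the work distributes itself.
    while True:
        _key, raw = r.blpop("mybot:changes")  # blocks until an item arrives
        edit = json.loads(raw)
        process(edit)

def process(edit):
    print("processing", edit["title"])  # stand-in for the bot's real task
```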
It's basically a push system for bots rather than a pull system, which I agree is a significantly better solution.
EventLogging looks interesting. I haven't read through that entire guide, but the first paragraph or so kind of makes it sound like it's meant more for analytics. Would it be suitable for this application?
--
Tyler Romeo
Stevens Institute of Technology, Class of 2016
Major in Computer Science
www.whizkidztech.com | tylerromeo@gmail.com
I think it's suitable for anything that requires this kind of system. If analytics would benefit from a Redis-queue, machine-readable "watchlist" that supports regexes and various edit filters, then yes. It can of course be extended in many ways (I am currently trying to write it in a very extensible way).
On 28/07/13 18:35, Petr Bena wrote:
I think you kind of misunderstood my proposal, hashar :) I know about the IRC feed; that's where the dispatcher is going to take its data from. The difference is that the dispatcher is a special service for bot operators that allows them to subscribe to selected pages or authors (even using regular expressions); it filters these out of the RC feed (currently the IRC version) for them and fills them into a Redis queue they specify, in a format they prefer.
Petan, MZMcBride, Ori, and I had an IRC discussion on this topic this morning. Here is a quick summary.
What I dislike in your proposal is that you are still relying on the IRC feed service, which is not the best way to publish metadata. It is really meant to be consumed by an IRC client for friendly human display.
For context, the related code is in RecentChange::getIRCLine(); as an example, here is the title formatting:
"\00314[[\00307$title\00314]]";
Not easily parseable. Moreover, the code has plenty of exceptions and crafts a URL for the end user to click.
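For illustration, a rough sketch of what a consumer has to do today just to recover the title, stripping the mIRC color codes (a \x03 byte plus optional color numbers) with a regex:

```python
import re

# mIRC color codes: \x03 optionally followed by fg[,bg] color numbers.
COLOR_RE = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?")

def strip_colors(line):
    return COLOR_RE.sub("", line)

raw = "\x0314[[\x0307Talk:Example\x0314]]"
print(strip_colors(raw))  # -> [[Talk:Example]]
```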
As I understood it, your bot would parse the horrible IRC syntax, craft some JSON, and write it into Redis for bots to consume. Thus bot authors would no longer have to care about the IRC format. That is an improvement, but we can do better.
Instead, we could have MediaWiki send JSON directly. Victor Vasiliev proposed a change to provide a JSON feed:
https://gerrit.wikimedia.org/r/#/c/52922
We could have that feed sent to the EventLogging ZeroMQ queue, and write subscribers to it that would put the RC events into Redis.
To achieve that:
- we need Victor's patch to be polished up and deployed
- find out what needs to be written to Redis (one queue per bot? a shared queue?)
- write a zmq subscriber to publish in Redis (see the sketch below)
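A sketch of that subscriber, assuming a hypothetical ZeroMQ endpoint and one queue per bot (both of which are exactly the open questions in the list above):

```python
import json
import redis
import zmq  # pip install pyzmq

r = redis.Redis(host="localhost", port=6379)     # hypothetical Redis instance

ctx = zmq.Context()
sock = ctx.socket(zmq.SUB)
sock.connect("tcp://eventlogging.example:8600")  # hypothetical endpoint
sock.setsockopt_string(zmq.SUBSCRIBE, "")        # subscribe to everything

while True:
    event = json.loads(sock.recv_string())
    # One queue per subscribed bot; per-bot routing/filtering would go here.
    for queue in ("archivebot:pending", "somebot:queue"):  # made-up queues
        r.rpush(queue, json.dumps(event))
```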
Eventually, we could provide a library for bot authors to easily query their Redis queues.
In the end you have:
- a very robust feeding system, on par with the other event feeds we are already maintaining
- no more IRC formatting to deal with
- nice JSON out of the box :-]