And in case anyone is wondering why subscribing to changes is better than watching them in real time:
* No need to implement an IRC client in your bot, just a simple download from a redis queue
* Your bot doesn't need to be running to wait for a change at all (which saves a lot of resources); it can just start once there are items in the queue
* You don't need to bother inventing a parser for the current IRC messages; you can just pick a format that is easy to deserialize (like JSON)
* If your bot crashes, you will not miss any edits (on the other hand, if the dispatcher daemon crashes you would :P but I hope we make it as stable as possible)
* No need to create any edit filtering etc.; this can already be part of your subscription
* Easy way to distribute work in parallel across multi-instance bots. Once a single bot fetches an item, it disappears from the redis queue
* And many other reasons I just can't think of right now
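A minimal sketch of what the consuming side could look like, assuming the dispatcher pushes JSON-encoded edits onto a redis list. The queue name, the field names and the FakeRedis stand-in are all made up for illustration; with the redis-py package a real client object could be passed in instead, since it also offers lpop and returns None on an empty list.

```python
import json
from collections import deque


class FakeRedis:
    """In-memory stand-in for a redis client (hypothetical, illustration only)."""

    def __init__(self):
        self._lists = {}

    def rpush(self, name, value):
        self._lists.setdefault(name, deque()).append(value)

    def lpop(self, name):
        queue = self._lists.get(name)
        return queue.popleft() if queue else None


def drain_queue(client, queue_name):
    """Pop every pending item off the queue and decode it from JSON."""
    edits = []
    while True:
        raw = client.lpop(queue_name)
        if raw is None:  # queue is empty, the bot can exit until next time
            break
        edits.append(json.loads(raw))
    return edits


# Assumed message shape: one JSON object per edit.
client = FakeRedis()
client.rpush("mybot_queue", json.dumps({"page": "Wikipedia:SomeProject/Foo", "user": "Example"}))
client.rpush("mybot_queue", json.dumps({"page": "Wikipedia:SomeProject/Bar", "user": "Example2"}))
pending = drain_queue(client, "mybot_queue")
```

Because each lpop removes the item it returns, several bot instances draining the same queue would not receive the same edit twice.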
On Sun, Jul 28, 2013 at 6:35 PM, Petr Bena benapetr@gmail.com wrote:
I think you kind of misunderstood my proposal, hashar :) I know that the IRC feed is where the dispatcher is going to take its data from. The difference is that the dispatcher is a special service for bot operators that allows them to subscribe to selected pages / authors (even using regular expressions); it would filter these for them from the RC feed (currently the IRC version) and put them into a redis queue they specify, in a format they prefer.
This way bots need to run much less often, and bot operators need to do much less work watching the activity on wikis. I don't know if people will like this or not, but it is surely going to be useful for at least 1 bot operator in the future, and that would be me :-)
And I really believe that once I create proper documentation for this so that people understand how it works, many others will find it useful. It is just a subscription service that lets you do /something/ (where something at this moment is an element of { "redis queue" }, but in the future might be more than that).
It should be a flexible subscription system which works in a completely different way than the current RC feed does. The RC feed provides you with all changes in real time. This thing will provide you with filtered changes, even back in time (you will pick them up from the redis queue).
The simplest thing to use as an example would be a bot that should do something with every edit to pages Wikipedia:SomeProject/* (like review / archive / whatever). The bot operator would just issue a command similar to this: https://wikitech.wikimedia.org/wiki/Bot_Dispatcher#Example_usage in order to create a redis queue of edits matching the Wikipedia:SomeProject/.* regex
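To make the filtering part concrete, here is a rough sketch of how a title could be checked against the Wikipedia:SomeProject/.* regex. It only shows the matching step; the page titles are invented, and whether the dispatcher anchors the regex to the whole title (as fullmatch does here) is an assumption.

```python
import re

# Subscription regex from the example above.
subscription = re.compile(r"Wikipedia:SomeProject/.*")


def matches(pattern, title):
    """Return True when the whole page title matches the subscription regex."""
    return pattern.fullmatch(title) is not None


# Made-up titles standing in for entries from the RC feed.
titles = [
    "Wikipedia:SomeProject/Review",
    "Wikipedia:SomeProject/Archive/2013",
    "Wikipedia:OtherProject/Review",
    "Talk:Wikipedia:SomeProject/Review",
]

# Only the titles that match would end up in the operator's queue.
queued = [t for t in titles if matches(subscription, t)]
```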
I am very bad at explaining stuff, but I believe that once people understand what I am about to create, they will eventually find it useful :-)
On Sun, Jul 28, 2013 at 2:16 PM, Antoine Musso hashar+wmf@free.fr wrote:
On 27/07/13 12:34, Petr Bena wrote:
It would watch the recentchanges of ALL wikis we have @wm, and users could subscribe (using a web browser or some terminal interface) to this service, so that on certain events (page X was modified) this bot dispatcher would do something (submit their bot on the grid / send some signal / TCP packet somewhere / insert data into redis etc.).
We already have such a system! The recent changes entries are formatted as IRC colorized messages which are sent over UDP and relayed on irc.wikimedia.org (look at #en.wikipedia).
The related parameters are $wgRC2UDPAddress and $wgRC2UDPPrefix.
So a bot writer can hop on that channel and consume the feed provided there.
It has a few issues though:
- the system is not resilient (if the machine dies, no more events)
- messages are not machine friendly
- there is no schema description to ensure MediaWiki sends the format expected by bots
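As an illustration of the "not machine friendly" point, a bot reading colorized IRC lines first has to strip the formatting control codes before it can parse anything. The sketch below removes mIRC-style color and style codes; the sample line is invented for the example and is not the actual format of the feed.

```python
import re

# mIRC-style formatting: \x03 starts a colour code, optionally followed by
# fg[,bg] digits; \x02 bold, \x0f reset, \x16 reverse, \x1d italic, \x1f underline.
IRC_FORMATTING = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1d\x1f]")


def strip_irc_formatting(line):
    """Remove colour and style control codes from one IRC message."""
    return IRC_FORMATTING.sub("", line)


# Invented sample line, NOT the real recent changes format.
sample = "\x0314[[\x0307Some page\x0314]]\x03 edited by \x0303Example\x03"
plain = strip_irc_formatting(sample)
```

Even after this step the bot still needs a hand-written parser for the remaining text, which is what a structured format such as JSON would avoid.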
Ori Livneh has developed EventLogging, which sounds to me like a good replacement for the IRC stream described above. You basically have to write a JSON schema, then send a query with the key/values you want to have in the system; it validates them and sends them into a ZeroMQ system. From there, subsystems can subscribe to a stream of messages and do whatever they want with them (i.e. write to a database, send an IRC notification, pubsub or whatever).
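The validate-then-fan-out flow described above can be sketched roughly as follows. This is plain Python standing in for the idea only: it does not use EventLogging's API, real JSON Schema or ZeroMQ, and the schema fields and function names are hypothetical.

```python
# Hypothetical schema: field name -> required Python type.
EDIT_SCHEMA = {"wiki": str, "page": str, "user": str, "rev_id": int}


def validate(event, schema):
    """Accept the event only if it has exactly the schema's fields with the right types."""
    if set(event) != set(schema):
        return False
    return all(isinstance(event[key], kind) for key, kind in schema.items())


subscribers = []  # callables that receive every valid event


def publish(event):
    """Validate the event, then fan it out to every subscriber."""
    if not validate(event, EDIT_SCHEMA):
        return False
    for handler in subscribers:
        handler(event)
    return True


# One subscriber that just collects events (it could write to a database instead).
received = []
subscribers.append(received.append)

ok = publish({"wiki": "enwiki", "page": "Example", "user": "Someone", "rev_id": 1})
rejected = publish({"wiki": "enwiki", "page": "Example"})  # missing fields
```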
The main advantages of EventLogging are:
a) it is already written
b) it is production grade (monitored, got incident doc etc)
c) it works
d) WMF staff supports it
e) has doc https://www.mediawiki.org/wiki/Extension:EventLogging/Guide
:-)
-- Antoine "hashar" Musso
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l