I think you kind of misunderstood my proposal hashar :) I know that
the IRC feed is where the dispatcher is going to take data from. The
difference is that the dispatcher is a special service for bot
operators that allows them to subscribe to selected pages / authors
(even using regular expressions); it would filter these for them from
the RC feed (currently the IRC version) and fill up a redis queue they
specify, in a format they prefer.
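To make the idea concrete, here is a minimal sketch of the dispatch step. All names are hypothetical, and an in-memory deque stands in for redis (a real dispatcher would push with redis-py's RPUSH and read events from the live RC feed):

```python
import json
import re
from collections import deque

# Hypothetical subscription: a page-title regex plus the queue it feeds.
SUBSCRIPTIONS = [
    {"pattern": re.compile(r"^Wikipedia:SomeProject/.*"),
     "queue": "someproject-edits"},
]

# Stand-in for the operator's redis queues.
queues = {}

def dispatch(change):
    """Route one recent-change event to every matching subscriber queue."""
    for sub in SUBSCRIPTIONS:
        if sub["pattern"].match(change["title"]):
            queues.setdefault(sub["queue"], deque()).append(json.dumps(change))

# Two sample events: only the first matches the subscription.
dispatch({"title": "Wikipedia:SomeProject/Tasks", "user": "Example"})
dispatch({"title": "Main Page", "user": "Example"})
```

The point is that all filtering happens once, in the dispatcher, instead of in every bot.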
This way bots need to run much less often, and bot operators need to
do much less work watching the activity on wikis. I don't know if
people will like this or not, but it is surely going to be useful for
at least one bot operator in the future, and that would be me :-)
And I really believe that once I create proper documentation for this,
so that people understand how it works, many others will find it
useful. It is just a subscription service that lets you do /something/
(where something at this moment is an element of { "redis queue" },
but in future it might be more than that).
It should be a flexible subscription system which works in a
completely different way than the current RC feed does. The RC feed
provides you with all changes in real time. This thing will provide
you with filtered changes, even back in time (you will pick them up
from a redis queue).
The simplest thing to use as an example would be a bot that should do
something with every edit to pages Wikipedia:SomeProject/* (review /
archive, whatever). The bot operator would just issue a command
similar to this:
https://wikitech.wikimedia.org/wiki/Bot_Dispatcher#Example_usage in
order to create a redis queue of edits matching the
Wikipedia:SomeProject/.* regex.
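The consuming side would then look something like this sketch. Again the names are hypothetical and a deque stands in for the redis queue (a real bot would use redis-py's LPOP or BLPOP); the key property is that edits queued while the bot was not running are still there when it starts:

```python
import json
from collections import deque

# Stand-in for the redis queue the dispatcher has been filling,
# including events that arrived while the bot was offline.
queue = deque([
    json.dumps({"title": "Wikipedia:SomeProject/Tasks", "user": "A"}),
    json.dumps({"title": "Wikipedia:SomeProject/Archive", "user": "B"}),
])

processed = []

def run_bot():
    """Drain the queued edits; nothing is lost between bot runs."""
    while queue:
        edit = json.loads(queue.popleft())
        processed.append(edit["title"])  # a real bot would review / archive here

run_bot()
```

So the bot can run from cron every few hours instead of watching the feed continuously.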
I am very bad at explaining stuff, but I believe once people
understand what I am about to create, they will eventually find it
useful :-)
On Sun, Jul 28, 2013 at 2:16 PM, Antoine Musso <hashar+wmf(a)free.fr> wrote:
On 27/07/13 12:34, Petr Bena wrote:
It would watch the recentchages of ALL wikis we
have @wm and users
could subscribe (using web browser or some terminal interface) to this
service, so that on certain events (page X was modified), this bot
dispatcher would do something (submit their bot on grid / sent some
signal / tcp packet somewhere / insert data to redis etc etc).
We already have such a system! The recent changes entries are formatted as
IRC colorized messages which are sent over UDP and relayed on
irc.wikimedia.org (look at #en.wikipedia).
The related parameters are $wgRC2UDPAddress and $wgRC2UDPPrefix.
So a bot writer can hop on that channel and consume the feed provided there.
It has a few issues though:
1) the system is not resilient (if the machine dies, there are no more
events)
2) the messages are not machine friendly
3) there is no schema description to ensure MediaWiki sends the format
expected by bots
Ori Livneh has developed EventLogging, which sounds to me like a good
replacement for the IRC stream described above. You basically have to
write a JSON schema, then send a query with the key/values you want to
have in the system; it validates them and sends them into a ZeroMQ
system. From there, subsystems can subscribe to a stream of messages
and do whatever they want with them (e.g. write to a database, send
IRC notifications, pubsub, or whatever).
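As a rough illustration of that validate-then-publish flow (this is not the real EventLogging API — the schema, field names, and the list standing in for the ZeroMQ publisher are all made up for the sketch):

```python
import json

# A toy schema in the spirit of EventLogging's JSON schemas:
# field name -> required Python type.
SCHEMA = {"page": str, "user": str, "revision": int}

published = []  # stand-in for the ZeroMQ publisher socket

def log_event(event):
    """Validate an event against the schema, then 'publish' it."""
    for field, ftype in SCHEMA.items():
        if field not in event:
            raise ValueError("missing field: %s" % field)
        if not isinstance(event[field], ftype):
            raise TypeError("bad type for field: %s" % field)
    published.append(json.dumps(event))

log_event({"page": "Main Page", "user": "Example", "revision": 12345})
```

Subscribers downstream then only ever see events that conform to the schema, which is exactly what point 3) above asks for.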
The main advantages of EventLogging are:
a) it is already written
b) it is production grade (monitored, has incident docs, etc.)
c) it works
d) WMF staff supports it
e) it has documentation:
https://www.mediawiki.org/wiki/Extension:EventLogging/Guide
:-)
--
Antoine "hashar" Musso
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l