It's basically a push system for bots rather than pull, which I agree is a
significantly better solution.
EventLogging looks interesting. I haven't read through that entire guide,
but the first paragraph or so kind of makes it sound like it's meant more
for analytics. Would it be suitable for this application?
--
*Tyler Romeo*
Stevens Institute of Technology, Class of 2016
Major in Computer Science
| tylerromeo(a)gmail.com
On Sun, Jul 28, 2013 at 12:48 PM, Petr Bena <benapetr(a)gmail.com> wrote:
And if someone was wondering why subscribing to changes is better than
watching them in real time:
* No need to implement an IRC client in your bot, just simple fetching
from a redis queue
* Your bot doesn't need to keep running while it waits for a change
(which saves resources greatly); it can just start once there are items
in the queue
* You don't need to bother inventing a parser for the current IRC
messages; you can just pick a format that is easy to deserialize (like
JSON)
* If your bot crashes, you will not miss any edits (on the other hand, if
the dispatcher daemon crashes you would :P but I hope we make it as
stable as possible)
* No need to create any edit filtering etc.; this can already be part
of your subscription
* Easy way to distribute work in parallel across multi-instance bots.
Once a single bot fetches an item, it disappears from the redis queue
* And many other reasons I just can't think of right now
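To illustrate the points above, here is a minimal sketch of what the
consumer side of a bot could look like, assuming the dispatcher pushes one
JSON object per change onto a redis list. The queue name, the JSON field
names ("title", "user") and the drain_queue helper are hypothetical; none
of them come from the dispatcher itself. The client argument is anything
with an lpop(name) method (redis-py's redis.Redis has one); a small
in-memory stand-in is used here so the sketch runs without a server.

```python
import json
from collections import deque


class FakeQueueClient:
    """In-memory stand-in exposing the two list methods the sketch needs."""

    def __init__(self):
        self._lists = {}

    def rpush(self, name, *values):
        # Append values to the tail of the named list, like redis RPUSH.
        self._lists.setdefault(name, deque()).extend(values)
        return len(self._lists[name])

    def lpop(self, name):
        # Remove and return the head of the list, or None when it is empty.
        items = self._lists.get(name)
        return items.popleft() if items else None


def drain_queue(client, queue_name, handle):
    """Pop items until the queue is empty; return how many were handled."""
    handled = 0
    while True:
        raw = client.lpop(queue_name)
        if raw is None:
            break
        if isinstance(raw, bytes):  # redis-py returns bytes by default
            raw = raw.decode("utf-8")
        handle(json.loads(raw))  # no custom IRC parser needed, just JSON
        handled += 1
    return handled


# Hypothetical queue contents, as a dispatcher might have left them.
client = FakeQueueClient()
client.rpush("mybot-queue", json.dumps({"title": "Wikipedia:SomeProject/A", "user": "Example"}))
client.rpush("mybot-queue", json.dumps({"title": "Wikipedia:SomeProject/B", "user": "Example"}))

seen = []
count = drain_queue(client, "mybot-queue", seen.append)
```

Since each pop removes the item from the list, several bot instances could
call drain_queue on the same queue and split the work between them.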
On Sun, Jul 28, 2013 at 6:35 PM, Petr Bena <benapetr(a)gmail.com> wrote:
I think you kind of misunderstood my proposal, hashar :) I know that.
The IRC feed is where the dispatcher is going to take its data from. The
difference is that the dispatcher is a special service for bot operators
that allows them to subscribe to selected pages / authors (even using
regular expressions); it would filter these for them from the RC feed
(currently the IRC version) and put them into a redis queue they
specify, in a format they prefer.
This way bots need to run much less often, and bot operators need to
do much less work watching the activity on wikis. I don't know if
people will like this or not, but it is surely going to be useful for at
least 1 bot operator in the future, and that would be me :-)
And I really believe that once I create proper documentation for
this so that people understand how it works, many others will find it
useful. It is just a subscription service that lets you do /something/
(where something at this moment is an element of { "redis queue" } but in
the future might be more than that).
It should be a flexible subscription system which works in a completely
different way from the current RC feed. The RC feed provides you with all
changes in real time. This thing will provide you with filtered
changes, even back in time (you will pick them up from the redis queue).
The simplest thing to use as an example would be a bot that should
do something with every edit to pages Wikipedia:SomeProject/* (like
review / archive, whatever). The bot operator would just issue a command
similar to this:
https://wikitech.wikimedia.org/wiki/Bot_Dispatcher#Example_usage in
order to create a redis queue of edits matching the
Wikipedia:SomeProject/.* regex
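As a rough sketch of the filtering step described above (not the actual
dispatcher code), the matching could work along these lines. The
Subscription class, the dispatch function, the queue name and the change
fields are all hypothetical, and plain Python lists stand in for the redis
queues so the example runs on its own.

```python
import json
import re
from collections import defaultdict


class Subscription:
    """One operator's request: put matching changes into queue_name."""

    def __init__(self, queue_name, title_pattern):
        self.queue_name = queue_name
        self.title_regex = re.compile(title_pattern)

    def matches(self, change):
        # fullmatch: the whole page title has to match the pattern.
        return self.title_regex.fullmatch(change["title"]) is not None


def dispatch(change, subscriptions, queues):
    """Append the change, as JSON, to the queue of every matching subscription."""
    delivered = []
    payload = json.dumps(change)
    for sub in subscriptions:
        if sub.matches(change):
            queues[sub.queue_name].append(payload)
            delivered.append(sub.queue_name)
    return delivered


queues = defaultdict(list)  # stand-in for redis lists
subs = [Subscription("someproject-bot", r"Wikipedia:SomeProject/.*")]

# Two made-up changes, as they might arrive from the RC feed.
changes = [
    {"title": "Wikipedia:SomeProject/Review", "user": "Example"},
    {"title": "Wikipedia:OtherProject/Page", "user": "Example"},
]
results = [dispatch(c, subs, queues) for c in changes]
```

Only the first change ends up in the "someproject-bot" queue; the second
matches no subscription and is dropped.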
I am very bad at explaining stuff, but I believe once people
understand what I am about to create, they will eventually find it
useful :-)
On Sun, Jul 28, 2013 at 2:16 PM, Antoine Musso <hashar+wmf(a)free.fr>
wrote:
> On 27/07/13 12:34, Petr Bena wrote:
>> It would watch the recentchanges of ALL wikis we have @wm and users
>> could subscribe (using web browser or some terminal interface) to this
>> service, so that on certain events (page X was modified), this bot
>> dispatcher would do something (submit their bot on grid / sent some
>> signal / tcp packet somewhere / insert data to redis etc etc).
>
> We already have such a system! The recent changes entries are formatted as
> IRC colorized messages which are sent over UDP and relayed on
> irc.wikimedia.org (look at #en.wikipedia).
>
> The related parameters are $wgRC2UDPAddress and $wgRC2UDPPrefix.
>
> So a bot writer can hop on that channel and consume the feed provided
> there.
>
> It has a few issues though:
>
> 1) the system is not resilient (if the machine dies, no more events)
> 2) messages are not machine friendly
> 3) there is no schema description to ensure MediaWiki sends the format
> expected by bots
>
>
> Ori Livneh has developed EventLogging which sounds to me like a good
> replacement for the IRC stream described above. You basically have to
> write a JSON schema, then send a query with the key/value you want to
> have in the system; it would validate them and send them into a ZeroMQ
> system. From there, subsystems can subscribe to a stream of messages
> and do whatever they want with them (i.e. write to a database, send IRC
> notifications or pubsub or whatever).
>
>
> The main advantages of EventLogging are:
>
> a) it is already written
> b) it is production grade (monitored, got incident doc etc)
> c) it works
> d) WMF staff supports it
> e) has doc https://www.mediawiki.org/wiki/Extension:EventLogging/Guide
>
> :-)
>
>
> --
> Antoine "hashar" Musso
>
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l