I accidentally clicked reply instead of reply to all; my response is below...
On Mon, Jul 29, 2013 at 1:14 PM, Petr Bena benapetr@gmail.com wrote:
On Mon, Jul 29, 2013 at 1:04 PM, Antoine Musso hashar+wmf@free.fr wrote:
On 28/07/13 18:35, Petr Bena wrote:
I think you kind of misunderstood my proposal, hashar :) I know that the IRC feed is where the dispatcher is going to take its data from. The difference is that the dispatcher is a special service for bot operators that allows them to subscribe to selected pages / authors (even using regular expressions). It filters these for them from the RC feed (currently the IRC version) and fills them into a Redis queue they specify, in the format they prefer.
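A minimal sketch of the filtering step described above, assuming a hypothetical `Subscription` with optional page and author regular expressions. This is an illustration of the idea, not dispatcherd's actual code.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subscription:
    """Hypothetical subscription: a target queue plus optional regex filters."""
    queue: str
    page_pattern: Optional[str] = None
    author_pattern: Optional[str] = None

    def matches(self, page: str, author: str) -> bool:
        # A filter left as None accepts anything; otherwise the regex
        # must match the whole page title or user name.
        if self.page_pattern is not None and not re.fullmatch(self.page_pattern, page):
            return False
        if self.author_pattern is not None and not re.fullmatch(self.author_pattern, author):
            return False
        return True


def route(edit: dict, subscriptions: List[Subscription]) -> List[str]:
    """Return the names of the queues that should receive this edit."""
    return [s.queue for s in subscriptions if s.matches(edit["title"], edit["user"])]


subs = [
    Subscription("bot-a", page_pattern=r"Talk:.*"),
    Subscription("bot-b", author_pattern=r"Example(Bot)?"),
]
print(route({"title": "Talk:Main Page", "user": "ExampleBot"}, subs))  # ['bot-a', 'bot-b']
```

Each incoming edit is checked once against every subscription, and the result is the list of queues it should be pushed to.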
Petan, MZMcBride, Ori and I had an IRC discussion on that topic this morning. Here is a quick summary.
What I dislike about your proposal is that you are still relying on the IRC feed, which is not the best way to publish metadata. It is really meant to be consumed by IRC clients for human-friendly display.
As I said on IRC, the source code is very flexible, and yes, I am currently relying on the /only/ feed service we have at the moment, which is the IRC feed. Whether we like it or not, it's the only service we have and I MUST use it because there is nothing else. Once there is something better, I can use that instead of IRC.
For context, the related code is in RecentChange::getIRCLine(); as an example, here is the title formatting:
"\00314[[\00307$title\00314]]";
Not easily parseable. Moreover, the code has plenty of special cases and crafts a URL for the end user to click.
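To make "not easily parseable" concrete, here is a minimal sketch of the first thing any consumer of the IRC feed has to do: strip the mIRC control codes. In PHP's double-quoted strings "\003" is the byte 0x03, so "\00314" is colour code 14. The pattern covers the common codes (colour with optional fg,bg digits, bold, reset, reverse, italic, underline); it is an illustration, not a complete mIRC parser.

```python
import re

# \x03 colour (optionally followed by fg[,bg] digits), then the single-byte
# codes \x02 bold, \x0f reset, \x16 reverse, \x1d italic, \x1f underline.
IRC_FORMATTING = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1d\x1f]")


def strip_irc_formatting(line: str) -> str:
    """Remove mIRC formatting codes, leaving only the visible text."""
    return IRC_FORMATTING.sub("", line)


# The title fragment as RecentChange::getIRCLine() would emit it for "Main Page".
raw = "\x0314[[\x0307Main Page\x0314]]"
print(strip_irc_formatting(raw))  # [[Main Page]]
```

Even this step is ambiguous in general: a colour code may carry one or two digits, so if a sender ever used a one-digit code right before a title starting with a digit, the regex would eat part of the title. A structured JSON feed avoids that whole class of problem.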
As I understand it, your bot would parse the horrible IRC syntax, craft some JSON and write it into Redis for bots to consume. Bot authors would thus no longer have to care about the IRC format. That is an improvement, but we can do better.
That is sort of true. The dispatcher converts the current IRC message into a serializable object, which can then be serialized to whatever format the consuming bot developer prefers. At the moment, plain text (pipe-separated values), XML and JSON are available.
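A rough sketch of the "one object, several serializations" idea, assuming a hypothetical `Edit` class with only three fields (the dispatcher's real class and field names may differ). The backslash escaping in the pipe format is also an assumption; the thread does not say how the plain-text format handles a `|` inside a field.

```python
import json
from dataclasses import asdict, dataclass
from xml.etree import ElementTree as ET


@dataclass
class Edit:
    # Hypothetical field set for illustration only.
    title: str
    user: str
    summary: str

    def to_pipe(self) -> str:
        # Escape backslashes first, then pipes, so consumers can split on
        # unescaped "|" characters.
        return "|".join(
            str(v).replace("\\", "\\\\").replace("|", "\\|") for v in asdict(self).values()
        )

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    def to_xml(self) -> str:
        root = ET.Element("edit")
        for name, value in asdict(self).items():
            ET.SubElement(root, name).text = str(value)
        return ET.tostring(root, encoding="unicode")


e = Edit("Main Page", "ExampleBot", "fix typo")
print(e.to_pipe())  # Main Page|ExampleBot|fix typo
print(e.to_xml())   # <edit><title>Main Page</title><user>ExampleBot</user><summary>fix typo</summary></edit>
```

Keeping the parsed edit as one object means adding another output format is a new method, not a change to the parser.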
Instead, we could have MediaWiki send JSON directly. Victor Vasiliev proposed a change to provide a JSON feed:
https://gerrit.wikimedia.org/r/#/c/52922
We could have that feed sent to the EventLogging ZeroMQ queue, and write subscribers to it that would put the RC events into Redis.
That's indeed interesting. For the dispatcher, this only means that the current edit parser would be replaced with a JSON parser (instead of the IRC parser). However, the subscribers you talk about are exactly what dispatcherd is doing now (its existence largely removes the need for bot developers to write their own, which may be a lot of work). People can subscribe to the RC feed using a simple two-line (hopefully one-line in the future) terminal command, which automagically creates a Redis queue filled with edits; see https://wikitech.wikimedia.org/wiki/Bot_Dispatcher#Example_usage
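The point that only the parser changes can be sketched like this: one parser for a (colour-stripped) IRC line and one for a JSON message produce the same record, and the rest of the pipeline just takes the parser as a parameter. The IRC regex is simplified, and the JSON field names (`title`, `user`, `comment`) are assumptions, since the JSON feed in the Gerrit change was not final.

```python
import json
import re
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, str]

# Simplified pattern for an RC line *after* colour codes have been stripped;
# real lines have more variants (log entries, etc.) than this handles.
IRC_LINE = re.compile(
    r"\[\[(?P<title>.+?)\]\] (?P<flags>\S*) (?P<url>\S*) "
    r"\* (?P<user>.+?) \* \((?P<size>[^)]*)\) (?P<summary>.*)"
)


def parse_irc(line: str) -> Record:
    m = IRC_LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognised RC line: {line!r}")
    return {"title": m["title"], "user": m["user"], "summary": m["summary"]}


def parse_json(line: str) -> Record:
    # Assumed field names for the proposed JSON feed.
    data = json.loads(line)
    return {"title": data["title"], "user": data["user"], "summary": data["comment"]}


def dispatch(lines: Iterable[str], parse: Callable[[str], Record]) -> Iterator[Record]:
    """Everything downstream of `parse` (filtering, queues) stays the same."""
    for line in lines:
        yield parse(line)
```

Switching sources is then `dispatch(lines, parse_json)` instead of `dispatch(lines, parse_irc)`.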
To achieve that:
- we need Victor's patch to be polished up and deployed
- find out what needs to be written to Redis (one queue per bot? a shared queue?)
- write a zmq subscriber that publishes into Redis
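For the open question of one queue per bot versus a shared queue, here is a minimal sketch of both layouts. A dict of deques stands in for Redis lists (`append` plays the role of RPUSH, `popleft` of LPOP) so the example runs without a ZeroMQ or Redis client; the `rc:<bot>` key naming and the per-bot predicates are hypothetical.

```python
import json
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

# Stand-in for Redis: each key maps to a list-like queue.
FakeRedis = Dict[str, Deque[str]]
Predicate = Callable[[dict], bool]


def publish_per_bot(store: FakeRedis, message: str, subs: Dict[str, Predicate]) -> None:
    """Option 1: one queue per bot; each event is copied into every matching queue."""
    event = json.loads(message)
    for bot, wants in subs.items():
        if wants(event):
            store[f"rc:{bot}"].append(message)


def publish_shared(store: FakeRedis, message: str) -> None:
    """Option 2: a single shared queue; bots must filter on their side."""
    store["rc:all"].append(message)


store: FakeRedis = defaultdict(deque)
subs = {
    "bot-a": lambda e: e["title"].startswith("Talk:"),
    "bot-b": lambda e: True,
}
msg = json.dumps({"title": "Talk:Main Page", "user": "ExampleBot"})
publish_per_bot(store, msg, subs)
publish_shared(store, msg)
```

One consequence worth weighing: with a single shared Redis list, whichever bot pops an event removes it for everyone else, so a shared design would need Redis pub/sub or per-bot read positions rather than a plain list. Per-bot queues avoid that at the cost of storing each event once per interested bot.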
Eventually, we could provide a library for bot authors to easily query their Redis queue.
In the end you have:
- a very robust feed system, on par with the other event feeds we already maintain
- no more IRC formatting
- nice JSON out of the box :-]
-- Antoine "hashar" Musso