I somehow clicked Reply instead of Reply to all; my response is below.
On Mon, Jul 29, 2013 at 1:14 PM, Petr Bena <benapetr(a)gmail.com> wrote:
On Mon, Jul 29, 2013 at 1:04 PM, Antoine Musso
<hashar+wmf(a)free.fr> wrote:
On 28/07/13 18:35, Petr Bena wrote:
I think you kind of misunderstood my proposal, hashar :) I know that.
The IRC feed is where the dispatcher is going to take its data from. The
difference is that the dispatcher is a special service for bot operators
that allows them to subscribe to selected pages / authors (even using
regular expressions). It would filter these for them from the RC feed
(currently the IRC version) and push them into a Redis queue they
specify, in a format they prefer.
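To make the subscription idea concrete, here is a minimal sketch of regex-based matching on page title and author. It is not the dispatcher's actual code: the Subscription class, its field names, the queue names, and the "title" / "user" keys of a change are all assumptions made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subscription:
    queue: str                    # hypothetical name of the bot's Redis queue
    page: Optional[str] = None    # regex on the page title, None matches any
    author: Optional[str] = None  # regex on the user name, None matches any

    def matches(self, change: dict) -> bool:
        # A change matches only if every pattern that is set finds a match.
        if self.page is not None and not re.search(self.page, change["title"]):
            return False
        if self.author is not None and not re.search(self.author, change["user"]):
            return False
        return True


def dispatch(change: dict, subscriptions: list) -> list:
    """Return the queue names of all subscriptions that match the change."""
    return [s.queue for s in subscriptions if s.matches(change)]


subs = [
    Subscription("bot_a", page=r"^Talk:"),
    Subscription("bot_b", author=r"^Example$"),
]
print(dispatch({"title": "Talk:Foo", "user": "Someone"}, subs))
```

Each matching queue name would then receive the change in whatever format its owner asked for.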
Petan, MZMcBride, Ori and I had an IRC discussion on that topic this
morning. Here is a quick summary.
What I dislike in your proposal is that you are still relying on the IRC
feed service, which is not the best way to publish metadata. It is really
meant to be consumed by IRC clients for human-friendly display.
As I said on IRC, the source code is very flexible, and indeed I am
now relying on the /only/ feed service we have at the moment, which
is the IRC feed. Whether we like it or not, it's the only service we
have and I MUST use it because there is nothing else. Once there is
something better, I can use that instead of IRC.
For context, the related code is in RecentChange::getIRCLine(), and as
an example here is the title formatting:
"\00314[[\00307$title\00314]]";
Not easily parseable. Moreover, the code has plenty of exceptions and
crafts a URL for the end user to click.
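To illustrate why that format is awkward, here is a rough sketch of pulling the title out of the fragment shown above. It assumes the usual mIRC colour convention (byte 0x03 followed by one or two digits, optionally a comma and a background colour) and only handles the title part; the rest of the line and its exceptions are not covered, and the function names are made up.

```python
import re

# mIRC colour code: \x03, optionally followed by fg digits and ",bg" digits.
COLOR_RE = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?")
# The title sits between [[ and ]] once the colour codes are gone.
TITLE_RE = re.compile(r"\[\[(.+?)\]\]")


def strip_colors(line: str) -> str:
    """Remove mIRC colour codes from an IRC line."""
    return COLOR_RE.sub("", line)


def extract_title(line: str):
    """Return the page title from an RC line, or None if none is found."""
    match = TITLE_RE.search(strip_colors(line))
    return match.group(1) if match else None


# Same shape as the PHP string above, with $title = "Main Page".
sample = "\x0314[[\x0307Main Page\x0314]]"
print(extract_title(sample))
```

Note that a title that itself contains "]]" or starts in a way that confuses the colour regex would still need special handling, which is part of the problem being described.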
As I understood it, your bot would parse the horrible IRC syntax, craft
some JSON and write it in Redis for bots to consume. Thus bot authors
will no longer have to care about the IRC format. That is an improvement,
but we can do better.
That is sort of true. The dispatcher will convert the current IRC
message to a serializable class item. That can be serialized to
whatever format the bot developer (the target consumer) prefers. At
the moment plain text (pipe-separated values), XML and JSON are
available.
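As a rough illustration of one parsed item being emitted in the three formats mentioned, here is a small sketch. The field names, the "edit" element name and the backslash escaping of pipes are assumptions for the example, not the dispatcher's real output.

```python
import json
from xml.etree import ElementTree as ET


def to_json(change: dict) -> str:
    """Serialize the change as a JSON object."""
    return json.dumps(change, sort_keys=True)


def to_pipe(change: dict, fields=("wiki", "title", "user")) -> str:
    """Serialize selected fields as pipe-separated text.

    Backslashes and pipes inside values are backslash-escaped (an assumed
    convention) so the separator stays unambiguous.
    """
    def escape(value) -> str:
        return str(value).replace("\\", "\\\\").replace("|", "\\|")

    return "|".join(escape(change[name]) for name in fields)


def to_xml(change: dict) -> str:
    """Serialize the change as a single XML element with attributes."""
    element = ET.Element("edit")
    for key, value in change.items():
        element.set(key, str(value))
    return ET.tostring(element, encoding="unicode")


change = {"wiki": "enwiki", "title": "Main Page", "user": "Example"}
print(to_pipe(change))
print(to_json(change))
print(to_xml(change))
```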
Instead, we could have MediaWiki send JSON directly. Victor Vasiliev
proposed a change to provide a JSON feed:
https://gerrit.wikimedia.org/r/#/c/52922
We could have that feed sent to the EventLogging ZeroMQ queue, and write
subscribers to it that would put the RC events in Redis.
That's indeed interesting. For the dispatcher this only means that the
current edit parser (the IRC parser) would be replaced with a JSON
parser. However, the subscribers you talk about are exactly what
dispatcherd is doing now (its existence removes the need for bot
developers to write their own, which may be a lot of work).
People can subscribe to the RC feed using a simple 2-line (in future
hopefully 1-line) terminal command, which automagically creates a
Redis queue filled with edits; see
https://wikitech.wikimedia.org/wiki/Bot_Dispatcher#Example_usage
> To achieve that:
>
> - we need Victor's patch to be polished up and deployed
> - find out what needs to be written to Redis (one queue per bot? A
> shared queue?)
> - write a zmq subscriber to publish in Redis
>
> Eventually provide some library for bot authors to easily query their
> Redis queue.
>
>
> In the end you have:
> - a very robust feeding system which is on par with the other events
> feeds we are already maintaining
> - got rid of IRC formatting
> - nice JSON out of the box :-]
>
>
> --
> Antoine "hashar" Musso
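
For reference, the "zmq subscriber to publish in Redis" step quoted above could look roughly like the sketch below. It assumes the JSON feed arrives as one JSON document per ZeroMQ message and uses a single shared queue; the endpoint, the queue name "rc:shared" and the function names are hypothetical, and the one-queue-per-bot question is left open. The forwarding logic is kept separate from the pyzmq / redis-py wiring so it can be tried without either service.

```python
import json


def forward(messages, push, queue="rc:shared"):
    """Push every well-formed JSON message to the queue; return the count.

    messages: iterable of raw JSON strings (e.g. received from a SUB socket)
    push:     callable(queue_name, payload), e.g. redis.Redis().rpush
    """
    count = 0
    for raw in messages:
        try:
            event = json.loads(raw)
        except ValueError:
            continue  # skip malformed messages rather than crash the relay
        push(queue, json.dumps(event))
        count += 1
    return count


def run(endpoint, queue="rc:shared"):
    """Wire forward() to a ZeroMQ SUB socket and a local Redis server."""
    import zmq    # pyzmq, imported lazily so forward() works without it
    import redis  # redis-py

    sock = zmq.Context.instance().socket(zmq.SUB)
    sock.connect(endpoint)
    sock.setsockopt_string(zmq.SUBSCRIBE, "")  # empty prefix: all messages
    client = redis.Redis()

    def stream():
        while True:
            yield sock.recv_string()

    forward(stream(), client.rpush, queue)
```

A bot would then read its events with a blocking pop on the same queue name.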