On 28/07/13 18:35, Petr Bena wrote:
I think you kind of misunderstood my proposal hashar
:) I know that,
the IRC feed is where the dispatcher is going to take data from. The
difference is that the dispatcher is a special service for bot operators
that allows them to subscribe to selected pages / authors (even using
regular expressions). It would filter these for them from the RC feed
(currently the IRC version) and push them to a Redis queue they
specify, in a format they prefer.
Petan, MZMcBride, Ori and I had an IRC discussion on that topic this
morning. Here is a quick summary.
What I dislike in your proposal is that you are still relying on the IRC
feed service, which is not the best way to publish metadata. It is really
meant to be consumed by IRC clients and displayed in a human-friendly way.
For context, the related code is in RecentChange::getIRCLine(). As
an example, here is the title formatting:
"\00314[[\00307$title\00314]]";
Not easily parseable. Moreover, the code has plenty of special cases and
crafts a URL for the end user to click.
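
To illustrate what a consumer has to do just to recover the title from
that format, here is a minimal sketch in Python. It assumes the "\003"
in the PHP string is the mIRC colour control character (0x03) followed
by a two-digit colour number; the function names are made up for this
example and are not part of MediaWiki.

```python
import re

# mIRC colour code: 0x03, then up to two digits, optionally ",NN" for the background.
_COLOR = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?")
# Other common IRC formatting control characters (bold, reset, reverse, italic, underline).
_CONTROL = re.compile(r"[\x02\x0f\x16\x1d\x1f]")


def strip_irc_formatting(text):
    """Remove colour codes and formatting control characters from an IRC line."""
    return _CONTROL.sub("", _COLOR.sub("", text))


def extract_title(line):
    """Return the page title from the leading [[...]] of an RC line, or None."""
    match = re.match(r"\[\[(.+?)\]\]", strip_irc_formatting(line))
    return match.group(1) if match else None


sample = "\x0314[[\x0307Main Page\x0314]]"
print(extract_title(sample))
```

And that is only the title. The rest of the line (flags, URL, user, size,
comment) needs the same kind of treatment.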
As I understood it, your bot would parse the horrible IRC syntax, craft
some JSON and write it in Redis for bots to consume. Thus bot authors
would no longer have to care about the IRC format. That is an improvement,
but we can do better.
Instead, we could have MediaWiki send JSON directly. Victor Vasiliev
proposed a change to provide a JSON feed:
https://gerrit.wikimedia.org/r/#/c/52922
We could have that feed sent to the EventLogging ZeroMQ stream, and write
subscribers to it that would put the RC events in Redis.
To achieve that:
- get Victor's patch polished up and deployed
- find out what needs to be written to Redis (one queue per bot? a
shared queue?)
- write a zmq subscriber that publishes to Redis
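
A rough sketch of what the subscriber step could look like, in Python
with the pyzmq and redis-py packages. Everything specific here is an
assumption: the endpoint, the queue names, the "title" field of the JSON
event and the choice of one Redis list per bot are only placeholders,
since none of that has been decided yet.

```python
import json
import re

# Hypothetical subscriptions: Redis list name -> regex applied to the page title.
SUBSCRIPTIONS = {
    "rc:bot-a": re.compile(r"^Talk:"),
    "rc:bot-b": re.compile(r".*"),
}


def queues_for(event, subscriptions):
    """Return the names of the queues whose pattern matches the event title."""
    title = event.get("title", "")
    return [name for name, pattern in subscriptions.items() if pattern.search(title)]


def publish(raw_message, redis_client, subscriptions=SUBSCRIPTIONS):
    """Decode one JSON RC event and append it to every matching Redis list."""
    event = json.loads(raw_message)
    targets = queues_for(event, subscriptions)
    for name in targets:
        redis_client.rpush(name, raw_message)
    return targets


def main(endpoint="tcp://localhost:5555"):
    # Imported here so the helpers above can be used without these packages.
    import redis
    import zmq

    socket = zmq.Context().socket(zmq.SUB)
    socket.connect(endpoint)
    socket.setsockopt_string(zmq.SUBSCRIBE, "")  # empty prefix: receive everything
    client = redis.Redis()
    while True:
        publish(socket.recv_string(), client)
```

Calling main() would run the loop forever. With a shared queue instead of
one list per bot, publish() would reduce to a single rpush and the
filtering would move to the bots.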
Eventually, we could provide a library for bot authors to easily query
their Redis queue.
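
Such a library could be very small. As a sketch only, assuming each bot
has its own Redis list of JSON events and uses the redis-py client, a
helper like the following would be enough to start with. The name
iter_events and the queue name are invented for the example.

```python
import json


def iter_events(redis_client, queue, timeout=5):
    """Yield decoded RC events from a Redis list.

    Stops once the list has stayed empty for `timeout` seconds.
    """
    while True:
        item = redis_client.blpop(queue, timeout=timeout)
        if item is None:  # timed out, nothing left to read
            return
        _key, payload = item
        yield json.loads(payload)
```

A bot would then do something like
"for event in iter_events(redis.Redis(), 'rc:bot-a'): ..." and never see
the transport underneath.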
In the end you have:
- a very robust feed system, on par with the other event
feeds we are already maintaining
- no more IRC formatting
- nice JSON out of the box :-]
--
Antoine "hashar" Musso