Hi,
I think the IRC feed of recent changes is working great, but there is still a lot of room for improvement.
As Ryan Lane once suggested, we could probably use a system of queues instead of IRC, which would be even more advanced. My suggestion is to create some kind of feed in a machine-parseable format, like XML.
This feed would be distributed by some kind of dispatcher living on a server like feed.wikimedia.org, offering not just recent changes but also recent history (for example, the last 5000 changes per project).
If a service that parses this feed went down for a moment, it could retrieve the backlog of changes once it came back.
The current feed on irc.wikimedia.org should stay, but we could change it so that the current bot retrieves the data from the new XML feed instead of directly from the Apaches.
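For illustration, one change entry in such a feed might look roughly like this (the element and attribute names here are invented, not an existing format):

    <rc wiki="enwiki" type="edit" timestamp="2013-03-01T08:55:00Z">
      <title>Example article</title>
      <user>Example user</user>
      <comment>fixed a typo</comment>
      <revision old="123456" new="123457"/>
      <bytes old="2048" new="2060"/>
    </rc>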
Hey,
It sounds like an interesting idea. Actually, AWS (I've been working with it recently for Extension:AWS) has a similar architecture, where you establish a push notification service using their Simple Notification Service and have it send messages to a queue using their Simple Queue Service.
The difficulty in replacing IRC would be that, first of all, it would almost definitely have to be a push-based service, where the wiki would publish the message and the notification server would send out the recent change to all the subscribed clients. This raises the question of whether there's an existing piece of software that does this, or whether it would require implementing a daemon in the form of a maintenance script that handles the job.
--
Tyler Romeo
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerromeo@gmail.com
I believe it would require creating a new daemon (preferably written in C++), which I am willing to write, that could do something similar to what the ircd does right now: deliver new changes to all connected clients.
Preferably there would be a set of processes working together on this system: some kind of cache daemon that would hold the history for all projects, a dispatcher that would handle client requests and retrieve the data from the cache daemon, and a listener that would receive the UDP traffic from the wikis and forward it to the cache daemon.
We could of course create a multithreaded single-process daemon as well, but that would make it a little less stable, given that a crash in any thread would bring down the whole system.
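As a very rough sketch of the listener piece, assuming the wikis keep sending one change notification per UDP datagram (the port and the hand-off function below are made up):

    import socket

    LISTEN_ADDR = ("0.0.0.0", 9400)  # hypothetical port for wiki UDP traffic

    def store_in_cache(raw_change):
        # Placeholder: hand the change off to the cache daemon.
        pass

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(LISTEN_ADDR)
    while True:
        # One change notification per datagram; parsing into the
        # "universal format" is left out of this sketch.
        datagram, source = sock.recvfrom(65535)
        store_in_cache(datagram.decode("utf-8", errors="replace"))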
We actually have an open RFC on this topic:
https://www.mediawiki.org/wiki/Requests_for_comment/Structured_data_push_not...
-- ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]
I see that the RFC is considering multiple formats; why not support all of them? We could let the client request the format it likes, either XML or JSON. It would be up to the dispatcher how it produces the output data.
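For illustration, producing either format from one internal record could be as small as this (a Python sketch; the field names are invented):

    import json
    import xml.etree.ElementTree as ET

    def render(change, fmt):
        # One internal representation; serialization is a detail of the
        # dispatcher, chosen per client request.
        if fmt == "json":
            return json.dumps(change)
        if fmt == "xml":
            rc = ET.Element("rc")
            for key, value in change.items():
                ET.SubElement(rc, key).text = str(value)
            return ET.tostring(rc, encoding="unicode")
        raise ValueError("unsupported format: %s" % fmt)

    change = {"wiki": "enwiki", "title": "Example", "user": "Someone"}
    print(render(change, "json"))
    print(render(change, "xml"))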
Because we made that mistake with the API, and now we're stuck with a bunch of deadweight formats that do nothing other than increase maintenance costs. If your first preference as a client developer is for JSON, it's really not that hard for you to go get a library to receive it in XML instead, or vice versa. That's the whole point of a standardised format.
--HM
The problem is that while XML is a widely accepted standard supported on all platforms and in all languages, JSON, even if it might be better, is not so well supported at the moment. For this reason I think it would be good to be able to offer multiple outputs.
In the end, as you said, it's not that hard to get a library that converts one to the other, so why can't we use such a library on the dispatcher's side instead of forcing client developers to hunt down such a library for their language just to do the conversion?
JSON is *very* widely supported in almost every language. It would definitely not be a problem if we only supported JSON. Furthermore, JSON represents data using only native types, all of which exist in PHP. In other words, a PHP/C++/etc. variable can be directly serialized into JSON, whereas in XML this is very much not the case, due to attributes and the ability to have child elements of different types, making it much more difficult to implement.
--
Tyler Romeo
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerromeo@gmail.com
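That point is easy to demonstrate; in Python, for instance, a native structure round-trips through JSON with no extra decisions to make:

    import json

    change = {"wiki": "enwiki", "new_len": 2060, "minor": True,
              "tags": ["mobile edit", "visualeditor"]}

    encoded = json.dumps(change)          # one call, no schema needed
    assert json.loads(encoded) == change  # the types survive the round trip

    # With XML you must first decide: attribute or child element? How do
    # you mark 2060 as a number and True as a boolean? That mapping is
    # exactly what JSON gives you for free.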
I have not yet found a good and stable library for JSON parsing in C#; if you know of one, let me know :)
However, I disagree with "I feel like such a project would take an insane amount of resources to develop." If we don't make it insanely complicated, it won't take an insane amount of time ;). The cache daemon could be memcached, which is already written and stable. The listener is a simple daemon that just listens on UDP, parses the data from MediaWiki, and stores it in memcached in some universal format, and the dispatcher is just a process that takes the data from the cache, converts it to the requested format, and sends it to the client.
Sounds easy ;)
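A sketch of that "universal format in memcached" idea, using the python-memcached client (the key layout is invented, and the read-modify-write below ignores races for brevity):

    import json
    import memcache

    mc = memcache.Client(["127.0.0.1:11211"])
    BACKLOG = 5000  # last N changes per project, per the original proposal

    def record_change(project, change):
        # A real daemon would need cas() or a smarter key layout to
        # avoid losing concurrent updates; this is just the shape of it.
        key = "rc:%s" % project
        backlog = json.loads(mc.get(key) or "[]")
        backlog.append(change)
        mc.set(key, json.dumps(backlog[-BACKLOG:]))

    def recent_changes(project):
        return json.loads(mc.get("rc:%s" % project) or "[]")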
Take a look at http://www.json.org/. They have a list of implementations for different languages.
Here's a quick list of basic requirements we'd have to implement:

- Multi-threading, which is in and of itself a pain in the a**.
- Some sort of queue for messages, rather than hoping the daemon can send out every message in realtime (sketched below).
- The ability for clients to register with the daemon (and a place to store a client list).
- Multiple methods of notification (IRC would be one, XMPP might be a candidate, and a simple HTTP endpoint would be a must).

Just those basics aren't an easy task, especially considering that unless WMF allocates resources to it, the project would be run solely by those who have enough free time. Also, I wouldn't use memcached as a caching daemon, primarily because I'm not sure such an application even needs one. All it does is relay messages.
--
Tyler Romeo
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerromeo@gmail.com
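On the message-queue point above, a minimal sketch of the shape of it (standard library only; broadcast() is a stand-in for writing to subscribed clients):

    import queue
    import threading

    # Bounded on purpose: better to drop than to grow without limit.
    outgoing = queue.Queue(maxsize=100000)

    def broadcast(message):
        pass  # stand-in: write to every subscribed client

    def sender():
        # Drain at whatever rate the clients can take, decoupled from
        # the rate at which edits arrive.
        while True:
            broadcast(outgoing.get())

    threading.Thread(target=sender, daemon=True).start()

    def on_recent_change(message):
        try:
            outgoing.put_nowait(message)
        except queue.Full:
            pass  # a real daemon would count and log dropped messages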
I still don't see it as too complex. A matter of a month (or months) for volunteers with limited time.
However, I don't quite see what is so complicated about the last two points. Given the frequency of updates, it's simplest to have the client (a user / bot / service that needs to read the feed) open a persistent connection to the server (the dispatcher), which forks itself just as sshd does, and the new process handles all requests from that client. The client somehow specifies what kind of feed it wants (that's the registration part), and the forked dispatcher keeps it updated with information from the cache.
Nothing hard. And what's the problem with multithreading, huh? :) BTW, I don't really think there is a need for multithreading at all, but even if there were, it shouldn't be so hard.
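The fork-per-client model is simple enough to sketch with nothing but the standard library (Unix-only; the port, wire protocol, and changes_for() helper are all made up):

    import os
    import socket

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("0.0.0.0", 9401))  # hypothetical dispatcher port
    server.listen(64)

    def serve_client(conn):
        # "Registration": the client's first line says which feed it wants.
        wanted = conn.makefile().readline().strip()  # e.g. "enwiki"
        # changes_for() is a hypothetical blocking iterator over the cache.
        for change in changes_for(wanted):
            conn.sendall(change.encode("utf-8") + b"\n")

    while True:
        conn, addr = server.accept()
        if os.fork() == 0:    # child: handle this one client, as sshd does
            server.close()
            serve_client(conn)
            os._exit(0)
        conn.close()          # parent: go straight back to accepting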
0mq? RabbitMQ? Seem to fit the use case pretty well / closely.
-- Yuvi Panda http://yuvi.in/blog
Closely, but it seems a bit overcomplicated to me. What I proposed is so simple that you could just use telnet to retrieve the latest changes.
With RabbitMQ, for example, you need third-party client libraries just to connect to the server and obtain some data... But I don't have a problem with using anything that already works and is fast and stable. Just please let's make it better than what we have now (making it worse would be no fun :P).
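For comparison, the ZeroMQ flavour of publish/subscribe really is small; a sketch using the pyzmq binding (the port and channel prefix are made up, and the two halves would run in separate processes):

    import zmq

    ctx = zmq.Context()

    # Publisher side, living in or next to the listener daemon. Note
    # that PUB drops messages sent before any subscriber connects.
    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://*:9402")
    pub.send_string('enwiki {"title": "Example", "user": "Someone"}')

    # Subscriber side: a client interested only in enwiki.
    sub = ctx.socket(zmq.SUB)
    sub.connect("tcp://localhost:9402")
    sub.setsockopt_string(zmq.SUBSCRIBE, "enwiki")
    print(sub.recv_string())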
I don't think a custom daemon would actually be needed.
While I was at Flickr, we implemented a pubsub-based system to push notifications of all photo uploads and metadata changes to Google, using Redis as the backend. The rate of uploads and edits at Flickr in 2010 was orders of magnitude greater than the rate of edits across all WMF projects. Publishing to a Redis pubsub channel does grow in cost as the number of subscribers increases, but I don't see a problem at our scale. If it became one, there are ways around it.
We are planning on migrating the wiki job queues from MySQL to Redis in the next few weeks, so it's already a growing piece of our infrastructure. I think the bulk of the work here would actually be in building a frontend web service that supports WebSockets / long polling, provides a clean API, and preferably uses OAuth or some form of registration to ward off abuse and let us limit the growth of subscribers as we scale.
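The Redis pubsub model is also just a few calls; a sketch with the redis-py client (the channel naming is invented):

    import redis

    r = redis.StrictRedis(host="localhost", port=6379)

    # Publisher: MediaWiki (or the listener daemon) pushes each change.
    r.publish("rc.enwiki", '{"title": "Example", "user": "Someone"}')

    # Subscriber: anything that wants the live feed.
    p = r.pubsub()
    p.subscribe("rc.enwiki")
    for message in p.listen():
        if message["type"] == "message":
            print(message["channel"], message["data"])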
A web frontend, you say?
If you compare the raw data of the IRC protocol (one RC feed message) with the raw data of an HTTP request and response for a page consisting only of that one RC feed message, you will see a huge difference in size and performance.
Also, requiring all kinds of authentication doesn't seem like an improvement to me. It would only complicate what is simple now. Have there been many attempts to abuse irc.wikimedia.org so far? There is no authentication there at all.
I was suggesting it for WebSockets or long polling, so the above comparison isn't relevant. A connection is established once, with its protocol overhead, and then stays open while messages are continually pushed from the server. It is not a web request for a page containing one RC message.
Maybe no authentication is needed, but I don't think the IRC feed interests anyone outside of a very small community. Doing something a little more modern might attract different uses. It might not, but I have no idea.
Interesting. I didn't know Redis had something like this. I'm not too knowledgeable about Redis, but would clients be able to subscribe directly to Redis queues? Or would that be a security issue (like allowing people to access memcached would be), meaning we would have to implement our own notification service anyway?
As for 0mq / RabbitMQ: I've always thought of RabbitMQ only as a messaging service between linked applications, but I guess it could be used as a type of push notification service as well.
--
Tyler Romeo
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerromeo@gmail.com
I think a very lightweight proxy that only passes subscribe commands to Redis would work. A read-only Redis slave could be provided, but I don't think Redis includes a way to limit which commands clients can run, including administrative ones. I think we'd want a thin proxy layer in front anyway, to track and, if necessary, selectively limit access. It could be very simple, though.
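That thin proxy could be as simple as this sketch: accept one SUBSCRIBE command from the client, perform the actual Redis subscription ourselves, and relay only message payloads back (the channel whitelist and wire format are invented):

    import re
    import socket
    import threading
    import redis

    ALLOWED = re.compile(r"^rc\.[a-z]+$")  # hypothetical channel whitelist

    def handle(conn):
        # Only one verb is understood; anything else is rejected, so
        # clients can never reach administrative Redis commands.
        parts = conn.makefile().readline().split()
        if len(parts) != 2 or parts[0].upper() != "SUBSCRIBE" \
                or not ALLOWED.match(parts[1]):
            conn.close()
            return
        p = redis.StrictRedis().pubsub()
        p.subscribe(parts[1])
        for msg in p.listen():
            if msg["type"] == "message":
                conn.sendall(msg["data"] + b"\n")

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("0.0.0.0", 9403))
    server.listen(64)
    while True:
        conn, _ = server.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()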
Mhm, in that case this might be a viable solution.
--
Tyler Romeo
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerromeo@gmail.com
Dumb question: is the work ESR recently did on irker on-topic for this conversation, and did anyone know it existed? :-)
Cheers, -- jra
This is basically the way the new Disqus platform works too: a thin HTTP proxy in front, with the client side (public JavaScript) doing "sort of long" polling.
Quite an interesting talk from EuroPython - https://www.youtube.com/watch?v=PeVB5DNptD4 - if you're into that sort of thing.
- Damian
XMPP is XML. If you embed JSON inside it, you are requiring two parsers instead of one, and unnecessarily increasing the complexity.
-- Tim Starling
The RFC doesn't seem to have gotten much interest (only a burst of edits from Krinkle in August and then it died). But interesting nonetheless.
The one thing I do know is that if this were to be implemented, it would probably be pretty complex. It would have to support at least a couple of different push methods (I'd imagine IRC would be one of them, for backwards compatibility) and would have to be efficient enough to handle the load of receiving and sending Wikipedia's recent changes. Like Petr said, the daemon would probably be in C++ or some other language that supports true multithreading and is efficient enough.
At that point, it might as well be its own product, i.e., a generic push notification server similar to Amazon's SNS. I feel like such a project would take an insane amount of resources to develop. Between design, coding, and testing, we'd be lucky to see it implemented by MW 1.25 ;)
Nonetheless, it sounds like a fun project, and if some developers would be interested in putting together a generic C++ push notification server, I'd be happy to help out.
--
Tyler Romeo
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerromeo@gmail.com
OK. I've added this to the hackathon topics as well...
When this has been proposed in the past, XMPP has been a popular solution, and it could be done without writing new daemons. Arbitrary XML can easily be embedded in it, as Google Wave did.
-- Tim Starling
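For instance, a recent change pushed over XMPP pubsub (XEP-0060) might look roughly like this; the node name and the embedded <rc> payload are hypothetical:

    <message from='pubsub.wikimedia.example' to='subscriber@example.net'>
      <event xmlns='http://jabber.org/protocol/pubsub#event'>
        <items node='rc.enwiki'>
          <item id='rc-555554444'>
            <rc xmlns='urn:example:recent-changes' wiki='enwiki' type='edit'>
              <title>Example article</title>
              <user>Example user</user>
            </rc>
          </item>
        </items>
      </event>
    </message>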
Tim, you are awesome. I was just having that discussion yesterday and lacked a recent example of your sense of humor :-] Thx!
There's been a request for years to provide this data with XMPP. https://bugzilla.wikimedia.org/17450
https://bugzilla.wikimedia.org/30555 also seems related to the RFC.
-Chad
Whatever you do, please, don't use XML. Whenever you use XML, a unicorn dies a painful death. Think of the unicorns.
Whatever we end up with, I believe it should use JSON. Or at least YAML, though it's not as widely supported as JSON.
Also, we will need a WebSocket interface in addition to whatever other protocol we will have.
-- Victor.
My sentiments exactly. Have you ever tried writing an XML Schema? It's painful.
--
Tyler Romeo
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerromeo@gmail.com