Hey Gergo, thanks for the heads up!
The big question here is: how does it scale? Sending events to 100 clients may work, but does it work for 100 thousand?
And then there are several more important details to sort out: What's the granularity of subscription - a wiki? A page? Where does filtering by namespace etc. happen - client- or server-side? How big is the latency? How does recovery/re-sync work after a disconnect or downtime?
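For context, here is roughly the kind of consumer I'd imagine on the client side - a minimal Python sketch, assuming the service is SSE-based and supports resumption via the standard Last-Event-ID header (the endpoint URL and the "namespace" field are made up for illustration):

  import json
  import requests

  STREAM_URL = "https://stream.example.org/v1/recentchange"  # hypothetical endpoint

  def consume(last_event_id=None):
      # Ask the server to resume after the last event we saw
      # (the standard SSE reconnection mechanism).
      headers = {"Accept": "text/event-stream"}
      if last_event_id:
          headers["Last-Event-ID"] = last_event_id
      with requests.get(STREAM_URL, headers=headers, stream=True) as resp:
          resp.raise_for_status()
          event_id, data = None, []
          for line in resp.iter_lines(decode_unicode=True):
              if line.startswith("id:"):
                  event_id = line[len("id:"):].strip()
              elif line.startswith("data:"):
                  data.append(line[len("data:"):].strip())
              elif not line and data:
                  # A blank line terminates one SSE event.
                  event = json.loads("\n".join(data))
                  data = []
                  # Filtering (e.g. by namespace) would happen client-side
                  # here, unless the server offers it as a query parameter.
                  if event.get("namespace") == 0:
                      print(event_id, event.get("title"))
      return event_id  # persist this, so we can re-sync after downtime

The interesting part is the event id: persisting it is what makes recovery after a disconnect cheap for the client, but it means the server has to keep enough history to replay from arbitrary offsets - which is exactly where I'd expect the scaling pain to be.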
I have not read the entire conversation, so the answers might already be there - my apologies if they are, just point me to them.
Anyway, if anyone has a good solution for sending wiki events to a large number of subscribers - yes, please let us (WMDE/Wikidata) know about it!
On 26.09.2016 at 22:07, Gergo Tisza wrote:
On Mon, Sep 26, 2016 at 5:57 AM, Andrew Otto otto@wikimedia.org wrote:
A public resumable stream of Wikimedia events would allow folks outside of WMF networks to build realtime stream processing tooling on top of our data. Folks with their own Spark or Flink or Storm clusters (in Amazon or labs or wherever) could consume this and perform complex stream processing (e.g. machine learning algorithms (like ORES), windowed trending aggregations, etc.).
I recall WMDE trying something similar a year ago (via PubSubHubbub) and getting vetoed by ops. If they are not aware yet, it might be worth contacting them and asking if the new streaming service would cover their use cases (it was about Wikidata change invalidation on third-party wikis, I think).