Thanks for the feedback, everyone!
Due to the simplicity of the HTTP stream model, we are moving forward with
that, instead of websockets/socket.io. We hope to have an initial version
of this serving existing EventBus events this quarter. Next we will focus
on more features (filtering), and also work towards deprecating both
RCStream and RCFeed.
You can follow the progress of this effort on Phabricator:
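One common realization of a resumable HTTP stream is Server-Sent Events. Assuming that wire format is what the HTTP stream model ends up using (the choice is not stated above), here is a minimal sketch of parsing SSE frames, including the `id` field that lets a disconnected client resume; the field values are illustrative:

```python
def parse_sse(raw: str):
    """Parse a Server-Sent Events stream into (id, data) event tuples.

    SSE frames are blocks of "field: value" lines separated by blank lines;
    the `id` field is what a client sends back in a Last-Event-ID request
    header to resume after a disconnect.
    """
    events = []
    event_id, data_lines = None, []
    for line in raw.split("\n"):
        if line == "":                      # a blank line terminates a frame
            if data_lines:
                events.append((event_id, "\n".join(data_lines)))
            event_id, data_lines = None, []
        elif line.startswith("id:"):
            event_id = line[3:].strip()
        elif line.startswith("data:"):
            data_lines.append(line[5:].strip())
    return events
```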
On Thu, Sep 29, 2016 at 10:29 AM, Marko Obrovac <mobrovac(a)wikimedia.org> wrote:
Hello,
Regarding Wikidata, it is important to make the distinction here between
the WMF-internal use and public-facing facilities. The underlying
sub-system that the public event streams will rely on is called
EventBus [1], which (currently) consists of:
(i) The producer HTTP proxy service. It allows (internal) users to produce
events using a REST HTTP interface. It also validates events against the
currently-supported set of JSON event schemas [2].
(ii) The Kafka cluster, which is in charge of queuing the produced events
and delivering them to consumer clients. The event streams are separated
into topics, e.g. a revision-create topic, a page-move topic, etc.
(iii) The Change Propagation service [3]. It is the main Kafka consumer at
this point. In its most basic form, it executes HTTP requests triggered by
user-defined rules for certain topics. The aim of the service is to be able
to update dependent entities starting from a resource/event. One example is
recreating the needed data for a page when it is edited. When a user edits
a page, ChangeProp receives an event in the revision-create topic and sends
a no-cache request to RESTBase to render it. After RB has completed the
request, another request is sent to the mobile content service to do the
same, because the output of the mobile content service for a given page
relies on the latest RB/Parsoid HTML.
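The three components above can be modelled end-to-end in a toy sketch: the producer proxy validates, a Kafka-like topic log queues by offset, and a Change Propagation-style consumer fires dependent requests. All field names, schemas, and URLs below are illustrative, not the real EventBus definitions:

```python
# (i) Producer proxy: reject events missing required schema fields.
# The schema registry here is a stand-in for the real JSON schemas.
REQUIRED = {"mediawiki.revision-create": ["meta", "page_title"]}

def validate(event: dict) -> bool:
    topic = event.get("meta", {}).get("topic")
    required = REQUIRED.get(topic)
    return required is not None and all(f in event for f in required)

class TopicLog:
    """(ii) Kafka, reduced to its essence: an append-only per-topic log
    that consumers read by offset, so they can resume where they left off."""
    def __init__(self):
        self.events = []

    def produce(self, event):
        self.events.append(event)

    def consume_from(self, offset):
        return list(enumerate(self.events))[offset:]

def propagate(event, http_get):
    """(iii) Change Propagation: on revision-create, re-render the page in
    RESTBase first, then refresh the mobile content service, which depends
    on the fresh RB/Parsoid HTML. The URL layout is illustrative."""
    domain, title = event["meta"]["domain"], event["page_title"]
    http_get(f"https://{domain}/api/rest_v1/page/html/{title}")
    http_get(f"https://{domain}/api/rest_v1/page/mobile-sections/{title}")
```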
Currently, the biggest producer of events is MediaWiki itself. The aim of
this e-mail thread is to add a fourth component to the system - the public
event stream consumption. However, for the Wikidata case, we think the
Change Propagation service should be used (i.e. we need to keep it
internal). If you recall, Daniel, we did kind of start talking about
putting WD updates onto EventBus in Esino Lario.
I have in-lined the responses to your questions below.
On 27 September 2016 at 14:50, Daniel Kinzler <daniel.kinzler(a)wikimedia.de> wrote:
> Hey Gergo, thanks for the heads up!
> The big question here is: how does it scale? Sending events to 100
> clients may work, but does it work for 100 thousand?
Yes, it does, albeit not instantly. We limit the concurrency of execution
to mitigate huge spikes and avoid overloading the system. For example,
Change Propagation handles template transclusions: when a template is
edited, all of the pages it is transcluded in need to be re-rendered, i.e.
their HTML has to be recreated. For important templates, that might mean
re-rendering millions of pages. The queue is populated with the relevant
pages and the backlog is "slowly" processed. "Slowly" here refers to the
fact that at most X pages are re-rendered at the same time, where X is
governed by the concurrency factor. In the concrete example of important
templates, it usually takes a couple of days to go through the backlog of
re-renders.
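The concurrency cap can be sketched with a bounded worker pool, where `concurrency` plays the role of X above; the pool and the `render` callback are illustrative, not the actual service code:

```python
from concurrent.futures import ThreadPoolExecutor

def drain_backlog(pages, render, concurrency=4):
    """Re-render a backlog of pages with at most `concurrency` in flight.

    A toy model of Change Propagation's limit: a popular-template edit may
    enqueue millions of pages, but only `concurrency` re-renders run at
    once, so the backlog drains slowly instead of overwhelming the
    rendering services. Returns the number of pages processed.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return sum(1 for _ in pool.map(render, pages))
```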
> And then there's several more important details to sort out: What's the
> granularity of subscription - a wiki? A page? Where does filtering by
> namespace etc happen?
As Andrew noted, the basic granularity is the topic, i.e. the type/schema
of the events that are to be received. Roughly, that means that a consumer
can obtain either all page edits or all page renames (for all WMF wikis)
without performing any kind of filtering. Change Propagation, however,
allows one to filter events out based on any of the fields contained in the
events themselves, which means you are able to receive only events for a
specific wiki, a specific page or a specific namespace. For example, Change
Propagation already handles situations where a Wikidata item is edited: it
re-renders the page summaries for all pages that the given item is
transcluded in, but does so only for the www.wikidata.org domain and
namespace 0 [4].
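The per-rule filtering described above amounts to matching event fields against fixed values. A sketch, with field names that mirror the Wikidata example but are illustrative:

```python
def matches(event: dict, rule: dict) -> bool:
    """True when every field constraint in the rule matches the event."""
    return all(event.get(field) == wanted for field, wanted in rule.items())

# Deliver only edits of Wikidata items in the main namespace, even though
# the underlying topic carries edits for every WMF wiki.
wikidata_rule = {"domain": "www.wikidata.org", "page_namespace": 0}
```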
> How big is the latency?
For MediaWiki events, the observed latency of acting on an event has been
at most a couple of hundred milliseconds on average, but it is usually
below that threshold. There are some events, though, which lag behind up to
a couple of days, most notably big template updates / transclusions. This
graph [5] plots Change Propagation's delay in processing the events for
each defined rule. The "backlog per rule" metric measures the delay between
event production and event consumption. Here, event production refers to
the time stamp at which MediaWiki observed the event, while event
consumption refers to the time at which Change Propagation dequeues it from
Kafka and starts executing it.
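The metric itself is just the difference between those two timestamps. Assuming ISO 8601 timestamp strings like those carried in event metadata, a sketch:

```python
from datetime import datetime

def backlog_seconds(produced: str, consumed: str) -> float:
    """Delay between event production (the timestamp MediaWiki stamped on
    the event) and consumption (when Change Propagation dequeues it from
    Kafka). Timestamps are ISO 8601 strings with a UTC offset."""
    return (datetime.fromisoformat(consumed)
            - datetime.fromisoformat(produced)).total_seconds()
```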
> How does recovery/re-sync work after
> disconnect/downtime?
Because relying on EventBus and, specifically, Change Propagation means
consuming events via push HTTP requests, the receiving entity does not have
to worry about this in this context (public event streams are a different
matter, though). EventBus handles offsets internally, so even if Change
Propagation stops working for some time or cannot connect to Kafka, it will
resume processing events from where it left off once the pipeline is
accessible again. If, on the other hand, the service receiving the HTTP
requests is down or unreachable, Change Propagation has a built-in retry
mechanism that resends requests whenever an erroneous response is received
from the service.
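The retry behaviour can be sketched as a loop that resends until a non-error response arrives; the retry count and backoff schedule below are illustrative, not the service's actual configuration:

```python
import time

def send_with_retry(do_request, retries=3, base_delay=0.0):
    """Resend a request while the receiving service answers with a 5xx
    error, waiting exponentially longer between attempts. Returns the
    last HTTP status code observed."""
    for attempt in range(retries + 1):
        status = do_request()
        if status < 500:          # success, or a client error we won't retry
            return status
        time.sleep(base_delay * (2 ** attempt))
    return status                 # still failing after all retries
```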
I hope this helps. I would be happy to talk about this specific topic some
more.
Cheers,
Marko
> I have not read the entire conversation, so the answers might already be
> there - my apologies if they are, just point me there.
> Anyway, if anyone has a good solution for sending wiki-events to a large
> number of subscribers, yes, please let us (WMDE/Wikidata) know about it!
> On 26.09.2016 at 22:07, Gergo Tisza wrote:
> > On Mon, Sep 26, 2016 at 5:57 AM, Andrew Otto <otto(a)wikimedia.org> wrote:
> >
> >> A public resumable stream of Wikimedia events would allow folks
> >> outside of WMF networks to build realtime stream processing tooling on
> >> top of our data. Folks with their own Spark or Flink or Storm clusters
> >> (in Amazon or labs or wherever) could consume this and perform complex
> >> stream processing (e.g. machine learning algorithms (like ORES), windowed
> >> trending aggregations, etc.).
> >>
> > I recall WMDE trying something similar a year ago (via PubSubHubbub) and
> > getting vetoed by ops. If they are not aware yet, might be worth
> > contacting them and asking if the new streaming service would cover their
> > use cases (it was about Wikidata change invalidation on third-party
> > wikis, I think).
> > _______________________________________________
> > Wikitech-l mailing list
> > Wikitech-l(a)lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
> --
> Daniel Kinzler
> Senior Software Developer
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
[1] https://www.mediawiki.org/wiki/EventBus
[2] https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema
[3] https://www.mediawiki.org/wiki/Change_propagation
[4] https://github.com/wikimedia/mediawiki-services-change-propagation-deploy/blob/ea8cdf85e700b74918a3e59ac6058a1a952b3e60/scap/templates/config.yaml.j2#L556
[5] https://grafana.wikimedia.org/dashboard/db/eventbus?panelId=10&fullscre…
--
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation