On Thu, Jul 1, 2021 at 3:10 PM Andrew Otto <otto@wikimedia.org> wrote:
This isn't helpful now, but your use case is relevant to something I hope to pursue in the future: comprehensive mediawiki change events, including content.  I don't have a great place yet for collecting these use cases, so I added it to Modern Event Platform parent ticket so I don't forget. :)


I don't think this is the use-case at all. As someone else already pointed out, diffs don't always give you the context and might be unparsable wikitext. So what you can do is either:
1) Always send the full content of the changed page in the stream, along with the diff. This is IMHO extremely wasteful, but it's also easy to implement.
2) Find a way to analyze the edits and emit specialized event tags that describe what has changed. This is the correct way forward, IMHO, but it requires much more engineering time.

I don't think there is much value in adding the full content of the page to every edit event. I'd rather suggest that people fetch the Parsoid HTML from the API, and that we ensure we do good edge-side caching.
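
As a rough sketch of the fetch-on-demand approach, assuming "the API" here means the public REST endpoint /api/rest_v1/page/html/{title}/{revision} and that the consumer already has the page title and revision ID from the edit event. The function names, the default domain and the User-Agent string are illustrative, not part of any existing tool:

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request, urlopen


def parsoid_html_url(title: str, revision: Optional[int] = None,
                     domain: str = "en.wikipedia.org") -> str:
    """Build the REST URL for the Parsoid HTML of a page (hypothetical helper)."""
    # Titles use underscores instead of spaces; percent-encode the rest,
    # including "/" so that the title stays a single path segment.
    path = quote(title.replace(" ", "_"), safe="")
    url = f"https://{domain}/api/rest_v1/page/html/{path}"
    if revision is not None:
        # Pinning the revision makes the URL refer to immutable content,
        # which is what should make it friendly to edge caches.
        url += f"/{revision}"
    return url


def fetch_parsoid_html(title: str, revision: Optional[int] = None,
                       domain: str = "en.wikipedia.org") -> str:
    """Fetch the HTML over HTTP; the User-Agent value is a placeholder."""
    request = Request(
        parsoid_html_url(title, revision, domain),
        headers={"User-Agent": "example-tool/0.1 (you@example.org)"},
    )
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

A stream consumer would call fetch_parsoid_html(title, revision) only for the edits it actually cares about, so the event itself can stay small.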


Cheers,

Giuseppe
P.S. Please note that I'm only referring to streams offered to tools and, in general, to the public internet. Inside the production cluster, including content in events might (or might not) prove directly useful in some cases.
 

--
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation