Subject: stream.wikimedia.org - stream retention change
Date: Mon, 21 Mar 2022 09:00:59 -0400
From: Andrew Otto <otto@wikimedia.org>


tl;dr: all publicly available event streams at stream.wikimedia.org will have their retention time set to 7 days.

Many of the streams available at stream.wikimedia.org have retention times of 31 days. This means that at any given time, the past 31 days of these streams are consumable.

Occasionally, data in these streams inadvertently contains personally identifiable information (PII). For example, someone might accidentally enter their personal email address into a revision comment field. On the wikis, this information can be quickly suppressed so that it is not viewable externally. However, because streams are historical and immutable, it is difficult to remove this information from the stream history.

To help mitigate the risk of exposing personally identifiable information (PII), we are reducing the retention of these streams to 7 days. We plan to make this change on Monday, April 4, 2022.
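
For consumers that resume from a past point in time, here is a minimal Python sketch of what the shorter window means in practice. It computes the oldest timestamp that would still be consumable under a 7-day retention and clamps an older resume point to it. The `/v2/stream/` path, the `recentchange` stream name, and the `since` query parameter are assumptions for illustration and are not stated in this message; the function names are hypothetical. Please check the service documentation before relying on any of them.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Retention announced in this message; change this if the policy changes.
RETENTION = timedelta(days=7)


def earliest_consumable(now, retention=RETENTION):
    """Return the oldest timestamp still inside the retention window."""
    return now - retention


def clamp_since(requested, now, retention=RETENTION):
    """Move a resume point forward if it falls outside the retention window."""
    return max(requested, earliest_consumable(now, retention))


def stream_url(stream, since):
    """Build a consume URL (assumed URL layout and `since` parameter)."""
    base = "https://stream.wikimedia.org/v2/stream/" + stream
    query = urlencode({"since": since.strftime("%Y-%m-%dT%H:%M:%SZ")})
    return base + "?" + query


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # A consumer that was last connected 20 days ago can only resume
    # from 7 days ago once the change is in place.
    last_seen = now - timedelta(days=20)
    print(stream_url("recentchange", clamp_since(last_seen, now)))
```

A consumer that has been offline for longer than the retention window would miss the events in between, so anything that depends on a complete history should reconnect at least once every 7 days.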


In the future, we would like to remove this data from the streams directly. Doing so requires us to build and maintain new services that produce new streams with PII redacted. Those services are not trivial to stand up, hence this mitigation for now.

-Andrew Otto

Wikimedia Foundation