tl;dr
Ignore this email if you do not use MediaWiki event streams.
On Monday, December 11, 2023, all MediaWiki-related event streams will
have artificial canary events
<https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams#Canary_Even…>
injected into them. If you use any of these streams, you should discard
these canary events.
Add code to your consumers that discards events where meta.domain ==
"canary".
Canary Events
At WMF, we use artificial 'canary' AKA 'heartbeat' events
<https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration#can…>
to differentiate between a broken event stream and an empty event stream.
Canary events should be produced at least once an hour. If there are no
events in a stream for an hour, then something is likely broken with that
stream.
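As a rough illustration of the one-hour rule above, a monitoring check could
look something like the sketch below. The function name and the MAX_SILENCE
constant are hypothetical; this is not how WMF's own monitoring is
implemented.

```python
from datetime import datetime, timedelta, timezone

# Assumption: one hour, based on the "at least once an hour" cadence above.
MAX_SILENCE = timedelta(hours=1)


def stream_looks_broken(last_event_at, now=None):
    """Return True if no event (canary or real) arrived within MAX_SILENCE."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_event_at > MAX_SILENCE
```

Because canary events arrive even when a stream has no real traffic, a
silence longer than the threshold points at a problem with the stream rather
than a quiet period.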
These artificial canary events can be identified by the fact that their
meta.domain field is set to "canary". If you use any of the streams listed
below, you will need to add code that discards any events where meta.domain
== "canary".
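A minimal sketch of such a filter in Python, assuming your consumer has
already decoded each event into a dict. The helper names and the sample
events are made up for illustration; real events carry many more fields.

```python
def is_canary(event):
    """True when the event's meta.domain field is "canary"."""
    meta = event.get("meta") or {}
    return meta.get("domain") == "canary"


def discard_canaries(events):
    """Yield only the non-canary events from an iterable of decoded events."""
    for event in events:
        if not is_canary(event):
            yield event


# Hypothetical sample events, for illustration only.
sample = [
    {"meta": {"domain": "canary", "stream": "mediawiki.page-create"}},
    {"meta": {"domain": "en.wikipedia.org", "stream": "mediawiki.page-create"}},
]
kept = list(discard_canaries(sample))
```

Events without a meta field are treated as non-canary here; adjust that if
your consumer should reject malformed events instead.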
Back in 2020, we began producing canary events into all new streams, but we
never got around to enabling these for streams that already existed. We
needed to ensure that all consumers of these streams filtered out the
canary events. We are only now getting around to enabling canary events
for all streams.
We will enable canary event production
<https://phabricator.wikimedia.org/T266798> for the following streams on
Monday, December 11th, 2023:
- mediawiki.recentchange
- mediawiki.page-create
- mediawiki.page-delete
- mediawiki.page-links-change
- mediawiki.page-move
- mediawiki.page-properties-change
- mediawiki.page-restrictions-change
- mediawiki.page-suppress
- mediawiki.page-undelete
- mediawiki.revision-create
- mediawiki.revision-visibility-change
- mediawiki.user-blocks-change
- mediawiki.centralnotice.campaign-change
- mediawiki.centralnotice.campaign-create
- mediawiki.centralnotice.campaign-delete
If you consume any of these streams, either external to WMF networks using
EventStreams, or internally using Kafka, please ensure that your consumer
logic discards events where meta.domain == "canary" before this date. (Note
that not all of these streams are exposed publicly at stream.wikimedia.org
<https://stream.wikimedia.org/?doc#/streams>.)
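For external consumers, EventStreams delivers events as Server-Sent Events
with a JSON payload in the data field. The sketch below shows one way the
check could sit in a hand-rolled consumer. It is a simplified parser, not a
complete SSE implementation, and the function names and sample lines are
hypothetical. Kafka consumers would apply the same meta.domain check after
decoding each message.

```python
import json


def parse_sse_events(lines):
    """Yield decoded JSON payloads from an iterable of SSE text lines."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # SSE allows one optional space after the colon.
            if value.startswith(" "):
                value = value[1:]
            data_parts.append(value)
        elif line == "":
            # A blank line ends the current event.
            if data_parts:
                yield json.loads("\n".join(data_parts))
            data_parts = []


def real_events(lines):
    """Yield parsed events, skipping those where meta.domain == "canary"."""
    for event in parse_sse_events(lines):
        if (event.get("meta") or {}).get("domain") != "canary":
            yield event


# Hypothetical sample input standing in for a live HTTP response body.
sample_lines = [
    "event: message\n",
    'data: {"meta": {"domain": "canary"}}\n',
    "\n",
    "event: message\n",
    'data: {"meta": {"domain": "commons.wikimedia.org"}, "title": "Example"}\n',
    "\n",
]
domains = [e["meta"]["domain"] for e in real_events(sample_lines)]
```

In practice a maintained SSE client library is likely a better choice than
parsing lines by hand; the filtering step stays the same either way.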
Thank you,
-Andrew Otto & the WMF Data Engineering team
<https://wikitech.wikimedia.org/wiki/Data_Engineering>
References
- T266798 - Enable canary events for all MediaWiki streams
<https://phabricator.wikimedia.org/T266798>
- T251609 - Automate ingestion and refinement into Hive of event data from
Kafka using stream configs and canary/heartbeat events
<https://phabricator.wikimedia.org/T251609>
Hello,
We need to schedule a reboot of the servers that provide copies of the
MediaWiki databases for analytics purposes.
https://wikitech.wikimedia.org/wiki/Analytics/Systems/MariaDB
These are the servers: dbstore1003, dbstore1005, and dbstore1007.
I'm intending to carry out this work at 09:30 UTC next Tuesday the 9th
of May. I will restart all three servers in succession, so I expect the
maintenance to be complete within approximately 30 minutes.
Please note that the Wiki Replica databases are not affected by this
maintenance: https://wikitech.wikimedia.org/wiki/Wiki_Replicas
Please do let me know if you have any queries or if this choice of
maintenance window is likely to cause you any inconvenience.
Kind regards,
Ben
--
*Ben Tullis* (he/him)
Senior Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello,
Apologies for the short notice. The SRE team will be carrying out an
upgrade of the switches in eqiad row D later today
(https://phabricator.wikimedia.org/T333377) at approximately 14:00 UTC.
The network outage to this row resulting from this work is expected to
be around 30 minutes, all being well.
In support of this work, the Data Engineering team will be putting the
HDFS file system into safe mode at approximately 13:30 today, which means
that write operations to the cluster will be refused.
Jobs sent to the YARN cluster will also be refused from around the same
time, so please try to plan any work that you may have for the cluster
to avoid this maintenance window.
Read-only access to Hive, Presto, Superset, and Turnilo should continue to
function normally throughout the maintenance window.
Finally, two of the stats servers (stat1005 and stat1006) will be
unavailable, so please save any work that you may have on these servers
before the loss of connectivity.
Please do reach out via any of the normal channels (email:
analytics(a)lists.wikimedia.org, IRC: #wikimedia-analytics, Slack:
#data-engineering) if you have any queries or concerns.
Kind regards,
Ben
--
*Ben Tullis* (he/him)
Senior Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello,
The SRE team will be carrying out an upgrade of the switches in eqiad row
C (https://phabricator.wikimedia.org/T331882) at 13:00 UTC. The network
outage to this row resulting from this work is expected to be around 30
minutes, all being well.
In support of this work, the Data Engineering team will be putting the
HDFS file system into safe mode at approximately 12:30 UTC today, which
means that write operations to the cluster will be refused.
Jobs sent to the YARN cluster will also be refused from around the same
time, so please try to plan any work that you may have for the cluster to
avoid this maintenance window.
Some additional internal-facing services for analytics such as Hive,
Superset, Presto, and the Druid-analytics cluster will also be largely
unavailable for some periods while the switch upgrade takes place.
The public-facing Analytics Query Service (AQS) will continue to function,
albeit with a degraded response to some queries. However, Wikistats
(stats.wikimedia.org) will be unavailable whilst the switch upgrade is in
progress.
Finally, two of the stats servers, stat1007 and stat1009, will be
unavailable, so please save any work that you may have on these servers
before the loss of connectivity.
Please reach out via any of the normal channels (email:
analytics(a)lists.wikimedia.org, IRC: #wikimedia-analytics, Slack:
#data-engineering) if you have any queries or concerns.
Kind regards,
Steve Munene
Hi everybody,
Today the SRE team will perform network maintenance in our eqiad cluster,
and Superset/Turnilo/Matomo UIs may be unavailable between 13:00 and 15:00
UTC. Please check https://phabricator.wikimedia.org/T331882 for more info.
As always if you have any questions please reach out to the Data
Engineering team :)
Luca
Hello,
Tomorrow the SRE team will be carrying out an upgrade of the switches in
eqiad row B (https://phabricator.wikimedia.org/T330165) at 14:00 UTC.
The network outage to this row resulting from this work is expected to
be around 30 minutes, all being well.
In support of this work, the Data Engineering team will be putting the
HDFS file system into safe mode at approximately 13:30 UTC tomorrow, which
means that write operations to the cluster will be refused.
Jobs sent to the YARN cluster will also be refused from around the same
time, so please try to plan any work that you may have for the cluster
to avoid this maintenance window.
Some additional internal-facing services for analytics such as Hive,
Superset, Presto, and the Druid-analytics cluster will also be largely
unavailable for some periods while the switch upgrade takes place.
The public-facing Analytics Query Service (AQS) will continue to
function, albeit with a degraded response to some queries. However,
Wikistats (stats.wikimedia.org) will be unavailable whilst the switch
upgrade is in progress.
Finally, two of the stats servers, stat1007 and stat1009, will be
unavailable, so please save any work that you may have on these servers
before the loss of connectivity.
Please do reach out via any of the normal channels (email:
analytics(a)lists.wikimedia.org, IRC: #wikimedia-analytics, Slack:
#data-engineering) if you have any queries or concerns.
Kind regards,
Ben
--
*Ben Tullis* (he/him)
Senior Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>