TLDR:

The ticket sounds much worse than it is.

The basic request:

Echo is (or soon will be) sending e-mails that appear to come from four e-mail addresses:
* (currently): notifications@mediawiki.org (for mediawiki.org notifications)
* (currently): no-reply-notifications@mediawiki.org (for mediawiki.org notifications)
* (April): notifications@wikipedia.org (for enwiki notifications)
* (April): no-reply-notifications@wikipedia.org (for enwiki notifications)

Currently, pre-Echo talk page notifications are sent through the system using two e-mail addresses:
* From: wiki@wikimedia.org
* Reply-To: reply@not.possible
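
To make the difference concrete, here is a minimal sketch of the two header sets. It is illustrative Python, not the actual MediaWiki/PEAR Mail code; the helper name, recipient, subject and body are made up, and pairing notifications@ with From and no-reply-notifications@ with Reply-To is my reading of the request.

```python
from email.message import EmailMessage


def build_notification(from_addr, reply_to, to_addr, subject, body):
    """Build a plain-text notification with explicit From and Reply-To headers.

    Hypothetical helper for illustration only; this is not how Echo or
    MediaWiki actually assembles its mail.
    """
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["Reply-To"] = reply_to
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


# Pre-Echo talk page notification (addresses as listed above).
pre_echo = build_notification(
    "wiki@wikimedia.org",
    "reply@not.possible",
    "editor@example.org",  # made-up recipient
    "Your talk page has been changed",
    "Someone left you a message.",
)

# Echo notification for enwiki (assumed From/Reply-To pairing).
post_echo = build_notification(
    "notifications@wikipedia.org",
    "no-reply-notifications@wikipedia.org",
    "editor@example.org",
    "Your talk page has been changed",
    "Someone left you a message.",
)

print(post_echo["From"], "/", post_echo["Reply-To"])
```

Nothing else about the message or the way it is relayed needs to differ.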

If necessary, can Ops make sure mail sent to the new addresses goes down a black hole? :-) If possible, logging a count of the e-mails sent to no-reply-* before they are deleted would be nice.

Outgoing:

The RT ticket only covers confirming that incoming e-mails are dropped. Outgoing e-mails currently go through the existing SMTP/Sendmail infrastructure used for talk page notifications (pre- and post-Echo), and this is unchanged.

The load is in the same ballpark pre- and post-Echo, with a modest increase expected due (mostly) to notifications aimed at new users.

Expectation of load:

* Most of Echo currently just replaces the existing talk page e-mail notifications, so expect the volume to be relatively unchanged from today (or a modest increase in traffic).
* Features like watchlist notifications do not exist in Echo. If they did, they would not send e-mail notifications, because of the flood that would cause. These "stream"-like notifications would be part of "Flow", not "Echo".
* Over time, this may increase as Echo becomes a more permanent part of the wiki infrastructure, so some sort of TechOps monitoring (if not already in place) is advisable at some point.
* The EE team will give more accurate estimates going forward.

Process stuff:

* Ideally, this would have been caught and requested by an SRE.
* Until then, in the medium term Luke Welling will substitute as a point of contact in Features with Ops.
* For this RT, in the short term Ryan Kaldari will be the engineering point of contact in Features with Ops with Luke assisting.

On Mar 20, 2013, at 5:52 PM, Faidon Liambotis <faidon@wikimedia.org> wrote:

On Wed, Mar 20, 2013 at 11:44:18PM +0000, Fabrice Florin via RT wrote:
Dear Wikimedia Operations,

Please set up the following 4 email accounts for the Echo notifications
project, if they are available:

<snip>

We'd be grateful if you could set these up this week, as we plan to start
using these addresses widely on our live mediawiki.org and
wikipedia.org sites by the week of March 25, 2013.

<snip>

Note that we expect a very high volume of emails to be sent 'from' the
<notifications@wikipedia.org> address this year (e.g. millions of
emails per month), and a lower volume of responses 'to' the <no-reply>
addresses (e.g. thousands of emails per month).

As far as I know, this deployment wasn't announced before in ops@ or
engineering@, it wasn't mentioned on the last (or any previous) project
& deployment meetings and it's not on the roadmap update sent by Greg to
engineering@ earlier this week. I also don't see anything relevant or
clear enough in the engineering roadmap spreadsheet (there's a "[Mar 21]
mediawiki release" entry, but I don't understand what that means).


The plan is to roll out a new version of Echo on wikimedia.org today (Thursday, March 21). It would then be rolled out (almost) two weeks later on enwiki on April 2nd(?), contingent on there being no unresolved issues and on exchanging release dates with AFTv5 according to plan. I believe the Engineering Projects Roadmap[1] reflects this now that the dates have become more settled (I believe it has been on the Roadmap for months, but the dates were not put in until last week or earlier this week).

The need for the e-mail addresses was not reflected in the above, due to a dropped ball on our part. Partly this is because I didn't even know about the requirement until yesterday. :-) The requirement has been documented on the Echo feature requirements wiki page[2] since at least late November, so I think nobody put 2 and 2 together because of its placement in the sample message in that document (one would also have to know that this is not the e-mail address of the existing notifications[3]). Hopefully this sort of shortfall will be caught in the future by an SRE.


I don't think it's reasonable to drop this on ops with two working days
notice and expect this to be implemented in time for such a deadline of
yours.

This is valid. I'm not certain myself if this is a requirement for release or as urgent as it appears in the ticket.

It is important to note how existing notifications are currently handled[3] (without Echo). Currently, talk page notifications are sent from "wiki@wikimedia.org" with a reply-to of "reply@not.possible". The impression I get is that Legal has determined this needs to be changed going forward:

1) wikimedia.org domain should be reserved for WMF-related stuff
2) the domain of the originating wiki's notification should identify itself as the origin of the notification
3) As for "reply@not.possible", I don't even want to know what the legal ramifications are there. ;-)

My understanding is that LCA made this a requirement for Echo early in the process in order to fix this. Since 90%(?) of Echo's e-mail notifications are simply a replacement for the current talk page notification, fixing this technical debt got sidecar'd onto Echo.

It's also not reasonable in my opinion to *make* such plans and request
the operations group to implement them without ever involving them on
decisions for such a system and without getting those plans vetted by
our architects (unless this has happened; apologies for the
miscommunication if so).

My thinking as to what happened is that, long ago, someone cheekily put in wiki@wikimedia.org and reply@not.possible to satisfy the engineering requirements of talk page notifications, and the Community team has been dealing with the consequences ever since. Since Echo's notifications replace this one, it made sense to request a fix for this problem. When the request percolated through the system, it was observed that creation of e-mail notification addresses had always been handled by IT, not Ops (because of the wikimedia.org domain name), so the request went through those channels. IT then determined that control of the MX for the mediawiki.org and wikipedia.org domains belongs to Ops, and the request was forwarded there. (This kind of makes me wonder if the original choice of wiki@wikimedia.org was designed to avoid making Ops aware that talk page e-mail notifications were being used at all.)

If necessary, and subject to LCA approval, we can leave this technical debt unpaid and go back to the original wiki@wikimedia.org and reply@not.possible, as these are just configuration changes in LocalSettings.php. Almost all of Echo's e-mail traffic right now simply replaces the traffic of the existing talk page notification system.

Just to give a few examples:

You mention "high volume" and "millions" but not much more than that.
How many millions is "millions"? How high volume are we talking about?
At what rate and what about peaks/bursts? We can't "just" create an
alias for those domains if it's high, unless you want all of foundation
staff & list mails to be stuck in two poor old overloaded machines with
queues filled with presumably hundreds of thousands of bounces. Or maybe
that's just too far fetched, but again, we can't just plan and deploy
lightly and without numbers.

The EE team will have accurate estimates, but this part of the request is playing a bit loose with the timeline.

First, it is important to emphasize that this mail is sent through the same SMTP/Sendmail setup that we're using for talk page notifications; the only things that change are the From and Reply-To addresses.

Currently, and for the release, Echo is just going to replace the existing talk page e-mail notifications, so e-mail traffic volume is essentially unchanged apart from the following:

* Echo introduces some new notifications focused on new users, so at the rate of user registration we can expect extra e-mails to be sent (~30k/month?).
* Echo introduces a couple(?) of new notifications focused on essential tasks that active editors/admins would like as e-mail notifications instead of checking pages manually. However, very few editors/admins are involved here, so I'd expect this to add ~100/month.
* Echo introduces bundling, so if a user is being e-mailed too much, subsequent e-mails are throttled and bundled. Expect this to decrease e-mail volume.

This may change over time as Echo becomes more widely used, so the load on the current mail servers sending talk page notifications might grow faster than standard wiki growth. Some monitoring will be important going forward.
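
As a back-of-the-envelope version of the deltas listed above, here is a minimal sketch in Python. The ~30k and ~100 figures are my guesses from this e-mail. The baseline and the bundling reduction are pure placeholders, since I don't have real numbers for either, and applying the reduction to all traffic is a simplifying assumption.

```python
def estimate_monthly_volume(baseline, new_user=30_000, editor_admin=100,
                            bundling_reduction=0.0):
    """Rough monthly outgoing e-mail estimate after the Echo rollout.

    baseline           -- current talk page notification e-mails per month
                          (unknown here; the caller must supply it)
    new_user           -- extra new-user notifications (~30k/month guess)
    editor_admin       -- extra editor/admin notifications (~100/month guess)
    bundling_reduction -- assumed fraction of e-mails saved by bundling,
                          applied to all traffic (placeholder, 0.0 to 1.0)
    """
    if baseline < 0:
        raise ValueError("baseline must be non-negative")
    if not 0.0 <= bundling_reduction <= 1.0:
        raise ValueError("bundling_reduction must be between 0 and 1")
    gross = baseline + new_user + editor_admin
    return round(gross * (1.0 - bundling_reduction))


# Placeholder baseline of 1,000,000/month: NOT a measured figure.
print(estimate_monthly_volume(1_000_000))
```

With any plausible baseline the two additions are small, which is why I expect the total to stay in the same ballpark.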


What about the *outgoing* mail relay? What makes you think it's capable
of sustaining the volume you're about to send to it? It might be the
case, but this needs to be considered internally.

The outgoing relay is the same one currently used for talk page notifications. In MediaWiki the PEAR Mail library is identical; the only difference is that the payload can now be HTML (MIME) mail.

You also don't seem to have thought about VERP at all. How are we going
to invalidate non-existing, full or unavailable email addresses?

Am I correct that VERP is for dealing with outgoing e-mails that result in bounces? Echo uses whatever existing solution is in place for talk page notifications. If there is none, then this is technical debt that Features should be working with Ops to eliminate going forward.
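
If I'm reading it right, the idea behind VERP is to encode each recipient into the envelope sender, so that a bounce identifies which address failed. A minimal sketch of that idea in Python follows. It is not anything Echo or MediaWiki does today as far as I know, and the "bounces" prefix and the bounce domain are hypothetical.

```python
def verp_encode(recipient, prefix="bounces", domain="wikipedia.org"):
    """Return a per-recipient envelope sender, e.g.
    bounces+user=example.org@wikipedia.org (prefix/domain are hypothetical)."""
    local, sep, rdomain = recipient.rpartition("@")
    if not sep or not local or not rdomain:
        raise ValueError(f"not an e-mail address: {recipient!r}")
    return f"{prefix}+{local}={rdomain}@{domain}"


def verp_decode(bounce_addr, prefix="bounces"):
    """Recover the original recipient from the address a bounce was sent to."""
    local, sep, _ = bounce_addr.rpartition("@")
    tag = prefix + "+"
    if not sep or not local.startswith(tag):
        raise ValueError(f"not a VERP address: {bounce_addr!r}")
    # The last '=' separates the recipient's local part from its domain.
    user, eq, rdomain = local[len(tag):].rpartition("=")
    if not eq or not user or not rdomain:
        raise ValueError(f"not a VERP address: {bounce_addr!r}")
    return f"{user}@{rdomain}"


sender = verp_encode("editor@example.org")
print(sender)
print(verp_decode(sender))
```

A bounce handler would decode the address each bounce arrives at and flag that recipient as undeliverable.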

Why do you need the no-reply counts? How are you going to use this
information?

I don't know. Fabrice, can you field this question? I imagine this was introduced by LCA. I don't believe it is a requirement for the Echo release.

I think I speak for the team when I say that this needs to be thought
and coordinated more and that we can't just drop everything we're
currently doing to do this within your deadline.

I hope this explanation of the RT ticket has gone a long way toward lowering the burden on Ops. Beyond that, I'm sure we can make allowances so that this is not an undue burden on Ops.

[1]: https://docs.google.com/a/wikimedia.org/spreadsheet/ccc?key=0Aoizbfxc5g6KdEkza0xkQnJlM0o0TXlwQXhDOUFvYnc#gid=0
[2]: http://www.mediawiki.org/wiki/Echo_(Notifications)/Feature_requirements#Email_notifications
[3]: http://en.wikipedia.org/wiki/Help:Email_notification


terry chay  최태리
Director of Features Engineering
Wikimedia Foundation
“Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment.”

p: +1 (415) 839-6885 x6832
m: +1 (408) 480-8902
aim: terrychay