On Mon, Jan 24, 2011 at 4:02 PM, Platonides <Platonides(a)gmail.com> wrote:
The original spec had feedback based precisely on enwiki numbers.
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/022220.html
So about 100? Note that there are invalid addresses marked as confirmed
in wikipedia.
Ok so from the breakdown at
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/022237.html…
202 email address records that were marked as confirmed, but failed the
proposed validation check at the time and couldn't be corrected by stripping
whitespace:
The breakdown of the 202 is as follows.
Reordered into:
Now allowed by the current revision of the HTML 5 spec as implemented in
User::isValidEmailAddr:
* Single trailing dot in local part: 40 (prohibited by RFC but plausibly
deliverable)
* Multiple consecutive dots: 20 (prohibited by RFC but plausibly
deliverable)
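
To make the "now allowed" cases concrete, here is a rough Python sketch of a check modeled on the HTML 5 "valid e-mail address" rule (one or more atext characters or dots, an @, then dot-separated labels of letters, digits and hyphens). This is an approximation for illustration only, not the actual User::isValidEmailAddr code; the names HTML5_EMAIL and is_valid_email are made up here.

```python
import re

# Approximation of the HTML 5 rule (an assumption, not MediaWiki's code):
#   local part: one or more atext characters or "."
#   domain:     letter/digit/hyphen labels separated by single dots
HTML5_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*"
)

def is_valid_email(addr: str) -> bool:
    """Return True if the whole string matches the approximate rule."""
    return HTML5_EMAIL.fullmatch(addr) is not None

# Dots are unrestricted in the local part, so both of these pass:
print(is_valid_email("user.@example.com"))         # True
print(is_valid_email("first..last@example.com"))   # True
# A trailing dot on the domain leaves an empty label, so this fails:
print(is_valid_email("user@example.com."))         # False
```

Under this reading the first two survey categories validate as-is, while the trailing-dot-on-domain case still needs the user to fix it.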
Easily correctable by the user removing the extra bits upon being prompted,
as doing so would not change the actual delivery:
* Single trailing dot in domain part: 100 (prohibited by RFC but plausibly
deliverable)
* Valid address in angle brackets (with other junk around it): 21
(permitted by RFC, kind of, and plausibly deliverable)
* Comment: 3 (permitted by RFC and plausibly deliverable)
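
As a sketch of what "removing the extra bits" could look like if we wanted to suggest a fix when prompting the user, something along these lines would cover the three cases above. The function name suggest_correction is hypothetical, and it only handles simple, non-nested comments; it is not existing MediaWiki behaviour.

```python
import re

def suggest_correction(raw: str) -> str:
    """Return a cleaned-up candidate address to show the user (sketch only)."""
    addr = raw.strip()

    # "Some Name <user@example.com> junk": keep only what is inside <...>.
    match = re.search(r"<([^<>]+)>", addr)
    if match:
        addr = match.group(1).strip()

    # Drop simple parenthesised comments, e.g. "user@example.com (work)".
    addr = re.sub(r"\([^()]*\)", "", addr).strip()

    # Drop a single trailing dot on the domain part.
    if "@" in addr and addr.endswith(".") and not addr.endswith(".."):
        addr = addr[:-1]

    return addr

print(suggest_correction("Jane Doe <jane@example.com>"))  # jane@example.com
print(suggest_correction("jane@example.com."))            # jane@example.com
print(suggest_correction("jane@example.com (work)"))      # jane@example.com
```

The suggestion would still need to go through the normal validity check and be confirmed by the user rather than being applied silently.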
v---- LINE OF DOOM ---v
Clearly wrong in typical context, should indeed be rejected (or changed to
@localhost for legit cases):
* No @: 9 (unlikely to be deliverable)
Not quite sure what's going on but most look like stray chars that would be
ignored or else invalid and possibly bogusly marked as confirmed:
* Miscellaneous: 9 (one containing [NO]@[SPAM], two with trailing >,
one in "quotes", one with single leading dot in local part, two with
single leading comma in local part, one with leading ": ", one with
leading "\")
So from the August 2009 survey on English Wikipedia, that leaves 18 email
addresses (the 9 with no @ plus the 9 miscellaneous) out of over 3 million
listed as confirmed, of which a few *might* be deliverable addresses that
could not be fixed by the user tweaking them during input (i.e., they
actually rely on those extra chars being there in order to be delivered to
the right person).
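
For anyone who wants to check the arithmetic, a quick tally of the counts from the reordered breakdown (the grouping names are just labels I picked for this sketch):

```python
# Counts copied from the reordered breakdown of the August 2009 survey.
now_allowed = {
    "single trailing dot in local part": 40,
    "multiple consecutive dots": 20,
}
user_correctable = {
    "single trailing dot in domain part": 100,
    "valid address in angle brackets": 21,
    "comment": 3,
}
below_the_line = {
    "no @": 9,
    "miscellaneous": 9,
}

total = (
    sum(now_allowed.values())
    + sum(user_correctable.values())
    + sum(below_the_line.values())
)
remaining = sum(below_the_line.values())

print(total)      # 202
print(remaining)  # 18
```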
To me it sounds like we're pretty good with this; it wouldn't hurt to make
sure that existing addresses that are stored funny (e.g. with extra whitespace
or trailing dots on the domain name) continue to work if they have been
working until now.
Also wouldn't hurt to do a current survey, and to include some other
language sites.
Of interest -- gmail's validation rules were also posted in that thread:
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/022268.html
-- brion