On Mon, Jan 24, 2011 at 4:02 PM, Platonides Platonides@gmail.com wrote:
The original spec had feedback based precisely on enwiki numbers. http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/022220.html
So about 100? Note that there are invalid addresses marked as confirmed in wikipedia.
OK, so from the breakdown at http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/022237.html w... 202 email address records that were marked as confirmed, but failed the proposed validation check at the time and couldn't be corrected by stripping whitespace:
The breakdown of the 202 is as follows.
Reordered into:
Now allowed by the current revision of the HTML 5 spec as implemented in User::isValidEmailAddr:
- Single trailing dot in local part: 40 (prohibited by RFC but plausibly deliverable)
- Multiple consecutive dots: 20 (prohibited by RFC but plausibly deliverable)
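
To make the first group concrete, here is a rough sketch of a check in the spirit of the HTML 5 draft's "valid e-mail address" definition. The pattern and the name `looks_valid` are assumptions for illustration only; this is not the actual code in User::isValidEmailAddr, and the real spec text may differ in details such as label length limits.

```python
import re

# Assumption: simplified character set for the local part (RFC "atext").
ATEXT = r"A-Za-z0-9!#$%&'*+/=?^_`{|}~-"
# Dots are allowed anywhere in the local part, in any number.
LOCAL = rf"[.{ATEXT}]+"
# One domain label: letters, digits and hyphens (no length limit here).
LABEL = r"[A-Za-z0-9-]+"
# Labels are joined by single dots, so a trailing dot on the domain fails.
EMAIL_RE = re.compile(rf"{LOCAL}@{LABEL}(?:\.{LABEL})*")


def looks_valid(addr: str) -> bool:
    """Return True if addr matches the simplified pattern in full."""
    return EMAIL_RE.fullmatch(addr) is not None


for sample in ["john.@example.com", "john..doe@example.com",
               "john@example.com.", "johndoe"]:
    print(sample, looks_valid(sample))
```

Under this sketch the trailing-dot-in-local-part and consecutive-dots cases pass, while a trailing dot on the domain or a missing @ does not.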
Easily correctable by the user removing the extra bits upon being prompted, as doing so would not change the actual delivery:
- Single trailing dot in domain part: 100 (prohibited by RFC but plausibly deliverable)
- Valid address in angle brackets (with other junk around it): 21 (permitted by RFC, kind of, and plausibly deliverable)
- Comment: 3 (permitted by RFC and plausibly deliverable)
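
As a rough illustration of why this group is easy to fix, the following sketch proposes a cleaned-up address that could be shown back to the user for confirmation. The helper name `suggest_fix` is hypothetical, the handling of comments is limited to simple non-nested parentheses, and none of this is taken from MediaWiki itself.

```python
import re


def suggest_fix(addr: str) -> str:
    """Return a candidate correction for an address; hypothetical helper."""
    s = addr.strip()

    # Address in angle brackets with junk around it: keep only the inside.
    m = re.search(r"<([^<>\s]+@[^<>\s]+)>", s)
    if m:
        s = m.group(1)

    # Drop simple (non-nested) parenthesised comments.
    s = re.sub(r"\([^()]*\)", "", s).strip()

    # Drop a single trailing dot on the domain part.
    if "@" in s and s.endswith(".") and not s.endswith(".."):
        s = s[:-1]

    return s


print(suggest_fix("john@example.com."))
print(suggest_fix("John Doe <john@example.com>"))
print(suggest_fix("john@example.com (John Doe)"))
```

The suggestion is meant to be confirmed by the user rather than applied silently, since the point of this group is that removing the extra bits does not change delivery.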
v---- LINE OF DOOM ---v
Clearly wrong in typical context, should indeed be rejected (or changed to @localhost for legit cases):
- No @: 9 (unlikely to be deliverable)
Not quite sure what's going on but most look like stray chars that would be ignored or else invalid and possibly bogusly marked as confirmed:
- Miscellaneous: 9 (one containing [NO]@[SPAM], two with trailing >,
one in "quotes", one with single leading dot in local part, two with single leading comma in local part, one with leading ": ", one with leading "")
So from the August 2009 survey on English Wikipedia, that leaves 18 email addresses out of over 3 million listed as confirmed, of which a few *might* be deliverable addresses that could not be fixed by the user tweaking them during input (ie, they actually rely on those extra chars being there in order to be delivered to the right person).
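
For anyone wanting to check the arithmetic, the counts from the breakdown can be tallied as below. The group labels are shorthand for the headings above; the numbers are the ones quoted in this message.

```python
# Counts copied from the reordered breakdown above.
counts = {
    "now allowed": {
        "trailing dot in local part": 40,
        "multiple consecutive dots": 20,
    },
    "easily correctable": {
        "trailing dot in domain part": 100,
        "angle brackets with junk": 21,
        "comment": 3,
    },
    "below the line": {
        "no @": 9,
        "miscellaneous": 9,
    },
}

# Sum each group, then sum the groups.
subtotals = {group: sum(items.values()) for group, items in counts.items()}
total = sum(subtotals.values())

print(subtotals)
print(total)
```

The three groups come to 60, 124 and 18, which add up to the 202 records in the survey.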
To me it sounds like we're pretty good with this; it wouldn't hurt to make sure that existing addresses that are stored funny (eg with extra whitespace or trailing dots on the domain name) continue to work as well as they have previously.
Also wouldn't hurt to do a current survey, and to include some other language sites.
Of interest -- gmail's validation rules were also posted in that thread: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-August/022268.html
-- brion