I totally support Brion's fervor to change this. But I'm not so sure
about the "greater than zero" criterion.
The "comma" trick was a good kludge to get at the idea that random
junk is not an article. Pretty much anything with a sentence or two
will have a comma (in English) and thus will constitute an article,
though perhaps just a stub.
It would be interesting to see some quick statistics, if that's
possible, on a few different methods, and how each affects the count:
zero bytes = ? articles
100 bytes = ? articles
500 bytes = ? articles
1000 bytes = ? articles
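
A rough sketch of how counts like these might be produced, assuming the
page texts are available as plain strings. The function name, the
strictly-greater-than reading of each threshold, and the sample pages are
all hypothetical illustrations, not anything from the actual software:

```python
def count_articles(pages, min_bytes=0, require_comma=False):
    """Count pages that qualify as articles under a simple rule.

    Assumption: a page qualifies when its UTF-8 size is strictly greater
    than min_bytes, matching the "greater than zero" reading. With
    require_comma=True the page must also contain a comma.
    """
    count = 0
    for text in pages:
        size = len(text.encode("utf-8"))
        if size <= min_bytes:
            continue
        if require_comma and "," not in text:
            continue
        count += 1
    return count


# Hypothetical sample data: an empty page, random junk, a short stub,
# and a longer page with no comma.
sample_pages = [
    "",
    "asdf",
    "Paris is the capital of France, and its largest city.",
    "x" * 600,
]

for threshold in (0, 100, 500, 1000):
    print(threshold, "bytes =", count_articles(sample_pages, min_bytes=threshold), "articles")
print("comma rule =", count_articles(sample_pages, require_comma=True), "articles")
```

On this toy data the "greater than zero" rule counts the junk page as an
article, while the comma rule does not.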
Any single statistic is going to be limited in the information that
it conveys. It might also be fun to look at: total bytes in all
"articles" (defined different ways), average bytes per page in article
namespace, histogram of number of articles of various lengths, etc.
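
A minimal sketch of those summary figures, again assuming page texts as
plain strings. The function name, the 500-byte bin width, and the sample
data are hypothetical choices made only for illustration:

```python
from collections import Counter


def length_stats(pages, bin_size=500):
    """Return (total bytes, average bytes per page, length histogram).

    The histogram maps the lower edge of each bin to the number of pages
    whose UTF-8 size falls in [edge, edge + bin_size).
    """
    sizes = [len(text.encode("utf-8")) for text in pages]
    total = sum(sizes)
    average = total / len(sizes) if sizes else 0.0
    histogram = Counter((size // bin_size) * bin_size for size in sizes)
    return total, average, dict(sorted(histogram.items()))


# Hypothetical sample data standing in for pages in the article namespace.
sample_pages = ["", "asdf", "x" * 600, "y" * 1200]

total, average, histogram = length_stats(sample_pages)
print("total bytes:", total)
print("average bytes per page:", average)
for edge, n in histogram.items():
    print(f"{edge}-{edge + 499} bytes: {n} pages")
```

Restricting `pages` to those that pass a given article definition would
give the "defined different ways" variants of each figure.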
Of course, it's easy for me to sit here and type up a dream list of
statistics. :-)
--Jimbo