Hey Magnus,
I think we should implement the greatly-pared-down counting scheme we were discussing on the list earlier. We don't have 90K articles. We have about 50K articles and about 40K geographical entries automatically created from a database.
Larry
Larry Sanger wrote:
I think we should implement the greatly-pared-down counting scheme we were discussing on the list earlier. We don't have 90K articles. We have about 50K articles and about 40K geographical entries automatically created from a database.
How about "90K entries, 50K of them articles" on the Main Page? (well, the last part needs rewording...)
I thought of something else: Automated "evaluation" of articles. I'm uncertain if that has been mentioned before, though.
We could add a score to each article. Certain properties are usually good:
- Many edits (as well as number of contributors)
- many outgoing internal links
- image links
- links to other languages
Furthermore, the following could be scored somehow:
- text to list (* and #) ratio
- text to external links ratio
- text to headings ratio
- text to number of paragraphs ratio
Granted, such a scoring method won't be able to separate all good and bad articles from each other. But, it could find many "bad" articles (lists, text dumps, stubs, etc.), and give us some kind of statistics about where we stand.
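As a rough illustration of the scoring idea, here is a minimal Python sketch. Nothing in it comes from the thread beyond the list of signals: the function names, the link-prefix rules, the weights, the caps, and the 0.8 list threshold are all assumptions picked for the example, and it only penalises pages that are almost entirely list lines.

```python
import re

def article_features(wikitext):
    """Count the raw signals listed above in one page of wiki markup."""
    lines = wikitext.splitlines()
    # Targets of [[...]] links, without any |label part.
    targets = re.findall(r"\[\[([^\]|]+)", wikitext)
    image_links = sum(1 for t in targets if t.lower().startswith("image:"))
    # Assumption: a 2-3 letter lowercase prefix marks a link to another language.
    language_links = sum(1 for t in targets if re.match(r"[a-z]{2,3}:", t))
    return {
        "internal_links": len(targets) - image_links - language_links,
        "image_links": image_links,
        "language_links": language_links,
        "external_links": len(re.findall(r"https?://", wikitext)),
        "list_lines": sum(1 for ln in lines if ln.startswith(("*", "#"))),
        "headings": sum(1 for ln in lines if re.match(r"=+.*=+\s*$", ln)),
        "paragraphs": sum(1 for b in re.split(r"\n\s*\n", wikitext) if b.strip()),
        "lines": len(lines),
    }

def score_article(features, n_edits, n_contributors):
    """Combine the signals with made-up weights and caps (all hypothetical)."""
    score = min(n_edits, 10) + min(n_contributors, 5)
    score += min(features["internal_links"], 10)
    score += 2 * min(features["image_links"], 2)
    score += 2 * min(features["language_links"], 2)
    # Mild penalty only when nearly every line is a list item.
    if features["lines"] and features["list_lines"] / features["lines"] > 0.8:
        score -= 5
    return score
```

The edit and contributor counts are passed in separately because they would come from the page history rather than from the markup itself.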
Comments?
Magnus
Hi Magnus,
I don't think such a complex scheme is necessary. As I posted previously, it will probably suffice to count only articles that have >=2 edits. That catches most bots and stubs, and leaves a reasonable number of articles.
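The simpler rule could be sketched like this in Python, assuming the per-page edit counts are already available as a mapping. The function name, the mapping, and the sample titles and numbers are hypothetical and only illustrate the ">=2 edits" threshold.

```python
def count_articles(edit_counts, min_edits=2):
    """Count pages whose number of edits is at least min_edits."""
    return sum(1 for n in edit_counts.values() if n >= min_edits)

# Made-up sample data: page title -> number of edits.
edit_counts = {
    "Trout": 5,
    "Physics": 12,
    "Bot-created town entry": 1,
    "One-line stub": 1,
}
print(count_articles(edit_counts))
```

Pages created in a single automated edit and never touched again drop out of the count, which is the effect described above.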
If nobody else will implement this, I will put it on my To-Do-List.
Regards,
Erik
Magnus Manske wrote:
Larry Sanger wrote:
I think we should implement the greatly-pared-down counting scheme we were discussing on the list earlier. We don't have 90K articles. We have about 50K articles and about 40K geographical entries automatically created from a database.
How about "90K entries, 50K of them articles" on the Main Page? (well, the last part needs rewording...)
Good idea.
I thought of something else: Automated "evaluation" of articles. I'm uncertain if that has been mentioned before, though.
We could add a score to each article. Certain properties are usually good:
- Many edits (as well as number of contributors)
- many outgoing internal links
- image links
- links to other languages
Furthermore, the following could be scored somehow:
- text to list (* and #) ratio
- text to external links ratio
- text to headings ratio
- text to number of paragraphs ratio
Granted, such a scoring method won't be able to separate all good and bad articles from each other. But, it could find many "bad" articles (lists, text dumps, stubs, etc.), and give us some kind of statistics about where we stand.
Good idea. But don't be too harsh on * and #. I've often turned a large paragraph of "there are three types of trout commonly found, the blah trout which is ... ... ... , the foo trout which is ..." into a clearer list format.
Larry Sanger wrote:
I think we should implement the greatly-pared-down counting scheme we were discussing on the list earlier. We don't have 90K articles. We have about 50K articles and about 40K geographical entries automatically created from a database.
Why don't we just say "thousands and thousands of articles"?
Since we've agreed that there's no certain way to automatically detect which articles are "real", making milestones like 100K even more meaningless than usual, we shouldn't pretend that any one figure is correct. Anybody that's interested in a total count, using any of the various automatic definitions of "article", can look at [[Wikipedia:Statistics]] for that information. Anybody that believes a particular method of counting to be useful can announce milestones based on it on [[Wikipedia:Announcements]].
We can even link from [[Main Page]] to [[Wikipedia:Statistics]] directly from the phrase "thousands and thousands of articles", so even newcomers can get an easily accessible precise count -- deciding for themselves which precise count is most accurate.
-- Toby
wikipedia-l@lists.wikimedia.org