Andre Engels wrote:
Do I understand correctly that you are saying that yes/no questions can only be answered mechanically?
No, I wasn't referring to *how* we answer the question. But the *answer* is itself a "mechanical" yes or no. You either have the article, or you don't. This is not a fuzzy thing that can be discussed in terms of feelings or "quality"; the existence of the article is a binary quantity: 0 (doesn't exist) or 1 (does exist). The question needs an answer. We cannot leave it undecided, because we cannot "maybe" have an article. Since I'm an inclusionist rather than a deletionist, I'm looking for ways to formulate successful arguments for keeping articles, and since the answer needs to be mechanical, it can be helpful to use mechanical reasoning, such as the population of a city. In any case, that would be better than using the Google hit count. You could try to use other (less mechanical) arguments for keeping an article, but I think that would typically be more difficult.
I guess I should not go into the examples, but in this case my opinion is that 50,000 would be too high a limit; I myself would be thinking of 2,000 or 5,000.
Absolutely. Perhaps for the U.S. and parts of Germany we are approaching full coverage of all places with 5,000 people. But for India I doubt that we have covered all cities with 50,000. Nothing stops the limit from being set at 500, either. But a lower limit could be questioned a lot more easily than a higher one. Then again, some places with 50,000 people are less notable than some very small places. But if you can point to the fact that a place has 50,000 inhabitants (or was the birthplace of a president), then it is a lot easier to defend its notability.
The notability issue is related to another question that I have:
In linguistics, it is known that some words appear more often than others, and useful statistics can be based on a large corpus of text. If we have a dictionary of 80,000 words, we can check it against a corpus of text to see whether those are the 80,000 most commonly occurring words, or whether the dictionary is missing some frequent words that it should contain. I wish we could apply the same kind of statistics to encyclopedias as well. If we have a large corpus of text, how much of its meaning is explained by Wikipedia? Which concepts are more common than others, and is Wikipedia missing some of them? We can do word frequency statistics, but how can we count the concepts that occur in a text?
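To make the dictionary check concrete, here is a rough sketch in Python. It only covers the word-level case; counting concepts remains the open question. The function name `missing_frequent_words`, the very naive tokenizer and the toy corpus are my own assumptions for illustration, not an established tool or method.

```python
import re
from collections import Counter


def missing_frequent_words(corpus_text, dictionary, top_n):
    """Return the top_n most frequent corpus words that the dictionary lacks.

    Hypothetical helper: tokenization is a naive lowercase a-z split,
    which is an assumption and far too crude for real linguistic work.
    """
    words = re.findall(r"[a-z]+", corpus_text.lower())
    counts = Counter(words)
    # most_common() sorts by descending frequency.
    top_words = [word for word, _ in counts.most_common(top_n)]
    return [word for word in top_words if word not in dictionary]


# Toy data, made up for this example.
corpus = "the cat sat on the mat the cat ran"
dictionary = {"the", "sat"}
print(missing_frequent_words(corpus, dictionary, top_n=2))
```

With a real corpus and an 80,000-word dictionary, `top_n` would be set to 80,000 and the returned list would be the frequent words the dictionary should probably contain. For an encyclopedia, the hard part is replacing the tokenizer with something that recognizes concepts rather than single words.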