Please excuse my ignorance if a proposal such as this one has already been discussed (perhaps this is the wrong mailing list), but I could not find any information about it.
Has anyone ever discussed or considered establishing a central database (in the manner of Commons) to hold information such as basic statistical data about population, area, geographical data, etc. for towns, cities or even countries? If data like that were stored in one central location, it would be possible to link to it from the Wikipedias, it would be easier to update, and it would be consistent across all projects. Just a quick example: according to en Wiki the population of Washington DC is 581,530, German Wiki says it's 548.360, according to French Wiki it's 553 523, and Polish Wiki states it's 582 049.
If data like that were stored in a central location, one could just update it once and all projects would show the same information. Also, it would make it easy for bots to create articles containing basic demographic data about towns and localities based on such information.
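A minimal sketch of the "update once, show everywhere" idea, not a description of any existing Wikimedia system. The store `CENTRAL_DB`, the separator table and both function names are hypothetical, and the stored number is simply the en Wiki figure quoted above, used as a placeholder. Each edition keeps its own thousands separator but reads the same underlying value.

```python
# Hypothetical central store: one record per place, shared by every edition.
CENTRAL_DB = {
    "Washington, D.C.": {"population": 581530},  # placeholder: the en Wiki figure
}

# Thousands separators as they appear in the figures quoted in the message
# (assumed here to be each edition's display convention).
GROUP_SEPARATOR = {"en": ",", "de": ".", "fr": " ", "pl": " "}


def render_population(place: str, lang: str) -> str:
    """Format the single stored figure with the given edition's separator."""
    value = CENTRAL_DB[place]["population"]
    return f"{value:,}".replace(",", GROUP_SEPARATOR[lang])


def update_population(place: str, value: int) -> None:
    """One update here is seen by every edition on its next render."""
    CENTRAL_DB[place]["population"] = value


for lang in GROUP_SEPARATOR:
    print(lang, render_population("Washington, D.C.", lang))
```

With this arrangement the editions can differ in presentation only; a disagreement about the number itself would have to be settled once, in the central record.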
Cheers,
On 11/29/07, Michal Rosa michal.rosa@gmail.com wrote:
Doh! Never mind, it was just pointed out to me that a project like that already exists - http://meta.wikimedia.org/wiki/Wikidata
Thanks anyway :)
Michal "roo72" Rosa
On Nov 28, 2007 11:27 PM, Michal Rosa michal.rosa@gmail.com wrote:
Just a quick example, according to en Wiki the population of Washington DC is 581,530, German Wiki says it's 548.360, according to French Wiki it's 553 523 and Polish Wiki states it's 582 049.
What you're actually seeing here is what happens when data is stated to a higher degree of accuracy than is warranted. When a number which changes daily is quoted so exactly, of course different sources will differ. I'm not sure where this practice of stating populations as if they could be determined down to the individual person came from.
(I know this is not really relevant to your original point, only to the side point that differing figures between Wikipedia editions may not mean any of them are wrong; they're just using different sources for an approximate figure that is impossible to pin down.)
-Matt
On 29/11/2007, Matthew Brown morven@gmail.com wrote:
What you're actually seeing here is what happens when data is stated to a higher degree of accuracy than is warranted. When a number which changes daily is quoted so exactly, of course different sources will differ. I'm not sure where this practice of stating populations as if they could be determined down to the individual person came from.
Populations determined from census results can justifiably be stated to full precision, since a census is an exact count, not an estimate. Of course, some people are always going to get missed out, but that's a matter of accuracy, not precision (which is what you meant, I think). Anyone announcing population figures not drawn directly from a census who states them more precisely than to the nearest 100 is just kidding themselves, IMHO; the nearest thousand is probably more appropriate.
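The rule of thumb above can be sketched as a small function. The name `display_population` and its parameters are made up for illustration, and the sample value is just the en Wiki figure from earlier in the thread, with no claim about whether it is actually a census count or an estimate.

```python
def display_population(value: int, from_census: bool, nearest: int = 1000) -> int:
    """Return a census count unchanged; round any other figure to `nearest`."""
    if from_census:
        return value
    # Integer rounding to the nearest multiple, with halves rounded up.
    return (value + nearest // 2) // nearest * nearest


print(display_population(581530, from_census=True))               # kept as counted
print(display_population(581530, from_census=False))              # nearest thousand
print(display_population(581530, from_census=False, nearest=100)) # nearest hundred
```

The choice between 100 and 1000 is left as a parameter, since the message itself treats it as a matter of opinion.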
Thomas Dalton wrote:
Populations determined from census results can justifiably be stated to full precision, since a census is an exact count, not an estimate. Of course, some people are always going to get missed out, but that's a matter of accuracy, not precision (which is what you meant, I think). Anyone announcing population figures not drawn directly from a census who states them more precisely than to the nearest 100 is just kidding themselves, IMHO; the nearest thousand is probably more appropriate.
As some have said, "Lies, damn lies and statistics." Taken to one significant digit three of them would say that the population was 600,000, but the Germans would leave it at 500,000. :-)
Ec
As some have said, "Lies, damn lies and statistics." Taken to one significant digit three of them would say that the population was 600,000, but the Germans would leave it at 500,000. :-)
Which, I think, goes to show that one sig. fig. is insufficient precision.
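As a worked check of the significant-figure point, here is a small sketch that rounds the four figures quoted in this thread. The helper `round_sig` is written for this example only and assumes positive integers.

```python
def round_sig(value: int, digits: int) -> int:
    """Round a positive integer to `digits` significant figures."""
    magnitude = len(str(value)) - 1                 # 5 for a six-digit number
    step = 10 ** max(magnitude - digits + 1, 0)     # size of the last kept digit
    return (value + step // 2) // step * step       # halves rounded up


# The four figures quoted earlier in the thread, keyed by edition.
FIGURES = {"en": 581530, "de": 548360, "fr": 553523, "pl": 582049}

for digits in (1, 2):
    rounded = {lang: round_sig(v, digits) for lang, v in FIGURES.items()}
    print(digits, rounded)
```

At one significant figure the German value lands on 500,000 while the other three land on 600,000. At two significant figures the editions still split, into 580,000 (en, pl) and 550,000 (de, fr), so rounding alone does not reconcile them.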
Also, it's interesting that you should mention that quote, since the very point I was making about census results was actually that they are *not* statistics. A statistic is a number calculated from a sample of a population; the result of a census is just the actual number of people (or whatever). I guess that means they're either lies or damned lies...
Hoi, the issue with data is not so much that it can be whatever; the point is that it should not be whatever. When data is provided, like the size of the Washington DC population, it is relevant to know the source and the date of this information. This will explain what the number actually means. When there are multiple sources for this information, it is not relevant which information is "correct"; what is of relevance is that information on comparable subjects uses the same data source.
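One way to picture "know the source and the date, and compare like with like" is the sketch below. The `Figure` record, the `comparable` helper and every value in `SAMPLE` are invented placeholders, not real population data or an existing schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Figure:
    place: str
    value: int
    source: str   # who published the number
    as_of: date   # the date the number refers to


def comparable(figures, source):
    """Keep the latest figure per place, using one source only."""
    latest = {}
    for fig in figures:
        if fig.source != source:
            continue
        current = latest.get(fig.place)
        if current is None or fig.as_of > current.as_of:
            latest[fig.place] = fig
    return latest


# Placeholder data: places, sources and values are all made up.
SAMPLE = [
    Figure("Place A", 1000, "Source X", date(2000, 1, 1)),
    Figure("Place A", 1200, "Source Y", date(2006, 7, 1)),
    Figure("Place B", 2000, "Source X", date(2000, 1, 1)),
    Figure("Place A", 1100, "Source X", date(2005, 1, 1)),
]

for place, fig in comparable(SAMPLE, "Source X").items():
    print(place, fig.value, fig.source, fig.as_of.isoformat())
```

Keeping the source and date next to each value means two editions showing different numbers can at least say which source and which year each number comes from.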
OmegaWiki, the best implementation of Wikidata, does not at the moment carry information like the number of inhabitants. Technically it would be possible to do so; however, the data structure needed has not been implemented yet. We do, however, demonstrate other types of information. Have a look at the Netherlands: it is marked as a "country", and consequently all kinds of attributes associated with this class are available.
When you create a user profile, you can experiment with the representation of the data. You will find that, depending on the existence of translations, your experience will be in the language that you select. We do want to support more languages, and we want to extend the data available.
As OmegaWiki provides its data under a more liberal license than the GFDL, it is possible for everyone to make use of its data.
Thanks, GerardM
http://www.omegawiki.org/index.php?title=DefinedMeaning:The%20Netherlands%20...
Wikipedia-l mailing list Wikipedia-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikipedia-l
Thank you for your promotional commentary.
Ec
Michal Rosa wrote:
Also it would make it easy for bots to create articles containing basic demographical data about towns and localities based on such information.
As if there weren't already enough of these "articles" ...
user-written: http://it.wikipedia.org/wiki/Saarbr%C3%BCcken bot-written: http://lmo.wikipedia.org/wiki/Saarbr%C3%BCcken
Apart from the language, which article would give you more information? Do we really need articles that only repeat information from the infobox in text form?
I don't think so. --32X
Hoi, Thank you for using a bad example. Thank you for using political arguments that are completely beside the point.
The question is: what is a "Commons database" good for? It is good for providing information in a set way in as many languages as possible. This does not mean machine translation; it means having information in one place, making damned sure that the underlying information is correct, and making sure that this information can be understood.
The Wikimedia Foundation is about providing information. When there is information in info boxes that can be provided in a language like Tamil, Malayalam, Kannada, Telugu, Gujarati or Georgian, to name but a few with a completely different script, it makes sense to have at least that information available in info boxes in those languages. It makes perfect sense to have reputable information available in this way, a way that allows us to prove that at least the underlying information is correct. When we are able to provide information in info boxes, we do not "need" to provide the same information again. This, however, is common practice in any project; facts are repeated in text and extra information is provided to give it extra depth.
The quality of machine translation is not the same for every language. At first this gets you broken translations, and then slowly but surely acceptable language. When people complain about the big numbers of articles involved, they should appreciate that for a machine translation algorithm the number of articles is immaterial. For the development of machine translation it is actually a boon when there is a lot of language to process. So people who complain about large numbers of articles when machine translation is used just do not understand the process.
Thanks, GerardM
GerardM wrote:
Thank you for using a bad example. Thank you for using political arguments that are completely beside the point.
The question is: what is a "Commons database" good for? It is good for providing information in a set way in as many languages as possible. This does not mean machine translation; it means having information in one place, making damned sure that the underlying information is correct, and making sure that this information can be understood.
Yeah, that's the good way. It could even solve the problem of reliable sources. There cannot be sources for the number of inhabitants of a city in every language. A central database where these "valuable" sources are collected would help a lot.
--32X