Hi Imre,
We can encode these rules using the JSON MongoDB database we created in the GlobalFactSync project (https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE) as the basis for the GFS Data Browser. The database has open read access.
Is there a list of geodata issues somewhere? Can you give some examples? GFS focuses on both overall quality measures and very domain-specific adaptations. We will also try to flag these issues for Wikipedians.
So I see that there is some notion of what is good and what is not, depending on the source. Do you have a reference dataset as well, or would that be NaturalEarth itself? What would help you measure completeness when adding concordances to NaturalEarth?
-- Sebastian
On 24.08.19 21:26, Imre Samu wrote:
For geodata (human settlements, rivers, mountains, ...) with GPS coordinates, my simple rules are:
- if it has a "local Wikipedia page" or a page in any big language Wikipedia (EN/FR/PT/ES/RU/...), then it is OK.
- if it is only in Cebuano AND outside the "Cebuano BBOX" -> this is lower quality
- only {shwiki+srwiki} AND outside the "sh"&"sr" BBOX -> this is lower quality
- only {huwiki} AND outside the CentralEuropeBBOX -> this is lower quality
- geodata without GPS coordinates -> ...
- ....
So my rules are based on Wikipedia pages and language areas, and I prefer Wikidata items that have local Wikipedia pages.
This is based on my experience adding Wikidata ID concordances to NaturalEarth (https://www.naturalearthdata.com/blog/).
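The rules above can be sketched in Python as a small classifier. This is a minimal sketch under stated assumptions, not the rules as implemented anywhere: the bounding-box coordinates, the `GeoItem` structure, the `classify` function, and the label strings are all hypothetical, and which wikis count as "local" for an item is assumed to be supplied as input. The original leaves the outcome for items without coordinates open, so the sketch only flags them.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical bounding boxes as (min_lon, min_lat, max_lon, max_lat).
# These are rough placeholder extents, not official language areas.
BBOX = {
    "ceb": (116.0, 4.5, 127.0, 21.5),           # rough Philippines extent (assumed)
    "sh_sr": (13.0, 40.8, 23.1, 46.9),          # rough sh/sr language area (assumed)
    "central_europe": (5.0, 44.0, 27.0, 55.5),  # rough Central Europe (assumed)
}

# The "big" Wikipedias named in the rules (EN/FR/PT/ES/RU).
BIG_WIKIS = {"enwiki", "frwiki", "ptwiki", "eswiki", "ruwiki"}


@dataclass
class GeoItem:
    sitelinks: set[str]                  # e.g. {"enwiki", "cebwiki"}
    coord: Optional[tuple[float, float]]  # (lon, lat), or None if missing
    # Wikis considered "local" for this item's location (assumed input).
    local_wikis: set[str] = field(default_factory=set)


def in_bbox(coord: tuple[float, float], bbox: tuple[float, float, float, float]) -> bool:
    lon, lat = coord
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def classify(item: GeoItem) -> str:
    # Rule 1: a local or big-language Wikipedia page means the item is OK.
    if item.sitelinks & (BIG_WIKIS | item.local_wikis):
        return "ok"
    # The bounding-box rules need a coordinate; the original leaves this case open.
    if item.coord is None:
        return "no_coordinate"
    # Only Cebuano, outside the Cebuano area.
    if item.sitelinks == {"cebwiki"} and not in_bbox(item.coord, BBOX["ceb"]):
        return "lower_quality"
    # Only shwiki and/or srwiki, outside the sh/sr area.
    if (item.sitelinks and item.sitelinks <= {"shwiki", "srwiki"}
            and not in_bbox(item.coord, BBOX["sh_sr"])):
        return "lower_quality"
    # Only Hungarian, outside Central Europe.
    if item.sitelinks == {"huwiki"} and not in_bbox(item.coord, BBOX["central_europe"]):
        return "lower_quality"
    return "unclassified"
```

Checking the "OK" rule first mirrors the order of the list: an item with a local or big-language page is never downgraded by the bounding-box rules. For example, a Cebuano-only item located in Sweden would be labeled `"lower_quality"`, while a Cebuano-only item on Cebu falls through to `"unclassified"`.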