Hi Imre,

we can encode these rules in the JSON MongoDB database we created in the GlobalFactSync project (https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE), which serves as the basis for the GFS Data Browser. The database has open read access.

Is there a list of geodata issues somewhere? Can you give some examples? GFS focuses on both overall quality measures and very domain-specific adaptations. We will also try to flag these issues for Wikipedians.

So I see that there is some notion of what is good and what is not, depending on the source. Do you have a reference dataset as well, or would that be NaturalEarth itself? What would help you measure completeness when adding concordances to NaturalEarth?

-- Sebastian

On 24.08.19 21:26, Imre Samu wrote:
For geodata (human settlements, rivers, mountains, ...) with GPS coordinates, my simple rules are:
- if it has a "local Wikipedia page" or a page in any big-language Wikipedia (EN/FR/PT/ES/RU/..), then it is OK.
- if it is only in "cebuano" AND outside of the "cebuano BBOX" -> then .... this is lower quality
- only {shwiki+srwiki} AND outside of the "sh" & "sr" BBOX -> this is lower quality
- only {huwiki} AND outside of CentralEuropeBBOX -> this is lower quality
- geodata without GPS coordinates -> ...
- ....
So my rules are based on Wikipedia pages and language areas ... and I prefer Wikidata items with local Wikipedia pages.
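
The rules above could be sketched roughly as follows. This is only an illustration, not an existing implementation: the function name, the bounding-box numbers, the "local_wikis" argument and the order in which the rules are checked are all assumptions made for the example, and the list of big-language wikis is open-ended in the rules themselves. Cases that the rules leave open (such as missing coordinates) are returned as "unspecified" rather than guessed.

```python
from typing import FrozenSet, Optional, Set, Tuple

# (min_lon, min_lat, max_lon, max_lat). All numbers are rough placeholders
# chosen for this sketch; they are NOT taken from the rules above.
BBox = Tuple[float, float, float, float]

# The rules name EN/FR/PT/ES/RU and then trail off, so this set is incomplete.
BIG_WIKIS: Set[str] = {"enwiki", "frwiki", "ptwiki", "eswiki", "ruwiki"}

# Each entry: wikis whose pages are only trusted inside the given area.
REGIONAL_RULES = [
    ({"cebwiki"}, (116.0, 4.0, 127.0, 22.0)),        # hypothetical "cebuano BBOX"
    ({"shwiki", "srwiki"}, (13.0, 41.0, 24.0, 47.0)),  # hypothetical "sh" & "sr" BBOX
    ({"huwiki"}, (9.0, 44.0, 27.0, 55.0)),           # hypothetical CentralEuropeBBOX
]


def in_bbox(lon: float, lat: float, bbox: BBox) -> bool:
    """Return True if the point lies inside the bounding box (inclusive)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def rate(
    sitelinks: Set[str],
    coord: Optional[Tuple[float, float]],  # (lon, lat) or None
    local_wikis: FrozenSet[str] = frozenset(),  # wikis counted as "local"; caller-supplied
) -> str:
    """Classify an item as 'ok', 'lower' or 'unspecified'."""
    # Rule 1: a local or big-language Wikipedia page makes the item OK.
    if sitelinks & (BIG_WIKIS | set(local_wikis)):
        return "ok"
    # The rules leave the no-coordinate case open, so it is not rated here.
    if coord is None:
        return "unspecified"
    lon, lat = coord
    # Regional rules: pages only in these wikis AND outside their area.
    for wikis, bbox in REGIONAL_RULES:
        if sitelinks and sitelinks <= wikis and not in_bbox(lon, lat, bbox):
            return "lower"
    return "unspecified"
```

For example, rate({"cebwiki"}, (19.0, 47.5)) would give "lower" with these placeholder boxes, while the same item with an additional "enwiki" sitelink would give "ok".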

This is based on my experience - adding Wikidata ID concordances to NaturalEarth ( https://www.naturalearthdata.com/blog/ )
--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) Competence Center
at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org