Hi Imre,
we can encode these rules using the JSON MongoDB database we
created in the GlobalFactSync project
(https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE)
as the basis for the GFS Data Browser. The database has open read
access.
Is there a list of geodata issues somewhere? Can you give some
examples? GFS focuses on both overall quality measures and very
domain-specific adaptations. We will also try to flag these issues
for Wikipedians.
So I see that there is some notion of what is good and what is not,
by source. Do you have a reference dataset as well, or would that
be NaturalEarth itself? What would help you to measure
completeness when adding concordances to NaturalEarth?
-- Sebastian
For geodata (human settlements, rivers, mountains, ...) with GPS coordinates, my simple rules are:
- if it has a "local" Wikipedia page or a page in any big-language Wikipedia [EN/FR/PT/ES/RU/..], then it is OK.
- if it is only in "cebuano" AND outside of the "cebuano BBOX" -> this is lower quality
- only {shwiki+srwiki} AND outside of the "sh"&"sr" BBOX -> this is lower quality
- only {huwiki} AND outside of the CentralEurope BBOX -> this is lower quality
- geodata without GPS coordinate -> ...
- ...
So my rules are based on Wikipedia pages and language areas, and I prefer Wikidata items with local Wikipedia pages.
This is based on my experience adding Wikidata ID concordances to NaturalEarth (https://www.naturalearthdata.com/blog/).
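As a rough illustration of how these rules could be encoded before loading them into a database, here is a minimal Python sketch. It is not the GFS implementation and does not touch MongoDB. The function names `rate` and `in_bbox`, the result labels, the `local_wikis` input and all bounding-box coordinates are hypothetical placeholders (rough extents, not curated language areas); only the Wikidata sitelink keys such as `cebwiki` and `huwiki` are real identifiers. Items without coordinates are only flagged, because the rules above leave that verdict open.

```python
from typing import AbstractSet, Optional, Tuple

# Hypothetical (min_lon, min_lat, max_lon, max_lat) boxes: rough placeholders only.
BBOXES = {
    "cebuano": (116.0, 4.5, 127.0, 21.5),       # roughly the Philippines
    "sh_sr": (13.0, 40.8, 23.1, 46.9),          # roughly the Serbo-Croatian language area
    "central_europe": (5.0, 45.0, 27.0, 55.5),  # roughly Central Europe
}

# Wikidata sitelink keys for the "big language" Wikipedias named in the rules.
BIG_WIKIS = frozenset({"enwiki", "frwiki", "ptwiki", "eswiki", "ruwiki"})


def in_bbox(coord: Tuple[float, float], box: Tuple[float, float, float, float]) -> bool:
    """Return True if the (lon, lat) point lies inside the box, edges included."""
    lon, lat = coord
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def rate(sitelinks: AbstractSet[str],
         coord: Optional[Tuple[float, float]],
         local_wikis: AbstractSet[str] = frozenset()) -> str:
    """Classify one geo item from its Wikipedia sitelinks and (lon, lat) coordinate.

    local_wikis is a hypothetical input: the Wikipedias in the local language(s)
    of the place, which the caller has to determine separately.
    """
    # Rule 1: a local or big-language Wikipedia page makes the item OK.
    if sitelinks & (BIG_WIKIS | local_wikis):
        return "ok"
    # The verdict for items without coordinates is left open, so only flag them.
    if coord is None:
        return "no_coordinates"
    # Only Cebuano, but outside the Cebuano language area.
    if sitelinks == {"cebwiki"} and not in_bbox(coord, BBOXES["cebuano"]):
        return "lower_quality"
    # Only shwiki and/or srwiki, but outside the sh/sr language area.
    if sitelinks and sitelinks <= {"shwiki", "srwiki"} and not in_bbox(coord, BBOXES["sh_sr"]):
        return "lower_quality"
    # Only Hungarian, but outside Central Europe.
    if sitelinks == {"huwiki"} and not in_bbox(coord, BBOXES["central_europe"]):
        return "lower_quality"
    # No rule matched; the rule list above is open-ended.
    return "unrated"


if __name__ == "__main__":
    print(rate(frozenset({"cebwiki"}), (15.0, 50.0)))         # lower_quality
    print(rate(frozenset({"huwiki"}), (19.04, 47.5)))         # unrated
    print(rate(frozenset({"srwiki", "enwiki"}), None))        # ok
```

The rules are checked in the order they were listed, so a big-language or local page wins even when the only other sitelink would otherwise mark the item as lower quality.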