> sitelinks / I want to use the data to help rank possible text entity links to Wikidata items
it is a public domain geo-database ... with [ mountains, rivers, populated places, .. ]
I am using the Wikidata JSON dumps and importing them into a PostGIS database.
I rank the matches by:
- distance (lower is better)
- text similarity (I check the "labels" and the "aliases")
- and sitelinks!
And I lower the rank of the "mostly imported" sitelinks ("cebwiki", ...),
because a lot of geodata was re-imported there, and for those pages the "distance" and "text/labels" are the same.
So be careful with the imported Wikipedia pages! (sitelinks)
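
A minimal sketch of how the three signals could be combined with down-weighted sitelinks. The function names, the weights and the formula are assumptions for illustration only, not my actual PostGIS code; only "cebwiki" comes from the text above.

```python
# Hypothetical per-wiki weights; any wiki not listed counts as 1.0.
# Only "cebwiki" is named in the post, the value 0.1 is an assumption.
LOW_TRUST_WIKIS = {"cebwiki": 0.1}


def sitelink_score(sitelinks):
    """Sum per-wiki weights instead of using the plain sitelinks count."""
    return sum(LOW_TRUST_WIKIS.get(wiki, 1.0) for wiki in sitelinks)


def rank_score(distance_km, text_similarity, sitelinks):
    """Combine distance, text similarity and sitelinks; higher is better.

    distance_km: distance between the text entity and the Wikidata item
    text_similarity: 0.0 .. 1.0, best match over labels and aliases
    sitelinks: iterable of wiki ids, e.g. ["enwiki", "dewiki"]
    """
    distance_part = 1.0 / (1.0 + distance_km)  # lower distance is better
    # The 0.1 factor is an arbitrary illustrative weight.
    return distance_part + text_similarity + 0.1 * sitelink_score(sitelinks)
```

With the same distance and the same text similarity, a candidate that only has a "cebwiki" sitelink scores lower than one with an "enwiki" sitelink, which is the effect described above.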
Now: as far as I can see, the geodata quality is much better - mostly where an active Wikidata community is cleaning it up ..
It is just an example of why the simple "sitelinks" number is not enough :-)
In Germany - "dewiki" gets a higher rank.
In Hungary - "huwiki" is preferred.
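
The country preference could be sketched like this. The lookup table, the bonus value and the function name are hypothetical; only the Germany/"dewiki" and Hungary/"huwiki" pairs come from the lines above.

```python
# Country code -> locally preferred wiki (only these two pairs are from the post).
PREFERRED_WIKI = {"DE": "dewiki", "HU": "huwiki"}

# Hypothetical bonus added when the local wiki has a page for the item.
LOCAL_BONUS = 2.0


def local_sitelink_bonus(country_code, sitelinks):
    """Return a bonus if the item has a sitelink in the country's preferred wiki."""
    preferred = PREFERRED_WIKI.get(country_code)
    # For unknown countries `preferred` is None, which is never in sitelinks.
    return LOCAL_BONUS if preferred in sitelinks else 0.0
```

For a place in Hungary, an item with a "huwiki" sitelink gets the bonus, and an item with only other sitelinks does not.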