I've recently put up a site that uses coordinate information from Freebase and DBpedia, and I'm starting to think about how to clean up certain data quality problems I'm encountering. For instance, see:
http://ookaboo.com/o/pictures/topic/209440/Oakville_Assembly
In this particular case, I've only got data from DBpedia, which drops the point a few hundred km from where it really is. It's obviously a bad one because it lands right in the middle of Lake Erie. Freebase doesn't have any coordinates for this thing (it seems to me that it should), and at the moment Wikipedia has the right coordinates (at least on Google Maps I see a big factory building). My guess is that Wikipedia might have been wrong at one time and has since been corrected. It's also possible that the conversion wasn't done right in DBpedia, since coordinates are represented differently across a few hundred different infoboxes.
It seems to me that both the number and the quality of points in Wikipedia have been improving dramatically over the last two years. About a year ago I plotted the points for Staten Island Railroad stations and found that the railroad was displaced a few km east and ran right under the middle of the Tappan Zee Bridge. Now it's much better.
I can find examples where:
(a) DBpedia is right and Freebase is wrong (for instance, a town in continental Europe gets its longitude sign flipped and ends up with the wrecked ships west of the UK; maybe here the point got fixed in Wikipedia but not in Freebase),
(b) DBpedia is wrong and Freebase is right,
(c) a point is missing from DBpedia but is in Freebase (I see a lot of these in Switzerland), and
(d) a point is missing from Freebase but is in DBpedia.
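To make those cases concrete, here is a rough sketch of how pairs of coordinates might be sorted automatically. Everything in it is an assumption for illustration: the function names, the 1 km default tolerance, and the idea that each source gives either a (lat, lon) tuple in decimal degrees or None. It can tell that two sources disagree, and it can spot the flipped-sign case, but it can't say which source is the right one; that still needs a third reference such as the current Wikipedia value.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to guard against rounding pushing h slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def classify(dbpedia, freebase, tolerance_km=1.0):
    """Label a pair of coordinates; either argument may be None if missing.

    tolerance_km is a hypothetical threshold, not a value from either project.
    """
    if dbpedia is None and freebase is None:
        return "missing_both"
    if dbpedia is None:
        return "missing_dbpedia"
    if freebase is None:
        return "missing_freebase"
    if haversine_km(dbpedia, freebase) <= tolerance_km:
        return "agree"
    # Check whether flipping the sign of latitude and/or longitude in one
    # source would bring the two points back into agreement.
    lat, lon = dbpedia
    for candidate in ((lat, -lon), (-lat, lon), (-lat, -lon)):
        if haversine_km(candidate, freebase) <= tolerance_km:
            return "sign_flip"
    return "disagree"


if __name__ == "__main__":
    # A made-up European town whose longitude sign differs between sources.
    print(classify((48.8566, 2.3522), (48.8566, -2.3522)))
    print(classify(None, (46.95, 7.45)))
```

Points labelled "sign_flip" or "disagree" would then be the ones worth checking by hand against a map.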
An analysis of this is tricky because for a lot of things the coordinates are inherently iffy: the location of 'Russia' could vary within a few thousand kilometers, 'Tompkins County' could vary by ten or so kilometers, etc.
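One way this might be handled is to let the allowed disagreement depend on what kind of thing the point describes. The sketch below assumes I already know a rough type for each topic; the type names and the tolerance values are invented for illustration and would need tuning against real data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical per-type tolerances in km; these numbers are guesses,
# not values taken from Freebase, DBpedia, or Wikipedia.
TOLERANCE_KM = {
    "building": 1.0,
    "town": 5.0,
    "county": 25.0,
    "country": 3000.0,
}
DEFAULT_TOLERANCE_KM = 10.0  # used when the type is unknown


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def is_suspicious(kind, a, b):
    """True if two sources disagree by more than the tolerance for this kind."""
    limit = TOLERANCE_KM.get(kind, DEFAULT_TOLERANCE_KM)
    return haversine_km(a, b) > limit


if __name__ == "__main__":
    p, q = (42.44, -76.50), (42.50, -76.45)  # two made-up points a few km apart
    print(is_suspicious("county", p, q))    # within the county tolerance
    print(is_suspicious("building", p, q))  # too far apart for a building
```

The same pair of points is acceptable for a county and suspicious for a building, which is the distinction a single global threshold would miss.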
Looking at a handful of points that have diverged, I get the impression that Freebase is more accurate than DBpedia, but that I get better results still just by looking at the coordinates shown in Wikipedia's human interface. Currently, it seems like a scan of the current coordinates in Wikipedia (however Wikipedia extracts them from the infoboxes) benefits the most from the human labor being done to fix points, and also avoids the errors and missed points introduced by other people's extraction pipelines.
From my viewpoint, I'd like to make a map that doesn't have embarrassing errors in it. What's the best way to clean up this mess?
wikitech-l@lists.wikimedia.org