Georgi Kobilarov wrote:
In this particular one, it's two articles about the same topic, but there could be some cases where the two articles are about something different.
Yes, such as http://en.wikipedia.org/wiki/FROG and http://en.wikipedia.org/wiki/Frog
I agree that this can be annoying. One has to make sure not to lose the case information (as happened to me with lookup.dbpedia.org once, which ended up merging FROG and Frog).
But what do you suggest to do about that, Paul? Should Wikipedia make URLs case-insensitive and then enforce disambiguation with ()?
If (wikipedia) were my site, I'd do two things:
(i) map all case-variant forms to a single form (New yOrK cITy -> New York City); "FROG" gets renamed to "FROG Cipher" or "Frog (Cipher)"
(ii) do a permanent redirect from variant forms to the canonical form
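A rough sketch of how (i) and (ii) could fit together, assuming a hypothetical in-memory table of canonical titles. The names CANONICAL and resolve are made up for illustration and are not anything MediaWiki provides:

```python
# Hypothetical table: case-folded title -> the single canonical title.
# Because "FROG" was renamed to "Frog (Cipher)", no two entries collide.
CANONICAL = {
    "new york city": "New York City",
    "frog": "Frog",
    "frog (cipher)": "Frog (Cipher)",
}

def resolve(title):
    """Return (http_status, canonical_title) for a requested title."""
    canonical = CANONICAL.get(title.casefold())
    if canonical is None:
        return 404, None          # no article under any casing
    if canonical == title:
        return 200, canonical     # already the canonical form
    return 301, canonical         # permanent redirect to the canonical form
```

With this table, a request for "New yOrK cITy" is answered with a 301 to "New York City", and "FROG" redirects to "Frog" since the cipher article now lives under its own disambiguated title.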
I think what dbpedia is doing is reasonable considering the situation.
My own system for handling generic databases has both a VARBINARY and VARCHAR field for dbpedia URLs/labels. It does a case-insensitive lookup first, and if that fails, looks at the alternatives that turn up. It's also got some heuristics for dealing with redirects, disambiguation, and all that. In the big picture I see "naming and identity" as a specific functional module for this kind of system...
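A minimal sketch of that lookup order, not the actual system: it assumes an in-memory SQLite table with a single label column, and uses COLLATE NOCASE for the case-insensitive pass in place of separate VARBINARY and VARCHAR fields. The names build_index and lookup are hypothetical, and the redirect and disambiguation heuristics are left out:

```python
import sqlite3

def build_index(labels):
    """Load labels into an in-memory table (illustrative schema only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE resource (label TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO resource (label) VALUES (?)", [(l,) for l in labels]
    )
    return conn

def lookup(conn, label):
    """Case-insensitive lookup first; on ambiguity, inspect the alternatives."""
    rows = conn.execute(
        "SELECT label FROM resource "
        "WHERE label = ? COLLATE NOCASE ORDER BY label",
        (label,),
    ).fetchall()
    candidates = [r[0] for r in rows]
    if len(candidates) <= 1:
        return candidates             # unique hit, or nothing at all
    # Several case variants turned up: prefer a byte-exact match if one exists.
    exact = [c for c in candidates if c == label]
    return exact or candidates        # otherwise hand back all alternatives
```

Given labels "Frog", "FROG" and "New York City", looking up "new york city" yields the one canonical label, "FROG" resolves to the exact variant, and "frog" returns both alternatives for a later heuristic to choose between. Note that SQLite's NOCASE only folds ASCII letters, so a real system would need a broader collation.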
wikitech-l@lists.wikimedia.org