I notice lines in the dbpedia dumps that look like
<http://dbpedia.org/resource/Boston%2C_MA> <http://dbpedia.org/property/redirect> <http://dbpedia.org/resource/Boston> .
Note the URL-encoded comma: %2C stands for ",".
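To make the encoding concrete, here is a minimal sketch using Python's standard `urllib.parse`. The variable names are only for illustration, and passing `safe=""` is my choice for the demo, not something taken from dbpedia.

```python
from urllib.parse import quote, unquote

# The escaped form, as it appears in the dump.
encoded = "http://dbpedia.org/resource/Boston%2C_MA"

# unquote() turns %2C back into a literal comma.
decoded = unquote(encoded)
print(decoded)  # http://dbpedia.org/resource/Boston,_MA

# Going the other way: quote() with safe="" escapes the comma,
# but leaves letters, digits and "_.-~" alone.
reencoded = quote("Boston,_MA", safe="")
print(reencoded)  # Boston%2C_MA
```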
Anyhow, if I go to
http://dbpedia.org/page/Boston%2C_MA
I see two redirects [one of which unescapes the comma] and ultimately end up at
http://dbpedia.org/page/Boston
If I go to Wikipedia
http://wikipedia.org/page/Boston%2C_MA
I get redirected to
http://wikipedia.org/page/Boston,_MA
which, oddly, displays the same content as "Boston" [rather than 301 redirecting...]
When I do
curl -H "Accept: application/rdf+xml" http://dbpedia.org/data/Boston.xml
I see stuff like
<rdf:Description rdf:about="http://dbpedia.org/resource/Harvey_Mason%2C_Jr."><dbpedia-owl:birthPlace xmlns:dbpedia-owl="http://dbpedia.org/ontology/" rdf:resource="http://dbpedia.org/resource/Boston"/></rdf:Description>
Now if I run the SPARQL query
select ?Predicate where { <http://dbpedia.org/resource/Harvey_Mason,_Jr.> ?Predicate <http://dbpedia.org/resource/Boston> }
I get nothing, but if I run
select ?Predicate where { <http://dbpedia.org/resource/Harvey_Mason%2C_Jr.> ?Predicate <http://dbpedia.org/resource/Boston> }
I get
http://dbpedia.org/ontology/birthPlace
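The two queries differ only in how the comma is written. Below is a rough sketch of building such a query programmatically so that the subject URI is passed through exactly as stored. The helper names `predicate_query` and `endpoint_url` are hypothetical, and the endpoint address `http://dbpedia.org/sparql` is an assumption on my part. Nothing here touches the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def predicate_query(subject_uri: str, object_uri: str) -> str:
    """Build a query listing the predicates that link subject to object."""
    # IRIs in SPARQL are written inside angle brackets; the URI text is
    # inserted verbatim, so %2C stays %2C.
    return f"select ?Predicate where {{ <{subject_uri}> ?Predicate <{object_uri}> }}"


def endpoint_url(query: str, endpoint: str = "http://dbpedia.org/sparql") -> str:
    """Build a GET URL for the query (the endpoint address is an assumption)."""
    return endpoint + "?" + urlencode({"query": query})


escaped_query = predicate_query(
    "http://dbpedia.org/resource/Harvey_Mason%2C_Jr.",
    "http://dbpedia.org/resource/Boston",
)
url = endpoint_url(escaped_query)

# In transit the "%" itself gets escaped (%2C becomes %252C); decoding the
# query string on the other side restores the original %2C.
round_tripped = parse_qs(urlsplit(url).query)["query"][0]
print(round_tripped == escaped_query)  # True
```

The point of the round trip is that the escape inside the URI is data, not transport encoding, so it has to survive one extra layer of escaping when the query is sent over HTTP.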
So it looks like the %-encoded URI is the "real URI" in dbpedia. Obviously I ought to keep it around in case I want to run a SPARQL query now and then. Also, dbpedia encodes Wikipedia URLs the same way:
<http://en.wikipedia.org/wiki/Harvey_Mason%2C_Jr.> <http://xmlns.com/foaf/0.1/primaryTopic> <http://dbpedia.org/resource/Harvey_Mason%2C_Jr.> .
------
I took a look at some standards docs and found:
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-URI-reference
I see that we encode the text as UTF-8 octets, and any octets that aren't US-ASCII characters get %-encoded. However, the spec also says that
"Note: Because of the risk of confusion between RDF URI references that would be equivalent if derefenced, the use of %-escaped characters in RDF URI references is strongly discouraged."
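As I read it, the encoding step can be sketched like this. This is a simplification: it only escapes non-ASCII octets and ignores any ASCII characters that are not permitted in a URI. The function name and the "Málaga" string are made up for illustration.

```python
def percent_escape_non_ascii(text: str) -> str:
    """Encode text as UTF-8 and %-escape every octet outside US-ASCII."""
    out = []
    for octet in text.encode("utf-8"):
        if octet < 0x80:
            out.append(chr(octet))       # plain US-ASCII octet, kept as-is
        else:
            out.append(f"%{octet:02X}")  # non-ASCII octet, escaped
    return "".join(out)


print(percent_escape_non_ascii("Málaga"))      # M%C3%A1laga
print(percent_escape_non_ascii("Boston,_MA"))  # Boston,_MA
```

Note that the comma is plain US-ASCII, so this step alone would leave it untouched. The %2C in the dbpedia URIs must come from some other rule.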
------
Now the problem I've got with the Ookaboo API is that I know people are going to punch in
http://wikipedia.org/page/Boston,_MA
and I need to turn this into the right dbpedia URL. My plan for dealing with this is to
(i) store the exact URI I get out of dbpedia, (ii) always give people the exact URI out of dbpedia (if I publish RDFa or JSON data), (iii) give the same URI for Wikipedia that dbpedia gives (in HTML, RDFa, etc.), and (iv) if I get a query, apply the same canonicalization rules that dbpedia uses...
Which raises the question of what exactly those rules are. What are they?
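In the meantime, one fallback I can imagine for (i), (ii) and (iv) sidesteps the rules entirely: store the exact dbpedia URI, but index it under its fully decoded title, and decode incoming queries the same way. This is only a sketch. `lookup_key` and `UriIndex` are hypothetical names, it assumes the title is the last path segment (so titles containing "/" would break it), and it does not follow redirect triples.

```python
from urllib.parse import unquote, urlsplit


def lookup_key(uri_or_url: str) -> str:
    """Decoded last path segment, with spaces normalized to underscores."""
    path = urlsplit(uri_or_url).path
    # Assumption: the title is everything after the final "/".
    title = path.rsplit("/", 1)[-1]
    return unquote(title).replace(" ", "_")


class UriIndex:
    """Maps decoded titles to the exact URI taken from the dbpedia dump."""

    def __init__(self):
        self._by_key = {}

    def add(self, dbpedia_uri: str) -> None:
        # The stored value is never rewritten, only the key is decoded.
        self._by_key[lookup_key(dbpedia_uri)] = dbpedia_uri

    def resolve(self, user_input: str):
        """Return the stored URI for whatever form the user typed, or None."""
        return self._by_key.get(lookup_key(user_input))


index = UriIndex()
index.add("http://dbpedia.org/resource/Boston%2C_MA")
index.add("http://dbpedia.org/resource/Harvey_Mason%2C_Jr.")

print(index.resolve("http://wikipedia.org/page/Boston,_MA"))
# http://dbpedia.org/resource/Boston%2C_MA
print(index.resolve("http://en.wikipedia.org/wiki/Harvey_Mason%2C_Jr."))
# http://dbpedia.org/resource/Harvey_Mason%2C_Jr.
```

This only covers URIs I have already seen in the dump. For anything else I would still need the real canonicalization rules.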