Ok, I have now found and tackled the issue.
This was indeed a bug in EasyRDF that got fixed since we forked half a year ago. I have updated easyrdf_lite now: https://github.com/Wikidata/easyrdf_lite/commit/025c32da17d82a51950230b80c254be5b3dc20d6. The respective patch for Wikibase is in review, see https://gerrit.wikimedia.org/r/86858.
Having to maintain the fork is really a pain; I wish there was a better way to do this. I think there's a strong use case for projects that only need RDF export, with no import or serving. Being able to deploy the serialization code separately would be very useful.
Nicholas, do you think it would be an option for EasyRdf to offer support for this use case? The most obvious (but also rather painful) way would be to spin off the serialization bit into a separate repository. But maybe it would be feasible to provide a build script that could be used to carve/reduce the code base to the parts needed in a particular scenario?
-- daniel
Am 27.09.2013 01:17, schrieb Nicholas Humfrey:
On 26/09/2013 15:33, "Daniel Kinzler" daniel.kinzler@wikimedia.de wrote:
Am 26.09.2013 14:54, schrieb Nicholas Humfrey:
Wikidata uses a fork of EasyRdf: https://github.com/Wikidata/easyrdf_lite
Which should handle this correctly.
Looks like it doesn't, but I'll investigate.
Hi,
I have just double-checked by writing an extra test and EasyRdf (the version in master) handles this correctly:
https://github.com/njh/easyrdf/commit/3121bd2201fca987c85bedb976d2148c862aae78
So either Wikidata is passing the integer through differently or it was fixed since you took a fork...
nick.
http://www.bbc.co.uk This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately. Please note that the BBC monitors e-mails sent or received. Further communication will signify your consent to this.
On Tue, Oct 1, 2013 at 1:01 PM, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
Ok, I have now found and tackled the issue.
This was indeed a bug in EasyRDF that got fixed since we forked half a year ago. [....]
Having to maintain the fork is really a pain, I wish there was a better way to do this.
How about not creating a fork just so you can delete a couple of directories? The full download is a whopping 260KB. Is that really too big/complex to include in its entirety and just ignore the parts you don't use?
Tom
Am 01.10.2013 20:14, schrieb Tom Morris:
How about not creating a fork just so you can delete a couple of directories? The full download is a whopping 260KB. Is that really too big/complex to include in its entirety and just ignore the parts you don't use?
Not deploying code we do not use, especially if it is complex, is a requirement from the ops team. And deploying a standalone HTTP endpoint (as contained in the EasyRDF distribution) on the boxes that serve Wikipedia is an absolute no-go. We had the choice of either deleting the unneeded parts or writing our own.
This is actually not the first time I'm having this problem with an RDF library when all I want to do is export. Last time, I ended up writing my own (in Java).
-- daniel
On 10/2/13 10:42 AM, Daniel Kinzler wrote:
Am 01.10.2013 20:14, schrieb Tom Morris:
How about not creating a fork just so you can delete a couple of directories? The full download is a whopping 260KB. Is that really too big/complex to include in its entirety and just ignore the parts you don't use?
Not deploying code we do not use, especially if it is complex, is a requirement from the ops team. And deploying a standalone HTTP endpoint (as contained in the EasyRDF distribution) on the boxes that serve Wikipedia is an absolute no-go. We had the choice of either deleting the unneeded parts or writing our own.
This is actually not the first time I'm having this problem with an RDF library when all I want to do is export. Last time, I ended up writing my own (in Java).
-- daniel
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Daniel,
When will the fixed data be generated and published?
On 10/2/13 1:09 PM, Daniel Kinzler wrote:
Am 02.10.2013 17:00, schrieb Kingsley Idehen:
Daniel,
When will the fixed data be generated and published?
October 14, if all goes well.
-- daniel
Daniel,
Note, cross references in DBpedia won't change following your fix. What will emerge, as a result of these fixes, is the ability to demonstrate the power of owl:sameAs inference [1], at Web-scale, using Linked Data from the DBpedia and Wikidata data spaces on the Web :-)
[1] http://bit.ly/19pgtiP -- post about DBpedia and Wikidata cross references that's missing a typical live demonstration link re. full implications of owl:sameAs relations based reasoning and inference.
On 10/2/13 1:09 PM, Daniel Kinzler wrote:
Am 02.10.2013 17:00, schrieb Kingsley Idehen:
Daniel,
When will the fixed data be generated and published?
October 14, if all goes well.
-- daniel
Status?
Am 16.10.2013 15:11, schrieb Kingsley Idehen:
On 10/2/13 1:09 PM, Daniel Kinzler wrote:
Am 02.10.2013 17:00, schrieb Kingsley Idehen:
Daniel,
When will the fixed data be generated and published?
October 14, if all goes well.
-- daniel
Status?
Went live yesterday, seems to work now :)
data:Q64 a schema:Dataset ;
    schema:about entity:Q64 ;
    cc:license <http://creativecommons.org/publicdomain/zero/1.0/> ;
    schema:version 78607403 ;
    schema:dateModified "2013-10-16T11:30:43Z"^^xsd:dateTime .
-- daniel
On 10/16/13 4:43 PM, Daniel Kinzler wrote:
Am 16.10.2013 15:11, schrieb Kingsley Idehen:
On 10/2/13 1:09 PM, Daniel Kinzler wrote:
Am 02.10.2013 17:00, schrieb Kingsley Idehen:
Daniel,
When will the fixed data be generated and published?
October 14, if all goes well.
-- daniel
Status?
Went live yesterday, seems to work now :)
data:Q64 a schema:Dataset ;
    schema:about entity:Q64 ;
    cc:license <http://creativecommons.org/publicdomain/zero/1.0/> ;
    schema:version 78607403 ;
    schema:dateModified "2013-10-16T11:30:43Z"^^xsd:dateTime .
-- daniel
I've run it through our variant of Vapour re. Linked Data verification: http://bit.ly/1gM7oYa .
Nearly there. Your use of 302 is what's going to trip up existing Linked Data clients. Why aren't you using 303 for redirection?
Kingsley
I've run it through our variant of Vapour re. Linked Data verification: http://bit.ly/1gM7oYa .
Nearly there. Your use of 302 is what's going to trip up existing Linked Data clients. Why aren't you using 303 for redirection?
There actually is a 303 *and* a 302, and in the wrong order. The ops guys had some issues coming up with the correct rewrite rules, and then it got stuck, I'm afraid.
-- daniel
On 10/17/13 12:46 PM, Daniel Kinzler wrote:
I've run it through our variant of Vapour re. Linked Data verification: http://bit.ly/1gM7oYa .
Nearly there. Your use of 302 is what's going to trip up existing Linked Data clients. Why aren't you using 303 for redirection?
There actually is a 303 *and* a 302, and in the wrong order. The ops guys had some issues coming up with the correct rewrite rules, and then it got stuck, I'm afraid.
-- daniel
Can't this get fixed? A 302 isn't the same thing as a 303.
Am 17.10.2013 20:16, schrieb Kingsley Idehen:
On 10/17/13 12:46 PM, Daniel Kinzler wrote:
I've run it through our variant of Vapour re. Linked Data verification: http://bit.ly/1gM7oYa .
Nearly there. Your use of 302 is what's going to trip up existing Linked Data clients. Why aren't you using 303 for redirection?
There actually is a 303 *and* a 302, and in the wrong order. The ops guys had some issues coming up with the correct rewrite rules, and then it got stuck, I'm afraid.
-- daniel
Can't this get fixed? A 302 isn't the same thing as a 303.
Sure it could. It would probably take the right person 15 minutes. I have been trying to find the right person for half a year now. The truth is, this is simply not very high on anyone's priority list (including mine). It's an issue, it bothers me, I'd love to see it fixed, but there are a lot of things with a lot more impact for a lot more people on my list.
-- daniel
On 10/21/13 4:57 AM, Daniel Kinzler wrote:
Am 17.10.2013 20:16, schrieb Kingsley Idehen:
On 10/17/13 12:46 PM, Daniel Kinzler wrote:
I've run it through our variant of Vapour re. Linked Data verification: http://bit.ly/1gM7oYa .
Nearly there. Your use of 302 is what's going to trip up existing Linked Data clients. Why aren't you using 303 for redirection?
There actually is a 303 *and* a 302, and in the wrong order. The ops guys had some issues coming up with the correct rewrite rules, and then it got stuck, I'm afraid.
-- daniel
Can't this get fixed? A 302 isn't the same thing as a 303.
Sure it could. It would probably take the right person 15 minutes. I have been trying to find the right person for half a year now. The truth is, this is simply not very high on anyone's priority list (including mine). It's an issue, it bothers me, I'd love to see it fixed, but there are a lot of things with a lot more impact for a lot more people on my list.
-- daniel
Daniel,
Being interoperable with the Linked Open Data cloud via DBpedia is a low-cost, high-impact affair for Wikidata. I don't know of anything of higher impact in the grand scheme of things, bearing in mind we all know that NIH (Not Invented Here) is ultimately a shortcut to time wasted.
Let's avoid generating undue costs. This is a highly beneficial fix that simply requires a single individual to make a trivial tweak.
Kingsley
On Mon, Oct 21, 2013 at 2:49 PM, Kingsley Idehen kidehen@openlinksw.com wrote:
Daniel,
Being interoperable with the Linked Open Data cloud via DBpedia is a low-cost, high-impact affair for Wikidata. I don't know of anything of higher impact in the grand scheme of things, bearing in mind we all know that NIH (Not Invented Here) is ultimately a shortcut to time wasted.
Let's avoid generating undue costs. This is a highly beneficial fix that simply requires a single individual to make a trivial tweak.
There are things that are more high-impact than this right now, for example getting the numbers datatype and queries done. That being said, if there is someone who knows their way around Apache rewrite rules and would be willing to look into this, please send them my way.
Cheers Lydia
On 10/21/13 8:58 AM, Lydia Pintscher wrote:
On Mon, Oct 21, 2013 at 2:49 PM, Kingsley Idehen kidehen@openlinksw.com wrote:
Daniel,
Being interoperable with the Linked Open Data cloud via DBpedia is a low-cost, high-impact affair for Wikidata. I don't know of anything of higher impact in the grand scheme of things, bearing in mind we all know that NIH (Not Invented Here) is ultimately a shortcut to time wasted.
Let's avoid generating undue costs. This is a highly beneficial fix that simply requires a single individual to make a trivial tweak.
There are things that are more high-impact than this right now, for example getting the numbers datatype and queries done. That being said, if there is someone who knows their way around Apache rewrite rules and would be willing to look into this, please send them my way.
Cheers Lydia
Lydia,
If you can share the existing rewrite rules file via a URL or a GitHub project, I'll get someone (should I not get round to it) to fix them accordingly. Ball back in your court, so to speak :-)
On Mon, Oct 21, 2013 at 3:26 PM, Kingsley Idehen kidehen@openlinksw.com wrote:
If you can share the existing rewrite rules file via a URL or a GitHub project, I'll get someone (should I not get round to it) to fix them accordingly. Ball back in your court, so to speak :-)
Hehe. Here you go: https://gerrit.wikimedia.org/r/#/admin/projects/operations/apache-config Thanks for looking into it!
Cheers Lydia
On 10/21/13 9:59 AM, Lydia Pintscher wrote:
On Mon, Oct 21, 2013 at 3:26 PM, Kingsley Idehen kidehen@openlinksw.com wrote:
If you can share the existing rewrite rules file via a URL or a GitHub project, I'll get someone (should I not get round to it) to fix them accordingly. Ball back in your court, so to speak :-)
Hehe. Here you go: https://gerrit.wikimedia.org/r/#/admin/projects/operations/apache-config Thanks for looking into it!
Cheers Lydia
Can someone not change 302 to 303 re: RewriteRule ^/entity/(.*)$ https://www.wikidata.org/wiki/Special:EntityData/$1 [R=302,QSA] ?
See: http://pastebin.com/dbmGmCt8 .
From 'temporary redirect' to 'see other'?
Addshore
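[A minimal sketch of the change being suggested here, for reference; this keeps the quoted rule exactly as-is apart from the redirect status:]

```apache
# Sketch only: the same rule with the redirect status switched from
# 302 (Found) to 303 (See Other). QSA preserves the query string.
RewriteRule ^/entity/(.*)$ https://www.wikidata.org/wiki/Special:EntityData/$1 [R=303,QSA]
```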
On 21 October 2013 16:48, Kingsley Idehen kidehen@openlinksw.com wrote:
On 10/21/13 9:59 AM, Lydia Pintscher wrote:
On Mon, Oct 21, 2013 at 3:26 PM, Kingsley Idehen kidehen@openlinksw.com wrote:
If you can share the existing rewrite rules file via a URL or a GitHub project, I'll get someone (should I not get round to it) to fix them accordingly. Ball back in your court, so to speak :-)
Hehe. Here you go: https://gerrit.wikimedia.org/r/#/admin/projects/operations/apache-config Thanks for looking into it!
Cheers Lydia
Can someone not change 302 to 303 re: RewriteRule ^/entity/(.*)$ https://www.wikidata.org/wiki/Special:EntityData/$1 [R=302,QSA] ?
See: http://pastebin.com/dbmGmCt8 .
--
Regards,
Kingsley Idehen Founder & CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca handle: @kidehen Google+ Profile: https://plus.google.com/112399767740508618350/about LinkedIn Profile: http://www.linkedin.com/in/kidehen
On 10/21/13 11:02 AM, addshorewiki wrote:
From 'temporary redirect' to 'see other'?
Yes.
This is all about redirection to the URL (Address) of a description document that describes a URIs referent [1].
[1] http://bit.ly/11xnQ36 -- illustrates the use of hashless HTTP URIs as a denotation mechanism for entities that aren't Web documents.
Kingsley
Addshore
Am 21.10.2013 16:48, schrieb Kingsley Idehen:
Can someone not change 302 to 303 re: RewriteRule ^/entity/(.*)$ https://www.wikidata.org/wiki/Special:EntityData/$1 [R=302,QSA] ?
The thing is that we intended this to be an internal Apache rewrite, not an HTTP redirect at all, because Special:EntityData itself implements the content negotiation that triggers a 303 when appropriate.
So, currently we get a 302 from /entity/Q$1 to /wiki/Special:EntityData/$1 (the generic document URI), which then applies content negotiation and sends a 303 pointing to e.g. /wiki/Special:EntityData/$1.ttl (the URL of a specific serialization, e.g. in turtle).
What I want is to remove the initial 302 completely using an internal rewrite, not replace it with another 303 - since I don't think that's semantically correct. This did not work when we tried it, for reasons unknown to me. Someone suggested that the wrong options were set for the rewrite rule; who knows.
Kingsley, do you think having two 303s (from /entity/Q$1 to /wiki/Special:EntityData/$1 and another one to wiki/Special:EntityData/$1.xxx) would be appropriate or at least better than what we have now?
-- daniel
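[For reference, the internal-rewrite variant described above could look roughly like this, assuming the /wiki path is served by the same Apache instance; untested sketch:]

```apache
# Sketch only: without the R flag, mod_rewrite performs an internal
# rewrite instead of sending the client a 3xx response. Special:EntityData
# then does the content negotiation itself and emits the single 303.
# PT passes the rewritten URL on for further URL-to-handler mapping.
RewriteRule ^/entity/(.*)$ /wiki/Special:EntityData/$1 [PT,QSA,L]
```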
On 10/21/13 4:52 PM, Daniel Kinzler wrote:
Am 21.10.2013 16:48, schrieb Kingsley Idehen:
Can someone not change 302 to 303 re: RewriteRule ^/entity/(.*)$ https://www.wikidata.org/wiki/Special:EntityData/$1 [R=302,QSA] ?
The thing is that we intended this to be an internal Apache rewrite, not an HTTP redirect at all, because Special:EntityData itself implements the content negotiation that triggers a 303 when appropriate.
So, currently we get a 302 from /entity/Q$1 to /wiki/Special:EntityData/$1 (the generic document URI), which then applies content negotiation and sends a 303 pointing to e.g. /wiki/Special:EntityData/$1.ttl (the URL of a specific serialization, e.g. in turtle).
What I want is to remove the initial 302 completely using an internal rewrite, not replace it with another 303 - since I don't think that's semantically correct. This did not work when we tried it, for reasons unknown to me. Someone suggested that the wrong options were set for the rewrite rule; who knows.
Kingsley, do you think having two 303s (from /entity/Q$1 to /wiki/Special:EntityData/$1 and another one to wiki/Special:EntityData/$1.xxx) would be appropriate or at least better than what we have now?
Yes.
303 is what you want. Also note that 303s can now be cached per newer HTTP spec guidelines. Thus, this is your simplest fix.
Now, if you are concerned about the 303 traffic, you can use a different pattern (again based on newer HTTP spec guidelines) where your server returns a 200 OK on the entity URI and then returns the actual description document URL in the "Location:" response header, as exemplified below:
curl -I http://dbpedia.org/resource/Linked_Data

HTTP/1.1 303 See Other
Date: Mon, 21 Oct 2013 21:34:42 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Connection: keep-alive
Server: Virtuoso/07.00.3204 (Linux) i686-generic-linux-glibc212-64 VDB
Accept-Ranges: bytes
Location: http://dbpedia.org/page/Linked_Data

and

curl -I http://dbpedia.org/resource/Linked_Data

HTTP/1.1 200 OK
Date: Mon, 21 Oct 2013 21:34:42 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Connection: keep-alive
Server: Virtuoso/07.00.3204 (Linux) i686-generic-linux-glibc212-64 VDB
Accept-Ranges: bytes
Location: http://dbpedia.org/page/Linked_Data
Meaning: the client requested the resource denoted by http://dbpedia.org/resource/Linked_Data, and the server confirms that there is a document associated with said URI (albeit not in a direct denotation oriented relationship) via 200 OK, and then returns the actual document location via the "Location:" header.
The only problem with this last solution is that it's new, and I doubt most existing Linked Data clients support the pattern.
Kingsley
-- daniel
Am 21.10.2013 23:43, schrieb Kingsley Idehen:
Kingsley, do you think having two 303s (from /entity/Q$1 to /wiki/Special:EntityData/$1 and another one to wiki/Special:EntityData/$1.xxx) would be appropriate or at least better than what we have now?
Yes.
303 is what you want. Also note that 303s can now be cached per newer HTTP spec guidelines. Thus, this is your simplest fix.
Ticket filed: https://bugzilla.wikimedia.org/show_bug.cgi?id=56307
-- daniel
On 21/10/2013 21:52, "Daniel Kinzler" daniel.kinzler@wikimedia.de wrote:
Am 21.10.2013 16:48, schrieb Kingsley Idehen:
Can someone not change 302 to 303 re: RewriteRule ^/entity/(.*)$ https://www.wikidata.org/wiki/Special:EntityData/$1 [R=302,QSA] ?
The thing is that we intended this to be an internal Apache rewrite, not an HTTP redirect at all, because Special:EntityData itself implements the content negotiation that triggers a 303 when appropriate.
So, currently we get a 302 from /entity/Q$1 to /wiki/Special:EntityData/$1 (the generic document URI), which then applies content negotiation and sends a 303 pointing to e.g. /wiki/Special:EntityData/$1.ttl (the URL of a specific serialization, e.g. in turtle).
Wondering why the 2nd step (conneg) returns a 303. Shouldn't it just be a 200 with a Content-Location?
michael
What I want is to remove the initial 302 completely using an internal rewrite, not replace it with another 303 - since I don't think that's semantically correct. This did not work when we tried it, for reasons unknown to me. Someone suggested that the wrong options were set for the rewrite rule; who knows.
Kingsley, do you think having two 303s (from /entity/Q$1 to /wiki/Special:EntityData/$1 and another one to wiki/Special:EntityData/$1.xxx) would be appropriate or at least better than what we have now?
-- daniel
On 02/10/2013 15:42, "Daniel Kinzler" daniel.kinzler@wikimedia.de wrote:
Am 01.10.2013 20:14, schrieb Tom Morris:
How about not creating a fork just so you can delete a couple of directories? The full download is a whopping 260KB. Is that really too big/complex to include in its entirety and just ignore the parts you don't use?
Not deploying code we do not use, especially if it is complex, is a requirement from the ops team. And deploying a standalone HTTP endpoint (as contained in the EasyRDF distribution) on the boxes that serve Wikipedia is an absolute no-go. We had the choice of either deleting the unneeded parts or writing our own.
This is actually not the first time I'm having this problem with an RDF library when all I want to do is export. Last time, I ended up writing my own (in Java).
-- daniel
Hello,
Making EasyRdf more modular, making it possible to only use the parts that you need, is definitely on my roadmap. I think RDF.rb does this very well, although it is difficult to balance against keeping it easy to install and use.
In the meantime, would it help to have a script that automatically creates a cut down version of EasyRdf from the git repo?
nick.
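[One possible shape for such a cut-down script, as a rough sketch; the directory and file names below are illustrative, not EasyRdf's actual layout:]

```shell
#!/bin/sh
# Rough sketch of a cut-down build: copy only the serialisation-related
# classes out of a full checkout, leaving parsers, tests and the HTTP
# endpoint behind. Paths are hypothetical, not EasyRdf's real layout.
set -e
SRC=${1:-easyrdf}          # path to the full checkout
DST=${2:-easyrdf_lite}     # where the reduced copy is assembled
mkdir -p "$DST/lib/EasyRdf"
# Keep the core graph and serialiser code only; names are illustrative.
for part in Graph.php Resource.php Literal.php Serialiser.php Serialiser; do
    if [ -e "$SRC/lib/EasyRdf/$part" ]; then
        cp -r "$SRC/lib/EasyRdf/$part" "$DST/lib/EasyRdf/"
    fi
done
```

A script like this could live in the EasyRdf repo itself, so downstream users regenerate the reduced tree from any tag instead of maintaining a fork.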