Hello,
I'm a software engineer for the Creative Commons and am working on our on-line license/RDF validator. We're currently working on adding <link> support for RDF retrieval, due in large part to Wikipedia's decision to use CC metadata to describe the FDL license. We have initial support working, but when we test with Wikipedia, we get a 403: Forbidden.
Our system parses the page and attempts to retrieve the metadata file separately; specifically, http://en.wikipedia.org/w/wiki.phtml?title=Main_Page&action=creativecomm... for the Main_Page.
Does Wikipedia have non-browser requests for the metadata blocked? Is there something we need to do in order to retrieve the metadata? Thanks for any help you can provide.
Nathan R. Yergler
Nathan R. Yergler wrote:
I'm a software engineer for the Creative Commons and am working on our on-line license/RDF validator. We're currently working on adding <link> support for RDF retrieval, due in large part to Wikipedia's decision to use CC metadata to describe the FDL license.
Neat!
We have initial support working, but when we test with Wikipedia, we get a 403: Forbidden.
We do block some specific user-agent strings due to past robot abuse; be sure you're using a user-agent string that identifies your software rather than a generic one.
-- brion vibber (brion @ pobox.com)
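For anyone hitting the same 403, the advice above can be sketched in Python as follows. This is only an illustration, not the validator's actual code: the `USER_AGENT` value and the `build_request` and `fetch` helpers are hypothetical names, and the example assumes the standard-library `urllib.request` module is used for retrieval.

```python
import urllib.request

# Hypothetical identifier: substitute the real name and contact URL of your tool.
USER_AGENT = "ExampleLicenseValidator/0.1 (+http://example.org/validator)"


def build_request(url: str) -> urllib.request.Request:
    """Return a request that carries a descriptive User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Fetch the URL and return the raw response body."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Calling `fetch()` on the metadata URL then sends the identifying header with the request; which user-agent strings the server actually blocks is not stated in this thread.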
Thanks for the help; yes, adding a User-Agent seemed to do the trick. I have, however, come up with another somewhat related question. Looking at the source to the Main Page, I see the metadata referenced in the following line:
<link title="Creative Commons" type="application/rdf+xml" href="/w/wiki.phtml?title=Main_Page&amp;action=creativecommons" rel="meta" />
However, when I attempt to retrieve the URL specified by href, I just get the main page again. If I replace "&amp;" with a simple & (before action=...), I get the RDF. Is this by design for some reason, or a bug? Thanks.
Nathan R. Yergler
Wikitech-l mailing list Wikitech-l@Wikipedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l
Nathan R. Yergler wrote:
<link title="Creative Commons" type="application/rdf+xml" href="/w/wiki.phtml?title=Main_Page&amp;action=creativecommons" rel="meta" />
However, when I attempt to retrieve the URL specified by href, I just get the main page again. If I replace "&amp;" with a simple & (before action=...), I get the RDF. Is this by design for some reason, or a bug? Thanks.
That is how HTML works. Ampersands are escaped as &amp;.
Timwi
Nathan R. Yergler wrote:
Timwi wrote:
That is how HTML works. Ampersands are escaped as &amp;.
I knew that was the case in the body of a document, but wasn't aware it applied to head elements as well. Thanks.
In general, it applies to both "normal" text and to text inside the attributes of tags. Otherwise there would be no way to say something like <img src="..." alt="The town of Å, Norway" />, for example...
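To illustrate the point about attributes, here is a minimal Python sketch that lets an HTML parser do the unescaping instead of reading the href as raw text. It assumes the standard-library `html.parser` module; the `MetaLinkParser` class and `find_rdf_links` function are hypothetical names, not part of any existing validator.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaLinkParser(HTMLParser):
    """Collect RDF metadata URLs from <link rel="meta"> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # HTMLParser has already unescaped entities such as &amp; here.
        attributes = dict(attrs)
        if (attributes.get("rel") == "meta"
                and attributes.get("type") == "application/rdf+xml"
                and attributes.get("href")):
            # Resolve the relative href against the page's own URL.
            self.rdf_urls.append(urljoin(self.base_url, attributes["href"]))


def find_rdf_links(html_text, base_url):
    """Return absolute URLs of RDF metadata linked from the page."""
    parser = MetaLinkParser(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.rdf_urls
```

Given the <link> line quoted earlier in the thread, the returned URL contains a plain & before action=..., which is the form that returned the RDF.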
On Tue, 06 Jul 2004 17:01:51 -0500 "Nathan R. Yergler" nathan@yergler.net wrote:
Does Wikipedia have non-browser requests for the metadata blocked? Is there something we need to do in order to retrieve the metadata? Thanks for any help you can provide.
If I remember correctly, Wikipedia rejects requests that do not specify a User Agent; this might be the cause of your problems.
Andre Engels
wikitech-l@lists.wikimedia.org