Forbidden when retrieving metadata


Nathan R. Yergler

7 Jul 2004

4:01 a.m.

Hello,

I'm a software engineer for the Creative Commons and am working on our on-line license/RDF validator. We're currently working on adding <link> support for RDF retrieval, due in large part to Wikipedia's decision to use CC metadata to describe the FDL license. We have initial support working, but when we test with Wikipedia, we get a 403: Forbidden.

Our system parses the page and attempts to retrieve the metadata file separately; specifically, http://en.wikipedia.org/w/wiki.phtml?title=Main_Page&action=creativecomm... for the Main_Page.

Does Wikipedia have non-browser requests for the metadata blocked? Is there something we need to do in order to retrieve the metadata? Thanks for any help you can provide.

Nathan R. Yergler


Brion Vibber

7 Jul 2004

4:30 a.m.

Nathan R. Yergler wrote:

...

I'm a software engineer for the Creative Commons and am working on our on-line license/RDF validator. We're currently working on adding <link> support for RDF retrieval, due in large part to Wikipedia's decision to use CC metadata to describe the FDL license.

Neat!

...

We have initial support working, but when we test with Wikipedia, we get a 403: Forbidden.

We do block some specific user-agent strings due to past robot abuse; be sure you're using a user-agent string that identifies your software rather than a generic one.

-- brion vibber (brion @ pobox.com)
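For readers hitting the same 403, here is a minimal sketch of sending an identifying User-Agent from Python's standard library. The agent string, the contact address, and the function name `build_metadata_request` are hypothetical placeholders, not anything used by the Creative Commons validator; the URL is assembled from the host and the href quoted elsewhere in this thread.

```python
import urllib.request

# Hypothetical identifying string: substitute your tool's real name,
# version, and a contact address so site operators can reach you.
USER_AGENT = "ExampleLicenseValidator/0.1 (+mailto:admin@example.org)"


def build_metadata_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_metadata_request(
    "http://en.wikipedia.org/w/wiki.phtml?title=Main_Page&action=creativecommons"
)

# urllib stores header names with only the first letter capitalized.
print(request.get_header("User-agent"))
```

The request object can then be passed to `urllib.request.urlopen(request)`. The sketch stops short of that call because the 2004-era URL may no longer be served.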

Nathan R. Yergler

6:01 p.m.

Thanks for the help; yes, adding a User-Agent seemed to do the trick. I have, however, come up with another, somewhat related question. Looking at the source of the Main Page, I see the metadata referenced in the following line:

<link title="Creative Commons" type="application/rdf+xml" href="/w/wiki.phtml?title=Main_Page&amp;action=creativecommons" rel="meta" />

However, when I attempt to retrieve the URL specified by href, I just get the main page again. If I replace "&amp;" with a simple & (before action=...), I get the RDF. Is this by design for some reason, or a bug? Thanks.

Nathan R. Yergler


Wikitech-l mailing list Wikitech-l@Wikipedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l

Timwi

6:41 p.m.

Nathan R. Yergler wrote:

...

<link title="Creative Commons" type="application/rdf+xml" href="/w/wiki.phtml?title=Main_Page&amp;action=creativecommons" rel="meta" />

However, when I attempt to retrieve the URL specified by href, I just get the main page again. However, if I replace "&amp;" with a simple & (before action=...), I get the RDF. Is this by design for some reason, or a bug? Thanks.

That is how HTML works. Ampersands are escaped as &amp;.

Timwi

Nathan R. Yergler

6:48 p.m.

Timwi wrote:

...

That is how HTML works. Ampersands are escaped as &amp;.

I knew that was the case in the body of a document, but wasn't aware it applied to head elements as well. Thanks.

NRY

Timwi

6:52 p.m.

Nathan R. Yergler wrote:

...

Timwi wrote:

...
That is how HTML works. Ampersands are escaped as &amp;.

I knew that was the case in the body of a document, but wasn't aware it applied to head elements as well. Thanks.

In general, it applies to both "normal" text and to text inside the attributes of tags. Otherwise there would be no way to say something like <img src="..." alt="The town of &Aring;, Norway" />, for example...
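To illustrate the point about escaped ampersands in attribute values, the following sketch extracts the metadata URL from the <link> element quoted in this thread. It uses Python's standard `html.parser`, which decodes character references in attribute values before handing them to the handler. The class name `MetaLinkFinder`, the function `find_metadata_urls`, and the base page URL are assumptions made for the example, not part of any tool discussed here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaLinkFinder(HTMLParser):
    """Collect the href values of <link rel="meta"> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and attributes.get("rel") == "meta":
            # The parser has already turned &amp; into & at this point.
            self.hrefs.append(attributes.get("href"))


def find_metadata_urls(page_url, html_text):
    """Return absolute metadata URLs referenced by the given HTML."""
    finder = MetaLinkFinder()
    finder.feed(html_text)
    finder.close()
    return [urljoin(page_url, href) for href in finder.hrefs if href]


# The <link> line from the thread, with the ampersand escaped as in the
# page source.
SAMPLE = (
    '<link title="Creative Commons" type="application/rdf+xml" '
    'href="/w/wiki.phtml?title=Main_Page&amp;action=creativecommons" '
    'rel="meta" />'
)

# Assumed base URL of the page the <link> was found on.
urls = find_metadata_urls("http://en.wikipedia.org/wiki/Main_Page", SAMPLE)
print(urls)
```

Pulling the href out with a plain string search or regular expression would leave the `&amp;` in place, which matches the symptom described above: the server does not see an `action` parameter and returns the ordinary page.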

Andre Engels

4:32 a.m.

On Tue, 06 Jul 2004 17:01:51 -0500 "Nathan R. Yergler" <nathan@yergler.net> wrote:

...

Does Wikipedia have non-browser requests for the metadata blocked? Is there something we need to do in order to retrieve the metadata? Thanks for any help you can provide.

If I remember correctly, Wikipedia rejects requests that do not specify a User Agent; this might be the cause of your problems.

Andre Engels
