How can I check if a page exists from a program?
I've tried doing a HEAD request to the page URL with &action=raw, but that returns 200 regardless of whether the page exists (bug or feature?). The only way I can find at the moment is that the Date and Last-Modified response headers are the same if the article doesn't exist yet, but this seems a bit too hackish.
Any suggestions?
Jim
Jim Higson wrote:
How can I check if a page exists from a program?
I've tried doing a HEAD request to the page URL with &action=raw, but that returns 200 regardless of whether the page exists (bug or feature?). The only way I can find at the moment is that the Date and Last-Modified response headers are the same if the article doesn't exist yet, but this seems a bit too hackish.
Oops, actually the Last-Modified header is always one hour (to the second) before Date if the article doesn't exist.
So testing for it just got hackier :)
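In case it's useful, here's roughly what the check looks like at the moment (a Python sketch of the heuristic only; the base URL is a placeholder for whatever wiki you're testing, and the one-hour gap is just what I'm observing here):

# Sketch of the Date/Last-Modified heuristic described above. Placeholder
# base URL; the one-hour gap for missing pages is only what this wiki shows.
import urllib.parse
import urllib.request
from email.utils import parsedate_to_datetime

def page_seems_to_exist(title, base="http://localhost/wiki/index.php"):
    url = "%s?title=%s&action=raw" % (base, urllib.parse.quote(title))
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        date = resp.headers.get("Date")
        last_mod = resp.headers.get("Last-Modified")
    if date is None or last_mod is None:
        return None  # can't tell anything from this heuristic
    gap = parsedate_to_datetime(date) - parsedate_to_datetime(last_mod)
    # Missing pages show Last-Modified exactly one hour before Date here.
    return gap.total_seconds() != 3600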
The 200 error is there to stop a certain vandalbot; it will not be shown if you send the cookie of a logged-in user (that exception exists because pywikipediabot was getting the 200 error as well).
Andre Engels
On Fri, 25 Feb 2005 17:27:31 +0000, Jim Higson jh@333.org wrote:
How can I check if a page exists from a program?
I've tried doing a HEAD request to the page URL with &action=raw, but that returns 200 regardless of whether the page exists (bug or feature?). The only way I can find at the moment is that the Date and Last-Modified response headers are the same if the article doesn't exist yet, but this seems a bit too hackish.
Any suggestions?
Jim Higson wrote:
How can I check if a page exists from a program?
You can hit Special:Export/Pagename; if the response contains no <page> elements, the page does not exist.
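A minimal sketch of that check (Python; the base URL is a placeholder for your wiki):

import urllib.parse
import urllib.request

# Fetch the export XML for the page and look for a <page> element.
# A real client would parse the XML rather than substring-match.
def page_exists(title, base="http://localhost/wiki/index.php"):
    url = "%s/Special:Export/%s" % (base, urllib.parse.quote(title))
    with urllib.request.urlopen(url) as resp:
        return "<page>" in resp.read().decode("utf-8")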
I've tried doing a HEAD request to the page URL with &action=raw, but that returns 200 regardless of whether the page exists (bug or feature?).
You get a 200 because the wiki script exists, and hasn't yet been special-cased to return a 404 response for pages which don't exist.
The only way I can find at the moment is that the Date and Last-Modified response headers are the same if the article doesn't exist yet, but this seems a bit too hackish.
[...]
Oops, actually the Last-Modified header is always one hour (to the second) before Date if the article doesn't exist.
I can't reproduce this; for pages that don't exist there is no Last-Modified header at all. Can you show a sample request and returned headers?
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
Jim Higson wrote:
How can I check if a page exists from a program?
You can hit Special:Export/Pagename; if the response contains no <page> elements, the page does not exist.
I've tried doing a HEAD request to the page URL with &action=raw, but that returns 200 regardless of whether the page exists (bug or feature?).
You get a 200 because the wiki script exists, and hasn't yet been special-cased to return a 404 response for pages which don't exist.
The only way I can find at the moment is that the Date and Last-Modified response headers are the same if the article doesn't exist yet, but this seems a bit too hackish.
[...]
Oops, actually the Last-Modified header is always one hour (to the second) before Date if the article doesn't exist.
I can't reproduce this; for pages that don't exist there is no Last-Modified header at all. Can you show a sample request and returned headers?
My 'application' uses XMLHTTP to make the request from a web page. Since this can only be made from a page on the same domain I put up a quick example:
Visit this in a recent version of Mozilla or similar and it should spit the headers to the page.
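If anyone wants the same header dump without a browser, something like this should do it (a Python sketch; point it at whichever page you're testing):

import urllib.request

# HEAD the page and print the status line plus every response header,
# so Date and Last-Modified can be compared by eye. Example URL only.
url = "http://localhost/wiki/index.php?title=Some_missing_page&action=raw"
req = urllib.request.Request(url, method="HEAD")
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.reason)
    for name, value in resp.headers.items():
        print("%s: %s" % (name, value))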
Jim
Brion Vibber wrote:
Jim Higson wrote:
Oops, actually the Last-Modified header is always one hour (to the second) before Date if the article doesn't exist.
I can't reproduce this; for pages that don't exist there is no Last-Modified header at all. Can you show a sample request and returned headers?
Seems to be a weird hack that was placed into the 1.4 release branch and not the development branch.
I'm not entirely sure why it's there, but it's clearly wrong.
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
Brion Vibber wrote:
Jim Higson wrote:
Oops, actually the Last-Modified header is always one hour (to the second) before Date if the article doesn't exist.
I can't reproduce this; for pages that don't exist there is no Last-Modified header at all. Can you show a sample request and returned headers?
Seems to be a weird hack that was placed into the 1.4 release branch and not the development branch.
I'm not entirely sure why it's there, but it's clearly wrong.
I'm running 1.4b6
Not getting 404s is quite a problem for my project - as I've mentioned before, I'm trying to write a client-side reimplementation of the MediaWiki presentation layer, as an experiment in serving from very low-spec web servers. I'm giving a presentation on Monday and will probably post a link to my work shortly after.
As it stands, my software marks no links with class=new because all HEAD requests are returning 200.
Jim
Not getting 404s is quite a problem for my project - as I've mentioned before, I'm trying to write a client-side reimplementation of the MediaWiki presentation layer, as an experiment in serving from very low-spec web servers. I'm giving a presentation on Monday and will probably post a link to my work shortly after.
Sounds a lot like an idea I had and was planning to implement at some point. So is it pure JS with XMLHttpRequest that you're using for this presentation layer? I have a somewhat working MediaWiki-to-HTML converter written in JS which I'm using for client-side previews in edit pages. Have you implemented something like this as well? If you think it could be useful for your project, please let me know. I'm very intrigued by your work.
Pedro
Pedro Fayolle wrote:
Not getting 404s is quite a problem for my project - as I've mentioned before, I'm trying to write a client-side reimplementation of the MediaWiki presentation layer, as an experiment in serving from very low-spec web servers. I'm giving a presentation on Monday and will probably post a link to my work shortly after.
Sounds a lot like an idea I had and was planning to implement at some point. So is it pure JS with XMLHttpRequest that you're using for this presentation layer? I have a somewhat working MediaWiki-to-HTML converter written in JS which I'm using for client-side previews in edit pages. Have you implemented something like this as well? If you think it could be useful for your project, please let me know. I'm very intrigued by your work.
I think it is better demonstrated than explained, so I'll do my best to get a work in progress up for a few hours later today.
Basically, it's a wikitext-to-XML recursive descent (almost proper) parser and an XML-to-XHTML converter. From the XML I'm generating a DOM identical to the usual MediaWiki one and using the existing stylesheets, so it mostly looks the same as the PHP interface.
It doesn't just use XMLHTTP: each page has its own URL, so the address bar changes and everything is bookmarkable. But the browser only receives a stub, and builds the page itself. The page is built bit by bit so the user can start reading the first part while the rest is being built.
Editing has real-time previews, although I'm still ironing out a few bugs there. Previews are done without any HTTP requests etc.
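As a flavour of the idea only, here's a toy converter for a tiny subset (headings, bold/italic, internal links). It's a flat regex pass in Python, so nothing like the real recursive descent parser or the XML stage, and the /wiki/ href scheme is just an example:

import re

# Toy wikitext-to-XHTML conversion for a tiny subset only: headings,
# '''bold''', ''italic'' and [[internal links]]. Illustrative, not the
# parser described above, and the /wiki/ link target is hypothetical.
def render_inline(text):
    text = re.sub(r"\[\[([^\]|]+)\|([^\]]+)\]\]", r'<a href="/wiki/\1">\2</a>', text)
    text = re.sub(r"\[\[([^\]]+)\]\]", r'<a href="/wiki/\1">\1</a>', text)
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)   # bold before italic
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)
    return text

def render(wikitext):
    out = []
    for line in wikitext.splitlines():
        m = re.match(r"^(={1,6})\s*(.+?)\s*\1\s*$", line)
        if m:
            level = len(m.group(1))
            out.append("<h%d>%s</h%d>" % (level, render_inline(m.group(2)), level))
        elif line.strip():
            out.append("<p>%s</p>" % render_inline(line))
    return "\n".join(out)

print(render("== Heading ==\nSome '''bold''' text and a [[Main Page|link]]."))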
I think it is better demonstrated than explained, so I'll do my best to get a work in progress up for a few hours later today.
Basically, it's a wikitext-to-XML recursive descent (almost proper) parser and an XML-to-XHTML converter. From the XML I'm generating a DOM identical to the usual MediaWiki one and using the existing stylesheets, so it mostly looks the same as the PHP interface.
It doesn't just use XMLHTTP: each page has its own URL, so the address bar changes and everything is bookmarkable. But the browser only receives a stub, and builds the page itself. The page is built bit by bit so the user can start reading the first part while the rest is being built.
Editing has real-time previews, although I'm still ironing out a few bugs there. Previews are done without any HTTP requests etc.
So it's just like what I was planning to do, only done right :o)
I can't wait to see it working. BTW, can it be "plugged" into any running MediaWiki, like, say, Wikipedia? Or does it need its own MediaWiki setup?
Wish you the best of luck on this, sounds truly amazing.
Pedro
Pedro Fayolle wrote:
I think it is better demonstrated than explained, so I'll do my best to get a work in progress up for a few hours later today.
Basically, it's a wikitext-to-XML recursive descent (almost proper) parser and an XML-to-XHTML converter. From the XML I'm generating a DOM identical to the usual MediaWiki one and using the existing stylesheets, so it mostly looks the same as the PHP interface.
It doesn't just use XMLHTTP: each page has its own URL, so the address bar changes and everything is bookmarkable. But the browser only receives a stub, and builds the page itself. The page is built bit by bit so the user can start reading the first part while the rest is being built.
Editing has real-time previews, although I'm still ironing out a few bugs there. Previews are done without any HTTP requests etc.
So it's just like what I was planning to do, only done right :o)
I can't wait to see it working. BTW, can it be "plugged" into any running MediaWiki, like, say, Wikipedia? Or does it need its own MediaWiki setup?
Sort of. It runs on top of a MediaWiki, but because of the XMLHTTP security model it has to be on the same domain as the wiki, so I can't just put it up on my box as a gateway to Wikipedia. I wouldn't let it make edits to Wikipedia until it has proved stable anyway, but it would have been nice if I could have done a read-only gateway.
At this point I'm not aiming to roll it out on a major wiki; there's too much that you can't do from the client side (most of the special pages, for a start). That situation might change, but it would require quite a lot of work to make the server return special pages as raw, unpresented data. The aim is to demonstrate serving dynamic services from very low-spec web servers, because almost everything is static.
Wish you the best of luck on this, sounds truly amazing.
Pedro
Brion Vibber wrote:
Jim Higson wrote:
How can I check if a page exists from a program?
You can hit Special:Export/Pagename; if the response contains no <page> elements, the page does not exist.
Do you know any way that does not involve getting the whole article text?
I've looked at sending HEAD requests to Special:Export and turned up some unexpected behaviour. Like action=raw, it always returns 200, but for articles with a space in the name you can tell if the article is new because Content-Length is always 100.
For articles without a space in the name, Special:Export never sends Content-Length in response to HEADs :)
Of course, this isn't reliable at all. Is there any chance action=raw could be made to return 404, or Content-Length: 0, for non-existent articles?
-- Jim
Jim Higson wrote:
Brion Vibber wrote:
Jim Higson wrote:
How can I check if a page exists from a program?
You can hit Special:Export/Pagename; if the response contains no <page> elements, the page does not exist.
Do you know any way that does not involve getting the whole article text?
I've looked at sending HEAD requests to Special:Export and turned up some unexpected behaviour. Like action=raw, it always returns 200, but for articles with a space in the name you can tell if the article is new because Content-Length is always 100.
For articles without a space in the name, Special:Export never sends Content-Length in response to HEADs :)
Of course, this isn't reliable at all. Is there any chance action=raw could be made to return 404, or Content-Length: 0, for non-existent articles?
Little update (very hackish!)
You can force the wikimedia server to return the Content-Length field by inserting a space into the URL, for example:
http://localhost/wiki/index.php/Special:Export/%20dancing
200
Date: Sun, 27 Feb 2005 01:15:40 GMT
Server: Apache/2.0.51 (Fedora)
X-Powered-By: PHP/4.3.10
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 347
Connection: close
Content-Type: application/xml; charset=utf-8
This at least gives a consistent (at least until this oddness is fixed) way to tell if a page exists, so I know whether to colour my links red or blue.
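In code the hack comes out something like this (a Python sketch; the "empty export" length is just a placeholder and presumably varies per wiki and per encoding):

import urllib.parse
import urllib.request

# Very hacky: a leading space in the URL makes the server send Content-Length
# on a HEAD to Special:Export, and a missing page comes back with a fixed
# length (the empty export skeleton). Measure that constant yourself against
# a page you know doesn't exist; the value below is only a placeholder.
EMPTY_EXPORT_LENGTH = 347

def page_exists(title, base="http://localhost/wiki/index.php"):
    url = "%s/Special:Export/%%20%s" % (base, urllib.parse.quote(title))
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        length = resp.headers.get("Content-Length")
    return length is not None and int(length) != EMPTY_EXPORT_LENGTH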
Jim
Jim Higson wrote in gmane.science.linguistics.wikipedia.technical:
This at least gives a consistent (at least until this oddness is fixed) way to tell if a page exists, so I know whether to colour my links red or blue.
Are you going to make an HTTP request for every link in the page?
Jim
kate.
Kate Turner wrote:
Jim Higson wrote in gmane.science.linguistics.wikipedia.technical:
This at least gives a consistent (at least until this oddness is fixed) way to tell if a page exists, so I know whether to colour my links red or blue.
Are you going to make an HTTP request for every link in the page?
At the moment, yes, while I'm trying to go as far as possible sitting on top of the existing mediawiki. Ideally there'd be some way to post the server a list of articles, and get back a list of which ones exist.
You can see this in action in the demo I posted earlier.
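Something like this is what I mean by posting a list, assuming Special:Export will take a POSTed, newline-separated 'pages' parameter (I haven't checked whether 1.4 actually accepts that):

import re
import urllib.parse
import urllib.request

# Hypothetical batch check: POST the whole list of titles to Special:Export
# and see which ones come back as <page>/<title> elements. Assumes the
# 'pages' parameter works on this MediaWiki version - unverified.
def existing_pages(titles, base="http://localhost/wiki/index.php"):
    body = urllib.parse.urlencode({"pages": "\n".join(titles)}).encode("utf-8")
    with urllib.request.urlopen("%s/Special:Export" % base, data=body) as resp:
        xml = resp.read().decode("utf-8")
    # Titles may come back normalised (e.g. underscores become spaces).
    return set(re.findall(r"<title>(.*?)</title>", xml))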
Jim
Brion Vibber wrote:
Jim Higson wrote:
I've tried doing a HEAD request to the page URL with &action=raw, but that returns 200 regardless of whether the page exists (bug or feature?).
You get a 200 because the wiki script exists, and hasn't yet been special-cased to return a 404 response for pages which don't exist.
Just hacked this into HEAD branch: http://mail.wikipedia.org/pipermail/mediawiki-cvs/2005-February/006720.html
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
Brion Vibber wrote:
Jim Higson wrote:
I've tried doing a HEAD request to the page URL with &action=raw, but that returns 200 regardless of whether the page exists (bug or feature?).
You get a 200 because the wiki script exists, and hasn't yet been special-cased to return a 404 response for pages which don't exist.
Just hacked this into HEAD branch:
http://mail.wikipedia.org/pipermail/mediawiki-cvs/2005-February/006720.html
-- brion vibber (brion @ pobox.com)
Thanks very much - when this filters down I'll be able to remove my current super-hacky page detection.
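Once it does, the check should reduce to a plain status-code test, something like (a Python sketch; the base URL is a placeholder):

import urllib.error
import urllib.parse
import urllib.request

# With the change above, a missing page answers action=raw with a 404,
# which urllib surfaces as an HTTPError. Base URL is a placeholder.
def page_exists(title, base="http://localhost/wiki/index.php"):
    url = "%s?title=%s&action=raw" % (base, urllib.parse.quote(title))
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise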
-- Jim