How can I check if a page exists from a program?
I've tried doing a HEAD request to the page URL with &action=raw, but that returns 200 regardless of whether the page exists (bug or feature?). The only way I can find at the moment is that the Date and Last-Modified response headers are the same if the article doesn't exist yet, but this seems a bit too hackish.
Any suggestions?
Jim
Jim Higson wrote:
How can I check if a page exists from a program?
I've tried doing a HEAD request to the page URL with &action=raw, but that returns 200 regardless of whether the page exists (bug or feature?). The only way I can find at the moment is that the Date and Last-Modified response headers are the same if the article doesn't exist yet, but this seems a bit too hackish.
Oops, actually the Last-Modified header is always one hour (to the second) before Date if the article doesn't exist.
So testing for it just got hackier :)
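In case it's useful, here's roughly what the check looks like at the moment (a Python sketch of the heuristic only; the base URL is a placeholder for whatever wiki you're testing, and the one-hour gap is just what I'm observing here):

# Sketch of the Date/Last-Modified heuristic described above. Placeholder
# base URL; the one-hour gap for missing pages is only what this wiki shows.
import urllib.parse
import urllib.request
from email.utils import parsedate_to_datetime

def page_seems_to_exist(title, base="http://localhost/wiki/index.php"):
    url = "%s?title=%s&action=raw" % (base, urllib.parse.quote(title))
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        date = resp.headers.get("Date")
        last_mod = resp.headers.get("Last-Modified")
    if date is None or last_mod is None:
        return None  # can't tell anything from this heuristic
    gap = parsedate_to_datetime(date) - parsedate_to_datetime(last_mod)
    # Missing pages show Last-Modified exactly one hour before Date here.
    return gap.total_seconds() != 3600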
The 200 error is there to stop a certain vandalbot; it will not be shown if you send the cookie of a logged-in user (that exception exists because pywikipediabot was getting the 200 error as well).
Andre Engels
On Fri, 25 Feb 2005 17:27:31 +0000, Jim Higson jh@333.org wrote:
How can I check if a page exists from a program?
I've tried doing a HEAD request to the page URL with &action=raw, but that returns 200 regardless of whether the page exists (bug or feature?). The only way I can find at the moment is that the Date and Last-Modified response headers are the same if the article doesn't exist yet, but this seems a bit too hackish.
Any suggestions?
Jim Higson wrote:
How can I check if a page exists from a program?
You can hit Special:Export/Pagename; if the response contains no <page> elements, the page does not exist.
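A minimal sketch of that check (Python; the base URL is a placeholder for your wiki):

import urllib.parse
import urllib.request

# Fetch the export XML for the page and look for a <page> element.
# A real client would parse the XML rather than substring-match.
def page_exists(title, base="http://localhost/wiki/index.php"):
    url = "%s/Special:Export/%s" % (base, urllib.parse.quote(title))
    with urllib.request.urlopen(url) as resp:
        return "<page>" in resp.read().decode("utf-8")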
I've tried doing a HEAD request to the page URL with &action=raw, but that returns 200 regardless of whether the page exists (bug or feature?).
You get a 200 because the wiki script exists, and hasn't yet been special-cased to return a 404 response for pages which don't exist.
The only way I can find at the moment is that the Date and Last-Modified response headers are the same if the article doesn't exist yet, but this seems a bit too hackish.
[...]
Oops, actually the Last-Modified header is always one hour (to the second) before Date if the article doesn't exist.
I can't reproduce this; for pages that don't exist there is no Last-Modified header at all. Can you show a sample request and returned headers?
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
Jim Higson wrote:
How can I check if a page exists from a program?
You can hit Special:Export/Pagename; if the response contains no <page> elements, the page does not exist.
I've tried doing a HEAD request to the page URL with &action=raw, but that returns 200 regardless of whether the page exists (bug or feature?).
You get a 200 because the wiki script exists, and hasn't yet been special-cased to return a 404 response for pages which don't exist.
The only way I can find at the moment is that the Date and Last-Modified response headers are the same if the article doesn't exist yet, but this seems a bit too hackish.
[...]
Oops, actually the Last-Modified header is always one hour (to the second) before Date if the article doesn't exist.
I can't reproduce this; for pages that don't exist there is no Last-Modified header at all. Can you show a sample request and returned headers?
My 'application' uses XMLHTTP to make the request from a web page. Since this can only be made from a page on the same domain I put up a quick example:
Visit this in a recent version of Mozilla or similar and it should spit the headers to the page.
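If anyone wants the same header dump without a browser, something like this should do it (a Python sketch; point it at whichever page you're testing):

import urllib.request

# HEAD the page and print the status line plus every response header,
# so Date and Last-Modified can be compared by eye. Example URL only.
url = "http://localhost/wiki/index.php?title=Some_missing_page&action=raw"
req = urllib.request.Request(url, method="HEAD")
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.reason)
    for name, value in resp.headers.items():
        print("%s: %s" % (name, value))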
Jim
Brion Vibber wrote:
Jim Higson wrote:
Oops, actually the Last-Modified header is always one hour (to the second) before Date if the article doesn't exist.
I can't reproduce this; for pages that don't exist there is no Last-Modified header at all. Can you show a sample request and returned headers?
Seems to be a weird hack that was placed into the 1.4 release branch and not the development branch.
I'm not entirely sure why it's there, but it's clearly wrong.
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
Brion Vibber wrote:
Jim Higson wrote:
Oops, actually the Last-Modified header is always one hour (to the second) before Date if the article doesn't exist.
I can't reproduce this; for pages that don't exist there is no Last-Modified header at all. Can you show a sample request and returned headers?
Seems to be a weird hack that was placed into the 1.4 release branch and not the development branch.
I'm not entirely sure why it's there, but it's clearly wrong.
I'm running 1.4b6
Not getting 404s is quite a problem for my project - as I've mentioned before, I'm trying to write a client-side reimplementation of the MediaWiki presentation layer, as an experiment in serving from very low-spec web servers. I'm giving a presentation on Monday and will probably post a link to my work shortly after.
As it stands, my software marks no links with class=new because all HEAD requests are returning 200.
Jim
Not getting 404s is quite a problem for my project - as I've mentioned before, I'm trying to write a client-side reimplementation of the MediaWiki presentation layer, as an experiment in serving from very low-spec web servers. I'm giving a presentation on Monday and will probably post a link to my work shortly after.
Sounds a lot like an idea I had and was planning to implement at some point. So is it pure JS with XMLHttpRequest that you're using for this presentation layer? I have a somewhat working MediaWiki-to-HTML converter written in JS which I'm using for client-side previews in edit pages. Have you implemented something like this as well? If you think it could be useful for your project, please let me know. I'm very intrigued by your work.
Pedro
Pedro Fayolle wrote:
Not getting 404s is quite a problem for my project - as I've mentioned before, I'm trying to write a client-side reimplementation of the MediaWiki presentation layer, as an experiment in serving from very low-spec web servers. I'm giving a presentation on Monday and will probably post a link to my work shortly after.
Sounds a lot like an idea I had and was planning to implement at some point. So is it pure JS with XMLHttpRequest that you're using for this presentation layer? I have a somewhat working MediaWiki-to-HTML converter written in JS which I'm using for client-side previews in edit pages. Have you implemented something like this as well? If you think it could be useful for your project, please let me know. I'm very intrigued by your work.
I think it is better demonstrated than explained, so I'll do my best to get a work in progress up for a few hours later today.
Basically, it's a wikitext-to-XML recursive descent (almost proper) parser and an XML-to-XHTML converter. From the XML I'm generating a DOM identical to the usual MediaWiki one and using the existing stylesheets, so it mostly looks the same as the PHP interface.
It doesn't just use XMLHTTP: each page has its own URL, so the address bar changes and everything is bookmarkable. But the browser only receives a stub, and builds the page itself. The page is built bit by bit so the user can start reading the first part while the rest is being built.
Editing has real-time previews, although I'm still ironing out a few bugs there. Previews are done without any HTTP requests etc.
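As a flavour of the idea only, here's a toy converter for a tiny subset (headings, bold/italic, internal links). It's a flat regex pass in Python, so nothing like the real recursive descent parser or the XML stage, and the /wiki/ href scheme is just an example:

import re

# Toy wikitext-to-XHTML conversion for a tiny subset only: headings,
# '''bold''', ''italic'' and [[internal links]]. Illustrative, not the
# parser described above, and the /wiki/ link target is hypothetical.
def render_inline(text):
    text = re.sub(r"\[\[([^\]|]+)\|([^\]]+)\]\]", r'<a href="/wiki/\1">\2</a>', text)
    text = re.sub(r"\[\[([^\]]+)\]\]", r'<a href="/wiki/\1">\1</a>', text)
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)   # bold before italic
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)
    return text

def render(wikitext):
    out = []
    for line in wikitext.splitlines():
        m = re.match(r"^(={1,6})\s*(.+?)\s*\1\s*$", line)
        if m:
            level = len(m.group(1))
            out.append("<h%d>%s</h%d>" % (level, render_inline(m.group(2)), level))
        elif line.strip():
            out.append("<p>%s</p>" % render_inline(line))
    return "\n".join(out)

print(render("== Heading ==\nSome '''bold''' text and a [[Main Page|link]]."))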
I think it is better demonstrated than explained, so I'll do my best to get a work in progress up for a few hours later today.
Basically, it's a wikitext-to-XML recursive descent (almost proper) parser and an XML-to-XHTML converter. From the XML I'm generating a DOM identical to the usual MediaWiki one and using the existing stylesheets, so it mostly looks the same as the PHP interface.
It doesn't just use XMLHTTP: each page has its own URL, so the address bar changes and everything is bookmarkable. But the browser only receives a stub, and builds the page itself. The page is built bit by bit so the user can start reading the first part while the rest is being built.
Editing has real-time previews, although I'm still ironing out a few bugs there. Previews are done without any HTTP requests etc.
So it's just like what I was planning to do, only done right :o)
I can't wait to see it working. BTW, can it be "plugged" into any running MediaWiki, like, say, Wikipedia? Or does it need its own MediaWiki setup?
Wish you the best of luck on this, sounds truly amazing.
Pedro
Pedro Fayolle wrote:
I think it is better demonstrated than explained, so I'll do my best to get a work in progress up for a few hours later today.
Basically, it's a wikitext-to-XML recursive descent (almost proper) parser and an XML-to-XHTML converter. From the XML I'm generating a DOM identical to the usual MediaWiki one and using the existing stylesheets, so it mostly looks the same as the PHP interface.
It doesn't just use XMLHTTP: each page has its own URL, so the address bar changes and everything is bookmarkable. But the browser only receives a stub, and builds the page itself. The page is built bit by bit so the user can start reading the first part while the rest is being built.
Editing has real-time previews, although I'm still ironing out a few bugs there. Previews are done without any HTTP requests etc.
So it's just like what I was planning to do, only done right :o)
I can't wait to see it working. BTW, can it be "plugged" into any running MediaWiki, like, say, Wikipedia? Or does it need its own MediaWiki setup?
Sort of. It runs on top of a MediaWiki, but because of the XMLHTTP security model it has to be on the same domain as the wiki, so I can't just put it up on my box as a gateway to Wikipedia. I wouldn't let it make edits to Wikipedia until it has proved stable anyway, but it would have been nice if I could have done a read-only gateway.
At this point I'm not aiming to roll it out on a major wiki; there's too much that you can't do from the client side (most of the special pages, for a start). That situation might change, but it would require quite a lot of work to make the server return special pages as raw, unpresented data. The aim is to demonstrate serving dynamic services from very low-spec web servers, because almost everything is static.
Wish you the best of luck on this, sounds truly amazing.
Pedro
Brion Vibber wrote:
Jim Higson wrote:
How can I check if a page exists from a program?
You can hit Special:Export/Pagename; if the response contains no <page> elements, the page does not exist.
Do you know any way that does not involve getting the whole article text?
I've looked at sending HEAD requests to Special:Export and turned up some unexpected behaviour. Like action=raw, it always returns 200, but for articles with a space in the name you can tell if the article is new because Content-Length is always 100.
For articles without a space in the name, Special:Export never sends Content-Length in response to HEADs :)
Of course, this isn't reliable at all. Is there any chance action=raw could be made to return 404, or Content-Length: 0, for non-existent articles?
-- Jim
Jim Higson wrote:
Brion Vibber wrote:
Jim Higson wrote:
How can I check if a page exists from a program?
You can hit Special:Export/Pagename; if the response contains no <page> elements, the page does not exist.
Do you know any way that does not involve getting the whole article text?
I've looked at sending HEAD requests to Special:Export and turned up some unexpected behaviour. Like action=raw, it always returns 200, but for articles with a space in the name you can tell if the article is new because Content-Length is always 100.
For articles without a space in the name, Special:Export never sends Content-Length in response to HEADs :)
Of course, this isn't reliable at all. Is there any chance action=raw could be made to return 404, or Content-Length: 0, for non-existent articles?
Little update (very hackish!)
You can force the wikimedia server to return the Content-Length field by inserting a space into the URL, for example:
http://localhost/wiki/index.php/Special:Export/%20dancing
200
Date: Sun, 27 Feb 2005 01:15:40 GMT
Server: Apache/2.0.51 (Fedora)
X-Powered-By: PHP/4.3.10
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 347
Connection: close
Content-Type: application/xml; charset=utf-8
This at least gives a consistent (at least until this oddness is fixed) way to tell if a page exists, so I know whether to colour my links red or blue.
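In code the hack comes out something like this (a Python sketch; the "empty export" length is just a placeholder and presumably varies per wiki and per encoding):

import urllib.parse
import urllib.request

# Very hacky: a leading space in the URL makes the server send Content-Length
# on a HEAD to Special:Export, and a missing page comes back with a fixed
# length (the empty export skeleton). Measure that constant yourself against
# a page you know doesn't exist; the value below is only a placeholder.
EMPTY_EXPORT_LENGTH = 347

def page_exists(title, base="http://localhost/wiki/index.php"):
    url = "%s/Special:Export/%%20%s" % (base, urllib.parse.quote(title))
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        length = resp.headers.get("Content-Length")
    return length is not None and int(length) != EMPTY_EXPORT_LENGTH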
Jim
Jim Higson wrote in gmane.science.linguistics.wikipedia.technical:
This at least gives a consistent (at least until this oddness is fixed) way to tell if a page exists, so I know whether to colour my links red or blue.
Are you going to make an HTTP request for every link in the page?
Jim
kate.
Kate Turner wrote:
Jim Higson wrote in gmane.science.linguistics.wikipedia.technical:
This at least gives a consistent (at least until this oddness is fixed) way to tell if a page exists, so I know whether to colour my links red or blue.
Are you going to make an HTTP request for every link in the page?
At the moment, yes, while I'm trying to go as far as possible sitting on top of the existing mediawiki. Ideally there'd be some way to post the server a list of articles, and get back a list of which ones exist.
You can see this in action in the demo I posted earlier.
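Something like this is what I mean by posting a list, assuming Special:Export will take a POSTed, newline-separated 'pages' parameter (I haven't checked whether 1.4 actually accepts that):

import re
import urllib.parse
import urllib.request

# Hypothetical batch check: POST the whole list of titles to Special:Export
# and see which ones come back as <page>/<title> elements. Assumes the
# 'pages' parameter works on this MediaWiki version - unverified.
def existing_pages(titles, base="http://localhost/wiki/index.php"):
    body = urllib.parse.urlencode({"pages": "\n".join(titles)}).encode("utf-8")
    with urllib.request.urlopen("%s/Special:Export" % base, data=body) as resp:
        xml = resp.read().decode("utf-8")
    # Titles may come back normalised (e.g. underscores become spaces).
    return set(re.findall(r"<title>(.*?)</title>", xml))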
Jim
Brion Vibber wrote:
Jim Higson wrote:
I've tried doing a HEAD request to the page URL with &action=raw, but that returns 200 regardless of whether the page exists (bug or feature?).
You get a 200 because the wiki script exists, and hasn't yet been special-cased to return a 404 response for pages which don't exist.
Just hacked this into HEAD branch: http://mail.wikipedia.org/pipermail/mediawiki-cvs/2005-February/006720.html
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
Brion Vibber wrote:
Jim Higson wrote:
I've tried doing a HEAD request to the page URL with &action=raw, but that returns 200 regardless of whether the page exists (bug or feature?).
You get a 200 because the wiki script exists, and hasn't yet been special-cased to return a 404 response for pages which don't exist.
Just hacked this into HEAD branch:
http://mail.wikipedia.org/pipermail/mediawiki-cvs/2005-February/006720.html
-- brion vibber (brion @ pobox.com)
Thanks very much - when this filters down I'll be able to remove my current super-hacky page detection.
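Once it does, the check should reduce to a plain status-code test, something like (a Python sketch; the base URL is a placeholder):

import urllib.error
import urllib.parse
import urllib.request

# With the change above, a missing page answers action=raw with a 404,
# which urllib surfaces as an HTTPError. Base URL is a placeholder.
def page_exists(title, base="http://localhost/wiki/index.php"):
    url = "%s?title=%s&action=raw" % (base, urllib.parse.quote(title))
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise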
-- Jim