jidanni@jidanni.org wrote:
To keep one's website's links fresh, one uses a link checker to detect broken links. But how is a link checker to check whether a Wikipedia article exists, given that

$ HEAD -H User-Agent: http://*.wikipedia.org/wiki/*|sed q

returns 200 OK for any article, non-existent and existent alike, and even for the 'our servers are experiencing technical problems' message? (-H avoids a 403 Forbidden.)
Should I make a list of the Wikipedia URLs I want to check and send each to an API URL instead? Will such an API URL return a "more correct" HTTP code?
Or must I do something like

$ GET $URL|grep 'wiki.* does not currently have an article called .*, maybe it was deleted' && echo $URL Broken
First, save up a list of the articles you want to check. When you've got a couple hundred of them (or have run out of articles to check), issue an API request like:
http://en.wikipedia.org/w/api.php?action=query&titles=Dog%7CWP:WAX%7CJid...
It returns some basic data (namespace and existence) for every article. For production use, you probably want pure XML, so add &format=xml. Alternatively, you can use format=json or format=php.
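For a link checker, the useful part of the reply is that pages which do not exist come back flagged as missing. A rough sketch of what that could look like from the shell, assuming the XML format and that missing pages are marked with a missing="" attribute on their <page> element (the example titles, the -A user-agent string, and the grep pattern are only illustrative, not a polished tool):

$ curl -s -A linkcheck 'http://en.wikipedia.org/w/api.php?action=query&format=xml&titles=Dog%7CSome_page_that_does_not_exist' | grep -o '<page[^>]*missing[^>]*>'

Any <page> element printed by the grep corresponds to a title that does not exist (or was deleted); titles that do not appear in that output can be treated as live.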
Roan Kattouw (Catrope)