jidanni wrote:
To keep one's website's links fresh, one uses a linkchecker to detect broken links. But how is a linkchecker to check whether a Wikipedia article exists, given that $ HEAD -H User-Agent: http://*.wikipedia.org/wiki/*|sed q returns 200 OK for any article, existent or non-existent, and even for the 'our servers are experiencing technical problems' message? (The -H User-Agent header avoids a 403 Forbidden.)
Should I make a list of the Wikipedia URLs I want to check and send each one to an API URL instead? Will that API URL return a "more correct" HTTP code?
Or must I do something like $ GET "$URL" | grep 'wiki.* does not currently have an article called .*, maybe it was deleted' && echo "$URL Broken"
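The api.php query interface can answer the existence question directly, although it too replies with HTTP 200: titles that do not exist come back flagged as missing in the response body. A rough sketch, assuming the English Wikipedia endpoint, JSON output, and a title already extracted from the link (the grep simply looks for the "missing" marker):

$ curl -s 'https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Some_title' | grep -q '"missing"' && echo Broken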
Check the db: all existing pages appear in the 'page' table.
The links are recorded in the pagelinks, templatelinks, imagelinks and categorylinks tables (although a non-existent category still functions — it lists its members even if its description page was never created).
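For completeness, a minimal existence check against the 'page' table — a sketch only: the host, user and database names are placeholders, and it assumes the default schema where page_namespace 0 is the main (article) namespace and page_title stores underscores rather than spaces:

$ mysql -h dbhost -u wikiuser -p wikidb -e "SELECT page_id FROM page WHERE page_namespace=0 AND page_title='Some_title';"

An empty result set means no such article.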