Feature Requests item #1500288, was opened at 2006-06-04 03:09
Message generated for change (Comment added) made by purodha
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603141&aid=1500288...

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Closed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Have weblinkchecker.py check the Internet Archive for backup
Initial Comment: weblinkchecker.py apparently has an option to take action on finding a broken link (currently only to add something to a talk page; I haven't been able to get this to work, though). But it would be even better if it could insert, in a comment or perhaps an addendum after the broken link, a link to backups of that page in the Internet Archive/Wayback Machine.
I don't think this enhancement would be backbreakingly difficult or troublesome. The script would have to prepend "http://web.archive.org/web/" to the original URL and check whether the string "Not in Archive." (or whatever the current error message is) appears in the Internet Archive page. If it does, then simply carry on with the rest of the links to be checked. If it does not, that is, if the Archive *does* have something backed up, then take some boilerplate like "The preceding URL appeared to be invalid to weblinkchecker.py; however, backups of the URL can be found in the [[Internet Archive]] $HERE. You may want to consider amending the original link to point to the archived copies and not the live one.", replace $HERE with the URL prepended with the Archive bit, and insert it in a comment.
-maru
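
The steps proposed above could be sketched roughly as follows. This is only an illustration of the request, not the code that was later added to weblinkchecker.py. The helper names (`archive_url`, `fetch_text`, `archive_notice`) are hypothetical, the "Not in Archive." marker is taken from the request and may not match the Archive's current error page, and wrapping the notice in an HTML comment is an assumption about what "insert in a comment" means.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional

ARCHIVE_PREFIX = "http://web.archive.org/web/"
# Assumption: marker text quoted in the request; the real message may differ.
NOT_ARCHIVED_MARKER = "Not in Archive."

NOTICE_TEMPLATE = (
    "The preceding URL appeared to be invalid to weblinkchecker.py; however, "
    "backups of the URL can be found in the [[Internet Archive]] {here}. "
    "You may want to consider amending the original link to point to the "
    "archived copies and not the live one."
)


def archive_url(url: str) -> str:
    """Prepend the Archive prefix to the original URL."""
    return ARCHIVE_PREFIX + url


def fetch_text(url: str, timeout: float = 30.0) -> Optional[str]:
    """Hypothetical fetcher: return the page body, or None on any HTTP/network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        return None


def archive_notice(
    url: str, fetch: Callable[[str], Optional[str]] = fetch_text
) -> Optional[str]:
    """Return a wikitext comment pointing at the backup, or None if no backup was found."""
    page = fetch(archive_url(url))
    if page is None or NOT_ARCHIVED_MARKER in page:
        # Nothing archived (or the lookup failed): carry on with the next link.
        return None
    # Assumption: "insert in a comment" is read as an HTML comment in the wikitext.
    return "<!-- " + NOTICE_TEMPLATE.format(here=archive_url(url)) + " -->"
```

Passing `fetch` as a parameter keeps the decision logic testable without network access; a caller would invoke `archive_notice(dead_url)` for each broken link and insert the result only when it is not None.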
----------------------------------------------------------------------
Comment By: Purodha B Blissenbach (purodha)
Date: 2009-01-23 16:53
Message: The described feature has since been implemented.
----------------------------------------------------------------------
Comment By: Daniel Herding (wikipedian)
Date: 2008-01-31 01:03
Message: Logged In: YES user_id=880694 Originator: NO
By the way, I implemented Internet Archive lookup long ago. webcitation.org is not supported yet, though.
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2008-01-30 20:09
Message: Logged In: NO
Would it be possible to create a bot that checks whether the external links work again? It could use the category of pages with inaccessible external links. When an external link is accessible again, the bot removes the message from the talk page and marks the talk page with the template for speedy deletion.
My apologies if I'm adding this message on the wrong page.
Regards, Kenny (from the Dutch Wikipedia http://nl.wikipedia.org/wiki/Gebruiker:Ken123 )
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2007-06-24 18:42
Message: Logged In: NO
In the same vein, it would be good if WebCite http://www.webcitation.org/ archived pages were included as well. There are apparently some nice programmatic ways of looking for archived URLs according to http://www.webcitation.org/doc/WebCiteBestPracticesGuide.pdf.
While I'm writing, it'd also be good if the bot would proactively archive pages when they disappear and come back. Variable uptime to me bespeaks a page that is likely to disappear permanently. It isn't hard either - it's just "www.webcitation.org/archive?url=" ++ url ++ "&email=foo@bar.com"
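
In Python, the concatenation above might look like the sketch below. The function name `webcite_archive_request` is hypothetical, the "http://" scheme is assumed from the WebCite address given earlier in this comment, and percent-encoding the parameters with `urlencode` is an addition that the literal concatenation does not do. Whether WebCite still accepts requests in this form is not confirmed here.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the comment, with an assumed "http://" scheme.
WEBCITE_ARCHIVE = "http://www.webcitation.org/archive"


def webcite_archive_request(url: str, email: str) -> str:
    """Build the archive-request URL for a page that has just come back online."""
    # urlencode percent-encodes both values so that "&" or "?" inside the
    # target URL cannot be mistaken for part of the WebCite query string.
    return WEBCITE_ARCHIVE + "?" + urlencode({"url": url, "email": email})


if __name__ == "__main__":
    print(webcite_archive_request("http://example.com/a?b=1", "foo@bar.com"))
```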
----------------------------------------------------------------------
pywikipedia-l@lists.wikimedia.org