Feature Requests item #1500288, was opened at 2006-06-03 20:09
Message generated for change (Comment added) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603141&aid=150028…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Have weblinkchecker.py check the Internet Archive for backup
Initial Comment:
weblinkchecker.py apparently has an option to take action on finding a broken link
(currently only to add something to a talk page; I haven't been able to get this to
work, though). But it would be even better if it could insert, in a comment or perhaps an
addendum after the broken link, a link to backups of that page in the Internet
Archive/Wayback Machine.
I don't think this enhancement would be backbreakingly difficult or troublesome. The
script would have to prepend "http://web.archive.org/web/" to the original URL
and check whether the string "Not in Archive." (or whatever the current error
message is) appears in the resulting Internet Archive page. If it does, then simply carry
on with the rest of the links to be checked; if not (if the Archive *does* have something
backed up), then take some boilerplate like "The preceding URL appeared to be invalid to
weblinkchecker.py; however, backups of the URL can be found in the [[Internet Archive]]
$HERE. You may want to consider amending the original link to point to the archived copies
and not the live one.", replace $HERE with the URL prepended with the Archive prefix,
and insert it in a comment.
-maru
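The steps above can be sketched roughly as follows. This is a minimal Python 3 sketch, not a patch to weblinkchecker.py itself; the "Not in Archive." marker and the note wording are taken from the comment (the real Wayback error text may have changed), and the function names `note_for` and `fetch_note` are mine.

```python
# Sketch only: the "Not in Archive." marker is an assumption from the
# request text; Wayback's actual error page may differ.
import urllib.request

WAYBACK_PREFIX = "http://web.archive.org/web/"

NOTE = ("The preceding URL appeared to be invalid to weblinkchecker.py; "
        "however, backups of the URL can be found in the [[Internet "
        "Archive]] at %s. You may want to consider amending the original "
        "link to point to the archived copies and not the live one.")

def note_for(url, archive_body):
    """Return the filled-in boilerplate note, or None if the fetched
    Wayback page body says the URL is not archived."""
    if "Not in Archive." in archive_body:
        return None
    return NOTE % (WAYBACK_PREFIX + url)

def fetch_note(url):
    """Fetch the Wayback page for url and build the note (network I/O)."""
    try:
        with urllib.request.urlopen(WAYBACK_PREFIX + url) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        return None  # archive unreachable: carry on, as for a miss
    return note_for(url, body)
```

If `note_for` returns None the script just moves on to the next link; otherwise the returned text is what gets inserted as the comment.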
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2008-01-30 12:09
Message:
Logged In: NO
Wouldn't it be possible to create a bot that checks whether the external links
work again? It could use the category of pages with inaccessible external links.
When an external link is accessible again, the bot removes the message from
the talk page and marks the talk page with the template for speedy
deletion.
My apologies if I'm adding this message on the wrong page.
Regards,
Kenny (from the Dutch Wikipedia
http://nl.wikipedia.org/wiki/Gebruiker:Ken123 )
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2007-06-24 11:42
Message:
Logged In: NO
In the same vein, it would be good if WebCite
<http://www.webcitation.org/> archived pages were included as well. There's
apparently some nice programmatic ways of looking for archived URLs
according to
<http://www.webcitation.org/doc/WebCiteBestPracticesGuide.pdf>.
While I'm writing, it'd also be good if the bot would proactively archive
pages when they disappear and come back. Variable uptime to me bespeaks a
page that is likely to disappear permanently. It isn't hard either - it's
just
"www.webcitation.org/archive?url=" ++ url ++
"&email=foo(a)bar.com"
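The request above (written with `++` for string concatenation) could be built in Python along these lines. The endpoint and parameters are as quoted in the comment and the best-practices guide, not verified against current WebCite documentation, and the email address is a placeholder:

```python
# Hedged sketch: builds the WebCite archive-request URL described in
# the comment; the API shape is an assumption from that text.
import urllib.parse

def webcite_archive_url(url, email="foo@example.org"):
    """Build the request URL asking WebCite to archive the given page."""
    query = urllib.parse.urlencode({"url": url, "email": email})
    return "http://www.webcitation.org/archive?" + query
```

Using `urlencode` rather than raw concatenation keeps the target URL's own `?`, `&`, and `/` characters from corrupting the query string.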
----------------------------------------------------------------------