[Pywikipedia-l] [ pywikipediabot-Feature Requests-1500288 ] Have weblinkchecker.py check the Internet Archive for backup

SourceForge.net noreply at sourceforge.net
Thu Jan 31 01:03:21 UTC 2008


Feature Requests item #1500288, was opened at 2006-06-04 05:09
Message generated for change (Comment added) made by wikipedian
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=603141&aid=1500288&group_id=93107

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Have weblinkchecker.py check the Internet Archive for backup

Initial Comment:
weblinkchecker.py apparently has an option to take action on finding a broken link (currently only to add something to a talk page; I haven't been able to get this to work, though). But it would be even better if it could insert, in a comment or perhaps an addendum after the broken link, a link to backups of that page in the Internet Archive/Wayback Machine.

I don't think this enhancement would be backbreakingly difficult. The script would have to prepend "http://web.archive.org/web/" to the original URL and check whether the string "Not in Archive." (or whatever the current error message is) appears in the resulting Internet Archive page. If it does, simply carry on with the rest of the links to be checked. If it does not, meaning the Archive *does* have something backed up, take some boilerplate like "The preceding URL appeared to be invalid to weblinkchecker.py; however, backups of the URL can be found in the [[Internet Archive]] $HERE. You may want to consider amending the original link to point to the archived copies and not the live one.", replace $HERE with the archive-prefixed URL, and insert the result in a comment.

-maru
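
The procedure described in the initial comment could be sketched roughly as follows. This is only an illustration, not the code in weblinkchecker.py: the function names and the `fetch` callable are hypothetical, and the "Not in Archive." marker is taken from the request itself, so the real error text may differ.

```python
ARCHIVE_PREFIX = "http://web.archive.org/web/"

# Assumption: marker text quoted in the request; the actual error message may differ.
NOT_ARCHIVED_MARKER = "Not in Archive."

NOTICE = (
    "The preceding URL appeared to be invalid to weblinkchecker.py; "
    "however, backups of the URL can be found in the [[Internet Archive]] "
    "$HERE. You may want to consider amending the original link to point "
    "to the archived copies and not the live one."
)


def archive_url(url):
    """Prepend the Wayback Machine prefix to the original URL."""
    return ARCHIVE_PREFIX + url


def archive_notice(url, fetch):
    """Return the notice text for a dead link, or None if no backup exists.

    `fetch` is a hypothetical callable that takes a URL and returns the
    page body as a string; passing it in keeps this sketch free of any
    particular HTTP library.
    """
    candidate = archive_url(url)
    body = fetch(candidate)
    if NOT_ARCHIVED_MARKER in body:
        # Nothing archived: carry on with the next link.
        return None
    return NOTICE.replace("$HERE", candidate)
```

Matching on an error string is fragile, since the archive can change its wording at any time; the sketch keeps the marker in one constant so that it is easy to adjust.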


----------------------------------------------------------------------

>Comment By: Daniel Herding (wikipedian)
Date: 2008-01-31 02:03

Message:
Logged In: YES 
user_id=880694
Originator: NO

By the way, I have already implemented Internet Archive lookup long ago.
webcitation.org is not supported yet, though.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2008-01-30 21:09

Message:
Logged In: NO 

Isn't it possible to create a bot that checks when the external links
work again? It could use the category of pages with inaccessible external
links. When an external link is accessible again, the bot removes the
message from the talk page and marks the talk page with the template for
speedy deletion.

My apologies if I'm adding this message on the wrong page.

Regards,
Kenny (from the Dutch Wikipedia
http://nl.wikipedia.org/wiki/Gebruiker:Ken123 ) 
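
One possible reading of this suggestion, sketched under assumptions: the talk page is treated as a list of reported URLs, `is_accessible` is a hypothetical callable that rechecks one URL, and speedy deletion is requested only once no reports remain. None of these names come from pywikipediabot.

```python
def recheck_reports(reported_links, is_accessible):
    """Recheck previously reported dead links.

    Returns a tuple (remaining, action):
      remaining: links that are still unreachable and stay on the talk page
      action: "keep" if reports remain, otherwise "request_speedy_deletion"
    """
    remaining = [link for link in reported_links if not is_accessible(link)]
    # Assumption: the page is only nominated for deletion when it is empty.
    action = "keep" if remaining else "request_speedy_deletion"
    return remaining, action
```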

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2007-06-24 20:42

Message:
Logged In: NO 

In the same vein, it would be good if WebCite
<http://www.webcitation.org/> archived pages were included as well. There
are apparently some nice programmatic ways of looking for archived URLs
according to
<http://www.webcitation.org/doc/WebCiteBestPracticesGuide.pdf>.

While I'm writing, it'd also be good if the bot would proactively archive
pages when they disappear and come back. Variable uptime to me bespeaks a
page that is likely to disappear permanently. It isn't hard either - it's
just
 "www.webcitation.org/archive?url=" ++ url ++ "&email=foo@bar.com"
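
The same request URL could be built in Python along these lines. This is a sketch, not a tested client: the function name is hypothetical, the "http://" scheme is an assumption, and percent-encoding the parameters is an addition that the one-liner above does not do. Only the `url` and `email` parameters come from the comment.

```python
from urllib.parse import urlencode

# Assumption: plain HTTP scheme; the comment gives only the host and path.
WEBCITE_ARCHIVE_ENDPOINT = "http://www.webcitation.org/archive"


def webcite_archive_request(url, email):
    """Build the WebCite archiving request URL for `url`.

    The target URL and e-mail address are percent-encoded so that
    characters such as '&' or '?' in the target do not break the query.
    """
    query = urlencode({"url": url, "email": email})
    return WEBCITE_ARCHIVE_ENDPOINT + "?" + query
```

Encoding matters here because many link targets already contain their own query strings, which would otherwise be mistaken for parameters of the archiving request.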

----------------------------------------------------------------------
