New subject: RE : Internet Archive BOT

24 Jun 2020

Hi Pascal, all,

this is being discussed here:
https://en.wikipedia.org/wiki/User_talk:Cyberpower678 THe last response was
June 16, and it seems to focus on geo-blocking as the cause for
blacklisting (in case anyone feels called to help out the developer).

This bot performs incredible work and I hope it gets fixed soon!

Best,
Lodewijk

On Tue, Jun 23, 2020 at 5:04 AM Pascal Martin &lt;pmartin(a)linterweb.fr&gt; wrote:

...
  HI,

 My native language is French, automatic translation into English.
 This message follows the numerous detection of false 404 links by the
 Internet Archive robot because it is blacklisted on a lot of servers. Small
 details concerning the archiving service of Wikiwix (
 https://nl.wikipedia.org/wiki/Wikipedia:De_kroeg#Internet_Archive_Bot )
 It is based solely on this Javascript to be implemented since 2008 in
 French Wikipedia:
 https://fr.wikipedia.org/wiki/MediaWiki:Gadget-ArchiveLinks.js
 The advantage of this solution makes it possible to add other archiving
 sources, and does not modify the content of Wikipedia articles.
 New links are detected by 3 different means:
 • Annual recovery: https://dumps.wikimedia.org/backup-index.html,
 • Recovery on IRC and on the WEB of Recents Changes.
 And we also recommend clicking on the archive link as soon as the source
 is added by a contributor, this immediately generates storage of the link
 and allows you to test the rendering of the archived page.
 In addition to fighting 404 errors, this solution also offers the
 advantage of protecting against changes in content that may appear in the
 pages to be archived.
 Wikiwix strictly respects copyright, archiving is only done with the
 author's approval using the noarchive tag.
 Since 2015, I have been alerting about the deployment of the IA robot:
 2015:
 https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2015/Bots_and_gad…:
 the bot solution with modification of the template cache is currently
 exclusive to WayBackMachine, 2017:
 https://fr.wikipedia.org/wiki/Discussion_user:Pmartin#I_left_you_a_message! :
 attempted collaboration abort by the bot trainer and bot stopped following
 numerous false detections on page 404.
 The role of IABOT is to detect the links present in Wikipedia which are in
 errors 404, to find an archive in priority on the WayBack Machine, and to
 modify the articles to replace the dead link there.
 This process is not good because IABOT only allows one archive url to be
 stored on all the languages, which greatly favors the Wayback Machine, to
 the detriment of the different versions of the page. While the template
 should link to a page that would list all of the possible archives for a
 404 page.
 A week has been planned for the end of July 2020 to resolve the few
 stabilization problems that Wikiwix currently encounters, linked to the new
 solution which consumes only 30 euros of electricity per month, we can also
 support this week for a deployment of the solution on the NL part of
 Wikipedia.

 Could someone stop this bots, otherwise the false detection of links will
 become contagious for all projects?

 Pascal Martin
 _______________________________________________
 Wikimedia-l mailing list, guidelines at:
 https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
 https://meta.wikimedia.org/wiki/Wikimedia-l
 New messages to: Wikimedia-l(a)lists.wikimedia.org
 Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
 <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe> 

Re: [Wikimedia-l] Internet Archive BOT