Hi,
My native language is French; this message was machine-translated into English.
This message follows the many false 404 detections made by the Internet Archive
robot, which is blacklisted on a lot of servers. Here are a few details about the
Wikiwix archiving service
(https://nl.wikipedia.org/wiki/Wikipedia:De_kroeg#Internet_Archive_Bot):
It relies solely on this JavaScript gadget, deployed on the French Wikipedia since
2008:
https://fr.wikipedia.org/wiki/MediaWiki:Gadget-ArchiveLinks.js
The advantage of this solution is that other archiving sources can be added, and it does
not modify the content of Wikipedia articles.
New links are detected in three different ways:
• Annual retrieval from the database dumps:
https://dumps.wikimedia.org/backup-index.html
• Retrieval of Recent Changes via IRC.
• Retrieval of Recent Changes via the web.
We also recommend that contributors click the archive link as soon as they add a
source: this immediately stores the link and lets them check how the archived page
renders.
In addition to fighting 404 errors, this solution also protects against later changes
to the content of the archived pages.
Wikiwix strictly respects copyright: archiving follows the author's wishes as expressed
through the noarchive tag.
Since 2015, I have been raising concerns about the deployment of the IA robot:
2015:
https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2015/Bots_and_gad…
The bot-based solution, which modifies the template cache, is currently exclusive to
the Wayback Machine.
2017:
https://fr.wikipedia.org/wiki/Discussion_user:Pmartin#I_left_you_a_message!
An attempted collaboration was abandoned by the bot operator, and the bot was stopped
after numerous false 404 detections.
The role of IABot is to detect links in Wikipedia that return 404 errors, to find an
archive, preferably on the Wayback Machine, and to modify the articles to replace the
dead link.
This process is flawed because IABot stores only one archive URL for all language
editions, which strongly favors the Wayback Machine at the expense of the other
archived versions of the page. Instead, the template should link to a page listing all
of the available archives for a 404 page.
A week has been set aside at the end of July 2020 to resolve the few stability problems
that Wikiwix currently encounters with its new infrastructure, which uses only 30 euros
of electricity per month. During that week, we can also support a deployment of the
solution on the Dutch Wikipedia.
Could someone stop this bot? Otherwise, the false link detections will spread to all
projects.
Pascal Martin