On Mon, Jul 7, 2014 at 5:21 AM, James Salsman jsalsman@gmail.com wrote:
Kevin Gorman wrote:
Regarding the IA: they have a significant interest in working with the Wikimedia projects, a lot more experience than the Wikimedia projects have in caching absolutely tremendous quantities of data, a willingness to handle a degree of legal risk that would be inappropriate for the Wikimedia projects to take on....
Because they retroactively censor content whenever a new domain owner's robots.txt requests it.
<Note I am replying in my personal capacity as an enwiki editor, and nothing here at all represents the views of WMF or anyone else>
This point shouldn't get lost in the various other issues of more dubious veracity and/or applicability raised in the original message.
I've seen cases where a change of domain ownership or a major corporate restructuring results in a domain being completely reorganized, or even redirected wholesale to some other domain. The robots.txt for the new version of the site then denies everything, likely because the new owners don't want the redirects or other old content showing up in Google searches. But this has the unfortunate side effect that the IA removes all the old content from public access.
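To illustrate the mechanism being described: a blanket-deny robots.txt served by a new domain owner disallows every path for every crawler, including pages that only existed under the previous owner. A minimal sketch using Python's standard-library parser (the domain and path here are hypothetical):

```python
from urllib import robotparser

# A blanket-deny robots.txt, as a new domain owner might serve
# to keep redirects and stale content out of search results.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Every path on the domain is now disallowed for all user agents,
# including content that predates the ownership change.
print(rp.can_fetch("*", "https://example.com/old-article.html"))  # False
```

Because the rule is evaluated against the current robots.txt rather than the one in force when a page was archived, retroactively honoring it hides the old content along with the new.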
I really wish that IA would reconsider their policy of *automatically* retroactively honoring robots.txt.