On Mon, Jul 7, 2014 at 5:21 AM, James Salsman jsalsman@gmail.com wrote:
Kevin Gorman wrote:
Regarding the IA: they have a significant interest in working with the Wikimedia projects, a lot more experience than the Wikimedia projects have caching absolutely tremendous quantities of data, a willingness to handle a degree of legal risk that would be inappropriate for the Wikimedia projects to take on....
Because they censor things retroactively when requested by new domain owners' robots.txt,
<Note I am replying in my personal capacity as an enwiki editor, and nothing here at all represents the views of WMF or anyone else>
This point shouldn't get lost in the various other issues of more dubious veracity and/or applicability raised in the original message.
I've seen cases where domain ownership changes or a major corporate restructuring results in a domain being completely reorganized or even redirected wholesale to some other domain. And the robots.txt for the new version of the site denies everything, likely because the new owners don't want the redirects or other old content showing up in Google searches. But this has the unfortunate side effect that IA removes all the old content from public access.
I really wish that IA would reconsider their policy of *automatically* retroactively honoring robots.txt.
Brad Jorsch (Anomie), 07/07/2014 17:37:
And the robots.txt for the new version of the site denies everything, likely because the new owners don't want the redirects or other old content showing up in Google searches. But this has the unfortunate side effect that IA removes all the old content from public access.
This is not correct. If you can reproduce it, please file a bug, but it's not how it's supposed or said to work. I've pasted some links where you can find additional information like this at https://archive.org/post/1019415/retroactive-robotstxt-removal-of-past-crawl... (also reposting an elaborated version of my message of this morning).
Nemo
On Mon, Jul 7, 2014 at 1:49 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Brad Jorsch (Anomie), 07/07/2014 17:37:
And the robots.txt for the new version of the site denies everything, likely because the new owners don't want the redirects or other old content showing up in Google searches. But this has the unfortunate side effect that IA removes all the old content from public access.
This is not correct. If you can reproduce it, please file a bug, but it's not how it's supposed or said to work. I've pasted some links where you can find additional information like this at
https://archive.org/post/1019415/retroactive-robotstxt-removal-of-past-crawl... (also reposting an elaborated version of my message of this morning).
<Still my own personal views, in no way representing any position of WMF or anyone else>
I'm confused. You say this is not correct, but then you post a link to a post of your own that does not refute it and that has many links to people confirming it.
On 7 July 2014 21:08, Brad Jorsch (Anomie) bjorsch@wikimedia.org wrote:
On Mon, Jul 7, 2014 at 1:49 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Brad Jorsch (Anomie), 07/07/2014 17:37:
And the robots.txt for the new version of the site denies everything, likely because the new owners don't want the redirects or other old content showing up in Google searches. But this has the unfortunate side effect that IA removes all the old content from public access.
This is not correct. If you can reproduce it, please file a bug, but it's not how it's supposed or said to work. I've pasted some links where you can find additional information like this at https://archive.org/post/1019415/retroactive-robotstxt-removal-of-past-crawl... (also reposting an elaborated version of my message of this morning).
<Still my own personal views, in no way representing any position of WMF or anyone else>
I'm confused. You say this is not correct, but then you post a link to a post of your own that does not refute it and that has many links to people confirming it.
Indeed, I was about to note the same. Nemo, there are multiple links from that page confirming that IA does retroactive takedowns. When you say they don't, do you have anything to back up that claim?
- d.
wikimedia-l@lists.wikimedia.org