Michael Bimmler wrote:
To put it bluntly, I dare suggest from a non-technical POV that the "htdig" (that's the name, isn't it?) experiment has failed. If we can only update our search index every 6 months or so, it is pointless to have it.
Yeah, it doesn't work as well as advertised.
Instead, I suggest that http://lists.wikimedia.org/robots.txt be modified so as to allow Google (and other search engines) to crawl /pipermail/ again. I do not really see the privacy issues with this: Nabble, Gmane etc. are Google-searchable as well, and I really don't see the point in barring Google from our own archive.
For the meantime, I'm going to have to recommend not doing this (see my notes below for why).
As you note, it's already possible to search via third-party archives. It would probably not be difficult to replace the broken htdig search form with a link to a nice offsite archive, though.
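For anyone who wants to see what the robots.txt change under discussion amounts to, here is a minimal sketch using Python's standard `urllib.robotparser`. The two rule sets are hypothetical illustrations, not the actual contents of the robots.txt on lists.wikimedia.org, and `can_crawl` is a helper name made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule sets for illustration only; the real robots.txt
# on lists.wikimedia.org may be written differently.
BLOCKING_RULES = """\
User-agent: *
Disallow: /pipermail/
"""

ALLOWING_RULES = """\
User-agent: *
Disallow:
"""

ARCHIVE_URL = "http://lists.wikimedia.org/pipermail/"


def can_crawl(rules: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print("blocking rules:", can_crawl(BLOCKING_RULES, ARCHIVE_URL))
    print("allowing rules:", can_crawl(ALLOWING_RULES, ARCHIVE_URL))
```

Note that robots.txt only asks well-behaved crawlers to stay away; it does nothing about third-party archives that receive the mail directly.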
To be very honest, I do not even remember anymore why we decided to bar Google from http://lists.wikimedia.org/pipermail.
Because:
a) The current mailman/pipermail system makes it a *huge* pain in the butt to remove mails from archives on request
b) I got tired of the volume of requests to remove mails from archives, with the consequent time required in handling them
c) With the wildly popular wikimedia.org domain out of the running, third-party list archives aren't as visible in general search engine results
d) Therefore, the volume of requests goes down
e) and I don't feel bad turning down most of the remaining requests.
If and when mailman's archiving system is fixed up to make it quick & easy to take a mail out of archives (eg, *not* involving shutting down all mail processing, rebuilding an entire list's archives since 2001, and discovering that all the links are now broken because mailman's internal behavior has changed in the intervening years and it splits up messages differently), then I'll be happy to pop us back into general search engine indexes.
Was it due to privacy concerns? If so, which, and why is lists.wikimedia.org as an archive different from Nabble/Gmane?
That'd be c) above.
-- brion vibber (brion @ wikimedia.org)