[courtesy copy to foundation-l, though I suggest that discussion, if any, be centralised on wikitech-l]
Hi all, the search index for the mailing list archives was last rebuilt in January. After asking about this here and in other places, I have learnt (and obviously had to accept) that rebuilding the search index is a resource-intensive process that has resulted in crashes.
To put it bluntly, I dare suggest from a non-technical POV that the "htdig" (that's the name, isn't it?) experiment has failed. If we can only update our search index every 6 months or so, it is pointless to have it.
Instead, I suggest that http://lists.wikimedia.org/robots.txt be modified so as to allow Google (and other search engines) to crawl /pipermail/ again. I do not really see the privacy issue here: Nabble, Gmane etc. are Google-searchable as well, and I really don't see the point in barring Google from our own archive.
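To illustrate what such a change would mean for a crawler, here is a minimal sketch using Python's standard urllib.robotparser. The two rule sets are assumptions made up for illustration (I have not copied the actual contents of our robots.txt), and can_crawl is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Assumed "before" rules: all crawlers are barred from the archives.
BLOCKING = """\
User-agent: *
Disallow: /pipermail/
"""

# Assumed "after" rules: an empty Disallow means nothing is barred.
OPEN = """\
User-agent: *
Disallow:
"""

ARCHIVE_URL = "http://lists.wikimedia.org/pipermail/wikitech-l/"

if __name__ == "__main__":
    print("blocked rules allow Googlebot:", can_crawl(BLOCKING, "Googlebot", ARCHIVE_URL))
    print("open rules allow Googlebot:", can_crawl(OPEN, "Googlebot", ARCHIVE_URL))
```

Under these assumed rules, the only difference between the two states is the single Disallow line for /pipermail/.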
To be honest, I do not even remember why we decided to bar Google from http://lists.wikimedia.org/pipermail. Was it due to privacy concerns? If so, which ones, and why is lists.wikimedia.org as an archive different from Nabble/Gmane?
Thanks, Michael