On Sun, Mar 10, 2013 at 8:53 PM, Platonides <Platonides@gmail.com> wrote:
I'm not convinced about [[en:MediaWiki_talk:*]] and [[en:Template_talk:*]], they can bring quite a bit of noise (similarly for [[en:Wikipedia:Village_pump_(technical)]]). I see how interesting discussions could be happening there, though.
The tabs on the search results page (sorry I didn't mention them in the previous email) can be used to filter results down to more relevant content, if desired. I think that might help in coping with the noise.
Besides feedback on whether the engine works as you'd expect, I would like to start some discussion about the ability for Google's bots to crawl some of the resources that are currently included in the URL filters, but return no results. For example, the IRC logs at bots.wmflabs.org/~wm-bot/logs/. Some workarounds are used (e.g. using github for code search since gitweb isn't crawlable) but that isn't possible for all resources. What can we do to improve the situation?
Do we really want Google to index them?
Why log them publicly if we don't make them searchable? Either we're committed to being open or we're not... having a public but hard-to-use archive seems somewhat contradictory to me.
--Waldir