Have a look at http://84.114.164.84:8080/
It seems to be a set of tools (crawler, parsers, indexers...) that together provide search. In short, an experimental search engine, or maybe a would-be commercial one, given that a .net domain is registered: http://www.paxle.net/
Funny thing: you seem to be able to pause the crawling process here: http://84.114.164.84:8080/status#dcrawler . When I first reached the page, all the processes were active, and I wasn't prompted for authentication when I asked to pause them, whereas trying to reach other parts of the site does prompt for authentication.
This tool seems to have a blacklist: "org.paxle.filter.blacklist.impl.BlacklistFilter". If you're able to reach the author, you can probably ask him to blacklist your tools. The question is _how_: I haven't been able to find an email address or any other contact information.
I found a bug tracker which seems to be active: https://bugs.pxl.li/my_view_page.php , but I don't know whether it tracks the engine itself (which can be used by multiple hosts) or the project.
Good luck!
2009/1/12 RYU Cheol rcheol@gmail.com:
for caching?
HTTP caches request the time stamp to check consistency.
-Cheol
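If the caching guess is right, the client would issue a HEAD request, read the Last-Modified header from the response, and compare it with the time stamp stored alongside its cached copy. Here is a minimal sketch of that comparison in Python. The function name is_stale and the rule "no time stamp means refetch" are assumptions made for illustration; nothing here is taken from the crawler's actual code.

```python
from email.utils import parsedate_to_datetime


def is_stale(cached_last_modified, head_headers):
    """Return True if the cached copy should be fetched again.

    cached_last_modified: the Last-Modified value saved with the cached copy
                          (an HTTP date string), or None if none was saved.
    head_headers:         headers from a HEAD response, as a dict.
    """
    current = head_headers.get("Last-Modified")
    if current is None or cached_last_modified is None:
        # Assumption: without a time stamp on both sides, treat the copy as stale.
        return True
    # HTTP dates parse to datetimes, so they can be compared directly.
    return parsedate_to_datetime(current) > parsedate_to_datetime(cached_last_modified)
```

For example, is_stale("Sun, 11 Jan 2009 14:03:43 GMT", {"Last-Modified": "Mon, 12 Jan 2009 08:00:00 GMT"}) returns True, because the server's copy is newer than the cached one.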
2009/1/12 Ilmari Karonen nospam@vyznev.net:
Daniel Schwen wrote:
84.114.164.84 - - [11/Jan/2009:14:03:43 +0000] "HEAD /%7Ekolossos/wp-world/umkreis.php?la=pt&lon=16.285&lat=48.134444&rang=50&map=1 HTTP/1.1" 200 0 "-" "Jakarta Commons-HttpClient/3.1"
84.114.164.84 - - [11/Jan/2009:14:03:49 +0000] "HEAD /~para/earth.php?latdegdec=48.134444&londegdec=16.285&scale=300000 HTTP/1.1" 301 0 "-" "Jakarta Commons-HttpClient/3.1"
84.114.164.84 - - [11/Jan/2009:14:03:49 +0000] "HEAD /~kolossos/wp-world/umkreis.php?la=nl&lon=16.285&lat=48.134444&rang=50&map=1 HTTP/1.1" 301 0 "-" "Jakarta Commons-HttpClient/3.1"
84.114.164.84 - - [11/Jan/2009:14:03:50 +0000] "HEAD /~kolossos/wp-world/umkreis.php?la=pt&lon=16.285&lat=48.134444&rang=50&map=1 HTTP/1.1" 301 0 "-" "Jakarta Commons-HttpClient/3.1"
What is going on here? Proxy?
...and why is it making HEAD requests?
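To get a quick picture of which client, method and user agent account for entries like those quoted above, the access log can be tallied with a few lines of Python. This is only a sketch: it assumes the log is in Apache's "combined" format, as the excerpt appears to be, and the names LOG_RE and summarize are made up for this example.

```python
import re
from collections import Counter

# One entry in Apache "combined" log format:
# ip ident user [time] "method path protocol" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Count log entries per (ip, method, user agent); skip lines that do not parse."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            counts[(m["ip"], m["method"], m["agent"])] += 1
    return counts


# Two of the entries quoted above, used as sample input.
sample = [
    '84.114.164.84 - - [11/Jan/2009:14:03:43 +0000] "HEAD /%7Ekolossos/wp-world/umkreis.php?la=pt&lon=16.285&lat=48.134444&rang=50&map=1 HTTP/1.1" 200 0 "-" "Jakarta Commons-HttpClient/3.1"',
    '84.114.164.84 - - [11/Jan/2009:14:03:49 +0000] "HEAD /~para/earth.php?latdegdec=48.134444&londegdec=16.285&scale=300000 HTTP/1.1" 301 0 "-" "Jakarta Commons-HttpClient/3.1"',
]

for key, n in summarize(sample).most_common():
    print(n, key)
```

Feeding it the whole access log instead of the sample list (for instance, the lines of an open file) would show whether this address only ever sends HEAD requests or also fetches page bodies.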
-- Ilmari Karonen
Toolserver-l mailing list Toolserver-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/toolserver-l