Have a look at
http://84.114.164.84:8080/
It seems to be a set of tools (crawler, parsers, indexers...) that
together provide search. In short, an experimental search engine. Or
maybe a wannabe commercial engine, given that a .net domain is registered:
http://www.paxle.net/
Funny fact, you seem to be able to pause the crawling process here:
http://84.114.164.84:8080/status#dcrawler . When I first reached the
page, all the processes were active, and I wasn't prompted for auth
when I asked to pause them, while trying to reach other parts of the
site prompts for authentication.
This tool seems to have a blacklist:
"org.paxle.filter.blacklist.impl.BlacklistFilter". If you're able to
reach the author, you can probably ask him to blacklist your tools.
Question is _how_ : I haven't been able to find an email or any
information on this.
I found a bugtracker which seems to be active:
https://bugs.pxl.li/my_view_page.php but I don't know if this is a
tracker for... the engine (which can be used by multiple hosts), or
for the project.
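
In the meantime, if you want to see how much of your traffic comes from
that host, here is a rough sketch in Python. It assumes your access log
uses the Apache combined format, as the log lines quoted further down in
this thread appear to, with each entry on a single line. The names
LOG_RE, summarise and sample are mine, purely for illustration.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# host ident user [time] "method path proto" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarise(lines, host):
    """Count (method, status) pairs for requests coming from `host`."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        # Lines that do not parse, or come from other hosts, are skipped.
        if m and m.group("host") == host:
            counts[(m.group("method"), m.group("status"))] += 1
    return counts


# Two of the entries from this thread, rejoined onto one line each.
sample = [
    '84.114.164.84 - - [11/Jan/2009:14:03:43 +0000] "HEAD '
    '/%7Ekolossos/wp-world/umkreis.php?la=pt&lon=16.285&lat=48.134444&rang=50&map=1 '
    'HTTP/1.1" 200 0 "-" "Jakarta Commons-HttpClient/3.1"',
    '84.114.164.84 - - [11/Jan/2009:14:03:49 +0000] "HEAD '
    '/~para/earth.php?latdegdec=48.134444&londegdec=16.285&scale=300000 '
    'HTTP/1.1" 301 0 "-" "Jakarta Commons-HttpClient/3.1"',
]

if __name__ == "__main__":
    for (method, status), n in sorted(summarise(sample, "84.114.164.84").items()):
        print(method, status, n)
```

To run it on a real log, pass an open file instead of sample, e.g.
summarise(open(path), "84.114.164.84"), where path is wherever your
access log lives.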
Good luck !
2009/1/12 RYU Cheol <rcheol(a)gmail.com>:
for caching?
HTTP caches request the time stamp for checking consistency.
-Cheol
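
For illustration only, this is roughly what a timestamp-based consistency
check with HEAD could look like in Python. Nothing here is confirmed to be
what the client at 84.114.164.84 actually does; it is a sketch of the
general idea, assuming the server sends a Last-Modified header. The
function names head_last_modified and is_stale are made up for the example.

```python
import http.client
from email.utils import parsedate_to_datetime


def head_last_modified(host, path, port=80):
    """Send a HEAD request and return the Last-Modified header, or None."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)  # headers only, no response body
        resp = conn.getresponse()
        return resp.getheader("Last-Modified")
    finally:
        conn.close()


def is_stale(cached_last_modified, current_last_modified):
    """Return True if the cached copy should be fetched again.

    Both arguments are HTTP date strings, or None if unknown.
    """
    if cached_last_modified is None or current_last_modified is None:
        return True  # no timestamp to compare, so assume stale
    cached = parsedate_to_datetime(cached_last_modified)
    current = parsedate_to_datetime(current_last_modified)
    return current > cached
```

A cache would call head_last_modified() for a URL it already holds and
only issue a full GET when is_stale() says the server copy is newer.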
2009/1/12 Ilmari Karonen <nospam(a)vyznev.net>:
Daniel Schwen wrote:
84.114.164.84 - - [11/Jan/2009:14:03:43 +0000] "HEAD /%7Ekolossos/wp-world/umkreis.php?la=pt&lon=16.285&lat=48.134444&rang=50&map=1 HTTP/1.1" 200 0 "-" "Jakarta Commons-HttpClient/3.1"
84.114.164.84 - - [11/Jan/2009:14:03:49 +0000] "HEAD /~para/earth.php?latdegdec=48.134444&londegdec=16.285&scale=300000 HTTP/1.1" 301 0 "-" "Jakarta Commons-HttpClient/3.1"
84.114.164.84 - - [11/Jan/2009:14:03:49 +0000] "HEAD /~kolossos/wp-world/umkreis.php?la=nl&lon=16.285&lat=48.134444&rang=50&map=1 HTTP/1.1" 301 0 "-" "Jakarta Commons-HttpClient/3.1"
84.114.164.84 - - [11/Jan/2009:14:03:50 +0000] "HEAD /~kolossos/wp-world/umkreis.php?la=pt&lon=16.285&lat=48.134444&rang=50&map=1 HTTP/1.1" 301 0 "-" "Jakarta Commons-HttpClient/3.1"
What is going on here? Proxy?
...and why is it making HEAD requests?
--
Ilmari Karonen
_______________________________________________
Toolserver-l mailing list
Toolserver-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
--
Nicolas Dumazet — NicDumZ [ nɪk.d̪ymz ]