While looking at the access statistics of the Query-to-map tool, I discovered that a large portion of the hits is generated by a single Austrian DSL IP: 90,000 hits yesterday (9% of all hits that day) and 55,000 hits so far today.
The queries look random and go to lots of tools. Example:
84.114.164.84 - - [11/Jan/2009:14:03:43 +0000] "HEAD /%7Ekolossos/wp-world/umkreis.php?la=pt&lon=16.285&lat=48.134444&rang=50&map=1 HTTP/1.1" 200 0 "-" "Jakarta Commons-HttpClient/3.1"
84.114.164.84 - - [11/Jan/2009:14:03:49 +0000] "HEAD /~para/earth.php?latdegdec=48.134444&londegdec=16.285&scale=300000 HTTP/1.1" 301 0 "-" "Jakarta Commons-HttpClient/3.1"
84.114.164.84 - - [11/Jan/2009:14:03:49 +0000] "HEAD /~kolossos/wp-world/umkreis.php?la=nl&lon=16.285&lat=48.134444&rang=50&map=1 HTTP/1.1" 301 0 "-" "Jakarta Commons-HttpClient/3.1"
84.114.164.84 - - [11/Jan/2009:14:03:50 +0000] "HEAD /~kolossos/wp-world/umkreis.php?la=pt&lon=16.285&lat=48.134444&rang=50&map=1 HTTP/1.1" 301 0 "-" "Jakarta Commons-HttpClient/3.1"
What is going on here? Proxy?
Daniel
P.S.: There is a webserver on that IP. It looks like a broken MediaWiki installation and a broken forum installation. No further info though.
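The per-IP tally described in this message can be reproduced with a short script. The following is a minimal sketch, assuming the access log is in the Apache "combined" format shown in the example lines; the function names and the extra `192.0.2.1` sample line are hypothetical and only there for illustration.

```python
import re
from collections import Counter

# Apache "combined" log format:
# IP, identity, user, [time], "request", status, size, "referer", "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def hits_per_ip(lines):
    """Count requests per client IP, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[match.group("ip")] += 1
    return counts


def share_of_hits(counts, ip):
    """Return the fraction of all counted hits that came from `ip`."""
    total = sum(counts.values())
    return counts[ip] / total if total else 0.0


# The first four lines are taken from the log excerpt above.
# The last line is a made-up request from a documentation-only address.
SAMPLE = [
    '84.114.164.84 - - [11/Jan/2009:14:03:43 +0000] "HEAD /%7Ekolossos/wp-world/umkreis.php?la=pt&lon=16.285&lat=48.134444&rang=50&map=1 HTTP/1.1" 200 0 "-" "Jakarta Commons-HttpClient/3.1"',
    '84.114.164.84 - - [11/Jan/2009:14:03:49 +0000] "HEAD /~para/earth.php?latdegdec=48.134444&londegdec=16.285&scale=300000 HTTP/1.1" 301 0 "-" "Jakarta Commons-HttpClient/3.1"',
    '84.114.164.84 - - [11/Jan/2009:14:03:49 +0000] "HEAD /~kolossos/wp-world/umkreis.php?la=nl&lon=16.285&lat=48.134444&rang=50&map=1 HTTP/1.1" 301 0 "-" "Jakarta Commons-HttpClient/3.1"',
    '84.114.164.84 - - [11/Jan/2009:14:03:50 +0000] "HEAD /~kolossos/wp-world/umkreis.php?la=pt&lon=16.285&lat=48.134444&rang=50&map=1 HTTP/1.1" 301 0 "-" "Jakarta Commons-HttpClient/3.1"',
    '192.0.2.1 - - [11/Jan/2009:14:03:51 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "ExampleBrowser/1.0"',
]

if __name__ == "__main__":
    counts = hits_per_ip(SAMPLE)
    for ip, n in counts.most_common():
        print(f"{ip}\t{n}\t{share_of_hits(counts, ip):.0%}")
```

On a real log, `hits_per_ip(open(path))` would work the same way, since the function only needs an iterable of lines.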
See if you can find out which ISP that IP address belongs to. I know TPG runs almost everyone's activity through proxies, and that setup is known to break (e.g. X-Forwarded-For is randomly disabled). I believe Internode also has a proxy for a few users, depending on their state.
-Peachey
Daniel Schwen wrote:
What is going on here? Proxy?
...and why is it making HEAD requests?
For caching?
HTTP caches request the timestamp to check consistency.
-Cheol
2009/1/12 Ilmari Karonen nospam@vyznev.net:
...and why is it making HEAD requests?
-- Ilmari Karonen
Toolserver-l mailing list Toolserver-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Have a look at http://84.114.164.84:8080/
It seems to be a set of tools (crawler, parsers, indexers, ...) that together provide search: in short, an experimental search engine. Or maybe a wannabe commercial engine, given that a .net domain is registered: http://www.paxle.net/
Fun fact: you seem to be able to pause the crawling process here: http://84.114.164.84:8080/status#dcrawler. When I first reached the page, all the processes were active, and I wasn't prompted for authentication when I asked to pause them, whereas other parts of the site do prompt for authentication.
This tool seems to have a blacklist: "org.paxle.filter.blacklist.impl.BlacklistFilter". If you're able to reach the author, you can probably ask him to blacklist your tools. The question is _how_: I haven't been able to find an email address or any other contact information.
I found a bug tracker which seems to be active: https://bugs.pxl.li/my_view_page.php but I don't know whether it is a tracker for the engine (which can be used by multiple hosts) or for the project.
Good luck!
On Mon, Jan 12, 2009 at 5:35 AM, Nicolas Dumazet nicdumz@gmail.com wrote:
It seems to be a set of tools: crawler, parsers, indexers... To allow a search. In short, an experimental search engine. Or a wannabe commercial engine maybe, given that a .net domain is registered: http://www.paxle.net/
...
This tool seems to have a blacklist: "org.paxle.filter.blacklist.impl.BlacklistFilter". If you're able to reach the author, you can probably ask him to blacklist your tools. Question is _how_: I haven't been able to find an email or any information on this.
I don't think contacting the author should be necessary. If the bot is obeying robots.txt and other relevant directives, use those to block it. If it's not obeying robots exclusion standards, it should be blocked site-wide with an informative error message.
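Both options can be sketched briefly. The example below is only an illustration under stated assumptions: the robots.txt token `ExampleCrawler` is a placeholder (the thread does not say which name, if any, the crawler honors), the wording of the error message is made up, and the helper names `build_parser` and `respond` are hypothetical. It uses Python's standard `urllib.robotparser` to check what such a robots.txt would permit, and a small function to show a site-wide block with an explanatory message.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. "ExampleCrawler" is a placeholder token,
# not the real name of the crawler discussed in this thread.
ROBOTS_TXT = """\
User-agent: ExampleCrawler
Disallow: /~kolossos/
Disallow: /~para/

User-agent: *
Disallow:
"""

# Addresses to block site-wide if the crawler ignores robots.txt.
BLOCKED_IPS = {"84.114.164.84"}


def build_parser(robots_txt):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def respond(remote_ip):
    """Return (status, body); blocked addresses get a 403 with an explanation."""
    if remote_ip in BLOCKED_IPS:
        return 403, (
            "Automated requests from this address are blocked. "
            "Please honor robots.txt and contact the site administrators."
        )
    return 200, "OK"


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    url = "/~kolossos/wp-world/umkreis.php?la=pt"
    print("ExampleCrawler allowed:", parser.can_fetch("ExampleCrawler/1.0", url))
    print("Other agent allowed:", parser.can_fetch("SomeBrowser/2.0", url))
    print(respond("84.114.164.84"))
```

In practice the block itself would normally live in the web server configuration rather than in application code; the function above only shows the intended behavior.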