On Saturday, 13 July 2019, Maarten Dammers <maarten(a)mdammers.nl> wrote:
In my day job I'm a network engineer. Nothing smaller than a /24 gets
routed on the internet. I would just do a quick and dirty approach: Ignore
the last octet. So cache based on /24. If you want something more elaborate,
you can vary the prefix length. An IPv4 address is 32 bits. A /24 says: the
network is 24 bits and the host part is 8 bits. So for a /23 it's 23 bits of
network and 9 bits of host. It always falls on a bit boundary, so a /24 always
runs from 0 (network) to 255 (broadcast). Just Google a bit to find posts like
2014/05/15/network-binary-math-explained . So comparison is very easy and
IPv4 is easy: you can just go with a bitmap and be done with it on a
decent PC. The problems are IPv6 and usage fragmentation, as described
How are you going to deal with providers that announce large chunks of IP
space (like a /13) that are used for all sorts of things? I assume you want
to use INET objects and not ROUTE objects? Be aware that mass harvesting of
databases like RIPE isn't allowed. Also, the quality of these objects differs
greatly depending on the LIR/country/RIR.
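
The "ignore the last octet" idea above can be sketched in a few lines of
Python. This is only an illustration, not anyone's actual bot code: the names
cache_key, whois_cache and lookup are made up here, and fetch stands in for
whatever WHOIS call the script really makes. The addresses are documentation
ranges, not real editors.

```python
import ipaddress

def cache_key(ip: str, prefix_len: int = 24) -> int:
    """Return the network part of an IPv4 address as an integer.

    A /24 keeps 24 network bits and drops 8 host bits;
    a /23 keeps 23 network bits and drops 9 host bits.
    """
    addr = int(ipaddress.IPv4Address(ip))
    host_bits = 32 - prefix_len
    return addr >> host_bits

# Hypothetical cache: network key -> WHOIS details.
whois_cache = {}

def lookup(ip, fetch):
    """Call fetch(ip) only for the first address seen in each /24."""
    key = cache_key(ip)
    if key not in whois_cache:
        whois_cache[key] = fetch(ip)
    return whois_cache[key]

# Two hosts in the same /24 share a key.
print(cache_key("192.0.2.10") == cache_key("192.0.2.200"))
# These two differ at /24 but fall into the same /23.
print(cache_key("198.51.100.1") == cache_key("198.51.101.1"))
print(cache_key("198.51.100.1", 23) == cache_key("198.51.101.1", 23))
```

A plain dict lookup is enough here because every address in a /24 maps to
the same key, so no range comparison is needed at all.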
I suspect the other APIs used in the script are going to split these
networks a lot, thus my concern with running a trie at Wikipedia scale.
Maybe there's a way to split the IPv6 space just enough to be feasible to
use a bitmap as well?
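
For what it's worth, here is a rough sketch of what the IPv4 bitmap could
look like if it tracks one bit per /24. The class name Slash24Bitmap is
invented for this example, and choosing /24 as the granularity is an
assumption taken from the suggestion earlier in the thread.

```python
import ipaddress

class Slash24Bitmap:
    """One bit per IPv4 /24: 2**24 bits, i.e. 2 MiB of memory."""

    def __init__(self):
        self.bits = bytearray(2 ** 24 // 8)

    @staticmethod
    def _index(ip):
        # Drop the 8 host bits; the remaining 24 bits identify the /24.
        return int(ipaddress.IPv4Address(ip)) >> 8

    def add(self, ip):
        i = self._index(ip)
        self.bits[i >> 3] |= 1 << (i & 7)

    def __contains__(self, ip):
        i = self._index(ip)
        return bool(self.bits[i >> 3] & (1 << (i & 7)))

seen = Slash24Bitmap()
seen.add("192.0.2.10")
print("192.0.2.99" in seen)   # same /24 as the address added above
print("192.0.3.1" in seen)    # a different /24
```

The same arithmetic shows why IPv6 is harder: one bit per /32 would already
take 2**32 bits (512 MiB), and one bit per /48 would take 2**48 bits
(32 TiB), so a flat bitmap only works if the IPv6 space is cut very coarsely.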
On 12-07-19 04:43, Huji Lee wrote:
I am working on a bot that fetches a list of anonymous editors on fawiki,
uses WHOIS to retrieve more info about that IP, and uses a number of online
APIs to check if the IP is a proxy or not.
I would like to improve the code by implementing a CIDR cache, so that if
I run whois on 22.214.171.124 and determine that its ASN range is 126.96.36.199/24 and
then I encounter 188.8.131.52 in the next iteration of my for loop, I would
quickly determine this IP also belongs to the same range and skip the WHOIS
part for it.
The search space would consist of IP ranges like "184.108.40.206 - 220.127.116.11"
(these are the beginning and end IP addresses of the 18.104.22.168/24 range).
Obviously, we can convert these IPs to hex to make comparisons easier.
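
As a small illustration of that conversion, Python's standard ipaddress
module can turn addresses into integers, which compare directly. The helper
name range_to_ints is made up for this sketch, and the addresses are
documentation examples rather than the ones from the bot.

```python
import ipaddress

def range_to_ints(first, last):
    """Convert the start and end of a range to comparable integers."""
    lo = int(ipaddress.ip_address(first))
    hi = int(ipaddress.ip_address(last))
    return lo, hi

lo, hi = range_to_ints("192.0.2.0", "192.0.2.255")
target = int(ipaddress.ip_address("192.0.2.77"))

print(hex(lo), hex(hi))      # 0xc0000200 0xc00002ff
print(lo <= target <= hi)    # membership is two integer comparisons

# A start/end pair can also be turned back into CIDR blocks.
blocks = list(ipaddress.summarize_address_range(
    ipaddress.ip_address("192.0.2.0"),
    ipaddress.ip_address("192.0.2.255"),
))
print(blocks)
```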
Given an IP like 22.214.171.124, we need the object to efficiently determine if it
already has an IP range that encompasses this given IP and if so, return
the previously cached details for that IP pair. If not, we will store that
The part that I am not fully clear about is the following: how can I avoid
having to loop through every range in the cache? Is there a way to create a
hash function that checks two inequality comparisons efficiently?
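
One possible way to avoid looping over every range, sketched here under the
assumption that cached ranges do not overlap: keep the range starts in a
sorted list and use binary search to find the only range that could contain
the address. A hash cannot answer two inequalities by itself, but a sorted
lookup takes O(log n) comparisons. The class RangeCache and its method names
are invented for this example.

```python
import bisect
import ipaddress

class RangeCache:
    """Cache of non-overlapping IP ranges, looked up by binary search."""

    def __init__(self):
        self._starts = []    # sorted range starts, as integers
        self._entries = []   # parallel list of (range end, details)

    def add(self, first, last, details):
        lo = int(ipaddress.ip_address(first))
        hi = int(ipaddress.ip_address(last))
        i = bisect.bisect_left(self._starts, lo)
        self._starts.insert(i, lo)
        self._entries.insert(i, (hi, details))

    def get(self, ip):
        x = int(ipaddress.ip_address(ip))
        # Rightmost range whose start is <= x is the only candidate.
        i = bisect.bisect_right(self._starts, x) - 1
        if i >= 0:
            hi, details = self._entries[i]
            if x <= hi:
                return details
        return None  # cache miss: caller would run WHOIS and call add()

cache = RangeCache()
cache.add("198.51.100.0", "198.51.100.255", {"note": "second range"})
cache.add("192.0.2.0", "192.0.2.255", {"note": "first range"})
print(cache.get("192.0.2.77"))
print(cache.get("203.0.113.5"))
```

Because IPv4 and IPv6 addresses both become plain integers here, it is
probably safest to keep one such cache per address family.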