You probably want to use a trie for this – I found several available Python implementations, but I don’t know what their advantages or disadvantages are, so I’ll just list them in alphabetical order:
I am working on a bot that fetches a list of anonymous editors on fawiki, uses WHOIS to retrieve more info about each editor's IP, and uses a number of online APIs to check whether the IP is a proxy.
I would like to improve the code by implementing a CIDR cache. If I run WHOIS on 188.8.131.52 and determine that its ASN range is 184.108.40.206/24, and then encounter 220.127.116.11 in the next iteration of my for loop, I could quickly determine that this IP belongs to the same range and skip the WHOIS lookup for it.
The search space would consist of IP ranges like "18.104.22.168 - 22.214.171.124" (these are the beginning and end IP addresses of the 126.96.36.199/24 range). Obviously, we can convert these IPs to hex to make comparisons easier. Given an IP like 188.8.131.52, the cache object needs to determine efficiently whether it already holds an IP range that encompasses this IP and, if so, return the previously cached details for that range. If not, we will run WHOIS and store the result in the cache.
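To make the range comparison concrete, here is a minimal sketch using the standard-library `ipaddress` module. The helper names `range_bounds` and `in_range` are made up for illustration, and 192.0.2.0/24 is a documentation-only network used as a stand-in, not one of the addresses above.

```python
import ipaddress


def range_bounds(cidr):
    """Return the first and last address of a CIDR block as plain integers."""
    # strict=False tolerates input whose host bits are set, e.g. "192.0.2.77/24".
    net = ipaddress.ip_network(cidr, strict=False)
    return int(net.network_address), int(net.broadcast_address)


def in_range(ip, cidr):
    """Check membership with two integer comparisons."""
    start, end = range_bounds(cidr)
    return start <= int(ipaddress.ip_address(ip)) <= end


# Illustrative values only (192.0.2.0/24 is reserved for documentation).
start, end = range_bounds("192.0.2.0/24")
print(hex(start), hex(end))
print(in_range("192.0.2.77", "192.0.2.0/24"))
```

Once the endpoints are integers, "hex" is only a display format; the comparisons themselves work on the integer values.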
The part that I am not fully clear about is the following: how can I avoid having to loop through every range in the cache? Is there a way to create a hash function that checks two inequality comparisons efficiently?
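One way to avoid scanning every range, sketched here under some assumptions: because a CIDR block is just a fixed run of leading bits, the cache can be a binary trie keyed on those bits, and a lookup then takes at most 32 steps for IPv4 no matter how many ranges are stored. The class name `CidrCache`, its methods, and the cached value shown are hypothetical, and this is not the API of any particular trie library.

```python
import ipaddress


class CidrCache:
    """Binary trie keyed on the leading bits of each cached network prefix."""

    def __init__(self):
        # One trie per IP version so IPv4 and IPv6 prefixes never collide.
        # Each node is a dict with optional keys 0, 1 (children) and "value".
        self._roots = {4: {}, 6: {}}

    @staticmethod
    def _bits(address_int, width, count):
        """Yield the `count` most significant bits of a `width`-bit integer."""
        for shift in range(width - 1, width - 1 - count, -1):
            yield (address_int >> shift) & 1

    def add(self, cidr, value):
        """Cache `value` (e.g. a parsed WHOIS result) for a whole CIDR block."""
        net = ipaddress.ip_network(cidr, strict=False)
        node = self._roots[net.version]
        for bit in self._bits(int(net.network_address),
                              net.max_prefixlen, net.prefixlen):
            node = node.setdefault(bit, {})
        node["value"] = value

    def lookup(self, ip):
        """Return the value of the most specific cached block containing `ip`,
        or None on a cache miss."""
        addr = ipaddress.ip_address(ip)
        node = self._roots[addr.version]
        best = node.get("value")
        for bit in self._bits(int(addr), addr.max_prefixlen, addr.max_prefixlen):
            node = node.get(bit)
            if node is None:
                break
            if "value" in node:
                best = node["value"]
        return best


# Illustrative usage with documentation-only addresses and a made-up value.
cache = CidrCache()
cache.add("192.0.2.0/24", {"asn": "AS64500"})
hit = cache.lookup("192.0.2.77")     # covered by the cached /24
miss = cache.lookup("198.51.100.1")  # not cached, so WHOIS would be needed
```

In the bot's loop, a miss would trigger the WHOIS call followed by `cache.add(...)` with the returned range. If the ranges never overlap, a sorted list of start addresses searched with `bisect` should work as well; the trie has the advantage that nested blocks resolve to the most specific match.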
_______________________________________________ pywikibot mailing list firstname.lastname@example.org https://lists.wikimedia.org/mailman/listinfo/pywikibot