@Lucas: cidr-trie was exactly what I wanted; thank you!
@Strainu: In our use case (fawiki), the number of distinct IPs that edit on any given day is small (usually a few hundred), so the memory footprint of a CIDR trie is minimal.
The code at [1] has been updated to include caching the way I wanted.
Thanks again,
Huji
[1] https://github.com/PersianWikipedia/fawikibot/blob/master/HujiBot/findproxy....
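
To illustrate the caching idea discussed above, here is a minimal sketch of a CIDR-keyed cache built as a plain binary trie. It is not the code at [1] and does not use the cidr-trie package's API; it relies only on Python's standard ipaddress module. The class name CidrCache, its methods, and the "proxy"/"clean" values are hypothetical, chosen only for this example.

```python
import ipaddress


class CidrCache:
    """Toy binary trie mapping CIDR ranges to cached values (hypothetical)."""

    def __init__(self):
        # One root per IP version so IPv4 and IPv6 prefixes never mix.
        self._roots = {4: {}, 6: {}}

    def insert(self, cidr, value):
        """Cache value for every address inside the given CIDR range."""
        net = ipaddress.ip_network(cidr, strict=False)
        node = self._roots[net.version]
        bits = int(net.network_address)
        width = net.max_prefixlen
        # Walk one trie level per prefix bit, most significant bit first.
        for i in range(net.prefixlen):
            bit = (bits >> (width - 1 - i)) & 1
            node = node.setdefault(bit, {})
        node["value"] = value

    def lookup(self, ip):
        """Return the value of the longest cached prefix covering ip, or None."""
        addr = ipaddress.ip_address(ip)
        node = self._roots[addr.version]
        bits = int(addr)
        width = addr.max_prefixlen
        best = node.get("value")
        for i in range(width):
            bit = (bits >> (width - 1 - i)) & 1
            node = node.get(bit)
            if node is None:
                break
            if "value" in node:
                best = node["value"]
        return best


# Example: remember a verdict for a whole range, then query single addresses.
cache = CidrCache()
cache.insert("192.0.2.0/24", "proxy")
cache.insert("192.0.2.128/25", "clean")
print(cache.lookup("192.0.2.5"))     # covered by the /24 entry
print(cache.lookup("192.0.2.200"))   # covered by the more specific /25 entry
print(cache.lookup("198.51.100.7"))  # not cached
```

With only a few hundred distinct IPs per day, such a trie holds at most a few thousand small dictionaries, which is in line with the minimal memory use described above.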
-----------
I'm very curious whether you can run at Wikipedia scale with such a trie in memory on a normal computer (e.g. with only tens of GiB of memory). Please let us know if you actually get this into production (or just submit the script for inclusion in the framework; it sounds really useful).
Strainu
On Friday, 12 July 2019, Lucas Werkmeister <mail at lucaswerkmeister.de> wrote:
> You probably want to use a trie <https://en.wikipedia.org/wiki/Trie> for
> this. I found several available Python implementations, but I don't know
> what their advantages or disadvantages are, so I'll just list them in
> alphabetical order:
>
> - cidr-trie <https://github.com/Figglewatts/cidr-trie>
> - py-radix <https://github.com/mjschultz/py-radix>
> - pysubnettree <https://github.com/zeek/pysubnettree>
> - pytricia <https://github.com/jsommers/pytricia>
>
> Cheers,
> Lucas
>
> On 12.07.19 04:43, Huji Lee wrote: