2009/4/16 Nicolas Dumazet nicdumz@gmail.com
I am currently running more systematic tests, I will let you know the results =)
Oh well. I don't know where I got that key length dependance idea. Performance is absolutely not related to length of keys, from what I saw today.
From this test, dict. __setitem__ seem to be 25% faster than set.add on
Python 2.5 :
from timeit import Timer import string, random
input = [] testSet = set() testDict = dict()
def initAdd(): for i in range(1000): input.append("".join(random.sample(string.letters, 40)))
def setAdd(): for e in input: testSet.add(e)
def dictAdd(): for e in input: testDict[e] = True
initAdd() s = Timer('setAdd()', 'from __main__ import setAdd').timeit() print int(s), 'seconds for set.add' d = Timer('dictAdd()', 'from __main__ import dictAdd').timeit() print int(d), 'seconds for dict.__setitem__'
notIn = [] def initContains(): for e in input: notIn.append('0' + e[1:])
def setContains(): for e in input: assert e in testSet for e in notIn: assert e not in testSet
def dictContains(): for e in input: assert e in testDict for e in notIn: assert e not in testDict
initContains() s = Timer('setContains()', 'from __main__ import setContains').timeit() print int(s), 'seconds for set.__contains__' d = Timer('dictContains()', 'from __main__ import dictContains').timeit() print int(d), 'seconds for dict.__contains__'
# 312 seconds for set.add # 224 seconds for dict.__setitem__
# 359 seconds for set.__contains__ # 362 seconds for dict.__contains__
As you can see, __contains__ shows the same performances between set and dict.
I tried on 3 different hardwares: different absolute values, same relative differences.
Asking around about that "slow set" rumor, I got told that set might have performed quite bad in Python 2.4 . Well, I tried, and 2.4 is not significantly slower than 2.5 on those operations (did the set improvements got backported to 2.5 ?)
Anyway, bottom line: * "bite your tongue a thousand times..." * set is mutable, and has not been immutable ever in Python history * dict.__setitem__ looks 25% faster than set.add * key length (strings, from 5 to 100 characters long) does not appear to affect run time * Python 2.4 & 2.5 perform equally well wrt set & dict operations