2009/4/16 Nicolas Dumazet <nicdumz(a)gmail.com>
I am currently running more systematic tests, I will let you know the
results =)
Oh well. I don't know where I got that key length dependance idea.
Performance is absolutely not related to length of keys, from what I saw
today.
From this test, dict. __setitem__ seem to be 25% faster
than set.add on
Python 2.5 :
from timeit import Timer
import string, random
input = []
testSet = set()
testDict = dict()
def initAdd():
for i in range(1000):
input.append("".join(random.sample(string.letters, 40)))
def setAdd():
for e in input:
testSet.add(e)
def dictAdd():
for e in input:
testDict[e] = True
initAdd()
s = Timer('setAdd()', 'from __main__ import setAdd').timeit()
print int(s), 'seconds for set.add'
d = Timer('dictAdd()', 'from __main__ import dictAdd').timeit()
print int(d), 'seconds for dict.__setitem__'
notIn = []
def initContains():
for e in input:
notIn.append('0' + e[1:])
def setContains():
for e in input:
assert e in testSet
for e in notIn:
assert e not in testSet
def dictContains():
for e in input:
assert e in testDict
for e in notIn:
assert e not in testDict
initContains()
s = Timer('setContains()', 'from __main__ import setContains').timeit()
print int(s), 'seconds for set.__contains__'
d = Timer('dictContains()', 'from __main__ import dictContains').timeit()
print int(d), 'seconds for dict.__contains__'
# 312 seconds for set.add
# 224 seconds for dict.__setitem__
# 359 seconds for set.__contains__
# 362 seconds for dict.__contains__
As you can see, __contains__ shows the same performances between set and
dict.
I tried on 3 different hardwares: different absolute values, same relative
differences.
Asking around about that "slow set" rumor, I got told that set might have
performed quite bad in Python 2.4 . Well, I tried, and 2.4 is not
significantly slower than 2.5 on those operations (did the set improvements
got backported to 2.5 ?)
Anyway, bottom line:
* "bite your tongue a thousand times..."
* set is mutable, and has not been immutable ever in Python history
* dict.__setitem__ looks 25% faster than set.add
* key length (strings, from 5 to 100 characters long) does not appear to
affect run time
* Python 2.4 & 2.5 perform equally well wrt set & dict operations
--
Nicolas Dumazet — NicDumZ [ nɪk.d̪ymz ]