2009/4/16 Nicolas Dumazet <nicdumz@gmail.com>

I am currently running more systematic tests, I will let you know the results =)


Oh well. I don't know where I got that key length dependance idea. Performance is absolutely not related to length of keys, from what I saw today.

From this test, dict. __setitem__ seem to be 25% faster than set.add on Python 2.5 :

from timeit import Timer
import string, random

input = []
testSet = set()
testDict = dict()

def initAdd():
    for i in range(1000):
        input.append("".join(random.sample(string.letters, 40)))

def setAdd():
    for e in input:
        testSet.add(e)

def dictAdd():
    for e in input:
        testDict[e] = True

initAdd()
s = Timer('setAdd()', 'from __main__ import setAdd').timeit()
print int(s), 'seconds for set.add'
d = Timer('dictAdd()', 'from __main__ import dictAdd').timeit()
print int(d), 'seconds for dict.__setitem__'

notIn = []
def initContains():
    for e in input:
        notIn.append('0' + e[1:])

def setContains():
    for e in input:
        assert e in testSet
    for e in notIn:
        assert e not in testSet

def dictContains():
    for e in input:
        assert e in testDict
    for e in notIn:
        assert e not in testDict

initContains()
s = Timer('setContains()', 'from __main__ import setContains').timeit()
print int(s), 'seconds for set.__contains__'
d = Timer('dictContains()', 'from __main__ import dictContains').timeit()
print int(d), 'seconds for dict.__contains__'

# 312 seconds for set.add
# 224 seconds for dict.__setitem__

# 359 seconds for set.__contains__
# 362 seconds for dict.__contains__

As you can see, __contains__ shows the same performances between set and dict.

I tried on 3 different hardwares: different absolute values, same relative differences.

Asking around about that "slow set" rumor, I got told that set might have performed quite bad in Python 2.4 . Well, I tried, and 2.4 is not significantly slower than 2.5 on those operations (did the set improvements got backported to 2.5 ?)



Anyway, bottom line:
* "bite your tongue a thousand times..."
* set is mutable, and has not been immutable ever in Python history
* dict.__setitem__ looks 25% faster than set.add
* key length (strings, from 5 to 100 characters long) does not appear to affect run time
* Python 2.4 & 2.5 perform equally well wrt set & dict operations

--
Nicolas Dumazet — NicDumZ [ nɪk.d̪ymz ]