No subject


Mon Jan 19 21:31:31 UTC 2009


from timeit import Timer
import string, random

input =3D []
testSet =3D set()
testDict =3D dict()

def initAdd():
    for i in range(1000):
        input.append("".join(random.sample(string.letters, 40)))

def setAdd():
    for e in input:
        testSet.add(e)

def dictAdd():
    for e in input:
        testDict[e] =3D True

initAdd()
s =3D Timer('setAdd()', 'from __main__ import setAdd').timeit()
print int(s), 'seconds for set.add'
d =3D Timer('dictAdd()', 'from __main__ import dictAdd').timeit()
print int(d), 'seconds for dict.__setitem__'

notIn =3D []
def initContains():
    for e in input:
        notIn.append('0' + e[1:])

def setContains():
    for e in input:
        assert e in testSet
    for e in notIn:
        assert e not in testSet

def dictContains():
    for e in input:
        assert e in testDict
    for e in notIn:
        assert e not in testDict

initContains()
s =3D Timer('setContains()', 'from __main__ import setContains').timeit()
print int(s), 'seconds for set.__contains__'
d =3D Timer('dictContains()', 'from __main__ import dictContains').timeit()
print int(d), 'seconds for dict.__contains__'

# 312 seconds for set.add
# 224 seconds for dict.__setitem__

# 359 seconds for set.__contains__
# 362 seconds for dict.__contains__

As you can see, __contains__ shows the same performances between set and
dict.

I tried on 3 different hardwares: different absolute values, same relative
differences.

Asking around about that "slow set" rumor, I got told that set might have
performed quite bad in Python 2.4 . Well, I tried, and 2.4 is not
significantly slower than 2.5 on those operations (did the set improvements
got backported to 2.5 ?)



Anyway, bottom line:
* "bite your tongue a thousand times..."
* set is mutable, and has not been immutable ever in Python history
* dict.__setitem__ looks 25% faster than set.add
* key length (strings, from 5 to 100 characters long) does not appear to
affect run time
* Python 2.4 & 2.5 perform equally well wrt set & dict operations

--=20
Nicolas Dumazet =E2=80=94 NicDumZ [ n=C9=AAk.d=CC=AAymz ]

--0015175cda86fd59e80467af5236
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<br><br><div class=3D"gmail_quote">2009/4/16 Nicolas Dumazet <span dir=3D"l=
tr">&lt;<a href=3D"mailto:nicdumz at gmail.com" target=3D"_blank">nicdumz at gmai=
l.com</a>&gt;</span><br><blockquote class=3D"gmail_quote" style=3D"border-l=
eft: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left:=
 1ex;">


<br>I am currently running more systematic tests, I will let you know the r=
esults =3D)</blockquote></div><br><br>Oh well. I don&#39;t know where I got=
 that key length dependance idea. Performance is absolutely not related to =
length of keys, from what I saw today.<br>


<br>From this test, dict. __setitem__ seem to be 25% faster than set.add on=
 Python 2.5 :<br><br>from timeit import Timer<br>import string, random<br><=
br>input =3D []<br>testSet =3D set()<br>testDict =3D dict()<br><br>def init=
Add():<br>


=C2=A0=C2=A0=C2=A0 for i in range(1000):<br>=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0 input.append(&quot;&quot;.join(random.sample(string.letters, 4=
0)))<br><br>def setAdd():<br>=C2=A0=C2=A0=C2=A0 for e in input:<br>=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 testSet.add(e)<br><br>def dictAdd():<b=
r>=C2=A0=C2=A0=C2=A0 for e in input:<br>


=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 testDict[e] =3D True<br><br>init=
Add()<br>s =3D Timer(&#39;setAdd()&#39;, &#39;from __main__ import setAdd&#=
39;).timeit()<br>print int(s), &#39;seconds for set.add&#39;<br>d =3D Timer=
(&#39;dictAdd()&#39;, &#39;from __main__ import dictAdd&#39;).timeit()<br>


print int(d), &#39;seconds for dict.__setitem__&#39;<br><br>notIn =3D []<br=
>def initContains():<br>=C2=A0=C2=A0=C2=A0 for e in input:<br>=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 notIn.append(&#39;0&#39; + e[1:])<br><br>def=
 setContains():<br>=C2=A0=C2=A0=C2=A0 for e in input:<br>=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0 assert e in testSet<br>


=C2=A0=C2=A0=C2=A0 for e in notIn:<br>=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0 assert e not in testSet<br><br>def dictContains():<br>=C2=A0=C2=A0=
=C2=A0 for e in input:<br>=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 assert=
 e in testDict<br>=C2=A0=C2=A0=C2=A0 for e in notIn:<br>=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0 assert e not in testDict<br><br>initContains()<br>


s =3D Timer(&#39;setContains()&#39;, &#39;from __main__ import setContains&=
#39;).timeit()<br>print int(s), &#39;seconds for set.__contains__&#39;<br>d=
 =3D Timer(&#39;dictContains()&#39;, &#39;from __main__ import dictContains=
&#39;).timeit()<br>


print int(d), &#39;seconds for dict.__contains__&#39;<br><br># 312 seconds =
for set.add<br># 224 seconds for dict.__setitem__<br><br># 359 seconds for =
set.__contains__<br># 362 seconds for dict.__contains__<br><br>As you can s=
ee, __contains__ shows the same performances between set and dict.<br>


<br>I tried on 3 different hardwares: different absolute values, same relat=
ive differences.<br><br>Asking around about that &quot;slow set&quot; rumor=
, I got told that set might have performed quite bad in Python 2.4 . Well, =
I tried, and 2.4 is not significantly slower than 2.5 on those operations (=
did the set improvements got backported to 2.5 ?)<br>


<br><br><br>Anyway, bottom line:<br>* &quot;bite your tongue a thousand tim=
es...&quot;<br>* set is mutable, and has not been immutable ever in Python =
history<br>* dict.__setitem__ looks 25% faster than set.add<br>* key length=
 (strings, from 5 to 100 characters long) does not appear to affect run tim=
e<br>

* Python 2.4 &amp; 2.5 perform equally well wrt set &amp; dict operations<b=
r clear=3D"all">
<br>-- <br>Nicolas Dumazet =E2=80=94 NicDumZ [ n=C9=AAk.d=CC=AAymz ]<br>

--0015175cda86fd59e80467af5236--



More information about the Pywikipedia-l mailing list