No subject
Mon Jan 19 21:31:31 UTC 2009
from timeit import Timer
import string, random
input =3D []
testSet =3D set()
testDict =3D dict()
def initAdd():
for i in range(1000):
input.append("".join(random.sample(string.letters, 40)))
def setAdd():
for e in input:
testSet.add(e)
def dictAdd():
for e in input:
testDict[e] =3D True
initAdd()
s =3D Timer('setAdd()', 'from __main__ import setAdd').timeit()
print int(s), 'seconds for set.add'
d =3D Timer('dictAdd()', 'from __main__ import dictAdd').timeit()
print int(d), 'seconds for dict.__setitem__'
notIn =3D []
def initContains():
for e in input:
notIn.append('0' + e[1:])
def setContains():
for e in input:
assert e in testSet
for e in notIn:
assert e not in testSet
def dictContains():
for e in input:
assert e in testDict
for e in notIn:
assert e not in testDict
initContains()
s =3D Timer('setContains()', 'from __main__ import setContains').timeit()
print int(s), 'seconds for set.__contains__'
d =3D Timer('dictContains()', 'from __main__ import dictContains').timeit()
print int(d), 'seconds for dict.__contains__'
# 312 seconds for set.add
# 224 seconds for dict.__setitem__
# 359 seconds for set.__contains__
# 362 seconds for dict.__contains__
As you can see, __contains__ shows the same performances between set and
dict.
I tried on 3 different hardwares: different absolute values, same relative
differences.
Asking around about that "slow set" rumor, I got told that set might have
performed quite bad in Python 2.4 . Well, I tried, and 2.4 is not
significantly slower than 2.5 on those operations (did the set improvements
got backported to 2.5 ?)
Anyway, bottom line:
* "bite your tongue a thousand times..."
* set is mutable, and has not been immutable ever in Python history
* dict.__setitem__ looks 25% faster than set.add
* key length (strings, from 5 to 100 characters long) does not appear to
affect run time
* Python 2.4 & 2.5 perform equally well wrt set & dict operations
--=20
Nicolas Dumazet =E2=80=94 NicDumZ [ n=C9=AAk.d=CC=AAymz ]
--0015175cda86fd59e80467af5236
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
<br><br><div class=3D"gmail_quote">2009/4/16 Nicolas Dumazet <span dir=3D"l=
tr"><<a href=3D"mailto:nicdumz at gmail.com" target=3D"_blank">nicdumz at gmai=
l.com</a>></span><br><blockquote class=3D"gmail_quote" style=3D"border-l=
eft: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left:=
1ex;">
<br>I am currently running more systematic tests, I will let you know the r=
esults =3D)</blockquote></div><br><br>Oh well. I don't know where I got=
that key length dependance idea. Performance is absolutely not related to =
length of keys, from what I saw today.<br>
<br>From this test, dict. __setitem__ seem to be 25% faster than set.add on=
Python 2.5 :<br><br>from timeit import Timer<br>import string, random<br><=
br>input =3D []<br>testSet =3D set()<br>testDict =3D dict()<br><br>def init=
Add():<br>
=C2=A0=C2=A0=C2=A0 for i in range(1000):<br>=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0 input.append("".join(random.sample(string.letters, 4=
0)))<br><br>def setAdd():<br>=C2=A0=C2=A0=C2=A0 for e in input:<br>=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 testSet.add(e)<br><br>def dictAdd():<b=
r>=C2=A0=C2=A0=C2=A0 for e in input:<br>
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 testDict[e] =3D True<br><br>init=
Add()<br>s =3D Timer('setAdd()', 'from __main__ import setAdd&#=
39;).timeit()<br>print int(s), 'seconds for set.add'<br>d =3D Timer=
('dictAdd()', 'from __main__ import dictAdd').timeit()<br>
print int(d), 'seconds for dict.__setitem__'<br><br>notIn =3D []<br=
>def initContains():<br>=C2=A0=C2=A0=C2=A0 for e in input:<br>=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 notIn.append('0' + e[1:])<br><br>def=
setContains():<br>=C2=A0=C2=A0=C2=A0 for e in input:<br>=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0 assert e in testSet<br>
=C2=A0=C2=A0=C2=A0 for e in notIn:<br>=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0 assert e not in testSet<br><br>def dictContains():<br>=C2=A0=C2=A0=
=C2=A0 for e in input:<br>=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 assert=
e in testDict<br>=C2=A0=C2=A0=C2=A0 for e in notIn:<br>=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0 assert e not in testDict<br><br>initContains()<br>
s =3D Timer('setContains()', 'from __main__ import setContains&=
#39;).timeit()<br>print int(s), 'seconds for set.__contains__'<br>d=
=3D Timer('dictContains()', 'from __main__ import dictContains=
').timeit()<br>
print int(d), 'seconds for dict.__contains__'<br><br># 312 seconds =
for set.add<br># 224 seconds for dict.__setitem__<br><br># 359 seconds for =
set.__contains__<br># 362 seconds for dict.__contains__<br><br>As you can s=
ee, __contains__ shows the same performances between set and dict.<br>
<br>I tried on 3 different hardwares: different absolute values, same relat=
ive differences.<br><br>Asking around about that "slow set" rumor=
, I got told that set might have performed quite bad in Python 2.4 . Well, =
I tried, and 2.4 is not significantly slower than 2.5 on those operations (=
did the set improvements got backported to 2.5 ?)<br>
<br><br><br>Anyway, bottom line:<br>* "bite your tongue a thousand tim=
es..."<br>* set is mutable, and has not been immutable ever in Python =
history<br>* dict.__setitem__ looks 25% faster than set.add<br>* key length=
(strings, from 5 to 100 characters long) does not appear to affect run tim=
e<br>
* Python 2.4 & 2.5 perform equally well wrt set & dict operations<b=
r clear=3D"all">
<br>-- <br>Nicolas Dumazet =E2=80=94 NicDumZ [ n=C9=AAk.d=CC=AAymz ]<br>
--0015175cda86fd59e80467af5236--
More information about the Pywikipedia-l
mailing list