Fastest Way To Sort A Python 3.7+ Dictionary
Now that the insertion order of Python dictionaries is guaranteed starting in Python 3.7 (and in CPython 3.6), what is the best/fastest way to sort a dictionary - both by value and
Solution 1:
TL;DR: Best ways to sort by key or by value (respectively), in CPython 3.7:
{k: d[k] for k in sorted(d)}
{k: v for k,v in sorted(d.items(), key=itemgetter(1))}
Tested on a macbook with sys.version
:
3.7.0b4 (v3.7.0b4:eb96c37699, May 22018, 04:13:13)
[Clang 6.0 (clang-600.0.57)]
One-time setup with a dict of 1000 floats:
>>>import random>>>from operator import itemgetter>>>random.seed(123)>>>d = {random.random(): random.random() for i inrange(1000)}
Sorting numbers by key (best to worst):
>>>%timeit {k: d[k] for k insorted(d)}
# 296 µs ± 2.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>>%timeit {k: d[k] for k insorted(d.keys())}
# 306 µs ± 9.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>>%timeit dict(sorted(d.items(), key=itemgetter(0)))
# 345 µs ± 4.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>>%timeit {k: v for k,v insorted(d.items(), key=itemgetter(0))}
# 359 µs ± 2.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>>%timeit dict(sorted(d.items(), key=lambda kv: kv[0]))
# 391 µs ± 8.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>>%timeit dict(sorted(d.items()))
# 409 µs ± 9.33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>>%timeit {k: v for k,v insorted(d.items())}
# 420 µs ± 5.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>>%timeit {k: v for k,v insorted(d.items(), key=lambda kv: kv[0])}
# 432 µs ± 39.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Sorting numbers by value (best to worst):
>>> %timeit {k: v for k,v insorted(d.items(), key=itemgetter(1))}
# 355 µs ± 2.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>>> %timeit dict(sorted(d.items(), key=itemgetter(1)))
# 375 µs ± 31.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>>> %timeit {k: v for k,v insorted(d.items(), key=lambda kv: kv[1])}
# 393 µs ± 1.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>>> %timeit dict(sorted(d.items(), key=lambda kv: kv[1]))
# 402 µs ± 9.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>>> %timeit {k: d[k] for k insorted(d, key=d.get)}
# 404 µs ± 3.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>>> %timeit {k: d[k] for k insorted(d, key=d.__getitem__)}
# 404 µs ± 20.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>>> %timeit {k: d[k] for k insorted(d, key=lambda k: d[k])}
# 480 µs ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
One-time setup with a large dict of strings:
>>>import random>>>from pathlib import Path>>>from operator import itemgetter>>>random.seed(456)>>>words = Path('/usr/share/dict/words').read_text().splitlines()>>>random.shuffle(words)>>>keys = words.copy()>>>random.shuffle(words)>>>values = words.copy()>>>d = dict(zip(keys, values))>>>list(d.items())[:5]
[('ragman', 'polemoscope'),
('fenite', 'anaesthetically'),
('pycnidiophore', 'Colubridae'),
('propagate', 'premiss'),
('postponable', 'Eriglossa')]
>>>len(d)
235886
Sorting a dict of strings by key:
>>>%timeit {k: d[k] for k insorted(d)}
# 387 ms ± 1.98 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>>%timeit {k: d[k] for k insorted(d.keys())}
# 387 ms ± 2.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>>%timeit dict(sorted(d.items(), key=itemgetter(0)))
# 461 ms ± 1.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>>%timeit dict(sorted(d.items(), key=lambda kv: kv[0]))
# 466 ms ± 2.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>>%timeit {k: v for k,v insorted(d.items(), key=itemgetter(0))}
# 488 ms ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>>%timeit {k: v for k,v insorted(d.items(), key=lambda kv: kv[0])}
# 536 ms ± 16.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>>%timeit dict(sorted(d.items()))
# 661 ms ± 9.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>>%timeit {k: v for k,v insorted(d.items())}
# 687 ms ± 5.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Sorting a dict of strings by value:
>>> %timeit {k: v for k,v insorted(d.items(), key=itemgetter(1))}
# 468 ms ± 5.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>>> %timeit dict(sorted(d.items(), key=itemgetter(1)))
# 473 ms ± 2.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>>> %timeit dict(sorted(d.items(), key=lambda kv: kv[1]))
# 492 ms ± 9.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>>> %timeit {k: v for k,v insorted(d.items(), key=lambda kv: kv[1])}
# 496 ms ± 1.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>>> %timeit {k: d[k] for k insorted(d, key=d.__getitem__)}
# 533 ms ± 5.33 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>>> %timeit {k: d[k] for k insorted(d, key=d.get)}
# 544 ms ± 6.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>>> %timeit {k: d[k] for k insorted(d, key=lambda k: d[k])}
# 566 ms ± 5.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Note: Real-world data often contains long runs of already-sorted sequences, which Timsort algorithm can exploit. If sorting a dict lies on your fast path, then it's recommended to benchmark on your own platform with your own typical data before drawing any conclusions about the best approach. I have prepended a comment character (#
) on each timeit result so that IPython users can copy/paste the entire code block to re-run all the tests on their own platform.
Post a Comment for "Fastest Way To Sort A Python 3.7+ Dictionary"