
Using More Worker Processes Than There Are Cores

This example from PyMOTW uses multiprocessing.Pool() with the processes argument (the number of worker processes) set to twice the number of cores on the machine.
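For reference, this is roughly the pattern in question (a minimal sketch rather than the exact PyMOTW listing; do_calculation and the toy workload are illustrative):

import multiprocessing


def do_calculation(data):
    return data * 2


if __name__ == '__main__':
    # twice as many worker processes as there are cores
    pool_size = multiprocessing.cpu_count() * 2
    with multiprocessing.Pool(processes=pool_size) as pool:
        results = pool.map(do_calculation, range(10))
    print(results)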

Solution 1:

Doing this can make sense if your job is not purely CPU-bound but also involves some I/O.

The computation in your example is also too short for a reasonable benchmark; the overhead of just creating the processes in the first place dominates.
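You can get a feel for that overhead with a quick check (a rough sketch; absolute numbers vary with the platform and start method):

import time
import multiprocessing


def noop():
    pass


if __name__ == '__main__':
    start = time.perf_counter()
    proc = multiprocessing.Process(target=noop)
    proc.start()
    proc.join()
    # spawning and joining a single do-nothing process already costs
    # a measurable fraction of a second on most systems
    print('spawn + join of one process:', time.perf_counter() - start)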

I modified your calculation to iterate over a range of 10M, evaluating an if-condition on each step and taking a short nap whenever it is True, which happens n_sleep times. That way a total sleep of sleep_sec_total can be injected into the computation.

# default_cpus.py
import time
import multiprocessing


def do_calculation(iterations, n_sleep, sleep_sec):
    # Busy-loop over the full range, sleeping n_sleep times in total
    # to simulate interspersed I/O waits.
    for i in range(iterations):
        if i % (iterations // n_sleep) == 0:
            time.sleep(sleep_sec)


def main(sleep_sec_total):
    iterations = int(10e6)
    n_sleep = 100
    sleep_sec = sleep_sec_total / n_sleep
    tasks = [(iterations, n_sleep, sleep_sec)] * 20

    # processes is not passed, so Pool defaults to os.cpu_count(),
    # i.e. one worker per core.
    with multiprocessing.Pool(
        maxtasksperchild=2,
    ) as pool:
        pool.starmap(do_calculation, tasks)

# double_cpus.py
...

def main(sleep_sec_total):
    iterations = int(10e6)
    n_sleep = 100
    sleep_sec = sleep_sec_total / n_sleep
    tasks = [(iterations, n_sleep, sleep_sec)] * 20

    # Identical setup, except with twice as many workers as cores.
    with multiprocessing.Pool(
        processes=multiprocessing.cpu_count() * 2,
        maxtasksperchild=2,
    ) as pool:
        pool.starmap(do_calculation, tasks)

I ran the benchmark with sleep_sec_total=0 (purely CPU-bound) and with sleep_sec_total=2 for both modules.

Results with sleep_sec_total=0:

$ python -m timeit -n 5 -r 3 'import default_cpus; default_cpus.main(0)'
5 loops, best of 3: 15.2 sec per loop

$ python -m timeit -n 5 -r 3 'import double_cpus; double_cpus.main(0)'
5 loops, best of 3: 15.2 sec per loop

Given a reasonably sized computation, you'll observe close to no difference between the default and doubled process counts for a purely CPU-bound task. Here, both tests happened to have the same best time.

Results with sleep_sec_total=2:

$ python -m timeit -n 5 -r 3 'import default_cpus; default_cpus.main(2)'
5 loops, best of 3: 20.5 sec per loop
$ python -m timeit -n 5 -r 3 'import double_cpus; double_cpus.main(2)'
5 loops, best of 3: 17.7 sec per loop

Now, with 2 seconds of sleep added as a stand-in for I/O, the picture looks different. Using twice as many processes gave a speedup of about 3 seconds compared to the default.


Solution 2:

If your task is I/O-bound (such as waiting for a database or a network service), then making more threads than there are processors actually increases your throughput.

This is because while a thread is waiting on I/O, the processor can do work on other threads.

If you have a CPU-heavy task, however, running more workers than there are cores will actually slow it down, due to the overhead of switching between them.
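A quick way to see the I/O-bound effect is a thread pool whose "I/O" is simulated with time.sleep (a minimal sketch; fake_io_task, the 0.1 s wait, and the 4x factor are illustrative assumptions, not from the original answer):

import os
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(_):
    # stand-in for waiting on a database or network service
    time.sleep(0.1)


def timed_run(n_workers, n_tasks=64):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(fake_io_task, range(n_tasks)))
    return time.perf_counter() - start


if __name__ == '__main__':
    cores = os.cpu_count() or 1
    # with sleeping tasks, more threads than cores finishes sooner,
    # because waiting threads don't occupy a core
    print('workers = cores    :', timed_run(cores))
    print('workers = 4 * cores:', timed_run(cores * 4))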

