Python threading and subprocesses explained




from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
from urllib.request import urlopen
from time import perf_counter

def work(n):
    # Fetch the first 32 bytes of the page; the fragment makes each URL distinct
    with urlopen(f"https://www.google.com/#{n}") as f:
        contents = f.read(32)
    return contents

def run_pool(pool_type):
    with pool_type() as pool:
        start = perf_counter()
        # map() returns a lazy iterator; consume it inside the timed block
        # so the measurement includes the actual work
        results = list(pool.map(work, numbers))
    print("Time:", perf_counter() - start)
    print(results)

if __name__ == '__main__':
    numbers = range(1, 16)

    # Run the task using a thread pool
    run_pool(ThreadPoolExecutor)

    # Run the task using a process pool
    run_pool(ProcessPoolExecutor)

How Python multiprocessing works

In the above example, the concurrent.futures module provides high-level pool objects for running work in threads (ThreadPoolExecutor) and processes (ProcessPoolExecutor). Both pool types have the same API, so you can create functions that work interchangeably with both, as the example shows.
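A minimal sketch of that interchangeability, using a hypothetical square function and the submit()/result() half of the Executor API rather than map():

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def square(n):
    return n * n

def run_with(executor_cls):
    # Both executor classes expose the same submit()/result() API,
    # so the calling code never needs to know which one it received.
    with executor_cls(max_workers=4) as pool:
        futures = [pool.submit(square, n) for n in range(5)]
        return [f.result() for f in futures]

if __name__ == "__main__":
    print(run_with(ThreadPoolExecutor))   # [0, 1, 4, 9, 16]
    print(run_with(ProcessPoolExecutor))  # [0, 1, 4, 9, 16]
```

submit() returns a Future per call, which is handy when jobs are heterogeneous; map() (as in the example above) is the simpler choice when every job runs the same function.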

We use run_pool to submit instances of the work function to each type of pool. By default, a ProcessPoolExecutor starts one worker process per available CPU core, while a ThreadPoolExecutor starts min(32, cpu_count + 4) threads. There's a certain amount of overhead associated with creating a pool, so don't overdo it: if you're going to process lots of jobs over a long period of time, create the pool first and don't dispose of it until you're done. Because Executor objects are context managers, you can create and dispose of a pool with a with/as block, as the example shows.
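A sketch of the reuse pattern described above, with an arbitrary worker count and a trivial double function standing in for real work:

```python
from concurrent.futures import ThreadPoolExecutor

def double(n):
    return n * 2

# Creating a pool has startup cost, so build it once and reuse it for as
# many batches of jobs as needed. max_workers=8 is an arbitrary choice
# for illustration.
pool = ThreadPoolExecutor(max_workers=8)
try:
    first = list(pool.map(double, range(4)))
    second = list(pool.map(double, range(4, 8)))   # same pool, second batch
finally:
    pool.shutdown(wait=True)   # what leaving a with-block does for you

print(first, second)  # [0, 2, 4, 6] [8, 10, 12, 14]
```

The explicit shutdown(wait=True) call is exactly what the with/as form does on exit; use the context-manager form when the pool's lifetime matches a single block of code.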

pool.map() is the function we use to subdivide the work. It takes a function and an iterable of arguments, applies the function to each argument, and returns the results in order. With a process pool, the arguments are split into chunks (you can set the chunk size with the chunksize parameter, but the default is generally fine) and each chunk is fed to a worker process.
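A small sketch of the chunksize parameter, using a hypothetical cube function; note that chunksize is honored by ProcessPoolExecutor but ignored by thread pools:

```python
from concurrent.futures import ProcessPoolExecutor

def cube(n):
    return n ** 3

def run(chunksize):
    with ProcessPoolExecutor() as pool:
        # Arguments are sent to workers in batches of `chunksize`,
        # which cuts inter-process messaging overhead for cheap jobs.
        return list(pool.map(cube, range(10), chunksize=chunksize))

if __name__ == "__main__":
    print(run(5))
```

Larger chunks mean less pickling and IPC traffic per job but coarser load balancing across workers, so chunksize mostly matters when you have many small, fast tasks.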
