from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
from urllib.request import urlopen
from time import perf_counter

def work(n):
    # Fetch the first 32 bytes of the page; the f-string fragment
    # makes each URL distinct per job
    with urlopen(f"https://www.google.com/#{n}") as f:
        contents = f.read(32)
        return contents

def run_pool(pool_type):
    with pool_type() as pool:
        start = perf_counter()
        # list() forces the lazy map iterator, so the timing
        # covers the actual work rather than just job submission
        results = list(pool.map(work, numbers))
        print("Time:", perf_counter() - start)
        print(results)

if __name__ == '__main__':
    numbers = list(range(1, 16))
    # Run the task using a thread pool
    run_pool(ThreadPoolExecutor)
    # Run the task using a process pool
    run_pool(ProcessPoolExecutor)
How Python multiprocessing works
In the above example, the concurrent.futures module provides high-level pool objects for running work in threads (ThreadPoolExecutor) and processes (ProcessPoolExecutor). Both pool types have the same API, so you can create functions that work interchangeably with both, as the example shows.
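Because the API is shared, anything written against the base Executor interface works with either pool. As a minimal sketch (not part of the original example) that reuses the work function defined above, the same jobs can also be dispatched with submit(), which returns one Future per call:

from concurrent.futures import ThreadPoolExecutor, as_completed

def run_pool_futures(pool_type):
    # submit() schedules a single call and returns a Future;
    # as_completed() yields each Future as soon as its result is ready
    with pool_type() as pool:
        futures = [pool.submit(work, n) for n in range(1, 16)]
        for future in as_completed(futures):
            print(future.result())

if __name__ == '__main__':
    run_pool_futures(ThreadPoolExecutor)  # a ProcessPoolExecutor works identically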
We use run_pool to submit instances of the work function to the different types of pools. By default, a ProcessPoolExecutor uses one worker process per available CPU core, and a ThreadPoolExecutor caps its worker count at min(32, cores + 4) in recent Python versions. There's a certain amount of overhead associated with creating pools, so don't overdo it. If you're going to be processing lots of jobs over a long period of time, create the pool first and don't dispose of it until you're done. Executor objects support the context manager protocol (with/as), so you can create and dispose of a pool with a with block.
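To make the reuse point concrete, here is a sketch (assuming the work function from the example and an arbitrary batch layout) that creates one long-lived pool, feeds it several batches, and shuts it down explicitly instead of using with/as:

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor()  # created once, reused for every batch

try:
    # Hypothetical batches of job arguments arriving over time
    for batch in ([1, 2, 3], [4, 5, 6], [7, 8, 9]):
        print(list(pool.map(work, batch)))
finally:
    # The explicit equivalent of leaving a with block
    pool.shutdown(wait=True)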
pool.map() is the function we use to subdivide the work. It takes a function and an iterable of arguments, applies the function to each item, splits the work into chunks (with a process pool you can specify the chunk size, but the default is generally fine), and feeds each chunk to a worker thread or process.
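As an illustration (the chunk size of 5 below is an arbitrary choice, and chunksize only has an effect with a process pool; thread pools ignore it):

from concurrent.futures import ProcessPoolExecutor

if __name__ == '__main__':
    with ProcessPoolExecutor() as pool:
        # Inputs are shipped to the worker processes five at a time,
        # cutting per-item inter-process communication overhead
        results = list(pool.map(work, range(1, 16), chunksize=5))
        print(results)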