Executing Parallel Tasks in Python with concurrent.futures

 Oct. 31, 2016

Recently I discovered for myself the concurrent.futures Python module. It is part of the standard library in Python 3, and for Python 2 it can be downloaded from PyPI. This module allows you to run multiple time-consuming tasks in parallel, in different threads or processes, and then collect the results of those tasks for later use.

However, in my opinion, the documentation and usage examples for concurrent.futures are over-complicated. Below is a very basic example of how to run a series of time-consuming tasks in parallel and collect the results of those tasks:

import concurrent.futures
import random
import time
from traceback import print_exc

random.seed()


def worker(i):
    """
    Some time-consuming task
    """
    time.sleep(1.0)
    # randint(0, 11) may return 0, so this division can raise
    # ZeroDivisionError; the caller handles that when collecting results
    return (i + random.randint(0, 11)) / random.randint(0, 11)


def get_results():
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        future_results = [executor.submit(worker, i) for i in range(8)]
        concurrent.futures.wait(future_results)
        for future in future_results:
            try:
                yield future.result()
            except Exception:
                print_exc()


if __name__ == '__main__':
    start = time.time()
    for result in get_results():
        print(result)
    print('Time elapsed: {}s'.format(time.time() - start))
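The sleep-based worker above mostly waits, so threads are a good fit; for CPU-bound work, the same submit/wait pattern can run on separate processes by swapping ThreadPoolExecutor for ProcessPoolExecutor. Below is a rough sketch with a made-up CPU-bound worker (note that the worker must be a module-level function so it can be pickled and sent to the child processes):

```python
import concurrent.futures


def cpu_worker(i):
    # A small CPU-bound stand-in task (just an illustration)
    return sum(range(i * 1000))


def get_results(executor_cls, n=8):
    # The same submit/wait pattern works with either executor class
    with executor_cls(max_workers=4) as executor:
        futures = [executor.submit(cpu_worker, i) for i in range(n)]
        concurrent.futures.wait(futures)
        return [f.result() for f in futures]


if __name__ == '__main__':
    # Only the executor class changes compared to the threaded version
    print(get_results(concurrent.futures.ProcessPoolExecutor))
```

Because processes do not share memory, arguments and results are pickled between the parent and the workers, which adds overhead; for tasks this small, threads would actually be faster.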

This example is pretty abstract. The worker function represents some long-running task, and the get_results function collects the results of multiple tasks run in parallel. The concurrent.futures.wait function blocks until all the tasks are completed, after which the results are extracted from the future objects with the yield statement. In this example, results are collected in the same order in which we submitted the tasks to the executor. This can be useful if you need to process, for example, a list of raw data and get the processed data back in the same order as the raw data. If the order in which we receive results does not matter, we can replace the concurrent.futures.wait call in get_results with as_completed, which yields each future as soon as its task finishes, without preserving order:

for future in concurrent.futures.as_completed(future_results):
    try:
        yield future.result()
    except Exception:
        print_exc()
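When each task takes a single argument and submission order should be preserved, the submit/wait pair can also be replaced with executor.map, which returns results lazily in input order. A minimal sketch with a simplified worker (unlike the per-future try/except above, map reraises a task's exception only when you reach that result while iterating):

```python
import concurrent.futures
import time


def worker(i):
    # Simplified stand-in for a time-consuming task
    time.sleep(0.1)
    return i * i


def get_results_map(n=8):
    # map() submits one task per item and yields results in input order
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        return list(executor.map(worker, range(n)))


if __name__ == '__main__':
    print(get_results_map())  # [0, 1, 4, 9, 16, 25, 36, 49]
```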
