When you first start looking into asynchronous processing in Python, you’ll come across two terms: threading and multiprocessing. The first part of this article, then, is about understanding what those terms mean and when you should use one over the other.
What is threading?
Threading is a way to overcome performance issues related to blocking calls in Python. Let’s use two examples here:
- An application which processes user input
- An application that pings a website every X seconds and determines if it is available or not
In both of these examples, we have blocking calls. In the first example, we sit and wait for the user to enter the data and in the second, we wait for the website to respond to our ping.
Threading does not use multiple cores of the processor. Rather, the interpreter switches between threads for us. Let’s work through an example. In the table below, we are pinging two websites: we start by pinging kodey.co.uk and, while we wait for its response, we ping example.com. In this case, example.com responded more quickly than kodey.co.uk, so it was fully processed first.
*(Table: timeline of the interleaved pings to www.kodey.co.uk and www.example.com, with example.com completing first.)*
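We can simulate that timeline in code. In this sketch, `time.sleep` stands in for the network wait, and the two response times are made-up values chosen so that example.com answers faster:

```python
import time
from threading import Thread

completed = []  # records the order in which the simulated pings finish

def ping(site, response_time):
    # Stand-in for a real network call: just wait for the response.
    time.sleep(response_time)
    completed.append(site)

# Hypothetical response times: example.com answers faster than kodey.co.uk.
t1 = Thread(target=ping, args=('www.kodey.co.uk', 0.8))
t2 = Thread(target=ping, args=('www.example.com', 0.2))

start = time.perf_counter()
t1.start()
t2.start()
t1.join()
t2.join()
elapsed = time.perf_counter() - start

print(completed)  # ['www.example.com', 'www.kodey.co.uk']
print(f'total: {elapsed:.2f}s')
```

Run sequentially, the two waits would take 1.0 seconds; threaded, the total is only as long as the slowest ping, because the waits overlap.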
Remember, threading is perfect for jobs with blocking calls, which are typically network, disk, or I/O related:
- Pinging or scraping a web page takes time to receive a response
- Running an OS command can take time to return
- Downloading data can take time
For CPU-intensive work, threading won’t help: CPython’s global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time. You should use multiprocessing instead.
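We can see the GIL’s effect with a small experiment: running a pure-CPU job twice sequentially versus in two threads. On a standard CPython build the two timings come out roughly equal, because the threads cannot actually execute bytecode in parallel (the workload size here is arbitrary):

```python
import time
from threading import Thread

def count_squares(n, out):
    # Pure CPU work - no blocking calls for the threads to overlap.
    total = 0
    for x in range(n):
        total += x ** 2
    out.append(total)

N = 2_000_000

# Sequential: run the job twice, one after the other.
seq_out = []
start = time.perf_counter()
count_squares(N, seq_out)
count_squares(N, seq_out)
seq_time = time.perf_counter() - start

# Threaded: run the same two jobs in two threads.
thr_out = []
t1 = Thread(target=count_squares, args=(N, thr_out))
t2 = Thread(target=count_squares, args=(N, thr_out))
start = time.perf_counter()
t1.start()
t2.start()
t1.join()
t2.join()
thr_time = time.perf_counter() - start

# Expect the two timings to be roughly equal on standard CPython.
print(f'sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s')
```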
Let’s look at some code. In the example below, I have defined a simple and a complex function. We use a thread here because we have a blocking call: waiting for the user to enter their name. We don’t want to stop complex_function from running just because we’re waiting on a user. So we use threads: while simple_function waits for input, the complex function can run.
```python
from threading import Thread

def simple_function():
    name = input('What is your name? ')
    print(name)

def complex_function():
    out = []
    for x in range(200000):
        out.append(x ** 2)
    print(out)

thread1 = Thread(target=simple_function)
thread2 = Thread(target=complex_function)
thread1.start()
thread2.start()

# Tell the main thread to wait for the two worker threads to complete before exiting.
thread1.join()
thread2.join()
```
Below is a neater way to write the same piece of code. Here we use the concurrent.futures module. We create a pool of threads (in this case, two) and submit our functions to the pool. We don’t need to call join() as we did above, because the with statement ensures all submitted work finishes before the block exits.
```python
from concurrent.futures import ThreadPoolExecutor

def simple_function():
    print('hello')

def complex_function():
    out = []
    for x in range(200000):
        out.append(x ** 2)
    print(out)

with ThreadPoolExecutor(max_workers=2) as pool:
    pool.submit(complex_function)
    pool.submit(simple_function)
```
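One extra benefit of the executor approach: submit returns a Future object, which lets you collect the function’s return value once it has finished. A minimal sketch (slow_add is a made-up function for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def slow_add(a, b):
    # Imagine this did something slow, like a network call.
    return a + b

with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(slow_add, 2, 3)

# result() blocks until the function has finished and
# re-raises any exception the function raised.
print(future.result())  # 5
```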
What is multiprocessing?
Multiprocessing gives us true parallelism in Python: each process has its own interpreter, so our work can run on more than one CPU core at the same time. In the code below, I have created two processes with the same target, meaning I want to run the same function on two cores at once. When I run it, running complex_function twice takes only slightly longer than running it once, because, of course, the two runs execute on separate cores.
```python
from multiprocessing import Process

def simple_function():
    print('hello')

def complex_function():
    out = []
    for x in range(200000):
        out.append(x ** 2)
    print(out)

# The __main__ guard is required on platforms that spawn new
# processes (e.g. Windows and macOS).
if __name__ == '__main__':
    process = Process(target=complex_function)
    process2 = Process(target=complex_function)
    process.start()
    process2.start()
    process.join()
    process2.join()
```
As with threading above, there are always neater ways to write the same code:
```python
from concurrent.futures import ProcessPoolExecutor

def simple_function():
    print('hello')

def complex_function():
    out = []
    for x in range(200000):
        out.append(x ** 2)
    print(out)

# Similar to before - much simpler code.
if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as pool:
        pool.submit(complex_function)
        pool.submit(complex_function)
```