Python’s zip function is incredibly useful. It takes one or more iterables (lists, tuples, dictionaries, etc.) and returns an iterator of tuples, where each tuple groups the items found at the same position in each input. Well, that’s as clear as mud, right? Let’s look at an example to clear things up.
Let’s imagine I have three lists like the ones below. These lists are related to one another: the items at index 0 in each list belong together. The UK’s capital is London and, in this fictitious example, it has a population of 1212 (the tube would be much nicer if that were true!).
countries = ['UK', 'USA', 'France']
capitals = ['London', 'Washington DC', 'Paris']
population = [1212, 452353, 3523532]
Now, I want to merge these lists to give me something like the below: a list of tuples. You can see that the relevant information about each country has been grouped together into a tuple within the list.
[('UK', 'London', 1212),
('USA', 'Washington DC', 452353),
('France', 'Paris', 3523532)]
This is pretty cool, because now we can loop over the list and process each tuple, knowing that the items within it are related. But how did we do it? It’s as simple as the below: we use the zip function, pass in the three iterables that we want to zip, and wrap the result in list to see the tuples. That’s it!
merged = zip(countries, capitals, population)
list(merged)
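To make the "loop over it and process each tuple" idea concrete, here is a minimal sketch using the same three lists. The variable names `summaries`, `people` and `leftover` are my own, chosen only for illustration. It also shows one thing that can trip you up: the object zip returns is a one-shot iterator, so once you have looped over it there is nothing left.

```python
countries = ['UK', 'USA', 'France']
capitals = ['London', 'Washington DC', 'Paris']
population = [1212, 452353, 3523532]

merged = zip(countries, capitals, population)

# Unpack each tuple into named variables as we loop.
summaries = []
for country, capital, people in merged:
    summaries.append(f"{capital} is the capital of {country} (population {people})")
print(summaries)

# A zip object is a one-shot iterator: after the loop it is exhausted.
leftover = list(merged)
print(leftover)  # []
```

If you need to go over the tuples more than once, store `list(merged)` in a variable first and loop over that instead.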
Something you’ll need to know is that zip stops as soon as the shortest iterable runs out. So if we had the below, where I have added an additional country (Germany) but haven’t added a related capital or population, that item will be excluded from the output list.
countries = ['UK', 'USA', 'France', 'Germany']
capitals = ['London', 'Washington DC', 'Paris']
population = [1212, 452353, 3523532]
We can overcome this by using zip_longest from the itertools module, as below. Here, the output includes Germany.
from itertools import zip_longest
countries = ['UK', 'USA', 'France', 'Germany']
capitals = ['London', 'Washington DC', 'Paris']
population = [1212, 452353, 3523532]
merged = zip_longest(countries, capitals, population)
list(merged)
The output is:
[('UK', 'London', 1212),
('USA', 'Washington DC', 452353),
('France', 'Paris', 3523532),
('Germany', None, None)]
You’ll see in the output above that we have None where there was no value. We can fill those gaps with a value of our choosing by passing the fillvalue argument:
from itertools import zip_longest
countries = ['UK', 'USA', 'France', 'Germany']
capitals = ['London', 'Washington DC', 'Paris']
population = [1212, 452353, 3523532]
merged = zip_longest(countries, capitals, population, fillvalue="555")
list(merged)
Now the output looks like this:
[('UK', 'London', 1212),
('USA', 'Washington DC', 452353),
('France', 'Paris', 3523532),
('Germany', '555', '555')]
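Notice that fillvalue is a single value used for every gap, which is why the string '555' ends up in the population column alongside integers. If you would rather have a different placeholder per column, one possible approach is to leave the fill as None and swap it out afterwards. This is only a sketch: the `defaults` tuple and its values are hypothetical, and it assumes None never appears as genuine data in your lists.

```python
from itertools import zip_longest

countries = ['UK', 'USA', 'France', 'Germany']
capitals = ['London', 'Washington DC', 'Paris']
population = [1212, 452353, 3523532]

# Hypothetical per-column placeholders, chosen only for illustration.
defaults = ('Unknown', 'Unknown', 0)

# Pad with None first, then replace each None with that column's default.
rows = [
    tuple(default if value is None else value
          for value, default in zip(row, defaults))
    for row in zip_longest(countries, capitals, population)
]
print(rows)
```

Here the last tuple becomes ('Germany', 'Unknown', 0), so the population column stays numeric.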