In which I reiterate how important it is to generate a proper appreciation for these essential tools.¶
If you have done any Python at all, you've probably come across the concepts of iterator and generator. They sound a little intimidating, but they're really very friendly. This post tries to clearly define these (and related) terms, and also show you how and when to use these guys. We'll start with iterator stuff, and end with generator stuff.
As always:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Defining Iterators¶
The terms "iterator" and "iterable" sometimes get used quite recklessly, so first let's clear up the terminology. To summarize an excellent blog post by Vincent Driessen:
- a container is any object that holds stuff and can be asked if it contains a certain element (think dict, list, set).
- an iterator is any object that remembers its state so it can produce the next value in a sequence when next() is called on it.
- an iterable is any object that can produce an iterator when iter() is called on it (most containers are iterable).
Some examples are in order at this point.
We can ask the list if it "contains" a certain element, thus a list is a container!
li = [1, 2, 3]
2 in li
We can make an iterator object by calling iter() on the list, thus the list is an iterable!
li_itr = iter(li)
type(li_itr)
We can call next() on li_itr, thus we confirm that li_itr really is an iterator.
next(li_itr), next(li_itr)
Fun with Iterators¶
What makes an object an iterator is having a defined behavior under the built-in next(). Each call of next() will return the next element of the iterator, starting with the first element and ending with raising StopIteration after all the elements have been pulled.
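To see the exhaustion behavior concretely, here's a minimal sketch: once the single element has been pulled, the next call to next() raises StopIteration.

```python
itr = iter([1])
print(next(itr))  # 1, the only element

try:
    next(itr)  # nothing left to pull
except StopIteration:
    print("iterator exhausted")
```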
for Loops¶
Think you've never used an iterator? Wrong, you've been using them like crazy. The for syntax that lets us loop with ease through container objects is secretly using the iterator protocol. This:
li = [1, 2, 3]
for elem in li:
    print(elem)
is actually implemented kinda like this:
itr = iter(li)
while True:
    try:
        print(next(itr))
    except StopIteration:
        break
In Python, you can call iter() on most container objects to get their equivalent iterator. Some maybe less obvious examples are:
dict_itr = iter({'one':1, 'two':2})
next(dict_itr)
str_itr = iter("alohamora")
next(str_itr)
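One thing worth knowing about the dict example above: the default dict iterator walks over the keys. If you want key/value pairs, iterate over .items() instead. A quick sketch:

```python
d = {'one': 1, 'two': 2}
print(list(iter(d)))          # the keys: ['one', 'two']
print(list(iter(d.items())))  # key/value pairs: [('one', 1), ('two', 2)]
```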
Making Your Own Iterators (SKIP IF YOU NEVER USE CLASSES)¶
We can make our own classes which are iterators simply by exposing the methods __next__ and __iter__. Calling next() on these guys will cause Python to search the class for a __next__ method. Let's make a class that defines objects which iterate through the integers:
class int_iter:
    def __init__(self, start_int):
        self.curr_value = start_int

    def __iter__(self):
        return self

    def __next__(self):
        temp = self.curr_value
        self.curr_value += 1  # Update this attribute to the next integer value
        return temp  # Return the previous "current value"
my_it = int_iter(10)
next(my_it), next(my_it)
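Because int_iter also defines __iter__, its instances work anywhere Python expects an iterable, for example in a for loop. A sketch (repeating the class so it stands alone; note we must break manually, since this iterator never raises StopIteration):

```python
class int_iter:
    def __init__(self, start_int):
        self.curr_value = start_int

    def __iter__(self):
        return self

    def __next__(self):
        temp = self.curr_value
        self.curr_value += 1
        return temp

# The for loop calls iter() and next() behind the scenes
for i in int_iter(10):
    if i >= 13:
        break  # without this the loop would run forever
    print(i)  # prints 10, 11, 12
```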
Itertools Module¶
The itertools module of the standard library implements lots of useful iterators and iterator-centric functionality - I'll just highlight some of the most useful stuff here. I like to use the import name it, but I don't know if that's a very common convention.
import itertools as it
Count indefinitely from some number by some increment with count()¶
cnt = it.count(start=3.26, step=0.2)
next(cnt), next(cnt)
Cycle indefinitely through a finite list with cycle().¶
cyc = it.cycle(["a", "b"])
next(cyc), next(cyc), next(cyc)
Chain iterators together with chain().¶
itr1 = iter("ab")
itr2 = iter([1, 2])
itr_chain = it.chain(itr1, itr2)
next(itr_chain), next(itr_chain), next(itr_chain)
Slice an iterator with islice()!¶
Normally iterators don't support the slice syntax, but itertools' islice implements this!
cnt = it.count(start=0.5, step=1)
cnt_slc = it.islice(cnt, 3, 6)
for elem in cnt_slc:
    print(elem)
You can also check out the full itertools docs, or this great intro post on itertools by Justin Duke.
Defining Generators¶
Now that you know about iterators, you're in a position to understand generators. Sadly, the usage of terms here is even more ambiguous. Allow me to set the record straight:
- a generator iterator is an object that can temporarily freeze its execution, but later pick up exactly where it left off. It is a true Python iterator!
- a generator function is a function that returns a generator iterator object. It is a true Python function!
- a generator is apparently whatever the hell you want it to be. (It gets used to mean both of the above.)
Generator functions are functions that return generator iterators, which are iterators. Use the term generator with reckless abandon to sow confusion and angst.
Write Generator Functions with yield¶
In Python you write a generator function just like a normal function, but you use the keyword yield instead of return. Each appearance of yield is a place where the resulting generator iterator will freeze and can later pick back up from that point. You don't explicitly return anything from a generator function; instead, calling it will implicitly return a generator iterator. Was that sufficiently confusing? Here's an example:
# Define a Generator Function
def gen_fun(val):
    yield val
    yield val*2
    yield "wheee!"
    # Nothing explicitly "returned"; a Generator Iterator will be implicitly returned though.

# Call the Generator Function to get a Generator Iterator
gen_it = gen_fun(5)
I called my generator function and captured the generator iterator it returned. Here (I think) is why this topic is so confusing to people: For a normal function you look at the definition to see what it will do. But for a generator function you look at the definition to see what the returned generator iterator will do. The generator function itself doesn't do anything, except return generator iterators.
The behavior of my iterator gen_it is defined by the yields inside the gen_fun definition. At each yield it will render up the yielded value and pause until next() is called again. If we hit the end of the function block then the game is up and the call to next() will raise StopIteration. I like the way Jeff Knupp puts it: yield rather than return implies a temporary passing back of control, with the expectation that the iterator will get to continue its work at some point.
next(gen_it)
for elem in gen_it:
    print(elem)
See, a generator iterator is just an iterator, so all of the stuff you learned above applies.
Generator Functions Can Produce Very Flexible Iterators¶
Generator functions can create more powerful and flexible iterators than just calling iter() on a container can. Here's something you couldn't ever get from iter(container).
def reverse(data):
    '''Yield elements of a container in reverse.'''
    idx = -1
    while True:
        yield data[idx]  # Here it will pause until next() is called again
        idx -= 1
gen_it = reverse("god")
next(gen_it), next(gen_it), next(gen_it)
A generator iterator acts just like a function whose complete state is preserved even while its execution is suspended. You may know that when a typical function executes, all its work happens on the stack (a special space in memory), and when that function returns or raises, all that space on the stack is reclaimed like the whole thing never happened. A generator iterator acts like a function that gets to stay on the stack until it's done yielding.
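Here's a small sketch of that state preservation: the local variable total and the position in the loop both survive each suspension, so every call to next() resumes exactly where the last one paused.

```python
def running_total():
    total = 0
    for x in [1, 2, 3]:
        total += x
        yield total  # pause here; `total` and the loop position are preserved

g = running_total()
print(next(g), next(g), next(g))  # running totals: 1 3 6
```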
Generator Functions can Construct An Ensemble of Iterators¶
Generator functions act like constructors for generator iterator objects. We can pass different values to the constructor in order to get back slightly different generator iterators. For example, suppose we want an iterator that yields the first $n$ non-negative integers, where $n$ can take any value.
def firstn(n):
    '''Yield first n non-negative integers.'''
    num = 0
    while num < n:
        yield num
        num += 1
gen_it = firstn(3)
list(gen_it)
gen_it = firstn(6)
list(gen_it)
Generator Expressions Instead of List Comps¶
Ever made a list, iterated through its elements once, and then never looked at it again? Now you know that a more memory-efficient approach would be to make a generator iterator that creates and yields those elements one by one. Python knows you love the simplicity of list comprehension syntax, so generator expression syntax lets you build a generator iterator as easily as you would build a list with a list comp. The syntax looks just like a list comprehension except it has parentheses instead of brackets!
# As a list comp
li = [n for n in range(100)]
sum(li)
# More efficiently as a generator expression
gen_it = (n for n in range(100))
sum(gen_it)
Generator expressions are preferable when you need the list only once as an intermediary (see here). The generator expression approach requires less contiguous memory and is faster. As a neat aside: a list comprehension is actually just a generator expression wrapped inside a list constructor!
%timeit sum(n for n in range(1000000)) # Here we can omit the parentheses for the gen. expr.
%timeit sum([n for n in range(1000000)])
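To make the memory claim concrete, a quick sketch using sys.getsizeof: the list has to store all its elements up front, while the generator object is a small fixed-size wrapper no matter how long the sequence is (exact byte counts vary by Python version).

```python
import sys

li = [n for n in range(100000)]
gen = (n for n in range(100000))
print(sys.getsizeof(li))   # hundreds of kilobytes
print(sys.getsizeof(gen))  # a couple hundred bytes, regardless of the range size
```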
There are still cases where a list comprehension is preferable, say if you need sequence operations like slicing or reversed (note that sorted actually accepts any iterable). For instance, with a real list you can do:
[x**2 for x in range(0, 10, 1)][2:4]
Of course, if we make use of itertools' islice function then we can get a generator that is a slice of another generator:
slice_it = it.islice((x**2 for x in range(0, 10, 1)), 2, 4)
next(slice_it), next(slice_it)
Practice Problems¶
Make an iterator from a list, and get the first value from that iterator.¶
li = [1, 2, 3]
li_it = iter(li)
firstval = next(li_it)
Calculate the sum of the first 100 integers that are divisible by 7. Don't build a list in memory. You can use range().¶
sum(x for x in range(1, 701) if x % 7 == 0)  # 7, 14, ..., 700 -- exactly 100 multiples
Write a generator function that outputs "random walk" generator iterators with any given starting point. Specifically, each iterator should produce values that are a random walk on the integers starting from the specified start point and drawing a "step" from the uniform distribution on [-5, 5]. Use numpy.random.randint().¶
import numpy as np

def gen_fun_randwalk(start):
    x = start
    while True:
        step = np.random.randint(low=-5, high=6, size=1)[0]  # Uniform draw from -5..5
        x += step
        yield x
Construct a random walker generator that starts at 0. Then make a second generator that yields the first million values of that random walker. Then construct a list from said generator and plot it.¶
mywalk = gen_fun_randwalk(0) # Random walk generator that starts at 0
walk_slc = it.islice(mywalk, 0, 1000000) # Slice of the random walk generator that yields first 1 mill values
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(list(walk_slc))
Make a fresh generator slice as in the above, and from it make a second generator that yields only the positive numbers in those first million.¶
mywalk = gen_fun_randwalk(0) # Random walk generator that starts at 0
walk_slc = it.islice(mywalk, 0, 1000000) # Slice of the random walk generator that yields first 1 mill values
pos_walk = (x for x in walk_slc if x > 0)
Alternatively, use itertools:
mywalk = gen_fun_randwalk(0) # Random walk generator that starts at 0
walk_slc = it.islice(mywalk, 0, 1000000) # Slice of the random walk generator that yields first 1 mill values
pos_walk = it.filterfalse(lambda x: x <= 0, walk_slc)  # filterfalse drops elements where the predicate is True, keeping only x > 0
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(list(pos_walk))
Make a generator that yields filenames that match a certain pattern.¶
See this awesome collection of sys admin Python snippets. You should use os.walk and os.path.join from the os module, and filter from the fnmatch module for this.
import os
import fnmatch

def gen_file_find(filepat, top):
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)
Simulating a fair coin with an unfair one.¶
See Sahand Saba's post. Given an unfair coin which flips as heads with probability $0 < p < 1$, we can come up with an algorithm that flips a fair coin ($p=0.5$). Consider a pair of unfair coin flips, with the set of possible results being $\{HH,HT,TH,TT\}$. Both $HT$ and $TH$ have probability of exactly $p(1-p)$, which means we can map one to HEADS and the other to TAILS to get a fair coin. Write a generator that generates fair flips, using only numpy.random.binomial(n=1, p=p) with an unfair coin.
from numpy.random import binomial

def flip_fair(p):
    outcomes = ["H", "T"]
    while True:
        x, y = binomial(n=1, p=p), binomial(n=1, p=p)  # Flip two unfair coins
        if x != y:  # Proceed only if we are in the HT or TH case
            yield outcomes[x]  # Half the time x will be 0, half the time 1
flips = it.islice(flip_fair(0.1), 0, 10)
list(flips)
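As a quick empirical sanity check (a sketch that repeats the definition above so it runs standalone): even with a heavily biased coin, the heads and tails counts should come out roughly equal.

```python
import itertools as it
from collections import Counter
from numpy.random import binomial

def flip_fair(p):
    outcomes = ["H", "T"]
    while True:
        x, y = binomial(n=1, p=p), binomial(n=1, p=p)  # Two biased flips
        if x != y:  # Keep only the HT / TH cases
            yield outcomes[x]

counts = Counter(it.islice(flip_fair(0.1), 2000))
print(counts)  # "H" and "T" should each appear close to 1000 times
```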
Generator of Cartesian Coordinates in $R^3$¶
I want a generator that yields the coordinates of points on a 3D grid (these will be 3-tuples). My $x$, $y$ and $z$ coordinates are each in the set $\{0, 1, 2\}$ (this is a small grid!). Take a look at the itertools official docs to see if it.product can help with this. Bonus points if you can use it.tee.
x = iter([0, 1, 2])
xyz = it.tee(x, 3)
xyz
pt_gen = it.product(*xyz)
next(pt_gen), next(pt_gen), next(pt_gen)
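If all three coordinates draw from the same set, a simpler alternative (a sketch) is product's repeat keyword, which avoids tee entirely:

```python
import itertools as it

pt_gen = it.product([0, 1, 2], repeat=3)  # Same as product(s, s, s) for s = [0, 1, 2]
print(next(pt_gen))  # (0, 0, 0)
print(next(pt_gen))  # (0, 0, 1) -- the last coordinate advances fastest
```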