Iterators and generators are among my favorite programming tools, and some of the most powerful. These constructs enable us to write cleaner, more flexible, and higher-performance code, making them an invaluable addition to any programmer's toolbox. They're also an elegant means of working with large and potentially infinite data structures, which comes in handy for data science. However, they can be some of the more perplexing concepts to grasp at first.
This article aims to deliver a gentle but in-depth introduction to iterators and generators in Python, although they're prevalent in other languages too. Nevertheless, in order to appreciate generators, we need to first have a good handle on iterators. And to understand iterators, we need to start with iterables.
Table of contents
- Unraveling the secrets of a for loop
- Putting theory into practice
- Constructing custom iterators
- Infinite iterators
- Why iterators are so powerful
1. Unraveling the secrets of a for loop
What are iterables?
Most programmers have worked with an iterable before, even if they weren't consciously aware of doing so. An iterable is an object whose contents can be traversed or looped over. For example, when implementing a for loop (for item in obj: ...), any object that can take the place of obj is an iterable. Many Python containers are examples of iterables, including lists, tuples, dicts, and sets; pandas DataFrames and open file objects are also iterables, since we can loop over them. All iterables share a common thread: they implement the __iter__() method, which we'll get to in the next section.
What are iterators?
An iterable provides the for loop its contents but on its own isn't very useful; an iterable can't tell the loop how to traverse or iterate over its contents, or what the next item is. Therefore, upon initiating, a for loop cleverly calls iter(obj) behind the scenes, which in turn calls obj.__iter__(). If obj doesn't support iteration or sport the __iter__() method (meaning it's not an iterable), a TypeError is raised. Otherwise, obj.__iter__() hands the loop an iterator; an iterator is an object that knows how to perform the iteration and determine what the next item is. Logically, this means an iterable is any object that, when passed to iter(), is capable of producing an iterator.
How does the for loop actually employ the iterator to perform the iteration and identify the next item? The answer is that after retrieving an iterator, the loop secretly calls next(iterator), which in turn calls iterator.__next__() (don't worry, every iterator implements this method by definition). __next__() holds the instructions that determine the subsequent item, which is then handed to the loop. During each cycle, the loop secretly calls next(iterator) again; the iterator "remembers" its state, and rather than yielding the first item again, it resumes where it left off.
Depending on the iterable, its iterator may use a different methodology for traversing the contents and identifying the next item. Here are the traversal rules for some common iterables:
- tuple - loop over each element
- dict - loop over each key (insertion order since Python 3.7; arbitrary order before that)
- frozenset - loop over each element (arbitrary order)
- str - loop over each "character"
- open file object - loop over each line
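To see these traversal rules in action, here's a quick sketch (the sample values are our own, not from the article):

```python
# Each iterable type hands its iterator different "next item" instructions
print(list(iter((4, 8, 15))))        # tuple -> each element
print(list(iter({'a': 1, 'b': 2})))  # dict -> each key (insertion order in 3.7+)
print(list(iter('hey')))             # str -> each "character"
```

Wrapping each iterator in list() simply drains it so we can see every item it would yield.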
In a nutshell, the for loop first passes an iterable to iter() to produce an iterator, and then in each cycle it implicitly calls next(iterator) to determine the subsequent item, just one item per cycle. Ha, we now know the inner workings of a for loop!
2. Putting theory into practice
The best way to make sense of iterables and iterators is to experiment with them. Let's put what we've learned to the test and start by creating an iterable.
```python
# My favorite list of numbers
myIterable = [4, 8, 15, 16, 23, 42]
```
Now that we have an iterable, we need instructions for traversing its contents, so let's produce an iterator from myIterable.
```python
# Produce an iterator from myIterable
myIterator = iter(myIterable)

# Confirm that we produced an iterator
type(myIterator)
```

list_iterator
We could call next(myIterator) and explore what happens, but before jumping in, what would happen if we used an iterator in a for loop instead of an iterable? Will the loop raise a TypeError when it calls iter(myIterator)? Let's find out.
```python
# Test myIterator in a for loop
for item in myIterator:
    print(item, end=' ')
```
4 8 15 16 23 42
Surprise! It looks like everything went smoothly. The reason is that myIterator, or any iterator produced via iter(), implements its own __iter__() method to satisfy the for loop. In fact, calling an iterator's __iter__() method returns self. Simply put, if we produce an iterator from an iterable and then pass this iterator to iter() (thus calling its __iter__() method), we get back the same iterator. We can quickly verify this.
```python
iter(myIterator) == myIterator
```

True
We've already mentioned that an iterable is any object that, when passed to iter(), is capable of producing an iterator. According to this definition, myIterator must also be an iterable, which explains why it plays nicely with the for loop.
Traversing contents manually
Now let's continue where we left off and verify that calling next(myIterator) yields the subsequent item one at a time.
```python
# Yield the subsequent item
next(myIterator)
```
```
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-5-4b59040f9a4a> in <module>()
      1 # Yield the subsequent item
----> 2 next(myIterator)

StopIteration:
```
Oh no, what happened? Well, the for loop from earlier had implicitly called next(myIterator) over and over until the iterator yielded its last item; the loop exhausted the iterator. When we attempted to manually call next(myIterator) afterwards, a StopIteration exception was raised to tell us myIterator has no more items left to yield. Unbeknownst to us, the loop also ran into the same StopIteration exception after exhausting the iterator; for loops gracefully handle and conceal this exception, and know when to stop calling next().
Once an iterator is exhausted, it's practically useless. However, we can "restock" the contents by recreating the iterator, enabling it to yield items again, starting with the first one.
```python
# Produce another iterator from myIterable
myIterator = iter(myIterable)

# Yield the subsequent item
next(myIterator)
```

4
myIterator has been "restocked" and next() does its job. Let's keep going and confirm a StopIteration exception is raised if the iterator is exhausted again.
```
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-12-dc04b44d6581> in <module>()
----> 1 next(myIterator)

StopIteration:
```
As expected, after again exhausting the iterator, we run into another StopIteration exception when attempting to call next().
Revealing the true identity of a for loop
Gathering everything we've learned so far about how iterables and iterators work, we can construct a while loop that emulates any for item in obj: expression loop.
```python
# Produce an iterator assuming obj is an iterable
obj_iter = iter(obj)
while True:
    try:
        # Yield the subsequent item
        item = next(obj_iter)
        # Perform expression
    # End the loop if a StopIteration exception is raised
    except StopIteration:
        break
```
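As a quick sanity check, here's the same emulation run against our favorite list, with "collect each item into result" standing in for the loop-body expression (the result list is our own addition):

```python
# Emulate `for item in obj: result.append(item)` using only iter() and next()
obj = [4, 8, 15, 16, 23, 42]
obj_iter = iter(obj)
result = []
while True:
    try:
        item = next(obj_iter)
        result.append(item)  # stand-in for the loop body expression
    except StopIteration:
        # The iterator is exhausted, so end the loop
        break
print(result)  # [4, 8, 15, 16, 23, 42]
```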
We've divulged that the for loop in Python is actually just a special case of a while loop! We've also shown that an iterator yields the subsequent item only when passed to next(), either manually or implicitly within a for loop; otherwise, the iterator just...sits there. Pretty lazy, no? That's actually the correct terminology for this behavior: lazy evaluation. We can think of an iterator as an idle conveyor belt in a factory that churns out a single product when switched on, but then automatically shuts off. Fortunately, the conveyor belt knows where it left off each time and never runs backward.
3. Constructing custom iterators
So far we know only one way to produce iterators: by passing an iterable to iter(). However, we've determined that the building blocks of an iterator are the __iter__() and __next__() methods; we can use this information to construct a class that produces our very own iterators! But before getting fancy, let's build one that produces iterators similar to the one we're already familiar with:
```python
class TheNumbersMaker():
    def __init__(self):
        # Instantiate the iterator using our favorite list of numbers
        self.contents = [4, 8, 15, 16, 23, 42]
        # An iterator needs a way to hold state and recall where it left off.
        # We can keep track of which index we're on using self.curr_index.
        self.curr_index = 0

    def __iter__(self):
        # Recall that in order for an iterator to work with a for loop, the
        # loop must be able to call this method. Technically, we don't need
        # this method if we only want to traverse the iterator manually, but
        # why restrict ourselves?
        return self

    def __next__(self):
        # This method provides the instructions needed to determine the
        # subsequent item: move down the contents of the list by one index
        # and update the state each time. Upon reaching the end, raise an
        # exception to indicate the iterator is exhausted.
        if self.curr_index < len(self.contents):
            curr_index = self.curr_index
            self.curr_index += 1
            return self.contents[curr_index]
        else:
            raise StopIteration()
```
This class is known as an iterator protocol: everything required to produce a custom iterator. Let's use it to produce a "clone" of myIterator and test it out.
```python
# Produce an iterator using our iterator protocol
myIteratorClone = TheNumbersMaker()

# Yield all items and exhaust the iterator
for item in myIteratorClone:
    print(item, end=' ')
```
4 8 15 16 23 42
```python
# Confirm that a StopIteration exception is raised when calling next() on an
# exhausted iterator
next(myIteratorClone)
```
```
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-15-386b0a8c8ce3> in <module>()
      1 # Confirm that an exception is raised when calling next() after the iterator is
      2 # exhausted
----> 3 next(myIteratorClone)

<ipython-input-13-9aafb5109040> in __next__(self)
     25             return self.contents[curr_index]
     26         else:
---> 27             raise StopIteration()

StopIteration:
```
Our custom iterator works like a charm! Now that we know how to build our own iterator, we could make more complex ones by changing the iterator protocol. Perhaps we'd like to produce an iterator from a list of our choosing; we'd just need to add a contents parameter to __init__() and assign it to self.contents. Or what if instead we wanted to yield every third item?
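For instance, the first tweak might look like the sketch below; ListMaker and its contents parameter are our own names for illustration, not part of the article's code:

```python
class ListMaker:
    """Like TheNumbersMaker, but iterates over a list of our choosing."""
    def __init__(self, contents):
        self.contents = contents  # user-supplied list instead of a hard-coded one
        self.curr_index = 0       # state: which index we're on

    def __iter__(self):
        return self

    def __next__(self):
        if self.curr_index < len(self.contents):
            curr_index = self.curr_index
            self.curr_index += 1
            return self.contents[curr_index]
        raise StopIteration()

print(list(ListMaker(['red', 'green', 'blue'])))  # ['red', 'green', 'blue']
```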
4. Infinite iterators
Every iterator we've produced thus far is finite; each can eventually be exhausted, either manually or using a for loop. However, we can amend the iterator protocol to produce iterators that never stop yielding items. Let's investigate by constructing an iterator protocol that produces one of these uncanny-sounding iterators: one that yields Fibonacci numbers. The instructions for yielding the nth Fibonacci number are:
F(n) = F(n-1) + F(n-2), where F(0) = 0 and F(1) = 1. This translates to: starting with 0 and 1, the next Fibonacci number is the sum of the previous two, leading to the sequence 0, 1, 1, 2, 3, 5, 8, 13, 21...
```python
class FibNumMaker():
    def __init__(self):
        # Instantiate the iterator with 0 and 1 (the first two Fibonacci
        # numbers). Since the next Fibonacci number depends on the previous
        # two numbers, we need to keep track of both the current and previous
        # states for this iterator, using self.curr and self.last,
        # respectively.
        self.last, self.curr = 0, 1

    def __iter__(self):
        # Enables the iterator to be compatible with for loops
        return self

    def __next__(self):
        # Instructions for determining and yielding the subsequent Fibonacci
        # number
        number = self.curr
        self.curr += self.last
        self.last = number
        return number
```
Let's produce an iterator using this iterator protocol and explore what happens as we manually attempt to yield items.
```python
# Produce an iterator that yields Fibonacci numbers
fib_seq = FibNumMaker()
```
So far so good, but to test whether fib_seq will continue yielding items, let's ask it for the next 15 numbers via a for loop.
```python
# Yield the next 15 Fibonacci numbers
for item in range(15):
    print(next(fib_seq), end=' ')
```
8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765
Fantastic, we've built an iterator that successively yields Fibonacci numbers; since there are infinitely many Fibonacci numbers, fib_seq is an infinite iterator. We can keep calling next(fib_seq) and we'll never exhaust the iterator! However, there are a few things to keep in mind:
- Notice we didn't use for item in obj: .... If we were to do so, the loop would implicitly call next(fib_seq) each cycle; because fib_seq can't be exhausted, the loop would never end! As an alternative, we elected to use the for loop to manually call next(fib_seq) a fixed number of times.
- fib_seq is an infinite iterator because it's not derived from a finite iterable such as myIteratorClone. In fact, fib_seq isn't derived from any iterable; rather, each time we call next(fib_seq), we simply compute a rolling sum using our two stored states (self.curr and self.last) and then update the states accordingly.
- Since fib_seq can't be exhausted, we didn't need a condition that raises a StopIteration exception in the __next__() method of the iterator protocol.
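When we do want loop-like syntax over an infinite iterator, the standard library's itertools.islice can cap the number of items so the loop ends safely. This is a small preview of the shortcuts the article mentions later, not something its own code requires (FibNumMaker is restated here so the sketch is self-contained):

```python
from itertools import islice

class FibNumMaker():
    def __init__(self):
        self.last, self.curr = 0, 1

    def __iter__(self):
        return self

    def __next__(self):
        number = self.curr
        self.curr += self.last
        self.last = number
        return number

# islice stops after 10 items, so this for loop terminates
for number in islice(FibNumMaker(), 10):
    print(number, end=' ')  # 1 1 2 3 5 8 13 21 34 55
```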
5. Why iterators are so powerful
Let's say we wanted the sum of every prime number smaller than a maximum value. We could first create a list of all integers up to maximum, then use our favorite algorithm to keep only the prime numbers, and then pass the filtered list to sum(). What if maximum was 1 billion? The filtered list would not only take a long time to construct but also require a needlessly large amount of memory, when all we really care about is its sum. Instead, we could construct an iterator that successively yields prime numbers and compute a rolling sum; when the next prime number exceeds maximum, we'd know to stop and return the current sum.
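Here's a minimal sketch of that idea; the names PrimeMaker and sum_primes_below and the trial-division primality check are our own choices, not from the article:

```python
class PrimeMaker:
    """Infinite iterator that successively yields prime numbers."""
    def __init__(self):
        self.candidate = 1  # state: the last number checked

    def __iter__(self):
        return self

    def __next__(self):
        # Advance to the next prime using simple trial division
        while True:
            self.candidate += 1
            n = self.candidate
            if all(n % d for d in range(2, int(n ** 0.5) + 1)):
                return n

def sum_primes_below(maximum):
    # Compute a rolling sum; stop as soon as a prime reaches the limit
    total = 0
    for prime in PrimeMaker():
        if prime >= maximum:
            return total
        total += prime

print(sum_primes_below(10))  # 2 + 3 + 5 + 7 = 17
```

No list of a billion integers is ever built; at any moment we hold only the current candidate and the running total.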
Not surprisingly, iterators can save a tremendous amount of memory and time when working with very large sequences or datasets, or even infinite ones. In fact, iterators enable us to represent an infinite number of items with finite memory. Even with smaller sequences or datasets, using iterators can help us write more efficient code. Here are a few scenarios where using an iterator may be advantageous:
- Performing computations on a stream of data
- Accessing certain items of a sequence without first storing the entire sequence
- Processing large files or datasets in manageable chunks (buffering)
- Performing computations on a sequence without knowing if all items in the sequence will be needed
- Building custom data samplers or random number generators
The biggest drawback to using iterators is the numerous lines of code required to implement a complete iterator protocol, even if the resulting iterator is rudimentary (like myIteratorClone). Of course, Python wouldn't live up to its reputation if it didn't offer a few ways to simplify the process of producing iterators, conceivably down to a single, straightforward line of code! We'll learn about these shortcuts and some handy built-in iterators in an upcoming blog post.
If you'd like to play around with the code, here's the GitHub repository. As always, don't hesitate to leave your comments below.