Python Technical Guide: Let's Use Generators By Default
At PyCon 2015 I had the pleasure of seeing Brett Slatkin’s talk on writing effective Python functions (youtube link). During the presentation he touched on the topic of generators, and one of his statements appealed to me:
Instead of “why are you using generators?” we should be asking “why are you using lists?”
I find that I don’t use generators very often. Python makes working with lists so easy that I don’t typically spend much time thinking about them. Rather than waiting for the “perfect scenario” to use generators, I’m going to make an effort to use generators whenever I’m dealing with iterable objects.
Generators aren’t typically part of a core Python curriculum, but they’re a fairly simple concept, so let’s walk through an example.
Generators at a Glance
Here’s a simple range-like function, written without Python’s built-in range:
    def get_numbers(start, stop):
        x = start
        result = []
        # stop is inclusive, so get_numbers(5, 10) ends with 10
        while x <= stop:
            result.append(x)
            x += 1
        return result

The above function would produce results like this:

    for result in get_numbers(5, 10):
        print(result)

    5
    6
    7
    8
    9
    10
Problem 1: What if we made “stop” a very large number and/or each element was actually a large data object?
Answer: This is a bad idea. Eventually the list would get so big that we’d run out of memory. While we may think our data is not a memory risk, the reality is we don’t know for sure how our data will scale over time.
Problem 2: What if we wanted this to run infinitely?
Answer: This is not possible. The loop would never reach the “return” statement, so the caller would never receive any results.
Both of these problems are solvable with generators.
    def get_numbers(start, stop=None):
        x = start
        # stop=None means "never stop"; comparing against None keeps stop=0 working
        while stop is None or x <= stop:
            yield x
            x += 1
You’ll notice the function above doesn’t have a return statement. Instead, the yield statement inside the loop hands one value at a time to the caller. It works like a return for a single item of the resulting iterable, except that the function pauses at that point and resumes when the next item is requested.
Calling the new get_numbers function returns a generator, which is an iterable object just like the list in the old example. The difference is that when iterating through the results, only one item is ‘yielded’ at a time. In other words, each value isn’t generated until you ask for it, and the generator doesn’t keep it around once you move to the next item. We can use the same for loop as the original example to get the exact same output:
    for result in get_numbers(5, 10):
        print(result)

    5
    6
    7
    8
    9
    10
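To make the one-at-a-time behaviour visible, here is a minimal sketch that drives the generator by hand with the built-in next() instead of a for loop. It assumes the inclusive-stop version of get_numbers that yields plain integers; the variable names are only for illustration.

```python
def get_numbers(start, stop=None):
    """Yield integers from start up to and including stop (forever if stop is None)."""
    x = start
    while stop is None or x <= stop:
        yield x
        x += 1


numbers = get_numbers(5, 7)       # nothing has been computed yet
first = next(numbers)             # runs the body up to the first yield -> 5
second = next(numbers)            # resumes after that yield -> 6
rest = list(numbers)              # drains whatever is left -> [7]
exhausted = next(numbers, None)   # a finished generator has nothing more to give

print(first, second, rest, exhausted)
```

A for loop does the same thing behind the scenes: it keeps asking for the next value until the generator is finished.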
The power of this may not be apparent in the example above, but let’s discuss how it solves Problem 1 and Problem 2 from above.
Problem 1: What if we made “stop” a very large number and/or each element was actually a large data object?
With the generator/yield style, we do not build and return a massive list when stop is a huge number. Instead, only one result is in scope at a time, so memory use stays flat no matter how many items we iterate over (as long as the caller doesn’t store them all).
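A rough way to see the difference is to compare the size of the two return values with sys.getsizeof. This is only a sketch: the names get_numbers_list and get_numbers_gen are made up here so both versions can sit side by side, and getsizeof reports the size of the container alone (not the integers inside it), so the exact numbers will vary between Python versions.

```python
import sys


def get_numbers_list(start, stop):
    """List version: builds every value before returning (hypothetical name)."""
    result = []
    x = start
    while x <= stop:
        result.append(x)
        x += 1
    return result


def get_numbers_gen(start, stop=None):
    """Generator version: produces values on demand (hypothetical name)."""
    x = start
    while stop is None or x <= stop:
        yield x
        x += 1


as_list = get_numbers_list(1, 100000)
as_gen = get_numbers_gen(1, 100000)

list_size = sys.getsizeof(as_list)  # grows with the number of items
gen_size = sys.getsizeof(as_gen)    # small and independent of stop

# The generator can still be consumed in full without ever building a list.
total = sum(as_gen)

print(list_size, gen_size, total)
```

The list grows with stop, while the generator object stays the same small size whether stop is ten or ten million.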
Problem 2: What if we wanted this to run infinitely?
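With no stop value, the generator never finishes on its own. Python integers have arbitrary precision, so x has no fixed upper limit and simply keeps growing. Yielding allows us to create endless results because we no longer have to “return” the entire data set. Note that calling get_numbers(5) by itself only creates the generator and produces nothing yet. Open up a shell and loop over it, for example with for n in get_numbers(5): print(n), and you will see that the program runs until you kill it.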
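In practice you usually want only a slice of an endless stream rather than a loop you have to kill. One way to do that, sketched here under the assumption that get_numbers yields plain integers, is itertools.islice from the standard library, which stops asking for values after a fixed count.

```python
from itertools import islice


def get_numbers(start, stop=None):
    """Yield integers from start up to and including stop (forever if stop is None)."""
    x = start
    while stop is None or x <= stop:
        yield x
        x += 1


# Take only the first five values from the endless generator.
first_five = list(islice(get_numbers(5), 5))
print(first_five)  # [5, 6, 7, 8, 9]

# A plain loop with a break works too: the generator stops when we stop asking.
seen = []
for n in get_numbers(5):
    if n > 7:
        break
    seen.append(n)
print(seen)  # [5, 6, 7]
```

Because values are only produced on request, the infinite loop inside get_numbers never runs further than the caller asks it to.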
Generators provide quite a bit of flexibility when creating iterable objects, and they are not terribly complicated either. The next time you’re writing a Python application, consider getting into the habit of using generators for your iterable objects. This way you’re one step ahead in preventing memory issues in the future. As a bonus, you’ll already be comfortable with generators when you encounter a situation where they are required.