Python: Generator

A generator is a special type of function that produces a sequence of values over time. That means we can use a generator to get a value, pause, and later get the next value.

This is called lazy generation of a sequence of values. Let’s first understand the problem that generators are trying to solve, and why we need lazy generation.

Problem

Say we have a function that generates the squares of the numbers from 1 to a given number. Check the function below:

def square_store(n):
    result = []

    for i in range(n):
        result.append((i + 1) ** 2)

    return result


print(square_store(100))
Python

Note that this function creates all the numbers, saves them in a list, and finally returns the whole list.

What happens when the provided value of “n” is very large?

All those numbers are generated and saved in a list before anything is returned, so memory usage grows with “n” and can get very high.
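To see that growth, here is a quick sketch that reuses square_store() and measures the returned list with sys.getsizeof(). Keep in mind that sys.getsizeof() reports only the list object itself (its array of references), not the integers stored in it, so the real footprint is even larger than what it prints.

```python
import sys


def square_store(n):
    result = []

    for i in range(n):
        result.append((i + 1) ** 2)

    return result


for n in (10, 1_000, 100_000):
    squares = square_store(n)
    # Size of the list object only; the int objects it holds add even more
    print(f"n={n}: list object is {sys.getsizeof(squares)} bytes")
```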

NOTES

Most of the time, we do not even need all these numbers at once.

We need them one by one.

Solution

The solution is to create a single value at a time and return it, instead of generating the full sequence up front.

Step #1: Using “yield”

To create a generator, we need “yield”. Using the “yield” keyword in a function’s body turns that function into a generator function.

Advantages of using “yield”-

Check the code below-

def sample_numbers():
    yield 1
    yield 4
    yield 9
    yield 16
Python

“yield” returns a value and pauses the execution at that point. When the next value is requested, the execution resumes from where it was paused.

Note that calling sample_numbers() does not run the function body; it returns a generator object. So the first time we request a value from that generator (for example with next()), we get 1.

Request again and we get 4.

Request the next time and 9 is returned.
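A quick way to see the pause and resume behavior is to record when the function body runs. In this sketch, traced_numbers() and the events list are illustrative names, not part of the original example. Calling the function only creates the generator object; the body starts running at the first next() call and stops at each yield.

```python
# Records which parts of the function body have run so far
events = []


def traced_numbers():
    events.append("started")
    yield 1
    events.append("resumed after 1")
    yield 4
    events.append("resumed after 4")


gen = traced_numbers()  # creates the generator; nothing in the body has run yet
print(events)           # []

print(next(gen))        # runs up to the first yield, prints 1
print(events)           # ['started']

print(next(gen))        # resumes right after "yield 1", prints 4
print(events)           # ['started', 'resumed after 1']
```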

This is the difference between “return” and “yield”:

return: returns the value and finishes the function execution.
yield: returns the value and pauses the execution. The next time, the execution resumes from the paused position.
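A minimal side-by-side sketch (the function names are only for illustration): the second return statement is never reached, while both yield statements produce a value.

```python
def with_return():
    return 1
    return 2  # never reached: the first return ends the function


def with_yield():
    yield 1
    yield 2  # reached when the next value is requested


print(with_return())       # 1
print(list(with_yield()))  # [1, 2]
```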

Generator as Iterator

Let’s see what a generator function returns. Check the code below:

from collections.abc import Iterator

def sample_numbers():
    yield 1
    yield 4
    yield 9
    yield 16
    
    # return 20


sample_gen = sample_numbers()

print("sample_gen value: ", sample_gen)
print("sample_gen type: ", type(sample_gen))

print("is sample_gen an iterator? ",isinstance(sample_gen, Iterator))

print("sample_gen dir result: ", dir(sample_gen))
Python

After the code execution, we see the following:

We get a generator object when we call the generator function.
The type of the returned object is <class 'generator'>.
The returned object is an iterator.

Output:
sample_gen value:  <generator object sample_numbers at 0x7f585747a030>

sample_gen type:  <class 'generator'>

is sample_gen an iterator?  True

sample_gen dir result:  
['__class__', '__del__', '__delattr__', '__dir__', '__doc__', 
'__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', 
'__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', 
'__lt__', '__name__', '__ne__', '__new__', '__next__', 
'__qualname__', '__reduce__', '__reduce_ex__', '__repr__', 
'__setattr__', '__sizeof__', '__str__', '__subclasshook__', 
'close', 'gi_code', 'gi_frame', 'gi_running', 'gi_yieldfrom', 
'send', 'throw']
Plaintext

You can see that the generator object is an iterator. In the result of “dir”, we can find the following methods:

__iter__
__next__

So we can iterate over the generator object.
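The sketch below uses those two methods to show what a for loop and next() do behind the scenes: iter() on a generator returns the generator itself, and next() calls its __next__() method.

```python
def sample_numbers():
    yield 1
    yield 4
    yield 9
    yield 16


gen = sample_numbers()

# A generator is its own iterator, so iter() hands back the same object
print(iter(gen) is gen)  # True

# next(gen) is equivalent to calling gen.__next__()
print(gen.__next__())    # 1
print(next(gen))         # 4
```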

Let’s check the different ways to get the data yielded by the generator.

Usage #1: In a Loop

Use the generator as an iterator in a for loop, and the loop will go through the generated values:

for number in sample_numbers():
    print(number)
Python

Output:

1
4
9
16
Plaintext

Usage #2: Using next()

As the generator function returns an iterator, the “next()” function can be used to get the results one by one.

Each time we call “next()” on the generator object, we get one result back.

number_gen = sample_numbers()

print(next(number_gen))
print(next(number_gen))
print(next(number_gen))
print(next(number_gen))

# Trying to go beyond the last value
# print(next(number_gen))
# raises StopIteration
Python

Output:

1
4
9
16
Plaintext
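If we would rather not deal with StopIteration, the built-in next() also accepts a default value, which it returns once the generator is exhausted:

```python
def sample_numbers():
    yield 1
    yield 4
    yield 9
    yield 16


number_gen = sample_numbers()

# Consume all four values
for _ in range(4):
    print(next(number_gen))

# No values left: next() returns the default instead of raising StopIteration
print(next(number_gen, "no more values"))  # no more values
```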

Usage #3: Get All as List

We can also use the list() constructor function to extract the full result from the generator and convert it to a list.

This is not an ideal use case for a generator, but in some cases, you might need to use this.

big_box_num_list = list(sample_numbers())

print(big_box_num_list)
Python

WARNING

The memory advantage of the generator is lost when we convert it to a list, as the full result is generated at once and stored in the list.

Output:

[1, 4, 9, 16]
Plaintext

Usage #4: Unpacking with Asterisk (*)

We can unpack the generated values into the arguments of the print() function. Add an asterisk (*) before the generator call.

The same unpacking syntax works in other places too, like a list, a tuple, or function arguments.

print(*sample_numbers())

# Unpack Generated Values to List
numbers_list = [*sample_numbers()]
print(numbers_list)

# Unpack Generated Values to tuple
numbers_tuple = (*sample_numbers(),)
print(numbers_tuple)


def my_func(*args):
    print("Inside my_func: ", args)


my_func(*sample_numbers())
Python

Output:

1 4 9 16

[1, 4, 9, 16]

(1, 4, 9, 16)

Inside my_func:  (1, 4, 9, 16)
Plaintext

Step #2: Use “yield” in a Loop

In the previous step, we had a bunch of “yield” statements, which were static.

Now let’s put the “yield” statement inside a loop. This way we can produce as many values as we want.

Each time the execution reaches “yield”, it returns the value, and it resumes from there when the next value is requested.

def sample_numbers():
    for i in range(5):
        yield i


for num in sample_numbers():
    print(num)
Python

Output:

0
1
2
3
4
Plaintext

Step #3: Pass the Limit as a Parameter

Next, we can make one more modification.

Add a parameter to the generator function, and use it as the limit of the loop.

This is not directly related to how a generator is built, but most of the time generators are used like this.

def sample_numbers(n):
    for i in range(n):
        yield i


sample_gen = sample_numbers(5)

for num in sample_gen:
    print(num)
Python

Output:

0
1
2
3
4
Plaintext
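Putting these steps together, here is a sketch of the square_store() function from the beginning rewritten as a generator (square_gen is just an illustrative name). It produces the same squares, but one at a time, so memory usage no longer grows with n.

```python
def square_gen(n):
    # Yield the squares of 1..n one at a time instead of building a list
    for i in range(n):
        yield (i + 1) ** 2


for square in square_gen(5):
    print(square)  # 1, 4, 9, 16, 25 (one per line)
```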

Generator Expression

There is a compact way of defining a generator. It uses the following syntax-

generator = (expression for item in iterable if condition)
Python

The above one-line expression has the following parts:

The full expression is wrapped in parentheses ().
The value to produce is written at the beginning.
Then we have a “for” clause.
At the end, there can be an “if” condition. The condition is optional.
There is no “yield” in the expression.

NOTES

This expression is similar to a list comprehension. The difference is that a list comprehension uses square brackets [], while a generator expression uses parentheses ().
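A short sketch of the difference: the list comprehension builds every value immediately, while the generator expression produces them on demand. When a generator expression is the only argument of a function call, such as sum(), the extra parentheses can be dropped.

```python
squares_list = [x**2 for x in range(10)]  # all 10 values are built now
squares_gen = (x**2 for x in range(10))   # values are produced on demand

print(squares_list)      # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(sum(squares_gen))  # 285

# As the sole argument of a call, the generator expression needs no extra parentheses
print(sum(x**2 for x in range(10)))  # 285
```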

Example #1: Number Generator

Let’s use the generator expression to create a number generator. It works the same way as the generator created with “yield”.

sample_gen_ex = (x**2 for x in range(10))

print(sample_gen_ex)

print("Individual sample number:")

print(next(sample_gen_ex))
print(next(sample_gen_ex))
print(next(sample_gen_ex))

print("In the loop:")

for n in sample_gen_ex:
    print(n)
Python

Output:

<generator object <genexpr> at 0x7f1f12a502e0>

Individual sample number:
0
1
4

In the loop:
9
16
25
36
49
64
81
Plaintext

Example #2: Number Generator with Condition

We can add a condition for the generated numbers. Just add an “if” clause at the end of the expression.

Here we keep only the numbers from the loop that are divisible by two (2).

sample_gen_ex = (x*x for x in range(10) if x%2 == 0)

print(sample_gen_ex)

print("Individual sample number:")

print(next(sample_gen_ex))
print(next(sample_gen_ex))
print(next(sample_gen_ex))

print("In the loop:")

for n in sample_gen_ex:
    print(n)
Python

Output:

<generator object <genexpr> at 0x7fc0d5e842e0>

Individual sample number:
0
4
16

In the loop:
36
64
Plaintext

Resetting a Generator

A generator object cannot be rewound. Each time we request the next value, it moves to the next “yield”. To start from the beginning, we call the generator function again, which creates a new generator object.

In the following example, we call the generator function twice, once for each loop:

# Generator that generates numbers
# from 0 to 4
def sample_numbers():
    for i in range(5):
        yield i


# Use the generator
for num in sample_numbers():
    print(num)


# Use the generator again
for num in sample_numbers():
    print(num)
Python

Output:

0
1
2
3
4

0
1
2
3
4
Plaintext

What if we break the first loop before completing it?

The second loop still starts from the beginning, because calling sample_numbers() again creates a new generator. Check the example below.

# Generator that generates numbers
# from 0 to 4
def sample_numbers():
    for i in range(5):
        yield i


# Use the generator
for num in sample_numbers():
    print(num)

    # Break with some condition
    # before the loop is complete
    if num == 2:
        break


# Use the generator again
for num in sample_numbers():
    print(num)
Python

Output:

0
1
2

0
1
2
3
4
Plaintext

The following example, which calls next() directly, makes the case clearer:

# Generator that generates numbers
# from 0 to 4
def sample_numbers():
    for i in range(5):
        yield i


# Generator initialized for the first time
gen = sample_numbers()

print(next(gen))
print(next(gen))
print(next(gen))

# Generator initialized again
gen = sample_numbers()

print(next(gen))
print(next(gen))
Python

Output:

0
1
2

0
1
Plaintext
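The other side of this: reusing the same generator object does not restart it. Once it has been fully consumed, it is exhausted and produces nothing more, so a new generator has to be created by calling the generator function again.

```python
def sample_numbers():
    for i in range(5):
        yield i


gen = sample_numbers()

print(list(gen))  # [0, 1, 2, 3, 4]
print(list(gen))  # [] (the same object is already exhausted)

gen = sample_numbers()  # a new generator object starts from the beginning
print(list(gen))  # [0, 1, 2, 3, 4]
```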

Closing a Generator

Call the “close()” method on a generator to close it. After that, the generator cannot produce any more data.

NOTES

Once the generator is closed, we cannot restart or resume it.

If we try to get the next value, next() raises a StopIteration exception.

# Generator that generates numbers
# from 0 to 4
def sample_numbers():
    for i in range(5):
        yield i


# Generator initialized for the first time
gen = sample_numbers()

print(next(gen))
print(next(gen))

# Close the generator
gen.close()

print(next(gen))
Python

Output:

0
1
Traceback (most recent call last):
  File "generator2.py", line 19, in <module>
    print(next(gen))
StopIteration
Plaintext

We can handle this case by catching the exception, like below:

# Generator that generates numbers
# from 0 to 4
def sample_numbers():
    for i in range(5):
        yield i


# Generator initialized for the first time
gen = sample_numbers()

print(next(gen))
print(next(gen))

# Close the generator
gen.close()

try:
    print(next(gen))
except StopIteration:
    print("Generator is closed.")
Python

Output:

0
1
Generator is closed.
Plaintext

When “close()” is called, Python raises a “GeneratorExit” exception inside the generator, at the point where it is paused.

If some cleanup is required, we can catch “GeneratorExit” specifically, or just use a finally block inside the generator.

# Generator that generates numbers
# from 0 to 4
def sample_numbers():
    try:
        for i in range(5):
            yield i
    # except GeneratorExit:
    #     print("GeneratorExit: Cleaning up generator")
    finally:
        print("Generator closed, cleaning up.")


# Generator initialized for the first time
gen = sample_numbers()

print(next(gen))
print(next(gen))

# Close the generator
gen.close()
Python

Output:

0
1
Generator closed, cleaning up.
Plaintext
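Instead of finally, the commented-out except GeneratorExit branch can handle the cleanup. Here is a sketch of that variant, with an events list added only so the result can be checked. After catching GeneratorExit, the generator must not yield another value (that would make close() raise RuntimeError), so it re-raises the exception.

```python
# Collects messages so we can see what happened during close()
events = []


def sample_numbers():
    try:
        for i in range(5):
            yield i
    except GeneratorExit:
        events.append("GeneratorExit: cleaning up generator")
        # Re-raise (or simply return); yielding here would make close() raise RuntimeError
        raise


gen = sample_numbers()

print(next(gen))  # 0
print(next(gen))  # 1

gen.close()
print(events)  # ['GeneratorExit: cleaning up generator']
```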

Send Data to “yield”

We can send data into the generator each time we resume it. Use the “send()” method: send() resumes the generator, and the sent value becomes the result of the “yield” expression where the generator was paused.

To receive the value, assign the result of the “yield” expression to a variable inside the generator, for example: received = yield value.

WARNING

We cannot send a value in the first call to the generator, because it has not reached a “yield” yet, so there is nothing to receive the value. If we use send() with a non-None value in the first call, it raises the following error:

TypeError: can't send non-None value to a just-started generator

We can use send(None) in the first call if we want to, or use next() for the first call.

def sample_gen():
    received_val1 = yield 100

    print("1. Received in generator: ", received_val1)

    received_val2 = yield "Second Value"

    print("2. Received in generator: ", received_val2)

    received_val3 = yield "ABCDEF"

    print("3. Received in generator: ", received_val3)

    received_val4 = yield 999

    print("4. Received in generator: ", received_val4)


# Generator initialized
gen = sample_gen()

# First call
print("First Call-")
print("Outside generator: ", next(gen))

# Second call with value
print("Second Call-")
print("Outside generator: ", gen.send("S2"))

# Third call without value
print("Third Call-")
print("Outside generator: ", next(gen))

# Fourth call with value
print("Fourth Call-")
print("Outside generator: ", gen.send("S4"))
Python

Output:

First Call-
Outside generator:  100

Second Call-
1. Received in generator:  S2
Outside generator:  Second Value

Third Call-
2. Received in generator:  None
Outside generator:  ABCDEF

Fourth Call-
3. Received in generator:  S4
Outside generator:  999
Plaintext
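send() is handy when the generator keeps some state between values. A common pattern, sketched here with an illustrative running_average() generator, is a running average: the first next() advances the generator to its first yield, and each send() adds a number and returns the updated average.

```python
def running_average():
    total = 0.0
    count = 0
    average = None

    while True:
        # Hand out the current average, then wait for the next number
        value = yield average
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)  # start the generator; it pauses at the first yield

print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0
```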

Sub-Generator with “yield from”

Let’s see how we can use another generator inside a generator.

Sub-Generator in Loop [not recommended]

We can loop over the sub-generator and yield each of its values, like below, and that works as expected:

def sub_generator():
    yield 1
    yield 2
    
def main_generator():
    for value in sub_generator():
        yield value


for value in main_generator():
    print(value)
Python

Output:

1
2
Plaintext

Use “yield from” to Simplify

Instead of using a loop, we can write “yield from” followed by the sub-generator call.

“yield from” iterates over the sub-generator and yields each of its values, so for simple iteration it works exactly like the loop above. It also passes send(), throw(), and close() calls through to the sub-generator.

def sub_generator():
    yield 1
    yield 2


def main_generator():
    yield from sub_generator()


for value in main_generator():
    print(value)
Python

Output:

1
2
Plaintext

“yield from” Multiple Times

We can use “yield from” multiple times to delegate to several sub-generators from one generator.

In that case, the first “yield from” is processed completely before the next one starts.

def sub_generator1():
    yield 1
    yield 2
    
def sub_generator2():
    yield 444
    yield 555
    yield 666
    yield 777


def main_generator():
    yield from sub_generator1() # Runs to completion first
    yield from sub_generator2() # Starts after the previous one is complete


for value in main_generator():
    print(value)
Python

Output:

1
2

444
555
666
777
Plaintext

“yield from” with Return Value

A sub-generator can return a value, and the main generator receives it as the result of the “yield from” expression.

def sub_generator():
    yield 1
    yield 2

    return "sub_generator done"


def main_generator():
    result = yield from sub_generator()

    print(f"In main generator: {result}")


for value in main_generator():
    print(value)
Python

Output:

1
2

In main generator: sub_generator done
Plaintext
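Behind the scenes, a generator’s return value travels in the StopIteration exception that ends it, and “yield from” picks it up from there. The sketch below reads it by hand with next(), which is roughly what “yield from” does for us.

```python
def sub_generator():
    yield 1
    yield 2

    return "sub_generator done"


gen = sub_generator()
print(next(gen))  # 1
print(next(gen))  # 2

try:
    next(gen)
except StopIteration as exc:
    # The returned value is stored on the exception's "value" attribute
    returned = exc.value

print(returned)  # sub_generator done
```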

Iterate over Iterables

If we have several iterables grouped together, we can use the following format to iterate over each of their items one by one.

Here we pass a list, a tuple, and a set to the generator, which collects them into a tuple through *iterables. With “yield from”, the generator iterates over the items of each iterable in turn. Note that the order of the set items can differ between runs.

def print_iter(*iterables):
    for i in iterables:
        yield from i


for item in print_iter([1, 2, 3, 4], ("abc", "def"), {"big", "box", "code"}):
    print(item)
Python

Output:

1
2
3
4

abc
def

box
code
big
Plaintext
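print_iter() flattens exactly one level. For deeper nesting, a generator can call itself with “yield from”. In this sketch, flatten() is an illustrative name, and strings are treated as single values (otherwise each one-character string would be iterated again forever).

```python
from collections.abc import Iterable


def flatten(items):
    for item in items:
        # Strings and bytes are iterable, but we want to keep them whole
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten(item)  # recurse into the nested iterable
        else:
            yield item


nested = [1, [2, 3, [4, 5]], ("abc", ["def"])]
print(list(flatten(nested)))  # [1, 2, 3, 4, 5, 'abc', 'def']
```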

Memory Usage Comparison

Let’s compare the memory usage of a generator and a regular function without a generator, for the same task.

The following scripts are used for comparison. The first one uses a generator (with “yield”):

import tracemalloc

# Start memory tracking
tracemalloc.start()


def square_store(n):
    # Generator version: yields one square at a time instead of storing them
    for i in range(n):
        yield (i + 1) ** 2


ss = square_store(10_000_000)

for i in ss:
    print(i)


# Get memory usage
current, peak = tracemalloc.get_traced_memory()

print(f"Current memory usage: {current / 1024:.2f} KB")
print(f"Peak memory usage: {peak / 1024:.2f} KB")

# Stop memory tracking
tracemalloc.stop()
Python

The second one uses a regular function that builds the full list (without “yield”):

import tracemalloc

# Start memory tracking
tracemalloc.start()


def square_store(n):
    result = []

    for i in range(n):
        result.append((i + 1) ** 2)

    return result


ss = square_store(10_000_000)

for i in ss:
    print(i)


# Get memory usage
current, peak = tracemalloc.get_traced_memory()

print(f"Current memory usage: {current / 1024:.2f} KB")
print(f"Peak memory usage: {peak / 1024:.2f} KB")

# Stop memory tracking
tracemalloc.stop()
Python

Here is the peak memory usage:

NOTES

The following data was measured on a single machine and is not meant as an exact memory usage measurement.

It is shown for comparison; focus on the pattern of memory usage.

Dataset Size (n)    Peak Memory without “yield” (KB)    Peak Memory with “yield” (KB)
100                 4.27                                1.25
1,000               36.62                               1.28
10,000              357.25                              1.33
100,000             3,779.85                            1.33
1,000,000           39,373.35                           1.33
10,000,000          399,379.63                          1.33

Here is the same data represented in a chart. The memory usage of the normal function spikes as the dataset grows (roughly 10 times more for every 10 times larger dataset), while the generator’s memory usage stays minimal and nearly constant.

Advantages

Here are the advantages of using a generator-

Stateful: generators remember the state of the execution, and resume from where they paused.
Lazy Evaluation: values are produced on demand, only when requested.
Iterator: generators are a type of iterator, so they can be used in a loop or with next().
Memory Efficient: generators are memory efficient, as they do not build the full data set in memory.
Improved Performance: values are created on the fly, so the up-front cost of creating an entire sequence is avoided, and processing can start as soon as the first value is ready.

Examples

Let’s look at a few examples that we can use in real life:

Example #1: Number Generator

def infinite_number_gen(start=0):
    while True:
        yield start
        
        # For the next number
        start += 1


inf_gen = infinite_number_gen()

# First get 10 numbers in loop
for _ in range(10):
    print(next(inf_gen))

# Generate 2 more
print(next(inf_gen))
print(next(inf_gen))
Python

Output:

0
1
2
3
4
5
6
7
8
9
10
11
Plaintext
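A for loop over an infinite generator never ends on its own. The standard library’s itertools.islice() can take a fixed number of values lazily; this sketch reuses infinite_number_gen() from above.

```python
from itertools import islice


def infinite_number_gen(start=0):
    while True:
        yield start

        # For the next number
        start += 1


# Take only the first 5 values, starting from 100
first_five = list(islice(infinite_number_gen(100), 5))
print(first_five)  # [100, 101, 102, 103, 104]
```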

Example #2: Fibonacci Generator

def fibonacci(n):
    # Initial values
    n1, n2 = 0, 1

    # Run in range
    for _ in range(n):
        # Return one value
        yield n1

        # Set the values for next steps
        n1, n2 = n2, n1 + n2


# Generate 10 fibonacci numbers
for fib_num in fibonacci(10):
    print(fib_num)
Python

Output:

0
1
1
2
3
5
8
13
21
34
Plaintext

Example #3: Prime Number Generator

# Checker for prime
def is_prime(num):
    if num < 2:
        return False
    for i in range(2, int(num**0.5) + 1):
        if num % i == 0:
            return False
    return True


# Generator
def prime_num_gen(limit):
    for num in range(2, limit):
        if is_prime(num):
            yield num


# Demo usage
prime_nums = prime_num_gen(20)

print(list(prime_nums))
Python

Output:

[2, 3, 5, 7, 11, 13, 17, 19]
Plaintext

Example #4: File Reader

Say we have a log file named big_box_log.txt with the following content:

2025-01-05 10:15:32,451 - INFO - Starting the application
2025-01-05 10:15:32,459 - DEBUG - Loading configuration from config.json
2025-01-05 10:15:32,467 - INFO - Configuration loaded successfully
2025-01-05 10:15:32,482 - WARNING - No backup found. Proceeding without backup.
2025-01-05 10:15:32,495 - ERROR - Failed to connect to database: Connection timed out
2025-01-05 10:15:32,511 - INFO - Retrying connection to database
2025-01-05 10:15:32,532 - INFO - Connected to database successfully
2025-01-05 10:15:32,548 - DEBUG - Initializing cache
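Plaintext

The following generator reads the file one line at a time, instead of loading the whole file into memory:

def read_file(path):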
    with open(path) as file:
        for line in file:
            yield line

for line in read_file("big_box_log.txt"):
    print(line)
Python

Output:

2025-01-05 10:15:32,451 - INFO - Starting the application

2025-01-05 10:15:32,459 - DEBUG - Loading configuration from config.json

2025-01-05 10:15:32,467 - INFO - Configuration loaded successfully

2025-01-05 10:15:32,482 - WARNING - No backup found. Proceeding without backup.

2025-01-05 10:15:32,495 - ERROR - Failed to connect to database: Connection timed out

2025-01-05 10:15:32,511 - INFO - Retrying connection to database

2025-01-05 10:15:32,532 - INFO - Connected to database successfully

2025-01-05 10:15:32,548 - DEBUG - Initializing cache
Plaintext
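Generators can also be chained into a pipeline, where each stage handles one line at a time. This sketch filters the log down to ERROR lines. strip_lines() and only_level() are illustrative names, and io.StringIO stands in for the opened log file so the example runs without big_box_log.txt; with the real file, read_file("big_box_log.txt") could be passed in place of log_file.

```python
import io


def strip_lines(lines):
    # Remove the trailing newline from each line
    for line in lines:
        yield line.rstrip("\n")


def only_level(lines, level):
    # Keep only lines with the given log level, e.g. " - ERROR - "
    marker = f" - {level} - "
    for line in lines:
        if marker in line:
            yield line


# In-memory stand-in for the log file (a few lines from the sample above)
log_file = io.StringIO(
    "2025-01-05 10:15:32,451 - INFO - Starting the application\n"
    "2025-01-05 10:15:32,495 - ERROR - Failed to connect to database: Connection timed out\n"
    "2025-01-05 10:15:32,511 - INFO - Retrying connection to database\n"
)

for line in only_level(strip_lines(log_file), "ERROR"):
    print(line)
```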

Cognitive Clarifications

Here are some questions and clarifications about Python Generators.

Why do we need a lazy iterator like a Generator?

Because a generator calculates values only when they are requested, we can use it to avoid unnecessary computations and to avoid holding a whole sequence in memory.

Are Python Generators thread-safe?

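No.
Generators are not thread-safe, so they cannot be shared across threads without proper synchronization. If two threads call next() on the same generator at the same time, one of them gets a ValueError (“generator already executing”).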
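One common workaround, sketched below, is to wrap the generator in an iterator that holds a lock while calling next(), so only one thread advances it at a time. LockedIterator is a hypothetical helper written for this example, not a standard library class.

```python
import threading


class LockedIterator:
    """Wraps any iterator so that next() is called by one thread at a time."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self._lock:
            return next(self._it)


def numbers(n):
    for i in range(n):
        yield i


shared = LockedIterator(numbers(1000))
results = []
results_lock = threading.Lock()


def worker():
    # Each thread pulls values from the shared, lock-protected generator
    for value in shared:
        with results_lock:
            results.append(value)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # 1000: every value was consumed exactly once
```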
