Itertools - count()

Oct 21, 2024

Disclaimer: This post is part of a series where we dive into Python's built-in itertools library, and there is no specific order in which to read it. Each post is about one particular function, and you can find all of them under the tag itertools.

count is an infinite iterator: it keeps producing values for as long as you keep asking, and it computes each one lazily, only when requested. The TLDR shows you how to use it. Later we dive deeper into the details.

TLDR

It's like range, but infinite, and with floats

Since Python 3.1, the itertools.count(start=0, step=1) function has two optional arguments that you can use: start and step. Both accept non-integer values.

from itertools import count
c1 = count() # standard counter
c2 = count(10) # start counter on 10
c3 = count(1.5, 0.5) # starts on 1.5, step 0.5

print(next(c1)) # 0
print(next(c1)) # 1
print(next(c1)) # 2

print(next(c2)) # 10
print(next(c2)) # 11
print(next(c2)) # 12

print(next(c3)) # 1.5
print(next(c3)) # 2.0
print(next(c3)) # 2.5
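
Because the counter never ends, you usually pair it with something that stops. The following is a minimal sketch of three common ways to do that. It assumes you are fine with pulling in islice (also from itertools) and the built-in zip; the variable names are only for illustration.

```python
from itertools import count, islice

# islice stops after a fixed number of items, so the infinite counter is safe
first_five = list(islice(count(10), 5))
print(first_five)  # [10, 11, 12, 13, 14]

# zip stops at the shortest input, which makes count() a handy index source
fruits = ["apple", "banana", "cherry"]
numbered = list(zip(count(1), fruits))
print(numbered)  # [(1, 'apple'), (2, 'banana'), (3, 'cherry')]

# a plain for loop needs an explicit break, otherwise it never ends
evens = []
for n in count(0, 2):
    if n > 8:
        break
    evens.append(n)
print(evens)  # [0, 2, 4, 6, 8]
```

Calling list(count()) directly would never finish, so always bound the counter one way or another.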

Shallow dive

Note: This is a dive into the user side of the function.
If you want to know how it goes under the hood, check the deep dive below.

The documentation shows us an approximation of what the function does:

def count(start=0, step=1):
    n = start
    while True:
        yield n
        n += step

So, let's analyze this piece of code line by line. Starting with the function signature, we can see the two arguments we can pass: start and step. Just by looking at it, you may think it should receive only integers as arguments. This is not the case: the count function accepts floats in both parameters. Then we initialize our start position n and loop forever.
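
One thing to keep in mind with float steps: since each value is built by adding the step to the previous one, rounding errors can pile up. The itertools documentation suggests a multiplicative form, (start + step * i for i in count()), as a way to sometimes get better accuracy. Here is a small sketch comparing the two; the names additive and multiplicative are just for this example.

```python
from itertools import count, islice

start, step = 0.1, 0.1

# count() adds the step to the running value on every call
additive = list(islice(count(start, step), 10))

# the multiplicative form computes each value from an integer index instead
multiplicative = list(islice((start + step * i for i in count()), 10))

print(additive[-1])        # 0.9999999999999999
print(multiplicative[-1])  # 1.0
```

For a handful of values the difference rarely matters, but it is worth knowing about if you count for a long time with floats.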

But isn't this a problem? Shouldn't this function have the return keyword? No, young padawan, the returning is done by the yield keyword. In fact, this keyword is what makes the function behave like a generator. When the code executes the yield line, it hands back the value and pauses right there. When you call next again, it resumes from that point, runs the rest of the loop body, and ends up back at the yield at the top of the loop.

This also means the n += step will only be executed when you ask for the next number. Don't believe me? Let me modify the code and show you:

def count(start=0, step=1):
    n = start
    while True:
        yield n
        print("wait a minute, this is not the first call")
        n += step

x = count()
next(x) # 0

next(x) # Prints the message, and then yields 1

Simple (and fictional) use case

But what can we do with this function? Think of count as a building block that you use to compose more complex functions. As a fictional example, imagine you need to create the ids of items in a shopping cart.

Sure, you could do it like this (PLEASE DON'T DO IT LIKE THIS):

shopping_cart = []

def add_item(item_name:str):
   num_of_items = len(shopping_cart)
   shopping_cart.append({"item_id":num_of_items + 1, "name": item_name})

#then, call it like this
add_item("apple")
add_item("banana")

Can you spot the issue here? What would happen if you removed apple and added avocado? Both banana and avocado would have the same id. And what if the ids have to start from 1000? How can we solve it? count() can do it:

from itertools import count
cart_ids = count(1000)
shopping_cart = []

def add_item(item_name:str):
   shopping_cart.append({"item_id":next(cart_ids), "name": item_name})

#then, call it like this
add_item("apple")
add_item("banana")

Now, even if you remove items from the cart, the ids will remain unique. And you can customize the start and the step as much as you want.
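
To see the uniqueness claim in action, here is a short sketch of the apple/avocado scenario. Note that remove_item is a hypothetical helper added only for this demonstration; it is not part of the example above.

```python
from itertools import count

cart_ids = count(1000)
shopping_cart = []

def add_item(item_name: str) -> None:
    shopping_cart.append({"item_id": next(cart_ids), "name": item_name})

def remove_item(item_name: str) -> None:
    # hypothetical helper: drops every item with the given name
    shopping_cart[:] = [item for item in shopping_cart if item["name"] != item_name]

add_item("apple")     # gets id 1000
add_item("banana")    # gets id 1001
remove_item("apple")
add_item("avocado")   # gets id 1002, the counter never goes back

print(shopping_cart)
# [{'item_id': 1001, 'name': 'banana'}, {'item_id': 1002, 'name': 'avocado'}]
```

The counter keeps its own state, so the next id does not depend on how many items are currently in the cart.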

Deep dive

So far everything was too easy for you? Let's dive deeper into the function.

For this part, let's assume we are using CPython (version 3.10), since CPython is the most common flavor. You can check the source code yourself.

Since we are using the C API, our code has to be statically typed. But remember that the function allows more than ints? To solve this, the function has two modes: fast and slow.

Deciding on a mode

When the counter is created, it picks a mode based on these criteria:

Fast Mode:

cnt is an int (one that can be converted to a C integer internally)
- cnt is the internal counter. Initially set to the start argument
step is 1

If any of those constraints is broken, it will switch to slow mode.

Slow Mode:

step may be zero
- code comments suggest this would make it a slower version of repeat()
cnt or step can be a float, Fraction or Decimal
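
You cannot see the mode from Python, but you can see the behavior the two modes are there to support. The sketch below pokes at the cases listed above from the user side. It assumes sys.maxsize matches the largest value the internal C counter can hold, which is the case on CPython.

```python
import sys
from decimal import Decimal
from fractions import Fraction
from itertools import count, islice

# crossing the largest machine-sized integer: the values simply keep growing
big = list(islice(count(sys.maxsize - 1), 3))
print(big == [sys.maxsize - 1, sys.maxsize, sys.maxsize + 1])  # True

# a step of zero repeats the start value forever
sevens = list(islice(count(7, 0), 3))
print(sevens)  # [7, 7, 7]

# Fraction and Decimal work as start and step too
thirds = list(islice(count(Fraction(1, 3), Fraction(1, 3)), 3))
print(thirds)  # [Fraction(1, 3), Fraction(2, 3), Fraction(1, 1)]

cents = list(islice(count(Decimal("0.10"), Decimal("0.05")), 3))
print(cents)  # [Decimal('0.10'), Decimal('0.15'), Decimal('0.20')]
```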

Execution loop

In slow mode, all the counting happens using PyObjects, which prevents overflows. But to make things simpler, I will talk about the specific type held inside the PyObject.

First there is a flag set for fast mode. If it is fast mode, the counter is kept as a plain C integer and advances with a direct increment. You will also see the Py_INCREF macro in this code, but it increments an object's reference count, not the counter. Those macros are the standard way to manage references in the Python/C API.

After that, we check whether converting the start value to a C integer raised an error. If yes, it is because the value does not fit, so the error is cleared and we move to slow mode. Then the step is checked: if it is not 1, we also move to slow mode. Finally, if we are still in fast mode, it clears long_cnt.
If not, it sets the cnt variable to PY_SSIZE_T_MAX, which works as the marker for slow mode.

Skipping the assertion statements [1], the last step is creating the new count object that is returned to the caller.

The count object that is returned, and the one we manipulate in this entire operation, is the following:

typedef struct {
    PyObject_HEAD
    Py_ssize_t cnt;
    PyObject *long_cnt;
    PyObject *long_step;
} countobject;
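
If C is not your thing, the two modes can be approximated in plain Python. This is only a toy model under my own assumptions, not the real implementation: ToyCount is a made-up name, the attributes mirror the struct fields above, and the range check stands in for "can be converted to a C integer".

```python
import sys
from itertools import count, islice

PY_SSIZE_T_MAX = sys.maxsize  # same value as the C constant on CPython


class ToyCount:
    """Toy model of the fast and slow modes; NOT the real implementation."""

    def __init__(self, start=0, step=1):
        # assumption: fast mode needs an int start that fits and an int step of 1
        fast = (
            isinstance(start, int)
            and isinstance(step, int)
            and step == 1
            and -PY_SSIZE_T_MAX - 1 <= start < PY_SSIZE_T_MAX
        )
        if fast:
            self.cnt = start            # machine-sized counter
            self.long_cnt = None        # unused while in fast mode
        else:
            self.cnt = PY_SSIZE_T_MAX   # marker meaning "slow mode"
            self.long_cnt = start
        self.long_step = step

    def __iter__(self):
        return self

    def __next__(self):
        if self.cnt != PY_SSIZE_T_MAX:
            # fast mode: a direct increment
            value = self.cnt
            self.cnt += 1
            return value
        if self.long_cnt is None:
            # the fast counter just hit the maximum, continue with objects
            self.long_cnt = PY_SSIZE_T_MAX
        # slow mode: generic object arithmetic (int, float, Fraction, ...)
        value = self.long_cnt
        self.long_cnt = self.long_cnt + self.long_step
        return value


toy = list(islice(ToyCount(sys.maxsize - 1), 3))
real = list(islice(count(sys.maxsize - 1), 3))
print(toy == real)  # True
```

Python ints never overflow, so the switch is not needed here; it is modeled only to show where the C code changes strategy.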

[1] For the brevity of this post.
