Inside CPython: How Python Is Built on C (Part 1)
This article explains what Python looks like under the hood: what happens when you run a Python program, and how the CPython interpreter processes your code.
The Python you usually run on your machine is the reference implementation, CPython, which is itself a program written in C. CPython takes your Python source code, compiles it into bytecode, and then executes that bytecode one instruction at a time in an interpreter loop. Bytecode is a Python‑specific instruction set understood by CPython's virtual machine. It is not the machine code your CPU executes.
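You can see both halves of this process from Python itself. The following is a minimal sketch. The built-in compile() turns source text into a code object whose co_code attribute holds the raw bytecode. dis.dis() prints that bytecode as readable instructions, and exec() hands the code object to the virtual machine to run. The sample source string is arbitrary, and the exact instructions dis prints vary between Python versions.

```python
import dis

source = "total = 2 + 3\nprint(total)\n"

# compile() runs the tokenizer, parser, and compiler on the source text
# and returns a code object that carries the bytecode.
code = compile(source, "<example>", "exec")

print(type(code.co_code))  # <class 'bytes'>: the raw bytecode
print(code.co_names)       # names the instructions refer to by index

# dis decodes the raw bytes into readable instructions; the exact
# opcodes differ between Python versions.
dis.dis(code)

# exec() runs the code object in the virtual machine.
namespace = {}
exec(code, namespace)      # prints 5
```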
That can already feel like a lot if you’ve never heard of bytecode, CPU instructions, or interpreters. So let’s zoom in on the first step: turning your .py file into something the interpreter can work with.
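Here is the simple picture: source code → tokens (tokenizer) → abstract syntax tree (parser) → bytecode (compiler) → execution (virtual machine).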

You can think of Python code as structured text that follows a well‑defined set of rules: keywords like for, while, if, and else, along with identifiers, numbers, operators, and so on. Before CPython can do anything useful, it has to break this text into meaningful pieces.
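This first step is called lexing or tokenizing. The code is split into tokens such as names, numbers, strings, operators, and punctuation. It also produces bookkeeping tokens such as NEWLINE, INDENT, and DEDENT, which encode Python's line and indentation structure. The token types are listed in the Grammar/Tokens file of the CPython source, and the token tables in Parser/token.c are generated from it. Keywords such as for and if are not special at this stage. The tokenizer reports them as ordinary NAME tokens, and the parser decides which names are keywords. Once we have tokens, the parser can look at their order to work out the structure of each statement.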
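The standard library's tokenize module gives a Python-level view of this step. The sketch below uses an arbitrary two-line snippet and prints every token with its type name. It then checks which of the NAME tokens are keywords, which shows that for and in come out of the tokenizer as plain names.

```python
import io
import keyword
import token
import tokenize

source = "for i in range(3):\n    total = i + 1\n"

# generate_tokens() expects a readline callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    # tok_name maps the numeric token type to a name such as NAME, OP,
    # NUMBER, NEWLINE, INDENT, or DEDENT.
    print(f"{token.tok_name[tok.type]:<10} {tok.string!r}")

# Keywords come out as plain NAME tokens; telling "for" apart from "i"
# is the parser's job. keyword.iskeyword() shows which names qualify.
names = [tok.string for tok in tokens if tok.type == token.NAME]
print([name for name in names if keyword.iskeyword(name)])  # ['for', 'in']
```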
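The parser's job is to check that the tokens follow Python's grammar and to build a structured representation of the program, called the abstract syntax tree (AST). Since CPython 3.9 (PEP 617), this parser has been generated by a tool called pegen. Pegen reads the grammar file Grammar/python.gram and produces the C code for the parser. Versions up to 3.8 used an older LL(1) parser generator called pgen. The grammar is a formal description of Python's syntax, written in a notation that resembles Backus–Naur Form (BNF) and its extended form, EBNF. Because it is a parsing expression grammar (PEG), however, the alternatives in a rule are tried in order, and the first one that matches wins.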
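The ast module runs the same parser and returns the resulting tree, so you can inspect it directly. In this sketch, the expression is an arbitrary example, and the indent= argument of ast.dump needs Python 3.9 or later. The shape of the tree records operator precedence: the multiplication is nested inside the addition. Code that breaks the grammar is rejected with a SyntaxError before any bytecode exists.

```python
import ast

source = "total = price * 2 + tax\n"

# ast.parse() runs CPython's parser and returns the root Module node.
tree = ast.parse(source)
print(ast.dump(tree, indent=2))  # indent= needs Python 3.9+

# Precedence is encoded in the tree's shape: the Mult node sits inside
# the left operand of the Add node.
assign = tree.body[0]
print(type(assign).__name__)                # Assign
print(type(assign.value.op).__name__)       # Add
print(type(assign.value.left.op).__name__)  # Mult

# Token sequences that break the grammar never produce a tree.
try:
    ast.parse("for x in\n")
except SyntaxError as err:
    print("rejected:", err.msg)
```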
Once parsing is complete, the compiler takes the AST and uses a symbol table to work out the scope of every name. It then emits bytecode, packaged in code objects. When you run a Python file, you are really starting the CPython interpreter, which contains the tokenizer, the parser (generated by pegen), the compiler, and the Python virtual machine. CPython's role is to convert your Python code into bytecode and then execute that bytecode in the virtual machine. The virtual machine is itself C code: a large evaluation loop in Python/ceval.c that fetches one bytecode instruction at a time and runs the C code that implements it. Most instructions, such as adding two numbers or loading a variable, are handled entirely in C without involving the operating system. System calls come into play for work such as reading files or writing output.
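The idea of a fetch-and-execute loop can be sketched in a few lines of Python. Everything here is hypothetical. The opcode names, the (opcode, argument) tuples, and the run() function are invented for illustration and do not match CPython's real instruction set or encoding. Like CPython's virtual machine, however, it is a stack machine: instructions push values onto a stack, pop their operands from it, and push results back. The type hints assume Python 3.9 or later.

```python
from typing import Any

# A made-up instruction set for illustration only; these are NOT
# CPython's real opcodes or bytecode encoding.
PUSH_CONST, LOAD_NAME, STORE_NAME, ADD, MUL, PRINT, HALT = range(7)


def run(instructions: list[tuple[int, Any]], names: dict[str, Any]) -> None:
    """Hypothetical VM: execute (opcode, argument) pairs on a value stack."""
    stack: list[Any] = []
    pc = 0  # index of the next instruction to execute
    while True:
        opcode, arg = instructions[pc]  # fetch
        pc += 1                         # advance
        if opcode == PUSH_CONST:        # dispatch on the opcode
            stack.append(arg)
        elif opcode == LOAD_NAME:
            stack.append(names[arg])
        elif opcode == STORE_NAME:
            names[arg] = stack.pop()
        elif opcode == ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == MUL:
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif opcode == PRINT:
            print(stack.pop())
        elif opcode == HALT:
            return
        else:
            raise ValueError(f"unknown opcode {opcode}")


# Hand-written "bytecode" for: total = price * 2 + tax; print(total)
program = [
    (LOAD_NAME, "price"),
    (PUSH_CONST, 2),
    (MUL, None),
    (LOAD_NAME, "tax"),
    (ADD, None),
    (STORE_NAME, "total"),
    (LOAD_NAME, "total"),
    (PRINT, None),
    (HALT, None),
]

variables = {"price": 10, "tax": 3}
run(program, variables)  # prints 23
```

CPython's real loop is far more elaborate. It is written in C, manages frames and exceptions, and handles many more instructions. It has the same basic shape, though: read an instruction, advance the instruction pointer, and dispatch on the opcode.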
This is a first high‑level look at Python internals. In future parts, we can dig deeper into how bytecode is stored, how objects are represented in memory, how reference counting works, and how CPython manages memory allocation behind the scenes.
Sources
- https://realpython.com/cpython-source-code-guide/
- https://manakjiri.cz/pythonoviny/2025/compiler-parser/
- https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
- https://github.com/python/cpython/blob/main/Parser/token.c
- https://github.com/python/cpython/tree/main/Tools/peg_generator/pegen
- https://github.com/daeken/Benjen/blob/master/daeken.com/entries/python-marshal-format.md
