This is a study note of How Cairo Works. It covers both low-level and high-level language features of Cairo.

## 1 Introduction to Cairo

In Cairo, the basic data type `felt` (field element) is an integer in the range `0 <= x <P`, where `P` is a 256-bit prime number. All the computations are done modulo `P`. In Cairo, the divisioin `x / y` satisfies `(x / y) * y == x`, therefore `2 * (P + 1) / 2 = P + 1` which is `1 (mod P)`, `1 / 2 = (P + 1 ) / 2`. Additionally, multiply a number by `2` or `3` doesn’t always result in a number divisible by `2` or `3`.

The goal of a Cairo program is to prove that some computation is correct. A nondeterministic computation method is often used to first guess a value (the nondeterministic part), then verify other properites.

Cairo uses a read-only nondeterministic memory model. The value for each memory cell is chosen by the prover, but is immutable during the program execution. We use the syntax `[x]` to represent the value of the memory at address `x`. The memory can be thought as a `write-once` memory. The `assert [x] == y` can be interpreted either as `write value y to address x` or `verify that the value of address x is y`. They mean the same thing in read-only nondeterministic memory model. The memory addresses accessed by a program must be continuous. The memory offsets must be in the range of `[-2**15, 2**15)`.

The only values that may change over time are three register values:

• `ap`: the allocation pointer points to a yet-unused memory cell.
• `fp`: the frame pointer points to the frame of the current function. The addresses of all the function’s arguments and local variables are relative to the value of `fp`. When a function starts, `fp` is equal to `ap`. But unlike `ap`, the value of `fp` remains the same throughout the scope of a function.
• `pc`: the program counter points to the current instruction.

A simple Cairo instruction takes the form of an assertion for equality, so named assert-equal instruction. For example: `[ap] = [ap - 1] * [fp]; ap++` states that the production of two cells is written to the value of the next unused cell. `; ap++` is not a separate instruction, it is part of the assert-euqal instruction.

`ap`, `fp` and `pc` are pointers to memory cells. The values of memory cells are `[ap]`, `[fp]` and `[pc]`.

## 2 The Program Counter and Reference

### 2.1 The Program Counter

Each instruction takes 1 or 2 field elements. When an immediate value is used in the instruction, it takes 2 field elements. The program counter (pc) keeps the address of the current insturciton. Usually it advances by 1 or 2 – the size of the instruciton.

A `jmp` instruction may be used to jump to a different instruction in three flavors:

• `jmp abs 17`: `pc = 17`.
• `jmp rel 17`: `pc = pc + 17`
• `jmp label`: a relative jump.

Both `jmp rel` and `jmp lable` can be used with an `if` condition for conditonal jump.

### 2.2 Reference

The `let x = <expr>` syntax defines a reference to the `expr`’s value and substitue `x` accordingly when it is used later. A reference can hold any expression and may depende on `ap` or `fp`. For example: `let x =[[fp + 3] + 1]`.

The `assert <compound-expr> = <compound-expr>` asserts the equality between the values of two compound expressions. The Cairo compiles it into multiple instructions and `ap` may advance an unknown number of steps. Hence, you should avoid using `ap` and `fp` in compound expression and use the referece, temporary, or local variable. For example, `let x = [ap - 1]` and `let y = [ap - 2]`, then `assert x * x = x + 5 * y`. Here the `ap` chanes dynamically during the execution of the compound expression.

References are not variables: the scope of each definition is defined according to a static analysis of the order in which the instructions will be executed. It will follow a basic flow from jumps and conditional jumps, but if there are colliding definitions for the same reference, the reference will be revoked.

If there is a label or a call instruction between the definition of a reference that depends on `ap` and its usage, the reference may be revoked, since the compiler may not be able to compute the change of `ap` for jump and call. Sometime the compiler will not automatically detect a jump and the reference will not be revoked. Using the reference in such cases may result in an undefined behavior. A reference depends on `fp` is never revoked.

The syntax `ref_name.member_name`, where `ref_name` is a typed reference (`T*`) with value `val` and type `T`, and `T.member_name` is a member definition, compiles to `[ref_name + T.member_name]`. The `member_name` is the memory offset. The result is `[val + offset]`. You may omit the type when define a typed reference like: `let ptr = cast([fp], MyStruct*)` – assume `[fp]` contains a point to a struct.

A temporary variable is a syntactic sugar for a reference based on the `ap` register. For a simple expression, `tempvar var_name = <expr>` is equivalent to `[ap] = <expr>; ap++` and `let var_name = [ap - 1]`. A temporary variable is a reference that may be revoked. In practice, `let` is used in pointer operations and function call; `tempvar` is used for value expression.

Local variables are based on the `fp` register. The first local variable will be a reference to `[fp + 0]`, the second one to `[fp + 1]` and so on. The Cairo compiler auto-generates a constant `SIZEOF_LOCALS` with is equal to the accumulated size of locals within the same scope. The instruction `alloc_locals` is transformed to `ap += SIZEOF_LOCALS`.

Note that unless the local variable is defined and initialized in the same line, the `local` directive itself does not translate to a Cairo instruction (this is another difference from `tempvar`) – it simply translates to a reference definition. This is one of the reasons you must increase the value of `ap` manually. The type of a local variable must be explicitly stated (otherwise, `felt` is used), and it is not deduced from the type of the initialization value.

You can specify a typed local variable in two different ways: `local x : T* = <expr>` and `local y : T = <expr>`. The first one allocates one cell whose value is a pointer to a struct of type `T`, `x.a` is `[[fp + 0] + T.a]`. The second one allocates `T.SIZE` cells. `y` refers to the address of the structure `fp + 1` rather than `[fp + 1]`. `y.a` is `[fp + 1 + T.a]`. Use `y` is an error. For example, `tempvar z = y` will faile, since it compiles to `[ap] = fp`, which is not a valid instruction in Cairo. Nevertheless, defining a variable called `__fp__` will allow the code to compile.

Tuples are represented as a comma-separated list of elements enclosed in parentheses. For example: `(3, x)`. Cairo requires a trailing comma for single-element tuples, to distinguish them from regular parentheses. For example `(5,)` is a single-element tuple. Tuple elements are accessed with the tuple expression followed by brackets containing a zero-based index to the element. The index must be known at compile time.

In order to represent an array (an ordered collection of homogeneous elements) in Cairo, one may use a pointer to the beginning of the array. The standard library function `alloc()` may be used to “dynamically” allocate a new array. For example, `let (local struct_array : MyStruct*) = alloc()` allocates a new memory segment and treats it as a pointer to `MyStruct`. The expression `struct_array[n]` is used to access the `n-th` element of the array, where `n=0` is the first element.`struct_array[index]` is compiled to `[struct_array + index * MyStruct.SIZE]`, and is of type `MyStruct`.

## 3 Functions

### 3.1 Function Execution

A function is a reusable unit of code that receives arguments and returns a value. Cairo provides two low-level instructions `call addr` and `ret` to facilitate resuable code. The high-level syntax are `function_name()` and `return ()`. The `call` is similar to `jmp label` where a function is considered a label.

When a function starts, the `fp` is initialized to the current value of `ap`. When a function calls another function, the `fp` changes for the inner call and restores when the inner call returns. Under the hood, `call addr` works as the following:

 `````` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 `````` ``````# call addr pushes (fp, pc) and jumps # 1) Stores the value of the current fp, which will be restored once the called function ends using ret. # 2) Stores the address of the next instruction, to run once the called function ends. This will be assigned to pc when ret is invoked. # 3) Increase ap by 2, to account for the last two writes. # 4) Updates fp to be the new ap, so it points to the start of the new frame within the called function's scope. # 5) Jump to the function label [ap] <-- fp [ap + 1] <-- return_pc ap += 2 fp <-- ap jmp addr # ret jumps to stored pc and resumes fp. # 1) Jumps to return_pc (stored on the stack). # 2) Restores the value of the previous fp. jmp [fp - 1] fp <-- [fp - 2] ``````

### 3.2 Access Register Values

Cairo provides two functions to retrieve the values of the three registers.

 ``````1 2 3 4 5 6 7 8 9 `````` ``````from starkware.cairo.common.registers import get_ap from starkware.cairo.common.registers import get_fp_and_pc let get_ap_res = get_ap() tempvar my_ap = get_ap_res.ap_val let fp_and_pc = get_fp_and_pc() tempvar my_fp = fp_and_pc.fp_val tempvar my_pc = fp_and_pc.pc_val ``````

When Cairo needs to use the address `fp` in a compund expression, it will try to replace `fp` with a variable named `__fp__` that contains the value of `fp`. For example, in the following code, line B requires line A while line C doesn’t.

 ``````1 2 3 `````` ``````local __fp__ = fp_and_pc.fp_val # A. tempvar x = fp # B. tempvar y = [fp] # C. ``````

### 3.3 Function Arguments and Return Values

Function arguments are written to the stack before the `call` instruction. Because a `call` pushes the next `pc` and current `fp` into the stack, the function arguments are available at `[fp - 3]`, `[fp - 4]`, and so on in reverse order. For each argument, the Cairo compiler creates a reference `argname` to its value and a constant `Args.argname` with its offset `(0, 1, 2, ...)`. The use of `argname` is replaced by `[fp - (2 + n_args) + Args.argname]`. Cairo supports the named argment call (a syntactic sugar) that can use compound expression: `foo(x=4, y=5)`.

The function writes to the stack its return values just before the `ret` instruction. The retun values will be available to caller at `[ap - 1]`, `[ap - 2]` and so on. The Cairo compiler creates constants named `foo.Return.z` and `foo.Return.w` for the return values.

 `````` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 `````` ``````# function definition func foo(x, y) -> (z, w): [ap] = x + y; ap++ # z. [ap] = x * y; ap++ # w. ret end # call the funciton foo(4, 5) [ap] = 4; ap++ # x. [ap] = 5; ap++ # y. call foo # high-level call foo(x=4, y=5) and access the result foo(x=4, y=5) [ap] = [ap - 1] + [ap - 2]; ap++ # Compute z + w. # define a type reference to return value let foo_ret = foo(x=4, y=5) [ap] = foo_ret.z + foo_ret.w; ap++ # syntactic sugar to assign return values to a tuple let (z, w) = foo(x=4, y=5) [ap] = z + 2; app++ # call with named arguments and check the number of argument let args = cast(ap, foo.Args*) args.x = 4; ap++ args.y = 5; ap++ # Check that ap was advanced the correct number of times # (this will ensure arguments were not forgotten). static_assert args + foo.Args.SIZE == ap let foo_ret = call foo ``````

## 4 Hints

A hint is a block of Python code that will be executed by the prover right before the next instruction.

To access Cairo data, use `memory[ap]` to access `ap` value, `ids.x` for constant, reference and function arguments. A hint is attached to the next instruction and executed before the next instruction.

## 5 Program Input & Output

A Cairo program may have a secret program input and a public program output.

You can specify a program input file using `--program_input=input_filename.json` flag. Then you can use hints to access the input file content using the variable `program_input['key_name']`.

To create a program output, you first need to add `%builtins output` directive. The directive makes the funciton `main()` get one argument and return one value. The argument, conventionally called `output_ptr`, is a pointer to a block of memory to which it may write its outputs. `main()` should return the value of the pointer after writing, signifying where the chunk of output memory ends.

The program uses a different layout such as `--layout=small` that requires the number of steps to be divisible by `512`. You can set it via `--steps=512`.

## 6 Segments

Cairo programs are themselves kept in memory, in what is called the program segment. This segment is of fixed length and contains the numeric representation of the Cairo program. The program counter `pc` starts at the beginning of the program segment.

A Cairo program also requires an execution segment. This is where the register `ap` and `fp` start, and where data generated during the run of a Cairo program (variables, return addresses for function calls, etc.) is stored. The length of the execution segment is variable as it may depend on the program input.

Every builtin has its own variable-length continuous area in memory that is located in its own segment.

## 7 Builtins and Implicit Arguments

Builtins are predefined optimized low-level execution units which are added to the Cairo CPU board to perform predefined computations which are expensive to perform in vanilla Cairo (e.g., range-checks, Pedersen hash, ECDSA, …).

The CPU communicates with builtins thourgh memeory: each builtin has a segment and applies constraints on the memory cells in the segment.

Implicit argument is a syntactic sugar that adds arguments and return values to the funciton When you use the high-level `return` statement, you don’t have to explicitly return the implicit argument. The Cairo compiler returns the current binding of the implicit argument.

There are three ways to call a function with implicit argments:

• Explicitly `function_name{x=y}()`: `y` is bind to `x`. After the call, `y` has the return value of `x`.
• Implicitly using an implicit argument with the same name in caller.
• Implicitly using a `with` statement in caller.

Because an implicit argument is implemented as a reference, it might be revoked. The solution is to use a local variable to keep the implicit argument. When there is no branches, `alloc_locals` can add the local variables automatically. Adding `tempvar` at the end of every branch also works.

Cairo supports a few possible layouts. Each layout specifies which of the different builtins exist and how many instances of that builtin can be used. This is measured as the ratio between the number of instructions (`n_steps`) and the number of available builtin instances. The default `plain` layout has no builtins.

The `%builtins` directive specifies which builtins are used by the program. Each builtin adds an argument to main() and requires a return value.