This is a study note of How Cairo Works. It covers both low-level and high-level language features of Cairo.

1 Introduction to Cairo

In Cairo, the basic data type felt (field element) is an integer in the range 0 <= x <P, where P is a 256-bit prime number. All the computations are done modulo P. In Cairo, the divisioin x / y satisfies (x / y) * y == x, therefore 2 * (P + 1) / 2 = P + 1 which is 1 (mod P), 1 / 2 = (P + 1 ) / 2. Additionally, multiply a number by 2 or 3 doesn’t always result in a number divisible by 2 or 3.

The goal of a Cairo program is to prove that some computation is correct. A nondeterministic computation method is often used to first guess a value (the nondeterministic part), then verify other properites.

Cairo uses a read-only nondeterministic memory model. The value for each memory cell is chosen by the prover, but is immutable during the program execution. We use the syntax [x] to represent the value of the memory at address x. The memory can be thought as a write-once memory. The assert [x] == y can be interpreted either as write value y to address x or verify that the value of address x is y. They mean the same thing in read-only nondeterministic memory model. The memory addresses accessed by a program must be continuous. The memory offsets must be in the range of [-2**15, 2**15).

The only values that may change over time are three register values:

  • ap: the allocation pointer points to a yet-unused memory cell.
  • fp: the frame pointer points to the frame of the current function. The addresses of all the function’s arguments and local variables are relative to the value of fp. When a function starts, fp is equal to ap. But unlike ap, the value of fp remains the same throughout the scope of a function.
  • pc: the program counter points to the current instruction.

A simple Cairo instruction takes the form of an assertion for equality, so named assert-equal instruction. For example: [ap] = [ap - 1] * [fp]; ap++ states that the production of two cells is written to the value of the next unused cell. ; ap++ is not a separate instruction, it is part of the assert-euqal instruction.

ap, fp and pc are pointers to memory cells. The values of memory cells are [ap], [fp] and [pc].

2 The Program Counter and Reference

2.1 The Program Counter

Each instruction takes 1 or 2 field elements. When an immediate value is used in the instruction, it takes 2 field elements. The program counter (pc) keeps the address of the current insturciton. Usually it advances by 1 or 2 – the size of the instruciton.

A jmp instruction may be used to jump to a different instruction in three flavors:

  • jmp abs 17: pc = 17.
  • jmp rel 17: pc = pc + 17
  • jmp label: a relative jump.

Both jmp rel and jmp lable can be used with an if condition for conditonal jump.

2.2 Reference

The let x = <expr> syntax defines a reference to the expr’s value and substitue x accordingly when it is used later. A reference can hold any expression and may depende on ap or fp. For example: let x =[[fp + 3] + 1].

The assert <compound-expr> = <compound-expr> asserts the equality between the values of two compound expressions. The Cairo compiles it into multiple instructions and ap may advance an unknown number of steps. Hence, you should avoid using ap and fp in compound expression and use the referece, temporary, or local variable. For example, let x = [ap - 1] and let y = [ap - 2], then assert x * x = x + 5 * y. Here the ap chanes dynamically during the execution of the compound expression.

References are not variables: the scope of each definition is defined according to a static analysis of the order in which the instructions will be executed. It will follow a basic flow from jumps and conditional jumps, but if there are colliding definitions for the same reference, the reference will be revoked.

If there is a label or a call instruction between the definition of a reference that depends on ap and its usage, the reference may be revoked, since the compiler may not be able to compute the change of ap for jump and call. Sometime the compiler will not automatically detect a jump and the reference will not be revoked. Using the reference in such cases may result in an undefined behavior. A reference depends on fp is never revoked.

The syntax ref_name.member_name, where ref_name is a typed reference (T*) with value val and type T, and T.member_name is a member definition, compiles to [ref_name + T.member_name]. The member_name is the memory offset. The result is [val + offset]. You may omit the type when define a typed reference like: let ptr = cast([fp], MyStruct*) – assume [fp] contains a point to a struct.

A temporary variable is a syntactic sugar for a reference based on the ap register. For a simple expression, tempvar var_name = <expr> is equivalent to [ap] = <expr>; ap++ and let var_name = [ap - 1]. A temporary variable is a reference that may be revoked. In practice, let is used in pointer operations and function call; tempvar is used for value expression.

Local variables are based on the fp register. The first local variable will be a reference to [fp + 0], the second one to [fp + 1] and so on. The Cairo compiler auto-generates a constant SIZEOF_LOCALS with is equal to the accumulated size of locals within the same scope. The instruction alloc_locals is transformed to ap += SIZEOF_LOCALS.

Note that unless the local variable is defined and initialized in the same line, the local directive itself does not translate to a Cairo instruction (this is another difference from tempvar) – it simply translates to a reference definition. This is one of the reasons you must increase the value of ap manually. The type of a local variable must be explicitly stated (otherwise, felt is used), and it is not deduced from the type of the initialization value.

You can specify a typed local variable in two different ways: local x : T* = <expr> and local y : T = <expr>. The first one allocates one cell whose value is a pointer to a struct of type T, x.a is [[fp + 0] + T.a]. The second one allocates T.SIZE cells. y refers to the address of the structure fp + 1 rather than [fp + 1]. y.a is [fp + 1 + T.a]. Use y is an error. For example, tempvar z = y will faile, since it compiles to [ap] = fp, which is not a valid instruction in Cairo. Nevertheless, defining a variable called __fp__ will allow the code to compile.

Tuples are represented as a comma-separated list of elements enclosed in parentheses. For example: (3, x). Cairo requires a trailing comma for single-element tuples, to distinguish them from regular parentheses. For example (5,) is a single-element tuple. Tuple elements are accessed with the tuple expression followed by brackets containing a zero-based index to the element. The index must be known at compile time.

In order to represent an array (an ordered collection of homogeneous elements) in Cairo, one may use a pointer to the beginning of the array. The standard library function alloc() may be used to “dynamically” allocate a new array. For example, let (local struct_array : MyStruct*) = alloc() allocates a new memory segment and treats it as a pointer to MyStruct. The expression struct_array[n] is used to access the n-th element of the array, where n=0 is the first element.struct_array[index] is compiled to [struct_array + index * MyStruct.SIZE], and is of type MyStruct.

3 Functions

3.1 Function Execution

A function is a reusable unit of code that receives arguments and returns a value. Cairo provides two low-level instructions call addr and ret to facilitate resuable code. The high-level syntax are function_name() and return (). The call is similar to jmp label where a function is considered a label.

When a function starts, the fp is initialized to the current value of ap. When a function calls another function, the fp changes for the inner call and restores when the inner call returns. Under the hood, call addr works as the following:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
# call addr pushes (fp, pc) and jumps
# 1) Stores the value of the current fp, which will be restored once the called function ends using ret.
# 2) Stores the address of the next instruction, to run once the called function ends. This will be assigned to pc when ret is invoked.
# 3) Increase ap by 2, to account for the last two writes.
# 4) Updates fp to be the new ap, so it points to the start of the new frame within the called function's scope.
# 5) Jump to the function label
[ap] <-- fp
[ap + 1] <-- return_pc
ap += 2
fp <-- ap
jmp addr

# ret jumps to stored pc and resumes fp.
# 1) Jumps to return_pc (stored on the stack).
# 2) Restores the value of the previous fp.
jmp [fp - 1]
fp <-- [fp - 2]

3.2 Access Register Values

Cairo provides two functions to retrieve the values of the three registers.

1
2
3
4
5
6
7
8
9
from starkware.cairo.common.registers import get_ap
from starkware.cairo.common.registers import get_fp_and_pc

let get_ap_res = get_ap()
tempvar my_ap = get_ap_res.ap_val

let fp_and_pc = get_fp_and_pc()
tempvar my_fp = fp_and_pc.fp_val
tempvar my_pc = fp_and_pc.pc_val

When Cairo needs to use the address fp in a compund expression, it will try to replace fp with a variable named __fp__ that contains the value of fp. For example, in the following code, line B requires line A while line C doesn’t.

1
2
3
local __fp__ = fp_and_pc.fp_val  # A.
tempvar x = fp  # B.
tempvar y = [fp]  # C.

3.3 Function Arguments and Return Values

Function arguments are written to the stack before the call instruction. Because a call pushes the next pc and current fp into the stack, the function arguments are available at [fp - 3], [fp - 4], and so on in reverse order. For each argument, the Cairo compiler creates a reference argname to its value and a constant Args.argname with its offset (0, 1, 2, ...). The use of argname is replaced by [fp - (2 + n_args) + Args.argname]. Cairo supports the named argment call (a syntactic sugar) that can use compound expression: foo(x=4, y=5).

The function writes to the stack its return values just before the ret instruction. The retun values will be available to caller at [ap - 1], [ap - 2] and so on. The Cairo compiler creates constants named foo.Return.z and foo.Return.w for the return values.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
# function definition
func foo(x, y) -> (z, w):
    [ap] = x + y; ap++  # z.
    [ap] = x * y; ap++  # w.
    ret
end

# call the funciton foo(4, 5)
[ap] = 4; ap++  # x.
[ap] = 5; ap++  # y.
call foo

# high-level call foo(x=4, y=5) and access the result
foo(x=4, y=5)
[ap] = [ap - 1] + [ap - 2]; ap++  # Compute z + w.

# define a type reference to return value
let foo_ret = foo(x=4, y=5)
[ap] = foo_ret.z + foo_ret.w; ap++

# syntactic sugar to assign return values to a tuple
let (z, w) = foo(x=4, y=5)
[ap] = z + 2; app++

# call with named arguments and check the number of argument
let args = cast(ap, foo.Args*)
args.x = 4; ap++
args.y = 5; ap++
# Check that ap was advanced the correct number of times
# (this will ensure arguments were not forgotten).
static_assert args + foo.Args.SIZE == ap
let foo_ret = call foo

4 Hints

A hint is a block of Python code that will be executed by the prover right before the next instruction.

To access Cairo data, use memory[ap] to access ap value, ids.x for constant, reference and function arguments. A hint is attached to the next instruction and executed before the next instruction.

5 Program Input & Output

A Cairo program may have a secret program input and a public program output.

You can specify a program input file using --program_input=input_filename.json flag. Then you can use hints to access the input file content using the variable program_input['key_name'].

To create a program output, you first need to add %builtins output directive. The directive makes the funciton main() get one argument and return one value. The argument, conventionally called output_ptr, is a pointer to a block of memory to which it may write its outputs. main() should return the value of the pointer after writing, signifying where the chunk of output memory ends.

The program uses a different layout such as --layout=small that requires the number of steps to be divisible by 512. You can set it via --steps=512.

6 Segments

Cairo programs are themselves kept in memory, in what is called the program segment. This segment is of fixed length and contains the numeric representation of the Cairo program. The program counter pc starts at the beginning of the program segment.

A Cairo program also requires an execution segment. This is where the register ap and fp start, and where data generated during the run of a Cairo program (variables, return addresses for function calls, etc.) is stored. The length of the execution segment is variable as it may depend on the program input.

Every builtin has its own variable-length continuous area in memory that is located in its own segment.

7 Builtins and Implicit Arguments

Builtins are predefined optimized low-level execution units which are added to the Cairo CPU board to perform predefined computations which are expensive to perform in vanilla Cairo (e.g., range-checks, Pedersen hash, ECDSA, …).

The CPU communicates with builtins thourgh memeory: each builtin has a segment and applies constraints on the memory cells in the segment.

Implicit argument is a syntactic sugar that adds arguments and return values to the funciton When you use the high-level return statement, you don’t have to explicitly return the implicit argument. The Cairo compiler returns the current binding of the implicit argument.

There are three ways to call a function with implicit argments:

  • Explicitly function_name{x=y}(): y is bind to x. After the call, y has the return value of x.
  • Implicitly using an implicit argument with the same name in caller.
  • Implicitly using a with statement in caller.

Because an implicit argument is implemented as a reference, it might be revoked. The solution is to use a local variable to keep the implicit argument. When there is no branches, alloc_locals can add the local variables automatically. Adding tempvar at the end of every branch also works.

Cairo supports a few possible layouts. Each layout specifies which of the different builtins exist and how many instances of that builtin can be used. This is measured as the ratio between the number of instructions (n_steps) and the number of available builtin instances. The default plain layout has no builtins.

The %builtins directive specifies which builtins are used by the program. Each builtin adds an argument to main() and requires a return value.