This is a study note of How Cairo Works. It covers both low-level and high-level language features of Cairo.
1 Introduction to Cairo
In Cairo, the basic data type felt
(field element) is an integer in the range 0 <= x <P
, where P
is a 256-bit prime number. All the computations are done modulo P
. In Cairo, the divisioin x / y
satisfies (x / y) * y == x
, therefore 2 * (P + 1) / 2 = P + 1
which is 1 (mod P)
, 1 / 2 = (P + 1 ) / 2
. Additionally, multiply a number by 2
or 3
doesn’t always result in a number divisible by 2
or 3
.
The goal of a Cairo program is to prove that some computation is correct. A nondeterministic computation method is often used to first guess a value (the nondeterministic part), then verify other properites.
Cairo uses a read-only nondeterministic memory model. The value for each memory cell is chosen by the prover, but is immutable during the program execution. We use the syntax [x]
to represent the value of the memory at address x
. The memory can be thought as a write-once
memory. The assert [x] == y
can be interpreted either as write value y to address x
or verify that the value of address x is y
. They mean the same thing in read-only nondeterministic memory model. The memory addresses accessed by a program must be continuous. The memory offsets must be in the range of [-2**15, 2**15)
.
The only values that may change over time are three register values:
ap
: the allocation pointer points to a yet-unused memory cell.fp
: the frame pointer points to the frame of the current function. The addresses of all the function’s arguments and local variables are relative to the value offp
. When a function starts,fp
is equal toap
. But unlikeap
, the value offp
remains the same throughout the scope of a function.pc
: the program counter points to the current instruction.
A simple Cairo instruction takes the form of an assertion for equality, so named assert-equal instruction. For example: [ap] = [ap - 1] * [fp]; ap++
states that the production of two cells is written to the value of the next unused cell. ; ap++
is not a separate instruction, it is part of the assert-euqal instruction.
ap
, fp
and pc
are pointers to memory cells. The values of memory cells are [ap]
, [fp]
and [pc]
.
2 The Program Counter and Reference
2.1 The Program Counter
Each instruction takes 1 or 2 field elements. When an immediate value is used in the instruction, it takes 2 field elements. The program counter (pc) keeps the address of the current insturciton. Usually it advances by 1 or 2 – the size of the instruciton.
A jmp
instruction may be used to jump to a different instruction in three flavors:
jmp abs 17
:pc = 17
.jmp rel 17
:pc = pc + 17
jmp label
: a relative jump.
Both jmp rel
and jmp lable
can be used with an if
condition for conditonal jump.
2.2 Reference
The let x = <expr>
syntax defines a reference to the expr
’s value and substitue x
accordingly when it is used later. A reference can hold any expression and may depende on ap
or fp
. For example: let x =[[fp + 3] + 1]
.
The assert <compound-expr> = <compound-expr>
asserts the equality between the values of two compound expressions. The Cairo compiles it into multiple instructions and ap
may advance an unknown number of steps. Hence, you should avoid using ap
and fp
in compound expression and use the referece, temporary, or local variable. For example, let x = [ap - 1]
and let y = [ap - 2]
, then assert x * x = x + 5 * y
. Here the ap
chanes dynamically during the execution of the compound expression.
References are not variables: the scope of each definition is defined according to a static analysis of the order in which the instructions will be executed. It will follow a basic flow from jumps and conditional jumps, but if there are colliding definitions for the same reference, the reference will be revoked.
If there is a label or a call instruction between the definition of a reference that depends on ap
and its usage, the reference may be revoked, since the compiler may not be able to compute the change of ap
for jump and call. Sometime the compiler will not automatically detect a jump and the reference will not be revoked. Using the reference in such cases may result in an undefined behavior. A reference depends on fp
is never revoked.
The syntax ref_name.member_name
, where ref_name
is a typed reference (T*
) with value val
and type T
, and T.member_name
is a member definition, compiles to [ref_name + T.member_name]
. The member_name
is the memory offset. The result is [val + offset]
. You may omit the type when define a typed reference like: let ptr = cast([fp], MyStruct*)
– assume [fp]
contains a point to a struct.
A temporary variable is a syntactic sugar for a reference based on the ap
register. For a simple expression, tempvar var_name = <expr>
is equivalent to [ap] = <expr>; ap++
and let var_name = [ap - 1]
. A temporary variable is a reference that may be revoked. In practice, let
is used in pointer operations and function call; tempvar
is used for value expression.
Local variables are based on the fp
register. The first local variable will be a reference to [fp + 0]
, the second one to [fp + 1]
and so on. The Cairo compiler auto-generates a constant SIZEOF_LOCALS
with is equal to the accumulated size of locals within the same scope. The instruction alloc_locals
is transformed to ap += SIZEOF_LOCALS
.
Note that unless the local variable is defined and initialized in the same line, the local
directive itself does not translate to a Cairo instruction (this is another difference from tempvar
) – it simply translates to a reference definition. This is one of the reasons you must increase the value of ap
manually. The type of a local variable must be explicitly stated (otherwise, felt
is used), and it is not deduced from the type of the initialization value.
You can specify a typed local variable in two different ways: local x : T* = <expr>
and local y : T = <expr>
. The first one allocates one cell whose value is a pointer to a struct of type T
, x.a
is [[fp + 0] + T.a]
. The second one allocates T.SIZE
cells. y
refers to the address of the structure fp + 1
rather than [fp + 1]
. y.a
is [fp + 1 + T.a]
. Use y
is an error. For example, tempvar z = y
will faile, since it compiles to [ap] = fp
, which is not a valid instruction in Cairo. Nevertheless, defining a variable called __fp__
will allow the code to compile.
Tuples are represented as a comma-separated list of elements enclosed in parentheses. For example: (3, x)
. Cairo requires a trailing comma for single-element tuples, to distinguish them from regular parentheses. For example (5,)
is a single-element tuple. Tuple elements are accessed with the tuple expression followed by brackets containing a zero-based index to the element. The index must be known at compile time.
In order to represent an array (an ordered collection of homogeneous elements) in Cairo, one may use a pointer to the beginning of the array. The standard library function alloc()
may be used to “dynamically” allocate a new array. For example, let (local struct_array : MyStruct*) = alloc()
allocates a new memory segment and treats it as a pointer to MyStruct
. The expression struct_array[n]
is used to access the n-th
element of the array, where n=0
is the first element.struct_array[index]
is compiled to [struct_array + index * MyStruct.SIZE]
, and is of type MyStruct
.
3 Functions
3.1 Function Execution
A function is a reusable unit of code that receives arguments and returns a value. Cairo provides two low-level instructions call addr
and ret
to facilitate resuable code. The high-level syntax are function_name()
and return ()
. The call
is similar to jmp label
where a function is considered a label.
When a function starts, the fp
is initialized to the current value of ap
. When a function calls another function, the fp
changes for the inner call and restores when the inner call returns. Under the hood, call addr
works as the following:
|
|
3.2 Access Register Values
Cairo provides two functions to retrieve the values of the three registers.
|
|
When Cairo needs to use the address fp
in a compund expression, it will try to replace fp
with a variable named __fp__
that contains the value of fp
. For example, in the following code, line B requires line A while line C doesn’t.
|
|
3.3 Function Arguments and Return Values
Function arguments are written to the stack before the call
instruction. Because a call
pushes the next pc
and current fp
into the stack, the function arguments are available at [fp - 3]
, [fp - 4]
, and so on in reverse order. For each argument, the Cairo compiler creates a reference argname
to its value and a constant Args.argname
with its offset (0, 1, 2, ...)
. The use of argname
is replaced by [fp - (2 + n_args) + Args.argname]
. Cairo supports the named argment call (a syntactic sugar) that can use compound expression: foo(x=4, y=5)
.
The function writes to the stack its return values just before the ret
instruction. The retun values will be available to caller at [ap - 1]
, [ap - 2]
and so on. The Cairo compiler creates constants named foo.Return.z
and foo.Return.w
for the return values.
|
|
4 Hints
A hint is a block of Python code that will be executed by the prover right before the next instruction.
To access Cairo data, use memory[ap]
to access ap
value, ids.x
for constant, reference and function arguments. A hint is attached to the next instruction and executed before the next instruction.
5 Program Input & Output
A Cairo program may have a secret program input and a public program output.
You can specify a program input file using --program_input=input_filename.json
flag. Then you can use hints to access the input file content using the variable program_input['key_name']
.
To create a program output, you first need to add %builtins output
directive. The directive makes the funciton main()
get one argument and return one value. The argument, conventionally called output_ptr
, is a pointer to a block of memory to which it may write its outputs. main()
should return the value of the pointer after writing, signifying where the chunk of output memory ends.
The program uses a different layout such as --layout=small
that requires the number of steps to be divisible by 512
. You can set it via --steps=512
.
6 Segments
Cairo programs are themselves kept in memory, in what is called the program segment. This segment is of fixed length and contains the numeric representation of the Cairo program. The program counter pc
starts at the beginning of the program segment.
A Cairo program also requires an execution segment. This is where the register ap
and fp
start, and where data generated during the run of a Cairo program (variables, return addresses for function calls, etc.) is stored. The length of the execution segment is variable as it may depend on the program input.
Every builtin has its own variable-length continuous area in memory that is located in its own segment.
7 Builtins and Implicit Arguments
Builtins are predefined optimized low-level execution units which are added to the Cairo CPU board to perform predefined computations which are expensive to perform in vanilla Cairo (e.g., range-checks, Pedersen hash, ECDSA, …).
The CPU communicates with builtins thourgh memeory: each builtin has a segment and applies constraints on the memory cells in the segment.
Implicit argument is a syntactic sugar that adds arguments and return values to the funciton When you use the high-level return
statement, you don’t have to explicitly return the implicit argument. The Cairo compiler returns the current binding of the implicit argument.
There are three ways to call a function with implicit argments:
- Explicitly
function_name{x=y}()
:y
is bind tox
. After the call,y
has the return value ofx
. - Implicitly using an implicit argument with the same name in caller.
- Implicitly using a
with
statement in caller.
Because an implicit argument is implemented as a reference, it might be revoked. The solution is to use a local variable to keep the implicit argument. When there is no branches, alloc_locals
can add the local variables automatically. Adding tempvar
at the end of every branch also works.
Cairo supports a few possible layouts. Each layout specifies which of the different builtins exist and how many instances of that builtin can be used. This is measured as the ratio between the number of instructions (n_steps
) and the number of available builtin instances. The default plain
layout has no builtins.
The %builtins
directive specifies which builtins are used by the program. Each builtin adds an argument to main() and requires a return value.