These are my notes for week 1 of the Coursera course Cryptography I.

## 1 Introduction

A central theme in cryptography is that anything that can be done with a trusted authority can also be done without one.

The course objectives are:

• learn how crypto primitives work
• more importantly, learn how to use them correctly and reason about security

Cryptography is used in secure communication, file encryption, content protection, user authentication, digital signatures, anonymous communication, digital cash, secure multi-party computation (elections, private auctions), and much much more.

This course covers two main parts, shared-key cryptography and public-key cryptography, which address the two core functions: secure communication (confidentiality and integrity) and secret key establishment.

Two examples of cryptographic magic are privately outsourcing computation and zero-knowledge proofs.

Cryptography is not

• a solution to all security problems
• reliable unless implemented and used properly
• something you should try to invent yourself with ad-hoc designs

Three steps in cryptography:

• precisely specify a threat model
• propose a construction
• prove that breaking the construction under the threat model would solve an underlying hard problem

## 2 Discrete Probability

### 2.1 Probability Distribution

Discrete probability is defined over a universe `U`, a finite set such as `U = {0, 1}^n`. A probability distribution `P` over `U` is a function `P: U -> [0, 1]` such that the probabilities of all elements in the universe sum to `1`.

A uniform distribution assigns every element the same probability `1 / |U|`. A point distribution assigns one element `x0` probability `1` and every other element probability `0`.

### 2.2 Event and Random Variable

A subset `A` of the universe is called an event. Its probability `Pr[A]` is the sum of the probabilities of its elements. We have `Pr[U] = 1`.

The union bound is formulated as `Pr[A1 union A2] <= Pr[A1] + Pr[A2]`. If `A1 intersect A2 = empty`, then `Pr[A1 union A2] = Pr[A1] + Pr[A2]`.

A random variable `X` is a function `X: U -> V`. It induces a distribution on `V` defined as `Pr[X = v] := Pr[{u : X(u) = v}]`.

A uniform random variable over `U = {0, 1}^n`, written `r <--R-- U`, is defined by `Pr[r = a] = 1 / |U|` for every element `a` in `U`. Formally, `r` is the identity function `r(x) = x` for all `x` in `U`.

### 2.3 Randomized Algorithm

A deterministic algorithm `A` always produces the same output for the same input. A randomized algorithm `A` takes an input and an implicit argument `r`, where `r` is sampled anew every time the algorithm runs; specifically, `r` is a uniform random variable. One way to think about a randomized algorithm is that it defines a random variable: the algorithm defines a distribution over the set of all possible outputs.
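A minimal sketch of the distinction in Python (the function names and the 8-bit choice of `r` are my own, for illustration):

```python
import secrets

def deterministic_alg(x: int) -> int:
    # The same input always yields the same output.
    return x * 2

def randomized_alg(x: int) -> int:
    # r is sampled anew on every run; the output is a random variable
    # induced by the uniform distribution on r.
    r = secrets.randbits(8)
    return x ^ r
```

Here `deterministic_alg(3)` is always `6`, while `randomized_alg(3)` defines a distribution over the 256 possible outputs `{3 ^ r : 0 <= r < 256}`.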

### 2.4 Independence

If `Pr[A and B] = Pr[A] * Pr[B]`, then events `A` and `B` are independent. Similarly, two random variables `X` and `Y` taking values in `V` are independent if `for all a and b in V, Pr[X=a and Y=b] = Pr[X=a] * Pr[Y=b]`.

### 2.5 `XOR`

`XOR` of two strings in `{0, 1}^n` is their bit-wise addition modulo `2`.

• `A XOR 0 = A`
• `A XOR A = 0`
• `(B XOR A) XOR A = B XOR 0 = B`
• Commutative: `A XOR B = B XOR A`
• Associative: `(A XOR B) XOR C = A XOR (B XOR C)`
• Distributive (`AND` over `XOR`): `A AND (B XOR C) = (A AND B) XOR (A AND C)`
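These identities can be checked exhaustively in Python, where `^` is bitwise XOR (note that it is `AND`, not `OR`, that distributes over `XOR`):

```python
import itertools

# Exhaustively verify the XOR identities over all 3-bit values.
for A, B, C in itertools.product(range(8), repeat=3):
    assert A ^ 0 == A
    assert A ^ A == 0
    assert (B ^ A) ^ A == B
    assert A ^ B == B ^ A                    # commutative
    assert (A ^ B) ^ C == A ^ (B ^ C)        # associative
    assert A & (B ^ C) == (A & B) ^ (A & C)  # AND distributes over XOR
```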

There is a joke that all a cryptographer knows how to do is `XOR` things. There is a strong reason behind the joke, because `XOR` has a very important property: if `Y` is a random variable over `U` and `X` is an independent uniform random variable over `U`, then `Z := Y XOR X` is a uniform random variable on `U`.

To prove it, start with `n = 1` and calculate `Pr[Z=0] = Pr[(X, Y) = (0, 0) or (X, Y) = (1, 1)] = 1/2 * Pr[Y=0] + 1/2 * Pr[Y=1] = 1/2`. Practically, it means that regardless of the value of `Y` (all `0`, all `1`, many `0`, or many `1`), the result is a uniform random variable, i.e., both `0` and `1` have probability `0.5`.

As shown in the proof, the probability distribution of `Y` doesn't matter as long as 1) `X` is uniform and independent of `Y`, and 2) the uniform factor `1/2` can be pulled out so that the probabilities of all `Y` values sum to `1`.
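A quick numeric check of this property (the skewed distribution for `Y` is an arbitrary choice for illustration):

```python
from fractions import Fraction
import itertools

p_y = {0: Fraction(9, 10), 1: Fraction(1, 10)}  # arbitrary skewed Y
p_x = {0: Fraction(1, 2), 1: Fraction(1, 2)}    # uniform X, independent of Y

# Compute the distribution of Z = Y XOR X.
p_z = {0: Fraction(0), 1: Fraction(0)}
for y, x in itertools.product(p_y, p_x):
    p_z[y ^ x] += p_y[y] * p_x[x]

# Z is uniform no matter how skewed Y is.
assert p_z[0] == p_z[1] == Fraction(1, 2)
```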

Birthday paradox: let `r1`, `r2`, …, `rn` be independent identically distributed (iid) random variables over `U`. When `n = 1.2 * |U| ^ 0.5`, then `Pr[there exist i != j such that ri = rj] >= 0.5`.

For example, if `U = {0, 1}^128`, then `n` is about `2^64`: after sampling that many messages, two of them will likely be the same. For people's birthdays (`|U| = 365`), `n` is about `23`. For a universe of one million values, `n` is about `1200`; if `n` is `2200`, the probability that two samples collide is about `0.9`.
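A small helper illustrating the `n = 1.2 * |U|^0.5` estimate (the function name is my own):

```python
import math

def birthday_threshold(universe_size: int) -> float:
    # About 1.2 * sqrt(|U|) iid samples give a collision
    # probability of at least 1/2.
    return 1.2 * math.sqrt(universe_size)

# Birthdays: |U| = 365 gives n ~ 23; |U| = 10^6 gives n = 1200.
```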

## 3 Cipher

A cipher defined over `(K, M, C)` is a pair of "efficient" algorithms `(E, D)`, an encryption algorithm `E: K x M -> C` and a decryption algorithm `D: K x C -> M`, such that for all `m` in `M` and `k` in `K`, `D(k, E(k, m)) = m`. This equation is called the consistency equation. Efficient means the cipher runs in polynomial time. `E` is often a randomized algorithm that generates random bits to encrypt the messages. `D` is always deterministic: it gives the same output for a given pair of key and ciphertext `(k, c)`.

The One Time Pad (OTP) cipher was developed by Vernam in 1917. We have `M = C = K = {0, 1}^n`: the key `k` is a random bit string as long as the message `m`. The cipher is defined as `E(k, m) = k xor m` and `D(k, c) = k xor c`. It is very fast but requires long keys.
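A minimal OTP sketch in Python, operating byte-wise (the function names and sample message are my own):

```python
import secrets

def otp_encrypt(key: bytes, msg: bytes) -> bytes:
    # The key must be truly random and exactly as long as the message.
    assert len(key) == len(msg)
    return bytes(k ^ m for k, m in zip(key, msg))

otp_decrypt = otp_encrypt  # decryption is the same XOR: D(k, c) = k xor c

msg = b"attack at dawn"
key = secrets.token_bytes(len(msg))
ct = otp_encrypt(key, msg)
assert otp_decrypt(key, ct) == msg  # consistency: D(k, E(k, m)) = m
```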

## 4 Perfect Secrecy

Shannon developed the information-theoretic security concept of perfect secrecy in 1949. The basic idea is that a ciphertext should reveal no information about its plaintext: even the most powerful adversary learns nothing about the plaintext from the ciphertext.

Formally, a cipher `(E, D)` over `(K, M, C)` has perfect secrecy if for every two same-length messages `m0`, `m1` and any ciphertext `c`, when the key `k` is drawn uniformly at random from `K`, `Pr[E(k, m0) = c] = Pr[E(k, m1) = c]`. In other words, the number of keys (under the uniform distribution) that map a message to a given `c` must be the same for every message.

OTP has perfect secrecy because for any pair of message `m` and ciphertext `c`, there is exactly one key that generates `c` from `m` (namely `k = m xor c`), so `Pr[E(k, m) = c] = 1 / |K|`.
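For a tiny 2-bit OTP (`|K| = |M| = |C| = 4`), this can be verified by exhaustive counting:

```python
import itertools

# For every (message, ciphertext) pair, count keys k with k XOR m == c.
counts = [
    sum(1 for k in range(4) if k ^ m == c)
    for m, c in itertools.product(range(4), repeat=2)
]

# Exactly one key per pair, so Pr[E(k, m) = c] = 1/|K| for every m and c.
assert all(count == 1 for count in counts)
```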

It means there is no ciphertext-only attack on OTP, though other attacks may be possible. Shannon also proved that to have perfect secrecy, the key length has to be at least the length of the message. This makes perfect secrecy hard to achieve in practice.

## 5 Stream Ciphers

### 5.1 Definition

A pseudorandom generator (PRG) uses a short random seed to generate a much longer pseudorandom key: `G: {0, 1}^s -> {0, 1}^n` where `n >> s`. A stream cipher is defined as `E(k, m) = m xor G(k)` and `D(k, c) = c xor G(k)`. Because it doesn't have perfect secrecy (the key is shorter than the message), we need a different definition of security.
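A toy sketch using SHAKE-128 from `hashlib` as a stand-in PRG (an illustrative assumption on my part, not a construction from the course):

```python
import hashlib

def prg(seed: bytes, n: int) -> bytes:
    # Expand a short seed into n pseudorandom bytes (toy PRG via SHAKE-128).
    return hashlib.shake_128(seed).digest(n)

def stream_encrypt(key: bytes, msg: bytes) -> bytes:
    pad = prg(key, len(msg))     # G(k), stretched to the message length
    return bytes(p ^ m for p, m in zip(pad, msg))

stream_decrypt = stream_encrypt  # XOR with the same pad decrypts

key = b"short-seed"
ct = stream_encrypt(key, b"hello stream cipher")
assert stream_decrypt(key, ct) == b"hello stream cipher"
```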

### 5.2 PRG Predictability

We say that a generator `G: K -> {0, 1}^n` is predictable if there exist an efficient algorithm `A` and an index `i` with `1 <= i <= n-1` such that, given the first `i` bits of `G(k)`, `A` predicts bit `i + 1` of `G(k)` with probability `1/2 + epsilon` for a non-negligible positive `epsilon`. If the PRG is predictable, then the stream cipher is not secure because the generated key stream is predictable.

A PRG is unpredictable if no efficient algorithm can predict bit `i + 1` with non-negligible `epsilon`.

Some weak PRGs include linear congruential generators and `random()` in `glibc`.
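A sketch of why a linear congruential generator is predictable (the constants are glibc-style, chosen for illustration): each output *is* the full internal state, so one observed value determines every later one.

```python
A, C, M = 1103515245, 12345, 2**31  # glibc-style LCG constants

def lcg_next(state: int) -> int:
    # The next output is a public linear function of the current one.
    return (A * state + C) % M

secret_seed = 123456789
s1 = lcg_next(secret_seed)  # attacker observes this single output...
s2 = lcg_next(s1)           # ...the generator's actual next output
assert lcg_next(s1) == s2   # ...and predicts it with probability 1
```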

In practice, `epsilon` is a scalar: `epsilon >= 1 / 2^30` is non-negligible because such an event is likely to happen over `1GB` of data, while `epsilon <= 1 / 2^80` is negligible.

In theory, `epsilon` is defined as a function that maps non-negative integers to non-negative real numbers (probabilities). It is non-negligible if there exists a `d` such that `epsilon(z) >= 1 / z^d` for infinitely many values of `z`. That is, the function `epsilon` is bigger than one over a polynomial infinitely often.

If for every `d` there exists an integer `Zd` such that for all `z >= Zd`, `epsilon(z) <= 1 / z^d` (so only finitely many values violate the bound), then `epsilon` is negligible: it is eventually smaller than one over every polynomial.

For example, `epsilon(z) = 1 / 2^z` is negligible, but `epsilon(z) = 1 / z^1000` is non-negligible.

### 5.3 Secure Stream Cipher

Loosely speaking, a stream cipher is secure if its PRG is unpredictable.

You should never use the same key (pad) more than once, because if `c1 = m1 XOR PRG(k)` and `c2 = m2 XOR PRG(k)`, then `c1 XOR c2 = m1 XOR m2`, which reveals a lot of information. This happened in Microsoft's PPTP protocol, which encrypts both client and server data with the same key. If more than one key is used, the keys should not be related: 802.11b WEP uses a set of related keys and recycles keys on every reboot or after `2^24` frames.
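The two-time-pad failure can be demonstrated directly (the messages are my own examples):

```python
import secrets

m1 = b"attack at dawn!!"
m2 = b"retreat at dusk!"
pad = secrets.token_bytes(len(m1))  # the same pad reused -- the mistake

c1 = bytes(p ^ b for p, b in zip(pad, m1))
c2 = bytes(p ^ b for p, b in zip(pad, m2))

# The pad cancels out: c1 XOR c2 = m1 XOR m2, leaking plaintext structure.
xor_of_cts = bytes(a ^ b for a, b in zip(c1, c2))
xor_of_msgs = bytes(a ^ b for a, b in zip(m1, m2))
assert xor_of_cts == xor_of_msgs
```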

It is also easy to spot a small change in the plaintext, because the ciphertext changes in exactly the same location. For network traffic, use a new key for each session. Don't use stream ciphers for disk encryption.

Stream ciphers don't provide integrity. OTP is malleable: modifications to the ciphertext are undetectable and have a predictable effect on the plaintext.
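A sketch of the malleability attack (the attacker is assumed to know the plaintext at the position being changed; messages are my own examples):

```python
import secrets

msg = b"From: Bob"
key = secrets.token_bytes(len(msg))
ct = bytearray(k ^ m for k, m in zip(key, msg))

# Without knowing the key, the attacker XORs (old XOR new) into the
# ciphertext, turning the decryption into a message of their choice.
target = b"From: Eve"
for i in range(len(ct)):
    ct[i] ^= msg[i] ^ target[i]

decrypted = bytes(k ^ c for k, c in zip(key, ct))
assert decrypted == target  # the receiver cannot detect the change
```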

RC4 and CSS are insecure. Use eStream ciphers such as Salsa20.

## 6 Secure PRG

What does it mean that `G(k)`, with `k <--R-- {0, 1}^s`, is indistinguishable from a uniform random variable `r <--R-- {0, 1}^n`?

A statistical test on `{0, 1}^n` is an algorithm `A` such that `A(x)` outputs `0` (not random) or `1` (random) for an `n`-bit string `x`. For example, `A(x) = 1` if and only if `|#0(x) - #1(x)| <= 10 * n^(1/2)`, or `|#00(x) - n/4| <= 10 * n^(1/2)`, or `max-run-of-0(x) <= 10 * log(n)`. There are many statistical tests.
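A sketch of the first test in Python (the threshold form follows the lecture; the function name is my own):

```python
def balance_test(x: str) -> int:
    # A(x) = 1 ("random") iff |#0(x) - #1(x)| <= 10 * sqrt(n).
    n = len(x)
    zeros, ones = x.count("0"), x.count("1")
    return 1 if abs(zeros - ones) <= 10 * n ** 0.5 else 0

assert balance_test("0" * 256) == 0   # all zeros fails the test
assert balance_test("01" * 128) == 1  # a balanced string passes
```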

Let `G: K -> {0, 1}^n` be a PRG and `A` a statistical test on `{0, 1}^n`. The advantage is defined as `Adv[A, G] = |Pr[A(G(k)) = 1] - Pr[A(r) = 1]|`, where `k` is a uniform seed and `r` is a uniform random variable over `{0, 1}^n`. If the advantage is close to `0`, then `A` cannot distinguish `G` from a random string; equivalently, the PRG is computationally indistinguishable from a uniform random variable.

A PRG is secure if the advantage of every efficient statistical test is negligible. We cannot currently prove that any particular PRG is secure: such a proof would imply `P != NP`, which is still open.

A predictable PRG is insecure. In 1982, Yao proved that an unpredictable PRG is secure: if no next-bit predictor can distinguish `G` from random, then no statistical test can.

## 7 Semantic Security

An encryption scheme `E` is semantically secure if for any efficient adversary `A` and any pair of explicit same-length messages `m0`, `m1`, the advantage of `A` is negligible. It means that `{E(k, m0)}` is computationally indistinguishable from `{E(k, m1)}`.

If a cipher is semantically secure, then no bit of information is revealed to an efficient adversary (an efficient statistical test). OTP is stronger: it is secure against all adversaries, not just efficient ones.

An important theorem is that if a PRG is secure, then its stream cipher is semantically secure.

To prove it, we show that for any semantic-security adversary `A`, there exists a PRG adversary `B` such that `Adv[A, E] <= 2 * Adv[B, G]`. Since the right-hand side is negligible (the PRG is secure), the left-hand side must be negligible too.

The idea is that if we replace the output of a secure PRG with a uniform random string `r`, the adversary cannot tell the difference precisely because the PRG is secure.

Let `W0`, `W1` be the events that `A` outputs `1` when receiving the stream-cipher encryption of `m0` or `m1`, respectively. Similarly, let `R0`, `R1` be the events that `A` outputs `1` when receiving the OTP encryption (a truly random pad) of `m0` or `m1`. We have `|Pr[R0] - Pr[R1]| = Adv[A, OTP] = 0`. We claim that for `b = 0, 1` there exists an adversary `B` such that `|Pr[Wb] - Pr[Rb]| = Adv[B, G]`. Then, by the triangle inequality, `|Pr[W0] - Pr[W1]| <= |Pr[W0] - Pr[R0]| + |Pr[R0] - Pr[R1]| + |Pr[R1] - Pr[W1]| <= 2 * Adv[B, G]`, and since `Adv[A, E] = |Pr[W0] - Pr[W1]|`, we get `Adv[A, E] <= 2 * Adv[B, G]`.

To prove the claim that there exists a `B` with `|Pr[Wb] - Pr[Rb]| = Adv[B, G]`, we construct a statistical test `B` that takes an `n`-bit string `y` as input and outputs `0` or `1` (not random or random). `B` runs `A` to obtain a pair of messages `m0` and `m1`, feeds `m0 XOR y` to `A` as the challenge ciphertext, and outputs whatever `A` outputs (`A`'s guess of whether `m0` or `m1` was encrypted).

By definition, `Adv[B, G] = |Pr[B(G(k)) = 1] - Pr[B(r) = 1]|`. With the above `B`, `Pr[B(r) = 1]` equals `Pr[R0]` because the input is a truly random string, so `m0 XOR r` is an OTP encryption of `m0`. For `Pr[B(G(k)) = 1]`, the input `y` is a pseudorandom string, so `m0 XOR y` is a stream-cipher encryption of `m0` and the probability equals `Pr[W0]`. Hence `|Pr[W0] - Pr[R0]| = Adv[B, G]`, and the same argument applies when `b = 1`.