These are notes for week 1 of the Coursera course Cryptography I.

1 Introduction

A central theme in cryptography is that anything that can be done with a trusted authority can also be done without one.

The course objectives are:

  • learn how crypto primitives work
  • more importantly, learn how to use them to reason about security

Cryptography is used in secure communication, file encryption, content protection, user authentication, digital signatures, anonymous communication, digital cash, secure multi-party computation (elections, private auctions), and much much more.

This course covers two main parts, shared-key cryptography and public-key cryptography, which address the two core functions: secure communication (confidentiality and integrity) and secret key establishment.

Two cryptography magics are anonymous computation outsourcing and zero-knowledge proofs.

Cryptography is not

  • a solution to all security problems
  • reliable unless implemented and used properly
  • something you should try to invent yourself with ad-hoc designs

Three steps in cryptography:

  • precisely specify threat model
  • propose a construction
  • prove that breaking the construction under the threat model would solve an underlying hard problem

2 Discrete Probability

2.1 Probability Distribution

Discrete probability is defined over a universe U that is a finite set, like U = {0, 1}^n. A probability distribution P over U is a function P: U -> [0, 1] such that the probabilities of all elements in the universe sum to 1.

A Uniform Distribution assigns the same probability to every element. A Point Distribution assigns one element x0 probability 1 and every other element probability 0.

2.2 Event and Random Variable

A subset A of the universe is called an event. Its probability Pr[A] is the sum of the probabilities of its elements. We have Pr[U] = 1.

The union bound is formulated as Pr[A1 union A2] <= Pr[A1] + Pr[A2]. If A1 intersect A2 = empty, then Pr[A1 union A2] = Pr[A1] + Pr[A2].

A random variable X is a function X: U -> V. A random variable X induces a distribution on V defined as Pr[X = v] := Pr[{u in U : X(u) = v}].

A uniform random variable over U = {0, 1}^n, written r <--R-- U, satisfies Pr[r = a] = 1 / |U| for every element a in U. Formally, r is the identity function r(x) = x for all x in U.

2.3 Randomized Algorithm

A deterministic algorithm A always produces the same output for the same input. A randomized algorithm A takes an input and an implicit argument r, where r is sampled anew every time the algorithm runs; specifically, r is a uniform random variable. One way to think about a randomized algorithm is that it defines a random variable: the algorithm defines a distribution over the set of all possible outputs.
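A minimal sketch of the distinction (the function names and the XOR-with-random-byte construction are made up for illustration):

```python
import secrets

def det_alg(x: int) -> int:
    # Deterministic: the same input always yields the same output.
    return x * x

def rand_alg(x: int) -> int:
    # Randomized: an implicit argument r is sampled anew on every run,
    # so even for a fixed input the output is a random variable.
    r = secrets.randbits(8)          # fresh uniform randomness each call
    return x ^ r

assert det_alg(7) == det_alg(7)
# Over many runs, rand_alg induces a distribution over its possible outputs.
assert len({rand_alg(7) for _ in range(1000)}) > 1
```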

2.4 Independence

If Pr[A and B] = Pr[A] * Pr[B], then events A and B are independent. Similarly, two random variables X and Y taking values in V are independent if for all a and b in V, Pr[X=a and Y=b] = Pr[X=a] * Pr[Y=b].

2.5 XOR

XOR of two strings in {0, 1}^n is their bit-wise addition modulo 2.

A XOR 0 = A,
A XOR A = 0,
(B XOR A) XOR A = B XOR 0 = B,
Commutative: A XOR B = B XOR A,
Associative: (A XOR B) XOR C = A XOR (B XOR C),
Distributive (AND over XOR): A AND (B XOR C) = (A AND B) XOR (A AND C),

There is a joke that all a cryptographer knows how to do is XOR things. There is a strong reason behind this joke, because XOR has a very important property: if Y is a random variable over U, and X is an independent uniform random variable over U, then Z := Y XOR X is uniform on U.

To prove it, start with n = 1 and calculate Pr[Z = 0] = Pr[(X, Y) = (0, 0)] + Pr[(X, Y) = (1, 1)] = 1/2 * Pr[Y = 0] + 1/2 * Pr[Y = 1] = 1/2. Practically, it means that regardless of the distribution of Y (all 0, all 1, many 0, or many 1), the result is a uniform random variable, i.e., both 0 and 1 have probability 0.5.

As shown in the proof, the probability distribution of Y doesn't matter as long as 1) X is an independent uniform random variable and 2) the operation, like XOR with a fixed value, is a bijection on U, so summing over all values of Y still gives total probability 1.
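A quick simulation (an illustrative sketch, not part of the course) shows the property in action: even a heavily biased Y produces a near-uniform Z once XORed with an independent uniform X:

```python
import random
from collections import Counter

random.seed(0)
N = 100_000
# Y is heavily biased: 0b00 with probability 0.9, otherwise 0b11.
ys = [0b00 if random.random() < 0.9 else 0b11 for _ in range(N)]
# X is uniform on {0,1}^2 and independent of Y.
xs = [random.randrange(4) for _ in range(N)]
# Z = X XOR Y should be (approximately) uniform on {0,1}^2.
zs = Counter(x ^ y for x, y in zip(xs, ys))
for v in range(4):
    assert abs(zs[v] - N / 4) < 0.01 * N  # each count within 1% of N/4
```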

2.6 The Birthday Paradox

Let r1, r2, …, rn be independent identically distributed (iid) random variables over U. When n = 1.2 * |U|^0.5, then Pr[there exist i != j such that ri = rj] >= 0.5.

For example, if U = {0, 1}^128, then among about n = 2^64 sampled messages, some two will likely be the same. For people's birthdays (|U| = 365), n is about 23. For a universe of one million values, n is about 1200; if n is 2200, the probability is about 0.9 that two samples collide.
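The bound is easy to check numerically. This sketch (the helper name is ours) computes the exact collision probability for n iid uniform samples from a set of size u:

```python
import math

def collision_prob(n: int, u: int) -> float:
    # Exact probability that n iid uniform samples from a set of size u
    # contain at least one collision:
    # 1 - (u/u) * ((u-1)/u) * ... * ((u-n+1)/u).
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (u - i) / u
    return 1.0 - p_distinct

n = math.ceil(1.2 * math.sqrt(365))  # about 23 people
assert n == 23
assert collision_prob(n, 365) > 0.5  # the birthday bound holds
```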

3 Cipher

A cipher defined over (K, M, C) is a pair of “efficient” algorithms (E, D): an encryption algorithm E: K x M -> C and a decryption algorithm D: K x C -> M such that for all m in M and k in K, D(k, E(k, m)) = m. This equation is called the consistency equation. Efficient means the cipher runs in polynomial time. E is often a randomized algorithm that uses random bits to encrypt the messages. D is always deterministic: it gives the same output for a given pair of key and ciphertext (k, c).

In the One Time Pad (OTP) cipher, developed by Vernam in 1917, we have M = C = K = {0, 1}^n. The key k is a random bit string as long as the message m. The cipher is defined as E(k, m) = k xor m and D(k, c) = k xor c. It is very fast but requires long keys.
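A minimal OTP sketch in Python (function names are illustrative, not from the course), working on bytes instead of bits:

```python
import secrets

def otp_encrypt(key: bytes, msg: bytes) -> bytes:
    assert len(key) == len(msg)      # the pad must be exactly as long as the message
    return bytes(k ^ m for k, m in zip(key, msg))

otp_decrypt = otp_encrypt            # decryption is the same XOR operation

msg = b"attack at dawn"
key = secrets.token_bytes(len(msg))  # fresh uniform key, used only once
ct = otp_encrypt(key, msg)
assert otp_decrypt(key, ct) == msg   # consistency: D(k, E(k, m)) = m
```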

4 Perfect Secrecy

Shannon developed the information-theoretic security concept of perfect secrecy in 1949. The basic idea is that a ciphertext should reveal no information about its plaintext: even the most powerful adversary learns nothing about the plaintext from the ciphertext.

Formally, a cipher (E, D) over (K, M, C) has perfect secrecy if for every two same-length messages m0, m1 and any ciphertext c, when the key k is drawn uniformly at random from K, Pr[E(k, m0) = c] = Pr[E(k, m1) = c]. In other words, for every message m, the number of keys (under the uniform distribution) that map m to a given c must be the same.

OTP has perfect secrecy because for any pair of message m and ciphertext c, there is exactly one key that generates c from m, so Pr[E(k, m) = c] = 1 / |K|.

It means there is no ciphertext-only attack on OTP, though other attacks may be possible. Shannon also proved that to have perfect secrecy, the key length has to be at least the length of the message. This makes perfect secrecy hard to achieve in practice.

5 Stream Ciphers

5.1 Definition

A pseudo-random generator (PRG) uses a short random seed to generate a much longer pseudo-random key: G: {0, 1}^s -> {0, 1}^n where n >> s. A stream cipher is defined as E(k, m) = m xor G(k) and D(k, c) = c xor G(k). Because it doesn't have perfect secrecy (the key is shorter than the message), we need a different definition of security.
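A toy sketch of this construction, assuming a PRG built by hashing seed || counter with SHA-256 (this particular expansion is our illustration, not a construction from the course, and is not meant for production use):

```python
import hashlib

def toy_prg(seed: bytes, n: int) -> bytes:
    # Toy PRG: expand a short seed into n pseudo-random bytes by hashing
    # seed || counter with SHA-256. For illustration only.
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def stream_encrypt(seed: bytes, msg: bytes) -> bytes:
    # E(k, m) = m XOR G(k); decryption is the identical operation.
    return bytes(m ^ g for m, g in zip(msg, toy_prg(seed, len(msg))))

stream_decrypt = stream_encrypt

msg = b"hello stream cipher"
ct = stream_encrypt(b"short-seed", msg)
assert stream_decrypt(b"short-seed", ct) == msg
```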

5.2 PRG Predictability

We say that a generator G: K -> {0, 1}^n is predictable if there exist an efficient algorithm A and an index i with 1 <= i <= n-1 such that, given the first i bits of G(k), A predicts bit i+1 of G(k) with probability 1/2 + epsilon, where epsilon is a non-negligible positive number. If the PRG is predictable, then the stream cipher is not secure because the generated key stream is predictable.

A PRG is unpredictable if no efficient algorithm can predict bit i+1 with non-negligible epsilon.

Some weak PRGs include linear congruential generators and random() in glibc.

In practice, epsilon is a scalar; epsilon >= 1/2^30 is considered non-negligible because such an event is likely to happen over 1GB of data, while epsilon <= 1/2^80 is negligible.

In theory, epsilon is defined as a function that maps non-negative integers to non-negative real numbers (probabilities). It is non-negligible if there exists a d such that epsilon(z) >= 1/z^d for infinitely many values of z; that is, epsilon is bigger than one over a polynomial infinitely often.

If for every d there exists an integer z_d such that epsilon(z) < 1/z^d for all z >= z_d (so only finitely many values violate the bound), then epsilon is negligible.

For example, epsilon(z) = 1/2^z is negligible, but epsilon(z) = 1/z^1000 is non-negligible.
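These two examples can be checked with exact rational arithmetic (a small sketch of our own; the function names are made up):

```python
from fractions import Fraction

def eps_exp(z: int) -> Fraction:
    return Fraction(1, 2 ** z)        # 1/2^z, negligible

def eps_poly(z: int) -> Fraction:
    return Fraction(1, z ** 1000)     # 1/z^1000, non-negligible

# For any fixed d, 1/2^z eventually drops below 1/z^d (here d = 5, z = 100)...
assert eps_exp(100) < Fraction(1, 100 ** 5)
# ...while 1/z^1000 satisfies eps(z) >= 1/z^d for d = 1000 at every z.
assert eps_poly(100) >= Fraction(1, 100 ** 1000)
```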

5.3 Secure Stream Cipher

Informally, a stream cipher is secure if its PRG is unpredictable.

You should never use the same key (pad) more than once, because if m1 XOR PRG(k) -> c1 and m2 XOR PRG(k) -> c2, then c1 XOR c2 = m1 XOR m2, which reveals a lot of information. This happened in Microsoft's PPTP protocol, which encrypts both client and server data with the same key. If more than one key is used, the keys should not be related: 802.11b WEP uses a set of related keys and recycles keys on every reboot or after 2^24 frames.
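The two-time-pad mistake is easy to demonstrate (an illustrative sketch; the pad and messages are made up):

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

pad = bytes([0x13, 0x37] * 8)        # 16-byte pad reused for two messages -- the mistake
m1 = b"attack at dawn!!"
m2 = b"retreat at noon!"
c1, c2 = xor(m1, pad), xor(m2, pad)
# The pad cancels out: an eavesdropper learns m1 XOR m2 without the key.
assert xor(c1, c2) == xor(m1, m2)
# Knowing (or guessing) m1 then reveals m2 entirely:
assert xor(xor(c1, c2), m1) == m2
```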

It is also easy to spot a small change in the plaintext because the ciphertext changes in the same location. For network traffic, use a new key for each session. Don't use stream ciphers for disk encryption.

Stream ciphers don't provide integrity. OTP is malleable: modifications to the ciphertext translate predictably into modifications of the plaintext, and they are impossible to detect.
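Malleability can be shown concretely (an illustrative sketch with made-up values): an attacker who guesses the plaintext can turn the ciphertext into an encryption of any other same-length plaintext without knowing the key.

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = bytes(range(8))
msg = b"pay $100"
ct = xor(msg, key)
# The attacker XORs the ciphertext with (guessed plaintext XOR target
# plaintext); the change passes straight through decryption -- no integrity.
tampered = xor(ct, xor(b"pay $100", b"pay $900"))
assert xor(tampered, key) == b"pay $900"
```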

RC4 and CSS are insecure. Use eStream ciphers such as Salsa20 instead.

6 Secure PRG

What does it mean that G(k), with k <--R-- {0, 1}^s, is indistinguishable from a uniform random variable r <--R-- {0, 1}^n?

A statistical test on {0, 1}^n is an algorithm A such that A(x) outputs 0 (not random) or 1 (random) for an n-bit string x. For example, A(x) = 1 if and only if |#0(x) - #1(x)| <= 10 * n^(1/2), or |#00(x) - n/4| <= 10 * n^(1/2), or max-run-of-0(x) <= 10 * log(n). There are many statistical tests.

Let G: K -> {0, 1}^n be a PRG and A a statistical test on {0, 1}^n. The advantage of A against the PRG is defined as Adv[A, G] = |Pr[A(G(k)) = 1] - Pr[A(r) = 1]|, where k is a uniform seed and r is a uniform random variable over {0, 1}^n. If the advantage is close to 0, then A cannot distinguish G from a random string; we say the PRG is computationally indistinguishable from a uniform random variable.

A PRG is secure if for all efficient statistical tests, the advantage is negligible. We currently cannot prove that any particular PRG is secure: such a proof would imply P != NP, which is yet to be resolved.

A predictable PRG is insecure. In 1982, Yao proved that an unpredictable PRG is secure: if next-bit predictors cannot distinguish G from random, then no statistical test can.

7 Semantic Security

An encryption scheme E is semantically secure if for any efficient statistical test (adversary) A, the advantage of A against E is negligible for any pair of explicit same-length messages m0, m1. It means that {E(k, m0)} is computationally indistinguishable from {E(k, m1)}.

If a cipher is semantically secure, then no bit of information is revealed to an efficient adversary (an efficient statistical test), as opposed to all adversaries. OTP is semantically secure against all adversaries, not just efficient ones.

An important theorem is that if a PRG is secure, then its stream cipher is semantically secure.

To prove it, we show that for any semantic-security adversary A, there exists a PRG adversary B such that Adv[A, E] <= 2 * Adv[B, G]. Since the right-hand side is negligible, the left-hand side must be negligible too.

The idea is that if we replace the output of a secure PRG with a truly random pad r, the adversary cannot tell the difference, because the PRG is secure.

Let W0, W1 be the events that A outputs 1 when receiving the stream-cipher encryption of m0 or m1 respectively. Similarly, let R0, R1 be the events that A outputs 1 for the OTP encryption (with a truly random pad) of m0 and m1. We have |Pr[R0] - Pr[R1]| = Adv[A, OTP] = 0. We want to show that for b = 0, 1 there exists an adversary B such that |Pr[Wb] - Pr[Rb]| = Adv[B, G]. Then, by the triangle inequality, |Pr[W0] - Pr[W1]| <= |Pr[W0] - Pr[R0]| + |Pr[R0] - Pr[R1]| + |Pr[R1] - Pr[W1]| <= 2 * Adv[B, G], which gives Adv[A, E] <= 2 * Adv[B, G] because Adv[A, E] = |Pr[W0] - Pr[W1]|.

The claim that there exists a B such that |Pr[Wb] - Pr[Rb]| = Adv[B, G] means that if we substitute the PRG output with a truly random pad, the adversary cannot tell the difference, because the PRG is secure. We construct a statistical test B that takes an n-bit string y as input and outputs 0 or 1 for not random or random. B runs A to obtain a pair of messages m0 and m1, feeds m0 XOR y to A as the challenge ciphertext, and returns A's output as its own output.

By definition, Adv[B, G] = |Pr[B(G(k)) = 1] - Pr[B(r) = 1]|. With the above B, Pr[B(r) = 1] equals Pr[R0] because the input y is a truly random string. For Pr[B(G(k)) = 1], y is a pseudo-random string, so m0 XOR y is a stream-cipher encryption and the probability equals Pr[W0]. It follows that |Pr[W0] - Pr[R0]| = Adv[B, G]; the same argument works for b = 1.