This guide explains why you need to hide information and how you can do it when you do not trust the channel through which messages are conveyed. We will discuss cryptographic systems, encryption, decryption, one-way functions, asymmetric keys and more. You may think of cryptography as a soap bubble that keeps you untouchable while you travel by air around the world.

Do you think it is safer by plane?

Terminology

plaintext or cleartext: the intelligible message that the sender wants to transmit to a receiver

ciphertext: the unintelligible message that results from encrypting the plaintext with a cryptosystem

encryption: the process of converting a plaintext into a ciphertext

decryption: the process of converting a ciphertext into a plaintext (the reverse of encryption)

Conventional cryptography

Conventional cryptography is also called symmetric-key or shared-key encryption: the same key is used to encrypt and to decrypt a message. Think of it this way: you and your roommate both use the same key to lock and unlock the door of your house. Your roommate has a copy of your key so that he can get in while you are at work, and vice versa.

Examples of conventional cryptosystems that use a symmetric key: Data Encryption Standard (DES, now considered insecure because of its short 56-bit key) and Advanced Encryption Standard (AES).

Advantages: fast.

Disadvantages: key distribution. The sender and the receiver must agree upon a secret key and prevent anyone else from getting access to it. This is a big problem if they are not in the same physical location: how could you give your home key to your roommate if he is in America while you are in China?

Practical advice: change the symmetric key for every message, so that only one message can be leaked in case of disaster (the key is cryptanalysed, stolen, etc.).
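To make "the same key locks and unlocks" concrete, here is a minimal Python sketch. It is not DES or AES: as an assumption for illustration only, it builds a toy stream cipher from SHA-256 (the key is hashed together with a counter to produce a keystream, which is then XORed with the message). The names `keystream`, `encrypt` and `decrypt` are hypothetical helpers, not a library API, and this construction must not be used to protect real data.

```python
import hashlib
import secrets

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || counter) blocks concatenated and cut to `length` bytes."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with the keystream derived from the shared key."""
    return bytes(p ^ s for p, s in zip(plaintext, keystream(key, len(plaintext))))

def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR is its own inverse, so decrypting is the same operation with the same key."""
    return encrypt(key, ciphertext)

# One shared 256-bit key that both roommates hold; generate a fresh one for every message.
shared_key = secrets.token_bytes(32)
message = b"Meet me at home at 8"
ciphertext = encrypt(shared_key, message)
assert decrypt(shared_key, ciphertext) == message
```

Because this toy derives the keystream from the key alone, reusing a key for two messages would leak the XOR of the two plaintexts. That is one concrete reason behind the advice to change the key for every message.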

Key distribution

In the previous section we saw that cryptosystems using symmetric keys lack an efficient method for sharing the key securely with your roommate. Key distribution schemes address this shortcoming. Next we explain how a key exchange becomes possible over an untrusted communication channel.

Diffie-Hellman key exchange

The security of this key exchange relies on the fact that nobody knows how to compute discrete logarithms of large numbers in a reasonable amount of time. We start with an overview of the algorithm using colours before moving on to numbers and formulas.

Step 1: Alice and Bob agree on a common colour.
Step 2: Alice chooses a secret colour that she does not tell Bob. Bob does the same.
Step 3: Alice mixes the common colour with her secret one and obtains a mixture. Bob mixes his secret colour with the common one and obtains a mixture different from Alice's.
Step 4: Alice and Bob exchange their mixtures. This is the most critical step, because an eavesdropper (a man-in-the-middle) can get access to both mixtures. Even so, he cannot easily recover the secret colours, because separating a mixture back into its components is practically irreversible. His only chance is to mix every possible colour with the common colour from step 1 and compare the results. Remember also that a secret colour can itself be a mixture of many other colours.
Update: Diffie-Hellman on its own does not protect you from an active man-in-the-middle attack. To see why, imagine an attacker who intercepts Alice's mixture and sends Bob his own mixture instead, and does the same in the other direction. He ends up sharing one secret colour with Alice and another with Bob, so he can decrypt, read and re-encrypt every message passing between them.
Step 5: Alice adds her secret colour to the mixture that Bob sent her. Bob does the same with Alice's mixture. Alice and Bob now have the same secret colour.
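The man-in-the-middle scenario from the update can be sketched in Python. As an assumption, the sketch already uses the number version of the colours: mixing is modelled as modular exponentiation, `pow(base, secret, P)`, which the math part formalizes. The toy parameters P = 23 and G = 5 are far too small for real use, and the attacker's name Mallory and the helper `keypair` are hypothetical.

```python
import secrets

# Toy public parameters: prime P and base G (the "common colour").
P, G = 23, 5

def keypair():
    """Pick a secret exponent 1 <= s <= P-2 and return (s, G^s mod P), the public "mixture"."""
    secret = secrets.randbelow(P - 2) + 1
    return secret, pow(G, secret, P)

a, A = keypair()    # Alice's secret and public value
b, B = keypair()    # Bob's secret and public value
m1, M1 = keypair()  # Mallory's values used towards Alice
m2, M2 = keypair()  # Mallory's values used towards Bob

# Mallory intercepts A and B and forwards her own values instead.
alice_key = pow(M1, a, P)  # Alice believes M1 came from Bob
bob_key = pow(M2, b, P)    # Bob believes M2 came from Alice

# From the intercepted values, Mallory computes both keys herself.
mallory_with_alice = pow(A, m1, P)
mallory_with_bob = pow(B, m2, P)

assert alice_key == mallory_with_alice
assert bob_key == mallory_with_bob
print("Alice's key:", alice_key, "Bob's key:", bob_key)
```

Each victim computes a perfectly valid key, but it is shared with Mallory rather than with the other victim, and nothing in the exchange itself lets Alice or Bob notice. Authenticating the exchanged values (for example with digital signatures) is what closes this gap.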
Now Alice and Bob can safely exchange the symmetric key we talked about in the previous section, because they can encrypt and decrypt any message sent through the communication channel using the shared secret colour.

And here comes the math. It is always about math when we run out of colours.

Step 1: Alice and Bob agree on two numbers: a large prime p (at least 2048 bits is recommended today; 512 bits, once common, is no longer safe) and a base g, a primitive root modulo p, with p > 2 and g < p.
Step 2: Alice chooses a secret integer a and Bob chooses a secret integer b, with a < p-1 and b < p-1.
Step 3: Alice computes the public value x = g^a mod p. Bob computes the public value y = g^b mod p, where mod is the modulo operator.
Step 4: Alice and Bob exchange x and y.
Step 5: Alice computes her secret key k_a = y^a mod p. Bob computes his secret key k_b = x^b mod p.

Both keys are equal, because k_a = (g^b)^a mod p = g^(a*b) mod p = (g^a)^b mod p = k_b. Alice and Bob now share a secret key that they can use to encrypt and decrypt any plaintext they exchange.

Example: p = 23, g = 5, a = 6, b = 15
x = 5^6 mod 23 = 15625 mod 23 = 8
y = 5^15 mod 23 = 30517578125 mod 23 = 19
After exchanging x and y:
k_a = 19^6 mod 23 = 47045881 mod 23 = 2
k_b = 8^15 mod 23 = 35184372088832 mod 23 = 2

An attacker who learns either secret integer can compute the shared key from the public values, for example k = y^a mod p. Knowing both a = 6 and b = 15, he can compute it directly: k_a = k_b = g^(a*b) mod p = 5^90 mod 23 = 2.

Advantages: two parties who have never met can agree on a shared secret over a public channel, and a passive eavesdropper cannot feasibly compute it.
Disadvantages: no authentication. You cannot be sure of the identity of the real 'Bob', so an unauthenticated exchange is vulnerable to a man-in-the-middle attack.

Once Alice and Bob share a secret key, they can use it for symmetric encryption. A simple illustration uses the XOR (exclusive or) operator. Suppose Alice wants to transmit the message M = Hello to Bob. The binary (ASCII) representation of M is B(M) = 0100100001100101011011000110110001101111. Alice encrypts the message with the secret key K = 1010101000101110100101010001110010101010.
B(M) xor K =
  0100100001100101011011000110110001101111
^ 1010101000101110100101010001110010101010
= 1110001001001011111110010111000011000101 = L (encrypted M)

Read as Latin-1 characters, the bytes of L are âKùpÅ. Bob receives âKùpÅ and decrypts it with the same secret key K that he has already exchanged with Alice:

L xor K =
  1110001001001011111110010111000011000101
^ 1010101000101110100101010001110010101010
= 0100100001100101011011000110110001101111 = M (original message)

Why is Diffie-Hellman important? Because widely used protocols such as SSL/TLS, SSH and IPsec rely on it for key exchange.
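The worked Diffie-Hellman numbers and the XOR example can be checked with a few lines of Python. The built-in three-argument `pow(base, exp, mod)` performs modular exponentiation without computing the huge intermediate powers shown above. The variable names follow the text.

```python
# Diffie-Hellman worked example (toy sizes; real primes are 2048 bits or more).
p, g = 23, 5
a, b = 6, 15             # Alice's and Bob's secret integers

x = pow(g, a, p)         # Alice's public value g^a mod p
y = pow(g, b, p)         # Bob's public value g^b mod p
k_a = pow(y, a, p)       # Alice's key: y^a mod p
k_b = pow(x, b, p)       # Bob's key: x^b mod p
print(x, y, k_a, k_b)    # → 8 19 2 2
assert k_a == k_b == pow(g, a * b, p)

# XOR example: encrypt "Hello" with the 40-bit key K from the text.
M = "Hello".encode("ascii")
K = int("1010101000101110100101010001110010101010", 2).to_bytes(5, "big")

L = bytes(m ^ k for m, k in zip(M, K))            # encryption: M xor K
print(format(int.from_bytes(L, "big"), "040b"))   # → 1110001001001011111110010111000011000101
print(L.hex())                                    # → e24bf970c5

decrypted = bytes(c ^ k for c, k in zip(L, K))    # decryption: L xor K
assert decrypted == M
```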

Public key cryptography

Public-key cryptography solves the key distribution problem, because it does not require a secure initial key exchange between you and your roommate. It is an asymmetric-key cryptosystem: in contrast to symmetric-key encryption, it uses a pair of separate keys, a public key for encryption and a private key (also called the secret key) for decryption. The two keys are mathematically linked, but knowing the public key must not compromise the private key.

public-key != private-key

We can compare an asymmetric-key cryptosystem with an e-mail account. Your e-mail address is public (anyone can send you an e-mail at your@email.com, for example), but you are the only one who has the password to log in, so only you can read the e-mails. The public key is your e-mail address and the private key is the password linked to it.

How it works:
Step 1: Create a pair of private and public keys (we will discuss generating key pairs later).
Step 2: Share your public key with your friends.
Step 3: The sender uses your public key to encrypt the plaintext (original message + encryption = ciphertext).
Step 4: The sender sends you the ciphertext.
Step 5: You use your private key to decrypt the ciphertext (ciphertext + decryption = original message).

Advantages: convenience and security are increased.
Disadvantages: encryption is slow. All public-private key pairs are susceptible to brute-force attacks (this is mitigated by choosing a large key size). You cannot verify your partner's identity (vulnerable to impersonation).

Usage: since a large key size makes encrypted messages larger, encrypting and transmitting them takes longer. In practice, public-key encryption is used for short messages, such as symmetric session keys, rather than for long messages.
The trade-off is that a shorter key offers lower security, while you gain in ciphertext length and transfer time. If you do use shorter keys, replace them with new ones frequently.
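The five steps can be traced with a deliberately tiny key pair. This Python sketch assumes textbook RSA (explained in the next section) with the commonly used example values n = 61 * 53 = 3233, e = 17 and d = 2753. Real keys are thousands of bits long and real systems add padding. `encrypt` and `decrypt` are hypothetical helper names.

```python
# Step 1: a tiny textbook RSA key pair (n = 61 * 53). Never use sizes like this for real.
n, e, d = 3233, 17, 2753
public_key = (n, e)    # Step 2: this part is shared with your friends
private_key = (n, d)   # this part never leaves your computer

def encrypt(public_key, m: int) -> int:
    """Step 3: the sender encrypts an integer message 0 <= m < n with your public key."""
    n, e = public_key
    if not 0 <= m < n:
        raise ValueError("message must be an integer smaller than n")
    return pow(m, e, n)

def decrypt(private_key, c: int) -> int:
    """Step 5: you decrypt the ciphertext with your private key."""
    n, d = private_key
    return pow(c, d, n)

c = encrypt(public_key, 65)   # Step 4: c is what travels over the channel
print(c)                      # → 2790
assert decrypt(private_key, c) == 65
```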

RSA

RSA, named after Rivest, Shamir and Adleman, is a public-key cryptosystem. Unlike Diffie-Hellman, which relies on the difficulty of computing discrete logarithms, RSA relies on the fact that large integers are difficult to factorize. I will explain the RSA algorithm step by step, assuming you love math :) First of all you should know the mod (modulo) operation and what coprime integers are.

Euler's theorem: x^phi(z) mod z = 1, where z is a positive integer, x is coprime to z and phi(z) is Euler's totient function.

Briefly, the totient function counts the integers between 1 and z that are coprime to z. If z is prime, then
phi(z) = z-1 (*)

Example: consider z = 7. The numbers 1, 2, 3, 4, 5 and 6 are all coprime to 7, so phi(7) = 7-1 = 6.

Let's continue with Euler's theorem. Multiplying x^phi(z) mod z = 1 by itself gives
(x^phi(z) mod z) * (x^phi(z) mod z) = 1 * 1 <-> x^(2*phi(z)) mod z = 1
Using mathematical induction we can prove that, for any positive integer K,
x^(K*phi(z)) mod z = 1
and multiplying by x (with 0 <= x < z):
x^(K*phi(z)+1) mod z = x (**)
That means a number x raised to a power that is one more than a multiple of phi(z) returns itself.

For z prime, (*) and Euler's theorem give Fermat's little theorem:
x^(z-1) mod z = 1
x^z mod z = x

So far we have proved nothing about RSA. Now it is time to link all these equations together. Take two distinct prime numbers p and q and replace z with p*q. Then
phi(p*q) = phi(p) * phi(q) = (p-1)*(q-1), from (*)
x^phi(p*q) mod p*q = 1
x^((p-1)*(q-1)) mod p*q = 1 (***)
From (**) with K = 1 and (***) we have
x^(phi(z)+1) mod z = x
x^((p-1)*(q-1)+1) mod p*q = x
This holds for every 0 <= x < p*q, even when x is not coprime to p*q. Computing (p-1)*(q-1) from p*q alone is as hard as factorizing p*q.

Consider x as a message. We pick a number E (the encoding key, commonly a prime such as 65537) that must be coprime to (p-1)*(q-1). Then we calculate D (the decoding key) as the modular inverse of E:
D = E^(-1) mod (p-1)*(q-1), i.e. E*D mod (p-1)*(q-1) = 1
This means E*D = K*(p-1)*(q-1) + 1 for some positive integer K, so by (**), (x^E)^D mod p*q = x^(K*(p-1)*(q-1)+1) mod p*q = x. Encrypting with E and then decrypting with D returns the original message.
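The number-theory facts used above are easy to check numerically. This Python sketch computes phi by brute force (fine only for small numbers; the helper name `phi` is ours) and verifies (*), Euler's theorem, phi(p*q) = (p-1)*(q-1) and (**) on small examples. The values z = 20, p = 5 and q = 11 are arbitrary choices for illustration. The last check also passes for x that are not coprime to n, which is why RSA works for every message smaller than p*q.

```python
from math import gcd

def phi(z: int) -> int:
    """Euler's totient by brute force: count 1 <= k <= z with gcd(k, z) == 1."""
    return sum(1 for k in range(1, z + 1) if gcd(k, z) == 1)

# (*) for a prime z, phi(z) = z - 1
assert phi(7) == 6

# Euler's theorem: x^phi(z) mod z = 1 for every x coprime to z
z = 20
assert all(pow(x, phi(z), z) == 1 for x in range(1, z) if gcd(x, z) == 1)

# phi(p*q) = (p-1)*(q-1) for distinct primes p and q
p, q = 5, 11
n = p * q
assert phi(n) == (p - 1) * (q - 1)

# (**) x^(K*phi(n)+1) mod n = x, checked for K = 1, 2, 3 and every 0 <= x < n
for K in range(1, 4):
    assert all(pow(x, K * phi(n) + 1, n) == x for x in range(n))
print("all checks passed")
```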
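Now we can use the RSA algorithm: the public key is (E, p*q) and the private key is (D, p*q).
ciphertext = plaintext^E mod p*q
plaintext = ciphertext^D mod p*q

Some attacks against RSA exploit a small exponent E combined with a small plaintext: if plaintext^E < p*q, the modular reduction never happens and anyone can recover the plaintext by taking the ordinary E-th root of the ciphertext. This is why real systems pad messages before encryption, and why a large key size is recommended.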
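A sketch of key generation and of the small-exponent attack, in Python 3.8+ (where `pow(E, -1, m)` computes a modular inverse). The helpers `make_keys` and `integer_root` are hypothetical names. The primes 61 and 53, and the well-known primes 1000000007 and 998244353, are chosen only for illustration and are far too small for real keys.

```python
def make_keys(p: int, q: int, E: int):
    """Textbook RSA key generation from distinct primes p, q and public exponent E."""
    n = p * q
    phi = (p - 1) * (q - 1)
    D = pow(E, -1, phi)            # modular inverse; raises ValueError if gcd(E, phi) != 1
    assert (E * D) % phi == 1
    return (E, n), (D, n)

public, private = make_keys(61, 53, 17)
print(private[0])                  # → 2753

def integer_root(c: int, k: int) -> int:
    """Largest r with r**k <= c, found by binary search (no modular arithmetic involved)."""
    lo, hi = 0, 1
    while hi ** k <= c:
        hi *= 2
    while lo < hi - 1:             # invariant: lo**k <= c < hi**k
        mid = (lo + hi) // 2
        if mid ** k <= c:
            lo = mid
        else:
            hi = mid
    return lo

# Small-exponent attack: E = 3 and a small plaintext, so plaintext^E < n.
(E, n), (D, _) = make_keys(1000000007, 998244353, 3)
m = 12345
c = pow(m, E, n)
assert m ** E < n                  # the modular reduction never happened
recovered = integer_root(c, E)     # the attacker needs only c and E, not D
print(recovered)                   # → 12345
assert recovered == m == pow(c, D, n)
```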

Hash functions

So far we are glad that we can protect the content of the messages we exchange over an untrusted connection, but we have not yet addressed content integrity. How can we be sure that the content of a message (even an encrypted one) has not suffered an unauthorized alteration?

A hash function, also called a one-way or irreversible function, takes as input a message of variable length and produces a fixed-length output. It is not bijective: many different inputs map to the same output. For example, here are the checksums of the following string computed with different hash functions:

Input string: hello World
MD5: 39d11ab1c3c6c9eab3f5b3675f438dbf
SHA1: 22c219648f00c61e5b3b1bd81ffa8e7767e2e3c5
SHA256: 1ca107777d9d999bdd8099875438919b5dca244104e393685f...

What if we modify only a SINGLE letter of the original message, for example 'E'?

Input string: hEllo World
MD5: b31981417dcc9209db702566127ce717
SHA1: b7afc9fde8ebac31b6bc482de96622482c38315c
SHA256: 98fe983aad94110b31539310de222d6a962aeec73c0865f616...

As you can see, the result is completely different.

The big problem with hash functions is that they are susceptible to collisions: because the output has a fixed length, different inputs must sometimes produce the same digest, and for weak functions such as MD5 such pairs can be found in practice:

tibi@tbarbu-pc:~/hash_collision$ ls -lH message*
-rw-r--r-- 1 tibi tibi 128 2012-09-12 17:20 message1
-rw-r--r-- 1 tibi tibi 128 2012-09-12 17:21 message2
tibi@tbarbu-pc:~/hash_collision$ diff -y -W10 --suppress-common-lines \
<(hexdump -e '/1 "%02X\n"' message1) \
<(hexdump -e '/1 "%02X\n"' message2)
E7 | 67
0F | 8F
23 | A3
44 | C4
B4 | 34
7F | FF
tibi@tbarbu-pc:~/hash_collision$ md5sum message1 message2
1e934ac2f323a9158b43922500ca7040  message1
1e934ac2f323a9158b43922500ca7040  message2

As you can see, two files with different content (only 6 bytes differ in this case) have the same MD5 checksum. This is called a hash collision, and it is why MD5 should no longer be used where collision resistance matters.

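The one-letter experiment can be reproduced with Python's standard `hashlib` module. This sketch hashes the UTF-8 bytes of both strings without a trailing newline. Command-line tools fed through `echo` also hash the newline, so their output depends on exactly which bytes were hashed. No expected digests are shown here, only the properties described above: a fixed output length and completely different results for a one-letter change.

```python
import hashlib

ALGORITHMS = ("md5", "sha1", "sha256")

def digests(data: bytes) -> dict:
    """Hex digest of `data` for each algorithm in ALGORITHMS."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}

original = digests("hello World".encode("utf-8"))
modified = digests("hEllo World".encode("utf-8"))
long_input = digests(b"x" * 1_000_000)   # a much longer message

for name in ALGORITHMS:
    print(f"{name:>6}  {original[name]}")
    print(f"{'':>6}  {modified[name]}")
    # The output length is fixed per algorithm, whatever the input length.
    assert len(original[name]) == len(long_input[name])
    # A one-letter change gives an unrelated digest.
    assert original[name] != modified[name]
```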
Digital certificate

We have been talking for a long time about encryption and decryption, but what if our cryptosystem is secure and yet we cannot be sure that the other person is who he or she claims to be? Diffie-Hellman key exchange does not address this shortcoming. Information security is a fundamental objective of cryptography and covers not only confidentiality and data integrity, but also authentication and non-repudiation. Before talking about certificates, let's see how a digital signature works; at the end we will see how the two differ with regard to authentication and non-repudiation. Having discussed asymmetric keys and hash functions, we can now explain why both are important for digital signatures.

The analogue of a digital signature is the handwritten signature. While a handwritten signature is easy to counterfeit, a digital signature provides much more security: it is practically impossible to forge without the private key. Let's see how it works:

Step 1: Generate a pair of keys: a public key and a private key. Keep the private key in a safe place; the public key can be given to anyone. Suppose you want to compose a document containing the message M.
Step 2: Compute the digest. Use a hash function to compute a digest of your message.
Step 3: Compute the digital signature. Sign the hash result (digest) with your private key. Now you can send the message M to your friend, together with the SIGNED digest.
Step 4: Verify the digital signature. Your friend computes the digest of M with the same hash function. Then he uses your public key to check your SIGNED digest against the digest he computed. If they match, the message M has not been altered (this is called data integrity) and the signature was produced with your private key.
Only a signature made with your private key can be verified with your public key (this provides authentication and non-repudiation).

You may wonder why we run the message M through a hash function (step 2) instead of signing the message itself. That would certainly be possible, but signing a long message with a private key and verifying its authenticity with the public key is very slow, and it produces a large volume of data. A hash function produces a fixed-length output and also provides data integrity.

There is one problem: how can your friend be sure which public key is yours? He can't, but a digital certificate CAN! A digital certificate binds your public key to your identity and is itself signed by a trusted Certificate Authority (CA). When registering with a CA you have to prove your identity (ID card, passport, etc.). Your friend first verifies the certificate with the CA's public key; then he can trust that the public key inside it is yours and use it to verify that the attached digest was signed with your private key.
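The signing steps and the certificate check can be sketched with textbook RSA in Python 3.8+. Everything here is a simplifying assumption for illustration. The keys are built from publicly known Mersenne primes (2^31-1, 2^61-1, 2^89-1, 2^107-1), so they are not secret at all. The SHA-256 digest is reduced modulo n instead of being padded, which real schemes such as RSA-PSS do not do. The "certificate" is a hypothetical name=...;e=...;n=... string rather than a real X.509 certificate. The helpers `make_keys`, `digest`, `sign`, `verify` and `parse_certificate` are made-up names.

```python
import hashlib

def make_keys(p: int, q: int, e: int = 65537):
    """Textbook RSA key pair from distinct primes p, q: returns (public, private)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))   # modular inverse of e
    return (e, n), (d, n)

def digest(message: bytes, n: int) -> int:
    """Step 2: SHA-256 digest as an integer, reduced mod n (toy shortcut, no padding)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(private_key, message: bytes) -> int:
    """Step 3: sign the digest with the private key."""
    d, n = private_key
    return pow(digest(message, n), d, n)

def verify(public_key, message: bytes, signature: int) -> bool:
    """Step 4: recompute the digest and check it against the signature with the public key."""
    e, n = public_key
    return pow(signature, e, n) == digest(message, n)

def parse_certificate(body: bytes):
    """Read the owner's name and public key out of the toy certificate body."""
    fields = dict(item.split("=") for item in body.decode().split(";"))
    return fields["name"], (int(fields["e"]), int(fields["n"]))

# Toy key pairs (insecure: built from publicly known primes).
alice_public, alice_private = make_keys(2**31 - 1, 2**61 - 1)
ca_public, ca_private = make_keys(2**89 - 1, 2**107 - 1)

# The CA signs a body that binds Alice's name to her public key.
cert_body = f"name=Alice;e={alice_public[0]};n={alice_public[1]}".encode()
certificate = (cert_body, sign(ca_private, cert_body))

# Alice signs a message.
message = b"I owe you 10 euros"
signature = sign(alice_private, message)

# Her friend trusts only the CA's public key: check the certificate, then the message.
body, cert_signature = certificate
assert verify(ca_public, body, cert_signature), "certificate not signed by the CA"
name, claimed_key = parse_certificate(body)
print(name, verify(claimed_key, message, signature))   # → Alice True
```

The friend never needs Alice's key in advance: he extracts it from a certificate whose signature he can check with the CA's public key, which is exactly the trust step described above.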