Data Encryption in the Cloud, Part 2: Encryption 101


Data encryption was big news in 2017, but not in the way IT professionals would have hoped. Its rise in the public consciousness came from the proliferation of a type of malware attack called ransomware, which uses standard data encryption technology to hold user and company data hostage for ransom. So, ironically, a technology designed to thwart malicious actors has become a valuable tool in the utility belts of those same actors.

But the effectiveness of encryption in ransomware also highlights why it should be a cornerstone of any company's security posture. Because it makes data unreadable to anyone who does not have the key to unlock it, encryption is a powerful tool for protecting data confidentiality. We are moving toward a world where nearly all data will be encrypted, whether in transit between devices or at rest on your servers. It is therefore incumbent on every IT professional to understand encryption technologies and know how to implement them in their company.

Previously, we covered the why of data encryption. We will now focus on the what and the how of encryption, with follow-up posts on how it is implemented by the big three public cloud vendors. Note that while the series focuses on encryption of data at rest, the principles discussed here broadly apply to encrypting data in transit as well. A good place to begin is by defining some key terms.

Defining Our Terms

Encryption can be defined as the use of mathematical functions/algorithms to make data unreadable in order to disguise it from unauthorized actors. The data can be files, texts, code, images, etc. Decryption transforms the disguised data back into readable data. There are several components related to the encryption/decryption process:

  • Plaintext is the unencrypted data or message. It can refer to the data prior to encryption or after it has been decrypted.
  • Ciphertext is the encrypted data or message. Here is an example of a plaintext and its associated ciphertext.

[Screenshot: a sample plaintext and its associated ciphertext]

  • A cipher is the mathematical function or algorithm that performs the actual encryption and decryption.
  • A key is a value that serves as an input to the cipher; the correct key is required to decrypt the ciphertext.

In short, the encryption process takes a key (K) and inputs it into the cipher (CI) along with the plaintext (P) to create the ciphertext (C).

The decryption process is essentially the inverse with the key (K) being used as an input into the cipher (CI) to transform ciphertext (C) back to plaintext (P).
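
Written as equations in the post's notation (the enc/dec subscripts are added here only to distinguish the two directions of the cipher CI), the two processes are:

```latex
C = \mathrm{CI}_{\mathrm{enc}}(K, P)
\qquad
P = \mathrm{CI}_{\mathrm{dec}}(K, C) = \mathrm{CI}_{\mathrm{dec}}\bigl(K, \mathrm{CI}_{\mathrm{enc}}(K, P)\bigr)
```

The second equation is the property every cipher must satisfy: decrypting with the right key undoes the encryption. For the asymmetric methods covered later, the encryption and decryption keys differ, so the single K splits into a public key and a private key.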


One of the earliest known analog ciphers is the Caesar Cipher. It was reportedly used by Julius Caesar to protect his correspondence in the late Roman Republic. It used a simple cipher and a fixed key.

  • Cipher:
    • Take each letter and encrypt it by shifting the letter X positions in the alphabet where X equals the key value.
    • If the shift reaches past the letter Z, wrap back around to the beginning of the alphabet.
  • Key = 3

For example, the plaintext ZOOM would be encrypted to the ciphertext CRRP.
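
The two Caesar rules above fit in a few lines of Python. This is a minimal sketch: the function names are illustrative, and passing spaces and punctuation through unchanged is a choice made here rather than part of the historical cipher.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def caesar_encrypt(plaintext: str, key: int) -> str:
    """Shift each letter `key` positions forward, wrapping past Z back to A."""
    result = []
    for ch in plaintext.upper():
        if ch in ALPHABET:
            # (index + key) mod 26 implements the wrap-around from Z to A
            result.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            result.append(ch)  # leave spaces and punctuation unchanged
    return "".join(result)

def caesar_decrypt(ciphertext: str, key: int) -> str:
    """Decryption is the same shift in the opposite direction."""
    return caesar_encrypt(ciphertext, -key)

print(caesar_encrypt("ZOOM", 3))  # CRRP
print(caesar_decrypt("CRRP", 3))  # ZOOM
```

With only 25 useful shifts, an attacker can simply try every key, which is why the Caesar Cipher offers no real security today.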

A better cipher was the Vigenère Cipher, which was created in 16th-century Italy. It improved on the Caesar Cipher by using a variable, multi-letter key, so that consecutive letters are shifted by different amounts rather than by one fixed value. The cipher can be summarized as follows:

  • Remove all spaces from the message.
  • Break the message into groups where the number of letters in each group equals the number of letters in the key.
  • For each letter in a group, shift it by X, where X equals the position in the alphabet (A = 1, B = 2, and so on) of the corresponding letter of the key.
  • If the shift reaches past the letter Z, wrap back around to the beginning of the alphabet.

The encryption process for the plaintext BLUE MAN GROUP, using the key CAFE, works like this:

  • Remove the spaces and break the message into groups of four: BLUE MANG ROUP
  • Convert each letter of the key into its position in the alphabet

C=3
A=1
F=6
E=5

  • Convert the plaintext by shifting every letter as specified by the key
Plaintext   B  L  U  E  M  A  N  G  R  O  U  P
Shift      +3 +1 +6 +5 +3 +1 +6 +5 +3 +1 +6 +5
Ciphertext  E  M  A  J  P  B  T  L  U  P  A  U
  • Concatenate the groups into the final ciphertext: EMAJPBTLUPAU
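
The walkthrough above can be reproduced with the short sketch below. It follows this post's convention that A shifts by 1; note that the classic Vigenère table treats A as a shift of 0, so textbook tables would give a different ciphertext for the same key. The grouping step is handled implicitly by cycling through the key with `i % len(key)`, which is equivalent to splitting the message into groups the length of the key. Function names are illustrative.

```python
import string

ALPHABET = string.ascii_uppercase

def _shifts(key: str) -> list:
    """Convert each key letter to its position in the alphabet (A=1 ... Z=26)."""
    return [ALPHABET.index(k) + 1 for k in key.upper()]

def vigenere_encrypt(plaintext: str, key: str) -> str:
    letters = [ch for ch in plaintext.upper() if ch in ALPHABET]  # remove spaces
    shifts = _shifts(key)                                         # CAFE -> [3, 1, 6, 5]
    out = []
    for i, ch in enumerate(letters):
        shift = shifts[i % len(shifts)]  # the key repeats every len(key) letters
        out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])  # wrap past Z
    return "".join(out)

def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """Shift each letter back by the same amounts; expects letters only."""
    shifts = _shifts(key)
    out = []
    for i, ch in enumerate(ciphertext.upper()):
        shift = shifts[i % len(shifts)]
        out.append(ALPHABET[(ALPHABET.index(ch) - shift) % 26])
    return "".join(out)

print(vigenere_encrypt("BLUE MAN GROUP", "CAFE"))  # EMAJPBTLUPAU
print(vigenere_decrypt("EMAJPBTLUPAU", "CAFE"))    # BLUEMANGROUP
```

Because spaces are removed before encryption, decryption returns the letters run together; the reader has to restore word boundaries.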

While early data encryption relied on the secrecy of the cipher to keep the system secure, that has changed over time, because experience has shown that cryptosystems relying on the secrecy of their algorithm tend to be broken once the algorithm is discovered. This change in view was articulated in the 19th century by the Dutch cryptographer Auguste Kerckhoffs. Kerckhoffs’ Principle is commonly stated as: “A cryptosystem should be secure even if everything about the system, except the key, is public knowledge.” Today, cipher algorithms are largely open standards, and it is generally acknowledged that the security of a cryptosystem rests on the strength of the cipher and on the strength and secrecy of the encryption key.

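Modern ciphers use far more advanced algorithms, and when they are properly implemented with adequately sized keys, there is no known practical way to recover the plaintext without the key; an attacker is left with approaches such as exhaustive key search that require infeasible amounts of computing power. For AES-256, for example, a brute-force search of the key space would take far longer than the current age of the universe, and RSA-2048 is likewise considered out of reach of today's known factoring methods.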

Encryption Methods

Next, we are going to look at three common encryption methods. For each method, I will illustrate using a command-line utility called openssl. If you want more detail than I have room for in this post, I recommend experimenting with openssl yourself. It comes preinstalled on macOS and on most Linux distributions.

Symmetric or Secret Key Encryption

The same key is used for both encrypting and decrypting data. When a user or an application encrypts a piece of data using a symmetric key, any user or application that wants to read that data must decrypt it using the same key.

A common analogy for symmetric key encryption is the use of a lockbox where a secret message is stored in a box that is locked by the sender with a key. The recipient can only access the message by unlocking the box using the same key as the sender or an exact copy of that key.

Here’s an example of how to create a plaintext file and then encrypt it with a 128-bit symmetric key; the openssl command should be fairly easy to understand. You may notice that the encryption command includes an -iv switch followed by a string of hexadecimal characters. The initialization vector (IV) is a random value fed into the cipher alongside the key so that encrypting the same plaintext with the same key produces different ciphertext each time, which hides patterns from an attacker. The IV does not need to be secret, but it should not be reused with the same key.

[Screenshot: creating a plaintext file and encrypting it with openssl using a 128-bit key and an IV]

To decrypt the ciphertext, add the -d (decrypt) option and supply the same symmetric key and IV that were used to encrypt the original plaintext file.

[Screenshot: decrypting the ciphertext with openssl -d and the same key]
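
Python's standard library does not include AES, so the sketch below uses a different, deliberately toy construction to show the two ideas from the openssl example: the same secret key both encrypts and decrypts, and a fresh IV makes repeated encryptions of the same plaintext look different. It builds a keystream from HMAC-SHA256 in counter mode and XORs it with the data. The function names are hypothetical, the scheme provides no integrity protection, and it is meant for illustration only, not as a substitute for AES.

```python
import hashlib
import hmac
import secrets

def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Expand (key, iv) into `length` pseudorandom bytes: HMAC-SHA256 over iv || counter."""
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        block_input = iv + counter.to_bytes(8, "big")
        stream += hmac.new(key, block_input, hashlib.sha256).digest()
        counter += 1
    return bytes(stream[:length])

def toy_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with a keystream derived from the secret key and the IV."""
    stream = _keystream(key, iv, len(plaintext))
    return bytes(p ^ s for p, s in zip(plaintext, stream))

def toy_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """XOR is its own inverse, so decryption reuses the same key, IV and operation."""
    return toy_encrypt(key, iv, ciphertext)

key = secrets.token_bytes(16)  # 128-bit shared secret key
iv1 = secrets.token_bytes(16)  # a fresh IV for each message; it need not be secret
iv2 = secrets.token_bytes(16)

message = b"This is a secret message"
c1 = toy_encrypt(key, iv1, message)
c2 = toy_encrypt(key, iv2, message)

print(c1 != c2)                              # different IVs give different ciphertexts
print(toy_decrypt(key, iv1, c1) == message)  # True: the same key and IV recover the data
```

Reusing an IV with the same key in a construction like this would produce the same keystream twice, which is why the IV must change for every message.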

In production, an application would likely use an encryption software development kit (SDK) or call an encryption API to encrypt data before it is stored and to decrypt it when it is retrieved.

The challenge with symmetric key encryption lies in key storage and management. The secret key has to be carefully protected and distributed securely to all parties that need to encrypt or decrypt data. If the key is lost, the data is unreadable; if a malicious actor gains access to the key, they have full access to the data.

The most widely used symmetric key encryption standard today is the Advanced Encryption Standard (AES). AES supports 128-bit, 192-bit and 256-bit keys, with AES-256 being the most commonly used key size for encrypting data at rest. The AES cipher is built on a series of linked mathematical operations called a Substitution-Permutation Network (SPN). I am not going to get into the details of how this works, other than to note that the plaintext is transformed through multiple rounds of substitution and permutation, with values derived from the key mixed in at each round. The key size also determines the number of rounds: 10 for AES-128, 12 for AES-192 and 14 for AES-256. Properly implemented, AES-256 would take even a supercomputer well over a quadrillion years to brute-force without the key.
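
To put the brute-force claim in perspective, the sketch below estimates how long an exhaustive key search would take for each AES key size. The attacker speed of 10^18 keys per second is a hypothetical figure chosen only for illustration, and the estimate assumes that, on average, the key is found after trying half of the key space.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 seconds
AGE_OF_UNIVERSE_YEARS = 13.8e9          # roughly 13.8 billion years
GUESSES_PER_SECOND = 1e18               # ASSUMPTION: hypothetical attacker speed

def brute_force_years(key_bits: int, guesses_per_second: float = GUESSES_PER_SECOND) -> float:
    """Expected years to find a key by searching a 2**key_bits key space."""
    expected_guesses = 2 ** (key_bits - 1)  # on average the key is found halfway through
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR

for bits, rounds in [(128, 10), (192, 12), (256, 14)]:
    years = brute_force_years(bits)
    print(f"AES-{bits} ({rounds} rounds): ~{years:.1e} years, "
          f"{years / AGE_OF_UNIVERSE_YEARS:.1e} times the age of the universe")
```

Each extra key bit doubles the work, so even enormous gains in attacker speed barely dent AES-256: at this assumed rate, the expected search time is on the order of 10^51 years.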

Asymmetric or Public Key Encryption

A different key is used for encryption than for decryption. The encryption key is known as the public key and, as the name implies, is typically made available to anyone who wants to send an encrypted message to the key's owner. The public key is paired with a private key, which only the owner should have access to and which is used to decrypt any message that was encrypted with the associated public key.

A common analogy for asymmetric key encryption is a mailbox for postal mail. Anyone can drop a secret message into the mailbox through its freely accessible mail slot, but only the person who holds the mailbox key can open it and read the message.

Here’s an example of how to encrypt a plaintext file using a public key. Note that both the public and private keys are saved in PEM format, a Base64-encoded text format commonly used to store RSA keys.

[Screenshot: encrypting a plaintext file with an RSA public key using openssl]

To decrypt the ciphertext, you just use the decrypt option and supply the private key that is associated with the above public key.

[Screenshot: decrypting the ciphertext with the matching RSA private key]

What makes asymmetric key encryption compelling is that while the public key can easily be derived from the private key, it is computationally infeasible to derive the private key from the public key. This solves the problem of distributing the encryption key securely: the public key can be published without fear that it will expose the data or allow the private key to be cracked.

The biggest downside is that asymmetric key encryption is computationally expensive. Symmetric encryption is often cited as roughly 1000 times faster, and it needs much shorter keys for the same level of security: a 128-bit AES key offers security comparable to a 3072-bit RSA key.

The most widely used asymmetric key encryption standard today is Rivest-Shamir-Adleman (RSA). RSA supports keys of different sizes; 2048 bits is the common minimum today, and 1024-bit keys, once widely used, are now considered too weak. Without getting into detail about how the RSA cipher works, encryption treats the (padded) plaintext as a number and raises it to a power given by the public key, modulo a large number that is the product of two secret primes. Its security rests on the difficulty of factoring that product back into its primes. Readers will note that RSA key sizes are much larger than AES key sizes. Because the two ciphers work so differently, an RSA key needs many more bits to provide security comparable to an AES key. The table below shows the approximate equivalence between symmetric key strength and RSA key size:

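Symmetric key strength       RSA key length
80 bits                      1024 bits
112 bits                     2048 bits
128 bits (AES-128)           3072 bits
192 bits (AES-192)           7680 bits
256 bits (AES-256)           15360 bits
(80 and 112 bits are security-strength levels rather than AES key sizes; AES keys are 128, 192 or 256 bits.)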
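
The RSA description above can be made concrete with "textbook" RSA on deliberately tiny numbers. This sketch shows that encryption and decryption are both modular exponentiation, and that computing the private exponent d requires phi, which in turn requires knowing the secret primes p and q; anyone who could factor n could recover d. It omits the padding (such as OAEP) that real RSA requires, uses primes far too small to be secure, and relies on `pow(e, -1, phi)`, which needs Python 3.8 or later. Variable and function names are illustrative.

```python
# Textbook RSA with tiny primes: for illustration only, never for real data.
p, q = 61, 53             # the two secret primes (real keys use primes of 1024+ bits each)
n = p * q                 # 3233: the modulus, shared by both keys
phi = (p - 1) * (q - 1)   # 3120: easy to compute only if you know p and q
e = 17                    # public exponent, chosen coprime to phi
d = pow(e, -1, phi)       # 2753: private exponent, the inverse of e modulo phi

public_key = (n, e)       # safe to publish
private_key = (n, d)      # must stay secret

def rsa_encrypt(m: int, key: tuple) -> int:
    """Encrypt an integer message m < n: c = m^e mod n."""
    modulus, exponent = key
    return pow(m, exponent, modulus)

def rsa_decrypt(c: int, key: tuple) -> int:
    """Decrypt: m = c^d mod n."""
    modulus, exponent = key
    return pow(c, exponent, modulus)

message = 65
ciphertext = rsa_encrypt(message, public_key)
print(ciphertext)                            # 2790
print(rsa_decrypt(ciphertext, private_key))  # 65
```

Every encryption here is an exponentiation with huge exponents in real key sizes, which is one reason RSA is so much slower than AES.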

Envelope Encryption

Envelope encryption is a hybrid approach that can combine symmetric and asymmetric key encryption. With this approach, the following steps are taken to encrypt data:

  • A one-time symmetric key, called a Data Encryption Key (DEK), is generated and used to encrypt a piece of data.
  • A separate key, called a Key Encryption Key (KEK), is used to encrypt the DEK. The KEK can be either a symmetric key or an asymmetric public key.
  • The encrypted DEK is appended to or placed alongside the ciphertext and stored together.

Below is a diagram of how Amazon Web Services leverages envelope encryption. Note that they use a “Master Key” which functions as a Key Encryption Key.

For the decryption process, the reverse is done:

  • An application retrieves the ciphertext and the associated encrypted DEK
  • The application retrieves the KEK if it is a symmetric key or the associated private key if the KEK is an asymmetric public key
  • The encrypted DEK is decrypted using the KEK if it is a symmetric key or using the associated private key if the KEK is an asymmetric public key
  • The ciphertext is then decrypted using the DEK
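
The encryption and decryption steps above can be sketched as follows, using a symmetric KEK as in the AWS "Master Key" setup. To stay within Python's standard library, the sketch reuses a toy HMAC-SHA256 keystream cipher as a stand-in for AES; real systems wrap keys with an authenticated scheme (for example AES-GCM or the AES Key Wrap algorithm), usually inside a key management service. The `rewrap_dek` helper is a hypothetical illustration of KEK rotation: only the small DEK is re-encrypted, and the bulk ciphertext is untouched. All function names and the dictionary layout are assumptions made for this example.

```python
import hashlib
import hmac
import secrets

def _xor_stream(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (HMAC-SHA256 counter-mode keystream XOR); stands in for AES."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, iv + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

def envelope_encrypt(kek: bytes, plaintext: bytes) -> dict:
    dek = secrets.token_bytes(32)                        # one-time data encryption key
    data_iv, dek_iv = secrets.token_bytes(16), secrets.token_bytes(16)
    return {                                             # wrapped DEK is stored with the data
        "ciphertext": _xor_stream(dek, data_iv, plaintext),  # data encrypted with the DEK
        "data_iv": data_iv,
        "encrypted_dek": _xor_stream(kek, dek_iv, dek),      # DEK encrypted with the KEK
        "dek_iv": dek_iv,
    }

def envelope_decrypt(kek: bytes, envelope: dict) -> bytes:
    dek = _xor_stream(kek, envelope["dek_iv"], envelope["encrypted_dek"])  # unwrap the DEK
    return _xor_stream(dek, envelope["data_iv"], envelope["ciphertext"])   # decrypt the data

def rewrap_dek(old_kek: bytes, new_kek: bytes, envelope: dict) -> dict:
    """Rotate the KEK: re-encrypt only the 32-byte DEK; the data ciphertext is unchanged."""
    dek = _xor_stream(old_kek, envelope["dek_iv"], envelope["encrypted_dek"])
    new_iv = secrets.token_bytes(16)
    return {**envelope, "encrypted_dek": _xor_stream(new_kek, new_iv, dek), "dek_iv": new_iv}

kek = secrets.token_bytes(32)  # in practice held in a key management service
env = envelope_encrypt(kek, b"quarterly sales report")
assert envelope_decrypt(kek, env) == b"quarterly sales report"

new_kek = secrets.token_bytes(32)
rotated = rewrap_dek(kek, new_kek, env)
assert rotated["ciphertext"] == env["ciphertext"]  # the data itself was not re-encrypted
assert envelope_decrypt(new_kek, rotated) == b"quarterly sales report"
```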

Envelope encryption has a number of advantages:

  • Easier data key protection – You don’t have to worry about where to securely store every data key since the keys are encrypted. They can be stored along with their encrypted data. You just have to focus on securing a smaller set of key encryption keys.
  • Easier key management – With a smaller set of key encryption keys to manage, you can rotate just the KEKs without having to rotate the DEKs or re-encrypt your data; at most, the small DEKs are re-encrypted under the new KEK.
  • Combines the strengths of both symmetric and asymmetric key encryption methods – As mentioned earlier, asymmetric key encryption makes key distribution easier but is far less efficient than symmetric key encryption. With envelope encryption, you can use a fast symmetric key to encrypt the data and a public key to encrypt the data encryption key.

Having a basic understanding of encryption will help you evaluate different data encryption technologies and implementations so you can better secure your own and your company’s assets. In part 3 of this series, we delve into the topic of key management. It is no exaggeration to say that key management is the most important component of any data encryption strategy. You may use the hardest-to-crack cipher in the world, but it doesn’t matter if your encryption keys are weak or fall into the wrong hands due to improper management.

In part 4 of this series I provide a detailed overview and comparison of how data encryption at rest is implemented at the big three public cloud vendors – Amazon Web Services, Microsoft Azure and Google Cloud Platform.
