This is part 3 of a blog series on encrypting data at rest in the Cloud. My first post argued for why data encryption should be a critical component of any company’s security posture. I then followed up with a blog post that walked through the basics of encryption. Moving on, we want to focus on Key Management, including Key Generation. In part 4 of this series I provide a detailed overview and comparison of how data encryption at rest is implemented at the big three public cloud vendors – Amazon Web Services, Microsoft Azure and Google Cloud Platform.
Previously, I mentioned Kerckhoffs’s Principle, which can be summarized as “the security of any cryptosystem depends not on the secrecy of the cipher but on the secrecy of the keys.” You can use AES-256 or RSA-2048 to encrypt your data so that, for all intents and purposes, the ciphertext is infeasible to crack. But that doesn’t matter if your keys are lost or compromised. It’s like having a state-of-the-art anti-burglar system for your home but leaving the house keys and alarm codes under the welcome mat.
Cryptographic or encryption key management can be understood as managing the full lifecycle of keys from creation to destruction, including the storage and protection of those keys. There are a number of activities related to key management including:
- Key Generation – This is the creation of encryption keys using a pseudo random number generator (more details on this in the second half of this post). Every key that is created should be tracked and audited.
- Key Storage – Once the keys are generated, they need to be securely stored and backed up so that they cannot be lost, tampered with or accessed without proper authorization. For password/passphrase based encryption, the passwords and passphrases must be securely stored as well.
- Key Activation – A key can be activated at creation time or at a later time, manually or automatically. If multiple copies of the key are created and activated, they need to be stored and tracked.
- Key Distribution – Once activated, there needs to be a way for authorized applications, systems and users to request and to retrieve keys for encryption and decryption.
- Key Rotation – It is recommended that keys be rotated on a periodic basis. Key rotations should be able to occur on an established schedule or manually by an administrator. When a new key is created and distributed to replace an old key, the old key must be deactivated and retained so it can still be used for decryption purposes.
- Key Expiration – An encryption key can be created that will only be used for a specified period. An example of this is the one-time encryption keys that are generated for envelope encryption. When a key expires, it must be retained for decryption purposes.
- Key Revocation – A key may need to be revoked, if it has been compromised, so it can no longer be used for encryption or decryption. But the key may need to be retained if it has already been used for encryption. In some cases, a revoked key may need to be reactivated for a short period of time to decrypt data.
- Key Destruction – In some cases, a key may need to be completely removed. In that case, every instance of that key must be deleted.
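One way to picture these activities is as a small state machine that a key management system enforces for every key. The sketch below is only an illustration: the state names, the transition table and the `ManagedKey` class are hypothetical, not taken from any particular KMS product.

```python
from enum import Enum


class KeyState(Enum):
    CREATED = "created"
    ACTIVE = "active"            # may encrypt and decrypt
    DEACTIVATED = "deactivated"  # rotated out or expired: decrypt only
    REVOKED = "revoked"          # compromised: no use unless reactivated
    DESTROYED = "destroyed"      # every copy deleted


# Hypothetical transition table loosely following the activities listed above.
ALLOWED = {
    KeyState.CREATED: {KeyState.ACTIVE, KeyState.DESTROYED},
    KeyState.ACTIVE: {KeyState.DEACTIVATED, KeyState.REVOKED},
    KeyState.DEACTIVATED: {KeyState.REVOKED, KeyState.DESTROYED},
    # A revoked key can be briefly re-enabled for decryption only.
    KeyState.REVOKED: {KeyState.DEACTIVATED, KeyState.DESTROYED},
    KeyState.DESTROYED: set(),
}


class ManagedKey:
    def __init__(self, key_id):
        self.key_id = key_id
        self.state = KeyState.CREATED
        self.audit_log = [KeyState.CREATED]  # every change is tracked

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.state = new_state
        self.audit_log.append(new_state)

    def can_encrypt(self):
        return self.state is KeyState.ACTIVE

    def can_decrypt(self):
        return self.state in (KeyState.ACTIVE, KeyState.DEACTIVATED)


key = ManagedKey("example-key")
key.transition(KeyState.ACTIVE)
key.transition(KeyState.DEACTIVATED)  # rotation: old key kept for decryption
print(key.can_encrypt(), key.can_decrypt())
```

The point of the table is that a rotated or expired key stays usable for decryption while new encryption is refused, and a destroyed key can never come back.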
While it is theoretically possible to handle these key management tasks manually, the only way to scale and to ensure CIA (Confidentiality, Integrity, Availability) is to automate through a robust key management system rather than relying on keys written on a sticky note or saved in a text file on a laptop. This is particularly the case in enterprise environments and industries with highly sensitive data, where it is likely that there will be a hierarchy of keys encrypting other keys that will need to be managed.
In general, there are three types of Key Management Systems (KMS):
- Software-Based KMS – This can be software downloaded and installed on a set of physical or virtual machines. It can also be a virtual appliance with the KMS software pre-installed and configured. A software solution can be very low-cost, especially if you choose to go the open source route. It is also easier to try out than a hardware solution. The KMS software can be baked into an application or be part of a storage or backup appliance. A popular standalone KMS is HashiCorp Vault.
- Hardware-Based KMS – Today when most people think of hardware key management, they think of a Hardware Security Module (HSM). An HSM is specialized hardware designed specifically for cryptography and key management. HSMs are tamper-resistant and hardened to ensure that keys are properly secured. An HSM could be used in conjunction with KMS software, or the KMS software could be embedded into the HSM. There are a number of HSM vendors including Gemalto SafeNet and Thales with their Vormetric Data Security Manager.
- Cloud-Based KMS – The big three public cloud providers, Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP), all offer encryption and key management as a managed service. The advantages of these services are that they operate on a pay-as-you-go model, you don’t have to manage the underlying hardware or software, and they are deeply integrated with the providers’ other cloud services. We will be going into detail on the offerings from all three providers in my next blog post.
Now that we’ve walked through an overview of key management, let’s dive deeper into Key Generation.
In my previous Encryption Primer post, we spoke extensively about the use of keys and walked through examples of how to encrypt and decrypt data using both symmetric and asymmetric keys. Now we want to delve into how keys are actually generated.
However, we can’t really talk about encryption and key generation without first tackling the topic of entropy. In cryptography, entropy is the measure of uncertainty or randomness in a system. Entropy refers to the randomness of the cipher and of the encryption key. The higher the entropy, the more random the results and the more secure the ciphertext.
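To make "measure of randomness" a little more concrete, here is a minimal Python sketch that estimates the Shannon entropy of a byte string in bits per byte. The function name is my own, and the estimate only describes how evenly byte values are distributed in a sample. It cannot prove that a source is unpredictable, so treat it as an illustration rather than a test of cryptographic quality.

```python
import math
import os
from collections import Counter


def shannon_entropy_bits_per_byte(data: bytes) -> float:
    """Empirical Shannon entropy of a byte string, from 0 to 8 bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # H = sum over byte values of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


low = shannon_entropy_bits_per_byte(b"a" * 4096)
text = shannon_entropy_bits_per_byte(b"the quick brown fox " * 200)
high = shannon_entropy_bits_per_byte(os.urandom(65536))
print(f"constant: {low:.2f}  english-like: {text:.2f}  os.urandom: {high:.2f}")
```

A constant string scores 0, repetitive text lands somewhere in the middle, and output from the operating system's generator comes out very close to the maximum of 8 bits per byte.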
Since encryption depends so heavily on entropy, you need a component that can reliably generate random bits which can then be used to generate keys and to create ciphertext. This requires an entropy source provided by a Random Number Generator (RNG) and/or a Pseudo Random Number Generator (PRNG), also known as a Deterministic Random Number Generator (DRNG). Without getting into too much detail, this is how it works:
- The RNG samples data from the analog world, such as temperature measurements, hard drive activity and mouse clicks, as its source of entropy for generating random bits.
- The PRNG/DRNG takes these random bits from the RNG and stores them in a memory buffer called the entropy pool.
- The PRNG/DRNG applies a mathematical algorithm to the content of the entropy pool to generate pseudo random bits that can be used for key generation and data encryption.
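The three steps above can be sketched as a toy generator in Python. Everything here is an assumption made for illustration: the `ToyEntropyPool` class, the use of timer readings as a stand-in for hardware events, and the choice of SHA-256 in a counter construction. It is not how any real kernel implements its generator and must not be used to create real keys.

```python
import hashlib
import time


class ToyEntropyPool:
    """Illustrative only: NOT a secure random number generator."""

    def __init__(self):
        self._pool = hashlib.sha256()  # the "entropy pool" is a running hash
        self._counter = 0

    def add_event(self, sample: bytes) -> None:
        # Step 2: mix a raw sample from the outside world into the pool.
        self._pool.update(sample)

    def random_bytes(self, n: int) -> bytes:
        # Step 3: deterministically expand the pool state into n output bytes.
        seed = self._pool.digest()
        out = b""
        while len(out) < n:
            self._counter += 1
            block = seed + self._counter.to_bytes(8, "big")
            out += hashlib.sha256(block).digest()
        return out[:n]


pool = ToyEntropyPool()
# Step 1: stand-in for analog events; real systems sample interrupt timings.
for _ in range(32):
    pool.add_event(time.perf_counter_ns().to_bytes(8, "big"))
print(pool.random_bytes(16).hex())
```

Note the deterministic part: two pools fed exactly the same samples produce exactly the same output, which is why the quality of the entropy going in matters so much.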
The reason that an RNG needs to sample analog data is that computers aren’t capable of generating truly random numbers on their own. They need a truly random source of bits from the “real world.” The downside of an RNG is that it needs a continual entropy source such as keyboard and hard drive activity. When an RNG runs out of entropy, it will no longer be able to generate random numbers for creating keys and encrypting data. Two scenarios where this may occur are a virtual machine that is abstracted from the underlying hardware and a machine that has just booted and has not yet collected enough random data to feed an RNG.
Typically, a PRNG is what is used to generate encryption keys. Its primary advantage is that while the analog entropy source for an RNG can theoretically run out, a PRNG can continually generate a stream of pseudo random bits using just a few random bits in the entropy pool.
A common implementation of entropy collection can be found in all modern operating systems. Linux and macOS, for example, sample analog events such as hardware interrupts from mouse clicks, keyboard activity, hard drive activity, etc., to generate random bits that are fed to the entropy pool of the OS kernel.
Below, I am connected to a Linux box and I can find out how much entropy has been collected so far by reading /proc/sys/kernel/random/entropy_avail. That number will shrink and grow as the random bits are used and as the server samples more analog sources. We can also sample the pseudo random output that the kernel derives from the entropy pool by reading /dev/urandom. If the bits do not seem that random, that is because I am displaying them in hexadecimal so they are easier to read in the terminal.
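The same two checks can be sketched in Python. The helper names are my own. The entropy file only exists on Linux, so the function returns `None` elsewhere, and depending on the kernel version the reported number may not fluctuate the way described above.

```python
import os
from pathlib import Path

ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def entropy_available():
    """Return the kernel's entropy estimate in bits, or None if unavailable."""
    try:
        return int(ENTROPY_AVAIL.read_text().strip())
    except (OSError, ValueError):
        return None


def random_hex(n_bytes: int = 32) -> str:
    # os.urandom asks the operating system's generator for random bytes;
    # .hex() renders them as hexadecimal so they are readable in a terminal.
    return os.urandom(n_bytes).hex()


print("entropy_avail:", entropy_available())
print("sample:", random_hex())
```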
We’ll see how entropy is used in cryptography as we talk specifically about key generation.
Symmetric Key Generation
There are a number of methods for generating a symmetric key but we’ll focus specifically here on two methods:
- Direct Key Generation – A key generation algorithm is invoked that uses random bits coming from a PRNG as input. Typically, an application that is encrypting data would use direct key generation as part of its encryption processes. Below is an example of how an AES-128 symmetric key can be generated using openssl:
This key can then be used for encryption/decryption.
- Password/Passphrase Based Encryption – In this case, a user supplies a password or passphrase which serves as input to a Key Derivation Function (KDF) that will generate a symmetric key. Below is an example of how an AES-128 password based symmetric key can be generated using openssl with the password 123456:
The salt value above is used to randomize the generated keys by adding random bits to the password during the generation process. This way a different key is generated even when the same password is used, which adds entropy to the key generation process even if a password/passphrase with little randomness is used. The initialization vector (IV) is a one-time-use string that helps ensure the ciphertext is always unique even if the same key is used to encrypt the same plaintext multiple times.
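For readers following along without openssl, here is a minimal Python sketch of both approaches. It is an approximation, not a reproduction of the openssl commands: I am assuming PBKDF2-HMAC-SHA256 as the key derivation function and an iteration count of 200,000, neither of which is necessarily what openssl used for the example above. The `derive_key` helper is hypothetical.

```python
import hashlib
import secrets

KEY_BYTES = 16  # 128 bits, the key size for AES-128

# Direct key generation: take 128 bits straight from the OS generator.
direct_key = secrets.token_bytes(KEY_BYTES)


def derive_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a 128-bit key from a password (assumed KDF: PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=KEY_BYTES
    )


salt = secrets.token_bytes(16)  # random, not secret, stored with the ciphertext
iv = secrets.token_bytes(16)    # fresh for every encryption; AES blocks are 16 bytes
password_key = derive_key("123456", salt)

print("direct key: ", direct_key.hex())
print("salt:       ", salt.hex())
print("derived key:", password_key.hex())
print("iv:         ", iv.hex())
```

Running the script twice gives a different derived key each time even though the password never changes, because a new salt is drawn on every run. Reusing the same salt with the same password reproduces the same key, which is what makes decryption possible later.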
A malicious attacker could try to crack the ciphertext by guessing the symmetric key in a brute force attack. However, it is unlikely they would have enough compute cycles to do so successfully if you are using something like an AES-128 or AES-256 key. The easier approach would be a brute force or dictionary attack that guesses the password/passphrase and uses it to derive the key needed to decrypt the ciphertext. Below is an example of a plaintext that has been encrypted using a password-derived key and how the resulting ciphertext can be decrypted using the same password:
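The dictionary attack described above can be sketched in a few lines of Python. Several things here are assumptions for the sake of a short demo: the KDF (PBKDF2-HMAC-SHA256), the low iteration count, the fixed salt, and the tiny wordlist. I also model "the attacker can tell when a guess is right" by comparing derived keys directly. In practice an attacker would test each candidate key by attempting to decrypt the ciphertext.

```python
import hashlib

ITERATIONS = 10_000  # assumed value, kept low so the demo runs quickly


def derive_key(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=16
    )


def dictionary_attack(target_key: bytes, salt: bytes, wordlist):
    """Return the first candidate whose derived key matches, else None."""
    for candidate in wordlist:
        if derive_key(candidate, salt) == target_key:
            return candidate
    return None


# The salt is not secret: it travels with the ciphertext.
salt = bytes.fromhex("00112233445566778899aabbccddeeff")
victim_key = derive_key("123456", salt)

# Hypothetical wordlist of commonly used passwords.
common_passwords = ["password", "qwerty", "letmein", "123456", "iloveyou"]
print(dictionary_attack(victim_key, salt, common_passwords))  # prints 123456
```

The attacker never touches the 128-bit key space. Five guesses are enough here, and a real wordlist with millions of entries is still vastly smaller than 2^128.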
For this reason, it is recommended, and in some cases mandated, that a password/passphrase with sufficiently high entropy be used. There are a number of hardware and software RNG- and PRNG-based tools that can generate random passwords/passphrases. You can do this, for example, using openssl:
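A rough Python equivalent, as a sketch: the alphabet, the default length of 22 characters and the helper names are my own choices, and the entropy figure assumes every character is picked independently and uniformly at random.

```python
import math
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols


def random_password(length: int = 22) -> str:
    """Pick each character independently and uniformly from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int, alphabet_size: int = len(ALPHABET)) -> float:
    # A uniform random choice contributes log2(alphabet_size) bits per character.
    return length * math.log2(alphabet_size)


pw = random_password()
print(pw, f"(about {entropy_bits(len(pw)):.0f} bits)")
print(f"six random digits: about {entropy_bits(6, 10):.0f} bits at best")
```

Twenty-two random characters from a 62-symbol alphabet carry slightly more than 128 bits, on par with a directly generated AES-128 key. A six-digit password tops out at around 20 bits even when the digits are truly random, and a well-known choice like 123456 is effectively worth far less.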
Given the risk in having a weak password/passphrase, you may ask why anyone would use one for generating encryption keys. The reason comes down to convenience and the lack of adequate key management. If you don’t have a robust way to store and distribute keys, then the simplest approach is to use an easily remembered and easy-to-type password or passphrase. That’s why good key management is so important: it lets you encrypt data more securely using random passwords/passphrases and keys.
Asymmetric Key Generation
Unlike symmetric keys, which are derived directly from random bits provided by a PRNG/DRNG, an asymmetric key pair is created by taking the random bits provided by the PRNG and using them as seed values for a key generation algorithm. For RSA, that algorithm finds and multiplies two very large prime numbers, and its security rests on how hard it is to factor the product. This algorithm generates both the private key and the public key.
Below is an example of how you can use openssl to generate an RSA-1024 private key, which is output to a PEM file called private.pem. We can then use that file with openssl to generate the associated public key and output it to another PEM file called public.pem. A 1024-bit key keeps the example short; 2048 bits or more is the usual recommendation today.
The public key can be made available to anyone or any application to use for encrypting data that can only be decrypted by the owner of the private key. The private key needs to be stored and protected, preferably in a sound key management system, as discussed earlier.
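To show what happens between "random bits from the PRNG" and "a key pair", here is a textbook RSA key generation sketch in pure Python. It is a teaching aid under stated assumptions: the function names are hypothetical, the public exponent 65537 is a common convention rather than something taken from the example above, there is no padding, and nothing is written in PEM format. It is not how openssl implements key generation and should not be used to protect real data.

```python
import math
import secrets


def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:  # write n - 1 as d * 2^r with d odd
        d //= 2
        r += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def random_prime(bits: int) -> int:
    while True:
        # Random bits from the OS generator; force the top two bits and oddness.
        candidate = secrets.randbits(bits) | (3 << (bits - 2)) | 1
        if is_probable_prime(candidate):
            return candidate


def generate_keypair(bits: int = 1024, e: int = 65537):
    while True:
        p, q = random_prime(bits // 2), random_prime(bits // 2)
        phi = (p - 1) * (q - 1)
        if p != q and math.gcd(e, phi) == 1:
            n = p * q            # public modulus: easy to multiply, hard to factor
            d = pow(e, -1, phi)  # private exponent (needs Python 3.8+)
            return (n, e), (n, d)


public_key, private_key = generate_keypair()
message = 42
ciphertext = pow(message, public_key[1], public_key[0])
recovered = pow(ciphertext, private_key[1], private_key[0])
print(public_key[0].bit_length(), recovered == message)
```

Anyone holding `public_key` can produce `ciphertext`, but recovering the message requires `d`, which in turn requires knowing the two primes. That asymmetry is the reason the private key, and not the public one, belongs in the key management system.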
Now that we have a basic understanding of data encryption and key management, we will review and compare, in our next blog post, the encryption offerings from AWS, Azure and GCP. Stay tuned.