In this article, I am going to walk through what happens behind the scenes of an HTTPS session. This post is not a deep dive; it aims to give the reader a high-level understanding of how the protocol works.
The job of HTTPS is chiefly two-fold:
- Prove the authenticity of a server (website) to a client (browser).
- Provide a secure channel of communication between a client and a server.
At the heart of this protocol lies the encryption/decryption of the data being transferred.
Why encrypt data?
Because unencrypted data is susceptible to eavesdropping by intruders. Any private information (credit card details, login credentials, chat conversations, etc.) stops being private the moment it is transported as plaintext.
How is HTTP different from HTTPS?
HTTP sits at Layer 7 (the application layer) of the OSI model. Simply put, it provides a mechanism for clients and servers to interact with each other over the World Wide Web.
HTTPS is nothing but HTTP running over TLS (Transport Layer Security, the successor to SSL, the Secure Sockets Layer). TLS is what differentiates HTTPS from HTTP.
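To make the "HTTP over TLS" idea concrete, here is a minimal sketch using Python's standard `ssl` module. The helper name `fetch_status_line` is my own, and the request is deliberately bare-bones; the point is only that the HTTP bytes stay the same and the one HTTPS-specific step is wrapping the TCP socket in TLS.

```python
import socket
import ssl

# The default context verifies the server's certificate and hostname.
context = ssl.create_default_context()

def fetch_status_line(host: str, port: int = 443) -> str:
    """Send a minimal HTTP request over TLS and return the status line."""
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        # Wrapping the plain TCP socket is the only HTTPS-specific step.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls_sock.sendall(request.encode("ascii"))
            response = tls_sock.recv(4096)
    # The first line of the response is the HTTP status line.
    return response.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
```

Calling `fetch_status_line("example.com")` needs network access, so treat it as something to try by hand rather than a guaranteed result.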
Encryption (a slight detour)
The basic premise of HTTPS is that the data exchanged between applications over the internet is encrypted using some encryption algorithm. Encryption can broadly be categorised into two groups:
- Symmetric encryption – Using the same key for encryption/decryption.
- Asymmetric encryption – Using a public/private key-pair (essentially two keys). The public key is typically used to encrypt the plaintext, and the private key is used to decrypt the resulting ciphertext.
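The two groups can be contrasted in a few lines of Python. This is a toy sketch only: `xor_stream` is a made-up stand-in for a real symmetric cipher such as AES, and the RSA part uses the textbook construction with tiny primes, which is nowhere near secure.

```python
import hashlib

def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256-derived keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Symmetric: the same key both encrypts and decrypts.
shared_key = b"toy shared key"
ciphertext = xor_stream(shared_key, b"hello")
assert xor_stream(shared_key, ciphertext) == b"hello"

# Asymmetric: textbook RSA with tiny primes (illustration only).
p, q = 61, 53
n = p * q                            # modulus, part of both keys
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (Python 3.8+)

message = 65
encrypted = pow(message, e, n)       # anyone holding (n, e) can encrypt
decrypted = pow(encrypted, d, n)     # only the holder of d can decrypt
assert decrypted == message
```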
Asymmetric algorithms are more complex, more computationally expensive and much slower than symmetric ones. (The key sizes also differ by quite a margin: a 256-bit AES key versus a 2048-bit RSA key.)
Therefore, in an HTTPS session, a combination of the two is used. A symmetric algorithm is used for the actual encryption/decryption of the messages. An asymmetric algorithm is used to transfer the symmetric key (well, not the exact key, as you’ll see below) between the two communicating systems.
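Here is a rough sketch of that combination. Everything in it is a toy under stated assumptions: the RSA key pair is built from two small known Mersenne primes, `xor_stream` stands in for a real symmetric cipher, and the names `client_send` and `server_receive` are hypothetical. The shape is what matters: the slow asymmetric step protects one short key, and the fast symmetric step protects the message.

```python
import hashlib
import secrets

# Toy RSA key pair from two known Mersenne primes (far too small for real use).
P, Q = 2**89 - 1, 2**107 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent, stays on the server

def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256-derived keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def client_send(message: bytes, public_key):
    """Encrypt the message symmetrically and wrap the key asymmetrically."""
    n, e = public_key
    session_key = secrets.token_bytes(16)                        # fresh random key
    wrapped_key = pow(int.from_bytes(session_key, "big"), e, n)  # asymmetric step
    return wrapped_key, xor_stream(session_key, message)         # symmetric step

def server_receive(wrapped_key: int, ciphertext: bytes) -> bytes:
    """Unwrap the session key with the private key, then decrypt the message."""
    session_key = pow(wrapped_key, D, N).to_bytes(16, "big")
    return xor_stream(session_key, ciphertext)

wrapped, encrypted = client_send(b"GET / HTTP/1.1", (N, E))
assert server_receive(wrapped, encrypted) == b"GET / HTTP/1.1"
```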
What prevents me from carrying out a man-in-the-middle attack by issuing my own public key?
Why would you trust my public key?
Let’s bring the Certificate Authority (CA) into the picture. The job of a CA is to issue digital certificates that prove an entity’s ownership of (among other things) a public key, so that nobody else can fake it (and claim to be that entity).
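A heavily simplified sketch of the idea: the CA signs a digest of "this name owns this public key" with its private key, and anyone holding the CA's public key can check that signature. Real certificates are X.509 structures with proper signature padding; the dictionary layout, the toy RSA key and the function names here are all my own assumptions.

```python
import hashlib

# Hypothetical CA key pair: textbook RSA over two Mersenne primes (toy sizes).
CA_P, CA_Q = 2**89 - 1, 2**107 - 1
CA_N = CA_P * CA_Q
CA_E = 65537
CA_D = pow(CA_E, -1, (CA_P - 1) * (CA_Q - 1))   # known only to the CA

def digest_as_int(subject: str, public_key: str) -> int:
    """Hash the (subject, public key) binding and reduce it below the modulus."""
    data = f"{subject}|{public_key}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % CA_N

def issue_certificate(subject: str, public_key: str) -> dict:
    """CA side: sign the binding with the CA's private exponent."""
    signature = pow(digest_as_int(subject, public_key), CA_D, CA_N)
    return {"subject": subject, "public_key": public_key, "signature": signature}

def verify_certificate(cert: dict) -> bool:
    """Client side: check the signature using only the CA's public key."""
    expected = digest_as_int(cert["subject"], cert["public_key"])
    return pow(cert["signature"], CA_E, CA_N) == expected

cert = issue_certificate("example.com", "server-public-key")
forged = dict(cert, public_key="attacker-public-key")   # swapped key, old signature
assert verify_certificate(cert)
assert not verify_certificate(forged)
```

The forged certificate fails because the attacker cannot produce a matching signature without the CA's private key, which is what stops me from simply handing you my own public key.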
Using the public key (which is part of the digital certificate), I’ll encrypt whatever needs encryption and send it over the communication channel. The beauty of asymmetric algorithms is that data encrypted using the public key can only* be decrypted using the corresponding private key (which belongs to the actual owner of the public/private key-pair). So, I will remain assured that only the intended recipient will be able to decrypt the message.
*NOTE: It isn’t that the ciphertext cannot be decrypted without the private key, just that doing so would be computationally infeasible in practice for properly sized keys.
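To see why this is a matter of computational effort rather than impossibility, here is a toy attack on a deliberately tiny RSA key. The function name is made up for this sketch. With a 12-bit modulus, trial division finds the factors instantly; with a 2048-bit modulus the same loop would not finish in any practical amount of time.

```python
def recover_private_exponent(n: int, e: int) -> int:
    """Factor n by trial division, then rebuild the private exponent."""
    factor = next(c for c in range(2, n) if n % c == 0)   # smallest prime factor
    p, q = factor, n // factor
    return pow(e, -1, (p - 1) * (q - 1))

n, e = 3233, 17                  # toy public key (3233 = 61 * 53)
ciphertext = pow(65, e, n)       # encrypted using only the public key

d = recover_private_exponent(n, e)    # the attacker never saw the private key
assert pow(ciphertext, d, n) == 65    # ...yet recovers the plaintext
```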
Why would you encrypt a symmetric key using asymmetric encryption?
For the same reason as above: only the intended recipient can recover the symmetric key, by decrypting it with the private key.
Now that we seem to be heading somewhere, let’s take a look at how an actual HTTPS session begins.
Before a TLS session actually begins, a handshake is performed between the client and the server. A lot of things happen behind the scenes during this handshake. (Do give this awesome article a read if you are interested in going under the hood of a typical TLS handshake.)
In brief, the following things occur (this describes the RSA key exchange of TLS 1.2 and earlier; TLS 1.3 agrees on keys with an ephemeral Diffie-Hellman exchange instead):
- Client informs the server of its supported cipher suites and sends a random value; the server chooses one of the suites.
- Server sends over its certificate (issued by a CA) and a random value of its own (both random values will be used later).
- Client authenticates the server’s certificate.
- Client generates a Pre-Master Secret, encrypts it using the server’s public key (from the certificate), and sends it to the server.
- Using the two random values and the Pre-Master Secret, both the client and the server generate the same Master Secret.
- The Master Secret is used (by both client and server) to generate the necessary session keys (for encrypting messages and for computing MACs).
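The last two steps can be sketched with the TLS 1.2 pseudorandom function from RFC 5246, which mixes in a random value from each side of the handshake. The function `derive_session_keys` and the way I slice the key block (32-byte MAC and encryption keys per direction) are illustrative assumptions; the actual sizes depend on the negotiated cipher suite.

```python
import hashlib
import hmac
import secrets

def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_hash expansion from RFC 5246, section 5, using HMAC-SHA256."""
    output = b""
    a = seed                                                   # A(0)
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()       # A(i)
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]

def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """TLS 1.2 PRF: P_SHA256 over the label concatenated with the seed."""
    return p_sha256(secret, label + seed, length)

def derive_session_keys(pre_master: bytes, client_random: bytes,
                        server_random: bytes) -> dict:
    """Derive the master secret, then split a key block into session keys."""
    master = prf(pre_master, b"master secret", client_random + server_random, 48)
    block = prf(master, b"key expansion", server_random + client_random, 128)
    return {
        "master_secret": master,
        "client_mac_key": block[0:32],      # assumed 32-byte keys throughout
        "server_mac_key": block[32:64],
        "client_write_key": block[64:96],
        "server_write_key": block[96:128],
    }

# Both sides hold the same three inputs after the handshake messages.
pre_master = secrets.token_bytes(48)
client_random, server_random = secrets.token_bytes(32), secrets.token_bytes(32)
client_view = derive_session_keys(pre_master, client_random, server_random)
server_view = derive_session_keys(pre_master, client_random, server_random)
assert client_view == server_view
```

Because the derivation is deterministic, neither side ever has to send the session keys themselves over the wire.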
Now that the TLS handshake is complete, the client and the server can begin exchanging messages using the generated session keys. Each time an encrypted message is exchanged, a message authentication code (typically an HMAC) is sent along with it, so that the receiver can detect any tampering.
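The integrity check can be sketched with Python's standard `hmac` module. The helper names are my own, and this is a simplification: the MAC in a real TLS record also covers a sequence number and the record header, not just the payload.

```python
import hashlib
import hmac

def make_tag(mac_key: bytes, message: bytes) -> bytes:
    """Sender side: compute an HMAC-SHA256 tag over the message."""
    return hmac.new(mac_key, message, hashlib.sha256).digest()

def is_authentic(mac_key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(mac_key, message), tag)

mac_key = b"session MAC key from the handshake"   # placeholder value
message = b"transfer 10 to alice"
tag = make_tag(mac_key, message)

assert is_authentic(mac_key, message, tag)
assert not is_authentic(mac_key, b"transfer 99 to mallory", tag)  # tampered
```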
Simply put, HTTPS is nothing but an encrypted/authenticated form of HTTP. The messages themselves are encrypted using symmetric algorithms, and the symmetric key material is exchanged using an asymmetric algorithm whose public key is carried in the server’s digital certificate.
One question still remains. How do digital certificates work and why should you trust a CA? Seems like another blog post is in the offing!