A CDN (Content Distribution Network) is a cost-effective, geographically distributed file storage service that replicates files across its data centers to serve static content quickly to a large number of geographically distributed users, serving each user from the data center that can respond to them fastest. There are secondary benefits, such as fault-tolerance: users can be served from other data centers if any particular data center is unavailable.
We use a CDN if our geographically distributed user base can benefit from the extensive network of data centers that a CDN provides. Reasons to consider using a CDN include:
- Lower latency—A user is served from a nearby data center, so latency is lower. Lower latency may also carry other benefits, such as improving SEO (search engine optimization). Search engines tend to penalize slow web pages both directly and indirectly.
- Scalability—With a third-party provider, we do not need to scale our system ourselves. The third party takes care of scalability.
- Lower unit costs—A third-party CDN usually provides bulk discounts, so we will have lower unit costs as we serve more users and higher loads. It can provide lower costs because it has economies of scale from serving traffic for many companies and spreads the costs of hardware and appropriately skilled technical personnel over this larger volume.
- Higher throughput—A CDN provides additional hosts to our service, which allows us to serve a larger number of simultaneous users and higher traffic.
- Higher availability—The additional hosts can serve as a fallback should our service’s hosts fail, especially if the CDN keeps to its SLA. Being geographically distributed across multiple data centers also benefits availability, as a disaster that causes an outage in one data center will not affect data centers located far away.
Some disadvantages of using a CDN include the following:
- The additional complexities of including another service in our system. Examples of such complexities include:
- An additional DNS lookup
- An additional point of failure
- A CDN may have high unit costs for low traffic. There may also be hidden costs, like costs per GB of data transfer because CDNs may use third-party networks.
- Migrating to a different CDN may take months and be costly. Reasons to migrate to another CDN include:
- A particular CDN may not have hosts located near our users. If we acquire a significant user base in a region not covered by our CDN, we may need to migrate to a more suitable CDN.
- A CDN company may provide poor service, such as not fulfilling its SLA, which affects our own users; provide poor customer support; or experience incidents like data loss or security breaches.
- There may be security and privacy concerns in storing our data with a third party. We can implement encryption at rest so the CDN cannot view our data, but this incurs additional cost and latency (from encrypting and decrypting data).
- The flip side of allowing a third party to ensure high availability is that if technical problems occur with the CDN, we do not know how long the CDN company will take to fix them.
- The configuration management of a CDN, or of any third-party tool/service in general, may be insufficiently customizable for some of our use cases, leading to unexpected problems.
Requirements
Functional requirements are simple. Authorized users should be able to create directories, upload files up to a 10 GB size limit, and download files. Non-functional requirements are:
- Scalable—The CDN may need to scale to support petabytes of storage and download volumes of terabytes per day.
- High availability—Four or five 9s uptime required.
- High performance—A file should be downloaded from the data center that can serve it fastest to the requestor. However, synchronization may take time, so upload performance is less important, as long as the file is available on at least one data center before synchronization is complete.
- Durable—A file must not be corrupted.
- Security and privacy—The CDN serves requests and sends files to destinations outside the data center. Files should only be downloaded and uploaded by authorized users.
CDN Authentication
The purpose of authentication is to verify a user’s identity, while the purpose of authorization is to ensure that a user accessing a resource (such as a file in our CDN) has permission to do so. These measures prevent hotlinking, in which a site or service accesses CDN assets without permission. Our CDN incurs the costs of serving these users without getting paid for it, and unauthorized file or data access may be a copyright violation.
CDN authentication and authorization can be done with either cookie-based or token-based authentication. Token-based authentication uses less memory, can use third-party services with more security expertise, and allows fine-grained access control.
We refer to a CDN customer as a site or service that uploads assets to a CDN and then directs its users/clients to the CDN. We refer to a CDN user as a client that downloads assets from a CDN. The CDN issues each customer a secret key and provides an SDK or library to generate access tokens from the information listed below. The access token generation process is as follows (a sketch of the URL-signing step appears after this list):
- The user sends an authentication request to the customer app. The customer app may perform the authentication using an authentication service. (Some authentication protocols are Simple Login and OpenID Connect.)
- The customer app generates an access token using the SDK, with the following inputs:
- Secret key—The customer’s secret key.
- CDN URL—The CDN URL that the generated access token is valid for.
- Expiry—The access token’s expiry timestamp, after which the user needs a new access token.
- Referrer—This is a Referrer HTTP request header.
- Allowed IPs—This may be a list of IP address ranges that are authorized to download CDN assets.
- The customer app stores the token and then returns this token to the user. For additional security, the token can be stored in an encrypted form.
- Whenever a customer app gives a user a CDN URL and the user makes a GET request for this CDN asset, the GET request should be signed with the access token. This is called URL signing. An example of a signed URL is http://12345.r.cdnsun.net/photo.jpeg?secure=DMF1ucDxtHCxwYQ, where "secure=DMF1ucDxtHCxwYQ" is a query parameter that sends the access token to the CDN. The CDN performs authorization: it verifies that the user’s token is valid and that the asset can be downloaded with that token, as well as with the user’s IP or country/region. Finally, the CDN returns the asset to the user.
- When a user logs out, the customer app destroys the user’s token. The user will need to generate another token when logging in.
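To illustrate the URL-signing step, here is a minimal sketch of how a customer app might generate an access token as an HMAC over the CDN URL and the restrictions listed above, and how the CDN might verify it. The function names, token format, and query parameters are assumptions for illustration, not any particular CDN vendor's API.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

# Hypothetical customer secret key issued by the CDN.
SECRET_KEY = b"customer-secret-key"

def generate_access_token(cdn_url: str, expiry: int, referrer: str = "",
                          allowed_ips: str = "") -> str:
    """Sign the CDN URL and its restrictions with the customer's secret key."""
    message = f"{cdn_url}|{expiry}|{referrer}|{allowed_ips}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def signed_url(cdn_url: str, ttl_seconds: int = 3600) -> str:
    """Return a URL the user can present to the CDN until the token expires."""
    expiry = int(time.time()) + ttl_seconds
    token = generate_access_token(cdn_url, expiry)
    return f"{cdn_url}?{urlencode({'secure': token, 'expiry': expiry})}"

def verify(cdn_url: str, token: str, expiry: int, referrer: str = "",
           allowed_ips: str = "") -> bool:
    """CDN-side authorization check: token matches and has not expired."""
    if time.time() > expiry:
        return False
    expected = generate_access_token(cdn_url, expiry, referrer, allowed_ips)
    return hmac.compare_digest(expected, token)

print(signed_url("http://12345.r.cdnsun.net/photo.jpeg"))
```

In practice, the expiry, referrer, and allowed IP restrictions would also be carried in or alongside the URL so the CDN can recompute the same HMAC before serving the asset.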
High-level architecture
The figure below shows the high-level architecture of our CDN. A user request is handled by an API gateway, which is a layer/service that makes requests to various other services. These include SSL termination, authentication and authorization, rate limiting, and logging to a shared logging service for purposes such as analytics and billing. We can configure the API gateway to look up the metadata service to determine which storage service host to read from or write to for any user. If a CDN asset is encrypted at rest, the metadata service can also record this, and we can use a secrets management service to manage the encryption keys.
Assets are stored on a storage service, and the metadata service keeps track of the storage service hosts and file directories that store each asset. If a requested asset is missing, the API gateway retrieves it from the origin (i.e., our service; this is configured in the metadata service), adds it to the storage service, and updates the metadata service.
We can generalize the operations into reads (download) vs. writes (directory creation, upload, and file deletion). For simplicity of the initial design, every file can be replicated to every data center. Otherwise, our system will have to handle complexities such as:
- The metadata service must track which data centers contain which files.
- A file distribution system must periodically use user query metadata to determine the optimal file distribution across the data centers, including the number and locations of replicas.
Storage service
The storage service is a cluster of hosts/nodes that contain the files. We should store files in the hosts’ filesystems. Files should be replicated for availability and durability, with each file assigned to multiple (e.g., three) hosts. We need availability monitoring and a failover process that updates the metadata service and provisions replacement nodes, as sketched below. The host manager can be in-cluster or out-cluster. An in-cluster manager directly manages nodes, while an out-cluster manager manages small independent clusters of nodes, and each small cluster manages itself.
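Here is a minimal sketch of the failover process, assuming a hypothetical in-memory mapping from each file to its replica hosts. A real system would detect failures via heartbeats, throttle re-replication, and persist the metadata in the metadata service.

```python
# Hypothetical metadata: file ID -> replica hosts.
file_replicas = {
    "photo.jpeg": ["host-1", "host-2", "host-3"],
    "video.mp4": ["host-2", "host-3", "host-4"],
}
healthy_hosts = {"host-1", "host-2", "host-3", "host-4", "host-5"}

REPLICATION_FACTOR = 3

def fail_over(failed_host: str) -> None:
    """Remove a failed host and re-replicate its files to other healthy hosts."""
    healthy_hosts.discard(failed_host)
    for file_id, replicas in file_replicas.items():
        if failed_host not in replicas:
            continue
        replicas.remove(failed_host)
        # Pick a healthy host that does not already hold a replica.
        candidates = sorted(healthy_hosts - set(replicas))
        if candidates and len(replicas) < REPLICATION_FACTOR:
            replacement = candidates[0]
            # A real system would copy the file to the replacement host here,
            # then update the metadata service.
            replicas.append(replacement)

fail_over("host-2")
print(file_replicas)  # host-2 replaced in each affected file's replica set
```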
In-cluster
We can use a distributed file system like HDFS, which includes ZooKeeper as the in-cluster manager. ZooKeeper manages leader election and maintains a mapping between files, leaders, and followers. An in-cluster manager is a highly sophisticated component that also requires reliability, scalability, and high performance. An alternative that avoids such a component is an out-cluster manager.
Out-cluster
Each cluster managed by an out-cluster manager consists of three or more nodes distributed across several data centers. To read or write a file, the metadata service identifies the cluster it is or should be stored in and then reads or writes the file from a randomly selected node in the cluster. This node is responsible for replication to other nodes in the cluster. Leader election is not required, but mapping files to clusters is required. The out-cluster manager maintains a mapping of files to clusters.
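A sketch of how an out-cluster manager might maintain the file-to-cluster mapping follows. The hash-based cluster assignment and random node selection are assumptions for illustration; the essential point is that no leader election is needed, only a mapping of files to clusters.

```python
import hashlib
import random

# Hypothetical clusters, each a small set of nodes spread across data centers.
clusters = {
    "cluster-a": ["a-node-1", "a-node-2", "a-node-3"],
    "cluster-b": ["b-node-1", "b-node-2", "b-node-3"],
}
file_to_cluster: dict[str, str] = {}

def assign_cluster(file_id: str) -> str:
    """Deterministically map a new file to a cluster by hashing its ID."""
    if file_id not in file_to_cluster:
        cluster_ids = sorted(clusters)
        index = int(hashlib.md5(file_id.encode()).hexdigest(), 16) % len(cluster_ids)
        file_to_cluster[file_id] = cluster_ids[index]
    return file_to_cluster[file_id]

def pick_node(file_id: str) -> str:
    """Read/write via a randomly selected node; that node replicates to its peers."""
    cluster_id = assign_cluster(file_id)
    return random.choice(clusters[cluster_id])

print(assign_cluster("photo.jpeg"), pick_node("photo.jpeg"))
```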
Common operations
When the client makes a request with our CDN service’s domain (e.g., cdnservice.flickr.com) rather than an IP address, GeoDNS assigns the IP address of the closest host, where a load balancer directs it to an API gateway host.
Downloads
For a download, the next step is to select a storage host to serve this request. The metadata service aids in this selection by maintaining and providing the following metadata (a sketch of how it might be modeled follows this list). It can use Redis and/or SQL:
- The storage service hosts that contain the files. Some or all of the hosts may be in other data centers, so that information must be stored too. Files take time to be replicated across hosts.
- The metadata service of each data center keeps track of the current load of its hosts. A host’s load can be approximated by the sum of the sizes of the files it is currently serving.
- File ownership and access control.
- Health status of hosts.
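The sketch below models this metadata with Python dataclasses as stand-ins for SQL tables or Redis hashes. The field names and the least-loaded host selection are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class StorageHost:
    host_id: str
    data_center: str
    healthy: bool = True
    current_load_bytes: int = 0  # approximated by the sizes of files being served

@dataclass
class FileMetadata:
    file_id: str
    size_bytes: int
    owner_id: str                                              # file ownership
    allowed_user_ids: set[str] = field(default_factory=set)    # access control
    replica_host_ids: list[str] = field(default_factory=list)  # may span data centers

hosts = {
    "host-1": StorageHost("host-1", "us-east"),
    "host-2": StorageHost("host-2", "eu-west"),
}
files = {
    "photo.jpeg": FileMetadata("photo.jpeg", 2_000_000, "customer-42",
                               {"user-7"}, ["host-1", "host-2"]),
}

def pick_least_loaded_host(file_id: str) -> StorageHost:
    """Choose the healthy replica with the lowest approximate load."""
    replicas = [hosts[h] for h in files[file_id].replica_host_ids if hosts[h].healthy]
    return min(replicas, key=lambda h: h.current_load_bytes)

print(pick_least_loaded_host("photo.jpeg").host_id)
```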
Below is a sequence diagram of the steps taken by the API gateway to download a file, assuming the CDN does contain this asset (a code sketch of these steps follows the list).
- Query the rate limiting service to check if the request exceeds the client’s rate limit. We assume that the rate limiter allows the request through.
- Query the metadata service to get the storage service hosts that contain this asset.
- Select a storage host and stream the asset to the client.
- Update the metadata service with the load increase of the storage host. If the metadata service records the asset’s size, this step can be done in parallel with step 3. Otherwise, the API gateway will need to measure the asset’s size, to update the metadata service with the correct load increase.
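The following sketch walks through this happy path. The rate limiter, metadata service, and storage service are represented by hypothetical stubs so the example is self-contained; in the real system these would be remote calls.

```python
def download(client_id: str, file_id: str) -> bytes:
    """Steps 1-4 of the download sequence, happy path only."""
    # Step 1: rate limiting.
    if not rate_limiter_allows(client_id):
        raise PermissionError("rate limit exceeded")
    # Step 2: find the storage service hosts containing the asset.
    host_ids = metadata_get_hosts(file_id)
    # Step 3: select a host and stream the asset to the client.
    host_id = host_ids[0]  # e.g., the least-loaded or nearest host
    asset = storage_read(host_id, file_id)
    # Step 4: record the load increase on the selected host.
    metadata_add_load(host_id, len(asset))
    return asset

# Hypothetical stubs so the sketch runs.
def rate_limiter_allows(client_id): return True
def metadata_get_hosts(file_id): return ["host-1", "host-2"]
def storage_read(host_id, file_id): return b"...file bytes..."
def metadata_add_load(host_id, size): pass

print(len(download("user-7", "photo.jpeg")))
```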
The CDN may not contain this asset. It may have deleted it for reasons including the following:
- There may be a set retention period for assets, such as a few months or years, and this period had passed for that asset. The retention period may also be based on when the asset was last accessed.
- A less likely reason is that the asset was never uploaded because the CDN ran out of storage space (or had other errors), but the customer believed that the asset was successfully uploaded.
- Other errors in the CDN.
If the CDN does not have the asset, it will need to download it from the origin, which is a backup location provided by the customer. This will increase latency. It will then need to store it by uploading it to the storage service and updating the metadata service. To minimize latency, the storage process can be done in parallel with returning the asset to the client.
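A sketch of this cache-miss path follows: fetch the asset from the origin and return it to the client while storing it in the background. The origin lookup, storage calls, and thread pool are assumptions; stubs are redefined here to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)

def download_with_origin_fallback(file_id: str) -> bytes:
    host_ids = metadata_get_hosts(file_id)
    if host_ids:
        return storage_read(host_ids[0], file_id)
    # Cache miss: fetch from the origin configured in the metadata service.
    asset = fetch_from_origin(metadata_get_origin_url(file_id), file_id)
    # Store and update metadata in the background so the client is not kept waiting.
    executor.submit(store_and_update_metadata, file_id, asset)
    return asset

# Hypothetical stubs so the sketch runs.
def metadata_get_hosts(file_id): return []            # simulate a cache miss
def metadata_get_origin_url(file_id): return "https://origin.example.com"
def storage_read(host_id, file_id): return b""
def fetch_from_origin(origin_url, file_id): return b"...asset from origin..."
def store_and_update_metadata(file_id, asset): pass

print(len(download_with_origin_fallback("photo.jpeg")))
```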
What if we needed to store assets in encrypted form? We can store the encryption keys in a secrets management service (which requires authentication). When an API gateway host is initialized, it can authenticate with the secrets management service, which will pass the former a token for future requests. When an authorized user requests an asset, the host can first obtain the asset’s encryption key from the secrets management service, fetch the encrypted asset from the storage service, decrypt the asset, and return it to the user. If the asset is large, it may be stored in multiple blocks in the storage service, and each block will need to be separately fetched and decrypted.
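The sketch below shows the decrypt-on-read flow under the assumption that each block is encrypted with a symmetric key held in the secrets management service. It uses Fernet from the `cryptography` package for brevity; a real CDN might instead use envelope encryption with a key management service, and the in-memory `secrets_service` dictionary is a stand-in for that service.

```python
from cryptography.fernet import Fernet

# Hypothetical secrets management service: asset ID -> encryption key.
secrets_service = {"photo.jpeg": Fernet.generate_key()}

def encrypt_blocks(asset_id: str, blocks: list[bytes]) -> list[bytes]:
    """Encrypt each block separately before storing it."""
    f = Fernet(secrets_service[asset_id])
    return [f.encrypt(block) for block in blocks]

def read_encrypted_asset(asset_id: str, encrypted_blocks: list[bytes]) -> bytes:
    """Fetch the key, then fetch and decrypt each block separately."""
    f = Fernet(secrets_service[asset_id])
    return b"".join(f.decrypt(block) for block in encrypted_blocks)

stored = encrypt_blocks("photo.jpeg", [b"block-1 bytes", b"block-2 bytes"])
print(read_encrypted_asset("photo.jpeg", stored))
```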
File upload and file deletion
A file is identified by its ID, not its content. A file can be gigabytes or terabytes in size. What if a file upload or download fails before it is complete? It would be wasteful to upload or download the file from the beginning. We should develop a process similar to checkpointing or bulkhead to divide a file into chunks, so a client only needs to repeat the upload or download operations on the chunks that have not completed. Such an upload process is known as multipart upload, and we can apply the same principles to downloads, too.
We can design a protocol for multipart uploads. In such a protocol, uploading a chunk can be equivalent to uploading an independent file. For simplicity, chunks can be of fixed size, such as 128 MB. When a client begins a chunk upload, it can send an initial message that contains the usual metadata such as the user ID, the filename, and size. It can also include the number of the chunk about to be uploaded. In multipart upload, the storage host will now need to allocate a suitable address range on the disk to store the file and record this information. When it starts receiving a chunk upload, it should write the chunk to the appropriate addresses. The metadata service can track which chunk uploads have completed. When the client completes uploading the final chunk, the metadata service marks the file as ready for replication and download. If a chunk upload fails, the client can reupload just this chunk instead of the entire file.
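Here is a minimal sketch of such a multipart upload protocol, with the chunk-tracking state kept in a hypothetical in-memory metadata record. The chunk size is shrunk to a few bytes only to keep the example small; 128 MB would be a realistic value.

```python
import math

CHUNK_SIZE = 4  # bytes here for illustration; e.g., 128 MB in practice

# Hypothetical metadata: file ID -> {total_chunks, received chunks}.
uploads: dict[str, dict] = {}

def initiate_upload(file_id: str, file_size: int) -> None:
    """Initial message: record how many chunks to expect."""
    uploads[file_id] = {
        "total_chunks": math.ceil(file_size / CHUNK_SIZE),
        "received": {},  # chunk number -> bytes
    }

def upload_chunk(file_id: str, chunk_number: int, data: bytes) -> None:
    """Each chunk is uploaded independently and can be retried on its own."""
    uploads[file_id]["received"][chunk_number] = data

def is_complete(file_id: str) -> bool:
    u = uploads[file_id]
    return len(u["received"]) == u["total_chunks"]

def assemble(file_id: str) -> bytes:
    """Once all chunks arrive, the file is ready for replication and download."""
    u = uploads[file_id]
    return b"".join(u["received"][i] for i in range(u["total_chunks"]))

payload = b"hello multipart"
initiate_upload("file-1", len(payload))
for i in range(0, len(payload), CHUNK_SIZE):
    upload_chunk("file-1", i // CHUNK_SIZE, payload[i:i + CHUNK_SIZE])
print(is_complete("file-1"), assemble("file-1"))
```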
If the client stops uploading the file before all chunks are successfully uploaded, these chunks will uselessly occupy space in our storage host. We can implement a simple cron job or a batch ETL job that periodically deletes these chunks of incompletely uploaded files.
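A sketch of that periodic cleanup job follows, assuming each incomplete upload's metadata records when it was last touched. A real job would run on a schedule (e.g., via cron) and also delete the orphaned chunks from the storage hosts.

```python
import time

STALE_AFTER_SECONDS = 24 * 3600

# Hypothetical metadata for in-progress uploads: upload ID -> last activity timestamp.
incomplete_uploads = {
    "upload-1": time.time() - 2 * 24 * 3600,  # abandoned two days ago
    "upload-2": time.time(),                  # still active
}

def clean_up_stale_uploads(now=None) -> list[str]:
    """Delete chunks of uploads with no activity beyond the threshold."""
    now = now or time.time()
    stale = [uid for uid, last in incomplete_uploads.items()
             if now - last > STALE_AFTER_SECONDS]
    for uid in stale:
        incomplete_uploads.pop(uid)  # also delete the chunks on the storage hosts
    return stale

print(clean_up_stale_uploads())  # ['upload-1']
```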
Another question is how to handle adding, updating (the contents of), and deleting files on this distributed system. Some possible solutions (the single-leader approach is sketched after this list):
- A single-leader approach that designates a particular data center to perform these operations and propagate the changes to the other data centers. This approach may be adequate for our requirements, especially if we do not require the changes to be rapidly available on all data centers.
- The client acquires a lock on this file in every data center, performs this operation on every data center, and then releases the locks.
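A sketch of the single-leader approach: writes go to a designated leader data center, which propagates the change asynchronously to the others. The data center names and the in-memory replication queue are assumptions for illustration.

```python
from collections import deque

data_centers = {"us-east": {}, "eu-west": {}, "ap-south": {}}
LEADER = "us-east"
replication_queue: deque = deque()

def write_file(file_id: str, contents: bytes) -> None:
    """All creates/updates go through the leader data center."""
    data_centers[LEADER][file_id] = contents
    for dc in data_centers:
        if dc != LEADER:
            replication_queue.append((dc, file_id, contents))

def propagate_once() -> None:
    """Apply one queued change to a follower data center (run continuously in practice)."""
    if replication_queue:
        dc, file_id, contents = replication_queue.popleft()
        data_centers[dc][file_id] = contents

write_file("photo.jpeg", b"...")
while replication_queue:
    propagate_once()
print(all("photo.jpeg" in dc for dc in data_centers.values()))  # True
```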
Certain files may be used mostly by particular regions, so not all data centers need to contain a copy of every file. We can set replication criteria to determine when a file should be copied to a particular data center (e.g., the number of requests or users for this file within the last month), as sketched below. Certain content is separated into multiple files because of application requirements to serve certain file combinations to certain users. For example, a video file may be served to all users, with an accompanying audio file in a particular language. This logic can be handled at the application level rather than by the CDN.
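A sketch of one such replication criterion: copy a file to a data center only once it has been requested there more than a threshold number of times in the last month. The request counters and threshold value are assumptions.

```python
REPLICATION_THRESHOLD = 1000  # requests in the last month

# Hypothetical per-data-center request counts for the last month.
monthly_requests = {
    ("photo.jpeg", "ap-south"): 1500,
    ("photo.jpeg", "eu-west"): 12,
}

def should_replicate(file_id: str, data_center: str, already_present: bool) -> bool:
    """Replicate to a data center only when recent local demand justifies it."""
    if already_present:
        return False
    return monthly_requests.get((file_id, data_center), 0) > REPLICATION_THRESHOLD

print(should_replicate("photo.jpeg", "ap-south", already_present=False))  # True
print(should_replicate("photo.jpeg", "eu-west", already_present=False))   # False
```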
Cache invalidation
As a CDN is for static files, cache invalidation is much less of a concern. We discussed various caching strategies and designing a system to monitor the cache for stale files. This system will have to anticipate high traffic.