A system has functional and non-functional requirements. Functional requirements describe the inputs and outputs of the system. Non-functional requirements are all requirements other than the system inputs and outputs. Typical non-functional requirements include the following:

  • Scalability—The ability of a system to easily and cost-efficiently adjust its hardware resource usage to support its load.
  • Availability—The percentage of time a system can accept requests and return the desired response.
  • Performance/latency and throughput—Performance or latency is the time taken for a user’s request to the system to return a response. The maximum request rate that a system can process is its bandwidth. Throughput is the current request rate being processed by the system. Latency and throughput are related but are not simple inverses: reducing per-request latency can raise throughput, but throughput can also be increased through parallelism without changing latency.
  • Fault-tolerance—The ability of a system to continue operating if some of its components fail and the prevention of permanent harm (such as data loss) should downtime occur.
  • Security—Prevention of unauthorized access to systems.
  • Privacy—Access control to Personally Identifiable Information (PII), which can be used to uniquely identify a person.
  • Accuracy—A system’s data may not need to be perfectly accurate, and accuracy tradeoffs to improve costs or complexity are often a relevant discussion.
  • Consistency—Whether data in all nodes/machines match.
  • Cost—We can lower costs by making tradeoffs against other non-functional properties of the system.
  • Complexity, maintainability, debuggability, and testability—These are related concepts that determine how difficult it is to build a system and then maintain it after it is built.

A customer, whether technical or non-technical, may not explicitly state non-functional requirements and may assume that the system will satisfy them. This means that the customer’s stated requirements will almost always be incomplete, incorrect, and sometimes excessive. Without clarification, there will be misunderstandings about the requirements. Non-functional requirements are commonly traded off against each other. In any system design interview, we must discuss the tradeoffs behind our design decisions.

Scalability

Scalability is the ability of a system to easily and cost-efficiently adjust its hardware resource usage to support its load. The process of expanding to support a larger load or number of users is called scaling. Scaling requires increases in CPU processing power, RAM, storage capacity, and network bandwidth. Scaling can refer to vertical scaling or horizontal scaling.

Vertical scaling is conceptually straightforward and can be achieved just by spending more money. It means upgrading to a more powerful and expensive host: one with a faster processor, more RAM, a bigger hard disk drive, a solid-state drive instead of a spinning hard disk for lower latency, or a network card with higher bandwidth. However, regardless of budget, current technological limits impose a maximum amount of processing power, RAM, or storage capacity that is possible on a single host.

Horizontal scaling refers to spreading out the processing and storage requirements across multiple hosts. “True” scalability can only be achieved by horizontal scaling. Horizontal scaling is almost always discussed in a system design interview.
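As a minimal sketch of this idea, the following snippet spreads keys (and thus their processing and storage) across multiple hosts by hashing; the host names are hypothetical placeholders:

```python
import hashlib

HOSTS = ["host-a", "host-b", "host-c"]  # hypothetical host identifiers

def host_for_key(key: str, hosts: list[str]) -> str:
    """Deterministically map a key to one of the hosts by hashing it."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return hosts[int(digest, 16) % len(hosts)]

# The same key always maps to the same host, so requests for it
# are consistently routed to the host that stores its data.
owner = host_for_key("user:42", HOSTS)
```

Note that this naive modulo scheme remaps most keys whenever a host is added or removed; consistent hashing is the usual remedy for that problem.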

Availability

Availability is the percentage of time a system can accept requests and return the desired response. High availability is required in most services, and other non-functional requirements may be traded off to allow high availability without unnecessary complexity.
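For intuition, an availability percentage maps directly to an allowed downtime budget; this small sketch computes the yearly budget in hours:

```python
def downtime_per_year_hours(availability_pct: float) -> float:
    """Allowed downtime (hours/year) for a given availability percentage."""
    hours_per_year = 365 * 24  # 8,760, ignoring leap years
    return hours_per_year * (1 - availability_pct / 100)

# "Three nines" (99.9%) allows roughly 8.76 hours of downtime per year;
# "four nines" (99.99%) allows roughly 53 minutes.
budget = round(downtime_per_year_hours(99.9), 2)  # → 8.76
```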

Fault-tolerance

Fault-tolerance is the ability of a system to continue operating if some of its components fail and the prevention of permanent harm (such as data loss) should downtime occur. This allows graceful degradation, so our system can maintain some functionality when parts of it fail, rather than a complete catastrophic failure.

One replication technique is to maintain multiple (such as three) redundant instances/copies of a component, so that up to two can be down simultaneously without affecting uptime. One instance is designated the source of truth (often called the leader), while the other two are designated replicas (or followers). There are various possible arrangements of the replicas. One arrangement is to place one replica on a different server rack within the same data center and the other replica in a different data center. Another arrangement is to place all three instances in different data centers, which maximizes fault-tolerance with the tradeoff of lower performance. Replication also helps to increase availability.
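A minimal sketch of the leader/follower arrangement described above, with synchronous replication and illustrative class names; a real system would replicate asynchronously or via quorums and handle follower failures:

```python
class Node:
    """A storage node holding a copy of the data."""
    def __init__(self, name: str):
        self.name = name
        self.data = {}

class Leader(Node):
    """Source of truth: applies writes locally, then replicates to followers."""
    def __init__(self, name: str, followers: list[Node]):
        super().__init__(name)
        self.followers = followers

    def write(self, key: str, value: str) -> None:
        self.data[key] = value            # apply to the source of truth first
        for follower in self.followers:   # then fan out to each replica
            follower.data[key] = value

followers = [Node("replica-rack2"), Node("replica-dc2")]
leader = Leader("leader", followers)
leader.write("order:1", "paid")
```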

Security

During an interview, we may need to discuss possible security vulnerabilities in our system and how we will prevent and mitigate security breaches. This includes access both from external parties and internally within our organization. We may also discuss rate limiting to prevent DDoS attacks. 

Privacy

Personally Identifiable Information (PII) is data that can be used to uniquely identify a customer, such as full name, government identifiers, addresses, email addresses, and bank account identifiers. PII must be safeguarded to comply with regulations such as the General Data Protection Regulation (GDPR).

Within our system, access control mechanisms should be applied to PII stored in databases and files. We can use mechanisms such as the Lightweight Directory Access Protocol (LDAP). We can encrypt data both in transit (using TLS, still commonly referred to as SSL) and at rest. Consider using hashing algorithms such as SHA-2 and SHA-3 to mask PII.
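A sketch of masking a PII field with SHA-256 (from the SHA-2 family); the email address and salt below are made-up placeholders:

```python
import hashlib

def mask_pii(value: str, salt: str) -> str:
    """One-way mask: replaces a PII value with a salted SHA-256 digest."""
    return hashlib.sha256((salt + value).encode()).hexdigest()

# The same input always yields the same mask, so masked values
# can still be joined or deduplicated across datasets.
masked = mask_pii("jane.doe@example.com", salt="per-dataset-secret")
```

Hashing is masking, not encryption: it cannot be reversed to recover the value. Because PII has low entropy, an unsalted hash can be brute-forced, so the salt should be kept secret (or an HMAC used instead).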

Performance

Performance or latency is the time taken for a user’s request to the system to return a response. This includes the network latency of the request to leave the client and travel to the service, the time the service takes to process the request and create the response, and the network latency of the response to leave the service and travel to the client. A typical request on a consumer-facing app (e.g., viewing a restaurant’s menu on a food delivery app or submitting a payment on an ecommerce app) has a desired latency of tens of milliseconds to several seconds. High-frequency trading applications may demand latencies of a millisecond or less.

Strictly speaking, latency refers to the travel time of a packet from its source to its destination. However, the term “latency” has become commonly used to have the same meaning as “performance,” and both terms are often used interchangeably. We still use the term latency if we need to discuss packet travel time.
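Latency and throughput are linked through concurrency rather than being inverses of each other. Little’s law captures the relationship; this sketch applies it:

```python
def in_flight_requests(throughput_rps: float, latency_s: float) -> float:
    """Little's law: avg in-flight requests = arrival rate x avg latency."""
    return throughput_rps * latency_s

# 1,000 requests/second at 50 ms each means ~50 requests in flight on average,
# so raising throughput at the same latency requires more concurrency
# (more hosts, processes, or threads), not lower latency.
concurrency = in_flight_requests(1000, 0.05)
```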

Consistency

Consistency has different meanings in ACID and CAP (from the CAP theorem). ACID consistency focuses on data relationships such as foreign keys and uniqueness constraints. CAP consistency is actually linearizability: every read sees the most recent write, as if all nodes contained the same data at every moment in time. Eventually consistent databases trade off consistency for improvements in availability, scalability, and latency.

Cost

In system design discussions, we can suggest trading off other non-functional requirements for lower cost. Examples:

  • Lower availability for improved costs by decreasing the redundancy of a system (such as the number of hosts, or the replication factor in a database).
  • Higher latency for improved costs by using a data center in a cheaper location that is further away from users.
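To make the availability-versus-cost tradeoff concrete, this sketch estimates the chance of a total outage as redundancy varies; the 1% per-host failure probability is an assumed number, and host failures are assumed to be independent:

```python
def total_outage_probability(host_failure_prob: float, replicas: int) -> float:
    """Probability that every replica is down at once, assuming independent failures."""
    return host_failure_prob ** replicas

p = 0.01  # assumed probability that a single host is down at any moment
# Each added replica multiplies hardware cost but, under these assumptions,
# divides the total-outage probability by ~100.
single = total_outage_probability(p, 1)   # 1 in 100
triple = total_outage_probability(p, 3)   # ~1 in 1,000,000
```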

Discuss the cost of implementation, cost of monitoring, and cost of each non-functional requirement such as high availability.

Besides the cost of maintenance in the form of addressing possible production problems, there will also be costs due to the natural atrophy of software over time as libraries and services are deprecated. Identify components that may need future updates. Which dependencies (such as libraries) will prevent other components from being easily updated if these dependencies become unsupported in the future?

Accuracy

Accuracy is a relevant non-functional requirement in systems with complex data processing or a high rate of writes. Accuracy of data means that the data values are correct and are not approximations. Estimation algorithms (such as HyperLogLog, which approximates the number of distinct elements) trade off accuracy for lower complexity.

A cache is stale if the data in its underlying database has been modified. A cache may have a refresh policy where it fetches the latest data at a fixed periodic interval. A shorter refresh interval keeps data fresher but is more costly. An alternative is for the system to update or delete the associated cache key when the data is modified, which increases complexity.
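A minimal sketch of the periodic-refresh policy described above, using a TTL (time-to-live) per cache entry; `fetch` stands in for a read against the underlying database:

```python
import time

class TTLCache:
    """Cache whose entries expire after ttl seconds. A shorter ttl means
    fresher data but more fetches against the source of truth (higher cost)."""
    def __init__(self, ttl: float, fetch):
        self.ttl = ttl
        self.fetch = fetch  # function that reads from the underlying database
        self.store = {}     # key -> (value, expiry timestamp)

    def get(self, key):
        value, expires = self.store.get(key, (None, 0.0))
        if time.monotonic() >= expires:  # missing or expired: refresh it
            value = self.fetch(key)
            self.store[key] = (value, time.monotonic() + self.ttl)
        return value
```

Between refreshes, reads may return stale values even after the database changes, which is exactly the accuracy tradeoff this section describes.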

Accuracy is somewhat related to consistency. Systems that are eventually consistent trade off accuracy for improvements in availability, complexity, and cost. When a write is made to an eventually consistent system, results from reads made after this write may not include the effects of the write, which makes them inaccurate. The eventually consistent system is inaccurate until the replicas are updated with the effects of the write operation.

Complexity

The first step to minimize complexity is to clarify both functional and non-functional requirements, so we do not design for unnecessary requirements. As we sketch design diagrams, note which components may be separated into independent systems. Use common services to reduce complexity and improve maintainability. Common services that are generalizable across virtually all services include:

  • Load balancer service
  • Rate limiting
  • Authentication and authorization
  • Logging, monitoring
  • Caching
  • DevOps and CI/CD if applicable
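As one example from the list above, rate limiting is often implemented with a token bucket; a minimal single-process sketch (a shared service would keep bucket state in a common store):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: each request consumes a token; tokens refill
    at a fixed rate, so short bursts are allowed but the sustained rate is capped."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```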

Services that are generalizable for certain organizations, such as those that collect user data for data science, include analytics and machine learning.