The question is to design a service for landlords to rent rooms for short-term stays to travelers. This may be both a coding and a system design question. A coding discussion will take the form of an object-oriented programming (OOP) solution with multiple classes. This question can be applied to reservation systems in general, such as:

  • Movie tickets
  • Air tickets
  • Parking lots
  • Taxis or ridesharing

In the subsequent sections, we will cover the following:

  • Designing a reservation system
  • Designing systems for operations staff to manage items and reservations
  • Scoping a complex system

Requirements

Before we discuss requirements, we can discuss the kind of system that we are designing. Airbnb is:

  1. A reservation app, so there is a type of user (guests) who makes reservations on finite items. There is also a type of user (hosts) who creates listings of these items.
  2. A marketplace app. It matches people who sell products and services to people who buy them. Airbnb matches hosts and guests.
  3. It also handles payments and collects commissions. This means there are internal users who do customer support and operations (commonly abbreviated as “ops”), to mediate disputes and monitor and react to fraud. This distinguishes Airbnb from simpler apps like Craigslist. The majority of employees in companies like Airbnb are customer support and operations.

A host’s use cases include the following:

  • Onboarding and updates to add, update, and delete listings. Updates may include small tasks like changing listing photos. There may be intricate business logic. For example, a listing may have a minimum and/or maximum booking duration, and pricing may vary by day of week or other criteria. The app may display pricing recommendations. Listings may be subject to local regulations.
  • Handle bookings—for example accept or reject booking requests:
    • A host may be able to view a guest’s ratings and reviews by other hosts, before accepting or rejecting the guest’s booking request.
    • Airbnb may provide additional options such as automated acceptances under certain host-specified criteria, such as guests with a high average rating.
    • Cancel a booking after accepting it. This may trigger monetary penalties or suspension of listing privileges.
  • Communicate with guests, such as via in-app messaging.
  • Post a rating and review of a guest and view the guest’s rating and review.
  • Receive payment from the guest (minus Airbnb’s commission).
  • Receive tax filing documents.
  • Analytics, such as viewing earnings, ratings, and review contents over time.
  • Communicate with operations staff, including requests for mediation (such as requesting guests to pay for damages) or reporting fraud.

A guest’s use cases include the following:

  • Search and view listings.
  • Submit a booking request and payment and check the statuses of booking requests.
  • Communicate with hosts.
  • Post a rating and review of a listing and view the host’s rating and review.
  • Communicate with operations staff, analogous to hosts.

Airbnb’s operational use cases include the following:

  • Reviewing listing requests and removing inappropriate listings.
  • Communicating with customers for purposes such as dispute mediation, offering alternative listings, and sending refunds.

A payment solution must consider numerous currencies and regulations (including taxes) that differ by country, state, city, and other levels of government and are different for various products and services. We may impose different transaction fees by payment type (e.g., a maximum transaction amount for checks or a discount for payments made via gift cards to encourage the purchase of gift cards). There are various ways to accept payments, such as:

  • Cash.
  • Various debit and credit card processors like MasterCard, Visa, and many others. Each has its own API.
  • Online payment processors like PayPal or Alipay.
  • Check/cheque.
  • Store credit.
  • Payment cards or gift cards that may be specific to certain combinations of companies and countries.

Functional requirements

  • A host may list a room. Assume a room is for one person. Room properties are city and price. The host may provide up to 10 photos and a 25 MB video for a room.
  • A guest may filter rooms by city, check-in date, and check-out date.
  • A guest may book a room with a check-in and check-out date. Host approval for booking is not required.
  • A host or guest may cancel a booking at any time before it begins.
  • A host or guest may view their list of bookings.
  • A guest can have only one room booked for any particular date.
  • Rooms cannot be double-booked.

The following are outside the scope of this interview. It is good to mention these possible functional requirements to demonstrate your critical thinking and attention to detail.

  • Analytics.
  • Airbnb may provide hosts with pricing recommendations. A listing may set a minimum and maximum price/night, and Airbnb may vary the price within this range.
  • Additional pricing options and properties, such as cleaning fees and other fees, different prices on peak dates (e.g., weekends and holidays) or taxes.
  • Payments or refunds, including cancellation penalties.
  • Customer support, including dispute mediation. A good clarifying question is whether we need to discuss how operations (ops) reviews listing requests.
  • Insurance.
  • Chat or other communication between any parties, such as host and guest.
  • Signup and login.
  • Compensation of hosts and guests for outages.
  • User reviews, such as a guest reviewing their stay or a host reviewing their guest’s behavior.

Non-functional requirements

  • Scalable to 1 billion rooms or 100 million daily bookings. Past booking data can be deleted. No programmatically generated user data.
  • Strong consistency for bookings, or more precisely listing availability, so there will be no double bookings or bookings on unavailable dates in general. Eventual consistency for other listing information such as description or photos may be acceptable.
  • High availability because there are monetary consequences of lost bookings.
  • High performance is unnecessary. P99 of a few seconds is acceptable.
  • Typical security and privacy requirements. Authentication required. User data is private.

Design decisions

Replication

A search can only be done on one city at a time. We can take advantage of this to allocate a data center host to a city with many listings, or to multiple cities that each have fewer listings. Because write performance is not critical, we can use single-leader replication. To minimize read latency, the secondary leader and the followers can be geographically spread out across data centers. We can use a metadata service that maps each city to its leader and follower host IP addresses. Our service uses this mapping to look up the geographically closest follower host when fetching the rooms of a particular city, and the leader host when writing to that city. This mapping is tiny and only modified infrequently by admins, so we can simply replicate it to all data centers, and admins can manually ensure data consistency when updating the mapping.
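A minimal sketch of the metadata lookup, assuming each replica is described by an IP address plus the latitude and longitude of its data center, and that "geographically closest" means the smallest great-circle distance. The names `Replica`, `CityPlacement`, and `MetadataService`, as well as all addresses and coordinates, are hypothetical. Writes always go to the city's leader; reads go to the nearest follower.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    ip: str
    lat: float  # latitude of the replica's data center (hypothetical field)
    lon: float  # longitude of the replica's data center


@dataclass(frozen=True)
class CityPlacement:
    leader: Replica
    followers: tuple[Replica, ...]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on Earth, in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


class MetadataService:
    """City -> replica mapping, small enough to copy to every data center."""

    def __init__(self, mapping: dict[str, CityPlacement]):
        self._mapping = mapping

    def write_host(self, city: str) -> str:
        # Single-leader replication: all writes for a city go to its leader.
        return self._mapping[city].leader.ip

    def read_host(self, city: str, user_lat: float, user_lon: float) -> str:
        placement = self._mapping[city]
        # Fall back to the leader if the city has no followers.
        candidates = placement.followers or (placement.leader,)
        nearest = min(candidates, key=lambda r: haversine_km(user_lat, user_lon, r.lat, r.lon))
        return nearest.ip


# Hypothetical example: Paris listings led from Paris, with followers in New York and London.
svc = MetadataService({
    "Paris": CityPlacement(
        leader=Replica("10.0.0.1", 48.86, 2.35),
        followers=(Replica("10.0.1.1", 40.71, -74.01), Replica("10.0.2.1", 51.51, -0.13)),
    )
})
print(svc.read_host("Paris", 52.52, 13.40))  # a user in Berlin; prints 10.0.2.1 (London)
```

Because the mapping is read on every request but changed rarely, keeping a full copy in each data center avoids a cross-region lookup on the read path.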

We can use a CDN to store the room photos and videos, as well as, as usual, other static content like JavaScript and CSS.

Handling overlapping bookings

If multiple users attempt to book the same room with overlapping dates, the first user’s booking should be granted, and our UI should inform the other users that this room is no longer available for the dates they selected and guide them toward finding another available room. This is a poor user experience, so we may want to briefly brainstorm a couple of alternative approaches. You may suggest other possibilities.

We can randomize the order of the search results to reduce such occurrences, though that may interfere with personalization (such as recommender systems).

Lock rooms during booking flow

When a user clicks on a search result to view the details of a room and possibly submit a booking request, we can lock those dates for the room for a few minutes. During this time, searches by other users with overlapping dates will not return this room in the result list. If the room is locked after other users have already received their search results, clicking on the room’s details should show them a notification of the lock, and possibly its remaining duration, so they can try again later in case the first user does not book the room.
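A minimal single-process sketch of this locking behavior, assuming a five-minute lock, ISO date strings, and half-open stays (a guest checking out on a date does not conflict with a guest checking in that day). `BookingFlowLocks` and its methods are hypothetical; in a real deployment the locks would live in shared storage behind the availability service rather than in process memory. A user's own lock never hides the room from that user.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

LOCK_SECONDS = 5 * 60  # assumed value for "a few minutes"


@dataclass
class DateLock:
    user_id: str
    check_in: str   # ISO dates compare correctly as strings
    check_out: str
    expires_at: float


def overlaps(a_in: str, a_out: str, b_in: str, b_out: str) -> bool:
    # Half-open ranges [check_in, check_out): the check-out day is free.
    return a_in < b_out and b_in < a_out


class BookingFlowLocks:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._locks: Dict[str, List[DateLock]] = {}

    def _blocking(self, room_id: str, user_id: str, check_in: str, check_out: str) -> List[DateLock]:
        """Unexpired locks held by *other* users on overlapping dates."""
        now = self._clock()
        live = [lk for lk in self._locks.get(room_id, []) if lk.expires_at > now]
        self._locks[room_id] = live  # drop expired locks
        return [lk for lk in live
                if lk.user_id != user_id and overlaps(lk.check_in, lk.check_out, check_in, check_out)]

    def try_lock(self, room_id: str, user_id: str, check_in: str, check_out: str) -> bool:
        """Called when a user opens a room's details page."""
        if self._blocking(room_id, user_id, check_in, check_out):
            return False
        self._locks[room_id].append(
            DateLock(user_id, check_in, check_out, self._clock() + LOCK_SECONDS))
        return True

    def visible_in_search(self, room_ids: List[str], user_id: str,
                          check_in: str, check_out: str) -> List[str]:
        """Hide rooms that another user has locked for overlapping dates."""
        return [r for r in room_ids if not self._blocking(r, user_id, check_in, check_out)]

    def remaining_seconds(self, room_id: str, user_id: str,
                          check_in: str, check_out: str) -> Optional[float]:
        """Time left on the blocking lock(s), for the notification; None if unlocked."""
        blocking = self._blocking(room_id, user_id, check_in, check_out)
        if not blocking:
            return None
        return max(lk.expires_at for lk in blocking) - self._clock()
```

Injecting the clock keeps the expiry logic testable without waiting five real minutes.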

High-level architecture

From the previous section’s requirements discussion, we draw our high-level architecture, shown in the figure below. Each service serves a group of related functional requirements, which allows us to develop and scale the services separately:

  • Booking service—For guests to make bookings. This service is our direct revenue source and has the most stringent non-functional requirements for availability and latency. Higher latency directly translates to lower revenue. Downtime on this service has the most serious effect on revenue and reputation. However, strong consistency may be less important, and we can trade off consistency for availability and latency.
  • Listing service—For hosts to create and manage listings. It is important but less critical than the booking and availability services. It is a separate service because it has different functional and non-functional requirements than the booking and availability services, so it should not share resources with them.
  • Availability service—The availability service keeps track of listing availability and is used by both the booking and listing services. The availability and latency requirements are as stringent as the booking service. Reads must be scalable, but writes are less frequent and may not require scalability.
  • Approval service—Certain operations like adding new listings or updating certain listing information may require ops approval prior to publishing. We can design an approval service for these use cases. We name the service the “approval service” rather than the more ambiguous-sounding “review service.”
  • Recommender service—Provides personalized listing recommendations to guests. We can see it as an internal ads service. A detailed discussion is out of scope in the interview, but we can include it in the diagram and discuss it briefly.
  • Regulations service—As discussed earlier, the listing service and booking service need to consider local regulations. The regulations service can provide an API to the listing service, so the latter can provide hosts with the appropriate UX for creating listings that comply with local regulations. The listing service and regulations service can be developed by separate teams, so each team can concentrate on gaining domain expertise relevant to its own service. Dealing with regulations may initially be outside the scope of an interview, but the interviewer may be interested in seeing how we handle it.

We can employ functional partitioning by geographical region, similar to the approach discussed with Craigslist. The listings of each city can be placed in a single data center. We deploy our application into multiple data centers and route each user to the data center that serves their city.

Create or update a listing

Creating a listing can be divided into two tasks. The first task is for the host to obtain their appropriate listing regulations. The second task is for the host to submit a listing request. In this chapter, we refer to both creating and updating listings as listing requests. The sequence is as follows:

  1. The host is on the client (a webpage or mobile app component), which provides a button to create a new listing. When the host clicks on the button, the app sends the listing service a request that contains the host’s location. (The host’s location can be obtained by asking the host to provide it manually or by asking the host to grant permission to access their location.)
  2. The listing service forwards the host’s location to the regulations service, which responds with the applicable regulations.
  3. The listing service returns the regulations to the client. The client may adjust the UX to accommodate the regulations. For example, if there is a rule that a booking must last at least 14 days, the client will immediately display an error to the host if they enter a minimum booking period of less than 14 days.
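A minimal sketch of the regulations lookup and the client-side check from step 3, assuming regulations are looked up by a location string and that the only rule is a minimum booking period. `Regulations`, `get_regulations`, `validate_min_booking_period`, and the `ExampleCity` entry are hypothetical; a real regulations service would expose many more rules through its API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Regulations:
    min_booking_days: Optional[int] = None  # None means no minimum applies


# Hypothetical data held by the regulations service, keyed by location.
REGULATIONS: Dict[str, Regulations] = {"ExampleCity": Regulations(min_booking_days=14)}


def get_regulations(location: str) -> Regulations:
    """Stand-in for the listing service forwarding the host's location to the regulations service."""
    return REGULATIONS.get(location, Regulations())


def validate_min_booking_period(regs: Regulations, host_min_days: int) -> List[str]:
    """Client-side check run as soon as the host enters a minimum booking period."""
    if regs.min_booking_days is not None and host_min_days < regs.min_booking_days:
        return [f"The minimum booking period must be at least {regs.min_booking_days} days."]
    return []
```

Validating on the client gives the host immediate feedback; the listing service should still re-check the same rule when the listing is submitted.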

The host enters their listing information and submits it. This is sent as a POST request to the listing service. The listing service does the following:

  1. Validates the request body.
  2. Writes to a SQL table for listings, which we can name the Listing table. New listings and certain updates need manual approval by ops staff. The Listing table can contain a Boolean column named “Approved” that indicates whether a listing has been approved by ops.
  3. If ops approval is required, sends a POST request to the approval service to notify ops to review the listing.
  4. Sends the client a 200 response.
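A minimal sketch of the listing service's POST handler following the four steps above, using an in-memory SQLite database as a stand-in for the Listing table. Assumptions: the request body is already parsed into a dict; the only fields are `host_id`, `city`, and `price` (per the functional requirements); every new listing needs ops approval; and the POST to the approval service is replaced by appending to a hypothetical `approval_queue` list. Returning a (status, body) tuple is also only for illustration.

```python
import sqlite3
from typing import Dict, List, Tuple

approval_queue: List[int] = []  # stands in for POST requests to the approval service


def create_listing_table(conn: sqlite3.Connection) -> None:
    conn.execute("""CREATE TABLE listing (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,
        host_id  TEXT NOT NULL,
        city     TEXT NOT NULL,
        price    REAL NOT NULL,
        approved INTEGER NOT NULL DEFAULT 0  -- Boolean: has ops approved it?
    )""")


def notify_approval_service(listing_id: int) -> None:
    approval_queue.append(listing_id)


def handle_create_listing(conn: sqlite3.Connection, body: Dict) -> Tuple[int, Dict]:
    # 1. Validate the request body.
    host_id, city, price = body.get("host_id"), body.get("city"), body.get("price")
    valid_price = isinstance(price, (int, float)) and not isinstance(price, bool) and price > 0
    if not host_id or not city or not valid_price:
        return 400, {"error": "host_id, city, and a positive price are required"}
    # 2. Write to the Listing table; new listings start unapproved.
    with conn:
        cur = conn.execute(
            "INSERT INTO listing (host_id, city, price, approved) VALUES (?, ?, ?, 0)",
            (host_id, city, float(price)))
    listing_id = cur.lastrowid
    # 3. New listings require ops approval, so notify the approval service.
    notify_approval_service(listing_id)
    # 4. Respond with 200.
    return 200, {"listing_id": listing_id}
```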

The form to create a listing may be divided into multiple parts, which a host may fill in and submit separately, each submission being a separate request. For example, photos may be added to a listing one at a time. A host may fill in a listing’s title, type, and description and submit them as one request, then fill in pricing details and submit them as another request, and so on.

Notifications can be implemented as a batch ETL job that makes requests to the listing service and then requests a shared notifications service to send notifications. The batch job can query for incomplete listings and then do the following:

  • Notify hosts to remind them that they have not completed the listing process.
  • Notify ops of incomplete listings, so ops staff can contact hosts to encourage and guide them to complete the listing process.
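A minimal sketch of this batch job, assuming an incomplete listing is a draft with a missing title, description, or price that has not been updated for a day. A direct SQLite query stands in for the request to the listing service, and the `send_notification` callback stands in for the shared notifications service; the table, columns, threshold, and message texts are all hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Callable, List, Tuple


def create_drafts_table(conn: sqlite3.Connection) -> None:
    conn.execute("""CREATE TABLE listing_draft (
        id INTEGER PRIMARY KEY, host_id TEXT NOT NULL,
        title TEXT, description TEXT, price REAL,
        updated_at TEXT NOT NULL)""")


def find_incomplete_listings(conn: sqlite3.Connection, now: datetime,
                             idle: timedelta = timedelta(days=1)) -> List[Tuple[int, str]]:
    # ISO timestamps with a fixed format compare correctly as strings.
    cutoff = (now - idle).isoformat(timespec="seconds")
    return conn.execute(
        """SELECT id, host_id FROM listing_draft
           WHERE (title IS NULL OR description IS NULL OR price IS NULL)
             AND updated_at < ?
           ORDER BY id""", (cutoff,)).fetchall()


def run_incomplete_listing_job(conn: sqlite3.Connection,
                               send_notification: Callable[[str, str], None],
                               now: datetime) -> None:
    for listing_id, host_id in find_incomplete_listings(conn, now):
        # Remind the host, and let ops know so they can reach out.
        send_notification(host_id, f"Your listing {listing_id} is not complete yet.")
        send_notification("ops", f"Listing {listing_id} from host {host_id} is incomplete.")
```

The idle threshold avoids nagging hosts who are still in the middle of filling in the form.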

Booking service

The steps of a simplified booking/reservation process are as follows:

  1. A guest submits a search query for listings that match the following criteria and receives a list of available listings. Each listing in the result list may contain a thumbnail and some brief information. As discussed in the requirements section, other details are out of scope.
    • City
    • Check-in date
    • Check-out date
  2. The guest may filter the results by price and other listing details.
  3. The guest clicks on a listing to view more details, including high-resolution photos and videos if any. From here, the guest may go back to the result list.
  4. The guest decides which listing to book. They submit a booking request and receive a confirmation or an error.
  5. If the guest receives a confirmation, they are then directed to make payment.
  6. A guest may change their mind and submit a cancellation request.

Similar to the listing service discussed earlier, we may choose to send notifications such as the following:

  • Notify guests and hosts after a booking is successfully completed or canceled.
  • If a guest filled in the details of a booking request but didn’t complete the booking request, remind them after some hours or days to complete the booking request.
  • Recommend listings to guests based on various factors like their past bookings, listings they have viewed, their other online activity, their demographic, etc. The listings can be selected by a recommender system.
  • Notifications regarding payments. Regarding payment, we may choose to escrow payments before the host accepts or request payment only after the host accepts. The notification logic will vary accordingly.

Let’s quickly discuss scalability requirements. As discussed earlier, we can functionally partition listings by city. We can assume that we have up to one million listings in a particular city. We can make a generous overestimate of up to 10 million daily requests for search, filtering, and listing details. Even assuming that these 10 million requests are concentrated in a single hour of the day, this works out to less than 3,000 queries per second, which can be handled by a single or small number of hosts. Nonetheless, the architecture discussed in this section will be capable of handling much larger traffic.
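The peak-hour estimate works out as follows:

```latex
\frac{10^{7}\ \text{requests}}{3600\ \text{s}} \approx 2{,}778\ \text{requests/s} < 3{,}000\ \text{QPS}
```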

All queries are processed by a backend service, which queries either the shared Elasticsearch or SQL services as appropriate.

High-level architecture of the booking service

Search and filter requests are processed by the Elasticsearch service. The Elasticsearch service can also handle pagination, saving memory and CPU by returning only a small number of results at a time. Elasticsearch supports fuzzy search, which is useful to guests who misspell locations and addresses. Photos and videos are downloaded from the CDN. A booking request is forwarded to the availability service. The SQL service used by the booking service can use a leader-follower architecture: the infrequent writes are made to the leader host, which replicates them to the follower hosts.
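A minimal sketch of a search request body the booking backend might send to Elasticsearch's `_search` endpoint. The `bool`, `match` with `fuzziness`, `range`, and `terms` queries and the `from`/`size` pagination parameters are standard Query DSL elements; the index fields `city`, `price`, and `booked_dates` (a list of booked or locked dates per listing, kept up to date when availability changes) are assumptions about how the index is modeled, and `build_listing_search` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import List, Optional


def nights_between(check_in: date, check_out: date) -> List[str]:
    """Every night of the stay, as ISO dates; the check-out date is excluded."""
    return [(check_in + timedelta(days=i)).isoformat() for i in range((check_out - check_in).days)]


def build_listing_search(city: str, check_in: date, check_out: date,
                         max_price: Optional[float] = None,
                         page: int = 0, page_size: int = 20) -> dict:
    filters = []
    if max_price is not None:
        filters.append({"range": {"price": {"lte": max_price}}})
    return {
        "query": {
            "bool": {
                # Fuzzy match tolerates misspelled city names.
                "must": [{"match": {"city": {"query": city, "fuzziness": "AUTO"}}}],
                "filter": filters,
                # Exclude listings with any night of the stay already taken.
                "must_not": [{"terms": {"booked_dates": nights_between(check_in, check_out)}}],
            }
        },
        # Pagination: return only one page of results at a time.
        "from": page * page_size,
        "size": page_size,
    }
```

Putting the price range in `filter` rather than `must` means it does not affect relevance scoring, which suits a hard constraint.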

The Elasticsearch index needs to be updated when a listing’s availability or details change. Adding or updating a listing requires write requests to both the SQL service and Elasticsearch service. This can be handled as a distributed transaction to prevent inconsistency should failures occur during writes to either service. A booking request requires writes to the SQL services in both the booking service and availability service (discussed in the next section) and should also be handled as a distributed transaction.
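One way to realize such a distributed transaction is two-phase commit (2PC). The sketch below uses in-memory participants standing in for the SQL and Elasticsearch writes; `Participant`, `InMemoryParticipant`, and `two_phase_commit` are hypothetical names. It omits the durable coordinator log, timeouts, and recovery that a production 2PC needs, and Elasticsearch itself has no prepare/commit step, so in practice that participant would be a wrapper around it.

```python
from typing import Dict, List, Protocol, Tuple


class Participant(Protocol):
    def prepare(self, txn_id: str, change: Dict[str, str]) -> bool: ...
    def commit(self, txn_id: str) -> None: ...
    def abort(self, txn_id: str) -> None: ...


class InMemoryParticipant:
    """Stages a change on prepare and applies it only on commit."""

    def __init__(self, name: str, fail_prepare: bool = False):
        self.name = name
        self.fail_prepare = fail_prepare  # simulate a participant that votes "no"
        self.staged: Dict[str, Dict[str, str]] = {}
        self.data: Dict[str, str] = {}

    def prepare(self, txn_id: str, change: Dict[str, str]) -> bool:
        if self.fail_prepare:
            return False
        self.staged[txn_id] = change
        return True

    def commit(self, txn_id: str) -> None:
        self.data.update(self.staged.pop(txn_id))

    def abort(self, txn_id: str) -> None:
        self.staged.pop(txn_id, None)


def two_phase_commit(txn_id: str, work: List[Tuple[Participant, Dict[str, str]]]) -> bool:
    # Phase 1: every participant must vote yes before anything is applied.
    prepared: List[Participant] = []
    for participant, change in work:
        if not participant.prepare(txn_id, change):
            for p in prepared:
                p.abort(txn_id)  # discard changes already staged elsewhere
            return False
        prepared.append(participant)
    # Phase 2: all voted yes, so commit everywhere.
    for participant in prepared:
        participant.commit(txn_id)
    return True


sql = InMemoryParticipant("sql")
search_index = InMemoryParticipant("elasticsearch", fail_prepare=True)
ok = two_phase_commit("txn-1", [(sql, {"listing-42": "updated"}), (search_index, {"listing-42": "updated"})])
print(ok, sql.data)  # False {}: nothing applied because one participant voted no
```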

If the booking makes the listing ineligible for further bookings, the booking service must update its own database to prevent further bookings and also update the Elasticsearch service so that this listing stops appearing in searches.

Sequence diagram of our simplified booking process. Making payment will involve a large number of requests to multiple services.

Availability service

The availability service needs to avoid situations like the following:

  • Double bookings.
  • A guest’s booking may not be visible to the host.
  • A host may mark certain dates as unavailable, but a guest may book those dates.

The availability service provides the following endpoints:

  • Given a location ID, listing type ID, check-in date, and check-out date, GET available listings.
  • Lock a listing from a particular check-in date to check-out date for a few (e.g., five) minutes.
  • CRUD a reservation, from a particular check-in date to check-out date.

The availability service consists of a backend service, which makes requests to a shared SQL service. The shared SQL service has a leader-follower architecture. The SQL service can contain an availability table, which can have the following columns. There is no primary key:

  • listing_id—The listing’s ID assigned by the listing service.
  • date—The availability date.
  • booking_id—The booking/reservation ID assigned by the booking service when a booking is made.
  • available—A string field that functions as an enum. It indicates if the listing is available, locked, or booked. We may save space by deleting the row if this (listing_id, date) combination is not locked or booked. However, we aim to achieve high occupancy, so this space saving will be insignificant. Another disadvantage is that our SQL service should provision sufficient storage for all possible rows, so if we save space by not inserting rows unless required, we may not realize that we have insufficient storage provisioned until we have a high occupancy rate across our listings.
  • timestamp—The time this row was inserted or updated.
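A minimal sketch of the availability table together with the search, lock, and booking endpoints, using SQLite (Python's built-in `sqlite3`) as a stand-in for the shared SQL service. Assumptions: one pre-inserted row per (listing_id, date), in line with provisioning rows for all dates; dates and timestamps stored as ISO-8601 strings; a five-minute lock; and no location or listing-type filtering, since this table has no such columns. Each lock or booking is one conditional UPDATE inside a transaction that is rolled back unless every night of the stay was updated, which is what prevents double bookings in this sketch. Lock ownership is not tracked because the table has no column for it; a real service would need one. All function names are hypothetical.

```python
import sqlite3
from datetime import date, datetime, timedelta

LOCK_TTL = timedelta(minutes=5)  # assumed lock duration

SCHEMA = """
CREATE TABLE availability (
    listing_id  TEXT NOT NULL,
    "date"      TEXT NOT NULL,  -- ISO date, e.g. 2024-07-01
    booking_id  TEXT,           -- set when the date is booked
    available   TEXT NOT NULL CHECK (available IN ('available', 'locked', 'booked')),
    "timestamp" TEXT NOT NULL   -- last insert/update time, ISO format
)"""  # no primary key, as described above

# A night is free if it is available or its lock has expired.
FREE = """(available = 'available' OR (available = 'locked' AND "timestamp" < ?))"""


class Unavailable(Exception):
    """Raised inside a transaction to roll it back."""


def _ts(t: datetime) -> str:
    return t.isoformat(timespec="seconds")


def seed(conn: sqlite3.Connection, listing_id: str, start: date, days: int, now: datetime) -> None:
    rows = [(listing_id, (start + timedelta(days=i)).isoformat(), _ts(now)) for i in range(days)]
    with conn:
        conn.executemany(
            """INSERT INTO availability (listing_id, "date", available, "timestamp")
               VALUES (?, ?, 'available', ?)""", rows)


def _update_all_nights(conn: sqlite3.Connection, sql: str, params: tuple, nights: int) -> bool:
    try:
        with conn:  # commits on success, rolls back on exception
            if conn.execute(sql, params).rowcount != nights:
                raise Unavailable  # some night was taken: undo the partial update
        return True
    except Unavailable:
        return False


def lock(conn: sqlite3.Connection, listing_id: str, check_in: date, check_out: date, now: datetime) -> bool:
    """Lock every night in [check_in, check_out) if all of them are free."""
    sql = f"""UPDATE availability SET available = 'locked', "timestamp" = ?
              WHERE listing_id = ? AND "date" >= ? AND "date" < ? AND {FREE}"""
    params = (_ts(now), listing_id, check_in.isoformat(), check_out.isoformat(), _ts(now - LOCK_TTL))
    return _update_all_nights(conn, sql, params, (check_out - check_in).days)


def book(conn: sqlite3.Connection, listing_id: str, check_in: date, check_out: date,
         booking_id: str, now: datetime) -> bool:
    """Turn an unexpired lock on every night of the stay into a booking."""
    sql = """UPDATE availability SET available = 'booked', booking_id = ?, "timestamp" = ?
             WHERE listing_id = ? AND "date" >= ? AND "date" < ?
               AND available = 'locked' AND "timestamp" >= ?"""
    params = (booking_id, _ts(now), listing_id, check_in.isoformat(), check_out.isoformat(),
              _ts(now - LOCK_TTL))
    return _update_all_nights(conn, sql, params, (check_out - check_in).days)


def get_available(conn: sqlite3.Connection, check_in: date, check_out: date, now: datetime) -> list:
    """Listings whose every night in [check_in, check_out) is free."""
    rows = conn.execute(
        f"""SELECT listing_id FROM availability
            WHERE "date" >= ? AND "date" < ? AND {FREE}
            GROUP BY listing_id HAVING COUNT(*) = ? ORDER BY listing_id""",
        (check_in.isoformat(), check_out.isoformat(), _ts(now - LOCK_TTL), (check_out - check_in).days),
    ).fetchall()
    return [r[0] for r in rows]
```

Counting updated rows against the number of nights is what turns a partial overlap into a clean failure: the whole UPDATE is rolled back rather than leaving some nights locked.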

We can display a six-minute timer on the client (web or mobile app) for a listing lock process. The timer on the client should have a slightly longer duration than the timer on the backend because the clocks on the client and the backend host cannot be perfectly synchronized.