Introduction

A number of cloud hosting providers offer optimized static content delivery services, such as Amazon AWS CloudFront and Microsoft Azure CDN (Content Delivery Network). In this article, we will explore the elements of a scalable infrastructure that can deliver static content under high peak loads, and build a test platform on which we can perform load testing and benchmark performance.

Scenario

Let’s assume that a fictitious company, Chimpcorp, wants to offload the serving of static files from its primary website and stream this data to partners and customers around the world over the Internet. They want a cost-effective yet scalable solution that allows large numbers of users to download the files at the same time. Furthermore, the data must be available at all times and resilient to failure and hacking attempts. Our task is to build a web infrastructure to handle this. We can summarize the goals as follows:

  • Serve static content
  • Cost effective
  • Low latency
  • Scalable
  • Fault tolerant
  • World-wide access

We have also developed a list of assumptions and verified them with Chimpcorp in order to narrow down our design:

  • The content consists of files approximately 5 KB in size
  • Content is static and will not change frequently
  • Servers must be able to host more than 10,000 simultaneous connections
  • All users will connect to a public Internet website (download.chimpcorp.com)
  • All users must access the content via HTTP
  • All users access identical content
  • There is no requirement to track user sessions
  • The servers must be secured to prevent tampering with data

Deconstructing the problem

Our first step is to break down the entire conversation between the end user and the web server serving up the content. The conversation can be described as follows:

  1. User launches a browser from their device
  2. User types a URL into their browser
  3. The browser checks its cache; if requested object is in cache and is fresh, skip to #13
  4. Browser asks OS for server’s IP address corresponding to the DNS name in the URL
  5. OS checks whether the name is already resolved in its local DNS cache or hosts file
  6. OS makes a DNS lookup and provides the corresponding IP address to the browser
  7. Browser opens a TCP connection to the server (a socket to port 80)
  8. HTTP traffic traverses the Internet to the server
  9. Browser sends an HTTP GET request through the TCP connection
  10. Server looks up the requested resource (if it exists) and responds using the HTTP protocol
  11. Browser receives the HTTP response
  12. After sending the response, the server closes the socket
  13. Browser checks whether the response status is a success (2xx, e.g. 200 OK), a redirect (3xx), an authorization request (401), or an error (4xx and 5xx), and handles it accordingly
  14. If cacheable, response is stored in cache
  15. Browser decodes response (e.g. if it’s gzipped)
  16. Browser determines what to do with the response (e.g. is it an HTML page, is it an image, is it a sound clip?)
  17. Browser renders response, or offers a download dialog for unrecognized types
  18. User views the data, and so on
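
Steps 7 through 12 of this conversation can be observed directly from a terminal. A minimal illustration using curl is shown below; the file path is a placeholder rather than an actual Chimpcorp resource:

    # -v prints the request and response headers as they cross the wire
    curl -v http://download.chimpcorp.com/files/sample.html
    # > GET /files/sample.html HTTP/1.1     <- the GET request (step 9)
    # > Host: download.chimpcorp.com
    # < HTTP/1.1 200 OK                     <- the server response (step 10)
    # < Content-Type: text/html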

Evaluating components and areas for optimization

Our next step is to analyze the individual elements in the conversation for opportunities for optimization and scaling.

  1. End users’ browser/bandwidth/ISP – Besides the fact that users must access our servers via HTTP over the Internet, we have no control over the type and version of browser, the quality and bandwidth of the ISP service, or the device from which the user accesses the content.
  2. DNS Lookup – It takes approximately 20-120 milliseconds for DNS to resolve an IP address. For users connecting from around the world, we can use either Geo-aware redirection or Anycast DNS to resolve IP addresses to a web host close to the user (see the sketch after this list).
  3. Server Location – As users will be accessing the host from locations around the world, the servers should be co-located close to where the users are in order to reduce round-trip times. We can use Geo-aware DNS to direct users to servers located in their geographical region.
  4. TCP Session Parameters – As we are serving small static content over our website, we can analyze the specific parameters of the TCP session in order to identify potential areas for optimization. Examples of TCP parameters are listed below:
    1. TCP Port Range
    2. TCP Keepalive Time
    3. TCP Recycle and Reuse times
    4. TCP Frame Header and Buffer sizes
    5. TCP Window Scaling
  5. HTTP Header Expiration/Caching – We can set an Expires header far in the future to reduce repeat page loads for static content that does not change. We can also use the Cache-Control headers specified in the HTTP/1.1 standard to tune caching (see the sketch after this list).
  6. HTTP Server Selection – With the wide variety of HTTP servers available in the market, our optimal selection should take into account the stated project objectives. We should be looking for a web server that can efficiently serve static content to large numbers of users, scale out, and offer some degree of customization.
  7. Server Resource Allocation – Upon our selection of the appropriate Web server, we can select the appropriate hardware setup, bearing in mind specific performance bottlenecks for our chosen HTTP server, such as Disk I/O, Memory allocation and Web caching.
  8. Optimizing Content – We can optimize how content is presented to users. For example, compressed files take less time to download from the server to the end user, and image files can be optimized and scaled appropriately.
  9. Content Offloading – JavaScript, images, CSS and static files can be offloaded to Content Delivery Networks. For this scenario, we will rely on our own web servers to host this data.
  10. Dynamic Scaling – Depending on the load characteristics of our servers, we need a way to rapidly scale capacity either horizontally (adding nodes) or vertically (adding resources).
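
Two of the items above, DNS lookup time (#2) and expiry headers (#5), can be spot-checked with standard command-line tools. A quick sketch, with download.chimpcorp.com standing in for the real host and an illustrative file path:

    # measure DNS resolution time (dig reports it as "Query time")
    dig download.chimpcorp.com | grep "Query time"

    # inspect the caching headers returned for a static object
    curl -sI http://download.chimpcorp.com/files/sample.html | grep -iE "expires|cache-control"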

Design Parameters

Our next stage is to compile the analysis into tangible design parameters that will shape our final design. The design components and related attributes are listed as follows:

  • Server Hosting Platform: A global cloud services provider is a cost-efficient way to deploy identical server farms around the world.
  • DNS Hosting: A highly available DNS forwarding solution that incorporates Anycast DNS is preferable to a Geo-aware resolution service.
  • HTTP Server Selection: An Nginx web server configured on an optimized Linux platform will provide a highly scalable and resource-efficient platform for our task. The primary advantage of Nginx over more popular web server technologies such as Apache is that Nginx does not spawn a new thread for each incoming connection; instead, existing worker processes accept new requests from a shared listen socket. Nginx is also widely supported and has an active user community and support base.

Nginx Optimization

The following parameters were used to optimize the Nginx server deployment:

  1. CPU and Memory Utilization – Nginx is already very efficient in how it utilizes CPU and memory. However, we can tweak several parameters based on the type of workload that we plan to serve. As we are primarily serving static files, we expect our workload profile to be less CPU-intensive and more disk-oriented.
    1. worker_processes – We can configure the number of single-threaded worker processes to be 1.5-2x the number of CPU cores to take advantage of disk bandwidth (IOPS).
    2. worker_connections – We can define how many connections each worker can handle. We can start with a value of 1024 and tune based on test results for optimal performance. The ulimit -n command gives us the open file limit that bounds the usable value of worker_connections. Both settings are sketched at the end of this subsection.
    3. SSL Processing – SSL processing in Nginx is fairly processor-hungry; if your site serves pages via SSL, you need to re-evaluate the worker_processes/CPU ratio. You can also turn off Diffie-Hellman key exchange and move to a quicker cipher if you are not subject to PCI standards. (Example: ssl_ciphers RC4:HIGH:!aNULL:!MD5:!kEDH;)
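      A minimal sketch of the two worker settings above in nginx.conf; the values are illustrative starting points for a two-core host, not tested recommendations:
        worker_processes 4;             # ~2x CPU cores for a disk-bound workload
        events {
            worker_connections 1024;    # per worker; bounded by the ulimit -n value
        }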
  2. Disk Performance – To minimize I/O bottlenecks in the disk subsystem, we can tune Nginx to minimize disk writes and ensure that it does not resort to on-disk temporary files due to memory limitations.
    1. Buffer Sizes – Buffer sizes define how much request data Nginx can hold in memory. A buffer size that is too low forces Nginx to spool responses to temporary files on disk, which introduces additional latency due to disk read/write response times.
      1. client_body_buffer_size: This directive specifies the client request body buffer size, used to handle POST data. If the request body is larger than the buffer, the entire body or part of it is written to a temporary file.
      2. client_header_buffer_size: This directive sets the buffer size for the request header from the client. For the overwhelming majority of requests a buffer size of 1k is completely sufficient.
      3. client_max_body_size: This directive sets the maximum accepted body size of a client request, as indicated by the Content-Length line in the request header. If the size is greater than the given value, the client receives the error “Request Entity Too Large” (413).
      4. large_client_header_buffers: This directive sets the maximum number and size of buffers used to read large client request headers. The request line cannot be bigger than a single buffer, or the client receives the error “Request URI Too Large” (414). The longest request header line must also fit in a single buffer, or the client receives the error “Bad Request” (400). These parameters should be configured as follows:
        client_body_buffer_size 8k;
        client_header_buffer_size 1k;
        client_max_body_size 2m;
        large_client_header_buffers 2 1k;
    2. Access/Error Logging – Access logs record every request for a file and can quickly consume valuable disk I/O. The error log level should not be set too low unless it is our intention to capture every single HTTP error; the warn level is sufficient for most production environments. We can also configure logs to buffer writes, defining buffer sizes as appropriate (e.g. 8 KB, 32 KB, 128 KB):
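      A sketch of the logging configuration just described (the log paths and buffer size are illustrative):
        access_log /var/log/nginx/access.log combined buffer=32k;
        error_log /var/log/nginx/error.log warn;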
    3. Open File Cache – The open_file_cache directive caches open file descriptors, along with information about each file’s location and size:
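      A sketch of the open file cache directives (the cache size and timings are illustrative, not tuned values):
        open_file_cache max=10000 inactive=30s;   # cache up to 10,000 descriptors
        open_file_cache_valid 60s;                # revalidate cached entries every 60s
        open_file_cache_min_uses 2;               # only cache files requested at least twice
        open_file_cache_errors on;                # cache lookup errors as well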
    4. OS File Caching – We can define parameters around the size of the cache used by the underlying server OS to cache frequently accessed disk blocks. Caching the web server content in memory will reduce or even eliminate disk I/O.
  3. Network I/O and latency – There are several parameters that we can tune to optimize how efficiently the server manages a given amount of network bandwidth under peak loads.
    1. Timeouts – Timeouts determine how long the server maintains a connection and should be configured optimally to conserve resources on the server.
      1. client_body_timeout: This directive sets the read timeout for the request body from the client. The timeout applies only if the body is not obtained in one read step. If the client sends nothing within this time, nginx returns the error “Request Time-out” (408).
      2. client_header_timeout: This directive sets the timeout for reading the request header from the client. The timeout applies only if a header is not obtained in one read step. If the client sends nothing within this time, nginx returns the error “Request Time-out” (408).
      3. keepalive_timeout: The first parameter sets the timeout for keep-alive connections with the client; the server will close connections after this time. The optional second parameter sets the time value in the Keep-Alive: timeout=time response header. This header can convince some browsers to close the connection themselves, so that the server does not have to (without this parameter, nginx does not send a Keep-Alive header, though this is not what makes a connection “keep-alive”). The author of Nginx claims that 10,000 idle connections use only about 2.5 MB of memory.
      4. send_timeout: This directive sets the response timeout to the client. The timeout applies not to the entire transfer of the response, but only to the interval between two successive write operations; if the client accepts nothing within this time, nginx shuts down the connection.

        These parameters should be configured as follows:

        client_body_timeout 10;
        client_header_timeout 10;
        keepalive_timeout 15;
        send_timeout 10;

    2. Data compression – We can use gzip to compress our static data, reducing the size of the TCP payloads that must traverse the network to reach the client. Serving pre-compressed files via the Nginx gzip static module (ngx_http_gzip_static_module) also avoids compressing responses on every request, reducing CPU load when serving large files: gzip on; gzip_static on;
    3. TCP Session parameters – The kernel’s tcp_* parameters can be tuned to complement Nginx (see the sysctl sketch after this list).
      1. TCP Maximum Segment Lifetime (MSL) – The MSL defines how long the server should wait for stray packets after closing a connection; this value defaults to 60 seconds on a Linux server.
    4. Increase System Limits – Specific parameters such as the open file limit and the range of ephemeral ports available to serve connections can be increased, as shown in the sketch below.
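
As referenced in items 3.3 and 3.4 above, these kernel and process-level limits are set outside of Nginx on a Linux host. A sketch of the corresponding tuning; the values are illustrative and should be validated under load before production use:

    # /etc/sysctl.conf (fragment) -- apply with `sysctl -p`
    net.ipv4.tcp_fin_timeout = 30               # shorten the wait on closing connections
    net.ipv4.tcp_tw_reuse = 1                   # reuse TIME-WAIT sockets for new connections
    net.ipv4.ip_local_port_range = 1024 65535   # widen the ephemeral port range
    fs.file-max = 200000                        # raise the system-wide open file limit

    # raise the per-process open file limit for the nginx workers
    # (alternatively, set worker_rlimit_nofile in nginx.conf)
    ulimit -n 65535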

Solution Design A: Amazon AWS

The design parameters in the section above were used to build our scalable web solution. The components of our solution are as follows:

  • Amazon EC2 AMIs: Elastic Compute Cloud (EC2) will be used to host our server farms. Nginx offers a fully supported AMI in EC2 that we can tweak to further optimize performance to suit our needs. This AMI is readily deployable from the AWS Marketplace and includes support from Nginx Software Inc. We will deploy Nginx on a High-CPU Medium instance featuring the following build specs:
    • 1.7 GB RAM
    • 5 EC2 Compute Units (2 Virtual Cores)
    • 350 GB instance storage
    • 64-bit architecture
    • Moderate I/O performance
  • Elastic IPs: Amazon provides an Elastic IP Service that allows us to associate a static Public IP Address to our virtual host.
  • Amazon Route 53: This scalable DNS service allows us to implement an Anycast DNS solution for resolving hosts to an environment that is closest to them.
  • Additional Options: A number of automation and deployment tools were utilized to enhance the efficiency of the environment:
    • EC2 Command Line tools
    • Automated deployment and lifecycle management via Chef
    • Development testing via Vagrant
    • Centralized code repository and version control via GitHub
  • Solution Variants: The following design variants were introduced in order to benchmark our original build selection against alternative deployment scenarios. The scenarios are as follows:
    • Nginx Systems AMI
    • CentOS optimized Nginx with PHP-FPM
    • Apache Web Server on Ubuntu

Measuring our solution’s effectiveness

Here we define a number of simple measures that we can use to benchmark the effectiveness of our solution:

  • Cost per user – defined as the cost of the solution divided by the number of users within a given period; this measures the cost-effectiveness of the solution
  • Server Connectivity Metrics – These metrics are relevant to Web Server performance
    • Number of Requests per Second
    • Number of Connections per Second
    • Average/Min/Max Response rates
    • Response Times (ms)
  • System Performance Metrics – These metrics relate specifically to host system performance
    • CPU/Memory Load
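
One simple way to capture the system performance metrics during a test run is with standard Linux tools; the sampling intervals below are illustrative:

    # sample CPU and memory once per second for two minutes during the load test
    vmstat 1 120 > vmstat.log &

    # record the nginx worker processes' resource usage over the same window
    top -b -d 1 -n 120 | grep nginx > nginx_top.log &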

Testing our design

We provisioned an M2.Medium tier EC2 instance in the same availability zone to test host-level performance without the complication of network latency between the test host and the server. We ran tests at increasing levels of concurrency and requests per second, using httperf to test the performance of our solution against our test scenario.
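
An illustrative httperf invocation of the kind used for these runs is shown below; the host, URI and load figures are placeholders rather than the exact test parameters:

    # open 100,000 connections at a rate of 5,000 connections/second,
    # issuing one GET request per connection and timing out slow replies at 5 s
    httperf --server download.chimpcorp.com --port 80 --uri /files/sample.html \
            --rate 5000 --num-conns 100000 --num-calls 1 --timeout 5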

Test Results:

Baseline (Nginx AMI): Httperf tests returned 1.1 ms reply times for linearly increasing loads up to 30,000 connections/second over large sample sizes. The host started to display increasing standard deviations as loads approached 30,000 simultaneous connections, indicating potential saturation. Memory usage on the host remained stable at around 17 MB, even during peak loads.

Optimized CentOS (Nginx AMI): Httperf tests returned similar response times and response rates to the baseline host, up to 30,000 connections/second (>1,000 concurrent connections); however, the results showed higher consistency over large samples and a lower standard deviation.

Apache (Ubuntu Host): Httperf tests returned 100+ ms response times for linearly increasing loads up to 10,000 connections/second, quickly saturating at 6,000 connections/second. Each httpd instance occupied approximately 100 MB of memory and quickly consumed the majority of system resources on the 1.7 GB RAM virtual host.

Fig: Baseline performance

Fig: Nginx optimized performance

Conclusions:

Overall performance of the Nginx platform for delivering static content (HTML files) was far superior to Apache. Nginx performance out of the box is respectable, and with further customization of settings it can deliver highly optimized and scalable results.

Recommendations:

In order to further increase the performance of the solution, we propose the following recommendations:

  • Increase sample sizes – Due to time constraints, the number of repetitions run for load testing was low. For real-life production environments, we recommend running httperf in a wrapper like ab.c over a larger number of repetitions (>1,000) at varying load factors in order to build a respectable sample. At that point, trends become easier to identify.
  • Implement in-memory caching – not currently supported natively in Nginx
  • Implement Elastic Load Balancing – ELB has been load tested to over 20,000 simultaneous connections
  • Migrate Static Content to CloudFront – While Nginx can provide superior performance, it is most commonly deployed as a reverse proxy to offload static content from dynamic code such as PHP. Amazon’s CloudFront is optimized to provide superior, scalable web content delivery that can synchronize across multiple locations.
  • Cost Management – If costs are a concern, we can move to the open-source Nginx build instead of the provisioned AMI instance and save on support and licensing costs.
