
Benchmarking Cloud Acceleration

Whenever we compare one thing to another, we start by defining what we wish to measure: our metric. We then establish a baseline value for that metric (at one time, literally a mark on a workbench, hence "benchmark"), and that baseline becomes the foundation for our comparisons. When it comes to computers, and more specifically servers, we often try to replicate the results manufacturers tout via simple Proof of Concept (PoC) testing.

When testing servers, specifically those that handle Web applications, three metrics, each measured per second, are the most prevalent: connections (CPS), transactions (TPS), and requests (RPS). CPS is how many connections the server can set up and sustain per second; think of it as how many shoppers a store like Target can let through the door and hold in the store at one time. Suppose I told you that a particular server platform could sustain 20 million CPS; is that significant? The next metric, TPS, is typically associated with Secure Sockets Layer (SSL) performance and is often written as SSL TPS. You can think of this as how many check-out lines are open at a given moment in a given Target store. Finally, we have RPS, which is usually qualified with a request or response payload size ranging from 0KB to 100KB. Think of the payload size as how big the store's shopping cart is, and RPS as how many of those carts can flow out of the store per unit time. If I told you that a web server could handle 2.2 million 1KB shopping carts per second, that would be compelling. For web applications, CPS is interesting but tells only part of the story; RPS qualified with a 128 byte or 1KB payload offers a more complete picture.
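To make the RPS metric concrete, here is a minimal sketch of how requests per second against a fixed payload might be measured. It is not the harness used for the results below, and the host address, payload path, thread count, and duration are all hypothetical:

    # Minimal RPS measurement sketch. The host, path, thread count, and
    # duration are hypothetical; this is not the harness used for the
    # results in this article.
    import http.client
    import threading
    import time

    HOST = "192.168.1.10"      # hypothetical server under test
    PATH = "/payload_1k.bin"   # hypothetical 1KB response payload
    THREADS = 8
    DURATION = 10.0            # seconds

    counts = [0] * THREADS

    def worker(idx):
        # Reusing a persistent connection measures RPS; a CPS test would
        # instead open a fresh connection for every request.
        conn = http.client.HTTPConnection(HOST)
        deadline = time.monotonic() + DURATION
        while time.monotonic() < deadline:
            conn.request("GET", PATH)
            conn.getresponse().read()  # drain the body so the connection is reusable
            counts[idx] += 1
        conn.close()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(THREADS)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.monotonic() - start

    print(f"RPS (1KB payload): {sum(counts) / elapsed:,.0f}")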

An essential qualifier for any of these metrics is whether the value is a peak or an average; worse still is when neither is mentioned. Marketers sometimes lean on performance data that highlights their core message, and they'll often select a single value to make their point. Our objective for all the testing outlined below is to give readers realistic expectations that can then become achievable results when they finally document their PoC for management review. The last thing anyone wants is for a customer to set up a PoC and discover that the manufacturer's claimed benefits were overly ambitious.
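To see why the distinction matters, consider the same run summarized both ways; these per-second samples are made up for illustration:

    # Peak vs. average from the same (made-up) per-second RPS samples.
    samples = [1.9e6, 2.2e6, 2.1e6, 1.4e6, 2.0e6]

    average = sum(samples) / len(samples)
    peak = max(samples)

    print(f"average: {average:,.0f} RPS")  # 1,920,000: what a PoC will likely reproduce
    print(f"peak:    {peak:,.0f} RPS")     # 2,200,000: what a datasheet might quote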

When benchmarking Solarflare Cloud Acceleration technology, we want to look at the average performance data for both CPS and RPS. The applications tested ranged from Software Load Balancers to Web Servers, Application Servers, and In-Memory Databases. Below are two tables, the first representing 25GbE testing and the second 100GbE testing. Note that all of the results shown in both tables are averages across the performance range tested for each application.

25GbE – Average Performance Gain
Program             CPS    RPS 128B    RPS 1K    RPS 10K
Redis Open Source          81%
Nginx Plus Web             143%        43%
Netty.io                   70%
Memcached                  280%

You might notice that the Software Load Balancers, HAProxy and Nginx Proxy, are not present in the table above; they were not tested at 25GbE. Our understanding from customers is that these applications will predominantly be deployed on 100GbE going forward. In the table below, the application platform Netty.io is absent simply because the servers used for this testing did not have 100GbE cards installed at the time.

100GbE – Average Performance Gain
Program             CPS    RPS 128B    RPS 1K    RPS 10K
Redis Open Source          39%
Nginx Plus Web             121%        65%
Nginx Proxy                282%        178%
HAProxy                    370%        216%
Memcached                  280%

We explored Nginx Plus at both 25GbE and 100GbE, and the two graphs below show that 10KB payloads are capable of saturating the link at both speeds (a quick arithmetic check follows the next list). Some interesting observations were made at 25GbE:

  • The kernel requires eight Nginx workers to saturate the link, while only four workers are necessary when using Cloud Onload™. That means more CPU cycles can be spent in your applications rather than on serving up web content.
  • With the kernel, performance at 0KB and 1KB payloads scales nearly linearly.
  • Cloud Onload delivers significant performance gains for both 0KB and 1KB payloads, with average improvements of 143% and 43% respectively.
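As a sanity check on the saturation claim, the arithmetic is simple. The sketch below ignores protocol overhead (Ethernet/IP/TCP/HTTP headers), so real links saturate at somewhat lower request rates:

    # How many 10KB responses per second does it take to fill each link?
    PAYLOAD_BITS = 10 * 1024 * 8  # 10KB payload per response, in bits

    for name, bits_per_second in [("25GbE", 25e9), ("100GbE", 100e9)]:
        rps_at_line_rate = bits_per_second / PAYLOAD_BITS
        print(f"{name}: ~{rps_at_line_rate:,.0f} RPS fills the link with 10KB payloads")

At roughly 305,000 RPS for 25GbE and 1.22 million RPS for 100GbE, 10KB transfers hit line rate long before the request rates seen with smaller payloads.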

As we look at 100GbE performance below for Nginx Plus configured as a web server we can see the following:

  • Nginx configured with eight workers running Cloud Onload and testing with 10KB packets can saturate the link, only twice the number of workers required to do the same at 25GbE. It's possible that the load-generation system reached its limit, and we'll be revisiting this soon.
  • Cloud Onload still tops out at nearly 4.5 million RPS, so this testing may be constrained by the application itself, or by the number of requests we're able to generate against it; the quick bandwidth check below suggests the link is not what's holding it back. Regardless, we're still demonstrating an average performance gain of 121% for 0KB packets and 65% for the more realistic 1KB packets.
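Here is a hedged back-of-the-envelope check on that 4.5 million RPS ceiling, assuming roughly 128 bytes on the wire per small response (the byte count is an assumption; actual framing overhead varies):

    # Check that a ~4.5 million RPS ceiling is not a bandwidth limit on a
    # 100GbE link. The 128 bytes per response is an assumption.
    rps_ceiling = 4.5e6
    bytes_per_response = 128

    gbps_used = rps_ceiling * bytes_per_response * 8 / 1e9
    print(f"~{gbps_used:.1f} Gb/s of 100 Gb/s")  # about 4.6 Gb/s, nowhere near line rate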

In general, looking across the four application classes tested, we can confidently say that Cloud Onload delivers an average performance boost of 100% on RPS benchmarks moving 128 bytes to 1KB of data. These benefit statements are the result of benchmarking designed to focus on the value of optimizing networking through Cloud Onload kernel bypass. Real-world use cases are not the same as benchmarks, and the role networking plays in them may vary, so your measurable benefits may differ. Everything tested is publicly available, and in the case of Cloud Onload, it can be purchased from Solarflare.

If you would like to review our latest Cloud Onload benchmarking documents, please visit www.solarflare.com/cloud-onload