Hypervisors virtual network performance comparison from a Virtualized load-balancer point of view

Introduction

At Exceliance, we edit and sell a Load-Balancer appliance called ALOHA (stands for Application Layer Optimisation and High-Availability).
A few month ago, we managed to make it run on the most common hypervisors available:

  • VMWare (ESX, vsphere)
  • Citrix XenServer
  • HyperV
  • Xen OpenSource
  • KVM

< ADVERTISEMENT>So whatever your hypervisor is, you can runan Aloha on top of it :) </ADVERTISEMENT>

Since a Load-Balancer appliance is Network IO intensive, we thought it was a good opportunity to bench each Hypervisor from a virtual network performance point of view.

Well, more and more companies use Virtualization in their infrastructures, so we guessed that a lot of people would be interested by the results of this bench, that’s why we decided to publish them on our blog.

Things to bear in mind about virtualization

One of the interesting feature of Virtualization is to be able to consolidate several servers onto a single hardware.
As a consequence, the resources (CPU, memory, disk and network IOs) are shared between several virtual machines.
An other issue to take into account is that the Hypervisor is like a new “layer” between the hardware and the OS inside the VM, which means that it may have an impact on the performance.

Purpose of benchmarking Hypervisors

First of all: WE ARE TOTALLY NEUTRAL AND HAVE NO INTEREST SAYING GOOD OR BAD THINGS ABOUT ANY HYPERVISORS.

Our main goal here is to check if each Hypervisor performs well enough to allow us to sell our Virtual Appliance on top of it.
From the tests we’ll run, we want to be able to measure the impact of the Hypervisor on the Virtual Machine performance

Benchmark platform and procedure

To run these tests, we use the same server for all Hypervisors, just swapping the hard-drive, to run each hypervisor independently.

Hypervisor Hardware summarized below:

  • CPU quad core i7 @3.4GHz
  • 16G of memory
  • Network card 1G copper e1000e

NOTE: we benched some other network cards and we got UGLY results. (Cf conclusions)
NOTE: there is a single VM running on the hypervisor: The Aloha.

The Aloha Virtual Appliance used is the Aloha VA 4.2.5 with 1G of memory and 2 vCPUs.
The client and WWW servers are physical machines plugged on the same LAN than the Hypervisor.
The client tool is inject and the web server behind the Aloha VA is httpterm.
So basically, the only thing that will change during these tests is the Hypervisor.

The Aloha is configured in reverse-proxy mode (using HAProxy) between the client and the server, load-balancing and analyzing HTTP requests.
We focused mainly on virtual networking performance: number of HTTP connections per seconds and associated bandwidth.
We ran the benchmark with different object size: 0, 1K, 2K, 4K, 8K, 16K, 32K, 48K, 64K.
NOTE: by “HTTP connection”, we mean a single HTTP request with its response over a single TCP connection, like in HTTP/1.0.

Basically, the 0K object test is used to get the number of connections per second the VA can do and the 64K object is used to measure the maximum bandwidth.

NOTE: the maximum bandwith will be 1G anyway, since we’re limitated by the physical NIC.

We are going to bench Network IO only, since this is the intensive usage a load-balancer does.
We won’t bench disks IOs…

Tested Hypervisors


We benched a native Aloha against Aloha VA embedded in the Hypervisors listed below:

  • HyperV
  • RHEV (KVM based)
  • vshpere 5.0
  • Xen 4.1 on Ubuntu 11.10
  • XenServer 6.0

Benchmark results


Raw server performance (native tests, without any hypervisor)

For the first test, we run the Aloha on the server itself without any Hypervisor.
That way, we’ll have some figures on the capacity of the server itself. We’ll use those numbers later in the article to compare the impact of each Hypervisor on performance.

native_performance

Microsoft HyperV


We tested HyperV on a Windows 2008 r2 server.
For this hypervisor 2 network cards are available:

  1. Legacy network adapteur: which emulates the network layer through the tulip driver.
    ==> With this driver, we got around 1.5K requests per seconds, which is really poor…
  2. Network adapteur: requires the hv_netvsc driver supplied by Microsoft in open source since Linux Kernel 2.6.32.
    ==> this is the driver we used for the tests

hyperv_performance

RHEV 3.0 Beta (KVM based)

RHEV is Red Hat Hypervisor, based on KVM.
For the Virtualization of the Network Layer, RHEV uses the virtio drivers.
Note that RHEV was still in Beta version when running this test.

VMWare Vsphere 5

There are 3 types of network cards available for Vsphere 5.0
1. Intel e1000: e1000 driver, emulates network layer into the VM.
2. VMxNET 2: allows network layer virtualization
3. VMxNET 3: allows network layer virtualization
The best results were obtained with the vmxnet2 driver.

Note: We have not tested Vsphere 4 neither ESX 3.5.

vsphere_performance

Xen OpenSource 4.1 on Ubuntu 11.10

Since CentOS 6.0 does not provide Xen OpenSource in its official repositories, we decided to use the latest (Oneiric Ocelot) Ubuntu server distribution, with Xen 4.1 on top of it.
Xen provides two network interfaces:

  1. emulated one, based on 8139too driver
  2. virtualized network layer, xen-vnif

Of course, the results are much better with xen-vnif, so we’re going to use this driver for the test.

xen41_performance

Citrix Xenserver 6.0


The network driver used for XenServer is the same one than the Xen OpenSource: xen-vnif.

xenserver60_performance

Hypervisors comparison


HTTP connections per second


The graph below summarizes the http connections per second capacity for each Hypervisor.
It shows us the Hypervisor overhead by comparing the light blue line which represents the server capacity without any Hypervisor to each hypervisor line..

http_connections_comparison

Bandwith usage


The graph below summarizes the http connections per second capacity for each Hypervisor.
It shows us the Hypervisor overhead by comparing the light blue line which represents the server capacity without any Hypervisor to each hypervisor line..

bandwith_comparison

Performance loss


Well, comparing Hypervisors to each others is nice, but remember, we wanted to know how much performance was lost in the hypervisor layer.
The graph below shows, in percentage, the loss generated by each hypervisor when compared to the native Aloha.
The highest the percentage, the worste for the hypervisor…

performance_loss_comparison

Conclusion

  • the Hypervisor layer has a non-negligible impact on networking performance on a Virtualized Load-Balancer running in reverse-proxy mode.
    But I guess it would be the same for any VM which is Networking IO intensive
  • The shortest the connections, the biggest the impact is.
    For very long connection like TSE, IMAP, etc… virtualization might make sense
  • Vsphere seems in advanced compared to its competitors on a performance point of view.
  • HyperV and Citrix XenServer have interesting performance.
  • RHEV (KVM) and Xen open source can still improve performance.
    Unless this is related to our procedure.
  • Even if the hardware layer is not accessed by the VM anymore, it still has a huge impact on performance.
    IE, on vsphere, we could not go higher than 20K connections per second with a Realtek NIC in the server…
    With Intel e1000e driver, we got up to 55K connections per second….
    So, even when you use virtualization, hardware counts!

Links

Posted in benchmark, performance | Tagged , , , | 13 Comments

Enhanced SSL load-balancing with Server Name Indication (SNI) TLS extension

Synopsis

Some time ago, we wrote an article which explained how to load-balance SSL services, maintaining affinity using the SSLID.
The main limitation of this kind of architecture is that you must dedicate a public IP address and port per service.
If you’re hosting web or mail services, you could run out of public IP address quickly.

TLS protocol has been extended in 2003, RFC 3546, by an extension called SNI: Server Name Indication, which allows a client to announce in clear the server name it is contacting.

NOTE: two RFC have obsoleted the one above, the latest one is RFC 6066

The Aloha Load-balancer can use this information to choose a backend or a server.
This allows people to share a single VIP for several services.

Of course, we can use SNI switching with SSLID affinity to build a smart and reliable SSL load-balanced platform.

NOTE: Server Name information is sent with each SSL Handshake, whether you’re establishing a new session or you’re resuming an old one.

SNI is independent from the protocol used at layer 7. So basically, it will work with IMAP, HTTP, SMTP, POP, etc…

Limitation

Bear in mind, that in 2012, not all clients are compatible with SNI.
Concerning web browsers, a few of used in 2012 them are still not compatible with this TLS protocol extension.

We strongly recommend you to read the Wikipedia Server Name Indication page which lists all the limitation of this extension.

  • Only HAProxy nightly snapshots from 8th of April are compatible (with no bug knows) with it.
  • Concerning Aloha, it will be available by Aloha Load-balancer firmware 5.0.2.

Diagram

The picture below shows a platform with a single VIP which host services for 2 applications:
sni_loadbalancing

We can use SNI information to choose a backend, then, inside a backend, we can use SSLID affinity.

Configuration

Choose a backend using SNI TLS extension


The configuration below matches names provided by the SNI extention and choose a farm based on it.
In the farm, it provides SSLID affinity.
If no SNI extention is sent, then we redirect the user to a server farm which can be used to tell the user to upgrade its software.

# Adjust the timeout to your needs
defaults
  timeout client 30s
  timeout server 30s
  timeout connect 5s

# Single VIP with sni content switching
frontend ft_ssl_vip
  bind 10.0.0.10:443
  mode tcp
  
  acl application_1 req_ssl_sni -i application1.domain.com
  acl application_2 req_ssl_sni -i application2.domain.com

  use_backend bk_ssl_application_1 if application_1
  use_backend bk_ssl_application_2 if application_2

  default_backend bk_ssl_default

# Application 1 farm description
backend bk_ssl_application_1
  mode tcp
  balance roundrobin

  # maximum SSL session ID length is 32 bytes.
  stick-table type binary len 32 size 30k expire 30m

  acl clienthello req_ssl_hello_type 1
  acl serverhello rep_ssl_hello_type 2

  # use tcp content accepts to detects ssl client and server hello.
  tcp-request inspect-delay 5s
  tcp-request content accept if clienthello

  # no timeout on response inspect delay by default.
  tcp-response content accept if serverhello

  stick on payload_lv(43,1) if clienthello

  # Learn on response if server hello.
  stick store-response payload_lv(43,1) if serverhello

  option ssl-hello-chk
  server server1 192.168.1.1:443 check
  server server2 192.168.1.2:443 check

# Application 2 farm description
backend bk_ssl_application_2
  mode tcp
  balance roundrobin

  # maximum SSL session ID length is 32 bytes.
  stick-table type binary len 32 size 30k expire 30m

  acl clienthello req_ssl_hello_type 1
  acl serverhello rep_ssl_hello_type 2

  # use tcp content accepts to detects ssl client and server hello.
  tcp-request inspect-delay 5s
  tcp-request content accept if clienthello

  # no timeout on response inspect delay by default.
  tcp-response content accept if serverhello

  stick on payload_lv(43,1) if clienthello

  # Learn on response if server hello.
  stick store-response payload_lv(43,1) if serverhello

  option ssl-hello-chk
  server server1 192.168.2.1:443 check
  server server2 192.168.2.2:443 check

# Sorry backend which should invite the user to update its client
backend bk_ssl_default
  mode tcp
  balance roundrobin
  
  # maximum SSL session ID length is 32 bytes.
  stick-table type binary len 32 size 30k expire 30m

  acl clienthello req_ssl_hello_type 1
  acl serverhello rep_ssl_hello_type 2

  # use tcp content accepts to detects ssl client and server hello.
  tcp-request inspect-delay 5s
  tcp-request content accept if clienthello

  # no timeout on response inspect delay by default.
  tcp-response content accept if serverhello

  stick on payload_lv(43,1) if clienthello

  # Learn on response if server hello.
  stick store-response payload_lv(43,1) if serverhello

  option ssl-hello-chk
  server server1 10.0.0.11:443 check
  server server2 10.0.0.12:443 check

Choose a server using SNI: aka SSL routing


The configuration below matches names provided by the SNI extention and choose a server based on it.
If no SNI is provided or we can’t find the expected name, then the traffic is forwarded to server3 which can be used to tell the user to upgrade its software.

# Adjust the timeout to your needs
defaults
  timeout client 30s
  timeout server 30s
  timeout connect 5s

# Single VIP 
frontend ft_ssl_vip
  bind 10.0.0.10:443
  mode tcp
  default_backend bk_ssl_default

# Using SNI to take routing decision
backend bk_ssl_default
  mode tcp

  acl application_1 req_ssl_sni -i application1.domain.com
  acl application_2 req_ssl_sni -i application2.domain.com

  use-server server1 if application_1
  use-server server2 if application_2
  use-server server3 if !application_1 !application_2

  option ssl-hello-chk
  server server1 10.0.0.11:443 check
  server server2 10.0.0.12:443 check
  server server3 10.0.0.13:443 check

Related Links

Links

Posted in architecture, HAProxy, optimization, ssl | Tagged , , , , , | 1 Comment

load balancing, affinity, persistence, sticky sessions: what you need to know

Synopsis

To ensure high availability and performance of Web applications, it is now common to use a load-balancer.
While some people uses layer 4 load-balancers, it can be sometime recommended to use layer 7 load-balancers to be more efficient with HTTP protocol.

NOTE: To understand better the difference between such load-balancers, please read the Load-Balancing FAQ.

A load-balancer in an infrastructure

The picture below shows how we usually install a load-balancer in an infrastructure:
reverse_proxy
This is a logical diagram. When working at layer 7 (aka Application layer), the load-balancer acts as a reverse proxy.
So, from a physical point of view, it can be plugged anywhere in the architecture:

  • in a DMZ
  • in the server LAN
  • as front of the servers, acting as the default gateway
  • far away in an other separated datacenter

Why does load-balancing web application is a problem????


Well, HTTP is not a connected protocol: it means that the session is totally independent from the TCP connections.
Even worst, an HTTP session can be spread over a few TCP connections…
When there is no load-balancer involved, there won’t be any issues at all, since the single application server will be aware the session information of all users, and whatever the number of client connections, they are all redirected to the unique server.
When using several application servers, then the problem occurs: what happens when a user is sending requests to a server which is not aware of its session?
The user will get back to the login page since the application server can’t access his session: he is considered as a new user.

To avoid this kind of problem, there are several ways:

  • Use a clustered web application server where the session are available for all the servers
  • Sharing user’s session information in a database or a file system on application servers
  • Use IP level information to maintain affinity between a user and a server
  • Use application layer information to maintain persistance between a user and a server

NOTE: you can mix different technc listed above.

Building a web application cluster

Only a few products on the market allow administrators to create a cluster (like Weblogic, tomcat, jboss, etc…).
I’ve never configured any of them, but from Administrators I talk too, it does not seem to be an easy task.
By the way, for Web applications, clustering does not mean scaling. Later, I’ll write an article explaining while even if you’re clustering, you still may need a load-balancer in front of your cluster to build a robust and scalable application.

Sharing user’s session in a database or a file system

This Technic applies to application servers which has no clustering features, or if you don’t want to enable cluster feature from.
It is pretty simple, you choose a way to share your session, usually a file system like NFS or CIFS, or a Database like MySql or SQLServer or a memcached then you configure each application server with proper parameters to share the sessions and to access them if required.
I’m not going to give any details on how to do it here, just google with proper keywords and you’ll get answers very quickly.

IP source affinity to server

An easy way to maintain affinity between a user and a server is to use user’s IP address: this is called Source IP affinity.
There are a lot of issues doing that and I’m not going to detail them right now (TODO++: an other article to write).
The only thing you have to know is that source IP affinity is the latest method to use when you want to “stick” a user to a server.
Well, it’s true that it will solve our issue as long as the user use a single IP address or he never change his IP address during the session.

Application layer persistence

Since a web application server has to identify each users individually, to avoid serving content from a user to an other one, we may use this information, or at least try to reproduce the same behavior in the load-balancer to maintain persistence between a user and a server.
The information we’ll use is the Session Cookie, either set by the load-balancer itself or using one set up by the application server.

What is the difference between Persistence and Affinity

Affinity: this is when we use an information from a layer below the application layer to maintain a client request to a single server
Persistence: this is when we use Application layer information to stick a client to a single server
sticky session: a sticky session is a session maintained by persistence

The main advantage of the persistence over affinity is that it’s much more accurate, but sometimes, Persistence is not doable, so we must rely on affinity.

Using persistence, we mean that we’re 100% sure that a user will get redirected to a single server.
Using affinity, we mean that the user may be redirected to the same server…

What is the interraction with load-balancing???


In load-balancer you can choose between several algorithms to pick up a server from a web farm to forward your client requests to.
Some algorithm are deterministic, which means they can use a client side information to choose the server and always send the owner of this information to the same server. This is where you usually do Affinity ;) IE: “balance source”
Some algorithm are not deterministic, which means they choose the server based on internal information, whatever the client sent. This is where you don’t do any affinity nor persistence :) IE: “balance roundrobin” or “balance leastconn”
I don’t want to go too deep in details here, this can be the purpose of a new article about load-balancing algorithms…

You may be wondering: “we have not yet speak about persistence in this chapter”. That’s right, let’s do it.
As we saw previously, persistence means that the server can be chosen based on application layer information.
This means that persistence is an other way to choose a server from a farm, as load-balancing algorithm does.

Actually, session persistence has precedence over load-balancing algorithm.
Let’s show this on a diagram:

      client request
            |
            V
    HAProxy Frontend
            |
            V
     backend choice
            |
            V
    HAproxy Backend
            |
            V
Does the request contain
persistence information ---------
            |                   |
            | NO                |
            V                   |
    Server choice by            |  YES
load-balancing algorithm        |
            |                   |
            V                   |
   Forwarding request <----------
      to the server

Which means that when doing session persistence in a load balancer, the following happens:

  • the first user’s request comes in without session persistence information
  • the request bypass the session persistence server choice since it has no session persistence information
  • the request pass through the load-balancing algorithm, where a server is chosen and affected to the client
  • the server answers back, setting its own session information
  • depending on its configuration, the load-balancer can either use this session information or setup its own before sending the response back to the client
  • the client sends a second request, now with session information he learnt during the first request
  • the load-balancer choose the server based on the client side information
  • the request DOES NOT PASS THROUGH the load-balancing algorithm
  • the server answers the request

and so on…

At Exceliance we say that “Persistence is a exception to load-balancing“.
And the demonstration is just above.

Affinity configuration in HAProxy / Aloha load-balancer

The configuration below shows how to do affinity within HAProxy, based on client IP information:

frontend ft_web
  bind 0.0.0.0:80
  default_backend bk_web

backend bk_web
  balance source
  hash-type consistent # optional
  server s1 192.168.10.11:80 check
  server s2 192.168.10.21:80 check

Web application persistence

In order to provide persistence at application layer, we usually use Cookies.
As explained previously, there are two ways to provide persistence using cookies:

  • Let the load-balancer set up a cookie for the session.
  • Using application cookies, such as ASP.NET_SessionId, JSESSIONID, PHPSESSIONID, or any other chosen name

Session cookie setup by the Load-Balancer

The configuration below shows how to configure HAProxy / Aloha load balancer to inject a cookie in the client browser:

frontend ft_web
  bind 0.0.0.0:80
  default_backend bk_web

backend bk_web
  balance roundrobin
  cookie SERVERID insert indirect nocache
  server s1 192.168.10.11:80 check cookie s1
  server s2 192.168.10.21:80 check cookie s2

Two things to notice:

  1. the line “cookie SERVERID insert indirect nocache”:
    This line tells HAProxy to setup a cookie called SERVERID only if the user did not come with such cookie. It is going to append a “Cache-Control: nocache” as well, since this type of traffic is supposed to be personnal and we don’t want any shared cache on the internet to cache it
  2. the statement “cookie XXX” on the server line definition:
    It provides the value of the cookie inserted by HAProxy. When the client comes back, then HAProxy knows directly which server to choose for this client.

So what happens?

  1. At the first response, HAProxy will send the client the following header, if the server chosen by the load-balancing algorithm is s1:
    Set-Cookie: SERVERID=s1
  2. For the second request, the client will send a request containing the header below:
    Cookie: SERVERID=s1

Basically, this kind of configuration is compatible with active/active Aloha load-balancer cluster configuration.

Using application session cookie for persistence

The configuration below shows how to configure HAProxy / Aloha load balancer to use the cookie setup by the application server to maintain affinity between a server and a client:

frontend ft_web
  bind 0.0.0.0:80
  default_backend bk_web

backend bk_web
  balance roundrobin
  cookie JSESSIONID prefix indirect nocache
  server s1 192.168.10.11:80 check cookie s1
  server s2 192.168.10.21:80 check cookie s2

Just replace JSESSIONID by your application cookie. It can be anything, like the default ones from PHP and IIS: PHPSESSID and ASP.NET_SessionId.

So what happens?

  1. At the first response, the server will send the client the following header
    Set-Cookie: JSESSIONID=i12KJF23JKJ1EKJ21213KJ
  2. when passing through HAProxy, the cookie is modified like this:
    Set-Cookie: JSESSIONID=s1~i12KJF23JKJ1EKJ21213KJ

    Note that the Set-Cookie header has been prefixed by the server cookie value (“s1″ in this case) and a “~” is used as a separator between this information and the cookie value.
  3. For the second request, the client will send a request containing the header below:
    Cookie: JSESSIONID=s1~i12KJF23JKJ1EKJ21213KJ
  4. HAProxy will clean it up on the fly to set it up back like the origin:
    Cookie: JSESSIONID=i12KJF23JKJ1EKJ21213KJ

Basically, this kind of configuration is compatible with active/active Aloha load-balancer cluster configuration.

What happens when my server goes down???

When doing persistence, if a server goes down, then HAProxy will redispatch the user to an other server.
Since the user will get connected on a new server, then this one may not be aware of the session, so be redirected to the login page.
But this is not a load-balancer problem, this is related to the application server farm.

Links

Posted in HAProxy, layer7, optimization | Tagged , , , | 1 Comment

Use a load-balancer as a first row of defense against DDOS

We’ve seen recently more and more DOS and DDOS attacks. Some of them were very big, requiring thousands of computers…
But in most cases, this kind of attacks are made by a few computers aiming to make a service or website unavailable, either by sending it too many requests or by taking all its available resources, preventing regular users to use the service.
Some attacks targets known vulnerabilities of widely used applications.

In the present article, we’ll explain how to take advantage of an application delivery controller to protect your website and application against DOS, DDOS and vulnerability scans.

Why using a LB for such protection since a firewall and a Web Application Firewall (aka WAF) could already do the job?
Well, the Firewall is not aware of the application layer but would be useful to pretect against SYN flood attacks. That’s why we saw recently application layer firewalls: Web Application Firewalls, also known as WAF.
Well, since the load balancer is in front of the platform, it can be a good partner for the WAF, filtering out 99% of the attacks, which are managed by script kiddies. The WAF can then happily clean up the remaining attacks.
Well, maybe you don’t need a WAF and you want to take advantage of your Aloha and save some money ;) .

Note that you need an application layer load-balancer, like Aloha or OpenSource HAProxy to be efficient.

TCP syn flood attacks


The syn flood attacks consist in sending as many TCP syn packets as possible to a single server trying to saturate it or at least, saturating its uplink bandwith.

If you’re using the Aloha load-balancer, you’re already protected against this kind of attacks: the Aloha includes mechanism to protect you.
The TCP syn flood attack mitigation capacity may vary depending on your Aloha box.

It you’re running your own LB based on HAProxy or HAPee, you should have a look at the sysctl below (edit /etc/sysctl.conf or play with sysctl command):

# Protection SYN flood
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.tcp_max_syn_backlog = 1024 

Note: If the attack is very big and saturates your internet bandwith, the only solution is to ask your internet access provider to null route the attackers IPs on its core network.

Slowloris like attacks


For this kind of attack, the clients will send very slowly their requests to a server: header by header, or even worst character by character, waiting long time between each of them.
The server have to wait until the end of the request to process it and send back its response.
The purpose of this attack is to prevent regular users to use the service, since the attacker would be using all the available resources with very slow queries.

In order to protect your website against this kind of attack, just setup the HAProxy option “timeout http-request”.
You can set it up to 5s, which is long enough.
It tells HAProxy to let five seconds to a client to send its whole HTTP request, otherwise HAProxy would shut the connection with an error.

For example:

# On Aloha, the global section is already setup for you
# and the haproxy stats socket is available at /var/run/haproxy.stats
global
  stats socket ./haproxy.stats level admin

defaults
  option http-server-close
  mode http
  timeout http-request 5s
  timeout connect 5s
  timeout server 10s
  timeout client 30s

listen stats
  bind 0.0.0.0:8880
  stats enable
  stats hide-version
  stats uri     /
  stats realm   HAProxy\ Statistics
  stats auth    admin:admin

frontend ft_web
  bind 0.0.0.0:8080

  # Spalreadylit static and dynamic traffic since these requests have different impacts on the servers
  use_backend bk_web_static if { path_end .jpg .png .gif .css .js }

  default_backend bk_web

# Dynamic part of the application
backend bk_web
  balance roundrobin
  cookie MYSRV insert indirect nocache
  server srv1 192.168.1.2:80 check cookie srv1 maxconn 100
  server srv2 192.168.1.3:80 check cookie srv2 maxconn 100

# Static objects
backend bk_web_static
  balance roundrobin
  server srv1 192.168.1.2:80 check maxconn 1000
  server srv2 192.168.1.3:80 check maxconn 1000

To test this configuration, simply open a telnet to the frontend port and wait for 5 seconds:

telnet 127.0.0.1 8080
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
HTTP/1.0 408 Request Time-out
Cache-Control: no-cache
Connection: close
Content-Type: text/html

<h1>408 Request Time-out</h1>
Your browser didn't send a complete request in time.

Connection closed by foreign host.

Unfair users, AKA abusers


By unfair users, I mean users (or scripts) which have an abnormal behavior on your website:

  • too many connections opened
  • new connection rate too high
  • http request rate too high
  • bandwith usage too high
  • client not respecting RFCs (IE for SMTP)

How does a regular browser works?


Before trying to protect your website from weird behavior, we have to define what a “normal” behavior is!
This paragraphs gives the main lines of how a browser works and there may be some differences between browsers.

So, when one wants to browse a website, we use a browser: Chrome, Firefox, Internet Explorer, Opera are the most famous ones.
After typing the website name in the URL bar, the browser will look like for the IP address of your website.
Then it will establish a tcp connection to the server, downloading the main page, analyze its content and follow its links from the HTML code to get the objects required to build the page: javascript, css, images, etc…
To get the objects, it may open up to 6 or 7 TCP connections per domain name.
Once it has finished to download the objects, it starts aggregating everything then print out the page.

Limiting the number of connections per users


As seen before, a browser opens up 5 to 7 TCP connections to a website when it wants to download objetcs and they are opened quite quickly.
One can consider that somebody having more than 10 connections opened is not a regular user.
The configuration below shows how to do this limitation in the Aloha and HAProxy:

This configuration also applies to any kind of TCP based application.

The most important lines are from 25 to 32.

# On Aloha, the global section is already setup for you
# and the haproxy stats socket is available at /var/run/haproxy.stats
global
  stats socket ./haproxy.stats level admin

defaults
  option http-server-close
  mode http
  timeout http-request 5s
  timeout connect 5s
  timeout server 10s
  timeout client 30s

listen stats
  bind 0.0.0.0:8880
  stats enable
  stats hide-version
  stats uri     /
  stats realm   HAProxy\ Statistics
  stats auth    admin:admin

frontend ft_web
  bind 0.0.0.0:8080

  # Table definition  
  stick-table type ip size 100k expire 30s store conn_cur

  # Allow clean known IPs to bypass the filter
  tcp-request connection accept if { src -f /etc/haproxy/whitelist.lst }
  # Shut the new connection as long as the client has already 10 opened 
  tcp-request connection reject if { src_conn_cur ge 10 }
  tcp-request connection track-sc1 src

  # Split static and dynamic traffic since these requests have different impacts on the servers
  use_backend bk_web_static if { path_end .jpg .png .gif .css .js }

  default_backend bk_web

# Dynamic part of the application
backend bk_web
  balance roundrobin
  cookie MYSRV insert indirect nocache
  server srv1 192.168.1.2:80 check cookie srv1 maxconn 100
  server srv2 192.168.1.3:80 check cookie srv2 maxconn 100

# Static objects
backend bk_web_static
  balance roundrobin
  server srv1 192.168.1.2:80 check maxconn 1000
  server srv2 192.168.1.3:80 check maxconn 1000

  • NOTE: if several domain name points to your frontend, then you may want to increase the conn_cur limit. (Remember a browser opens its 5 to 7 TCP connections per domain name).
  • NOTE2: if several users are hidden behind the same IP (NAT or proxy), this configuration may have a negative impact for them. You can whitelist these IPs.

Testing the configuration

run an apache bench to open 10 connections and doing request on these connections:

ab -n 50000000 -c 10 http://127.0.0.1:8080/

Watch the table content on the haproxy stats socket:

echo "show table ft_web" | socat unix:./haproxy.stats -
# table: ft_web, type: ip, size:102400, used:1
0x7afa34: key=127.0.0.1 use=10 exp=29994 conn_cur=10

Let’s try to open an eleventh connection using telnet:

telnet 127.0.0.1 8080
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Connection closed by foreign host.

Basically, opened connections can keep on working, while a new one can’t be established.

Limiting the connection rate per user


In the previous chapter, we’ve seen how to protect ourselves from somebody who wants to open more than X connections at the same time.
Well, this is good, but something which may kill performance would to allow somebody to open and close a lot of tcp connections over a short period of time.
As we’ve seen previously, a browser will open up to 7 TCP connections in a very short period of time (a few seconds). One can consider that somebody having more than 20 connections opened over a period of 3 seconds is not a regular user.
The configuration below shows how to do this limitation in the Aloha and HAProxy:

This configuration also applies to any kind of TCP based application.

The most important lines are from 25 to 32.

# On Aloha, the global section is already setup for you
# and the haproxy stats socket is available at /var/run/haproxy.stats
global
  stats socket ./haproxy.stats level admin

defaults
  option http-server-close
  mode http
  timeout http-request 5s
  timeout connect 5s
  timeout server 10s
  timeout client 30s

listen stats
  bind 0.0.0.0:8880
  stats enable
  stats hide-version
  stats uri     /
  stats realm   HAProxy\ Statistics
  stats auth    admin:admin

frontend ft_web
  bind 0.0.0.0:8080

  # Table definition  
  stick-table type ip size 100k expire 30s store conn_rate(3s)

  # Allow clean known IPs to bypass the filter
  tcp-request connection accept if { src -f /etc/haproxy/whitelist.lst }
  # Shut the new connection as long as the client has already 10 opened 
  tcp-request connection reject if { src_conn_rate ge 10 }
  tcp-request connection track-sc1 src

  # Split static and dynamic traffic since these requests have different impacts on the servers
  use_backend bk_web_static if { path_end .jpg .png .gif .css .js }

  default_backend bk_web

# Dynamic part of the application
backend bk_web
  balance roundrobin
  cookie MYSRV insert indirect nocache
  server srv1 192.168.1.2:80 check cookie srv1 maxconn 100
  server srv2 192.168.1.3:80 check cookie srv2 maxconn 100

# Static objects
backend bk_web_static
  balance roundrobin
  server srv1 192.168.1.2:80 check maxconn 1000
  server srv2 192.168.1.3:80 check maxconn 1000

  • NOTE2: if several users are hidden behind the same IP (NAT or proxy), this configuration may have a negative impact for them. You can whitelist these IPs.

Testing the configuration


run 10 requests with ApacheBench, everything may be fine:

ab -n 10 -c 1 -r http://127.0.0.1:8080/

Using socat we can watch this traffic in the stick-table:

# table: ft_web, type: ip, size:102400, used:1
0x11faa3c: key=127.0.0.1 use=0 exp=28395 conn_rate(3000)=10

Running a telnet to run a eleventh request and the connections get closed:

telnet 127.0.0.1 8080
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Connection closed by foreign host.

Limiting the HTTP request rate


Even if in the previous examples, we were using HTTP as the protocol, we based our protection on layer 4 information: number or opening rate of TCP connections.
An attacker could respect the number of connection we would set by emulating the behavior of a regular browser.
Now, let’s go deeper and see what we can do on HTTP protocol.

The configuration below tracks HTTP request rate per user on the backend side, blocking abusers on the frontend side if the backend detects abuse.

# On Aloha, the global section is already setup for you
# and the haproxy stats socket is available at /var/run/haproxy.stats
global
  stats socket ./haproxy.stats level admin

defaults
  option http-server-close
  mode http
  timeout http-request 5s
  timeout connect 5s
  timeout server 10s
  timeout client 30s

listen stats
  bind 0.0.0.0:8880
  stats enable
  stats hide-version
  stats uri     /
  stats realm   HAProxy\ Statistics
  stats auth    admin:admin

frontend ft_web
  bind 0.0.0.0:8080

  # Use General Purpose Couter (gpc) 0 in SC1 as a global abuse counter
  # Monitors the number of request sent by an IP over a period of 10 seconds
  stick-table type ip size 1m expire 10s store gpc0,http_req_rate(10s)
  tcp-request connection track-sc1 src
  tcp-request connection reject if { src_get_gpc0 gt 0 }

  # Split static and dynamic traffic since these requests have different impacts on the servers
  use_backend bk_web_static if { path_end .jpg .png .gif .css .js }

  default_backend bk_web

# Dynamic part of the application
backend bk_web
  balance roundrobin
  cookie MYSRV insert indirect nocache

  # If the source IP sent 10 or more http request over the defined period, 
  # flag the IP as abuser on the frontend
  acl abuse src_http_req_rate(ft_web) ge 10
  acl flag_abuser src_inc_gpc0(ft_web)
  tcp-request content reject if abuse flag_abuser

  server srv1 192.168.1.2:80 check cookie srv1 maxconn 100
  server srv2 192.168.1.3:80 check cookie srv2 maxconn 100

# Static objects
backend bk_web_static
  balance roundrobin
  server srv1 192.168.1.2:80 check maxconn 1000
  server srv2 192.168.1.3:80 check maxconn 1000

  • NOTE: if several users are hidden behind the same IP (NAT or proxy), this configuration may have a negative impact for them. You can whitelist these IPs.

Testing the configuration

run 10 requests with ApacheBench, everything may be fine:

ab -n 10 -c 1 -r http://127.0.0.1:8080/

Using socat we can watch this traffic in the stick-table:

# table: ft_web, type: ip, size:1048576, used:1
0xbebbb0: key=127.0.0.1 use=0 exp=8169 gpc0=1 http_req_rate(10000)=10

Running a telnet to run a eleventh request and the connections get closed:

telnet 127.0.0.1 8080
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Connection closed by foreign host.

Detecting vulnerability scans

Vulnerability scans could generate different kind of errors which can be tracked by Aloha and HAProxy:

  • invalid and truncated requests
  • denied or tarpitted requests
  • failed authentications
  • 4xx error pages

HAProxy is able to monitor an error rate per user then can take decision based on it.

# On Aloha, the global section is already setup for you
# and the haproxy stats socket is available at /var/run/haproxy.stats
global
  stats socket ./haproxy.stats level admin

defaults
  option http-server-close
  mode http
  timeout http-request 5s
  timeout connect 5s
  timeout server 10s
  timeout client 30s

listen stats
  bind 0.0.0.0:8880
  stats enable
  stats hide-version
  stats uri     /
  stats realm   HAProxy\ Statistics
  stats auth    admin:admin

frontend ft_web
  bind 0.0.0.0:8080

  # Use General Purpose Couter 0 in SC1 as a global abuse counter
  # Monitors the number of errors generated by an IP over a period of 10 seconds
  stick-table type ip size 1m expire 10s store gpc0,http_err_rate(10s)
  tcp-request connection track-sc1 src
  tcp-request connection reject if { src_get_gpc0 gt 0 }

  # Split static and dynamic traffic since these requests have different impacts on the servers
  use_backend bk_web_static if { path_end .jpg .png .gif .css .js }

  default_backend bk_web

# Dynamic part of the application
backend bk_web
  balance roundrobin
  cookie MYSRV insert indirect nocache

  # If the source IP generated 10 or more http request over the defined period, 
  # flag the IP as abuser on the frontend
  acl abuse src_http_err_rate(ft_web) ge 10
  acl flag_abuser src_inc_gpc0(ft_web)
  tcp-request content reject if abuse flag_abuser

  server srv1 192.168.1.2:80 check cookie srv1 maxconn 100
  server srv2 192.168.1.3:80 check cookie srv2 maxconn 100

# Static objects
backend bk_web_static
  balance roundrobin
  server srv1 192.168.1.2:80 check maxconn 1000
  server srv2 192.168.1.3:80 check maxconn 1000

Testing the configuration

run an apache bench, pointing it on a purposely wrong URL:

ab -n 10 http://127.0.0.1:8080/dlskfjlkdsjlkfdsj

Watch the table content on the haproxy stats socket:

echo "show table ft_web" | socat unix:./haproxy.stats -
# table: ft_web, type: ip, size:1048576, used:1
0x8a9770: key=127.0.0.1 use=0 exp=5866 gpc0=1 http_err_rate(10000)=11

Let’s try to run the same ab command and let’s get the error:

apr_socket_recv: Connection reset by peer (104)

which means that HAProxy has blocked the IP address

Notes

  • We could combine configuration example above together to improve protection. This will be described later in an other article
  • The numbers provided in the examples may be different for your application and architecture. Bench your configuration properly before applying in production.

Related articles

Links

Posted in Aloha, HAProxy, security | Tagged , , | 9 Comments

Web traffic limitation

Synopsis

For different reason, we may want to limit the number of connections or the number of requests we allow to a web farm.
In example:

  • give more capacity to authenticated users compared to anonymous one
  • limit web farm users per virtualhost
  • protect your website from spiders
  • etc…

Basically, we’ll manage two webfarm, one with as much as capacity as we need, and an other one where we’ll redirect people we want to slow down.
The routing decision can be taken using a header, a cookie, a part of the url, source IP address, etc…

Configuration

The configuration below would do the job.

There are only two webservers in the farm, but we want to slow down some virtual host or old and almost never used applications in order to protect and let more capacity to the regular traffic.

you can play with the inspect-delay time to be more or less aggressive.

frontend www
  bind :80
  mode http
  acl spiderbots hdr_cnt(User-Agent) eq 0
  acl personnal hdr(Host) www.personnalwebsite.tld www.oldname.tld
  acl oldies path_beg /old /foo /bar
  use_backend limitated_www if spiderbots or personnal or oldies
  default_backend www

backend www
 mode http
 server be1  192.168.0.1:80 check maxconn 100
 server be1  192.168.0.2:80 check maxconn 100

backend limitated_www
 mode http
 acl too_fast be_sess_rate gt 10
 acl too_many be_conn gt 10
 tcp-request inspect-delay 3s
 tcp-request content accept if ! too_fast or ! too_many
 tcp-request content accept if WAIT_END
 server be1  192.168.0.1:80 check maxconn 100
 server be1  192.168.0.2:80 check maxconn 100

Results

Without the example above, an apache bench would be able to go up to 3600 req/s on the regular farm and only 9 req/s on the limitated one.

Related articles

Links

Posted in Aloha, HAProxy, optimization | Tagged , , , , | 2 Comments

Fight spam with early talking detection

Synopsis

A good way to improve efficiency against spammers is to use early talking detection:

  1. you own a SMTP relay platform and you want to improve its efficiency on spam fighting
  2. your current MTA has no early talking detection feature and you want to be able to add one.
  3. you want to offload the early talking detection feature from your SMTP server to an other device in your architecture.

What is early talking detection?

The picture below shows the SMTP “hello phase”, in 4 steps:
smtp_helo_phase

  • Step 1: the client gets connected to the SMTP server
  • Step 2: the server acknowledge the SMTP connection by a “220 service ready” message
  • Step 3: the client sends his identity (basicaly his domain name)
  • Step 4: the server welcomes the client and the client is now allowed to send a mail

Usually, spammers have no time to waste, so they do the TCP connection then send directly the HELO packet to the server.
The Aloha can hold the connection on the client side and monitors it for a few seconds.
Then you have 2 options:

  1. If the client “speaks” first, it means it’s a spammer, so the connection can be closed safely. No need to bother the server with it.
  2. If the client waits for the SMTP banner during the observation period, it means it looks to be a regular user. So we may accept its connection and let the client and server speaks together.

This way of listening on the connection to detect if client talks first is called early talking detection.

Diagram

In order to use the Aloha in front of a public SMTP relay platform, it’s recommanded to configure it in reverse-proxy transparent mode, also known as Destination NAT.
That way, the SMTP servers will be aware of the client IP address.
In that mode, the default gateway of the server must be the Aloha, or you must configure Policy Based Routing to redirect traffic from the SMTP server source port 25 to the Aloha.
smtp_diagram

Configuration

In the Aloha, use the Layer7 (HAProxy) load-balancing tab and apply the configuration below:

frontend ft_smtp
  mode tcp
  bind :25
  source 0.0.0.0 usesrc clientip
  log global
  option tcplog
# reject SMTP connection if client speaks first before 30s
  tcp-request inspect-delay 30s
  acl content_present req_len gt 0
  tcp-request content reject if content_present
  default_backend bk_smtp

backend bk_smtp
  mode tcp
  balance roundrobin
  log global
  option tcplog
# SMTP health check
  option smtpchk HELO mydomain.com
  default-server inter 3s rise 2 fall 3
  server smtp1 10.0.0.1:25 check
  server smtp2 10.0.0.2:25 check

Links

Updates:
(1) renaming and updating the article to describe early talking detection and not grey listing

Posted in Aloha, architecture, security | 1 Comment

Scaling out SSL

Synopsis

We’ve seen recently how we could scale up SSL performance.
But what about scaling out SSL performance?
Well, thanks to Aloha and HAProxy, it’s easy to manage smartly a farm of SSL accelerator servers, using persistence based on the SSL Session ID.
This way of load-balancing is smart, but in case of SSL accelerator failure, other servers in the farm would have a CPU overhead to generate SSL Session IDs for sessions re-balanced by the Aloha.

After a talk with (the famous) emericbr, Exceliance dev team leader, he decided to write a patch for stud to add a new feature: sharing SSL session between different stud processes.
That way, in case of SSL accelerator failure, the servers getting re-balanced sessions would not have to generate a new SSL session.

Emericbr’s patch is available here: https://github.com/bumptech/stud/pull/50
At the end of this article, you’ll learn how to use it.

Stud SSL Session shared caching

Description

As we’ve seen in our article on SSL performance, a good way to improve performance on SSL is to use a SSL Session ID cache.

The idea here, is to use this cache as well as sending updates into a shared cache one can consult to get the SSL Session ID and the data associated to it.

As a consequence, there are 2 levels of cache:

      * Level 1: local process cache, with the currently used SSL session
      * Level 2: shared cache, with the SSL session from all local cache

Way of working

The protocol understand 3 types of request:

      * New: When a process generates a new session, it updates its local cache then the shared cache
      * Get: When a client tries to resume a session and the process receiving it is not aware of it, then the process tries to get it from the shared cache
      * Del: When a session has expired or there is a bad SSL ID sent by a client, then the process will delete the session from the shared cache

Who does what?

Stud has a Father/Son architecture.
The Father starts up then starts up Sons. The Sons bind the TCP external ports, load the certificate and process the SSL requests.
Each son manages its local cache and send the updates to the shared cache. The Father manages the shared cache, receiving the changes and maintaining it up to date.

How are the updates exchanged?

Updates are sent either on Unicast or Multicast, on a specified UDP port.
Updates are compatible both IPv4 and IPv6.
Each packet are signed by an encrypted signature using the SSL certificate, to avoid cache poisoning.

What does a packet look like?


SSL Session ID ASN-1 of SSL Session structure Timestamp Signature
[32 bytes] [max 512 bytes] [4 bytes] [20 bytes]

Note: the SSL Session ID field is padded with 0 if required

Diagram

Let’s show this on a nice picture where each potato represents each process memory area.
stud_shared_cache
Here, the son on host 1 got a new SSL connection to process, since he could not find it in its cache and in the shared cache, he generated the asymmetric key, then push it to his shared cache and the father on host 2 which updates the shared cache for this host.
That way, if this user is routed to any stud son process, he would not have to compute again its asymmetric key.

Let’s try Stud shared cache

Installation:

git clone https://github.com/EmericBr/stud.git
cd stud
wget http://1wt.eu/tools/ebtree/ebtree-6.0.6.tar.gz
tar xvzf ebtree-6.0.6.tar.gz
ln -s ebtree-6.0.6 ebtree
make USE_SHARED_CACHE=1

Generate a key and a certificate, add them in a single file.

Now you can run stud:

sudo ./stud -n 2 -C 10000 -U 10.0.3.20,8888 -P 10.0.0.17 -f 10.0.3.20,443 -b 10.0.0.3,80 cert.pem

and run a test:

curl --noproxy \* --insecure -D - https://10.0.3.20:443/

And you can watch the synchronization packets:

$ sudo tcpdump -n -i any port 8888
[sudo] password for bassmann: 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes

17:47:10.557362 IP 10.0.3.20.8888 > 10.0.0.17.8888: UDP, length 176
17:49:04.592522 IP 10.0.3.20.8888 > 10.0.0.17.8888: UDP, length 176
17:49:05.476032 IP 10.0.3.20.8888 > 10.0.0.17.8888: UDP, length 176

Related links

Posted in benchmark, optimization, ssl | Tagged , , , | 3 Comments