At Exceliance, we do provide professional services around HAPRoxy: this includes HAProxy itself, of course, but as well the underlying OS tuning, advice and recommendation about the architecture and sometimes we can also help customers troubleshooting application layer issues.
We don’t fix issues for the customer, but using information provided by HAProxy, we are able to reduce the investigation area, saving customer’s time and money.
The story I’m relating today is issued of one of this PS.
One of our customer is an hosting company which hosts some very busy PHP / MySQL websites. They used successfully HAProxy in front of their application servers.
They used to have a single MySQL server which was some kind of SPOF and which had to handle several thousands requests per seconds.
Sometimes, they had issues with this DB: it was like the clients (hence the Web servers) can’t hangs when using the DB.
So they decided to use MySQL replication and build an active/passive cluster. They also decided to split reads (SELECT queries) and writes (DELETE, INSERT, UPDATE queries) at the application level.
Then they were able to move the MySQL servers behind HAProxy.
Enough for the introduction Today’s article will discuss about HAProxy and MySQL at high request rate, and an error some of you may already have encountered: TCP source port exhaustion (the famous high number of sockets in TIME_WAIT).
So basically, we have here a standard web platform which involves HAProxy to load-balance MySQL:
The MySQL Master server is used to send WRITE requests and the READ request are “weighted-ly” load-balanced (the slaves have a higher weight than the master) against all the MySQL servers.
One way of scaling MySQL, is to use the replication method: one MySQL server is designed as master and must manages all the write operations (DELETE, INSERT, UPDATE, etc…). for each operation, it notifies all the MySQL slave servers. We can use salves for reading only, offloading these type of requests from the master.
IMPORTANT NOTE: The replication method allows scalability of the read part, so if your application require much more writes, then this is not the method for you.
Of course, one MySQL slave server can be designed as master when the master fails! This also ensure MySQL high-availability.
So, where is the problem ???
This type of platform works very well for the majority of websites. The problem occurs when you start having a high rate of requests. By high, I mean several thousands per second.
TCP source port exhaustion
HAProxy works as a reverse-proxy and so uses its own IP address to get connected on the server.
Any system have available around 64K source TCP ports to get connected to a remote IP:port. Once a combination of “source IP:port => dst IP:port” is in use, it can’t be re-used.
First lesson: you can’t have more than 64K opened connections from a HAProxy box to a single remote IP:port couple. I think only people load-balancing MS Exchange RPC services or sharepoint with NTLM may one day reach this limit…
(well, it is possible to workaround this limit using some hacks we’ll explain later in this article)
Why does TCP port exhaustion occur with MySQL clients???
As I said, the MySQL request rate was a few thousands per second, so we never ever reach this limit of 64K simultaneous opened connection to the remote service…
What’s up then???
Well, there is an issue with MySQL client library: when a client sends its “QUIT” sequence, it performs a few internal operations before immediately shutting down the TCP connection, without waiting for the server to do it. A basic tcpdump will show it to you easily.
Note that you won’t be able to reproduce this issue on a loopback interface, because the server answers fast enough… You must use a LAN connection and 2 different servers.
Basically, here is the sequence currently performed by a MySQL client:
Mysql Client ==> "QUIT" sequence ==> Mysql Server Mysql Client ==> FIN ==> MySQL Server Mysql Client <== FIN ACK <== MySQL Server Mysql Client ==> ACK ==> MySQL Server
Which leads the client connection to remain unavailable for twice the MSL (Maximum Segment Life) time, which means 2 minutes.
Note: this type of close has no negative impact when the connection is made over a UNIX socket.
Explication of the issue (much better that I could explain it myself):
“There is no way for the person who sent the first FIN to get an ACK back for that last ACK. You might want to reread that now. The person that initially closed the connection enters the TIME_WAIT state; in case the other person didn’t really get the ACK and thinks the connection is still open. Typically, this lasts one to two minutes.” (Source)
Since the source port is unavailable for the system for 2 minutes, this means that over 534 MySQL requests per seconds you’re in danger of TCP source port exhaustion: 64000 (available ports) / 120 (number of seconds in 2 minutes) = 533.333.
This TCP port exhaustion appears on the MySQL client server itself, but as well on the HAProxy box because it forwards the client traffic to the server… And since we have many web servers, it happens much faster on the HAProxy box !!!!
Remember: at spike traffic, my customer had a few thousands requests/s….
How to avoid TCP source port exhaustion?
Here comes THE question!!!!
First, a “clean” sequence should be:
Mysql Client ==> "QUIT" sequence ==> Mysql Server Mysql Client <== FIN <== MySQL Server Mysql Client ==> FIN ACK ==> MySQL Server Mysql Client <== ACK <== MySQL Server
Actually, this sequence happens when both MySQL client and server are hosted on the same box and uses the loopback interface, that’s why I said sooner that if you want to reproduce the issue you must add “latency” between the client and the server and so use 2 boxes over the LAN.
So, until MySQL rewrite the code to follow the sequence above, there won’t be any improvement here!!!!
Increasing source port range
By default, on a Linux box, you have around 28K source ports available (for a single destination IP:port):
$ sysctl net.ipv4.ip_local_port_range net.ipv4.ip_local_port_range = 32768 61000
In order to get 64K source ports, just run:
$ sudo sysctl net.ipv4.ip_local_port_range="1025 65000"
And don’t forget to update your /etc/sysctl.conf file!!!
Note: this should definitively be applied also on the web servers….
Allow usage of source port in TIME_WAIT
A few sysctls can be used to tell the kernel to reuse faster the connection in TIME_WAIT:
tw_reuse can be used safely, be but careful with tw_recycle… It could have side effects.
anyway, these sysctls were already properly setup (value = 1) on both HAProxy and web servers.
Note: this should definitively be applied also on the web servers….
Note 2: tw_reuse should definitively be applied also on the web servers….
Using multiple IPs to get connected to a single server
In HAProxy configuration, you can precise on the server line the source IP address to use to get connected to a server, so just add more server lines with different IPs.
In the example below, the IPs 10.0.0.100 and 10.0.0.101 are configured on the HAProxy box:
[...] server mysql1 10.0.0.1:3306 check source 10.0.0.100 server mysql1_bis 10.0.0.1:3306 check source 10.0.0.101 [...]
This allows us to open up to 128K source TCP port…
The kernel is responsible to manage to affect a new TCP port when HAProxy requests it. Dispite improving things a bit, we still reach some sour port exhaustion… We could not get over 80K connections in TIME_WAIT with 4 source IPs…
Let HAProxy manage TCP source ports
You can let HAProxy decides which source port to use when opening a new TCP connection, on behalf of the kernel. HAProxy has a purposely built functions which make it more efficient that the kernel for such task.
Let’s update the configuration above:
[...] server mysql1 10.0.0.1:3306 check source 10.0.0.100:1025-65000 server mysql1_bis 10.0.0.1:3306 check source 10.0.0.101:1025-65000 [...]
We managed to get 170K+ connections in TIME_WAIT with 4 source IPs… and not source port exhaustion anymore !!!!
Use a memcache
Fortunately, the devs from this customer are skilled and write flexible code
So they managed to move some requests from the MySQL DB to a memcache, opening much less connections.
Use MySQL persistant connections
This could prevent fine Load-Balancing on the Read-Only farm, but it would be very efficient on the MySQL master server.
- If you see some SOCKERR information messages in HAProxy logs (mainly on health check), you may be running out of TCP source port.
- Have skilled developers who writes flexible code, where moving from a DB to an other is made easy
- This kind of issue can happens only with protocols or applications which make the client closing the connection first
- This issue can’t happen on HAProxy in HTTP mode, since it let the server closes the connection before sending a TCP RST