
[Z Blog] Tuning Linux for benchmarking


Let's assume we want to tune Linux to handle a lot of incoming network connections.

You want to make sure you can open enough file handles.

Increasing file handles.

Check system limit:

cat /proc/sys/fs/file-max

For this test we want 2 million open connections and it is likely that we will be running more than one process that is opening connections (simulators, load testers for smoke testing, ha_proxy, nginx, plus Java services).

If you do not have 2 million file handles that you can open, edit the sysctl conf file as follows.

$ sudo nano /etc/sysctl.conf
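For example, to allow roughly 2 million open file handles system-wide, you could add a line like the following (the value is illustrative; the sample sysctl.conf later on this page uses a smaller number):

# system-wide file handle limit (illustrative value)
fs.file-max = 2097152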

Now check to see if the soft and hard user limits for files are set.

# ulimit -Hn
# ulimit -Sn

Let's go ahead and check the various limits and see if we can open enough files from a user perspective.

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1031032
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 30000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1031032
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Most of the parameters look good for how we are going to use the OS except for the open files.

Add the following to limits.conf:

$ sudo nano /etc/security/limits.conf

*      hard   nofile 1000000
*      soft   nofile 1000000
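These limits only apply to new login sessions, so log out and back in (or start a new shell) and confirm the new values:

$ ulimit -Hn
$ ulimit -Sn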

Follow this guide for tuning network parameters.

Linux Host tuning

TCP Tuning

Ideas adapted from: TCP Tuning and Linux TCP/IP Tuning and TIME_WAIT and port reuse and Linux Networking Tuning.

Edit /etc/sysctl.conf file. Reload with sysctl -p.

We are going to crank up the min and max buffer sizes: 16MB for 1GE, and 32MB or 54MB for 10GE.

Next we will increase the NIC (network card) interface queue. If the RTT is more than 50 ms, a value of 5,000-10,000 is recommended.

To increase txqueuelen, do the following:
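A minimal sketch, assuming the interface is eth0 (substitute your actual interface name):

$ sudo ifconfig eth0 txqueuelen 10000
# or, with iproute2:
$ sudo ip link set dev eth0 txqueuelen 10000

A few other TCP-related kernel settings worth knowing about (a sysctl sketch with illustrative values follows this list):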

  • TCP_FIN_TIMEOUT - elapsed time before TCP/IP can release a closed connection and free its resources.
  • A smaller TCP_FIN_TIMEOUT means TCP/IP can release closed-connection resources (like the port) sooner.
  • TCP_KEEPALIVE_INTERVAL - wait time between keepalive probes
  • TCP_KEEPALIVE_PROBES - number of probes before timing out
  • TCP_TW_RECYCLE - turns on fast recycling of TIME_WAIT sockets (closed sockets waiting to be reused)
  • TCP_TW_REUSE - a safer alternative to TCP_TW_RECYCLE, good for short connections (e.g., a server behind a load balancer)
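A sketch of how those map to sysctl names; the values are illustrative rather than recommendations, and tcp_tw_recycle in particular is unsafe behind NAT and was removed from newer kernels:

net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1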

Check 'sysctl net.ipv4.tcp_available_congestion_control' to see which congestion control algorithms are available; use cubic (the default) or htcp.
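For example, to list what the kernel offers and switch the algorithm (a sketch; the available algorithms depend on which modules your kernel has loaded):

$ sysctl net.ipv4.tcp_available_congestion_control
# htcp may first need: sudo modprobe tcp_htcp
$ sudo sysctl -w net.ipv4.tcp_congestion_control=htcp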

Find more tuning ideas in ip-sysctl.txt, which is part of the Linux kernel documentation.

Ephemeral Ports

Especially when you are running wrk on a client, make sure there are plenty of ephemeral ports available to your application. The default ephemeral port range is 32768 to 61000. This can be widened; with the range below, and with ports being reused, we will have 47,535 of them.

Add the following to a file under /etc/sysctl.d/ (or to /etc/sysctl.conf):

net.ipv4.ip_local_port_range = 18000    65535
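A sketch of dropping that setting into its own file and reloading it; the file name here is arbitrary:

$ echo 'net.ipv4.ip_local_port_range = 18000 65535' | sudo tee /etc/sysctl.d/10-ephemeral-ports.conf
$ sudo sysctl -p /etc/sysctl.d/10-ephemeral-ports.conf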

TIME_WAIT state

A connection stays in the TIME_WAIT state for twice the MSL. The default MSL is 60 seconds, which puts the TIME_WAIT timeout at 2 minutes. Since we are going to be benchmarking with wrk, we want to spare our ports.

net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = 1

nf_conntrack_tcp_timeout_established

nf_conntrack_tcp_timeout_established is the conntrack timeout for connections in the ESTABLISHED state; a connection should leave this state when a FIN packet goes through in either direction. The default is 432000 seconds (five days). We don't want to tie up connections and ports on connections that were never able to complete the three-way handshake. We are benchmarking, and losing enough ports and resources could be bad.

net.netfilter.nf_conntrack_tcp_timeout_established=60
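While benchmarking, it can also help to keep an eye on how full the conntrack table is (a sketch; these sysctls only exist when the conntrack modules are loaded):

$ sysctl net.netfilter.nf_conntrack_count
$ sysctl net.netfilter.nf_conntrack_max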

Window size after idle

Related to the above is the sysctl setting net.ipv4.tcp_slow_start_after_idle, which controls whether the congestion window collapses back after a connection has been idle; setting it to 0 keeps the window open.

net.ipv4.tcp_slow_start_after_idle=0

/etc/sysctl.conf

# /etc/sysctl.conf
# Increase system file descriptor limit
fs.file-max = 100000

# Discourage Linux from swapping idle processes to disk (default = 60)
vm.swappiness = 10

# Increase ephemeral IP ports
net.ipv4.ip_local_port_range = 10000 65000

# Increase Linux autotuning TCP buffer limits
# Set max to 16MB for 1GE and 32M (33554432) or 54M (56623104) for 10GE
# Don't set tcp_mem itself! Let the kernel scale it based on RAM.
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Make room for more TIME_WAIT sockets due to more clients,
# and allow them to be reused if we run out of sockets
# Also increase the max packet backlog
net.core.netdev_max_backlog = 50000
net.ipv4.tcp_max_syn_backlog = 30000
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 10

# Disable TCP slow start on idle connections
net.ipv4.tcp_slow_start_after_idle = 0

# If your servers talk UDP, also up these limits
net.ipv4.udp_rmem_min = 8192
net.ipv4.udp_wmem_min = 8192

# Disable source routing and redirects
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0

# Log packets with impossible addresses for security
net.ipv4.conf.all.log_martians = 1

/etc/security/limits.conf

# allow all users to open 100000 files
# alternatively, replace * with an explicit username
* soft nofile 100000
* hard nofile 100000
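On some Ubuntu/Debian setups the limits only take effect if pam_limits is enabled for the session. If the new nofile value does not show up after re-login, check that a line like this is present in /etc/pam.d/common-session:

session required pam_limits.so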

http://www.nateware.com/linux-network-tuning-for-2013.html

Baselining a Linux server

Sometimes you get a server instance without knowing what you've got. It happens.

Checking server strength

more /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 45
model name	: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
stepping	: 7
microcode	: 0x70d
cpu MHz		: 1999.832
cache size	: 20480 KB
cpu cores	: 8

So this is a 2012 processor. Must have been a sale at ebay. https://cpubenchmark.net/cpu.php?cpu=Intel+Xeon+E5-2650+%40+2.00GHz&id=1218&cpuCount=2 http://ark.intel.com/products/64590/Intel-Xeon-Processor-E5-2650-20M-Cache-2_00-GHz-8_00-GTs-Intel-QPI

Ok. At least if I decide to do some benchmarking in EC2, I can sort of pick what size EC2 instance I am going to use. This box has 32 logical CPUs (two 8-core sockets with hyper-threading). CPU should not be an issue with this app, but if it is we can split it up.
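A quick way to confirm the logical CPU count (either command works on a stock Ubuntu box):

$ nproc
$ grep -c ^processor /proc/cpuinfo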

Baselining performance.

I like to make sure that I have things set up properly. I have custom HTTP code for both client and server, but I like to make sure that I have a decent OS setup before I waste too much time tweaking my code.

I do this with wrk and nginx.

I install Nginx on the box that is going to be the server. I install wrk on the client box.
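On Ubuntu that amounts to something like the following; wrk is built from source (the usual steps, adjust as needed):

server$ sudo apt-get install nginx
client$ sudo apt-get install build-essential libssl-dev git
client$ git clone https://github.com/wg/wrk.git && cd wrk && make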

$ wrk -c 20000 -d 10s http://10.5.99.62/index.html --timeout 1000s -t 12
Running 10s test @ http://10.5.99.62/index.html
  12 threads and 20000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    85.74ms  330.97ms   6.40s    98.01%
    Req/Sec    11.26k     4.50k   50.62k    77.00%
  1345589 requests in 10.05s, 1.04GB read
  Socket errors: connect 0, read 74, write 0, timeout 0
Requests/sec: 133925.85
Transfer/sec:    105.62MB

I changed the nginx worker pool to 16 worker processes since we have 32 CPUs to use.

/etc/nginx$ cat nginx.conf 
user www-data;
worker_processes 16;
pid /run/nginx.pid;

events {
	worker_connections 768;
	# multi_accept on;
}
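After changing the config, test and reload nginx. Note that with 16 workers and the default worker_connections of 768 shown above, nginx tops out at roughly 12,000 concurrent connections, so for a 20,000-connection test you may also want to raise worker_connections.

$ sudo nginx -t
$ sudo service nginx reload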

Next up: Vert.x install

Install Java on Ubuntu.

https://www.digitalocean.com/community/tutorials/how-to-install-java-on-ubuntu-with-apt-get

Install Vert.x

http://vertx.io/install.html

Running Vert.x standalone

$ cat server.js
var vertx = require('vertx');

vertx.createHttpServer().requestHandler(function(req) {
  req.response.end("Hello World!");
}).listen(9090);


$ /opt/vertx/vert.x-2.1.5/bin/vertx  run  server.js -instances 16

Client

$ wrk -c 20000 -d 10s http://10.5.99.62:9090/ --timeout 1000s -t 20
Running 10s test @ http://10.5.99.62:9090/
  20 threads and 20000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   104.42ms  617.70ms   6.06s    97.13%
    Req/Sec    12.57k    11.13k   79.82k    89.77%
  2507087 requests in 10.10s, 121.94MB read
  Socket errors: connect 0, read 542, write 0, timeout 0
Requests/sec: 248333.64
Transfer/sec:     12.08MB
