
Commit b49960a

edumazet authored and davem330 committed
tcp: change tcp_adv_win_scale and tcp_rmem[2]
tcp_adv_win_scale default value is 2, meaning we expect a good citizen skb to have skb->len / skb->truesize ratio of 75% (3/4).

In 2.6 kernels we (mis)accounted for a typical MSS=1460 frame: 1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392. So these skbs were considered as not bloated.

With recent truesize fixes, a typical MSS=1460 frame truesize is now the more precise: 2048 + 256 = 2304. But 2304 * 3/4 = 1728. So these skbs are not good citizens anymore, because 1460 < 1728.

(GRO can escape this problem because it builds skbs with a too low truesize.)

This also means tcp advertises a too optimistic window for a given allocated rcvspace: when receiving frames, sk_rmem_alloc can hit the sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often, especially when the application is slow to drain its receive queue or in case of losses (netperf is fast, scp is slow). This is a major latency source.

We should adjust the len/truesize ratio to 50% instead of 75%.

This patch:

1) changes tcp_adv_win_scale default to 1 instead of 2

2) increases tcp_rmem[2] limit from 4MB to 6MB to take into account better truesize tracking and to allow autotuning the tcp receive window to reach the same value as before. Note that the same amount of kernel memory is consumed compared to 2.6 kernels.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
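The truesize arithmetic in the commit message can be checked with a small userspace sketch. This is not kernel code: `payload_budget` and `is_good_citizen` are hypothetical helper names introduced here for illustration, and the sketch only covers positive scale values, where the expected payload share is 1 - 1/2^scale of truesize.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not a kernel function): the payload size an skb
 * must reach to match the len/truesize ratio implied by a positive
 * tcp_adv_win_scale, i.e. truesize * (1 - 1/2^scale).
 */
static unsigned int payload_budget(unsigned int truesize, int scale)
{
    return truesize - (truesize >> scale);
}

/*
 * Hypothetical helper: an skb is a "good citizen" in the commit
 * message's sense when its payload length meets that budget.
 */
static bool is_good_citizen(unsigned int len, unsigned int truesize, int scale)
{
    return len >= payload_budget(truesize, scale);
}
```

With the numbers from the message, `payload_budget(1856, 2)` is 1392 and `payload_budget(2304, 2)` is 1728, so an MSS=1460 frame passes under the old truesize estimate and fails under the new one. At scale 1 the budget for a 2304-byte truesize drops to 1152, and the same frame passes again.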
1 parent 84768ed commit b49960a

3 files changed: +8 -7 lines changed


Documentation/networking/ip-sysctl.txt

Lines changed: 2 additions & 2 deletions
@@ -147,7 +147,7 @@ tcp_adv_win_scale - INTEGER
 	(if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale),
 	if it is <= 0.
 	Possible values are [-31, 31], inclusive.
-	Default: 2
+	Default: 1

 tcp_allowed_congestion_control - STRING
 	Show/set the congestion control choices available to non-privileged
@@ -410,7 +410,7 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max
 	net.core.rmem_max. Calling setsockopt() with SO_RCVBUF disables
 	automatic tuning of that socket's receive buffer size, in which
 	case this value is ignored.
-	Default: between 87380B and 4MB, depending on RAM size.
+	Default: between 87380B and 6MB, depending on RAM size.

 tcp_sack - BOOLEAN
 	Enable select acknowledgments (SACKS).
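The documented tcp_adv_win_scale formula can be sketched in userspace to show why 6MB at scale 1 advertises the same window as 4MB at scale 2. The function names `overhead_bytes` and `window_from_space` are hypothetical and chosen for this illustration. The sketch follows the wording of ip-sysctl.txt rather than reproducing the kernel's own helper.

```c
#include <assert.h>

/*
 * Hypothetical helper: buffering overhead as described in ip-sysctl.txt,
 * bytes/2^scale when scale > 0, otherwise bytes - bytes/2^(-scale).
 */
static long long overhead_bytes(long long bytes, int scale)
{
    assert(scale >= -31 && scale <= 31); /* documented value range */
    if (scale > 0)
        return bytes >> scale;
    return bytes - (bytes >> -scale);
}

/*
 * Hypothetical helper: the part of the buffer left for the advertised
 * window once the overhead is subtracted.
 */
static long long window_from_space(long long bytes, int scale)
{
    return bytes - overhead_bytes(bytes, scale);
}
```

Under these assumptions, `window_from_space(4LL * 1024 * 1024, 2)` and `window_from_space(6LL * 1024 * 1024, 1)` both give 3145728 bytes (3MB), which matches the commit message's goal of letting autotuning reach the same receive window as before.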

net/ipv4/tcp.c

Lines changed: 5 additions & 4 deletions
@@ -3243,7 +3243,7 @@ void __init tcp_init(void)
 {
 	struct sk_buff *skb = NULL;
 	unsigned long limit;
-	int max_share, cnt;
+	int max_rshare, max_wshare, cnt;
 	unsigned int i;
 	unsigned long jiffy = jiffies;

@@ -3303,15 +3303,16 @@ void __init tcp_init(void)
 	tcp_init_mem(&init_net);
 	/* Set per-socket limits to no more than 1/128 the pressure threshold */
 	limit = nr_free_buffer_pages() << (PAGE_SHIFT - 7);
-	max_share = min(4UL*1024*1024, limit);
+	max_wshare = min(4UL*1024*1024, limit);
+	max_rshare = min(6UL*1024*1024, limit);

 	sysctl_tcp_wmem[0] = SK_MEM_QUANTUM;
 	sysctl_tcp_wmem[1] = 16*1024;
-	sysctl_tcp_wmem[2] = max(64*1024, max_share);
+	sysctl_tcp_wmem[2] = max(64*1024, max_wshare);

 	sysctl_tcp_rmem[0] = SK_MEM_QUANTUM;
 	sysctl_tcp_rmem[1] = 87380;
-	sysctl_tcp_rmem[2] = max(87380, max_share);
+	sysctl_tcp_rmem[2] = max(87380, max_rshare);

 	pr_info("Hash tables configured (established %u bind %u)\n",
 		tcp_hashinfo.ehash_mask + 1, tcp_hashinfo.bhash_size);
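The effect of splitting max_share into separate write and read caps can be illustrated with a userspace sketch of the same min/max clamping. The struct `tcp_buf_limits`, the function `compute_limits`, and its parameters are hypothetical stand-ins: the caller passes a free page count and page shift instead of calling nr_free_buffer_pages() or using PAGE_SHIFT.

```c
/* Hypothetical result type holding the computed tcp_wmem[2] / tcp_rmem[2]. */
struct tcp_buf_limits {
    unsigned long wmem_max;
    unsigned long rmem_max;
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
    return a > b ? a : b;
}

/*
 * Hypothetical userspace mirror of the clamping in the patched tcp_init():
 * the per-socket limit is derived from the free page count, then the
 * write cap is 4MB and the read cap is 6MB, with 64KB and 87380B floors.
 */
static struct tcp_buf_limits compute_limits(unsigned long free_pages,
                                            unsigned int page_shift)
{
    unsigned long limit = free_pages << (page_shift - 7);
    struct tcp_buf_limits out;

    out.wmem_max = max_ul(64UL * 1024, min_ul(4UL * 1024 * 1024, limit));
    out.rmem_max = max_ul(87380UL, min_ul(6UL * 1024 * 1024, limit));
    return out;
}
```

As an example under these assumptions, 262144 free pages with a page shift of 12 give a limit of 8MB, so the write maximum is capped at 4MB and the read maximum at 6MB. With only 1024 free pages the limit is 32KB and both values fall back to their floors of 65536 and 87380 bytes.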

net/ipv4/tcp_input.c

Lines changed: 1 addition & 1 deletion
@@ -85,7 +85,7 @@ int sysctl_tcp_ecn __read_mostly = 2;
 EXPORT_SYMBOL(sysctl_tcp_ecn);
 int sysctl_tcp_dsack __read_mostly = 1;
 int sysctl_tcp_app_win __read_mostly = 31;
-int sysctl_tcp_adv_win_scale __read_mostly = 2;
+int sysctl_tcp_adv_win_scale __read_mostly = 1;
 EXPORT_SYMBOL(sysctl_tcp_adv_win_scale);

 int sysctl_tcp_stdurg __read_mostly;
