Network Working Group K. Ramakrishnan
Request for Comments: 3168 TeraOptic Networks
Updates: 2474, 2401, 793 S. Floyd
Obsoletes: 2481 ACIRI
Category: Standards Track D. Black
EMC
September 2001
The Addition of Explicit Congestion Notification (ECN) to IP
Status of this Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2001). All Rights Reserved.
Abstract
This memo specifies the incorporation of ECN (Explicit Congestion
Notification) to TCP and IP, including ECN's use of two bits in the
IP header.
Table of Contents
1. Introduction.................................................. 3
2. Conventions and Acronyms...................................... 5
3. Assumptions and General Principles............................ 5
4. Active Queue Management (AQM)................................. 6
5. Explicit Congestion Notification in IP........................ 6
5.1. ECN as an Indication of Persistent Congestion............... 10
5.2. Dropped or Corrupted Packets................................ 11
5.3. Fragmentation............................................... 11
6. Support from the Transport Protocol........................... 12
6.1. TCP......................................................... 13
6.1.1. TCP Initialization........................................ 14
6.1.1.1. Middlebox Issues........................................ 16
6.1.1.2. Robust TCP Initialization with an Echoed Reserved Field. 17
6.1.2. The TCP Sender............................................ 18
6.1.3. The TCP Receiver.......................................... 19
6.1.4. Congestion on the ACK-path................................ 20
6.1.5. Retransmitted TCP packets................................. 20
Ramakrishnan, et al. Standards Track [Page 1]
RFC 3168 The Addition of ECN to IP September 2001
6.1.6. TCP Window Probes......................................... 22
7. Non-compliance by the End Nodes............................... 22
8. Non-compliance in the Network................................. 24
8.1. Complications Introduced by Split Paths..................... 25
9. Encapsulated Packets.......................................... 25
9.1. IP packets encapsulated in IP............................... 25
9.1.1. The Limited-functionality and Full-functionality Options.. 27
9.1.2. Changes to the ECN Field within an IP Tunnel.............. 28
9.2. IPsec Tunnels............................................... 29
9.2.1. Negotiation between Tunnel Endpoints...................... 31
9.2.1.1. ECN Tunnel Security Association Database Field.......... 32
9.2.1.2. ECN Tunnel Security Association Attribute............... 32
9.2.1.3. Changes to IPsec Tunnel Header Processing............... 33
9.2.2. Changes to the ECN Field within an IPsec Tunnel........... 35
9.2.3. Comments for IPsec Support................................ 35
9.3. IP packets encapsulated in non-IP Packet Headers............ 36
10. Issues Raised by Monitoring and Policing Devices............. 36
11. Evaluations of ECN........................................... 37
11.1. Related Work Evaluating ECN................................ 37
11.2. A Discussion of the ECN nonce.............................. 37
11.2.1. The Incremental Deployment of ECT(1) in Routers.......... 38
12. Summary of changes required in IP and TCP.................... 38
13. Conclusions.................................................. 40
14. Acknowledgements............................................. 41
15. References................................................... 41
16. Security Considerations...................................... 45
17. IPv4 Header Checksum Recalculation........................... 45
18. Possible Changes to the ECN Field in the Network............. 45
18.1. Possible Changes to the IP Header.......................... 46
18.1.1. Erasing the Congestion Indication........................ 46
18.1.2. Falsely Reporting Congestion............................. 47
18.1.3. Disabling ECN-Capability................................. 47
18.1.4. Falsely Indicating ECN-Capability........................ 47
18.2. Information carried in the Transport Header................ 48
18.3. Split Paths................................................ 49
19. Implications of Subverting End-to-End Congestion Control..... 50
19.1. Implications for the Network and for Competing Flows....... 50
19.2. Implications for the Subverted Flow........................ 53
19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion
Control.................................................... 54
20. The Motivation for the ECT Codepoints........................ 54
20.1. The Motivation for an ECT Codepoint........................ 54
20.2. The Motivation for two ECT Codepoints...................... 55
21. Why use Two Bits in the IP Header?........................... 57
22. Historical Definitions for the IPv4 TOS Octet................ 58
23. IANA Considerations.......................................... 60
23.1. IPv4 TOS Byte and IPv6 Traffic Class Octet................. 60
23.2. TCP Header Flags........................................... 61
23.3. IPSEC Security Association Attributes....................... 62
24. Authors' Addresses........................................... 62
25. Full Copyright Statement..................................... 63
1. Introduction
We begin by describing TCP's use of packet drops as an indication of
congestion. Next we explain that with the addition of active queue
management (e.g., RED) to the Internet infrastructure, where routers
detect congestion before the queue overflows, routers are no longer
limited to packet drops as an indication of congestion. Routers can
instead set the Congestion Experienced (CE) codepoint in the IP
header of packets from ECN-capable transports. We describe when the
CE codepoint is to be set in routers, and describe modifications
needed to TCP to make it ECN-capable. Modifications to other
transport protocols (e.g., unreliable unicast or multicast, reliable
multicast, other reliable unicast transport protocols) could be
considered as those protocols are developed and advance through the
standards process. We also describe in this document the issues
involving the use of ECN within IP tunnels, and within IPsec tunnels
in particular.
One of the guiding principles for this document is that, to the
extent possible, the mechanisms specified here be incrementally
deployable. One challenge to the principle of incremental deployment
has been the prior existence of some IP tunnels that were not
compatible with the use of ECN. As ECN becomes deployed, non-
compatible IP tunnels will have to be upgraded to conform to this
document.
This document obsoletes RFC 2481, "A Proposal to add Explicit
Congestion Notification (ECN) to IP", which defined ECN as an
Experimental Protocol for the Internet Community. This document also
updates RFC 2474, "Definition of the Differentiated Services Field
(DS Field) in the IPv4 and IPv6 Headers", in defining the ECN field
in the IP header, RFC 2401, "Security Architecture for the Internet
Protocol" to change the handling of IPv4 TOS Byte and IPv6 Traffic
Class Octet in tunnel mode header construction to be compatible with
the use of ECN, and RFC 793, "Transmission Control Protocol", in
defining two new flags in the TCP header.
TCP's congestion control and avoidance algorithms are based on the
notion that the network is a black-box [Jacobson88, Jacobson90]. The
network's state of congestion or otherwise is determined by end-
systems probing for the network state, by gradually increasing the
load on the network (by increasing the window of packets that are
outstanding in the network) until the network becomes congested and a
packet is lost. Treating the network as a "black-box" and treating
loss as an indication of congestion in the network is appropriate for
pure best-effort data carried by TCP, with little or no sensitivity
to delay or loss of individual packets. In addition, TCP's
congestion management algorithms have techniques built-in (such as
Fast Retransmit and Fast Recovery) to minimize the impact of losses,
from a throughput perspective. However, these mechanisms are not
intended to help applications that are in fact sensitive to the delay
or loss of one or more individual packets. Interactive traffic such
as telnet, web-browsing, and transfer of audio and video data can be
sensitive to packet losses (especially when using an unreliable data
delivery transport such as UDP) or to the increased latency of the
packet caused by the need to retransmit the packet after a loss (with
the reliable data delivery semantics provided by TCP).
Since TCP determines the appropriate congestion window to use by
gradually increasing the window size until it experiences a dropped
packet, this causes the queues at the bottleneck router to build up.
With most packet drop policies at the router that are not sensitive
to the load placed by each individual flow (e.g., tail-drop on queue
overflow), this means that some of the packets of latency-sensitive
flows may be dropped. In addition, such drop policies lead to
synchronization of loss across multiple flows.
Active queue management mechanisms detect congestion before the queue
overflows, and provide an indication of this congestion to the end
nodes. Thus, active queue management can reduce unnecessary queuing
delay for all traffic sharing that queue. The advantages of active
queue management are discussed in RFC 2309 [RFC2309]. Active queue
management avoids some of the bad properties of dropping on queue
overflow, including the undesirable synchronization of loss across
multiple flows. More importantly, active queue management means that
transport protocols with mechanisms for congestion control (e.g.,
TCP) do not have to rely on buffer overflow as the only indication of
congestion.
Active queue management mechanisms may use one of several methods for
indicating congestion to end-nodes. One is to use packet drops, as is
currently done. However, active queue management allows the router to
separate policies of queuing or dropping packets from the policies
for indicating congestion. Thus, active queue management allows
routers to use the Congestion Experienced (CE) codepoint in a packet
header as an indication of congestion, instead of relying solely on
packet drops. This has the potential of reducing the impact of loss
on latency-sensitive flows.
There exist some middleboxes (firewalls, load balancers, or intrusion
detection systems) in the Internet that either drop a TCP SYN packet
configured to negotiate ECN, or respond with a RST. This document
specifies procedures that TCP implementations may use to provide
robust connectivity even in the presence of such equipment.
2. Conventions and Acronyms
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in [RFC2119].
3. Assumptions and General Principles
In this section, we describe some of the important design principles
and assumptions that guided the design choices in this proposal.
* Because ECN is likely to be adopted gradually, accommodating
migration is essential. Some routers may still only drop packets
to indicate congestion, and some end-systems may not be ECN-
capable. The most viable strategy is one that accommodates
incremental deployment without having to resort to "islands" of
ECN-capable and non-ECN-capable environments.
* New mechanisms for congestion control and avoidance need to co-
exist and cooperate with existing mechanisms for congestion
control. In particular, new mechanisms have to co-exist with
TCP's current methods of adapting to congestion and with
routers' current practice of dropping packets in periods of
congestion.
* Congestion may persist over different time-scales. The time
scales that we are concerned with are congestion events that may
last longer than a round-trip time.
* The number of packets in an individual flow (e.g., TCP
connection or an exchange using UDP) may range from a small
number of packets to quite a large number. We are interested in
managing the congestion caused by flows that send enough packets
so that they are still active when network feedback reaches
them.
* Asymmetric routing is likely to be a normal occurrence in the
Internet. The path (sequence of links and routers) followed by
data packets may be different from the path followed by the
acknowledgment packets in the reverse direction.
* Many routers process the "regular" headers in IP packets more
efficiently than they process the header information in IP
options. This suggests keeping congestion experienced
information in the regular headers of an IP packet.
* It must be recognized that not all end-systems will cooperate in
mechanisms for congestion control. However, new mechanisms
shouldn't make it easier for TCP applications to disable TCP
congestion control. The benefit of lying about participating in
new mechanisms such as ECN-capability should be small.
4. Active Queue Management (AQM)
Random Early Detection (RED) is one mechanism for Active Queue
Management (AQM) that has been proposed to detect incipient
congestion [FJ93], and is currently being deployed in the Internet
[RFC2309]. AQM is meant to be a general mechanism using one of
several alternatives for congestion indication, but in the absence of
ECN, AQM is restricted to using packet drops as a mechanism for
congestion indication. AQM drops packets based on the average queue
length exceeding a threshold, rather than only when the queue
overflows. However, because AQM may drop packets before the queue
actually overflows, AQM is not always forced by memory limitations to
discard the packet.
AQM can set a Congestion Experienced (CE) codepoint in the packet
header instead of dropping the packet, when such a field is provided
in the IP header and understood by the transport protocol. The use
of the CE codepoint with ECN allows the receiver(s) to receive the
packet, avoiding the potential for excessive delays due to
retransmissions after packet losses. We use the term 'CE packet' to
denote a packet that has the CE codepoint set.
5. Explicit Congestion Notification in IP
This document specifies that the Internet provide a congestion
indication for incipient congestion (as in RED and earlier work
[RJ90]) where the notification can sometimes be through marking
packets rather than dropping them. This uses an ECN field in the IP
header with two bits, making four ECN codepoints, '00' to '11'. The
ECN-Capable Transport (ECT) codepoints '10' and '01' are set by the
data sender to indicate that the end-points of the transport protocol
are ECN-capable; we call them ECT(0) and ECT(1) respectively. The
phrase "the ECT codepoint" in this document refers to either of the
two ECT codepoints. Routers treat the ECT(0) and ECT(1) codepoints
as equivalent. Senders are free to use either the ECT(0) or the
ECT(1) codepoint to indicate ECT, on a packet-by-packet basis.
The use of two codepoints for ECT, ECT(0) and ECT(1), is
motivated primarily by the desire to allow mechanisms for the data
sender to verify that network elements are not erasing the CE
codepoint, and that data receivers are properly reporting to the
sender the receipt of packets with the CE codepoint set, as required
by the transport protocol. Guidelines for the senders and receivers
to differentiate between the ECT(0) and ECT(1) codepoints will be
addressed in separate documents, for each transport protocol. In
particular, this document does not address mechanisms for TCP end-
nodes to differentiate between the ECT(0) and ECT(1) codepoints.
Protocols and senders that only require a single ECT codepoint SHOULD
use ECT(0).
The not-ECT codepoint '00' indicates a packet that is not using ECN.
The CE codepoint '11' is set by a router to indicate congestion to
the end nodes. Routers that have a packet arriving at a full queue
drop the packet, just as they do in the absence of ECN.
      +-----+-----+
      | ECN FIELD |
      +-----+-----+
        ECT   CE         [Obsolete] RFC 2481 names for the ECN bits.
         0     0         Not-ECT
         0     1         ECT(1)
         1     0         ECT(0)
         1     1         CE

   Figure 1: The ECN Field in IP.
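As an illustration only, the four codepoints of Figure 1 can be written as a small Python enumeration. The names `EcnCodepoint` and `is_ect` are hypothetical and are not defined by this document.

```python
from enum import IntEnum


class EcnCodepoint(IntEnum):
    """The four values of the two-bit ECN field (Figure 1)."""
    NOT_ECT = 0b00  # packet is not using ECN
    ECT_1 = 0b01    # ECN-Capable Transport, ECT(1)
    ECT_0 = 0b10    # ECN-Capable Transport, ECT(0)
    CE = 0b11       # Congestion Experienced, set by a router


def is_ect(codepoint: EcnCodepoint) -> bool:
    """True for either ECT codepoint; routers treat the two as equivalent."""
    return codepoint in (EcnCodepoint.ECT_0, EcnCodepoint.ECT_1)
```

A sender that needs only one ECT codepoint would use `EcnCodepoint.ECT_0`, following the SHOULD above.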
The use of two ECT codepoints essentially gives a one-bit ECN nonce
in packet headers, and routers necessarily "erase" the nonce when
they set the CE codepoint [SCWA99]. For example, routers that erased
the CE codepoint would face additional difficulty in reconstructing
the original nonce, and thus repeated erasure of the CE codepoint
would be more likely to be detected by the end-nodes. The ECN nonce
also can address the problem of misbehaving transport receivers lying
to the transport sender about whether or not the CE codepoint was set
in a packet. The motivation for the use of two ECT codepoints is
discussed in more detail in Section 20, along with some discussion of
alternate possibilities for the fourth ECN codepoint (that is, the
codepoint '01'). Backwards compatibility with earlier ECN
implementations that do not understand the ECT(1) codepoint is
discussed in Section 11.
In RFC 2481 [RFC2481], the ECN field was divided into the ECN-Capable
Transport (ECT) bit and the CE bit. The ECN field with only the
ECN-Capable Transport (ECT) bit set in RFC 2481 corresponds to the
ECT(0) codepoint in this document, and the ECN field with both the
ECT and CE bits set in RFC 2481 corresponds to the CE codepoint in this
document. The '01' codepoint was left undefined in RFC 2481, and
this is the reason for recommending the use of ECT(0) when only a
single ECT codepoint is needed.
        0     1     2     3     4     5     6     7
     +-----+-----+-----+-----+-----+-----+-----+-----+
     |          DS FIELD, DSCP           | ECN FIELD |
     +-----+-----+-----+-----+-----+-----+-----+-----+

       DSCP: differentiated services codepoint
       ECN:  Explicit Congestion Notification

   Figure 2: The Differentiated Services and ECN Fields in IP.
Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field.
The IPv4 TOS octet corresponds to the Traffic Class octet in IPv6,
and the ECN field is defined identically in both cases. The
definitions for the IPv4 TOS octet [RFC791] and the IPv6 Traffic
Class octet have been superseded by the six-bit DS (Differentiated
Services) Field [RFC2474, RFC2780]. Bits 6 and 7 are listed in
[RFC2474] as Currently Unused, and are specified in RFC 2780 as
approved for experimental use for ECN. Section 22 gives a brief
history of the TOS octet.
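The layout of Figure 2 can be sketched in Python as follows. Bit 0 is the most significant bit of the octet, so the ECN field occupies the two least significant bits. The function names `split_tos_octet` and `set_ecn` are hypothetical and serve only as an illustration.

```python
from typing import Tuple


def split_tos_octet(octet: int) -> Tuple[int, int]:
    """Split an IPv4 TOS / IPv6 Traffic Class octet into (dscp, ecn).

    Bits 0-5 (the six most significant bits) are the DSCP;
    bits 6-7 (the two least significant bits) are the ECN field.
    """
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must fit in 8 bits")
    return octet >> 2, octet & 0b11


def set_ecn(octet: int, ecn: int) -> int:
    """Return the octet with its ECN field replaced and the DSCP untouched."""
    if not 0 <= octet <= 0xFF or not 0 <= ecn <= 0b11:
        raise ValueError("octet must fit in 8 bits and ecn in 2 bits")
    return (octet & 0b11111100) | ecn
```

For example, `split_tos_octet(0xB9)` yields DSCP 46 with ECN field '01', and `set_ecn(0xB8, 0b11)` yields 0xBB, the same DSCP with the CE codepoint.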
Because of the unstable history of the TOS octet, the use of the ECN
field as specified in this document cannot be guaranteed to be
backwards compatible with those past uses of these two bits that
pre-date ECN. The potential dangers of this lack of backwards
compatibility are discussed in Section 22.
Upon the receipt by an ECN-Capable transport of a single CE packet,
the congestion control algorithms followed at the end-systems MUST be
essentially the same as the congestion control response to a *single*
dropped packet. For example, for ECN-Capable TCP the source TCP is
required to halve its congestion window for any window of data
containing either a packet drop or an ECN indication.
One reason for requiring that the congestion-control response to the
CE packet be essentially the same as the response to a dropped packet
is to accommodate the incremental deployment of ECN in both end-
systems and in routers. Some routers may drop ECN-Capable packets
(e.g., using the same AQM policies for congestion detection) while
other routers set the CE codepoint, for equivalent levels of
congestion. Similarly, a router might drop a non-ECN-Capable packet
but set the CE codepoint in an ECN-Capable packet, for equivalent
levels of congestion. If there were different congestion control
responses to a CE codepoint than to a packet drop, this could result
in unfair treatment for different flows.
An additional goal is that the end-systems should react to congestion
at most once per window of data (i.e., at most once per round-trip
time), to avoid reacting multiple times to multiple indications of
congestion within a round-trip time.
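A minimal sketch of this sender-side behavior follows, under several assumptions that are not taken from this document: the window is counted in whole segments, segments are numbered from zero, the window is never reduced below one segment, and a `recover` marker records the last segment outstanding at the time of the previous reduction. The class name `EcnSenderSketch` is hypothetical.

```python
from typing import Optional


class EcnSenderSketch:
    """Toy congestion-window state, counted in segments (hypothetical)."""

    def __init__(self, cwnd: int) -> None:
        self.cwnd = cwnd
        self.next_seq = 0                    # next segment number to send
        self.highest_acked = -1              # highest segment acknowledged
        self.recover: Optional[int] = None   # last segment sent before the last reduction

    def on_send(self) -> None:
        self.next_seq += 1

    def on_ack(self, acked_seq: int) -> None:
        self.highest_acked = max(self.highest_acked, acked_seq)

    def on_congestion_indication(self) -> bool:
        """Handle a packet drop or an ECN indication.

        Returns True if the window was halved, False if the indication
        falls in a window of data that already triggered a reduction.
        """
        if self.recover is not None and self.highest_acked <= self.recover:
            return False
        self.cwnd = max(self.cwnd // 2, 1)   # same response as to a single drop
        self.recover = self.next_seq - 1
        return True
```

Drops and CE indications enter through the same method, which reflects the requirement that both produce essentially the same response.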
For a router, the CE codepoint of an ECN-Capable packet SHOULD only
be set if the router would otherwise have dropped the packet as an
indication of congestion to the end nodes. When the router's buffer
is not yet full and the router is prepared to drop a packet to inform
end nodes of incipient congestion, the router should first check to
see if the ECT codepoint is set in that packet's IP header. If so,
then instead of dropping the packet, the router MAY instead set the
CE codepoint in the IP header.
An environment where all end nodes were ECN-Capable could allow new
criteria to be developed for setting the CE codepoint, and new
congestion control mechanisms for end-node reaction to CE packets.
However, this is a research issue, and as such is not addressed in
this document.
When a CE packet (i.e., a packet that has the CE codepoint set) is
received by a router, the CE codepoint is left unchanged, and the
packet is transmitted as usual. When severe congestion has occurred
and the router's queue is full, then the router has no choice but to
drop some packet when a new packet arrives. We anticipate that such
packet losses will become relatively infrequent when a majority of
end-systems become ECN-Capable and participate in TCP or other
compatible congestion control mechanisms. In an ECN-Capable
environment that is adequately-provisioned, packet losses should
occur primarily during transients or in the presence of non-
cooperating sources.
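The router behavior described above can be sketched as a single decision function. This is an illustration under stated assumptions: the inputs `queue_full` and `aqm_signals_congestion` stand in for whatever queue management the router runs, the function name `handle_arrival` is hypothetical, and the router is assumed to choose marking whenever it is permitted (the text says MAY).

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def handle_arrival(ecn: int, queue_full: bool, aqm_signals_congestion: bool):
    """Return (action, ecn) for one arriving packet.

    action is "drop" or "forward"; ecn is the codepoint carried onward.
    """
    if queue_full:
        return "drop", ecn        # full queue: drop, just as without ECN
    if aqm_signals_congestion:
        if ecn in (ECT_0, ECT_1):
            return "forward", CE  # mark instead of dropping
        if ecn == NOT_ECT:
            return "drop", ecn    # not ECN-capable: the drop is the signal
    return "forward", ecn         # a packet already marked CE stays CE
```

A drop made for reasons other than congestion, such as a policy drop at a diffserv edge, is outside this function and must not be converted into a CE mark.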
The above discussion of when CE may be set instead of dropping a
packet applies by default to all Differentiated Services Per-Hop
Behaviors (PHBs) [RFC2475]. Specifications for PHBs MAY provide
more specifics on how a compliant implementation is to choose between
setting CE and dropping a packet, but this is NOT REQUIRED. A router
MUST NOT set CE instead of dropping a packet when the drop that would
occur is caused by reasons other than congestion or the desire to
indicate incipient congestion to end nodes (e.g., a diffserv edge
node may be configured to unconditionally drop certain classes of
traffic to prevent them from entering its diffserv domain).
We expect that routers will set the CE codepoint in response to
incipient congestion as indicated by the average queue size, using
the RED algorithms suggested in [FJ93, RFC2309]. To the best of our
knowledge, this is the only proposal currently under discussion in
the IETF for routers to drop packets proactively, before the buffer
overflows. However, this document does not attempt to specify a
particular mechanism for active queue management, leaving that
endeavor, if needed, to other areas of the IETF. While ECN is
inextricably tied up with the need to have a reasonable active queue
management mechanism at the router, the reverse does not hold; active
queue management mechanisms have been developed and deployed
independent of ECN, using packet drops as indications of congestion
in the absence of ECN in the IP architecture.
5.1. ECN as an Indication of Persistent Congestion
We emphasize that a *single* packet with the CE codepoint set in an
IP packet causes the transport layer to respond, in terms of
congestion control, as it would to a packet drop. The instantaneous
queue size is likely to see considerable variations even when the
router does not experience persistent congestion. As such, it is
important that transient congestion at a router, reflected by the
instantaneous queue size reaching a threshold much smaller than the
capacity of the queue, not trigger a reaction at the transport layer.
Therefore, the CE codepoint should not be set by a router based on
the instantaneous queue size.
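One common way to obtain an average queue size is the exponentially weighted moving average used by RED [FJ93]. The sketch below is illustrative only; the class name `AverageQueue` and the default weight of 0.002 are assumptions, and this document does not mandate any particular averaging method.

```python
class AverageQueue:
    """Exponentially weighted moving average of the queue length."""

    def __init__(self, weight: float = 0.002) -> None:
        self.weight = weight   # assumed value; smaller means slower to react
        self.avg = 0.0

    def update(self, instantaneous_len: int) -> float:
        """Fold one sample of the instantaneous queue length into the average."""
        self.avg += self.weight * (instantaneous_len - self.avg)
        return self.avg
```

With a small weight, a short burst that briefly fills the queue moves the average only slightly, so a marking decision based on the average is not triggered by transient congestion.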
For example, since the ATM and Frame Relay mechanisms for congestion
indication have typically been defined without an associated notion
of average queue size as the basis for determining that an
intermediate node is congested, we believe that they provide a very
noisy signal. The TCP-sender reaction specified in this document for
ECN is NOT the appropriate reaction for such a noisy signal of
congestion notification. However, if the routers that interface to
the ATM network have a way of maintaining the average queue at the
interface, and use it to come to a reliable determination that the
ATM subnet is congested, they may use the ECN notification that is
defined here.
We continue to encourage experiments in techniques at layer 2 (e.g.,
in ATM switches or Frame Relay switches) to take advantage of ECN.
For example, using a scheme such as RED (where packet marking is
based on the average queue length exceeding a threshold), layer 2
devices could provide a reasonably reliable indication of congestion.
When all the layer 2 devices in a path set that layer's own
Congestion Experienced codepoint (e.g., the EFCI bit for ATM, the
FECN bit in Frame Relay) in this reliable manner, then the interface
router to the layer 2 network could copy the state of that layer 2
Congestion Experienced codepoint into the CE codepoint in the IP
header. We recognize that this is not the current practice, nor is
it in current standards. However, encouraging experimentation in this
manner may provide the information needed to enable evolution of
existing layer 2 mechanisms to provide a more reliable means of
congestion indication, when they use a single bit for indicating
congestion.
5.2. Dropped or Corrupted Packets
For the proposed use for ECN in this document (that is, for a
transport protocol such as TCP for which a dropped data packet is an
indication of congestion), end nodes detect dropped data packets, and
the congestion response of the end nodes to a dropped data packet is
at least as strong as the congestion response to a received CE
packet. To ensure the reliable delivery of the congestion indication
of the CE codepoint, an ECT codepoint MUST NOT be set in a packet
unless the loss of that packet in the network would be detected by
the end nodes and interpreted as an indication of congestion.
Transport protocols such as TCP do not necessarily detect all packet
drops, such as the drop of a "pure" ACK packet; for example, TCP does
not reduce the arrival rate of subsequent ACK packets in response to
an earlier dropped ACK packet. Any proposal for extending ECN-
Capability to such packets would have to address issues such as the
case of an ACK packet that was marked with the CE codepoint but was
later dropped in the network. We believe that this aspect is still
the subject of research, so this document specifies that at this
time, "pure" ACK packets MUST NOT indicate ECN-Capability.
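The pure-ACK rule can be sketched as a choice of codepoint for each outgoing TCP segment. The sketch covers only the rule stated in this section and ignores any further restrictions a transport specification may impose. The function name `ecn_for_segment` and its parameters are hypothetical.

```python
NOT_ECT, ECT_0 = 0b00, 0b10


def ecn_for_segment(ecn_negotiated: bool, payload_len: int) -> int:
    """Pick the IP ECN codepoint for an outgoing TCP segment.

    A segment with no payload is treated as a "pure" ACK, whose loss the
    end nodes would not detect, so it is sent as Not-ECT.
    """
    if ecn_negotiated and payload_len > 0:
        return ECT_0
    return NOT_ECT
```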
Similarly, if a CE packet is dropped later in the network due to
corruption (bit errors), the end nodes should still invoke congestion
control, just as TCP would today in response to a dropped data
packet. This issue of corrupted CE packets would have to be
considered in any proposal for the network to distinguish between
packets dropped due to corruption, and packets dropped due to
congestion or buffer overflow. In particular, the ubiquitous
deployment of ECN would not, in and of itself, be a sufficient
development to allow end-nodes to interpret packet drops as
indications of corruption rather than congestion.
5.3. Fragmentation
ECN-capable packets MAY have the DF (Don't Fragment) bit set.
Reassembly of a fragmented packet MUST NOT lose indications of
congestion. In other words, if any fragment of an IP packet to be
reassembled has the CE codepoint set, then one of two actions MUST be
taken:
* Set the CE codepoint on the reassembled packet. However, this
MUST NOT occur if any of the other fragments contributing to
this reassembly carries the Not-ECT codepoint.
* The packet is dropped, instead of being reassembled, for any
other reason.
If both actions are applicable, either MAY be chosen. Reassembly of
a fragmented packet MUST NOT change the ECN codepoint when all of the
fragments carry the same codepoint.
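As an illustration only, the reassembly rules above might be sketched as
follows. The names ECN and reassembled_ecn are hypothetical and not part
of this specification. The final branch, which drops a packet whose
fragments carry a mixture of codepoints without CE, is an assumption of
this sketch; the rules above place no requirement on that case.

```python
from enum import IntEnum
from typing import Iterable, Optional


class ECN(IntEnum):
    """Two-bit ECN field values from the IP header."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def reassembled_ecn(fragments: Iterable[ECN]) -> Optional[ECN]:
    """Return the codepoint for the reassembled packet, or None to drop it."""
    seen = set(fragments)
    if not seen:
        raise ValueError("no fragments")
    if len(seen) == 1:
        # All fragments agree: the codepoint MUST NOT change.
        return next(iter(seen))
    if ECN.CE in seen:
        if ECN.NOT_ECT in seen:
            # CE MUST NOT be set when another fragment is Not-ECT,
            # so the only remaining permitted action is to drop.
            return None
        return ECN.CE
    # Mixture of ECT(0), ECT(1) and/or Not-ECT without CE: unspecified.
    # Dropping is this sketch's own (conservative) assumption.
    return None
```

An implementation could equally choose to drop in the CE case, since
either action is permitted when both are applicable.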
We would note that because RFC 2481 did not specify reassembly
behavior, older ECN implementations conformant with that Experimental
RFC do not necessarily perform reassembly correctly, in terms of
preserving the CE codepoint in a fragment. The sender could avoid
the consequences of this behavior by setting the DF bit in ECN-
Capable packets.
Situations may arise in which the above reassembly specification is
insufficiently precise. For example, if there is a malicious or
broken entity in the path at or after the fragmentation point, packet
fragments could carry a mixture of ECT(0), ECT(1), and/or Not-ECT
codepoints. The reassembly specification above does not place
requirements on reassembly of fragments in this case. In situations
where more precise reassembly behavior would be required, protocol
specifications SHOULD instead specify that DF MUST be set in all
ECN-capable packets sent by the protocol.
6. Support from the Transport Protocol
ECN requires support from the transport protocol, in addition to the
functionality given by the ECN field in the IP packet header. The
transport protocol might require negotiation between the endpoints
during setup to determine that all of the endpoints are ECN-capable,
so that the sender can set the ECT codepoint in transmitted packets.
Second, the transport protocol must be capable of reacting
appropriately to the receipt of CE packets. This reaction could be
in the form of the data receiver informing the data sender of the
received CE packet (e.g., TCP), of the data receiver unsubscribing to
a layered multicast group (e.g., RLM [MJV96]), or of some other
action that ultimately reduces the arrival rate of that flow on that
congested link. CE packets indicate persistent rather than transient
congestion (see Section 5.1), and hence reactions to the receipt of
CE packets should be those appropriate for persistent congestion.
This document only addresses the addition of ECN Capability to TCP,
leaving issues of ECN in other transport protocols to further
research. For TCP, ECN requires three new pieces of functionality:
negotiation between the endpoints during connection setup to
determine if they are both ECN-capable; an ECN-Echo (ECE) flag in the
TCP header so that the data receiver can inform the data sender when
a CE packet has been received; and a Congestion Window Reduced (CWR)
flag in the TCP header so that the data sender can inform the data
receiver that the congestion window has been reduced. The support
required from other transport protocols is likely to be different,
particularly for unreliable or reliable multicast transport
protocols, and will have to be determined as other transport
protocols are brought to the IETF for standardization.
In a mild abuse of terminology, in this document we refer to `TCP
packets' instead of `TCP segments'.
6.1. TCP
The following sections describe in detail the proposed use of ECN in
TCP. This proposal is described in essentially the same form in
[Floyd94]. We assume that the source TCP uses the standard congestion
control algorithms of Slow-start, Fast Retransmit and Fast Recovery
[RFC2581].
This proposal specifies two new flags in the Reserved field of the
TCP header. The TCP mechanism for negotiating ECN-Capability uses
the ECN-Echo (ECE) flag in the TCP header. Bit 9 in the Reserved
field of the TCP header is designated as the ECN-Echo flag. The
location of the 6-bit Reserved field in the TCP header is shown in
Figure 4 of RFC 793 [RFC793] (and is reproduced below for
completeness). This specification of the ECN Field leaves the
Reserved field as a 4-bit field using bits 4-7.
To enable the TCP receiver to determine when to stop setting the
ECN-Echo flag, we introduce a second new flag in the TCP header, the
CWR flag. The CWR flag is assigned to Bit 8 in the Reserved field of
the TCP header.
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|               |                       | U | A | P | R | S | F |
| Header Length |        Reserved       | R | C | S | S | Y | I |
|               |                       | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 3: The old definition of bytes 13 and 14 of the TCP
header.
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|               |               | C | E | U | A | P | R | S | F |
| Header Length |    Reserved   | W | C | R | C | S | S | Y | I |
|               |               | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 4: The new definition of bytes 13 and 14 of the TCP
header.
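As a non-normative sketch, the new layout could be decoded as shown
below. The function name tcp_flags and the dictionary keys are
hypothetical. The sketch assumes that "bytes 13 and 14" count from one,
so the 16-bit word starts at offset 12 of the TCP header, and that bit 0
in the figure is the most significant bit.

```python
import struct

# Masks within the 16-bit word; bit N of the figure is 1 << (15 - N).
CWR = 0x0080  # bit 8
ECE = 0x0040  # bit 9
URG = 0x0020
ACK = 0x0010
PSH = 0x0008
RST = 0x0004
SYN = 0x0002
FIN = 0x0001


def tcp_flags(header: bytes) -> dict:
    """Decode header length, Reserved field and flags from a TCP header."""
    if len(header) < 14:
        raise ValueError("TCP header too short")
    # Offset 12 is the thirteenth byte when counting from one.
    (word,) = struct.unpack_from("!H", header, 12)
    return {
        "header_length": word >> 12,       # bits 0-3, in 32-bit words
        "reserved": (word >> 8) & 0xF,     # bits 4-7
        "CWR": bool(word & CWR),
        "ECE": bool(word & ECE),
        "URG": bool(word & URG),
        "ACK": bool(word & ACK),
        "PSH": bool(word & PSH),
        "RST": bool(word & RST),
        "SYN": bool(word & SYN),
        "FIN": bool(word & FIN),
    }
```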
Thus, ECN uses the ECT and CE codepoints in the IP header (as shown in
Figure 1) for signaling between routers and connection endpoints, and
uses the ECN-Echo and CWR flags in the TCP header (as shown in Figure
4) for TCP-endpoint to TCP-endpoint signaling. For a TCP connection,
a typical sequence of events in an ECN-based reaction to congestion
is as follows:
* An ECT codepoint is set in packets transmitted by the sender to
indicate that ECN is supported by the transport entities for
these packets.
* An ECN-capable router detects impending congestion and detects
that an ECT codepoint is set in the packet it is about to drop.
Instead of dropping the packet, the router chooses to set the CE
codepoint in the IP header and forwards the packet.
* The receiver receives the packet with the CE codepoint set, and
sets the ECN-Echo flag in its next TCP ACK sent to the sender.
* The sender receives the TCP ACK with ECN-Echo set, and reacts to
the congestion as if a packet had been dropped.
* The sender sets the CWR flag in the TCP header of the next
packet sent to the receiver to acknowledge its receipt of and
reaction to the ECN-Echo flag.
The negotiation for using ECN by the TCP transport entities and the
use of the ECN-Echo and CWR flags is described in more detail in the
sections below.
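The sequence of events above can be walked through with a small,
non-normative sketch. All function names are hypothetical. The
congestion window is counted in segments here, and halving it with a
floor of one segment stands in for the sender's full congestion
response; both are simplifying assumptions.

```python
from typing import Optional, Tuple

# ECN field codepoints in the IP header.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def router_forward(ecn: int, would_drop: bool) -> Optional[int]:
    """Return the ECN field of the forwarded packet, or None if dropped."""
    if not would_drop:
        return ecn
    if ecn == NOT_ECT:
        return None   # transport did not signal ECN support: drop
    return CE         # ECT(0), ECT(1) or CE: mark instead of dropping


def receiver_ack_flags(ecn: int) -> dict:
    """Flags for the next ACK sent after receiving a data packet."""
    return {"ACK": True, "ECE": ecn == CE}


def sender_on_ack(ack_flags: dict, cwnd: int) -> Tuple[int, bool]:
    """Return (new cwnd in segments, CWR flag for the next data packet)."""
    if ack_flags["ECE"]:
        return max(cwnd // 2, 1), True
    return cwnd, False
```

For example, a packet sent with ECT(0) through a congested router
arrives marked CE, the resulting ACK carries ECN-Echo, and the sender
halves its window and sets CWR on its next packet.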
6.1.1. TCP Initialization
In the TCP connection setup phase, the source and destination TCPs
exchange information about their willingness to use ECN. Subsequent
to the completion of this negotiation, the TCP sender sets an ECT
codepoint in the IP header of data packets to indicate to the network
that the transport is capable and willing to participate in ECN for
this packet. This indicates to the routers that they may mark this
packet with the CE codepoint, if they would like to use that as a
method of congestion notification. If the TCP connection does not
wish to use ECN notification for a particular packet, the sending TCP
sets the ECN codepoint to not-ECT, and the TCP receiver ignores the
CE codepoint in the received packet.
For this discussion, we designate the initiating host as Host A and
the responding host as Host B. We call a SYN packet with the ECE and
CWR flags set an "ECN-setup SYN packet", and we call a SYN packet
with at least one of the ECE and CWR flags not set a "non-ECN-setup
SYN packet". Similarly, we call a SYN-ACK packet with only the ECE
flag set but the CWR flag not set an "ECN-setup SYN-ACK packet", and
we call a SYN-ACK packet with any other configuration of the ECE and
CWR flags a "non-ECN-setup SYN-ACK packet".
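These four definitions can be expressed as two predicates over the TCP
flags byte. The sketch below is illustrative only, and the function
names are hypothetical. Flag masks follow the bit positions of Figure 4.

```python
CWR = 0x80
ECE = 0x40
ACK = 0x10
SYN = 0x02


def is_ecn_setup_syn(flags: int) -> bool:
    """True for a SYN (without ACK) that has both ECE and CWR set."""
    is_syn = (flags & (SYN | ACK)) == SYN
    return is_syn and (flags & (ECE | CWR)) == (ECE | CWR)


def is_ecn_setup_syn_ack(flags: int) -> bool:
    """True for a SYN-ACK that has ECE set and CWR not set."""
    is_syn_ack = (flags & (SYN | ACK)) == (SYN | ACK)
    return is_syn_ack and (flags & (ECE | CWR)) == ECE
```

Any SYN for which is_ecn_setup_syn returns False is a non-ECN-setup SYN
packet, and likewise for SYN-ACK packets. In particular, a SYN-ACK with
both ECE and CWR set is a non-ECN-setup SYN-ACK packet.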
Before a TCP connection can use ECN, Host A sends an ECN-setup SYN
packet, and Host B sends an ECN-setup SYN-ACK packet. For a SYN
packet, the setting of both ECE and CWR in the ECN-setup SYN packet
is defined as an indication that the sending TCP is ECN-Capable,
rather than as an indication of congestion or of response to
congestion. More precisely, an ECN-setup SYN packet indicates that
the TCP implementation transmitting the SYN packet will participate
in ECN as both a sender and receiver. Specifically, as a receiver,
it will respond to incoming data packets that have the CE codepoint
set in the IP header by setting ECE in outgoing TCP Acknowledgement
(ACK) packets. As a sender, it will respond to incoming packets that
have ECE set by reducing the congestion window and setting CWR when
appropriate. An ECN-setup SYN packet does not commit the TCP sender
to setting the ECT codepoint in any or all of the packets it may
transmit. However, the commitment to respond appropriately to
incoming packets with the CE codepoint set remains even if the TCP
sender in a later transmission, within this TCP connection, sends a
SYN packet without ECE and CWR set.
When Host B sends an ECN-setup SYN-ACK packet, it sets the ECE flag
but not the CWR flag. An ECN-setup SYN-ACK packet is defined as an
indication that the TCP transmitting the SYN-ACK packet is ECN-
Capable. As with the SYN packet, an ECN-setup SYN-ACK packet does
not commit the TCP host to setting the ECT codepoint in transmitted
packets.
The following rules apply to the sending of ECN-setup packets within
a TCP connection, where a TCP connection is defined by the standard
rules for TCP connection establishment and termination.
* If a host has received an ECN-setup SYN packet, then it MAY send
an ECN-setup SYN-ACK packet. Otherwise, it MUST NOT send an
ECN-setup SYN-ACK packet.
* A host MUST NOT set ECT on data packets unless it has sent at
least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has
received at least one ECN-setup SYN or ECN-setup SYN-ACK packet,
and has sent no non-ECN-setup SYN or non-ECN-setup SYN-ACK
packet. If a host has received at least one non-ECN-setup SYN
or non-ECN-setup SYN-ACK packet, then it SHOULD NOT set ECT on
data packets.
* If a host ever sets the ECT codepoint on a data packet, then
that host MUST correctly set/clear the CWR TCP bit on all
subsequent packets in the connection.
* If a host has sent at least one ECN-setup SYN or ECN-setup SYN-
ACK packet, and has received no non-ECN-setup SYN or non-ECN-
setup SYN-ACK packet, then if that host receives TCP data
packets with ECT and CE codepoints set in the IP header, then
that host MUST process these packets as specified for an ECN-
capable connection.
* A host that is not willing to use ECN on a TCP connection SHOULD
clear both the ECE and CWR flags in all non-ECN-setup SYN and/or
SYN-ACK packets that it sends to indicate this unwillingness.
Receivers MUST correctly handle all forms of the non-ECN-setup
SYN and SYN-ACK packets.
* A host MUST NOT set ECT on SYN or SYN-ACK packets.
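One way a host might track these rules is with four per-connection
booleans, as in the non-normative sketch below. The class and method
names are hypothetical. The sketch treats the SHOULD NOT for a received
non-ECN-setup packet as a hard prohibition, which is a stricter choice
than the rules require.

```python
from dataclasses import dataclass


@dataclass
class EcnNegotiation:
    """Per-connection record of ECN-setup SYN and SYN-ACK packets."""
    sent_setup: bool = False
    sent_non_setup: bool = False
    received_setup: bool = False
    received_non_setup: bool = False

    def on_sent(self, ecn_setup: bool) -> None:
        """Record a SYN or SYN-ACK that this host transmitted."""
        if ecn_setup:
            self.sent_setup = True
        else:
            self.sent_non_setup = True

    def on_received(self, ecn_setup: bool) -> None:
        """Record a SYN or SYN-ACK that this host received."""
        if ecn_setup:
            self.received_setup = True
        else:
            self.received_non_setup = True

    def may_set_ect(self) -> bool:
        """Whether ECT may be set on data packets (never on SYN/SYN-ACK)."""
        return (self.sent_setup and self.received_setup
                and not self.sent_non_setup
                and not self.received_non_setup)

    def must_process_ce(self) -> bool:
        """Whether received CE data packets MUST be handled as ECN."""
        return self.sent_setup and not self.received_non_setup
```

Under this sketch, a host that sends an ECN-setup SYN, then falls back
to a non-ECN-setup SYN and receives a non-ECN-setup SYN-ACK, is not
permitted to set ECT on data packets.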
A TCP client enters TIME-WAIT state after receiving a FIN-ACK, and
transitions to CLOSED state after a timeout. Many TCP
implementations create a new TCP connection if they receive an in-
window SYN packet during TIME-WAIT state. When a TCP host enters
TIME-WAIT or CLOSED state, it should ignore any previous state about
the negotiation of ECN for that connection.
6.1.1.1. Middlebox Issues
ECN introduces the use of the ECN-Echo and CWR flags in the TCP
header (as shown in Figure 4) for initialization. There exist some
faulty firewalls, load balancers, and intrusion detection systems in
the Internet that either drop an ECN-setup SYN packet or respond with
a RST, in the belief that such a packet (with these bits set) is a
signature for a port-scanning tool that could be used in a denial-
of-service attack. Some of the offending equipment has been
identified, and a web page [FIXES] contains a list of non-compliant
products and the fixes posted by the vendors, where these are
available. The TBIT web page [TBIT] lists some of the web servers
affected by this faulty equipment. We mention this in this document
as a warning to the community of this problem.
To provide robust connectivity even in the presence of such faulty
equipment, a host that receives a RST in response to the transmission
of an ECN-setup SYN packet MAY resend a SYN with CWR and ECE cleared.
This could result in a TCP connection being established without using
ECN.
A host that receives no reply to an ECN-setup SYN within the normal
SYN retransmission timeout interval MAY resend the SYN and any
subsequent SYN retransmissions with CWR and ECE cleared. To overcome
normal packet loss that results in the original SYN being lost, the
originating host may retransmit one or more ECN-setup SYN packets
before giving up and retransmitting the SYN with the CWR and ECE bits
cleared.
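The fallback behavior permitted above might be sketched as follows.
This is illustrative only: the function name next_syn_flags and the
parameter ecn_attempts (how many ECN-setup SYN packets to try before
falling back) are hypothetical, and this document does not prescribe a
particular number of attempts.

```python
CWR = 0x80
ECE = 0x40
SYN = 0x02


def next_syn_flags(ecn_syns_sent: int, ecn_attempts: int,
                   got_rst: bool) -> int:
    """Choose the flags for the next SYN (re)transmission.

    ecn_syns_sent: ECN-setup SYNs already sent on this attempt.
    ecn_attempts:  hypothetical local limit on ECN-setup SYNs.
    got_rst:       a RST was received in reply to an ECN-setup SYN.
    """
    if got_rst or ecn_syns_sent >= ecn_attempts:
        # Fall back: retransmit the SYN with CWR and ECE cleared.
        return SYN
    return SYN | ECE | CWR
```

With ecn_attempts set to 2, for instance, the first two SYN packets are
ECN-setup SYN packets and every later retransmission has CWR and ECE
cleared.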
We note that in this case, the following example scenario is
possible:
(1) Host A: Sends an ECN-setup SYN.
(2) Host B: Sends an ECN-setup SYN/ACK, packet is dropped or delayed.
(3) Host A: Sends a non-ECN-setup SYN.
(4) Host B: Sends a non-ECN-setup SYN/ACK.
In this scenario, following the procedures above, neither
Host A nor Host B may set the ECT bit on data packets. Further, an
important consequence of the rules for ECN setup and usage in Section
6.1.1 is that a host is forbidden from using the reception of ECT
data packets as an implicit signal that the other host is ECN-
capable.
6.1.1.2. Robust TCP Initialization with an Echoed Reserved Field
There is the question of why we chose to have the TCP sending the SYN
set two ECN-related flags in the Reserved field of the TCP header for
the SYN packet, while the responding TCP sending the SYN-ACK sets
only one ECN-related flag in the SYN-ACK packet. This asymmetry is
necessary for the robust negotiation of ECN-capability with some
deployed TCP implementations. There exists at least one faulty TCP
implementation in which TCP receivers set the Reserved field of the
TCP header in ACK packets (and hence the SYN-ACK) simply to reflect
the Reserved field of the TCP header in the received data packet.
Because the TCP SYN packet sets the ECN-Echo and CWR flags to
indicate ECN-capability, while the SYN-ACK packet sets only the ECN-
Echo flag, the sending TCP correctly interprets a receiver's
reflection of its own flags in the Reserved field as an indication
that the receiver is not ECN-capable. The sending TCP is not misled
by a faulty TCP implementation sending a SYN-ACK packet that simply
reflects the Reserved field of the incoming SYN packet.
6.1.2. The TCP Sender
For a TCP connection using ECN, new data packets are transmitted with
an ECT codepoint set in the IP header. When only one ECT codepoint
is needed by a sender for all packets sent on a TCP connection,
ECT(0) SHOULD be used. If the sender receives an ECN-Echo (ECE) ACK
packet (that is, an ACK packet with the ECN-Echo flag set in the TCP
header), then the sender knows that congestion was encountered in the
network on the path from the sender to the receiver. The indication
of congestion should be treated just as a congestion loss in non-
ECN-Capable TCP. That is, the TCP source halves the congestion window
"cwnd" and reduces the slow start threshold "ssthresh". The sending
TCP SHOULD NOT increase the congestion window in response to the
receipt of an ECN-Echo ACK packet.
TCP should not react to congestion indications more than once every
window of data (or more loosely, more than once every round-trip
time). That is, the TCP sender's congestion window should be reduced
only once in response to a series of dropped and/or CE packets from a
single window of data. In addition, the TCP source should not
decrease the slow-start threshold, ssthresh, if it has been decreased
within the last round trip time. However, if any retransmitted
packets are dropped, then this is interpreted by the source TCP as a
new instance of congestion.
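A minimal sketch of the once-per-window rule is given below. It is not
a definitive implementation. The class EcnSender and its fields are
hypothetical. Remembering the value of snd_nxt at the time of the last
reduction, and ignoring ECN-Echo on ACKs at or below it, is one common
technique assumed here. Setting ssthresh to max(FlightSize/2, 2*SMSS)
follows the loss response of [RFC2581] and is likewise an assumption.
Sequence numbers are relative and wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class EcnSender:
    mss: int                   # maximum segment size, bytes
    cwnd: int                  # congestion window, bytes
    ssthresh: int              # slow start threshold, bytes
    snd_una: int = 0           # oldest unacknowledged sequence number
    snd_nxt: int = 0           # next sequence number to be sent
    recover: int = 0           # snd_nxt at the time of the last reduction
    cwr_pending: bool = False  # set CWR on the next new data packet

    def on_ack(self, ack: int, ece: bool) -> None:
        """Process an incoming ACK, reacting to ECN-Echo at most once
        per window of data."""
        self.snd_una = max(self.snd_una, ack)
        if not ece or ack <= self.recover:
            # No congestion signal, or one that belongs to a window
            # for which the sender has already reduced cwnd.
            return
        flight = self.snd_nxt - self.snd_una
        self.ssthresh = max(flight // 2, 2 * self.mss)
        self.cwnd = max(self.cwnd // 2, self.mss)
        self.recover = self.snd_nxt
        self.cwr_pending = True
```

In this sketch a second ECN-Echo ACK for data sent before the reduction
leaves cwnd unchanged; only an ECN-Echo ACK covering data sent after
the reduction triggers a further decrease.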
After the source TCP reduces its congestion window in response to a
CE packet, incoming acknowledgments that continue to arrive can
"clock out" outgoing packets as allowed by the reduced congestion
window. If the congestion window consists of only one MSS (maximum
segment size), and the sending TCP receives an ECN-Echo ACK packet,
then the sending TCP should in principle still reduce its congestion
window in half. However, the value of the congestion window is
bounded below by a value of one MSS. If the sending TCP were to
continue to send, using a congestion window of 1 MSS, this results in
the transmission of one packet per round-trip time. It is necessary
to still reduce the sending rate of the TCP sender even further, on
receipt of an ECN-Echo packet when the congestion window is one. We
use the retransmit timer as a means of reducing the rate further in
this circumstance. Therefore, the sending TCP MUST reset the
retransmit timer on receiving the ECN-Echo packet when the congestion
window is one. The sending TCP will then be able to send a new
packet only when the retransmit timer expires.
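The behavior at a congestion window of one MSS might be sketched as
follows, purely as an illustration. The class MinWindowSender, the
field hold_until, and the explicit "now" arguments are hypothetical; a
real implementation would use its existing retransmit timer rather than
a separate timestamp.

```python
from dataclasses import dataclass


@dataclass
class MinWindowSender:
    mss: int                 # maximum segment size, bytes
    cwnd: int                # congestion window, bytes
    rto: float               # current retransmit timeout, seconds
    hold_until: float = 0.0  # time at which the reset timer expires

    def on_ece_ack(self, now: float) -> None:
        """React to an ECN-Echo ACK received at time `now`."""
        if self.cwnd <= self.mss:
            # cwnd cannot drop below one MSS, so slow down further by
            # resetting the retransmit timer instead.
            self.cwnd = self.mss
            self.hold_until = now + self.rto
        else:
            self.cwnd = max(self.cwnd // 2, self.mss)

    def may_send_new_packet(self, now: float) -> bool:
        """New data is held back until the reset timer has expired."""
        return now >= self.hold_until
```

With an RTO of one second, an ECN-Echo ACK arriving at time 10.0 while
cwnd is one MSS holds back the next new packet until time 11.0.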