Commit 1fa3314

idosch authored and Paolo Abeni committed
ipv4: Centralize TOS matching
The TOS field in the IPv4 flow information structure ('flowi4_tos') is matched by the kernel against the TOS selector in IPv4 rules and routes. The field is initialized differently by different call sites. Some treat it as DSCP (RFC 2474) and initialize all six DSCP bits, some treat it as RFC 1349 TOS and initialize it using RT_TOS() and some treat it as RFC 791 TOS and initialize it using IPTOS_RT_MASK.

What is common to all these call sites is that they all initialize the lower three DSCP bits, which fits the TOS definition in the initial IPv4 specification (RFC 791).

Therefore, the kernel only allows configuring IPv4 FIB rules that match on the lower three DSCP bits which are always guaranteed to be initialized by all call sites:

 # ip -4 rule add tos 0x1c table 100
 # ip -4 rule add tos 0x3c table 100
 Error: Invalid tos.

While this works, it is unlikely to be very useful. RFC 791 that initially defined the TOS and IP precedence fields was updated by RFC 2474 over twenty five years ago where these fields were replaced by a single six bits DSCP field.

Extending FIB rules to match on DSCP can be done by adding a new DSCP selector while maintaining the existing semantics of the TOS selector for applications that rely on that.

A prerequisite for allowing FIB rules to match on DSCP is to adjust all the call sites to initialize the high order DSCP bits and remove their masking along the path to the core where the field is matched on.

However, making this change alone will result in a behavior change. For example, a forwarded IPv4 packet with a DS field of 0xfc will no longer match a FIB rule that was configured with 'tos 0x1c'.

This behavior change can be avoided by masking the upper three DSCP bits in 'flowi4_tos' before comparing it against the TOS selectors in FIB rules and routes.

Implement the above by adding a new function that checks whether a given DSCP value matches the one specified in the IPv4 flow information structure and invoke it from the three places that currently match on 'flowi4_tos'.

Use RT_TOS() for the masking of 'flowi4_tos' instead of IPTOS_RT_MASK since the latter is not uAPI and we should be able to remove it at some point.

Include <linux/ip.h> in <linux/in_route.h> since the former defines IPTOS_TOS_MASK which is used in the definition of RT_TOS() in <linux/in_route.h>.

No regressions in FIB tests:

 # ./fib_tests.sh
 [...]
 Tests passed: 218
 Tests failed:   0

And FIB rule tests:

 # ./fib_rule_tests.sh
 [...]
 Tests passed: 116
 Tests failed:   0

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
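The masking described in the commit message can be checked with a small userspace sketch. This is not the kernel code: the constant values below (IPTOS_TOS_MASK as 0x1E, INET_DSCP_MASK as 0xFC) are assumed to mirror the kernel headers, `dscp_sketch_t` is a hypothetical stand-in for the kernel's `dscp_t`, and the helpers take the raw TOS byte instead of a `struct flowi4`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to mirror the kernel headers; not taken from this commit. */
#define IPTOS_TOS_MASK	0x1E				/* <linux/ip.h> */
#define RT_TOS(tos)	((tos) & IPTOS_TOS_MASK)	/* <linux/in_route.h> */
#define INET_DSCP_MASK	0xFC				/* <net/inet_dscp.h> */

/* Hypothetical stand-in for dscp_t: DSCP kept in the upper six bits. */
typedef uint8_t dscp_sketch_t;

static dscp_sketch_t dsfield_to_dscp(uint8_t dsfield)
{
	return dsfield & INET_DSCP_MASK;
}

/* Userspace analogue of fib_dscp_masked_match(), taking the raw TOS byte. */
static bool dscp_masked_match(dscp_sketch_t dscp, uint8_t flowi4_tos)
{
	/* RT_TOS() clears the upper three DSCP bits before the comparison. */
	return dscp == dsfield_to_dscp(RT_TOS(flowi4_tos));
}

/* The same comparison without RT_TOS(), shown only for contrast. */
static bool dscp_unmasked_match(dscp_sketch_t dscp, uint8_t flowi4_tos)
{
	return dscp == dsfield_to_dscp(flowi4_tos);
}
```

Under these assumptions, a DS field of 0xfc still matches a selector of 0x1c through `dscp_masked_match()`, because 0xfc & 0x1e is 0x1c, while `dscp_unmasked_match()` rejects it. That is the behavior change the commit message says the masking avoids.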
1 parent 548a202 commit 1fa3314

5 files changed, 11 additions and 5 deletions
include/net/ip_fib.h

Lines changed: 6 additions & 0 deletions
@@ -22,6 +22,7 @@
 #include <linux/percpu.h>
 #include <linux/notifier.h>
 #include <linux/refcount.h>
+#include <linux/in_route.h>
 
 struct fib_config {
 	u8			fc_dst_len;
@@ -434,6 +435,11 @@ static inline bool fib4_rules_early_flow_dissect(struct net *net,
 
 #endif /* CONFIG_IP_MULTIPLE_TABLES */
 
+static inline bool fib_dscp_masked_match(dscp_t dscp, const struct flowi4 *fl4)
+{
+	return dscp == inet_dsfield_to_dscp(RT_TOS(fl4->flowi4_tos));
+}
+
 /* Exported by fib_frontend.c */
 extern const struct nla_policy rtm_ipv4_policy[];
 void ip_fib_init(void);

include/uapi/linux/in_route.h

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@
 #ifndef _LINUX_IN_ROUTE_H
 #define _LINUX_IN_ROUTE_H
 
+#include <linux/ip.h>
+
 /* IPv4 routing cache flags */
 
 #define RTCF_DEAD	RTNH_F_DEAD

net/ipv4/fib_rules.c

Lines changed: 1 addition & 1 deletion
@@ -186,7 +186,7 @@ INDIRECT_CALLABLE_SCOPE int fib4_rule_match(struct fib_rule *rule,
 	    ((daddr ^ r->dst) & r->dstmask))
 		return 0;
 
-	if (r->dscp && r->dscp != inet_dsfield_to_dscp(fl4->flowi4_tos))
+	if (r->dscp && !fib_dscp_masked_match(r->dscp, fl4))
 		return 0;
 
 	if (rule->ip_proto && (rule->ip_proto != fl4->flowi4_proto))

net/ipv4/fib_semantics.c

Lines changed: 1 addition & 2 deletions
@@ -2066,8 +2066,7 @@ static void fib_select_default(const struct flowi4 *flp, struct fib_result *res)
 
 		if (fa->fa_slen != slen)
 			continue;
-		if (fa->fa_dscp &&
-		    fa->fa_dscp != inet_dsfield_to_dscp(flp->flowi4_tos))
+		if (fa->fa_dscp && !fib_dscp_masked_match(fa->fa_dscp, flp))
 			continue;
 		if (fa->tb_id != tb->tb_id)
 			continue;

net/ipv4/fib_trie.c

Lines changed: 1 addition & 2 deletions
@@ -1580,8 +1580,7 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 				if (index >= (1ul << fa->fa_slen))
 					continue;
 			}
-			if (fa->fa_dscp &&
-			    inet_dscp_to_dsfield(fa->fa_dscp) != flp->flowi4_tos)
+			if (fa->fa_dscp && !fib_dscp_masked_match(fa->fa_dscp, flp))
 				continue;
 			/* Paired with WRITE_ONCE() in fib_release_info() */
 			if (READ_ONCE(fi->fib_dead))
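The three replaced conditions were not written identically: fib_table_lookup() compared the whole 'flowi4_tos' byte, while fib4_rule_match() and fib_select_default() converted it with inet_dsfield_to_dscp() first. The sketch below models the two old forms and the new one as plain byte operations so the differences can be tried on sample values. It assumes that inet_dscp_to_dsfield() is a plain cast, that inet_dsfield_to_dscp() masks with 0xFC, and that RT_TOS() masks with 0x1E. All function names here are hypothetical, and whether such inputs actually reach these call sites is not something the commit states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask values assumed to mirror the kernel headers. */
#define IPTOS_TOS_MASK	0x1E
#define INET_DSCP_MASK	0xFC

/* Old fib_table_lookup() form: the selector against the whole TOS byte. */
static bool old_trie_match(uint8_t fa_dscp, uint8_t flowi4_tos)
{
	return fa_dscp == flowi4_tos;
}

/* Old fib4_rule_match() / fib_select_default() form: ECN bits dropped. */
static bool old_rule_match(uint8_t dscp, uint8_t flowi4_tos)
{
	return dscp == (flowi4_tos & INET_DSCP_MASK);
}

/* New shared form: RT_TOS() masking first, then the DSCP conversion. */
static bool new_masked_match(uint8_t dscp, uint8_t flowi4_tos)
{
	return dscp == ((flowi4_tos & IPTOS_TOS_MASK) & INET_DSCP_MASK);
}
```

With a selector of 0x1c, all three forms agree for a TOS byte of 0x1c. For 0x1d (lowest bit set) only the old trie form rejects it, and for 0xfc only the new form accepts it.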
