Skip to content

Commit

Permalink
Merge branch 'fix-indirect-flow_block-infrastructure'
Browse files Browse the repository at this point in the history
Pablo Neira Ayuso says:

====================
the indirect flow_block infrastructure, revisited

This series fixes b5140a3 ("netfilter: flowtable: add indr block
setup support") that adds support for the indirect block for the
flowtable. This patch crashes the kernel with the TC CT action.

[  630.908086] BUG: kernel NULL pointer dereference, address: 00000000000000f0
[  630.908233] #PF: error_code(0x0000) - not-present page
[  630.908304] PGD 800000104addd067 P4D 800000104addd067 PUD 104311d067 PMD 0
[  630.908380] Oops: 0000 [#1] SMP PTI [  630.908615] RIP: 0010:nf_flow_table_indr_block_cb+0xc0/0x190 [nf_flow_table]
[  630.908690] Code: 5b 41 5c 41 5d 41 5e 41 5f 5d c3 4c 89 75 a0 4c 89 65 a8 4d 89 ee 49 89 dd 4c 89 fe 48 c7 c7 b7 64 36 a0 31 c0 e8 ce ed d8 e0 <49> 8b b7 f0 00 00 00 48 c7 c7 c8 64      36 a0 31 c0 e8 b9 ed d8 e0 49[  630.908790] RSP: 0018:ffffc9000895f8c0 EFLAGS: 00010246
[...]
[  630.910774] Call Trace:
[  630.911192]  ? mlx5e_rep_indr_setup_block+0x270/0x270 [mlx5_core]
[  630.911621]  ? mlx5e_rep_indr_setup_block+0x270/0x270 [mlx5_core]
[  630.912040]  ? mlx5e_rep_indr_setup_block+0x270/0x270 [mlx5_core]
[  630.912443]  flow_block_cmd+0x51/0x80
[  630.912844]  __flow_indr_block_cb_register+0x26c/0x510
[  630.913265]  mlx5e_nic_rep_netdevice_event+0x9e/0x110 [mlx5_core]
[  630.913665]  notifier_call_chain+0x53/0xa0
[  630.914063]  raw_notifier_call_chain+0x16/0x20
[  630.914466]  call_netdevice_notifiers_info+0x39/0x90
[  630.914859]  register_netdevice+0x484/0x550
[  630.915256]  __ip_tunnel_create+0x12b/0x1f0 [ip_tunnel]
[  630.915661]  ip_tunnel_init_net+0x116/0x180 [ip_tunnel]
[  630.916062]  ipgre_tap_init_net+0x22/0x30 [ip_gre]
[  630.916458]  ops_init+0x44/0x110
[  630.916851]  register_pernet_operations+0x112/0x200

A workaround patch to cure this crash has been proposed. However, there
is another problem: The indirect flow_block still does not work for the
new TC CT action. The problem is that the existing flow_indr_block_entry
callback assumes you can look up for the flowtable from the netdevice to
get the flow_block. This flow_block allows you to offload the flows via
TC_SETUP_CLSFLOWER. Unfortunately, it is not possible to get the
flow_block from the TC CT flowtables because they are _not_ bound to any
specific netdevice.

= What is the indirect flow_block infrastructure?

The indirect flow_block infrastructure allows drivers to offload
tc/netfilter rules that belong to software tunnel netdevices, e.g.
vxlan.

This indirect flow_block infrastructure relates tunnel netdevices with
drivers because there is no obvious way to relate these two things
from the control plane.

= How does the indirect flow_block work before this patchset?

Front-ends register the indirect block callback through
flow_indr_add_block_cb() if they support for offloading tunnel
netdevices.

== Setting up an indirect block

1) Drivers track tunnel netdevices via NETDEV_{REGISTER,UNREGISTER} events.
   If there is a new tunnel netdevice that the driver can offload, then the
   driver invokes __flow_indr_block_cb_register() with the new tunnel
   netdevice and the driver callback. The __flow_indr_block_cb_register()
   call iterates over the list of the front-end callbacks.

2) The front-end callback sets up the flow_block_offload structure and it
   invokes the driver callback to set up the flow_block.

3) The driver callback now registers the flow_block structure and it
   returns the flow_block back to the front-end.

4) The front-end gets the flow_block object and it is now ready to
   offload rules for this tunnel netdevice.

A simplified callgraph is represented below.

        Front-end                      Driver

                                   NETDEV_REGISTER
                                         |
                     __flow_indr_block_cb_register(netdev, cb_priv, driver_cb)
                                         | [1]
            .--------------frontend_indr_block_cb(cb_priv, driver_cb)
            |
            .
   setup_flow_block_offload(bo)
            | [2]
       driver_cb(bo, cb_priv) -----------.
                                         |
                                         \/
                                  set up flow_blocks [3]
                                         |
      add rules to flow_block <----------
      TC_SETUP_CLSFLOWER [4]

== Releasing the indirect flow_block

There are two possibilities, either tunnel netdevice is removed or
a netdevice (port representor) is removed.

=== Tunnel netdevice is removed

Driver waits for the NETDEV_UNREGISTER event that announces the tunnel
netdevice removal. Then, it calls __flow_indr_block_cb_unregister() to
remove the flow_block and rules.  Callgraph is very similar to the one
described above.

=== Netdevice is removed (port representor)

Driver calls __flow_indr_block_cb_unregister() to remove the existing
netfilter/tc rule that belong to the tunnel netdevice.

= How does the indirect flow_block work after this patchset?

Drivers register the indirect flow_block setup callback through
flow_indr_dev_register() if they support for offloading tunnel
netdevices.

== Setting up an indirect flow_block

1) Frontends check if dev->netdev_ops->ndo_setup_tc is unset. If so,
   frontends call flow_indr_dev_setup_offload(). This call invokes
   the drivers' indirect flow_block setup callback.

2) The indirect flow_block setup callback sets up a flow_block structure
   which relates the tunnel netdevice and the driver.

3) The front-end uses flow_block and offload the rules.

Note that the operational to set up (non-indirect) flow_block is very
similar.

== Releasing the indirect flow_block

=== Tunnel netdevice is removed

This calls flow_indr_dev_setup_offload() to set down the flow_block and
remove the offloaded rules. This alternate path is exercised if
dev->netdev_ops->ndo_setup_tc is unset.

=== Netdevice is removed (port representor)

If a netdevice is removed, then it might need to to clean up the
offloaded tc/netfilter rules that belongs to the tunnel netdevice:

1) The driver invokes flow_indr_dev_unregister() when a netdevice is
   removed.

2) This call iterates over the existing indirect flow_blocks
   and it invokes the cleanup callback to let the front-end remove the
   tc/netfilter rules. The cleanup callback already provides the
   flow_block that the front-end needs to clean up.

        Front-end                      Driver

                                         |
                            flow_indr_dev_unregister(...)
                                         |
                         iterate over list of indirect flow_block
                               and invoke cleanup callback
                                         |
            .-----------------------------
            |
            .
   frontend_flow_block_cleanup(flow_block)
            .
            |
           \/
   remove rules to flow_block
      TC_SETUP_CLSFLOWER

= About this patchset

This patchset aims to address the existing TC CT problem while
simplifying the indirect flow_block infrastructure. Saving 300 LoC in
the flow_offload core and the drivers. The operational gets aligned with
the (non-indirect) flow_blocks logic. Patchset is composed of:

Patch #1 add nf_flow_table_gc_cleanup() which is required by the
         netfilter's flowtable new indirect flow_block approach.

Patch #2 adds the flow_block_indr object which is actually part of
         of the flow_block object. This stores the indirect flow_block
         metadata such as the tunnel netdevice owner and the cleanup
         callback (in case the tunnel netdevice goes away).

         This patch adds flow_indr_dev_{un}register() to allow drivers
         to offer netdevice tunnel hardware offload to the front-ends.
         Then, front-ends call flow_indr_dev_setup_offload() to invoke
         the drivers to set up the (indirect) flow_block.

Patch #3 add the tcf_block_offload_init() helper function, this is
         a preparation patch to adapt the tc front-end to use this
         new indirect flow_block infrastructure.

Patch #4 updates the tc and netfilter front-ends to use the new
         indirect flow_block infrastructure.

Patch #5 updates the mlx5 driver to use the new indirect flow_block
         infrastructure.

Patch #6 updates the nfp driver to use the new indirect flow_block
         infrastructure.

Patch #7 updates the bnxt driver to use the new indirect flow_block
         infrastructure.

Patch #8 removes the indirect flow_block infrastructure version 1,
         now that frontends and drivers have been translated to
         version 2 (coming in this patchset).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
davem330 committed Jun 1, 2020
2 parents a01c245 + 709ffbe commit 2aec17f
Show file tree
Hide file tree
Showing 16 changed files with 249 additions and 595 deletions.
1 change: 0 additions & 1 deletion drivers/net/ethernet/broadcom/bnxt/bnxt.h
Original file line number Diff line number Diff line change
Expand Up @@ -1870,7 +1870,6 @@ struct bnxt {
u8 dsn[8];
struct bnxt_tc_info *tc_info;
struct list_head tc_indr_block_list;
struct notifier_block tc_netdev_nb;
struct dentry *debugfs_pdev;
struct device *hwmon_dev;
};
Expand Down
51 changes: 12 additions & 39 deletions drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
Original file line number Diff line number Diff line change
Expand Up @@ -1939,53 +1939,25 @@ static int bnxt_tc_setup_indr_block(struct net_device *netdev, struct bnxt *bp,
return 0;
}

static int bnxt_tc_setup_indr_cb(struct net_device *netdev, void *cb_priv,
enum tc_setup_type type, void *type_data)
{
switch (type) {
case TC_SETUP_BLOCK:
return bnxt_tc_setup_indr_block(netdev, cb_priv, type_data);
default:
return -EOPNOTSUPP;
}
}

static bool bnxt_is_netdev_indr_offload(struct net_device *netdev)
{
return netif_is_vxlan(netdev);
}

static int bnxt_tc_indr_block_event(struct notifier_block *nb,
unsigned long event, void *ptr)
static int bnxt_tc_setup_indr_cb(struct net_device *netdev, void *cb_priv,
enum tc_setup_type type, void *type_data)
{
struct net_device *netdev;
struct bnxt *bp;
int rc;

netdev = netdev_notifier_info_to_dev(ptr);
if (!bnxt_is_netdev_indr_offload(netdev))
return NOTIFY_OK;

bp = container_of(nb, struct bnxt, tc_netdev_nb);
return -EOPNOTSUPP;

switch (event) {
case NETDEV_REGISTER:
rc = __flow_indr_block_cb_register(netdev, bp,
bnxt_tc_setup_indr_cb,
bp);
if (rc)
netdev_info(bp->dev,
"Failed to register indirect blk: dev: %s\n",
netdev->name);
break;
case NETDEV_UNREGISTER:
__flow_indr_block_cb_unregister(netdev,
bnxt_tc_setup_indr_cb,
bp);
switch (type) {
case TC_SETUP_BLOCK:
return bnxt_tc_setup_indr_block(netdev, cb_priv, type_data);
default:
break;
}

return NOTIFY_DONE;
return -EOPNOTSUPP;
}

static const struct rhashtable_params bnxt_tc_flow_ht_params = {
Expand Down Expand Up @@ -2074,8 +2046,8 @@ int bnxt_init_tc(struct bnxt *bp)

/* init indirect block notifications */
INIT_LIST_HEAD(&bp->tc_indr_block_list);
bp->tc_netdev_nb.notifier_call = bnxt_tc_indr_block_event;
rc = register_netdevice_notifier(&bp->tc_netdev_nb);

rc = flow_indr_dev_register(bnxt_tc_setup_indr_cb, bp);
if (!rc)
return 0;

Expand All @@ -2101,7 +2073,8 @@ void bnxt_shutdown_tc(struct bnxt *bp)
if (!bnxt_tc_flower_enabled(bp))
return;

unregister_netdevice_notifier(&bp->tc_netdev_nb);
flow_indr_dev_unregister(bnxt_tc_setup_indr_cb, bp,
bnxt_tc_setup_indr_block_cb);
rhashtable_destroy(&tc_info->flow_table);
rhashtable_destroy(&tc_info->l2_table);
rhashtable_destroy(&tc_info->decap_l2_table);
Expand Down
81 changes: 8 additions & 73 deletions drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
Original file line number Diff line number Diff line change
Expand Up @@ -306,20 +306,6 @@ mlx5e_rep_indr_block_priv_lookup(struct mlx5e_rep_priv *rpriv,
return NULL;
}

static void mlx5e_rep_indr_unregister_block(struct mlx5e_rep_priv *rpriv,
struct net_device *netdev);

void mlx5e_rep_indr_clean_block_privs(struct mlx5e_rep_priv *rpriv)
{
struct mlx5e_rep_indr_block_priv *cb_priv, *temp;
struct list_head *head = &rpriv->uplink_priv.tc_indr_block_priv_list;

list_for_each_entry_safe(cb_priv, temp, head, list) {
mlx5e_rep_indr_unregister_block(rpriv, cb_priv->netdev);
kfree(cb_priv);
}
}

static int
mlx5e_rep_indr_offload(struct net_device *netdev,
struct flow_cls_offload *flower,
Expand Down Expand Up @@ -423,9 +409,14 @@ mlx5e_rep_indr_setup_block(struct net_device *netdev,
struct flow_block_offload *f,
flow_setup_cb_t *setup_cb)
{
struct mlx5e_priv *priv = netdev_priv(rpriv->netdev);
struct mlx5e_rep_indr_block_priv *indr_priv;
struct flow_block_cb *block_cb;

if (!mlx5e_tc_tun_device_to_offload(priv, netdev) &&
!(is_vlan_dev(netdev) && vlan_dev_real_dev(netdev) == rpriv->netdev))
return -EOPNOTSUPP;

if (f->binder_type != FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
return -EOPNOTSUPP;

Expand Down Expand Up @@ -492,76 +483,20 @@ int mlx5e_rep_indr_setup_cb(struct net_device *netdev, void *cb_priv,
}
}

static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv,
struct net_device *netdev)
{
int err;

err = __flow_indr_block_cb_register(netdev, rpriv,
mlx5e_rep_indr_setup_cb,
rpriv);
if (err) {
struct mlx5e_priv *priv = netdev_priv(rpriv->netdev);

mlx5_core_err(priv->mdev, "Failed to register remote block notifier for %s err=%d\n",
netdev_name(netdev), err);
}
return err;
}

static void mlx5e_rep_indr_unregister_block(struct mlx5e_rep_priv *rpriv,
struct net_device *netdev)
{
__flow_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_cb,
rpriv);
}

static int mlx5e_nic_rep_netdevice_event(struct notifier_block *nb,
unsigned long event, void *ptr)
{
struct mlx5e_rep_priv *rpriv = container_of(nb, struct mlx5e_rep_priv,
uplink_priv.netdevice_nb);
struct mlx5e_priv *priv = netdev_priv(rpriv->netdev);
struct net_device *netdev = netdev_notifier_info_to_dev(ptr);

if (!mlx5e_tc_tun_device_to_offload(priv, netdev) &&
!(is_vlan_dev(netdev) && vlan_dev_real_dev(netdev) == rpriv->netdev))
return NOTIFY_OK;

switch (event) {
case NETDEV_REGISTER:
mlx5e_rep_indr_register_block(rpriv, netdev);
break;
case NETDEV_UNREGISTER:
mlx5e_rep_indr_unregister_block(rpriv, netdev);
break;
}
return NOTIFY_OK;
}

int mlx5e_rep_tc_netdevice_event_register(struct mlx5e_rep_priv *rpriv)
{
struct mlx5_rep_uplink_priv *uplink_priv = &rpriv->uplink_priv;
int err;

/* init indirect block notifications */
INIT_LIST_HEAD(&uplink_priv->tc_indr_block_priv_list);

uplink_priv->netdevice_nb.notifier_call = mlx5e_nic_rep_netdevice_event;
err = register_netdevice_notifier_dev_net(rpriv->netdev,
&uplink_priv->netdevice_nb,
&uplink_priv->netdevice_nn);
return err;
return flow_indr_dev_register(mlx5e_rep_indr_setup_cb, rpriv);
}

void mlx5e_rep_tc_netdevice_event_unregister(struct mlx5e_rep_priv *rpriv)
{
struct mlx5_rep_uplink_priv *uplink_priv = &rpriv->uplink_priv;

/* clean indirect TC block notifications */
unregister_netdevice_notifier_dev_net(rpriv->netdev,
&uplink_priv->netdevice_nb,
&uplink_priv->netdevice_nn);
flow_indr_dev_unregister(mlx5e_rep_indr_setup_cb, rpriv,
mlx5e_rep_indr_setup_tc_cb);
}

#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
Expand Down
4 changes: 0 additions & 4 deletions drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.h
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,6 @@ void mlx5e_rep_encap_entry_detach(struct mlx5e_priv *priv,

int mlx5e_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
void *type_data);
void mlx5e_rep_indr_clean_block_privs(struct mlx5e_rep_priv *rpriv);

bool mlx5e_rep_tc_update_skb(struct mlx5_cqe64 *cqe,
struct sk_buff *skb,
Expand Down Expand Up @@ -65,9 +64,6 @@ static inline int
mlx5e_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
void *type_data) { return -EOPNOTSUPP; }

static inline void
mlx5e_rep_indr_clean_block_privs(struct mlx5e_rep_priv *rpriv) {}

struct mlx5e_tc_update_priv;
static inline bool
mlx5e_rep_tc_update_skb(struct mlx5_cqe64 *cqe,
Expand Down
1 change: 0 additions & 1 deletion drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
Original file line number Diff line number Diff line change
Expand Up @@ -1018,7 +1018,6 @@ static int mlx5e_init_rep_tx(struct mlx5e_priv *priv)
static void mlx5e_cleanup_uplink_rep_tx(struct mlx5e_rep_priv *rpriv)
{
mlx5e_rep_tc_netdevice_event_unregister(rpriv);
mlx5e_rep_indr_clean_block_privs(rpriv);
mlx5e_rep_bond_cleanup(rpriv);
mlx5e_rep_tc_cleanup(rpriv);
}
Expand Down
5 changes: 0 additions & 5 deletions drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
Original file line number Diff line number Diff line change
Expand Up @@ -69,13 +69,8 @@ struct mlx5_rep_uplink_priv {
* tc_indr_block_cb_priv_list is used to lookup indirect callback
* private data
*
* netdevice_nb is the netdev events notifier - used to register
* tunnel devices for block events
*
*/
struct list_head tc_indr_block_priv_list;
struct notifier_block netdevice_nb;
struct netdev_net_notifier netdevice_nn;

struct mlx5_tun_entropy tun_entropy;

Expand Down
11 changes: 7 additions & 4 deletions drivers/net/ethernet/netronome/nfp/flower/main.c
Original file line number Diff line number Diff line change
Expand Up @@ -830,6 +830,10 @@ static int nfp_flower_init(struct nfp_app *app)
if (err)
goto err_cleanup;

err = flow_indr_dev_register(nfp_flower_indr_setup_tc_cb, app);
if (err)
goto err_cleanup;

if (app_priv->flower_ext_feats & NFP_FL_FEATS_VF_RLIM)
nfp_flower_qos_init(app);

Expand All @@ -856,6 +860,9 @@ static void nfp_flower_clean(struct nfp_app *app)
skb_queue_purge(&app_priv->cmsg_skbs_low);
flush_work(&app_priv->cmsg_work);

flow_indr_dev_unregister(nfp_flower_indr_setup_tc_cb, app,
nfp_flower_setup_indr_block_cb);

if (app_priv->flower_ext_feats & NFP_FL_FEATS_VF_RLIM)
nfp_flower_qos_cleanup(app);

Expand Down Expand Up @@ -959,10 +966,6 @@ nfp_flower_netdev_event(struct nfp_app *app, struct net_device *netdev,
return ret;
}

ret = nfp_flower_reg_indir_block_handler(app, netdev, event);
if (ret & NOTIFY_STOP_MASK)
return ret;

ret = nfp_flower_internal_port_event_handler(app, netdev, event);
if (ret & NOTIFY_STOP_MASK)
return ret;
Expand Down
7 changes: 4 additions & 3 deletions drivers/net/ethernet/netronome/nfp/flower/main.h
Original file line number Diff line number Diff line change
Expand Up @@ -458,9 +458,10 @@ void nfp_flower_qos_cleanup(struct nfp_app *app);
int nfp_flower_setup_qos_offload(struct nfp_app *app, struct net_device *netdev,
struct tc_cls_matchall_offload *flow);
void nfp_flower_stats_rlim_reply(struct nfp_app *app, struct sk_buff *skb);
int nfp_flower_reg_indir_block_handler(struct nfp_app *app,
struct net_device *netdev,
unsigned long event);
int nfp_flower_indr_setup_tc_cb(struct net_device *netdev, void *cb_priv,
enum tc_setup_type type, void *type_data);
int nfp_flower_setup_indr_block_cb(enum tc_setup_type type, void *type_data,
void *cb_priv);

void
__nfp_flower_non_repr_priv_get(struct nfp_flower_non_repr_priv *non_repr_priv);
Expand Down
35 changes: 6 additions & 29 deletions drivers/net/ethernet/netronome/nfp/flower/offload.c
Original file line number Diff line number Diff line change
Expand Up @@ -1619,8 +1619,8 @@ nfp_flower_indr_block_cb_priv_lookup(struct nfp_app *app,
return NULL;
}

static int nfp_flower_setup_indr_block_cb(enum tc_setup_type type,
void *type_data, void *cb_priv)
int nfp_flower_setup_indr_block_cb(enum tc_setup_type type,
void *type_data, void *cb_priv)
{
struct nfp_flower_indr_block_cb_priv *priv = cb_priv;
struct flow_cls_offload *flower = type_data;
Expand Down Expand Up @@ -1708,10 +1708,13 @@ nfp_flower_setup_indr_tc_block(struct net_device *netdev, struct nfp_app *app,
return 0;
}

static int
int
nfp_flower_indr_setup_tc_cb(struct net_device *netdev, void *cb_priv,
enum tc_setup_type type, void *type_data)
{
if (!nfp_fl_is_netdev_to_offload(netdev))
return -EOPNOTSUPP;

switch (type) {
case TC_SETUP_BLOCK:
return nfp_flower_setup_indr_tc_block(netdev, cb_priv,
Expand All @@ -1720,29 +1723,3 @@ nfp_flower_indr_setup_tc_cb(struct net_device *netdev, void *cb_priv,
return -EOPNOTSUPP;
}
}

int nfp_flower_reg_indir_block_handler(struct nfp_app *app,
struct net_device *netdev,
unsigned long event)
{
int err;

if (!nfp_fl_is_netdev_to_offload(netdev))
return NOTIFY_OK;

if (event == NETDEV_REGISTER) {
err = __flow_indr_block_cb_register(netdev, app,
nfp_flower_indr_setup_tc_cb,
app);
if (err)
nfp_flower_cmsg_warn(app,
"Indirect block reg failed - %s\n",
netdev->name);
} else if (event == NETDEV_UNREGISTER) {
__flow_indr_block_cb_unregister(netdev,
nfp_flower_indr_setup_tc_cb,
app);
}

return NOTIFY_OK;
}
Loading

0 comments on commit 2aec17f

Please sign in to comment.