dpif-netdev: Per-port configurable EMC.
Conditional EMC insertion helps a lot in scenarios with high numbers
of parallel flows, but in the current implementation this option affects
all threads and ports at once. There are scenarios where different
ports carry very different numbers of flows. For example, if one of
the VMs encapsulates traffic using additional headers, it will receive
a large number of flows, but only a few flows will come out of this VM.
In this scenario it's much faster to use the EMC instead of the
classifier for traffic from the VM, but it's better to disable the EMC
for the traffic that flows to the VM.

To handle this, introduce the 'emc-enable' option to enable or
disable the EMC on a per-port basis. For example:

  ovs-vsctl set interface dpdk0 other_config:emc-enable=false

The EMC insertion probability is kept as is and applies to all ports
with 'emc-enable=true'.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
igsilya authored and istokes committed Jan 18, 2019
1 parent b9ada83 commit 2fbadeb
Showing 4 changed files with 126 additions and 12 deletions.
13 changes: 13 additions & 0 deletions Documentation/topics/dpdk/bridge.rst
@@ -101,6 +101,19 @@ observed with pmd stats::
For certain traffic profiles with many parallel flows, it's recommended to set
``N`` to '0' to achieve higher forwarding performance.

+It is also possible to enable/disable EMC on per-port basis using::
+
+$ ovs-vsctl set interface <iface> other_config:emc-enable={true,false}
+
+.. note::
+
+This could be useful for cases where different number of flows expected on
+different ports. For example, if one of the VMs encapsulates traffic using
+additional headers, it will receive large number of flows but only few flows
+will come out of this VM. In this scenario it's much faster to use EMC
+instead of classifier for traffic from the VM, but it's better to disable
+EMC for the traffic which flows to the VM.
+
For more information on the EMC refer to :doc:`/intro/install/dpdk` .


7 changes: 5 additions & 2 deletions NEWS
@@ -24,11 +24,14 @@ Post-v2.10.0
 allocated dynamically using the following syntax:
 ovn-nbctl lsp-set-addresses <port> "dynamic <IP>"
 - DPDK:
+* Add support for DPDK 18.11
+* Add support for port representors.
+- Userspace datapath:
 * Add option for simple round-robin based Rxq to PMD assignment.
 It can be set with pmd-rxq-assign.
-* Add support for DPDK 18.11
 * Add support for Auto load balancing of PMDs (experimental)
-* Add support for port representors.
+* Added new per-port configurable option to manage EMC:
+'other_config:emc-enable'.
 - Add 'symmetric_l3' hash function.
 - OVS now honors 'updelay' and 'downdelay' for bonds with LACP configured.
 - ovs-vswitchd:
99 changes: 89 additions & 10 deletions lib/dpif-netdev.c
@@ -474,6 +474,7 @@ struct dp_netdev_port {
unsigned n_rxq; /* Number of elements in 'rxqs' */
unsigned *txq_used; /* Number of threads that use each tx queue. */
struct ovs_mutex txq_used_mutex;
+bool emc_enabled; /* If true EMC will be used. */
char *type; /* Port type as requested by user. */
char *rxq_affinity_list; /* Requested affinity of rx queues. */
};
@@ -588,6 +589,7 @@ static void dp_netdev_actions_free(struct dp_netdev_actions *);
struct polled_queue {
struct dp_netdev_rxq *rxq;
odp_port_t port_no;
+bool emc_enabled;
};

/* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
@@ -617,6 +619,8 @@ struct dp_netdev_pmd_thread_ctx {
long long now;
/* RX queue from which last packet was received. */
struct dp_netdev_rxq *last_rxq;
+/* EMC insertion probability context for the current processing cycle. */
+uint32_t emc_insert_min;
};

/* PMD: Poll modes drivers. PMD accesses devices via polling to eliminate
@@ -1798,6 +1802,7 @@ port_create(const char *devname, const char *type,
port->netdev = netdev;
port->type = xstrdup(type);
port->sf = sf;
+port->emc_enabled = true;
port->need_reconfigure = true;
ovs_mutex_init(&port->txq_used_mutex);

@@ -2830,9 +2835,7 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd,
* default the value is UINT32_MAX / 100 which yields an insertion
* probability of 1/100 ie. 1% */

-uint32_t min;
-
-atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);
+uint32_t min = pmd->ctx.emc_insert_min;

if (min && random_uint32() <= min) {
emc_insert(&(pmd->flow_cache).emc_cache, key, flow);
@@ -3698,7 +3701,8 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute)
ovs_mutex_lock(&dp->non_pmd_mutex);
}

-/* Update current time in PMD context. */
+/* Update current time in PMD context. We don't care about EMC insertion
+* probability, because we are on a slow path. */
pmd_thread_ctx_time_update(pmd);

/* The action processing expects the RSS hash to be valid, because
@@ -3842,7 +3846,7 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
if (insert_min != cur_min) {
atomic_store_relaxed(&dp->emc_insert_min, insert_min);
if (insert_min == 0) {
-VLOG_INFO("EMC has been disabled");
+VLOG_INFO("EMC insertion probability changed to zero");
} else {
VLOG_INFO("EMC insertion probability changed to 1/%llu (~%.2f%%)",
insert_prob, (100 / (float)insert_prob));
@@ -3965,8 +3969,29 @@ dpif_netdev_port_set_rxq_affinity(struct dp_netdev_port *port,
return error;
}

-/* Changes the affinity of port's rx queues. The changes are actually applied
-* in dpif_netdev_run(). */
+/* Returns 'true' if one of the 'port's RX queues exists in 'poll_list'
+* of given PMD thread. */
+static bool
+dpif_netdev_pmd_polls_port(struct dp_netdev_pmd_thread *pmd,
+struct dp_netdev_port *port)
+OVS_EXCLUDED(pmd->port_mutex)
+{
+struct rxq_poll *poll;
+bool found = false;
+
+ovs_mutex_lock(&pmd->port_mutex);
+HMAP_FOR_EACH (poll, node, &pmd->poll_list) {
+if (port == poll->rxq->port) {
+found = true;
+break;
+}
+}
+ovs_mutex_unlock(&pmd->port_mutex);
+return found;
+}
+
+/* Updates port configuration from the database. The changes are actually
+* applied in dpif_netdev_run(). */
static int
dpif_netdev_port_set_config(struct dpif *dpif, odp_port_t port_no,
const struct smap *cfg)
@@ -3975,10 +4000,49 @@ dpif_netdev_port_set_config(struct dpif *dpif, odp_port_t port_no,
struct dp_netdev_port *port;
int error = 0;
const char *affinity_list = smap_get(cfg, "pmd-rxq-affinity");
+bool emc_enabled = smap_get_bool(cfg, "emc-enable", true);

ovs_mutex_lock(&dp->port_mutex);
error = get_port_by_number(dp, port_no, &port);
-if (error || !netdev_is_pmd(port->netdev)
+if (error) {
+goto unlock;
+}
+
+if (emc_enabled != port->emc_enabled) {
+struct dp_netdev_pmd_thread *pmd;
+struct ds ds = DS_EMPTY_INITIALIZER;
+uint32_t cur_min, insert_prob;
+
+port->emc_enabled = emc_enabled;
+/* Mark for reload all the threads that polls this port and request
+* for reconfiguration for the actual reloading of threads. */
+CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
+if (dpif_netdev_pmd_polls_port(pmd, port)) {
+pmd->need_reload = true;
+}
+}
+dp_netdev_request_reconfigure(dp);
+
+ds_put_format(&ds, "%s: EMC has been %s.",
+netdev_get_name(port->netdev),
+(emc_enabled) ? "enabled" : "disabled");
+if (emc_enabled) {
+ds_put_cstr(&ds, " Current insertion probability is ");
+atomic_read_relaxed(&dp->emc_insert_min, &cur_min);
+if (!cur_min) {
+ds_put_cstr(&ds, "zero.");
+} else {
+insert_prob = UINT32_MAX / cur_min;
+ds_put_format(&ds, "1/%"PRIu32" (~%.2f%%).",
+insert_prob, 100 / (float) insert_prob);
+}
+}
+VLOG_INFO("%s", ds_cstr(&ds));
+ds_destroy(&ds);
+}
+
+/* Checking for RXq affinity changes. */
+if (!netdev_is_pmd(port->netdev)
|| nullable_string_is_equal(affinity_list, port->rxq_affinity_list)) {
goto unlock;
}
@@ -5123,6 +5187,13 @@ dpif_netdev_run(struct dpif *dpif)
if (!netdev_is_pmd(port->netdev)) {
int i;

+if (port->emc_enabled) {
+atomic_read_relaxed(&dp->emc_insert_min,
+&non_pmd->ctx.emc_insert_min);
+} else {
+non_pmd->ctx.emc_insert_min = 0;
+}
+
for (i = 0; i < port->n_rxq; i++) {
if (dp_netdev_process_rxq_port(non_pmd,
&port->rxqs[i],
@@ -5296,6 +5367,7 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread *pmd,
HMAP_FOR_EACH (poll, node, &pmd->poll_list) {
poll_list[i].rxq = poll->rxq;
poll_list[i].port_no = poll->rxq->port->port_no;
+poll_list[i].emc_enabled = poll->rxq->port->emc_enabled;
i++;
}

@@ -5360,6 +5432,14 @@ pmd_thread_main(void *f_)
pmd_perf_start_iteration(s);

for (i = 0; i < poll_cnt; i++) {
+
+if (poll_list[i].emc_enabled) {
+atomic_read_relaxed(&pmd->dp->emc_insert_min,
+&pmd->ctx.emc_insert_min);
+} else {
+pmd->ctx.emc_insert_min = 0;
+}
+
process_packets =
dp_netdev_process_rxq_port(pmd, poll_list[i].rxq,
poll_list[i].port_no);
@@ -6301,15 +6381,14 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
struct dfc_cache *cache = &pmd->flow_cache;
struct dp_packet *packet;
const size_t cnt = dp_packet_batch_size(packets_);
-uint32_t cur_min;
+uint32_t cur_min = pmd->ctx.emc_insert_min;
int i;
uint16_t tcp_flags;
bool smc_enable_db;
size_t map_cnt = 0;
bool batch_enable = true;

atomic_read_relaxed(&pmd->dp->smc_enable_db, &smc_enable_db);
-atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
pmd_perf_update_counter(&pmd->perf_stats,
md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV,
cnt);
19 changes: 19 additions & 0 deletions vswitchd/vswitch.xml
@@ -3101,6 +3101,25 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
</column>
</group>

+<group title="EMC (Exact Match Cache) Configuration">
+<p>
+These settings controls behaviour of EMC lookups/insertions for packets
+received from the interface.
+</p>
+
+<column name="other_config" key="emc-enable" type='{"type": "boolean"}'>
+<p>
+Specifies if Exact Match Cache (EMC) should be used while processing
+packets received from this interface.
+If true, <ref table="Open_vSwitch" column="other_config"
+key="emc-insert-inv-prob"/> will have effect on this interface.
+</p>
+<p>
+Defaults to true.
+</p>
+</column>
+</group>
+
<group title="MTU">
<p>
The MTU (maximum transmission unit) is the largest amount of data