OVS DPDK performance evaluation

Setup:

Nmap and the sfr_delay_10_1g profile from trex were used to generate traffic that creates around 200k active flows.
Nmap test: nmap -p- 7.7.7.101-200
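
As a rough sanity check on the traffic volume, the number of probes in one scan can be counted. The sketch below assumes that nmap's -p- option covers ports 1 to 65535 and that the target range holds 100 addresses; count_probes is a hypothetical helper name, not part of nmap.

```python
import ipaddress


def count_probes(first_ip: str, last_ip: str, ports: int = 65535) -> int:
    """Return the number of (host, port) pairs touched by a full port scan."""
    first = int(ipaddress.IPv4Address(first_ip))
    last = int(ipaddress.IPv4Address(last_ip))
    hosts = last - first + 1  # inclusive range, e.g. .101 to .200 is 100 hosts
    return hosts * ports


total = count_probes("7.7.7.101", "7.7.7.200")
print(total)  # 6553500
```

The per-PMD hit counters reported in the results are close to twice this figure, which would be consistent with one probe and one reply per port, although that reading is an assumption.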

Env for nmap


Trex sfr test: t-rex-64 -f avl/sfr_delay_10_1g.yaml -c 1 -m 10 -d 100 -p
Config:

- port_limit : 2
  version : 2
  # List of interfaces. Change to suit your setup. Use ./dpdk_setup_ports.py -s to see available options
  interfaces : ["00:02.0","00:03.0"]
  port_info : # set eh mac addr
    - dest_mac : [0x52,0x54,0x00,0x02,0xd9,0x04] # port 0
      src_mac : [0x52,0x54,0x00,0x02,0xd9,0x03]
    - dest_mac : [0x52,0x54,0x00,0x02,0xd9,0x03] # port 1
      src_mac : [0x52,0x54,0x00,0x02,0xd9,0x04]

Env for trex

[Figure: ovs-dpdk-performance-evaluation1.png]

Flow patterns with varying EMC and Megaflow configuration:

Running the tests with different probabilities of insertion into the EMC, and with megaflow enabled or disabled, might help in measuring the performance of upcalls, revalidators and flow expiration.
After running the above two tests with megaflow disabled/enabled and different EMC values, it was observed that megaflow needs to be disabled to create more active flows.
It was not possible to obtain 200k active flows with the above two tests when megaflow is enabled. For example, with megaflow enabled the maximum number of flows generated was 104:

flows : (current 0) (avg 0) (max 104)
pmd thread numa_id 0 core_id 0:
emc hits:8
megaflow hits:13107353
avg. subtable lookups per hit:1.01
miss:143
But with megaflow disabled, the maximum number of flows obtained was 219271:
flows : (current 0) (avg 0) (max 219271) (limit 200000)
pmd thread numa_id 0 core_id 0:
emc hits:9
megaflow hits:0
avg. subtable lookups per hit:0.00
miss:13107826

The probability of insertion into the EMC does not seem to affect the number of flows generated while running the tests. Around 200k flows were generated for each tested value of emc-insert-inv-prob.
emc-insert-inv-prob=100, the default value
flows : (current 0) (avg 0) (max 219271)
emc-insert-inv-prob=1, 100% probability of insertion to EMC
flows : (current 0) (avg 0) (max 208370) (limit 200000)
emc-insert-inv-prob=0, 0% probability of insertion to EMC, EMC disabled
flows : (current 0) (avg 0) (max 217570) (limit 200000)
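
emc-insert-inv-prob is an inverse probability: a value of N means that roughly one in N eligible flows is inserted into the EMC, and 0 turns insertion off. A minimal sketch of that mapping, covering the three values tested above (emc_insert_probability is a hypothetical helper name, not an OVS function):

```python
def emc_insert_probability(inv_prob: int) -> float:
    """Map an emc-insert-inv-prob value to an insertion probability."""
    if inv_prob < 0:
        raise ValueError("emc-insert-inv-prob must not be negative")
    if inv_prob == 0:
        return 0.0  # insertion disabled, the EMC is effectively off
    return 1.0 / inv_prob


for value in (100, 1, 0):
    print(value, f"{emc_insert_probability(value):.0%}")
```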

The article ovs-dpdk-datapath-classifier-part-2 is currently used as the reference for tracing and profiling the call graph for packet processing in OVS.

Up to 15% of cycles are spent in revalidate with megaflow disabled:

+ 7.98% 0.00% urcu1 ovs-vswitchd [.] ovsthread_wrapper
+ 7.98% 0.00% urcu1 libpthread-2.23.so [.] start_thread
+ 7.98% 0.00% urcu1 libc-2.23.so [.] __clone
+ 7.98% 0.08% urcu1 ovs-vswitchd [.] ovsrcu_postpone_thread
6.80% 6.80% revalidator12 ovs-vswitchd [.] revalidate.isra.19

Below are the extended results of the test cases with megaflow enabled/disabled and different emc-insert-inv-prob values.

Test                  emc-insert-inv-prob  Megaflow  EMC hits  Megaflow hits  Miss
OVS 2.6 – nmap        Default              Enabled   8         13107353       143
OVS 2.6 – trex        Default              Enabled   13592393  7924727        5
OVS 2.6 – nmap        Default              Disabled  9         0              13107826
OVS 2.6 – trex        Default              Disabled  67540150  119919463      27677119
OVS 2.6 – nmap        1                    Disabled  22        0              13107705
OVS 2.6 – nmap        0                    Disabled  8         0              13108184
OVS (latest) – nmap   0                    Enabled   0         13107361       146
OVS (latest) – nmap   1                    Enabled   25        13107476       155
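
The raw counters are easier to compare as shares of all lookups. The sketch below takes three rows from the results above and works out the fraction of packets resolved at each stage; lookup_shares and the row labels are hypothetical names chosen for illustration.

```python
def lookup_shares(emc_hits: int, megaflow_hits: int, miss: int) -> dict:
    """Return the fraction of lookups resolved by the EMC, megaflow or a miss."""
    total = emc_hits + megaflow_hits + miss
    return {
        "emc": emc_hits / total,
        "megaflow": megaflow_hits / total,
        "miss": miss / total,
    }


# (EMC hits, megaflow hits, miss), copied from the results above
ROWS = {
    "OVS 2.6 nmap, megaflow enabled": (8, 13107353, 143),
    "OVS 2.6 nmap, megaflow disabled": (9, 0, 13107826),
    "OVS 2.6 trex, megaflow enabled": (13592393, 7924727, 5),
}

for label, counters in ROWS.items():
    shares = lookup_shares(*counters)
    print(label, {name: f"{value:.4%}" for name, value in shares.items()})
```

With megaflow disabled, practically every nmap packet is a miss, whereas the trex profile repeats flows often enough for the EMC to absorb a large share of the traffic.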

nmap with megaflow enabled and default emc-insert-inv-prob

pmd thread numa_id 0 core_id 0:
emc hits:8
megaflow hits:13107353
avg. subtable lookups per hit:1.01
miss:143
lost:0
polling cycles:224722269058 (87.28%)
processing cycles:32742614492 (12.72%)
avg cycles per packet: 9781.99 (257464883550/26320306)
avg processing cycles per packet: 1244.01 (32742614492/26320306)
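
The derived figures in this output follow directly from the raw counters: total cycles are polling plus processing cycles, and the per-packet averages divide by the packet count shown in parentheses. A small sketch that parses the block above and re-derives them (parse_pmd_stats is a hypothetical helper, and the parsing assumes the "name:value" layout shown here):

```python
import re

SAMPLE = """\
pmd thread numa_id 0 core_id 0:
emc hits:8
megaflow hits:13107353
avg. subtable lookups per hit:1.01
miss:143
lost:0
polling cycles:224722269058 (87.28%)
processing cycles:32742614492 (12.72%)
avg cycles per packet: 9781.99 (257464883550/26320306)
avg processing cycles per packet: 1244.01 (32742614492/26320306)
"""


def parse_pmd_stats(text: str) -> dict:
    """Parse 'name:value' lines; also extract the packet count denominator."""
    stats = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        match = re.match(r"\s*(\d+(?:\.\d+)?)", rest)
        if sep and match:
            number = match.group(1)
            stats[key.strip()] = float(number) if "." in number else int(number)
        ratio = re.search(r"\((\d+)/(\d+)\)", rest)
        if ratio and key.strip() == "avg cycles per packet":
            stats["packets"] = int(ratio.group(2))
    return stats


stats = parse_pmd_stats(SAMPLE)
total = stats["polling cycles"] + stats["processing cycles"]
print("total cycles:", total)
print("polling share: %.2f%%" % (100 * stats["polling cycles"] / total))
print("avg cycles per packet: %.2f" % (total / stats["packets"]))
print("avg processing cycles per packet: %.2f"
      % (stats["processing cycles"] / stats["packets"]))
```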

trex with megaflow enabled and default emc-insert-inv-prob

pmd thread numa_id 0 core_id 0:
emc hits:13592393
megaflow hits:7924727
avg. subtable lookups per hit:1.00
miss:5
lost:0
polling cycles:444135983951 (96.67%)
processing cycles:15289810525 (3.33%)
avg cycles per packet: 15596.58 (459425794476/29456829)
avg processing cycles per packet: 519.06 (15289810525/29456829)

nmap with megaflow disabled and default emc-insert-inv-prob

netdev@ovs-netdev:
flows : (current 0) (avg 0) (max 219271) (limit 200000)
dump duration : 1ms
ufid enabled : true
7: (keys 0)
pmd thread numa_id 0 core_id 0:
emc hits:9
megaflow hits:0
avg. subtable lookups per hit:0.00
miss:13107826
lost:0
polling cycles:494746899819 (73.38%)
processing cycles:179503783259 (26.62%)
avg cycles per packet: 51438.75 (674250683078/13107835)
avg processing cycles per packet: 13694.39 (179503783259/13107835)

trex with megaflow disabled and default emc-insert-inv-prob

netdev@ovs-netdev:
flows : (current 0) (avg 0) (max 209715) (limit 200000)
dump duration : 1ms
ufid enabled : true
7: (keys 0)
pmd thread numa_id 0 core_id 0:
emc hits:67540150
megaflow hits:119919463
avg. subtable lookups per hit:1.42
miss:27677119
lost:0
polling cycles:604568670107 (29.01%)
processing cycles:1479751125672 (70.99%)
avg cycles per packet: 5401.76 (2084319795779/385859589)
avg processing cycles per packet: 3834.95 (1479751125672/385859589)

nmap with megaflow disabled and emc-insert-inv-prob 1

[root@localhost openvswitch-2.6.1]# ./utilities/ovs-appctl upcall/show
netdev@ovs-netdev:
flows : (current 0) (avg 0) (max 208370) (limit 200000)
dump duration : 1ms
ufid enabled : true
6: (keys 0)
[root@localhost openvswitch-2.6.1]# ./utilities/ovs-appctl dpctl/dump-flows netdev@ovs-netdev
[root@localhost openvswitch-2.6.1]# ./utilities/ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 0 core_id 0:
emc hits:22
megaflow hits:0
avg. subtable lookups per hit:0.00
miss:13107705
lost:0
polling cycles:476162914613 (73.36%)
processing cycles:172889441682 (26.64%)
avg cycles per packet: 49516.77 (649052356295/13107727)
avg processing cycles per packet: 13189.89 (172889441682/13107727)

nmap with megaflow disabled and emc-insert-inv-prob 0

[root@localhost openvswitch-2.6.1]# ./utilities/ovs-appctl upcall/show
netdev@ovs-netdev:
flows : (current 0) (avg 0) (max 217570) (limit 200000)
dump duration : 1ms
ufid enabled : true
7: (keys 0)
pmd thread numa_id 0 core_id 0:
emc hits:8
megaflow hits:0
avg. subtable lookups per hit:0.00
miss:13108184
lost:0
polling cycles:168494002964 (48.21%)
processing cycles:181018583110 (51.79%)
avg cycles per packet: 26663.68 (349512586074/13108192)
avg processing cycles per packet: 13809.58 (181018583110/13108192)

nmap with patched ovs, megaflow enabled and emc-insert-inv-prob 0

pmd thread numa_id 0 core_id 0:
emc hits:0
megaflow hits:13107361
avg. subtable lookups per hit:1.02
miss:146
lost:0
polling cycles:140234173492 (68.20%)
processing cycles:65401982081 (31.80%)
avg cycles per packet: 7779.18 (205636155573/26434165)
avg processing cycles per packet: 2474.15 (65401982081/26434165)

nmap with patched ovs, megaflow enabled and emc-insert-inv-prob 1

flows : (current 0) (avg 0) (max 104) (limit 10000)
dump duration : 1ms
ufid enabled : true
6: (keys 0)
pmd thread numa_id 0 core_id 0:
emc hits:25
megaflow hits:13107476
avg. subtable lookups per hit:1.00
miss:155
lost:0
polling cycles:597268748462 (89.70%)
processing cycles:68602128353 (10.30%)
avg cycles per packet: 25362.44 (665870876815/26254210)
avg processing cycles per packet: 2613.00 (68602128353/26254210)

OVS call stack

dp_netdev_process_rxq_port
|
|-- netdev_rxq_recv
|
|-- dp_netdev_input
|
|-- dp_netdev_input__
|
|-- emc_processing ## an EMC miss leads to fast_path_processing
|
|-- fast_path_processing
|-- dp_netdev_pmd_lookup_dpcls ## get the classifier for the port
|-- dpcls_lookup ## causes an upcall when it fails to match in the subtables
|
|-- netdev_flow_key_hash_in_mask
|
|-- handle_packet_upcall ## called with dpif_upcall_type type=DPIF_UC_MISS
|
|-- dp_netdev_upcall
|
|-- upcall_cb ## ofproto/ofproto-dpif-upcall.c
|
|-- upcall_receive
|-- process_upcall
|
|-- classify_upcall ## identifies the type of upcall: MISS_UPCALL, SFLOW_UPCALL
|-- upcall_xlate ## in case of MISS_UPCALL
|
|-- ukey_create_from_upcall
|
|-- flow_wildcards_init_for_packet ## converts a flow into flow wildcards
|-- emc_probabilistic_insert ## add to the EMC
|-- emc_insert
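
The lookup order in this call stack (EMC first, then the megaflow classifier, then an upcall on a miss) can be illustrated with a toy model. This is not OVS code: ToyDatapath, Packet and run_scan are hypothetical names, the EMC is unbounded and always inserts, and the model assumes that with no OpenFlow rule matching on IP or port all header fields are wildcarded when megaflow is enabled.

```python
from itertools import product
from typing import NamedTuple


class Packet(NamedTuple):
    dst_ip: str
    dst_port: int


class ToyDatapath:
    """Toy model: exact-match cache, then flow table, then upcall on miss."""

    def __init__(self, megaflow_enabled: bool) -> None:
        self.megaflow_enabled = megaflow_enabled
        self.emc = set()    # exact packet headers seen before
        self.flows = set()  # installed datapath flows
        self.stats = {"emc hits": 0, "megaflow hits": 0, "miss": 0}

    def _flow_key(self, pkt: Packet):
        # Assumption: megaflow enabled wildcards every field, so all packets
        # share one key; megaflow disabled keys the flow on the exact header.
        return ("*", "*") if self.megaflow_enabled else tuple(pkt)

    def process(self, pkt: Packet) -> None:
        if pkt in self.emc:
            self.stats["emc hits"] += 1
            return
        key = self._flow_key(pkt)
        if key in self.flows:
            self.stats["megaflow hits"] += 1
        else:
            self.stats["miss"] += 1  # stands in for the upcall path
            self.flows.add(key)
        self.emc.add(pkt)


def run_scan(megaflow_enabled: bool) -> ToyDatapath:
    dp = ToyDatapath(megaflow_enabled)
    hosts = [f"7.7.7.{h}" for h in range(101, 201)]
    for ip, port in product(hosts, range(1, 201)):  # 200 ports, not 65535
        dp.process(Packet(ip, port))
    return dp


for enabled in (True, False):
    dp = run_scan(enabled)
    print("megaflow enabled:", enabled, "flows:", len(dp.flows), dp.stats)
```

Even in this simplified form the pattern resembles the nmap results: with megaflow enabled a single wildcarded flow absorbs the whole scan, while with megaflow disabled every probe is a miss and installs its own flow.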

Megaflow hit pattern

For a given flow rule, evaluate the creation of megaflows and the distribution of megaflow hits among the generated flows.

Traffic pattern:
Nmap was used as the traffic generator, with the nmap client VM IP set to 7.7.7.1 and the nmap target VM set to the range 7.7.7.101-200.

Traffic generator command: nmap -p- 7.7.7.101-200, run on the client

Scenarios considered for analysis:

  • Default setup with no rules
  • Behavior (flows generated and megaflow hits) when rules are added to drop packets
  • With varying distribution of ports
  • Subset of rule with priority
  • Overlapping negative rule with priority

Default setup with no rules:

Both TCP and UDP use the same flow.

For UDP and TCP traffic, the flow is created based on recirc_id(0),in_port(2),eth(src=52:54:00:02:d9:03,dst=52:54:00:02:d9:04),eth_type(0x0800),ipv4(frag=no), hence both TCP and UDP match the same flow.

With varying distribution of ports

Drop on a single port

priority=65535,tcp,tp_dst=90
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=310.612s, table=0, n_packets=200, n_bytes=11600, idle_age=223, priority=65535,tcp,tp_dst=90 actions=drop
cookie=0x0, duration=14784.512s, table=0, n_packets=131050450, n_bytes=7338758212, idle_age=211, priority=0 actions=NORMAL

pmd thread numa_id 0 core_id 0:
emc hits:0
megaflow hits:26215100
avg. subtable lookups per hit:2.02
miss:339
lost:0
polling cycles:1718918822602 (92.20%)
processing cycles:145365311077 (7.80%)
avg cycles per packet: 23565.15 (1864284133679/79111900)
avg processing cycles per packet: 1837.46 (145365311077/79111900)

Drop on multiple ports 90,91 (adjacent)

NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=703.939s, table=0, n_packets=400, n_bytes=23200, idle_age=182, priority=65535,tcp,tp_dst=90 actions=drop
cookie=0x0, duration=262.234s, table=0, n_packets=200, n_bytes=11600, idle_age=187, priority=65535,tcp,tp_dst=91 actions=drop
cookie=0x0, duration=15177.839s, table=0, n_packets=144157608, n_bytes=8072752848, idle_age=169, priority=0 actions=NORMAL

pmd thread numa_id 0 core_id 0:
emc hits:0
megaflow hits:13107424
avg. subtable lookups per hit:1.55
miss:134
lost:0
polling cycles:456656331253 (87.01%)
processing cycles:68160626106 (12.99%)
avg cycles per packet: 15711.41 (524816957359/33403561)
avg processing cycles per packet: 2040.52 (68160626106/33403561)

Drop on multiple ports 22,5900 (random)

NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=916.133s, table=0, n_packets=495, n_bytes=28710, idle_age=234, priority=65535,tcp,tp_dst=22 actions=drop
cookie=0x0, duration=908.148s, table=0, n_packets=495, n_bytes=28710, idle_age=234, priority=65535,tcp,tp_dst=5900 actions=drop
cookie=0x0, duration=1058.716s, table=0, n_packets=26214175, n_bytes=1467981542, idle_age=211, priority=0 actions=NORMAL

pmd thread numa_id 0 core_id 0:
emc hits:0
megaflow hits:13107248
avg. subtable lookups per hit:1.54
miss:148
lost:0
polling cycles:521231962931 (88.32%)
processing cycles:68939617774 (11.68%)
avg cycles per packet: 17742.96 (590171580705/33262300)
avg processing cycles per packet: 2072.61 (68939617774/33262300)

Drop on range of ports (using mask)

NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=763.930s, table=0, n_packets=1632, n_bytes=94656, idle_age=638, priority=65535,tcp,tp_dst=0x58/0xfff8 actions=drop
cookie=0x0, duration=16429.639s, table=0, n_packets=170369811, n_bytes=9540625486, idle_age=638, priority=0 actions=NORMAL

pmd thread numa_id 0 core_id 0:
emc hits:0
megaflow hits:13107414
avg. subtable lookups per hit:1.52
miss:123
lost:0
polling cycles:1433383284938 (95.52%)
processing cycles:67299002441 (4.48%)
avg cycles per packet: 45368.62 (1500682287379/33077539)
avg processing cycles per packet: 2034.58 (67299002441/33077539)

Drop on range of ports (individual flows)

NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=1071.015s, table=0, n_packets=600, n_bytes=34800, idle_age=231, priority=65535,tcp,tp_dst=90 actions=drop
cookie=0x0, duration=629.310s, table=0, n_packets=400, n_bytes=23200, idle_age=224, priority=65535,tcp,tp_dst=91 actions=drop
cookie=0x0, duration=332.275s, table=0, n_packets=200, n_bytes=11600, idle_age=227, priority=65535,tcp,tp_dst=92 actions=drop
cookie=0x0, duration=329.065s, table=0, n_packets=200, n_bytes=11600, idle_age=224, priority=65535,tcp,tp_dst=93 actions=drop
cookie=0x0, duration=325.884s, table=0, n_packets=200, n_bytes=11600, idle_age=232, priority=65535,tcp,tp_dst=94 actions=drop
cookie=0x0, duration=321.222s, table=0, n_packets=200, n_bytes=11600, idle_age=243, priority=65535,tcp,tp_dst=95 actions=drop
cookie=0x0, duration=15544.915s, table=0, n_packets=157263906, n_bytes=8806700164, idle_age=220, priority=0 actions=NORMAL

pmd thread numa_id 0 core_id 0:
emc hits:0
megaflow hits:13107353
avg. subtable lookups per hit:1.55
miss:145
lost:0
polling cycles:573051591938 (89.02%)
processing cycles:70647647633 (10.98%)
avg cycles per packet: 19288.41 (643699239571/33372334)
avg processing cycles per packet: 2116.95 (70647647633/33372334)

Observation:

A flow rule using a mask results in far fewer flows than the equivalent set of rules matching individual ports.
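
To see which ports a masked match such as tp_dst=0x58/0xfff8 covers, the value/mask pair can be expanded, and a port range can be decomposed back into the fewest value/mask pairs. The sketch below uses hypothetical helper names (ports_matching, range_to_masks) and a standard prefix decomposition; it is not taken from OVS.

```python
from typing import List, Tuple


def ports_matching(value: int, mask: int) -> List[int]:
    """List every 16-bit port p for which p & mask == value & mask."""
    return [p for p in range(0x10000) if p & mask == value & mask]


def range_to_masks(lo: int, hi: int) -> List[Tuple[int, int]]:
    """Split the inclusive port range [lo, hi] into aligned value/mask pairs."""
    pairs = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = lo & -lo if lo else 0x10000
        # ... and does not run past hi.
        while size > hi - lo + 1:
            size >>= 1
        pairs.append((lo, 0xFFFF & ~(size - 1)))
        lo += size
    return pairs


print(ports_matching(0x58, 0xFFF8))  # [88, 89, 90, 91, 92, 93, 94, 95]
print([(hex(v), hex(m)) for v, m in range_to_masks(90, 95)])
```

The masked rule covers ports 88 to 95 with a single entry, while the per-port scenario above installs six rules for ports 90 to 95; covering exactly 90 to 95 with masks would take two entries, 0x5a/0xfffe and 0x5c/0xfffc.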

Subset of rule with priority

Drop on IP and port with different priority rule

NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=506.260s, table=0, n_packets=16, n_bytes=928, idle_age=436, priority=65535,tcp,nw_dst=7.7.7.104,tp_dst=0x58/0xfff8 actions=drop
cookie=0x0, duration=534.825s, table=0, n_packets=3488, n_bytes=202304, idle_age=343, priority=65534,tcp,tp_dst=0x50/0xfff0 actions=drop
cookie=0x0, duration=14320.076s, table=0, n_packets=117943152, n_bytes=6604754896, idle_age=342, priority=0 actions=NORMAL

pmd thread numa_id 0 core_id 0:
emc hits:0
megaflow hits:13107736
avg. subtable lookups per hit:2.47
miss:205
lost:0
polling cycles:839604875890 (91.50%)
processing cycles:77960165781 (8.50%)
avg cycles per packet: 20171.47 (917565041671/45488269)
avg processing cycles per packet: 1713.85 (77960165781/45488269)

Drop on IP and port with same priority rule

NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=13172.593s, table=0, n_packets=23376, n_bytes=1355808, idle_age=734, priority=65535,tcp,tp_dst=0x50/0xfff0 actions=drop
cookie=0x0, duration=13132.160s, table=0, n_packets=0, n_bytes=0, idle_age=13132, priority=65535,tcp,nw_dst=7.7.7.104,tp_dst=0x58/0xfff8 actions=drop
cookie=0x0, duration=13531.996s, table=0, n_packets=104838715, n_bytes=5870913438, idle_age=734, priority=0 actions=NORMAL

pmd thread numa_id 0 core_id 0:
emc hits:0
megaflow hits:13107488
avg. subtable lookups per hit:2.40
miss:202
lost:0
polling cycles:1040592199369 (92.92%)
processing cycles:79341061547 (7.08%)
avg cycles per packet: 25109.17 (1119933260916/44602553)
avg processing cycles per packet: 1778.85 (79341061547/44602553)

Observation:

No observable difference in the number of flows when a flow rule with the same or a different priority is a subset of another rule.

Overlapping negative rule with priority

Higher-priority rule that allows packets on a port, over a subset of the range of ports to drop

NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=485.046s, table=0, n_packets=1, n_bytes=58, idle_age=405, priority=65535,tcp,nw_dst=7.7.7.104,tp_dst=90 actions=output:2
cookie=0x0, duration=460.826s, table=0, n_packets=294, n_bytes=17052, idle_age=345, priority=65533,tcp,tp_dst=90 actions=drop
cookie=0x0, duration=22605.741s, table=0, n_packets=66186128, n_bytes=3706393004, idle_age=337, priority=0 actions=NORMAL

pmd thread numa_id 0 core_id 0:
emc hits:0
megaflow hits:13107651
avg. subtable lookups per hit:2.62
miss:241
lost:0
polling cycles:793656888377 (91.15%)
processing cycles:77049701271 (8.85%)
avg cycles per packet: 18327.99 (870706589648/47506933)
avg processing cycles per packet: 1621.86 (77049701271/47506933)

Higher-priority rule that allows packets on multiple ports, over a subset of the range of ports to drop

NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=1195.522s, table=0, n_packets=2, n_bytes=116, idle_age=273, priority=65535,tcp,nw_dst=7.7.7.104,tp_dst=90 actions=output:2
cookie=0x0, duration=1171.302s, table=0, n_packets=558, n_bytes=32364, idle_age=214, priority=65533,tcp,tp_dst=90 actions=drop
cookie=0x0, duration=298.201s, table=0, n_packets=266, n_bytes=15428, idle_age=210, priority=65533,tcp,tp_dst=91 actions=drop
cookie=0x0, duration=23316.217s, table=0, n_packets=79293342, n_bytes=4440391844, idle_age=202, priority=0 actions=NORMAL

pmd thread numa_id 0 core_id 0:
emc hits:0
megaflow hits:13107492
avg. subtable lookups per hit:2.66
miss:253
lost:0
polling cycles:510428258044 (86.65%)
processing cycles:78637881424 (13.35%)
avg cycles per packet: 12268.58 (589066139468/48014208)
avg processing cycles per packet: 1637.80 (78637881424/48014208)

Observation:

Observable increase in the number of flows in the presence of a rule with a different action on a port.
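
How the overlapping rules resolve can be sketched with a toy priority classifier built from the first rule set in this scenario. This is only an illustration: Rule and classify are hypothetical names, all packets are assumed to be TCP, and the linear search is not how OVS looks up rules.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    priority: int
    nw_dst: Optional[str]  # None means the field is wildcarded
    tp_dst: Optional[int]
    action: str


RULES = [
    Rule(65535, "7.7.7.104", 90, "output:2"),
    Rule(65533, None, 90, "drop"),
    Rule(0, None, None, "NORMAL"),
]


def classify(nw_dst: str, tp_dst: int, rules=RULES) -> str:
    """Return the action of the highest-priority rule matching the packet."""
    matching = [
        r for r in rules
        if r.nw_dst in (None, nw_dst) and r.tp_dst in (None, tp_dst)
    ]
    return max(matching, key=lambda r: r.priority).action


print(classify("7.7.7.104", 90))  # output:2
print(classify("7.7.7.105", 90))  # drop
print(classify("7.7.7.104", 91))  # NORMAL
```

Telling the first two rules apart requires looking at the destination IP as well as the port, which is one plausible reason for the higher miss count in this scenario.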

Overall observation based on the above flow rules:

In the presence of a high number of flow rules (for example, rules that match on port), even with the same action, megaflow would end up performing like the EMC, since OVS does not aggregate the individual flow rules.

Links:

Trex setup:
build-your-own-dpdk-traffic-generator
Multi queue:
configure-vhost-user-multiqueue-for-ovs-with-dpdk
Intel dpdk vswitch performance:
intel_dpdk_vswitch_performance_figures_0.10.0_0.pdf