In the previous exercise we implemented ECMP, a very basic (but widely used) technique to load balance traffic across
multiple equal-cost paths. ECMP works very well when it has to load balance many small flows of similar size, since it
randomly maps each flow to one of the possible paths. However, real traffic does not look like that: it is composed of many
small flows plus a few much bigger ones. This makes ECMP suffer from a well-known performance problem, hash collisions,
in which a few big flows end up colliding on the same path. In this exercise we will use state and information provided by the simple_switch's
`standard_metadata` to fix the collision problem of ECMP by implementing flowlet switching on top.
Flowlet switching leverages the burstiness of TCP flows to achieve better load balancing. TCP flows tend to come in bursts (for instance because a flow needs to wait to get window space). Every time there is a gap that is big enough (e.g., 50ms) between packets of the same flow, flowlet switching will rehash the flow to another path (by hashing an ID value together with the 5-tuple).
For more information about flowlet switching check out this paper
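To make the gap rule concrete, here is a minimal Python sketch (not part of the exercise files) that splits the packet arrival times of a single flow into flowlets. The function name `split_into_flowlets` and the sample timestamps are assumptions chosen purely for illustration.

```python
def split_into_flowlets(arrival_times, gap):
    """Group sorted packet arrival times (in seconds) into flowlets.

    A new flowlet starts whenever the time since the previous packet
    of the same flow is strictly bigger than `gap`.
    """
    flowlets = []
    last_seen = None
    for t in arrival_times:
        if last_seen is None or t - last_seen > gap:
            flowlets.append([])  # gap exceeded: start a new flowlet
        flowlets[-1].append(t)
        last_seen = t
    return flowlets


# Two bursts separated by a 300 ms pause, with a 50 ms gap threshold.
bursts = [0.000, 0.001, 0.002, 0.302, 0.303]
print(len(split_into_flowlets(bursts, gap=0.05)))  # prints 2
```

Each flowlet produced this way could be hashed to a different path, while packets inside one flowlet stay together.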
As usual, we provide you with the following files:
- `p4app.json`: describes the topology we want to create with the help of Mininet and the P4-Utils package.
- `network.py`: a Python script that initializes the topology using Mininet and P4-Utils. You can use either `network.py` or `p4app.json` to start the network.
- `p4src/flowlet_switching.p4`: P4 program skeleton to use as a starting point.
- `p4src/includes`: in the includes directory you will find `headers.p4` and `parsers.p4` (which also have to be completed).
- `send.py`: a small Python script to send bursts of packets that belong to the same flow.
For this exercise (and the next two) we will use a new IP assignment strategy. If you have a look at `p4app.json`
you will see that the assignment strategy option is set to `mixed`. Therefore, only hosts connected to the same switch
will be assigned to the same subnet. Hosts connected to a different switch will belong to a different `/24` subnet.
If you use the naming scheme `hY` and `sX` (e.g., h1, h2, s1...), the IP assignment goes as follows: `10.x.x.y`,
where `x` is the switch id (upper and lower bytes) and `y` is the host id. For example, in the topology above,
`h1` gets `10.0.1.1` and `h2` gets `10.0.2.2`.
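The addressing rule can be sketched in a few lines of Python. This is only an illustration of the `10.x.x.y` scheme as described here, not the code P4-Utils actually runs; the function name `mixed_strategy_ip` and the range checks are assumptions.

```python
def mixed_strategy_ip(switch_id, host_id):
    """Return the address 10.x.x.y for host `host_id` attached to switch `switch_id`.

    The switch id is split into its upper and lower bytes, which fill the
    second and third octets; the host id fills the last octet.
    """
    if not 0 <= switch_id <= 0xFFFF:
        raise ValueError("switch id must fit in two bytes")
    if not 0 <= host_id <= 0xFF:
        raise ValueError("host id must fit in one byte")
    upper, lower = switch_id >> 8, switch_id & 0xFF
    return f"10.{upper}.{lower}.{host_id}"


print(mixed_strategy_ip(1, 1))    # 10.0.1.1
print(mixed_strategy_ip(300, 2))  # 10.1.44.2
```

The second call shows why two bytes are reserved for the switch id: ids above 255 spill into the second octet.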
You can find all the documentation about `p4app.json` in the `p4-utils` documentation. Also, you can find information about assignment strategies here.
This exercise is an enhancement of ECMP, and thus you can start by copying all the code. You will use exactly the same headers, parser, tables, and cli commands (so you do not need to write this part either).
To solve this exercise you will have to use two registers, one for `flowlet_ids` (hash seed) and one to keep the last timestamp of
every flow. You will have to slightly change the ingress logic, define a new action to read/write the flowlet registers, and modify
the hash function used in ECMP, adding a new field (the `flowlet_id`) which will vary over time.

You will have to fill the gaps in several files: `p4src/flowlet_switching.p4`, `p4src/include/headers.p4` and `p4src/include/parsers.p4`.
To successfully complete the exercise you have to do the following:
- Like in the previous exercise, header definitions are already provided.
- Define the parser that is able to parse packets up to `tcp`. Note that for simplicity we do not consider `udp` packets in this exercise. This time you must define the parser in `p4src/include/parsers.p4`.
- Define the deparser. Just emit all the headers.
- Copy the tables and actions from the previous exercise. You will have to slightly modify them.
- Define two registers, `flowlet_to_id` and `flowlet_time_stamp` (for register sizing use the constants defined at the beginning of the `flowlet_switching.p4` file: REGISTER_SIZE, TIMESTAMP_WIDTH, ID_WIDTH). We will use these two registers to keep two things:
  - In the `flowlet_to_id` register we keep the id (a randomly generated number) of each flowlet. This id is now added to the hash function that decides the output port. As long as this id does not change, packets of that flow will stay on the same path.
  - In the `flowlet_time_stamp` register we keep the timestamp of the last observed packet belonging to a flow.
- Define an action to read the flowlet's register values (`read_flowlet_registers`). In this action you will have to hash the 5-tuple of every packet to get the index you will use to read the flowlet registers (to save the index you will need to define a new metadata field with a width of 14 bits). Using the index you got from the hash function, read the flowlet id and the last timestamp and save them in metadata fields (which you also have to define). Finally, update the timestamp register using `standard_metadata.ingress_global_timestamp`.
- Define another action to update the flowlet id (`update_flowlet_id`). We will use this action to update flowlet ids when needed. In this action you just have to generate a random number and then save it in the `flowlet_to_id` register (using the register index you already computed).
- Modify the `hash` function you defined in the ECMP exercise (`ecmp_group`): now, instead of just hashing the 5-tuple, you have to add the metadata field where you store the `flowlet_id` you read from the register (or just updated).
- Define the ingress control logic (keep the logic from the ECMP example and add the following). Before applying the `ipv4_lpm` table:
  - Read the flowlet registers (calling the action).
  - Compute the time difference between now and the last packet observed for the current flow.
  - Check if the time difference is bigger than `FLOWLET_TIMEOUT` (defined at the beginning of the file with a default value of 200ms).
  - Update the flowlet id if the difference is bigger. Updating the flowlet id will make the hash function output a new value.
  - Apply `ipv4_lpm` and `ecmp_group` in the same way you did in `ecmp`.
- Copy the `sX-commands.txt` from the previous exercise.
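Before writing the P4 code, it can help to model the register logic from the steps above in plain Python. The sketch below is a behavioural model only, not the P4 solution: the class `FlowletSwitch`, the CRC32-based `_hash` helper, the 16-bit random id, the value chosen for `REGISTER_SIZE`, and the assumption that timestamps are expressed in microseconds are all illustrative choices. Use the constants from the skeleton in your actual program.

```python
import random
import zlib

REGISTER_SIZE = 8192       # assumed value; the skeleton defines the real constant
FLOWLET_TIMEOUT = 200_000  # 200 ms, assuming timestamps in microseconds


class FlowletSwitch:
    """Toy model of the two registers and the ingress decision."""

    def __init__(self, num_paths):
        self.num_paths = num_paths
        self.flowlet_to_id = [0] * REGISTER_SIZE
        self.flowlet_time_stamp = [0] * REGISTER_SIZE

    @staticmethod
    def _hash(fields):
        # Stand-in for the switch hash: CRC32 over the tuple's text form.
        return zlib.crc32(repr(fields).encode())

    def read_flowlet_registers(self, five_tuple, now):
        index = self._hash(five_tuple) % REGISTER_SIZE
        flowlet_id = self.flowlet_to_id[index]
        last_seen = self.flowlet_time_stamp[index]
        self.flowlet_time_stamp[index] = now  # always refresh the timestamp
        return index, flowlet_id, last_seen

    def update_flowlet_id(self, index):
        self.flowlet_to_id[index] = random.getrandbits(16)
        return self.flowlet_to_id[index]

    def select_path(self, five_tuple, now):
        index, flowlet_id, last_seen = self.read_flowlet_registers(five_tuple, now)
        if now - last_seen > FLOWLET_TIMEOUT:
            flowlet_id = self.update_flowlet_id(index)  # gap: new flowlet
        # ECMP hash over the 5-tuple plus the flowlet id.
        return self._hash(five_tuple + (flowlet_id,)) % self.num_paths


switch = FlowletSwitch(num_paths=4)
flow = ("10.0.1.1", "10.0.6.2", 6, 5000, 80)
first = switch.select_path(flow, now=1_000_000)
second = switch.select_path(flow, now=1_000_100)  # 100 us later: same flowlet
print(first == second)  # True
```

Note that the timestamp is refreshed on every packet, so a steady stream never triggers a new flowlet id; only a pause longer than the timeout does.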
Once you have the `flowlet_switching.p4` program finished you can test its behaviour:
- Start the topology (this will also compile and load the program): `sudo p4run` or `sudo python network.py`
- Check that you can ping: `mininet> pingall`
- Monitor the 4 links from `s1` that will be used during `ecmp` (from `s1-eth2` to `s1-eth5`). Doing this you will be able to check which path each flow is taking: `sudo tcpdump -enn -i s1-ethX`
- Ping between two hosts: if you run a normal ping from the Mininet CLI, or using the terminal, by default it will send a ping packet every second. In this case every ping should belong to a different flowlet, and thus it should cross different paths all the time.
- Do iperf between two hosts: if you do iperf between `h1` and `h2` you should see all the packets cross the same interfaces almost all the time (unless you set the gap interval very small).
- Get a terminal in `h1` and use the `send.py` script: `python send.py 10.0.6.2 1000 <sleep_time_between_packets>`. This will send `tcp syn` packets with the same 5-tuple. You can play with the sleep time (third parameter). If you set it bigger than your gap, packets should change paths; if you set it smaller (considerably smaller, since the software model is not very precise) you will see all the packets cross the same interfaces.
We have added a small guideline in the documentation section. Use it as a reference when things do not work as expected.