Based on my investigations, one of the main issues I see with Airship OpenStack is the lack of CPU tuning for the OVS, OpenStack and Calico processes. I have observed many of these processes running on CPUs reserved for virtual machines. Notably, most of them are networking-related. This is a wider problem that should be raised with the Airship community.
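To check which processes are allowed to run on the CPUs set aside for virtual machines, one option is to read each process's `Cpus_allowed_list` from `/proc/<pid>/status`. The Python sketch below assumes a hypothetical VM-reserved set of CPUs 4 to 7 (`VM_RESERVED_CPUS` is not taken from this report) and lists every process whose allowed CPUs overlap it. It only inspects each process's main thread, and the guests' own QEMU/vCPU threads will legitimately appear in the output and need to be filtered out.

```python
import os

# Hypothetical example: CPUs reserved for virtual machines on this compute node.
# Replace with the set from the node's actual CPU-pinning configuration.
VM_RESERVED_CPUS = {4, 5, 6, 7}


def parse_cpu_list(text: str) -> set[int]:
    """Parse a Linux CPU list such as '0-3,8,10-11' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def allowed_cpus(pid: int) -> set[int]:
    """Read Cpus_allowed_list (main thread) from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("Cpus_allowed_list:"):
                return parse_cpu_list(line.split(":", 1)[1])
    return set()


def processes_on_reserved_cpus(reserved: set[int]) -> list[tuple[int, str, set[int]]]:
    """Return (pid, name, overlapping CPUs) for processes allowed on `reserved`."""
    hits = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        try:
            with open(f"/proc/{pid}/comm") as f:
                name = f.read().strip()
            overlap = allowed_cpus(pid) & reserved
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited or is not readable
        if overlap:
            hits.append((pid, name, overlap))
    return hits
```

For example, printing `processes_on_reserved_cpus(VM_RESERVED_CPUS)` on a compute node gives a quick inventory of host processes that the scheduler is free to place on guest CPUs.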
My investigations mainly revealed that a single process is responsible for about 95% of the ping delays: the "neutron-sriov-nic-agent". Most of the time, this process runs on CPU 4 or 5, and the virtual machine using that CPU is always hit very hard. A simple procedure to change the CPU affinity of this process reduces the ping delays by about 95%. No restart is needed; this is a runtime adjustment.
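The exact procedure used is not spelled out here. One common way to make this kind of runtime change is `taskset -a -p` from util-linux; the Python sketch below does the equivalent with `os.sched_setaffinity`, applying the mask to every thread listed under `/proc/<pid>/task`, because CPU affinity is a per-thread attribute in Linux. `HOST_RESERVED_CPUS` is a hypothetical placeholder, and changing another user's process normally requires root (or `CAP_SYS_NICE`).

```python
import os

# Hypothetical host-reserved CPU set (not from the original report): replace it
# with the CPUs this compute node keeps for the OS and OpenStack/Airship services.
HOST_RESERVED_CPUS = {0, 1, 2, 3}


def pin_all_threads(pid: int, cpus: set[int]) -> list[int]:
    """Set the CPU affinity of every thread of `pid` to `cpus`, like `taskset -a -p`.

    Affinity is per thread in Linux, so setting it only on `pid` would leave the
    other threads where they were. Returns the thread IDs that were updated.
    """
    updated = []
    for tid in map(int, os.listdir(f"/proc/{pid}/task")):
        try:
            os.sched_setaffinity(tid, cpus)
        except ProcessLookupError:
            continue  # thread exited between listing and pinning
        updated.append(tid)
    return updated
```

For example, `pin_all_threads(agent_pid, HOST_RESERVED_CPUS)`, where `agent_pid` is the agent's PID. Threads created afterwards inherit the affinity of the thread that creates them, so they stay on the host CPUs; a restarted agent would most likely start again from its container's default mask.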
See the example summary (the full printouts are attached):
Worker 9 having issues (note the average and maximum round-trip times):
145 packets transmitted, 145 received, 0% packet loss, time 145123ms
rtt min/avg/max/mdev = 0.130/294.980/1769.737/378.595 ms, pipe 2
Worker 9 runs on compute 6
Worker 9 uses CPU 5
neutron-sriov-nic-agent running on CPU 5 on compute 6
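An observation such as "running on CPU 5" corresponds to the `processor` field (field 39) of `/proc/<pid>/stat`, which records the CPU a task last ran on; `ps -o psr` reports the same value. The sketch below reads it for every thread of a process. Finding the agent by a substring of its command line is an assumption about how it is launched.

```python
import os


def last_cpu_from_stat(stat_text: str) -> int:
    """Return field 39 ("processor") of a /proc/<pid>/stat line: the CPU last run on."""
    # Field 2 (comm) is in parentheses and may itself contain spaces or ')',
    # so split after the last ')' where field 3 (state) begins.
    fields = stat_text[stat_text.rindex(")") + 1:].split()
    return int(fields[39 - 3])


def find_pids(name: str) -> list[int]:
    """PIDs (excluding this script) whose command line contains `name`."""
    me, pids = os.getpid(), []
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or int(entry) == me:
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except (FileNotFoundError, PermissionError, ProcessLookupError):
            continue  # process exited or is not readable
        if name in cmdline:
            pids.append(int(entry))
    return pids


def thread_cpus(pid: int) -> dict[int, int]:
    """Map each thread ID of `pid` to the CPU it last ran on."""
    result = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            with open(f"/proc/{pid}/task/{tid}/stat") as f:
                result[int(tid)] = last_cpu_from_stat(f.read())
        except FileNotFoundError:
            continue  # thread exited while we were iterating
    return result
```

For example, `{pid: thread_cpus(pid) for pid in find_pids("neutron-sriov-nic-agent")}` shows where each agent thread last ran. A single reading is only a snapshot, so sampling it repeatedly gives a more reliable picture.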
Once the neutron-sriov-nic-agent process is assigned to a host-reserved CPU, Worker 9 returns to normal (the full printout makes it evident when the process was reassigned to another CPU):
90 packets transmitted, 90 received, 0% packet loss, time 91010ms
rtt min/avg/max/mdev = 0.163/2.103/115.250/12.729 ms
The ping results are not perfect, but they are at least 95% better. Small spikes are still observed from time to time, so other processes must still be affecting the virtual machine.
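To put numbers on the improvement, the percentage reduction of each round-trip statistic can be computed from the two ping summaries above. The short example below only does that arithmetic on values copied from this report; nothing else is assumed.

```python
def reduction_pct(before_ms: float, after_ms: float) -> float:
    """Percentage reduction going from `before_ms` to `after_ms`."""
    return (before_ms - after_ms) / before_ms * 100


# rtt statistics copied from the two ping summaries (in ms).
before = {"avg": 294.980, "max": 1769.737, "mdev": 378.595}
after = {"avg": 2.103, "max": 115.250, "mdev": 12.729}

for stat in before:
    print(f"{stat}: {reduction_pct(before[stat], after[stat]):.1f}% lower")
# avg: 99.3% lower
# max: 93.5% lower
# mdev: 96.6% lower
```

On these two runs, the average round-trip time drops by about 99%, the maximum by about 93% and the deviation by about 97%.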
As a short-term solution, the next step would be to implement a startup script that fixes this issue automatically whenever the neutron-sriov-nic-agent appears on the system. A long-term solution would be a correction from Airship to ensure that no operating-system or Airship processes run on CPUs reserved for virtual machines.
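A minimal sketch of such a startup script, assuming it runs as root directly on the compute host (or in a container sharing the host PID namespace) and that the agent's command line contains the string `neutron-sriov-nic-agent`. It polls `/proc`, pins every thread of any matching process to a host CPU set, and keeps polling so that an agent restarted under a new PID is pinned again. `AGENT_NAME`, `HOST_RESERVED_CPUS` and `POLL_SECONDS` are hypothetical values, not Airship or OpenStack settings.

```python
import os
import time

# Hypothetical values for illustration; adapt them to the node's actual layout.
AGENT_NAME = "neutron-sriov-nic-agent"
HOST_RESERVED_CPUS = {0, 1, 2, 3}
POLL_SECONDS = 10


def find_pids(name: str) -> list[int]:
    """PIDs (excluding this script) whose command line contains `name`."""
    me, pids = os.getpid(), []
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or int(entry) == me:
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except (FileNotFoundError, PermissionError, ProcessLookupError):
            continue  # process exited or is not readable
        if name in cmdline:
            pids.append(int(entry))
    return pids


def pin_process(pid: int, cpus: set[int]) -> bool:
    """Pin every thread of `pid` to `cpus`; return True if any thread changed."""
    try:
        tids = os.listdir(f"/proc/{pid}/task")
    except FileNotFoundError:
        return False  # process already exited
    changed = False
    for tid in map(int, tids):
        try:
            if os.sched_getaffinity(tid) != cpus:
                os.sched_setaffinity(tid, cpus)
                changed = True
        except ProcessLookupError:
            continue  # thread exited between listing and pinning
    return changed


def watch(name: str, cpus: set[int], interval: float) -> None:
    """Poll forever so that a restarted agent (new PID) gets pinned again."""
    while True:
        for pid in find_pids(name):
            if pin_process(pid, cpus):
                print(f"pinned {name} pid {pid} to CPUs {sorted(cpus)}", flush=True)
        time.sleep(interval)
```

Calling `watch(AGENT_NAME, HOST_RESERVED_CPUS, POLL_SECONDS)` from a long-running host service (for example, a systemd unit) would apply the fix at boot and after agent restarts. Polling keeps the sketch simple, at the cost of the agent running unpinned for up to `POLL_SECONDS` after it starts.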
If you have any questions regarding this document, let me know.
courtesy: Claude