Ping Latency Checkpoint #90

Open
nagajagan opened this issue Feb 16, 2021 · 2 comments

Comments

@nagajagan

Based on my investigations, one of the main issues I see with Airship OpenStack is the lack of tuning for the OVS, OpenStack and Calico processes. I have observed many of these processes running on CPUs reserved for virtual machines. Note that most of these processes are related to networking. This is a wider problem that should be submitted to the Airship community.
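One way to check this kind of observation on a compute host is to read the CPU each process last ran on from /proc. The sketch below is only an illustration, not the procedure used for this report: the function names are hypothetical, and the set of VM-reserved CPUs (here 4 and 5) is an assumption that has to be taken from the host's actual configuration. It relies on field 39 (`processor`) of `/proc/<pid>/stat` and only inspects the main thread of each process.

```python
import os


def last_cpu_from_stat(stat_line):
    """Return (command name, CPU last run on) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so split on the first "(" and the last ")".
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    comm = stat_line[open_paren + 1:close_paren]
    rest = stat_line[close_paren + 1:].split()
    # rest[0] is field 3 (state), so field 39 (processor) is rest[36].
    return comm, int(rest[36])


def processes_last_run_on(vm_cpus, proc_root="/proc"):
    """List (pid, command, cpu) for processes whose main thread last ran on vm_cpus."""
    hits = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                comm, cpu = last_cpu_from_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or the line was unreadable
        if cpu in vm_cpus:
            hits.append((int(entry), comm, cpu))
    return sorted(hits)
```

Calling `processes_last_run_on({4, 5})` on a compute host would list candidates to investigate. The qemu processes backing the virtual machines will show up as well, which is expected. The command name in this file is truncated by the kernel, so long names appear shortened.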

My investigations have mainly revealed that a single process, "neutron-sriov-nic-agent", is responsible for about 95% of the ping delays. Most of the time this process runs on CPU 4 or 5, and the virtual machine associated with that CPU is consistently hit very hard. Changing the CPU affinity of this process reduces the ping delays by about 95%. No restart is needed; this is a runtime adjustment.
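The runtime affinity change described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure used here: the helper names are hypothetical, the host-reserved CPU set must be supplied by the operator, and the script has to run as root on the compute host. It uses Python's `os.sched_setaffinity` (Linux only) on every thread of the process, similar in spirit to `taskset` with its all-tasks option.

```python
import os


def find_pids(name_fragment, proc_root="/proc"):
    """Return PIDs (other than our own) whose command line contains name_fragment."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit() or int(entry) == os.getpid():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                cmdline = fh.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # the process exited while we were scanning
        if name_fragment in cmdline:
            pids.append(int(entry))
    return sorted(pids)


def pin_all_threads(pid, cpus, proc_root="/proc", set_affinity=None):
    """Restrict every existing thread of pid to cpus; return the thread IDs pinned."""
    if set_affinity is None:
        set_affinity = os.sched_setaffinity  # Linux only
    pinned = []
    for tid in os.listdir(os.path.join(proc_root, str(pid), "task")):
        try:
            set_affinity(int(tid), cpus)
        except ProcessLookupError:
            continue  # the thread exited between listing and pinning
        pinned.append(int(tid))
    return sorted(pinned)
```

A possible invocation, with `{0, 1}` standing in for whatever CPUs are reserved for the host on a given compute node:

    for pid in find_pids("neutron-sriov-nic-agent"):
        pin_all_threads(pid, {0, 1})

The change takes effect immediately and is lost when the process is restarted, which matches the runtime nature of the adjustment.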

See the example summary (full printouts in attachment):

  • Worker 9 having issues (look at average and max round trip time)
    145 packets transmitted, 145 received, 0% packet loss, time 145123ms
    rtt min/avg/max/mdev = 0.130/294.980/1769.737/378.595 ms, pipe 2
  • Worker 9 runs on compute 6
  • Worker 9 uses CPU 5
  • neutron-sriov-nic-agent running on CPU 5 on compute 6
  • Once the neutron-sriov-nic-agent process is assigned to a host-reserved CPU, Worker 9 returns to normal (see the printout in the example; the moment the process was reassigned to another CPU is evident):
    90 packets transmitted, 90 received, 0% packet loss, time 91010ms
    rtt min/avg/max/mdev = 0.163/2.103/115.250/12.729 ms
  • Ping results are not perfect, but they are roughly 95% better. Small spikes are still observed from time to time. There must be other processes affecting the virtual machine.
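To put a number on the before/after comparison in the summary above, the two `rtt` lines can be parsed and compared directly. This is a small sketch; the function names are hypothetical and the input strings are the two summary lines quoted above.

```python
import re

# Matches the summary line printed by Linux ping, e.g.
# "rtt min/avg/max/mdev = 0.130/294.980/1769.737/378.595 ms, pipe 2"
RTT_RE = re.compile(r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_rtt(line):
    """Return the min/avg/max/mdev values (in ms) from a ping summary line."""
    match = RTT_RE.search(line)
    if match is None:
        raise ValueError("not a ping rtt summary line: " + repr(line))
    minimum, average, maximum, mdev = map(float, match.groups())
    return {"min": minimum, "avg": average, "max": maximum, "mdev": mdev}


def reduction_percent(before, after):
    """Percentage by which a value dropped from before to after."""
    return 100.0 * (before - after) / before


before = parse_rtt("rtt min/avg/max/mdev = 0.130/294.980/1769.737/378.595 ms, pipe 2")
after = parse_rtt("rtt min/avg/max/mdev = 0.163/2.103/115.250/12.729 ms")

for key in ("avg", "max"):
    print(f"{key}: {before[key]:.3f} ms -> {after[key]:.3f} ms "
          f"({reduction_percent(before[key], after[key]):.1f}% lower)")
```

On these two samples the average round-trip time drops by about 99% and the maximum by roughly 93%, which is in line with the improvement reported above.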

As a short-term solution, the next step would be to implement a startup script that fixes this issue automatically whenever the neutron-sriov-nic-agent appears in the system. A long-term solution would be a correction from Airship to ensure that no operating system processes or Airship processes run on CPUs reserved for virtual machines.
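The short-term startup script could look roughly like the polling loop below. It is a sketch under assumptions, not an existing Airship component: all names are hypothetical, the polling interval and the host-reserved CPU set are placeholders, and it must run as root on the compute host (for example from a systemd unit). It re-pins the agent each time a new PID appears, so a restarted agent is handled as well.

```python
import os
import time


def matching_pids(name_fragment, proc_root="/proc"):
    """Return PIDs (other than our own) whose command line contains name_fragment."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit() or int(entry) == os.getpid():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                cmdline = fh.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # the process exited while we were scanning
        if name_fragment in cmdline:
            found.append(int(entry))
    return sorted(found)


def pin_threads(pid, cpus, proc_root="/proc"):
    """Restrict every existing thread of pid to cpus (Linux only)."""
    for tid in os.listdir(os.path.join(proc_root, str(pid), "task")):
        os.sched_setaffinity(int(tid), cpus)


def watch_and_pin(name_fragment, host_cpus, interval_s=5.0, max_polls=None,
                  find=matching_pids, pin=pin_threads, sleep=time.sleep):
    """Poll for matching processes and pin each newly seen PID once.

    Runs forever unless max_polls is given. Returns the PIDs pinned, in order.
    """
    seen = set()
    pinned_log = []
    polls = 0
    while max_polls is None or polls < max_polls:
        current = set(find(name_fragment))
        for pid in sorted(current - seen):
            try:
                pin(pid, host_cpus)
            except OSError as exc:
                print(f"could not pin {pid}: {exc}")
                continue  # retried on the next poll if the PID is still there
            seen.add(pid)
            pinned_log.append(pid)
        seen &= current  # forget exited PIDs so a reused PID is pinned again
        polls += 1
        sleep(interval_s)
    return pinned_log
```

A possible entry point would be `watch_and_pin("neutron-sriov-nic-agent", {0, 1})`, where `{0, 1}` is only a stand-in for the CPUs actually reserved for the host. Polling is used here for simplicity; it leaves a short window after each agent start during which the process can still land on a VM CPU, which is one reason the long-term fix belongs in the platform configuration.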

If you have any questions regarding this document, let me know.
courtesy: Claude

@nagajagan
Author

@nagajagan
Author

Not sure how this issue got closed by me. It is still an open issue, and there has been no progress on it.

@nagajagan nagajagan reopened this Feb 25, 2021