README.md: New combined Transmit & Receive design #9
Conversation
Is having no alignment at all a good idea? We've found that aligning packets by cache line helps a lot on modern CPUs (only 3% space overhead in a real-world trace). However, our benchmark is completely synthetic (except for packet sizes) and it's just a simple pcap filter matching on various header fields. https://pam2018.inet.berlin/wp-content/uploads/2018/03/pam18poster-paper8.pdf
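The trade-off being discussed can be made concrete with a small sketch: pad every packet buffer up to a cache-line boundary (64 bytes assumed here) and measure the space cost. The helper names and the sample packet sizes below are illustrative, not taken from the paper's trace.

```c
#include <stddef.h>

/* Round a length up to the next multiple of `align` (power of two assumed). */
static size_t align_up(size_t len, size_t align)
{
    return (len + align - 1) & ~(align - 1);
}

/* Space overhead, in percent, of padding each packet buffer to `align`. */
static double overhead_pct(const size_t *sizes, size_t n, size_t align)
{
    size_t raw = 0, padded = 0;
    for (size_t i = 0; i < n; i++) {
        raw    += sizes[i];
        padded += align_up(sizes[i], align);
    }
    return 100.0 * (double)(padded - raw) / (double)raw;
}
```

With an alignment of 1 (i.e. no alignment) the overhead is zero by construction; with 64-byte alignment the overhead depends entirely on the size distribution of the traffic, which is why the 3% figure from one trace may not generalize.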
Great link @emmericp! I love the work that you guys are doing :). This actually touches on a fundamental goal of the EasyNIC approach that I have not properly articulated yet. I want to focus on optimizing for the general case and to resist optimizing for a collection of special cases. I want to build network equipment with robust performance that is not too sensitive to configurations and workloads. I'm willing to sacrifice peak performance on a few specific workloads to achieve this goal. Here is the kind of hazard that I see:
Then a few years later the workload changes. Now your packets are all VXLAN encapsulated and the window of payload you care about has shifted to include the next cache line. 64 bytes is not the optimal alignment anymore. Oh no! What do we do?
Phew! So we managed to defend the advertised performance of our application, but in the process we had to make all of the NICs more complicated, and we also had to make the application and drivers more complicated to support the diverse approaches taken on the NICs.

Then a few years later the workload changes again. Now we are deploying on a mobile network and we need GTP-U encap. Oh no! This market is not important enough for the hardware vendors to support in silicon. Now we need to rearchitect our application to break away from that precious assumption about how our buffers are aligned... or tell our users to buy extra hardware before enabling GTP-U in their config file.

This is the point where we might wish that we could turn back time and optimize for the most general case from the beginning. Then we would not have needed all of these complicated hardware, driver, and software features. So the next step would be to design a simplified hardware interface that eschews special cases, and that is what leads us to EasyNIC :). I like the way Juho Snellman framed this in his lessons-learned-in-production talk:
One more example from the real world is OpenStack Networking in the Atlanta/Paris era. The kernel people are all showing slides with 10Gbps throughput and 3% CPU usage. The users are all showing slides with 1.3Gbps throughput and 100% CPU usage. What's the difference? The kernel people have all optimized for the case where everything is offloaded onto the NIC, but the users have all enabled a feature that the current generation of NICs doesn't support (VXLAN). So the users are all hating life, waiting for the next generation of NICs to save them. Meanwhile the kernel hackers don't even realize this is a problem, and a couple of years later, when they appreciate what's happening, they say "oh, dude, we could just do the LRO/TSO with a software fallback and then we'd get great performance with any card. Sorry I was too busy messing with offloads to understand the problem you were having."
My natural inclination is that some alignment would be a good thing. One-byte alignment (i.e. no alignment) makes the specification very simple, but potentially at the cost of making both software (i.e. drivers) and hardware (i.e. the NIC) more complex. Some natural considerations for alignment are:
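One way to see why some alignment can help: with one-byte alignment a packet header may straddle a cache-line boundary, so parsing it touches two lines instead of one. A small sketch (assuming 64-byte lines; the helper name is made up for illustration) counts the lines a packet spans:

```c
#include <stddef.h>

#define CACHE_LINE 64  /* assumed line size on modern x86 CPUs */

/* Number of cache lines touched by a packet of `len` bytes starting
 * at byte offset `off` in the buffer. */
static size_t lines_touched(size_t off, size_t len)
{
    size_t first = off / CACHE_LINE;
    size_t last  = (off + len - 1) / CACHE_LINE;
    return last - first + 1;
}
```

A 64-byte header at offset 0 occupies exactly one line, but the same header starting at offset 60 spills into a second line, which is the kind of effect a driver or hardware designer weighs against the padding overhead of stricter alignment.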
Yeah, as I've said, the benchmark is completely synthetic and very specific to that use case. I'm also not saying that cache-line alignment is necessarily a good idea -- I was just surprised by how big the effect was and how small the space overhead on real traffic was. Also, I strongly believe in benchmarking things before optimizing them.
Thanks @corsix and @emmericp for the detailed feedback. I pushed commit 9b50d34 to articulate that this transmit/receive interface is essentially a high-speed serial port. The NIC acts like a modem, translating a continuous stream of ones and zeros between host memory and the network. The framing information is plain in-band data and has no special treatment from a DMA perspective, e.g. the device could always do 4KB PCIe transactions and only needs to make sure that the packet cursor never points to a partially written packet. I reckon that the lack of alignment options makes sense in the specific context of this "high-speed serial port" design. However, the points you guys raise may be reasons to doubt that this model is the right one.

I am keen to let multiple CPUs cooperate on processing traffic without relying on the NIC to do sharding. In the serial-port design this will require some thought about alignment and the MESIF state machine. For example, on transmit this design might make it overly complicated to prevent two cores from writing to the same cache line and having to synchronously ping-pong it between their L1 caches. So this branch's interface is really the beginning and not the end... and if we want e.g. low latency, I reckon that will need to be a separate interface too.
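The "high-speed serial port" model described above can be sketched as a contiguous byte stream where each packet is prefixed by an in-band length field, with no padding between packets. Everything here (the struct layout, names, and the lack of wrap-around handling) is illustrative and not part of the actual EasyNIC specification:

```c
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4096

/* Sketch of a packet stream with in-band framing: packets pack back to
 * back with no alignment, and a single cursor marks the end of the
 * fully written data (the device would never advance it past a
 * partially written packet). */
struct ring {
    uint8_t buf[RING_SIZE];
    size_t  cursor;  /* offset of the next free byte */
};

/* Append one packet: a 16-bit in-band length followed by the payload.
 * Returns -1 if the packet does not fit (no wrap handling in this sketch). */
static int ring_put(struct ring *r, const uint8_t *pkt, uint16_t len)
{
    if (r->cursor + 2 + (size_t)len > RING_SIZE)
        return -1;
    memcpy(r->buf + r->cursor, &len, 2);      /* framing is plain data */
    memcpy(r->buf + r->cursor + 2, pkt, len); /* payload, unaligned */
    r->cursor += 2 + (size_t)len;
    return 0;
}
```

Note how two cores appending to this stream could easily end up writing into the same cache line, which is the multi-CPU synchronization concern mentioned above.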
Linking #11 about likely feature creep for efficiently supporting multiple CPUs. |
I'll merge this one now and we can use new PRs to discuss adding alignment rules. I have some ideas for accommodating this but one step at a time. |
Here is an idea for a new combined Transmit & Receive interface. Based on feedback & discussions on the issues.
What do we reckon?