Document Big-O Complexity of Common Methods #31
Comments
It's kind of intentional, because it would be misleading. It's probably worth a blog-post-style article. All of the common methods have been optimized for speed, and they're generally quite fast. The load factor is constant per object, so in theory many operations are O(n), but the linear behavior only shows up for very large n (like 1e100). Your memory constraints probably make 1e9 the practical high end, and up through there everything is quite fast. At 1e8, I might increase the load factor to 10,000, but benchmarking is the only way to really know. There's a load factor performance comparison, and I generally recommend the load factor be the square root of your "n".
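For concreteness, here is a minimal sketch of what tuning the load factor can look like, following the square-root rule of thumb above. The `n = 10**8` figure is purely illustrative, and the tuning API has changed across sortedcontainers releases: recent versions expose a `_reset(load)` method, while older ones accepted a `load=` constructor argument, so check the docs for your installed version.

```python
# Minimal sketch: picking a SortedList load factor from the rule of thumb
# load ~ sqrt(n).  Assumes sortedcontainers is installed; the tuning API
# differs between versions (recent: SortedList._reset(load), older:
# SortedList(load=...)).
import math
from sortedcontainers import SortedList

n = 10**8                    # hypothetical expected number of elements
load = int(math.sqrt(n))     # rule of thumb from this thread: load ~ sqrt(n)

sl = SortedList()
sl._reset(load)              # recent versions; older versions: SortedList(load=load)

sl.update(range(1_000))      # bulk insert
sl.add(-1)                   # single insert: bisect, then insert into one sublist
```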
I suspected it wasn't accidental :) I see what you mean, and I've read the implementation notes (which are great). It looks like a very pragmatic approach. I still think complexities could be presented. You could just make it a more precise bound that includes the (low) constant factor, which would give an indication that "okay, it's linear, but it grows slooowly". It's always nice to have at least something to go on when looking through the API. At the moment I don't know whether some methods are slow and something I should watch out for, short of visiting the detailed implementation notes.
For reference, which methods specifically did you wonder about? I don't think there's a good way to indicate the "low" constant factor. How could I estimate that? You might also find this thread interesting: geertj/pyskiplist#1 Geert has a really solid pure-Python skip-list implementation with better Big-O characteristics, but SortedContainers is faster even at scale; in some benchmarks the gap actually widens as n grows.
Well, it wasn't really some specific method I was wondering about; I was just surprised not to see any complexities presented, since it's quite common for data structure docs to mention them, at least for the more common operations. But I understand the motivation for not presenting them. In my case, I'm using a …

When I said the "low" constant factor, I meant actually sitting down and analyzing the complexity of a method analytically, the same way you would if you were to prove the complexity rigorously. So it wouldn't be an estimation. The result would be a closed expression for how the method grows, with your load factor appearing as a parameter. In a proof you would conclude by simplifying that expression down to a plain big-O bound.

I didn't say it was going to be easy :) But yes, I understand this might be a lot of work for very little gain. It was just a suggestion. Feel free to close this wish. And thanks again for a nice implementation, it works very well for my uses!
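For what it's worth, here is the kind of closed-expression analysis being suggested, written as a toy cost model rather than anything taken from the SortedContainers docs: assume one insert costs roughly m (shifting within a sublist of about m items) plus n/m (amortized bookkeeping proportional to the number of sublists). Minimizing that sum puts the best load factor on the order of sqrt(n), which matches the square-root rule of thumb above. The `shift` and `index` weights are placeholders, not measured constants.

```python
# Toy cost model for one insertion into a list-of-lists with n elements and
# load factor m.  This is a sketch of the kind of "closed expression" being
# discussed; it is not the library's documented accounting and ignores
# caching effects, splits, and index rebuild details.
import math

def insert_cost(n, m, shift=1.0, index=1.0):
    # shift * m: shifting elements within a sublist of ~m items
    # index * (n / m): amortized bookkeeping over ~n/m sublists
    return shift * m + index * (n / m)

for n in (10**4, 10**6, 10**8):
    candidates = range(1, int(2 * math.sqrt(n)) + 1)
    best_m = min(candidates, key=lambda m: insert_cost(n, m))
    print(f"n={n:>9}  best load ~ {best_m}  sqrt(n) ~ {int(math.sqrt(n))}")
```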
The complexities are not a problem of calculating the "low" constant factor; it doesn't really have one anyway. It's a problem of some computer operations being faster than others. We don't talk in theory about cache locality or branch mispredictions, etc. If I could somehow say that the tree-based implementations take 200 "big" instructions (read: 1-100 cycles each) while this list-based implementation takes 1,000 "small" instructions (read: 0.1-5 cycles each), then maybe it would convince folks, but it feels wishy-washy. Hence the benchmarks. SortedContainers executes many more instructions, but its CPI is much lower. (I am actually guessing here; I have not checked CPU counters. But it is the only rationale I can come up with.) I could prove that it's doing more work, but that wouldn't convince folks that it's fast.
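Since the answer keeps coming back to "benchmark it", a minimal timing sketch along these lines can be useful. This is my own illustration, not the project's benchmark suite; absolute numbers will vary with hardware, Python version, and warm-up, so treat it as a starting point only.

```python
# Micro-benchmark sketch: average time of SortedList.add at a few sizes.
# Illustrative only -- not the project's published benchmarks.
import random
import timeit
from sortedcontainers import SortedList

def avg_add_time(n, trials=1_000):
    sl = SortedList(random.random() for _ in range(n))  # pre-populate with n items
    values = [random.random() for _ in range(trials)]
    start = timeit.default_timer()
    for v in values:
        sl.add(v)
    return (timeit.default_timer() - start) / trials

for n in (10**4, 10**5, 10**6):
    print(f"n={n:>8}  ~{avg_add_time(n) * 1e6:.2f} microseconds per add")
```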
Very nice library!
I can't see the time complexity of any methods mentioned in the docs. Is there a reason for this (misleading? obvious?)? At least I would like to know the theoretical complexity of the common operations, even if real-life performance can only be measured through benchmarking.