Document Big-O Complexity of Common Methods #31

Closed
estan opened this issue Jul 17, 2015 · 5 comments

estan commented Jul 17, 2015

Very nice library!

I can't see the time complexity of any of the methods mentioned in the docs. Is there a reason for this (is it considered misleading? too obvious?)? At the very least I'd like to know the theoretical complexity of the common operations, even if real-life performance can only be measured through benchmarking.

grantjenks (Owner) commented:

It's kind of intentional, because stating Big-O bounds would be misleading. It's probably worth a blog-post-style article.

All of the common methods have been optimized for speed. They're generally quite fast.

The load factor is constant per object, so in theory many operations would be O(n), but only for very large n (like 1e100). Your memory constraints probably make 1e9 the high end, and up to that point everything is quite fast. At 1e8 I might increase the load factor to 10,000, but benchmarking is the only way to really know. There's a load factor performance comparison, and I generally recommend the load factor be the square root of your "n".
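
For example, the load factor can be chosen up front like this (a minimal sketch; it assumes the 1.x constructor that accepts a load keyword, so check the docs of your installed version):

```python
from sortedcontainers import SortedList

n = 10 ** 8               # expected number of items
load = int(n ** 0.5)      # rule of thumb above: load ≈ sqrt(n), i.e. 10,000 here

# Sketch only: the `load` keyword is the 1.x constructor interface;
# later releases tune this differently.
sl = SortedList(load=load)
sl.update(range(1000, 0, -1))   # bulk-add out-of-order data
print(sl[0], sl[-1])            # 1 1000
```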

estan commented Jul 17, 2015

I suspected it wasn't accidental :)

I see what you mean, and I've read the implementation notes (which are great). It looks like a very pragmatic approach. I still think the complexities could be presented: you could just give a more precise bound that includes the (low) constant factor, which would indicate "okay, it's linear, but it grows slooowly".

It's always nice to have at least something to go on when looking through the API. At the moment I can't tell whether some methods are slow and something I should watch out for, without visiting the detailed implementation notes.

grantjenks (Owner) commented:

For reference, which methods specifically did you wonder about?

I don't think there's a good way to indicate the "low" constant factor. How could I estimate that?

You might also find this thread interesting: geertj/pyskiplist#1. Geert has a really solid pure-Python skip-list implementation with better Big-O characteristics, but SortedContainers is faster even at scale; in some benchmarks it's actually increasingly faster as the size grows.

estan commented Jul 25, 2015

Well, it wasn't really any specific method I was wondering about; I was just surprised not to see any complexities presented, since it's quite common for data structure docs to mention them, at least for the more common operations. But I understand the motivation for not presenting them.

In my case, I'm using a SortedListWithKey as a receive buffer for UDP packets that arrive out of order, and make use of add, bisect_key_left (often) and del L[i:j] (slightly less often).
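
Roughly, the usage looks like this (an illustrative sketch: Packet and the sequence numbers are invented here; only add, bisect_key_left and the slice deletion come from the library):

```python
from collections import namedtuple
from sortedcontainers import SortedListWithKey

Packet = namedtuple('Packet', 'seq payload')   # stand-in for the real packet type
buf = SortedListWithKey(key=lambda p: p.seq)   # receive buffer ordered by sequence number

# Packets arrive out of order.
for seq in (5, 2, 9, 3):
    buf.add(Packet(seq, b'...'))

# Everything before sequence number 3 is ready to be flushed.
i = buf.bisect_key_left(3)
ready = list(buf[:i])            # packets with seq < 3
del buf[:i]                      # the del L[i:j] pattern mentioned above

print([p.seq for p in ready])    # [2]
print([p.seq for p in buf])      # [3, 5, 9]
```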

When I said the "low" constant factor, I meant actually sitting down and analyzing the complexity of a method analytically, the same way you would if you were to prove the complexity rigorously. So it wouldn't be an estimation. The result would be a closed-form expression for how the method's cost grows, and your load factor (say k) would figure in it.

In a proof you would conclude by simplifying the expression down to, say, O(n), but in your case you could keep it more precise, so that it's clear the linearity is not a "problem" for reasonable n.
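
For illustration only, here is a sketch of what such a bound might look like for add, assuming a list-of-lists layout with sublists of roughly k items (the constants c1 and c2 are placeholders, not measured values):

$$
T_{\text{add}}(n, k) \approx \underbrace{\log_2 \frac{n}{k}}_{\text{locate sublist}} \;+\; \underbrace{c_1\, k}_{\text{insert into sublist}} \;+\; \underbrace{c_2\, \frac{n}{k}}_{\text{occasional shift of the outer list}}
$$

With k fixed, this simplifies to O(n), but the linear term carries a 1/k coefficient (and runs as a C-level memmove in practice), which is exactly the "linear but grows slowly" picture; choosing k ≈ √n instead makes the whole expression O(√n).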

I didn't say it was going to be easy :) But yes, I understand this might be a lot of work for very little gain. It was just a suggestion; feel free to close this wish. And thanks again for a nice implementation, it works very well for my uses!

grantjenks (Owner) commented:

add is very fast. bisect_key_left is fast, but you'll pay for indexing (which is also fast, just not very fast). And del L[i:j] will depend a lot on the size of the range. The delete operation could probably be made faster depending on your use case: if you delete a large enough chunk, it'll actually rebuild the whole list, which is a fast path intended to make del L[:] very fast.

The hard part isn't calculating the "low" constant factor; there isn't really a single one anyway. It's that some computer operations are faster than others, and in theory we don't talk about cache locality, branch mispredictions, etc. If I could somehow say that a tree-based implementation takes 200 "big" instructions (read: 1-100 cycles each) while this list-based implementation takes 1000 "small" instructions (read: 0.1-5 cycles each), then maybe it would convince folks, but it feels wishy-washy.

Hence the benchmarks. SortedContainers executes many more instructions, but its CPI (cycles per instruction) is much lower. (I am actually guessing here; I haven't checked CPU counters, but it's the only rationale I can come up with.) I could prove that it's doing more work, but that wouldn't convince folks that it's fast.

grantjenks changed the title from "Document complexity of common methods" to "Document Big-O Complexity of Common Methods" on Aug 31, 2015