Document Big-O Complexity of Common Methods #31

Closed
estan opened this issue Jul 17, 2015 · 5 comments

estan commented Jul 17, 2015

Very nice library!

I can't see the time complexity of any of the methods mentioned in the docs. Is there a reason for this (is it considered misleading? too obvious?)? At the very least I'd like to know the theoretical complexity of the common operations, even if real-life performance can only be measured through benchmarking.

grantjenks (Owner) commented:

It's kind of intentional, because stating Big-O bounds would be misleading. It's probably worth a blog-post-style article.

All of the common methods have been optimized for speed. They're generally quite fast.

The load factor is constant per object, so in theory many operations would be O(n), but only for very large n (like 1e100). Your memory constraints probably make 1e9 the high end, and up to that point everything is quite fast. At 1e8 I might increase the load factor to 10,000, but benchmarking is the only way to really know. There's a load factor performance comparison, and I generally recommend the load factor be the square root of your "n".
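
For example, the load factor can be chosen up front like this (a minimal sketch; it assumes the 1.x constructor that accepts a load keyword, so check the docs of your installed version):

```python
from sortedcontainers import SortedList

n = 10 ** 8               # expected number of items
load = int(n ** 0.5)      # rule of thumb above: load ≈ sqrt(n), i.e. 10,000 here

# Sketch only: the `load` keyword is the 1.x constructor interface;
# later releases tune this differently.
sl = SortedList(load=load)
sl.update(range(1000, 0, -1))   # bulk-add out-of-order data
print(sl[0], sl[-1])            # 1 1000
```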

estan commented Jul 17, 2015

I suspected it wasn't accidental :)

I see what you mean, and I've read the implementation notes (which are great). It looks like a very pragmatic approach. I still think the complexities could be presented: you could just give a more precise bound that includes the (low) constant factor, which would indicate "okay, it's linear, but it grows slooowly".

It's always nice to have at least something to go on when looking through the API. At the moment I can't tell whether some methods are slow and something I should watch out for, without visiting the detailed implementation notes.

grantjenks (Owner) commented:

For reference, which methods specifically did you wonder about?

I don't think there's a good way to indicate the "low" constant factor. How could I estimate that?

You might also find this thread interesting: geertj/pyskiplist#1. Geert has a really solid pure-Python skip-list implementation with better Big-O characteristics, but SortedContainers is faster even at scale; in some benchmarks it's actually increasingly faster as the size grows.

estan commented Jul 25, 2015

Well, it wasn't really any specific method I was wondering about; I was just surprised not to see any complexities presented, since it's quite common for data structure docs to mention them, at least for the more common operations. But I understand the motivation for not presenting them.

In my case, I'm using a SortedListWithKey as a receive buffer for UDP packets that arrive out of order, and make use of add, bisect_key_left (often) and del L[i:j] (slightly less often).
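
Roughly, the usage looks like this (an illustrative sketch: Packet and the sequence numbers are invented here; only add, bisect_key_left and the slice deletion come from the library):

```python
from collections import namedtuple
from sortedcontainers import SortedListWithKey

Packet = namedtuple('Packet', 'seq payload')   # stand-in for the real packet type
buf = SortedListWithKey(key=lambda p: p.seq)   # receive buffer ordered by sequence number

# Packets arrive out of order.
for seq in (5, 2, 9, 3):
    buf.add(Packet(seq, b'...'))

# Everything before sequence number 3 is ready to be flushed.
i = buf.bisect_key_left(3)
ready = list(buf[:i])            # packets with seq < 3
del buf[:i]                      # the del L[i:j] pattern mentioned above

print([p.seq for p in ready])    # [2]
print([p.seq for p in buf])      # [3, 5, 9]
```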

When I said the "low" constant factor, I meant actually sitting down and analyzing the complexity of a method analytically, the same way you would if you were to prove the complexity rigorously. So it wouldn't be an estimation. The result would be a closed-form expression for how the method's cost grows, and your load factor (say k) would figure in it.

In a proof you would conclude by simplifying the expression down to, say, O(n), but in your case you could keep it more precise, so that it's clear the linearity is not a "problem" for reasonable n.
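
For illustration only, here is a sketch of what such a bound might look like for add, assuming a list-of-lists layout with sublists of roughly k items (the constants c1 and c2 are placeholders, not measured values):

$$
T_{\text{add}}(n, k) \approx \underbrace{\log_2 \frac{n}{k}}_{\text{locate sublist}} \;+\; \underbrace{c_1\, k}_{\text{insert into sublist}} \;+\; \underbrace{c_2\, \frac{n}{k}}_{\text{occasional shift of the outer list}}
$$

With k fixed, this simplifies to O(n), but the linear term carries a 1/k coefficient (and runs as a C-level memmove in practice), which is exactly the "linear but grows slowly" picture; choosing k ≈ √n instead makes the whole expression O(√n).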

I didn't say it was going to be easy :) But yes, I understand this might be a lot of work for very little gain. It was just a suggestion; feel free to close this wish. And thanks again for a nice implementation, it works very well for my uses!

grantjenks (Owner) commented:

add is very fast. bisect_key_left is fast, but you'll pay for indexing (which is also fast, just not very fast). And del L[i:j] will depend a lot on the size of the range. The delete operation could probably be made faster depending on your use case: if you delete a large enough chunk, it'll actually rebuild the whole list, which is a fast path intended to make del L[:] very fast.

The hard part isn't calculating the "low" constant factor; there isn't really a single one anyway. It's that some computer operations are faster than others, and in theory we don't talk about cache locality, branch mispredictions, etc. If I could somehow say that a tree-based implementation takes 200 "big" instructions (read: 1-100 cycles each) while this list-based implementation takes 1000 "small" instructions (read: 0.1-5 cycles each), then maybe it would convince folks, but it feels wishy-washy.

Hence the benchmarks. SortedContainers executes many more instructions, but its CPI (cycles per instruction) is much lower. (I am actually guessing here; I haven't checked CPU counters, but it's the only rationale I can come up with.) I could prove that it's doing more work, but that wouldn't convince folks that it's fast.

grantjenks changed the title from "Document complexity of common methods" to "Document Big-O Complexity of Common Methods" on Aug 31, 2015