Add cross-compilation support for Scala-2.13.0-M5 #4

Open: dmit wants to merge 3 commits into base: master


@dmit commented Jan 24, 2019

Since the first Scala 2.13 Release Candidate is coming soon™ and brings with it the reworked collections library, I thought it would be interesting to add 2.13 as a cross target for this project. 2.12 remains the default.

When running benchmarks compiled with Scala 2.13, the newly introduced s.c.i.LazyList will be used instead of the deprecated s.c.i.Stream. In order to keep the diff small and avoid code duplication, this is done by simply aliasing Stream to LazyList for 2.13 builds. This means that the corresponding benchmarks will still be named "stream*".
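For illustration, the aliasing could look roughly like the sketch below. It assumes Scala 2.13, where `LazyList` exists. The names `Compat` and `AliasDemo` are hypothetical and not taken from this PR's diff; in a real cross-build the alias would presumably sit in a source directory that is only compiled for 2.13.

```scala
// Sketch only: assumes Scala 2.13, where scala.collection.immutable.LazyList exists.
// `Compat` and `AliasDemo` are hypothetical names, not taken from this PR.
object Compat {
  // Type alias so existing signatures that mention Stream[A] keep compiling.
  type Stream[+A] = scala.collection.immutable.LazyList[A]
  // Value alias so companion calls such as Stream.from(1) resolve to LazyList.
  val Stream = scala.collection.immutable.LazyList
}

object AliasDemo {
  import Compat._ // shadows the deprecated scala.Stream inside this object

  // Written against "Stream", but backed by LazyList on 2.13.
  val naturals: Stream[Int] = Stream.from(1)
  val firstThree: List[Int] = naturals.take(3).toList

  def main(args: Array[String]): Unit =
    println(firstThree)
}
```

Because only the alias changes between versions, benchmark code written against `Stream` needs no edits, which is also why the benchmark names stay "stream*".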

If there is interest, I was also thinking about adding benchmarks for the new immutable array wrapper s.c.i.ArraySeq as well as cats.data.Chain, which advertises O(1) concat, O(1) append, and amortized O(1) uncons.


@fosskers (Owner)

Thank you! I will get this in as soon as I can.

@fosskers (Owner)

How does one invoke compilation with one or the other, again? I'd need to be able to generate the benchmark numbers for either version with minimal hassle.

@dmit (Author) commented Jan 24, 2019

Prepending a + to an sbt command performs it for all targets: sbt +clean +compile.

Double plus specifies a single Scala version to use: sbt "++2.13.0-M5 compile".

Since 2.12 is the default, there is no need to do anything extra if that is the version needed.
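For reference, a cross-build setup along these lines is usually driven by two settings in `build.sbt`. The fragment below is a sketch, not the project's actual build file; the exact default version (2.12.8 here, taken from the benchmark runs in this thread) is an assumption.

```scala
// build.sbt fragment (sketch; the project's actual build file may differ)

// Default version: plain `sbt compile` uses this one.
scalaVersion := "2.12.8"

// Versions that `+`-prefixed commands such as `sbt +compile` iterate over.
crossScalaVersions := Seq("2.12.8", "2.13.0-M5")
```

With settings like these, `++2.13.0-M5` switches the session to one of the listed versions, and omitting it falls back to `scalaVersion`.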

@dmit (Author) commented Jan 24, 2019

Added instructions on how to run these benchmarks on different versions to the README.

Also, I don't have access to hardware where I can run the whole benchmark suite without external interference, but here are the StreamBench results for 2.12.8 and 2.13.0-M5 on my i7-6700k desktop:

2.12.8:

| Benchmark | List | IList | Vector | Array | Stream | EStream | Iterator |
|---|---|---|---|---|---|---|---|
| Head | 145.561 | 213.727 | 124.877 | 176.047 | 0.049 | 0.165 | 0.021 |
| Max | 171.946 | 249.249 | 227.556 | 202.556 | 763.730 | 1406.926 | 130.229 |
| Reverse | 32.279 | 29.436 | 146.453 | 35.448 | 321.679 | 312.288 | |
| Sort | 205.785 | 366.869 | 251.272 | 252.490 | 1110.265 | | |

2.13.0-M5:

| Benchmark | List | IList | Vector | Array | LazyList | EStream | Iterator |
|---|---|---|---|---|---|---|---|
| Head | 157.581 | 213.973 | 111.111 | 100.743 | 0.204 | 0.155 | 0.020 |
| Max | 173.365 | 250.070 | 129.923 | 139.667 | 1308.481 | 1388.734 | 123.437 |
| Reverse | 28.474 | 28.921 | 81.615 | 33.027 | 209.873 | 318.322 | |
| Sort | 210.450 | 347.894 | 153.390 | 126.459 | 1440.950 | | |

Looks like Vector got quite a bit faster, LazyList is slower than the Stream it replaces in Head, Max, and Sort, though faster in Reverse (and the semantics are different anyway), Array somehow got faster (?), and the rest are mostly the same as before.

@fosskers (Owner) commented Jan 24, 2019

Thanks! I'll run these on my own machine too, and see what we see. It's good to see that Vector is faster - I've been quite against that data structure in general (i.e. I wasn't convinced it had a use-case).

@dmit (Author) commented Jan 24, 2019

I think the main case for Vector in Scala <= 2.12 was random access on sufficiently large collections, especially when the vector was shared among threads or actors and the immutable nature of the data structure was paramount.

Looks like in 2.13 that use case is even more viable.

@fosskers (Owner)

Just as much could be accomplished with Array (with an immutable wrapper, if one wanted), couldn't it?

@dmit (Author) commented Jan 25, 2019

Absolutely, if access is read-only. But if each thread/execution context needs to make minor changes to its copy of the collection, then arrays quickly get too expensive memory-wise. Speaking of which, don't forget that List also takes about twice as much memory as a Vector.

That's my intuition for Vector's usefulness - shared collections that have thousands of elements or more, and allow localized changes without copying the whole thing. Of course now that I've written out all those caveats, it seems that the actual real-life use cases for Vector are pretty rare. When you get far enough to worry about performance characteristics of Vector, you probably just want to implement a custom collection type that best meets your needs (and is probably based on Array).
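To make the "localized changes without copying the whole thing" point concrete, here is a small sketch using only the standard library. `SharingDemo` and the element count are made up for illustration, and it shows behavior only; it measures neither memory nor speed.

```scala
// Sketch: persistent update on a Vector vs. a private copy of an Array.
// `SharingDemo` and the size 10000 are illustrative choices, not from this PR.
object SharingDemo {
  val n = 10000

  val base: Vector[Int] = Vector.tabulate(n)(identity)
  // `updated` returns a new Vector; `base` is untouched, and most of its
  // internal tree is reused by `patched` rather than copied.
  val patched: Vector[Int] = base.updated(42, -1)

  val arr: Array[Int] = Array.tabulate(n)(identity)
  // An Array has no persistent update: a private copy duplicates all n slots
  // before the single element can be changed.
  val arrCopy: Array[Int] = arr.clone()
  arrCopy(42) = -1

  def main(args: Array[String]): Unit =
    println((base(42), patched(42), arr(42), arrCopy(42)))
}
```

Each thread that needs its own slightly modified view pays for one full `clone()` in the Array case, whereas the Vector versions share almost all of their storage.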

I think we both agree that if RAM usage is not a concern, contiguous chunks of memory + memcpy is a great approach on modern hardware.

@fosskers (Owner) commented Apr 30, 2019

I updated a few things on master. I'll hijack this PR and get it merged. I also need to figure out a nice layout for posting the various results for each Scala version in a way that's easy to visually parse.
