Simplified implementations of Array.foreach and Array.map improve performance by 2–3.5× #23649
-
That's certainly interesting. The source code of the standard library (also the one used in Scala 3, for now) is over at https://github.com/scala/scala (Scala 2), and there's some existing infra for writing benchmarks (https://github.com/scala/scala/blob/2.13.x/test/benchmarks/README.md). It seems to me the old code is trying to prevent primitive boxing / unboxing; I'd be interested to check in detail (in bytecode) whether that isn't actually helpful. It's possible that the situation is different on Scala 2, because Scala 3 doesn't have function specialization (right?).
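For context, here is a rough sketch of the kind of type-dispatching loop the old code uses. This is an editor's illustration of the assumed shape, not the actual `ArrayOps` source: each branch performs a direct primitive array load instead of a generic access, although the element is still boxed when passed to a non-specialized function, as the decompiled bytecode later in this thread shows.

```scala
import scala.reflect.ClassTag

// Simplified sketch (not the actual ArrayOps.map source): dispatch on the
// runtime class of the array so each branch reads elements with a direct
// primitive array load. The value is still boxed when handed to a
// non-specialized Function1.
def mapByDispatch[A, B: ClassTag](xs: Array[A])(f: A => B): Array[B] = {
  val len = xs.length
  val ys  = new Array[B](len)
  var i   = 0
  (xs: Any) match {
    case ints: Array[Int] =>
      while (i < len) { ys(i) = f(ints(i).asInstanceOf[A]); i += 1 }
    case doubles: Array[Double] =>
      while (i < len) { ys(i) = f(doubles(i).asInstanceOf[A]); i += 1 }
    // ... branches for the remaining primitive element types ...
    case refs: Array[_] =>
      while (i < len) { ys(i) = f(refs(i).asInstanceOf[A]); i += 1 }
  }
  ys
}
```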
-
@lrytz I didn't notice any significant differences in the bytecode between Scala 2 and Scala 3 for the same code. I think the main difference lies in the implementation of `map`: the old version produces fairly large bytecode with type-based branching, and according to the JIT logs, the compiler occasionally fails to inline this method.

Thank you for the benchmark links; it was interesting to go through them. It seems to me that the existing benchmarks, for example, use rather simple lambdas that the JIT will most likely optimize aggressively. In real-world scenarios, where the functions passed to `map` are more complex, the compiler's behavior could be different.
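To illustrate the point, a JMH-style sketch (hypothetical benchmark names, not taken from the linked repository) that contrasts a trivial lambda with a heavier one could look like this:

```scala
package bench

import org.openjdk.jmh.annotations._
import org.openjdk.jmh.infra.Blackhole

// Hypothetical benchmark: compares a trivial lambda (easy for the JIT to
// inline and optimize away) with a heavier one closer to real-world use.
@State(Scope.Thread)
class ArrayMapLambdaWeight {
  @Param(Array("10000"))
  var size: Int = 0

  var xs: Array[Int] = Array.empty[Int]

  @Setup(Level.Trial)
  def setup(): Unit = xs = Array.tabulate(size)(identity)

  @Benchmark
  def mapTrivial(bh: Blackhole): Unit =
    bh.consume(xs.map(_ + 1))

  @Benchmark
  def mapHeavy(bh: Blackhole): Unit =
    bh.consume(xs.map(x => Integer.reverse(x * 31) ^ (x >>> 3)))
}
```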
-
Here's a more detailed analysis with proof.

Bytecode in benchmarks:

```java
// method -> specialized lambda (identical for oldMap/newMap)
bh.consume(ArrayOpsCompat.ArrayOpsExt$.MODULE$.newMap$extension(
    ArrayOpsCompat$.MODULE$.ArrayOpsExt(this.array()),
    (JFunction1.mcII.sp)(x) -> this.heavy(x),
    scala.reflect.ClassTag$.MODULE$.Int()
));

// regular lambda -> (Object -> Object)
bh.consume(ArrayOpsCompat.ArrayOpsExt$.MODULE$.newMap$extension(
    ArrayOpsCompat$.MODULE$.ArrayOpsExt(this.array()),
    (Object x) -> this.genericHeavy(x),
    scala.reflect.ClassTag$.MODULE$.Object()
));
```

Bytecode inside the method implementations:

```java
// oldMap
if ($this instanceof int[]) {
    for (int[] arr = (int[]) $this; i < len; ++i) {
        ys[i] = f.apply(BoxesRunTime.boxToInteger(arr[i]));
    }
} else if ($this instanceof double[]) {
    // branch for double[]
}
// ... other primitive array branches ...

// newMap
for (int i = 0; i < len; i++) {
    ys[i] = f.apply(xs[i]);
}
```

JIT log summary. Analysis of the JIT logs shows that `newMap$extension` and the specialized lambda calls (`apply$mcII$sp`) are inlined, while `oldMap$extension` itself is not:

```xml
<!-- context (C2): call to benchmarks.ArrayOpsCompat$ArrayOpsExt$.oldMap$extension -->
<method id='1455' holder='1411' name='oldMap$extension' return='1222' arguments='1222 1453 1454' flags='17' bytes='599' compile_id='781' compiler='c2' level='4' iicount='696'/>
<call method='1455' instr='invokevirtual'/>
<inline_fail reason='inlining prohibited by policy'/>
<!-- context (C1): call to benchmarks.ArrayOpsCompat$ArrayOpsExt$.oldMap$extension -->
<method id='1444' holder='1398' name='oldMap$extension' return='1222' arguments='1222 1442 1443' flags='17' bytes='599' compile_id='767' compiler='c1' level='3' iicount='406'/>
<call method='1444' instr='invokevirtual'/>
<inline_fail reason='callee is too large'/>
<!-- context: call to benchmarks.ArrayOpsCompat$ArrayOpsExt$.newMap$extension -->
<method id='1421' holder='1409' name='newMap$extension' return='1222' arguments='1222 1419 1420' flags='17' bytes='63' compile_id='775' compiler='c2' level='4' iicount='636'/>
<call method='1421' count='113050' prof_factor='1.000000' inline='1'/>
<inline_success reason='inline (hot)'/>
<!-- context: specialized lambda inlining -->
<method id='1507' holder='1424' name='apply$mcII$sp' return='1215' arguments='1215' flags='1' bytes='9' compile_id='753' compiler='c1' level='3' iicount='28090'/>
<call method='1507' count='27579' prof_factor='1.000000' inline='1'/>
<inline_success reason='inline (hot)'/>
```
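As a side note, the difference between the two benchmark call sites quoted above comes down to whether the lambda gets a specialized primitive function interface. A small sketch (hypothetical `SpecializationNote` object, for illustration only):

```scala
object SpecializationNote {
  def heavy(x: Int): Int = x * 31 + 7      // placeholder body
  def genericHeavy(x: AnyRef): AnyRef = x  // placeholder body

  // An Int => Int lambda is compiled against the specialized
  // scala.runtime.java8.JFunction1$mcII$sp interface, so the call site
  // invokes apply$mcII$sp(int): int and no boxing occurs.
  val specialized: Int => Int = x => heavy(x)

  // An AnyRef => AnyRef lambda goes through the erased Function1 interface,
  // so the call site invokes apply(Object): Object.
  val generic: AnyRef => AnyRef = x => genericHeavy(x)
}
```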
-
Hi!
I've been benchmarking the performance of `Array.map` and `Array.foreach` in Scala and found that their implementations can be made significantly faster (`map` by up to 3.5× and `foreach` by up to 2×) by simplifying the loop structure in cases where the function operates on statically known primitive types.

The current implementation relies on a `match` over array types, which introduces overhead and makes performance less predictable when transformations are chained. In contrast, a plain loop scales consistently and has significantly lower per-element latency.

Here's roughly how the current `map` method can be simplified; see the sketch below. You can find the full analysis with plots and measurements at the link at the end of this post.
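A minimal sketch of the simplified loops (illustrative only; the names `newMap`/`newForeach` follow the benchmark code quoted earlier in this thread, and the exact signatures of the final change may differ):

```scala
import scala.reflect.ClassTag

// Sketch of the simplified implementations: a plain counted while-loop
// with no dispatch on the runtime array class, corresponding to the
// plain loop seen in the decompiled newMap above.
def newMap[A, B: ClassTag](xs: Array[A])(f: A => B): Array[B] = {
  val len = xs.length
  val ys  = new Array[B](len)
  var i   = 0
  while (i < len) {
    ys(i) = f(xs(i))
    i += 1
  }
  ys
}

def newForeach[A, U](xs: Array[A])(f: A => U): Unit = {
  val len = xs.length
  var i   = 0
  while (i < len) {
    f(xs(i))
    i += 1
  }
}
```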
Would this kind of improvement be something you'd like to see in the standard library? I'd be happy to contribute a PR.
📊 https://github.com/2Pit/scala-benchmarks/blob/main/review/array_map_foreach/README.md