Commit 7d1ad95

Add a guide on programming with collections generically
1 parent d8949a9 commit 7d1ad95

3 files changed: +327 -3 lines changed

_data/overviews.yml

+6-1
@@ -47,11 +47,16 @@
       url: "core/architecture-of-scala-213-collections.html"
       by: Julien Richard-Foy
       description: "These pages describe the architecture of the collections framework introduced in Scala 2.13. Compared to the Collections API you will find out more about the internal workings of the framework."
-    - title: Custom Collection Types
+    - title: Implementing a Custom Collection (Scala 2.13)
       icon: building
       url: "core/custom-collections.html"
       by: Martin Odersky, Lex Spoon and Julien Richard-Foy
       description: "In this document you will learn how the collections framework helps you define your own collections in a few lines of code, while reusing the overwhelming part of collection functionality from the framework."
+    - title: Programming With Collections Generically (Scala 2.13)
+      icon: building
+      url: "core/programming-collections-generically.html"
+      by: Julien Richard-Foy
+      description: "By “generically” we mean the ability to write operations that abstract over collection types. This guide shows how to write operations that can be applied to any collection type and return the same collection type, and how to write operations that can be parameterized by the type of collection to build."
 
 - category: Language
   description: "Guides and overviews covering features in the Scala language."

_overviews/core/custom-collections.md

+1-2
@@ -1,7 +1,6 @@
 ---
 layout: singlepage-overview
-title: Custom Collection Types
-
+title: Implementing a Custom Collection (Scala 2.13)
 permalink: /overviews/core/:title.html
 ---
 
@@ -0,0 +1,320 @@
1+
---
2+
layout: singlepage-overview
3+
title: Programming With Collections Generically (Scala 2.13)
4+
permalink: /overviews/core/:title.html
5+
---
6+
7+
**Julien Richard-Foy**
8+
9+
This guide shows how to write operations that can be applied to any collection type and return the same
10+
collection type, and how to write operations that can be parameterized by the type of collection to build.
11+
It is recommended to first read the article about the
12+
[architecture of the collections]({{ site.baseurl }}/overviews/core/architecture-of-scala-213-collections.html).
13+
14+
The following sections present how to **consume**, **produce** and **transform** any collection type.
15+
16+
## Consuming any collection
17+
18+
Several solutions can apply, depending on the desired level of genericity.
19+
20+
### Consuming any *actual* collection
21+
22+
Let’s start with the simplest case: consuming any collection that can be traversed.
You don’t need to know the precise type of the collection,
but just that it *is* a collection. This can be achieved by taking an `IterableOnce[A]`
as parameter, or an `Iterable[A]` if you need more than one traversal. Here is an
example that shows how to implement a `sumBy` operation that sums the elements of a
collection after they have been transformed by a function:
28+
29+
~~~ scala
implicit class SumByOperation[A](coll: IterableOnce[A]) {
  def sumBy[B](f: A => B)(implicit num: Numeric[B]): B = {
    val it = coll.iterator
    // Start from zero so that an empty collection yields `num.zero`
    var result = num.zero
    while (it.hasNext) {
      // Apply `f` to every element before adding it to the running sum
      result = num.plus(result, f(it.next()))
    }
    result
  }
}
~~~
41+
42+
We define the `sumBy` operation as an [implicit class](/overviews/core/implicit-classes.html) method so that
it can be applied to a collection as if it were a method of that collection:
44+
45+
~~~ scala
46+
case class User(name: String, age: Int)
47+
48+
val users = Seq(User("Alice", 22), User("Bob", 20))
49+
50+
println(users.sumBy(_.age)) // “42”
51+
~~~
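
To illustrate when `Iterable[A]` is the better parameter type, here is a minimal sketch of a hypothetical `meanBy` operation (the names `MeanBySyntax`, `MeanByOperation` and `meanBy` are made up for this example and are not part of the standard library). It traverses the collection more than once, which an `IterableOnce` does not allow in general:

```scala
object MeanBySyntax {
  // Hypothetical operation, for illustration only.
  // It needs several traversals (emptiness check, sum, size),
  // so it takes an `Iterable[A]` rather than an `IterableOnce[A]`.
  implicit class MeanByOperation[A](coll: Iterable[A]) {
    def meanBy(f: A => Double): Option[Double] =
      if (coll.isEmpty) None
      else Some(coll.iterator.map(f).sum / coll.size)
  }
}
```

After `import MeanBySyntax._`, an expression such as `List(1, 2, 3).meanBy(_.toDouble)` evaluates to `Some(2.0)`, and calling it on an empty collection returns `None`.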
52+
53+
Unfortunately, this extension method works neither with values of type `String` nor
even with `Array` values, because these types are not part of the Scala collections
hierarchy.
56+
57+
### Consuming any type that is *like* a collection
58+
59+
If we want `sumBy` to work on any type that is *like* a collection, such as `String`
and `Array`, we have to add another level of indirection:
61+
62+
~~~ scala
63+
import scala.collection.generic.IsIterableLike
64+
65+
class SumByOperation[A](coll: IterableOnce[A]) {
66+
def sumBy[B](f: A => B)(implicit num: Numeric[B]): B = ... // same as before
67+
}
68+
69+
implicit def SumByOperation[Repr](coll: Repr)(implicit it: IsIterableLike[Repr]): SumByOperation[it.A] =
70+
new SumByOperation[it.A](it(coll))
71+
~~~
72+
73+
The type `IsIterableLike[Repr]` has implicit instances for all types `Repr` that can be converted
74+
to `IterableOps[A, Iterable, C]` (for some element type `A` and some collection type `C`). There are
75+
instances for actual collection types and also for `String` and `Array`.
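
As a rough, self-contained sketch of this technique: the example below assumes a released Scala 2.13 standard library, where the type class is named `IsIterable` rather than `IsIterableLike`. The names `SumBySyntax` and `toSumByOperation` are chosen here for illustration only.

```scala
import scala.collection.generic.IsIterable
import scala.language.implicitConversions

object SumBySyntax {
  class SumByOperation[A](coll: IterableOnce[A]) {
    // Fold from `num.zero`, so an empty collection sums to zero
    def sumBy[B](f: A => B)(implicit num: Numeric[B]): B =
      coll.iterator.foldLeft(num.zero)((acc, a) => num.plus(acc, f(a)))
  }

  // Assumption: `IsIterable` is the released 2.13 name of `IsIterableLike`.
  // `it(coll)` converts the receiver to an `IterableOps`, which is an `IterableOnce`.
  implicit def toSumByOperation[Repr](coll: Repr)(implicit it: IsIterable[Repr]): SumByOperation[it.A] =
    new SumByOperation[it.A](it(coll))
}
```

With `import SumBySyntax._` in scope, `sumBy` should be callable on a `String` (for instance `"abc".sumBy(_.toInt)`), on an `Array`, and on any actual collection.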
76+
77+
### Consuming a more specific collection than `Iterable`
78+
79+
In some cases we want (or need) the receiver of the operation to be more specific than `Iterable`.
80+
For instance, some operations make sense only on `Seq` but not on `Set`.
81+
82+
In such a case, again, the most straightforward solution would be to take as parameter a `Seq` instead
83+
of an `Iterable` or an `IterableOnce`, but this would work only with *actual* `Seq` values. If you want
84+
to support `String` and `Array` values you have to use `IsSeqLike` instead. `IsSeqLike` is similar to
85+
`IsIterableLike` but provides a conversion to `SeqOps[A, Iterable, C]` (for some types `A` and `C`).
86+
87+
Using `IsSeqLike` is also required to make your operation work on `SeqView` values, because `SeqView`
88+
does not extend `Seq`. Similarly, there is an `IsMapLike` type that makes operations work with
89+
both `Map` and `MapView` values.
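
For instance, an operation that relies on element order only makes sense on sequences. The sketch below assumes a released Scala 2.13 standard library, where `IsSeqLike` is named `IsSeq`. The `isPalindrome` operation and the names `PalindromeSyntax` and `toPalindromeOperation` are hypothetical and used for illustration only.

```scala
import scala.collection.SeqOps
import scala.collection.generic.IsSeq
import scala.language.implicitConversions

object PalindromeSyntax {
  class PalindromeOperation[A](seqOps: SeqOps[A, Iterable, _]) {
    // Compare the forward traversal with the backward traversal
    def isPalindrome: Boolean =
      seqOps.iterator.sameElements(seqOps.reverseIterator)
  }

  // Assumption: `IsSeq` is the released 2.13 name of `IsSeqLike`
  implicit def toPalindromeOperation[Repr](coll: Repr)(implicit seq: IsSeq[Repr]): PalindromeOperation[seq.A] =
    new PalindromeOperation[seq.A](seq(coll))
}
```

Because the conversion is driven by `IsSeq`, the operation is available on `String`, `Array` and actual `Seq` values, but not on a `Set`.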
90+
91+
## Producing any collection
92+
93+
This situation arises when a library provides an operation that produces a collection while leaving the
choice of the precise collection type to the user.
95+
96+
For instance, consider a type class `Gen[A]`, whose instances define how to produce values of type `A`.
97+
Such a type class is typically used to create arbitrary test data.
98+
99+
A very basic definition of `Gen[A]` could be the following:
100+
101+
~~~ scala
102+
trait Gen[A] {
103+
/** Get a generated value of type `A` */
104+
def get: A
105+
}
106+
~~~
107+
108+
And the following instances can be defined:
109+
110+
~~~ scala
111+
import scala.util.Random
112+
113+
object Gen {
114+
115+
/** Generator of `Int` values */
116+
implicit def int: Gen[Int] =
117+
new Gen[Int] { def get: Int = Random.nextInt() }
118+
119+
/** Generator of `Boolean` values */
120+
implicit def boolean: Gen[Boolean] =
121+
new Gen[Boolean] { def get: Boolean = Random.nextBoolean() }
122+
123+
/** Given a generator of `A` values, provides a generator of `List[A]` values */
124+
implicit def list[A](implicit genA: Gen[A]): Gen[List[A]] =
125+
new Gen[List[A]] {
126+
def get: List[A] =
127+
if (Random.nextInt(100) < 10) Nil
128+
else genA.get :: get
129+
}
130+
131+
}
132+
~~~
133+
134+
The last definition (`list`) generates a value of type `List[A]` given a generator
135+
of values of type `A`. We could implement a generator of `Vector[A]` or `Set[A]` as
136+
well, but their implementations would be very similar.
137+
138+
Instead, we want to abstract over the type of the generated collection so that users
139+
can decide which collection type they want to produce.
140+
141+
To achieve that we have to use `scala.collection.Factory`:
142+
143+
~~~ scala
144+
trait Factory[-A, +C] {
145+
146+
/** @return A collection of type `C` containing the same elements
147+
* as the source collection `it`.
148+
* @param it Source collection
149+
*/
150+
def fromSpecific(it: IterableOnce[A]): C
151+
152+
/** Get a Builder for the collection. For non-strict collection
153+
* types this will use an intermediate buffer.
154+
* Building collections with `fromSpecific` is preferred
155+
* because it can be lazy for lazy collections.
156+
*/
157+
def newBuilder: Builder[A, C]
158+
}
159+
~~~
160+
161+
The `Factory[A, C]` trait provides two ways of building a collection `C` from
162+
elements of type `A`:
163+
164+
- `fromSpecific`, converts a source collection of `A` to a collection `C`,
165+
- `newBuilder`, provides a `Builder[A, C]`.
166+
167+
The difference between these two methods is that the former does not necessarily
168+
evaluate the elements of the source collection. It can produce a non-strict
169+
collection type (such as `LazyList`) that does not evaluate its elements unless
170+
it is traversed. On the other hand, the builder-based way of constructing the
171+
collection necessarily evaluates the elements of the resulting collection.
172+
In practice, prefer `fromSpecific` when you can, so that the elements of a lazy collection are not evaluated eagerly.
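
The difference between the two construction styles can be observed with a small experiment. This is only a sketch: the helper names `FactoryDemo`, `viaFromSpecific`, `viaBuilder` and `countedSource` are made up for the example, and the laziness observed for `LazyList` reflects how the released Scala 2.13 library is expected to behave.

```scala
import scala.collection.Factory

object FactoryDemo {
  /** Builds `C` by handing the whole source to the factory (may stay lazy). */
  def viaFromSpecific[A, C](source: IterableOnce[A])(implicit factory: Factory[A, C]): C =
    factory.fromSpecific(source)

  /** Builds `C` by pushing the elements one by one into a builder (always strict). */
  def viaBuilder[A, C](source: IterableOnce[A])(implicit factory: Factory[A, C]): C = {
    val builder = factory.newBuilder
    source.iterator.foreach(a => builder += a)
    builder.result()
  }

  /** A source whose elements increment `counter(0)` when they are evaluated. */
  def countedSource(counter: Array[Int]): Iterator[Int] =
    Iterator(1, 2, 3).map { x => counter(0) += 1; x }
}
```

Building a `LazyList[Int]` with `viaFromSpecific` should leave the counter untouched until the result is traversed, whereas `viaBuilder` evaluates all the source elements before it returns.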
173+
174+
Finally, here is how we can implement a generator of arbitrary collection types:
175+
176+
~~~ scala
177+
import scala.collection.Factory
178+
179+
implicit def collection[CC[_], A](implicit
180+
genA: Gen[A],
181+
factory: Factory[A, CC[A]]
182+
): Gen[CC[A]] =
183+
new Gen[CC[A]] {
184+
def get: CC[A] = {
185+
val lazyElements =
186+
LazyList.unfold(()) { _ =>
187+
if (Random.nextInt(100) < 10) None
188+
else Some((genA.get, ()))
189+
}
190+
factory.fromSpecific(lazyElements)
191+
}
192+
}
193+
~~~
194+
195+
The implementation uses a lazy source collection of a random size (`lazyElements`).
196+
Then it calls the `fromSpecific` method of the `Factory` to build the collection
197+
expected by the user.
198+
199+
Here is an example of using `collection`:
200+
201+
~~~
202+
scala> collection[List, Int].get
203+
res0: List[Int] = List(606179450, -1479909815, 2107368132, 332900044, 1833159330, -406467525, 646515139, -575698977, -784473478, -1663770602)
204+
205+
scala> collection[LazyList, Boolean].get
206+
res1: LazyList[Boolean] = LazyList(_, ?)
207+
208+
scala> collection[Set, Int].get
209+
res2: Set[Int] = HashSet(-1775377531, -1376640531, -1009522404, 526943297, 1431886606, -1486861391)
210+
~~~
211+
212+
## Transforming any collection
213+
214+
In this section we will see how we can implement an `intersperse` operation that can be applied to
215+
any sequence and returns a sequence with a new element inserted between each element of the
216+
source sequence.
217+
218+
Transforming collections consists of both consuming and producing collections. This is achieved by
combining the techniques described in the previous sections. We can start with
something like the following:
221+
222+
~~~ scala
223+
import scala.collection.{ AbstractIterator, AbstractView, Factory, SeqOps }
224+
import scala.collection.generic.IsSeqLike
225+
226+
class IntersperseOperation[A](seqOps: SeqOps[A, Iterable, _]) {
227+
def intersperse[B >: A, That](sep: B)(implicit factory: Factory[B, That]): That =
228+
factory.fromSpecific(new AbstractView[B] {
229+
def iterator = new AbstractIterator[B] {
230+
val it = seqOps.iterator
231+
var intersperseNext = false
232+
def hasNext = intersperseNext || it.hasNext
233+
def next() = {
234+
val elem = if (intersperseNext) sep else it.next()
235+
intersperseNext = !intersperseNext && it.hasNext
236+
elem
237+
}
238+
}
239+
})
240+
}
241+
242+
implicit def IntersperseOperation[Repr](coll: Repr)(implicit seq: IsSeqLike[Repr]): IntersperseOperation[seq.A] =
243+
new IntersperseOperation(seq(coll))
244+
~~~
245+
246+
However, if we try it we get the following behaviour:
247+
248+
~~~
249+
scala> List(1, 2, 3).intersperse(0)
250+
res0: Array[Int] = Array(1, 0, 2, 0, 3)
251+
~~~
252+
253+
We get back an `Array` although the source collection was a `List`! Indeed, there is
nothing that constrains the result type of `intersperse` to depend on the receiver type.
The following expressions show the behaviour we expect:
256+
257+
~~~ scala
258+
List(1, 2, 3).intersperse(0) == List(1, 0, 2, 0, 3)
259+
"foo".intersperse(' ') == "f o o"
260+
~~~
261+
262+
When we call it on a `List`, we want to get back another `List`, and when we call it on
263+
a `String` we want to get back another `String`, and so on.
264+
265+
To produce a collection whose type depends on a source collection, we have to use
`scala.collection.BuildFrom` (formerly `CanBuildFrom`) instead of `Factory`.
`BuildFrom` is defined as follows:
268+
269+
~~~ scala
270+
trait BuildFrom[-From, -A, +C] {
271+
/** @return A collection of type `C` containing the same elements as the source collection `it`. */
272+
def fromSpecificIterable(from: From)(it: Iterable[A]): C
273+
274+
/** Get a Builder for the collection type `C` */
275+
def newBuilder(from: From): Builder[A, C]
276+
}
277+
~~~
278+
279+
`BuildFrom` has similar operations to `Factory`, but they take an additional `from`
280+
parameter. Before explaining how implicit instances of `BuildFrom` are resolved, let’s first have
281+
a look at how you can use it. Here is the implementation of `intersperse` based on `BuildFrom`:
282+
283+
~~~ scala
import scala.collection.{ AbstractIterator, AbstractView, BuildFrom }
import scala.collection.generic.IsSeqLike

class IntersperseOperation[Repr, S <: IsSeqLike[Repr]](coll: Repr, seq: S) {
  def intersperse[B >: seq.A, That](sep: B)(implicit bf: BuildFrom[Repr, B, That]): That = {
    val seqOps = seq(coll)
    // `fromSpecificIterable` is the method declared by the `BuildFrom` trait shown above
    bf.fromSpecificIterable(coll)(new AbstractView[B] {
      // same as before
    })
  }
}

implicit def IntersperseOperation[Repr](coll: Repr)(implicit seq: IsSeqLike[Repr]): IntersperseOperation[Repr, seq.type] =
  new IntersperseOperation(coll, seq)
~~~
299+
300+
Note that we track the type of the receiver collection `Repr` in the `IntersperseOperation`
301+
class. Now, consider what happens when we write the following expression:
302+
303+
~~~ scala
304+
List(1, 2, 3).intersperse(0)
305+
~~~
306+
307+
An implicit parameter of type `BuildFrom[Repr, B, That]` has to be resolved by the compiler.
The type `Repr` is constrained by the receiver type (here, `List[Int]`) and the type `B` is
inferred from the value passed as a separator (here, `Int`). Finally, the type of the collection
to produce, `That`, is fixed by the resolution of the `BuildFrom` parameter. In our case,
there is a `BuildFrom[List[Int], Int, List[Int]]` instance that fixes the result type to
be `List[Int]`.
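
For readers who want to try the whole thing, here is a complete sketch written against the API names of a released Scala 2.13 standard library. It assumes that `IsSeqLike` is available as `IsSeq` and that the `BuildFrom` method is named `fromSpecific`. The wrapper object `IntersperseSyntax` and the conversion name `toIntersperseOperation` are chosen for this example only.

```scala
import scala.collection.{ AbstractIterator, AbstractView, BuildFrom }
import scala.collection.generic.IsSeq
import scala.language.implicitConversions

object IntersperseSyntax {
  // `Repr` tracks the receiver type so that `BuildFrom` can pick the result type
  class IntersperseOperation[Repr, S <: IsSeq[Repr]](coll: Repr, seq: S) {
    def intersperse[B >: seq.A, That](sep: B)(implicit bf: BuildFrom[Repr, B, That]): That = {
      val seqOps = seq(coll)
      // Assumption: released 2.13 names this method `fromSpecific`
      bf.fromSpecific(coll)(new AbstractView[B] {
        def iterator: Iterator[B] = new AbstractIterator[B] {
          private val it = seqOps.iterator
          private var intersperseNext = false
          def hasNext: Boolean = intersperseNext || it.hasNext
          def next(): B = {
            val elem = if (intersperseNext) sep else it.next()
            // Emit a separator next only if another source element follows
            intersperseNext = !intersperseNext && it.hasNext
            elem
          }
        }
      })
    }
  }

  implicit def toIntersperseOperation[Repr](coll: Repr)(implicit seq: IsSeq[Repr]): IntersperseOperation[Repr, seq.type] =
    new IntersperseOperation(coll, seq)
}
```

With `import IntersperseSyntax._` in scope, `List(1, 2, 3).intersperse(0)` is expected to return a `List[Int]` and `"foo".intersperse(' ')` a `String`, because the `BuildFrom` instance is selected from the receiver type.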
313+
314+
## Summary
315+
316+
- To consume any collection, take an `IterableOnce` (or something more specific) as parameter,
317+
- To also support `String`, `Array` and `View`, use `IsIterableLike`,
318+
- To produce a collection given its type, use a `Factory`,
319+
- To produce a collection based on the type of a source collection and the type of elements of the collection
320+
to produce, use `BuildFrom`.
