forked from SquareBracketAssociates/UpdatedPharoByExample
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathCollections.pier
1039 lines (793 loc) · 40.8 KB
/
Collections.pier
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
!Collections
@cha:collections
!!Introduction
The collection classes form a loosely-defined group of general-purpose
subclasses of ==Collection== and ==Stream==. Many of these (like ==Bitmap==,
==FileStream== and ==CompiledMethod==) are special-purpose classes crafted for
use in other parts of the system or in applications, and hence not categorized
as ''Collections'' by the system organization. For the purposes of this chapter,
we use the term ''Collection Hierarchy'' to mean ==Collection== and its
subclasses that are ''also'' in the packages labelled ==Collections-*==. We use
the term ''Stream Hierarchy'' to mean ==Stream== and its subclasses that are
''also'' in the ==Collections-Streams== packages.
In this chapter we focus mainly on the subset of collection classes shown in
Figure *@fig:CollClassesTree*. Streams will be discussed separately in
Chapter *: Streams>../Streams/Streams.pillar@cha:streams*.
+Some of the key collection classes in Pharo.>file://figures/CollectionHierarchy.png|label=fig:CollClassesTree+
!!The varieties of collections
@sec:varieties
To make good use of the collection classes, the reader needs at least a
superficial knowledge of the wide variety of collections that they implement,
and their commonalities and differences.
Programming with collections rather than individual elements is an important way
to raise the level of abstraction of a program. The Lisp function ==map==,
which applies an argument function to every element of a list and returns a new
list containing the results is an early example of this style, but Smalltalk-80
adopted collection-based programming as a central tenet. Modern functional
programming languages such as ML and Haskell have followed Smalltalk's lead.
Why is this a good idea? Suppose you have a data structure containing a
collection of student records, and wish to perform some action on all of the
students that meet some criterion. Programmers raised to use an imperative
language will immediately reach for a loop. But the Smalltalk programmer will
write:
[[[
students select: [ :each | each gpa < threshold ]
]]]
This evaluates to a new collection containing precisely those elements of
==students== for which the bracketed function returns ==true==. (The expression
in brackets can be thought of as a lambda-expression defining an anonymous
function ==x. x gpa < threshold==). This code has the simplicity and
elegance of a domain-specific query language.
The message ==select:== is understood by ''all'' collections in Smalltalk.
There is no need to find out if the student data structure is an array or a
linked list: the ==select:== message is understood by both. Note that this is
quite different from using a loop, where one must know whether ==students== is
an array or a linked list before the loop can be set up.
In Smalltalk, when one speaks of a collection without being more specific about
the kind of collection, one means an object that supports well-defined protocols
for testing membership and enumerating the elements. ''All'' collections
understand the ==testing== messages ==includes:==, ==isEmpty== and
==occurrencesOf:==. ''All'' collections understand the ==enumeration== messages
==do:==, ==select:==, ==reject:== (which is the opposite of ==select:==),
==collect:== (which is like Lisp's ==map==), ==detect:ifNone:==,
==inject:into:== (which performs a left fold) and many more. It is the ubiquity
of this protocol, as well as its variety, that makes it so powerful.
% Figure *@fig:protocols* %
The table below summarizes the standard protocols supported by most of
the classes in the collection hierarchy. These methods are defined, redefined,
optimized or occasionally even forbidden by subclasses of ==Collection==.
% \caption{Standard Collection protocols\figlabel{protocols}}
|!Protocol |!Methods |
| ==accessing== | ==size==, ==capacity==, ==at:==, ==at:put:== |
| ==testing== | ==isEmpty==, ==includes:==, ==contains:==, ==occurrencesOf:== |
| ==adding== | ==add:==, ==addAll:==
| ==removing== | ==remove:==, ==remove:ifAbsent:==, ==removeAll:==
| ==enumerating== | ==do:==, ==collect:==, ==select:==, == reject:==, ==detect:==, ==detect:ifNone:==, ==inject:into:==
| ==converting== | ==asBag==, ==asSet==, ==asOrderedCollection==, ==asSortedCollection==,
| | ==asArray==, ==asSortedCollection:==
| ==creation== | ==with:==, ==with:with:==, ==with:with:with:==, ==with:with:with:with:==, ==withAll:==
Beyond this basic uniformity, there are many different kinds of collection
either supporting different protocols, or providing different behaviour for the
same requests. Let us briefly survey some of the key differences:
;==Sequenceable:==
:Instances of all subclasses of ==SequenceableCollection== start from a ==first== element and proceed in a well-defined order to a ==last== element. Instances of ==Set==, ==Bag== and ==Dictionary==, on the other hand, are not sequenceable.
;==Sortable:==
:A ==SortedCollection== maintains its elements in sort order.
;==Indexable:==
:Most sequenceable collections are also indexable, that is, elements can be retrieved with ==at:==. ==Array== is the familiar indexable data structure with a fixed size; ==anArray at: n== retrieves the n^^th^^ element of ==anArray==, and ==anArray at: n put: v== changes the n^^th^^ element to ==v==. ==LinkedList==s and ==SkipList==s are sequenceable but not indexable, that is, they understand ==first== and ==last==, but not ==at:==.
;==Keyed:==
:Instances of ==Dictionary== and its subclasses are accessed by keys instead of indices.
;==Mutable:==
:Most collections are mutable, but ==Interval==s and ==Symbol==s are not. An ==Interval== is an immutable collection representing a range of ==Integer==s. For example, ==5 to: 16 by: 2== is an interval that contains the elements 5, 7, 9, 11, 13 and 15. It is indexable with ==at:==, but cannot be changed with ==at:put:==.
;==Growable:==
:Instances of ==Interval== and ==Array== are always of a fixed size. Other kinds of collections (sorted collections, ordered collections, and linked lists) can grow after creation. The class ==OrderedCollection== is more general than ==Array==; the size of an ==OrderedCollection== grows on demand, and it has methods for ==addFirst:== and ==addLast:== as well as ==at:== and ==at:put:==.
;==Accepts duplicates:==
:A ==Set== will filter out duplicates, but a ==Bag== will not. ==Dictionary==, ==Set== and ==Bag== use the ==\=== method provided by the elements; the ==Identity== variants of these classes use the ==\=\=== method, which tests whether the arguments are the same object, and the ==Pluggable== variants use an arbitrary equivalence relation supplied by the creator of the collection.
;==Heterogeneous:==
:Most collections will hold any kind of element. A ==String==, ==CharacterArray== or ==Symbol==, however, only holds ==Character==s. An ==Array== will hold any mix of objects, but a ==ByteArray== only holds \lct{Byte}s, an ==IntegerArray== only holds ==Integer==s and a ==FloatArray== only holds ==Float==s. A ==LinkedList== is constrained to hold elements that conform to the ==Link== > ==accessing== protocol.
!!Implementations of collections
@sec:implementation
+Some collection classes categorized by implementation technique.>file://figures/CollectionsByImpl.png|width=100|label=fig:collsByImpl+
These categorizations by functionality are not our only concern; we must also
consider how the collection classes are implemented. As shown in Figure *@fig:collsByImpl*, five main implementation techniques are employed.
- Arrays store their elements in the (indexable) instance variables of the collection object itself; as a consequence, arrays must be of a fixed size, but can be created with a single memory allocation.
- ==OrderedCollections== and ==SortedCollections== store their elements in an array that is referenced by one of the instance variables of the collection. Consequently, the internal array can be replaced with a larger one if the collection grows beyond its storage capacity.
-The various kinds of set and dictionary also reference a subsidiary array for storage, but use the array as a hash table. Bags use a subsidiary ==Dictionary==, with the elements of the bag as keys and the number of occurrences as values.
-LinkedLists use a standard singly-linked representation.
-Intervals are represented by three integers that record the two endpoints and the step size.
In addition to these classes, there are also ''weak'' variants of ==Array==,
==Set== and of the various kinds of dictionary. These collections hold onto
their elements weakly, i.e., in a way that does not prevent the elements
from being garbage collected. The Pharo virtual machine is aware of these
classes and handles them specially.
!!Examples of key classes
We present now the most common or important collection classes using simple code
examples. The main protocols of collections are: ==at:==, ==at:put:== — to
access an element, ==add:==, ==remove:== — to add or remove an element,
==size==, ==isEmpty==, ==include:== — to get some information about the
collection, ==do:==, ==collect:==, ==select:== — to iterate over the collection.
Each collection may implement (or not) such protocols, and when they do, they
interpret them to fit with their semantics. We suggest you browse the classes
themselves to identify specific and more advanced protocols.
We will focus on the most common collection classes: ==OrderedCollection==,
==Set==, ==SortedCollection==, ==Dictionary==, ==Interval==, and ==Array==.
!!!!!Common creation protocol.
There are several ways to create instances of collections. The most generic ones
use the methods ==new:== and ==with:==. ==new: anInteger== creates a collection
of size ==anInteger== whose elements will all be ==nil==. ==with:==
==anObject== creates a collection and adds ==anObject== to the created
collection. Different collections will realize this behaviour differently.
You can create collections with initial elements using the methods ==with:==,
==with:with:== etc. for up to six elements.
[[[
Array with: 1 --> #(1)
Array with: 1 with: 2 --> #(1 2)
Array with: 1 with: 2 with: 3 --> #(1 2 3)
Array with: 1 with: 2 with: 3 with: 4 --> #(1 2 3 4)
Array with: 1 with: 2 with: 3 with: 4 with: 5 --> #(1 2 3 4 5)
Array with: 1 with: 2 with: 3 with: 4 with: 5 with: 6 --> #(1 2 3 4 5 6)
]]]
You can also use ==addAll:== to add all elements of one kind of collection to
another kind:
[[[
(1 to: 5) asOrderedCollection addAll: '678'; yourself --> an OrderedCollection(1 2 3 4 5 $6 $7 $8)
]]]
Take care that ==addAll:== returns its argument, and not the receiver!
You can also create many collections with ==withAll:== or ==newFrom:==
[[[
Array withAll: #(7 3 1 3) --> #(7 3 1 3)
OrderedCollection withAll: #(7 3 1 3) --> an OrderedCollection(7 3 1 3)
SortedCollection withAll: #(7 3 1 3) --> a SortedCollection(1 3 3 7)
Set withAll: #(7 3 1 3) --> a Set(7 1 3)
Bag withAll: #(7 3 1 3) --> a Bag(7 1 3 3)
Dictionary withAll: #(7 3 1 3) --> a Dictionary(1->7 2->3 3->1 4->3 )
]]]
[[[
Array newFrom: #(7 3 1 3) --> #(7 3 1 3)
OrderedCollection newFrom: #(7 3 1 3) --> an OrderedCollection(7 3 1 3)
SortedCollection newFrom: #(7 3 1 3) --> a SortedCollection(1 3 3 7)
Set newFrom: #(7 3 1 3) --> a Set(7 1 3)
Bag newFrom: #(7 3 1 3) --> a Bag(7 1 3 3)
Dictionary newFrom: {1 -> 7. 2 -> 3. 3 -> 1. 4 -> 3} --> a Dictionary(1->7 2->3 3->1 4->3 )
]]]
Note that these two methods are not identical. In particular, ==Dictionary
class>>withAll:== interprets its argument as a collection of values, whereas
==Dictionary class>>newFrom:== expects a collection of associations.
!!!==Array==
An ==Array== is a fixed-sized collection of elements accessed by integer
indices. Contrary to the C convention, the first element of a Smalltalk array is
at position 1 and not 0. The main protocol to access array elements is the
method ==at:== and ==at:put:==. ==at: anInteger== returns the element at index
==anInteger==. ==at: anInteger put: anObject== puts ==anObject== at index
==anInteger==. Arrays are fixed-size collections therefore we cannot add or
remove elements at the end of an array. The following code creates an array of
size 5, puts values in the first 3 locations and returns the first element.
[[[
anArray := Array new: 5.
anArray at: 1 put: 4.
anArray at: 2 put: 3/2.
anArray at: 3 put: 'ssss'.
anArray at: 1 --> 4
]]]
There are several ways to create instances of the class ==Array==. We can use
==new:==, ==with:==, and the constructs ==#( )== and =={ }==.
!!!!!Creation with ==new:==
==new: anInteger== creates an array of size ==anInteger==. ==Array new: 5==
creates an array of size 5. (Note: the value of each element is initialized
to ==nil==).
!!!!!Creation using ==with:==
==with:== methods allow one to specify the value of the elements. The following
code creates an array of three elements consisting of the number ==4==, the
fraction ==3/2== and the string =='lulu'==.
[[[
Array with: 4 with: 3/2 with: 'lulu' --> {4. (3/2). 'lulu'}
]]]
!!!!!Literal creation with ==#()==
==#()== creates literal arrays with static (or ''literal'') elements that have
to be known when the expression is compiled, and not when it is executed. The
following code creates an array of size 2 where the first element is the
(literal) number ==1== and the second the (literal) string =='here'==.
[[[
#(1 'here') size --> 2
]]]
Now, if you evaluate ==#(1+2)==, you do not get an array with a single element
==3== but instead you get the array ==#(1 #+ 2)== i.e., with three elements:
==1==, the symbol ==#+== and the number ==2==.
[[[
#(1+2) --> #(1 #+ 2)
]]]
This occurs because the construct ==#()== causes the compiler to interpret
literally the expressions contained in the array. The expression is scanned and
the resulting elements are fed to a new array. Literal arrays contain numbers,
==nil==, ==true==, ==false==, symbols, strings and other literal arrays.
!!!!!Dynamic creation with =={ }==
Finally, you can create a dynamic array using the construct =={}==. =={ a. b }==
is equivalent to ==Array with: a with: b==. This means in particular that the
expressions enclosed by =={== and ==}== are executed.
[[[
{ 1 + 2 } --> #(3)
{(1/2) asFloat} at: 1 --> 0.5
{10 atRandom. 1/3} at: 2 --> (1/3)
]]]
!!!!!Element Access
Elements of all sequenceable collections can be accessed with ==at:== and
==at:put:==.
[[[
anArray := #(1 2 3 4 5 6) copy.
anArray at: 3 --> 3
anArray at: 3 put: 33.
anArray at: 3 --> 33
]]]
Be careful with code that modifies literal arrays! The compiler tries to
allocate space just once for literal arrays. Unless you copy the array, the
second time you evaluate the code your ''literal'' array may not have the value
you expect. (Without cloning, the second time around, the literal ==#(1 2 3 4 5
6)== will actually be ==#(1 2 33 4 5 6)==!) Dynamic arrays do not have this
problem.
!!!==OrderedCollection==
==OrderedCollection== is one of the collections that can grow, and to which
elements can be added sequentially. It offers a variety of methods such as
==add:==, ==addFirst:==, ==addLast:==, and ==addAll:==.
[[[
ordCol := OrderedCollection new.
ordCol add: 'Seaside'; add: 'SmalltalkHub'; addFirst: 'Monticello'.
ordCol --> an OrderedCollection('Monticello' 'Seaside' 'SmalltalkHub')
]]]
!!!!!Removing Elements
The method ==remove:== ==anObject== removes the first occurrence of an object
from the collection. If the collection does not include such an object, it
raises an error.
[[[
ordCol add: 'Monticello'.
ordCol remove: 'Monticello'.
ordCol --> an OrderedCollection('Seaside' 'SmalltalkHub' 'Monticello')
]]]
There is a variant of ==remove:== named ==remove:ifAbsent:== that allows one to
specify as second argument a block that is executed in case the element to be
removed is not in the collection.
[[[
result := ordCol remove: 'zork' ifAbsent: [33].
result --> 33
]]]
!!!!!Conversion
It is possible to get an ==OrderedCollection== from an ==Array== (or any other
collection) by sending the message ==asOrderedCollection==:
[[[
#(1 2 3) asOrderedCollection --> an OrderedCollection(1 2 3)
'hello' asOrderedCollection --> an OrderedCollection($h $e $l $l $o)
]]]
!!!==Interval==
The class ==Interval== represents ranges of numbers. For example, the interval
of numbers from 1 to 100 is defined as follows:
[[[
Interval from: 1 to: 100 --> (1 to: 100)
]]]
The ==printString== of this interval reveals that the class ==Number== provides
us with a convenience method called ==to:== to generate intervals:
[[[
(Interval from: 1 to: 100) = (1 to: 100) --> true
]]]
We can use ==Interval class>>from:to:by:== or ==Number>>to:by:== to specify the
step between two numbers as follow:
[[[
(Interval from: 1 to: 100 by: 0.5) size --> 199
(1 to: 100 by: 0.5) at: 198 --> 99.5
(1/2 to: 54/7 by: 1/3) last --> (15/2)
]]]
!!!==Dictionary==
Dictionaries are important collections whose elements are accessed using keys.
Among the most commonly used messages of dictionary you will find ==at:==,
==at:put:==, ==at:ifAbsent:==, ==keys== and ==values==.
[[[
colors := Dictionary new.
colors at: #yellow put: Color yellow.
colors at: #blue put: Color blue.
colors at: #red put: Color red.
colors at: #yellow --> Color yellow
colors keys --> a Set(#blue #yellow #red)
colors values --> {Color blue. Color yellow. Color red}
]]]
Dictionaries compare keys by equality. Two keys are considered to be the same if
they return true when compared using =====. A common and difficult to spot bug
is to use as key an object whose ===== method has been redefined but not its
==hash== method. Both methods are used in the implementation of dictionary and
when comparing objects.
In its implementation, a ==Dictionary== can be seen as consisting of a set of
associations (key value) created using the message ==->==. We can create a
==Dictionary== from a collection of associations, or we may convert a dictionary
to an array of associations.
[[[
colors := Dictionary newFrom: { #blue->Color blue . #red->Color red . #yellow->Color yellow }.
colors removeKey: #blue.
colors associations --> {#yellow->Color yellow. #red->Color red}
]]]
!!!!!==IdentityDictionary==
While a dictionary uses the result of the messages ===== and ==hash== to
determine if two keys are the same, the class ==IdentityDictionary== uses the
identity (message ==\=\===) of the key instead of its values, i.e., it
considers two keys to be equal ''only'' if they are the same object.
Often ==Symbol==s are used as keys, in which case it is natural to use an
==IdentityDictionary==, since a ==Symbol== is guaranteed to be globally unique.
If, on the other hand, your keys are ==String==s, it is better to use a plain
==Dictionary==, or you may get into trouble:
[[[
a := 'foobar'.
b := a copy.
trouble := IdentityDictionary new.
trouble at: a put: 'a'; at: b put: 'b'.
trouble at: a --> 'a'
trouble at: b --> 'b'
trouble at: 'foobar' --> 'a'
]]]
Since ==a== and ==b== are different objects, they are treated as different
objects. Interestingly, the literal =='foobar'== is allocated just once, so is
really the same object as ==a==. You don't want your code to depend on behaviour
like this! A plain ==Dictionary== would give the same value for any key equal to
=='foobar'==.
Use only globally unique objects (like ==Symbol==s or ==SmallInteger==s) as keys
for an ==IdentityDictionary==, and ==String==s (or other objects) as keys for a
plain ==Dictionary==.
Note that the global ==Smalltalk== is an instance of ==SystemDictionary==, a
subclass of ==IdentityDictionary==, hence all its keys are ==Symbol==s
(actually, ==ByteSymbol==s, which contain only 8-bit characters).
[[[
Smalltalk keys collect: [ :each | each class ] --> a Set(ByteSymbol)
]]]
Sending ==keys== or ==values== to a ==Dictionary== results in a ==Set==, which
we look at next.
!!!==Set==
The class ==Set== is a collection which behaves as a mathematical set, i.e.,
as a collection with no duplicate elements and without any order. In a ==Set==
elements are added using the message ==add:== and they cannot be accessed using
the message ==at:==. Objects put in a set should implement the methods ==hash==
and =====.
[[[
s := Set new.
s add: 4/2; add: 4; add:2.
s size --> 2
]]]
You can also create sets using ==Set class>>newFrom:== or the conversion message
==Collection>>asSet==:
[[[
(Set newFrom: #( 1 2 3 1 4 )) = #(1 2 3 4 3 2 1) asSet --> true
]]]
==asSet== offers us a convenient way to eliminate duplicates from a collection:
[[[
{ Color black. Color white. (Color red + Color blue + Color green) } asSet size --> 2
]]]
Note that ""red"" ==\+== ""blue"" ==\+== ""green"" ==\=== ""white"".
A ==Bag== is much like a ==Set== except that it does allow duplicates:
[[[
{ Color black. Color white. (Color red + Color blue + Color green) } asBag size --> 3
]]]
The set operations ''union'', ''intersection'' and ''membership test'' are
implemented by the ==Collection== messages ==union:==, ==intersection:== and
==includes:==. The receiver is first converted to a ==Set==, so these operations
work for all kinds of collections!
[[[
(1 to: 6) union: (4 to: 10) --> a Set(1 2 3 4 5 6 7 8 9 10)
'hello' intersection: 'there' --> 'he'
#Smalltalk includes: $k --> true
]]]
As we explain below, elements of a set are accessed using iterators (see Section
*@sec:iterators*).
!!!==SortedCollection==
In contrast to an ==OrderedCollection==, a ==SortedCollection== maintains its
elements in sort order. By default, a sorted collection uses the message ==<===
to establish sort order, so it can sort instances of subclasses of the abstract
class ==Magnitude==, which defines the protocol of comparable objects (==<==,
==\===, ==>==, ==>===, ==between:and:==...).
(See Chapter *: Basic Classes>../BasicClasses/BasicClasses.pier@cha:basicClasses*).
You can create a ==SortedCollection== by creating a new instance and adding
elements to it:
[[[
SortedCollection new add: 5; add: 2; add: 50; add: -10; yourself. --> a SortedCollection(-10 2 5 50)
]]]
More usually, though, one will send the conversion message
==asSortedCollection== to an existing collection:
[[[
#(5 2 50 -10) asSortedCollection --> a SortedCollection(-10 2 5 50)
]]]
[[[
'hello' asSortedCollection --> a SortedCollection($e $h $l $l $o)
]]]
How do you get a ==String== back from this result? ==asString== unfortunately
returns the ==printString== representation, which is not what we want:
[[[
'hello' asSortedCollection asString --> 'a SortedCollection($e $h $l $l $o)'
]]]
The correct answer is to either use ==String class>>newFrom:==, ==String
class>>withAll:== or ==Object>>as:==:
[[[
'hello' asSortedCollection as: String --> 'ehllo'
String newFrom: ('hello' asSortedCollection) --> 'ehllo'
String withAll: ('hello' asSortedCollection) --> 'ehllo'
]]]
It is possible to have different kinds of elements in a ==SortedCollection== as
long as they are all comparable. For example we can mix different kinds of
numbers such as integers, floats and fractions:
[[[
{ 5. 2/-3. 5.21 } asSortedCollection --> a SortedCollection((-2/3) 5 5.21)
]]]
Imagine that you want to sort objects that do not define the method ==<=== or
that you would like to have a different sorting criterion. You can do this by
supplying a two argument block, called a sortblock, to the sorted collection.
For example, the class ==Color== is not a Magnitude and it does not implement
the method ==<===, but we can specify a block stating that the colors should be
sorted according to their luminance (a measure of brightness).
[[[
col := SortedCollection sortBlock: [:c1 :c2 | c1 luminance <= c2 luminance].
col addAll: { Color red. Color yellow. Color white. Color black }.
col --> a SortedCollection(Color black Color red Color yellow Color white)
]]]
!!!==String==
A Smalltalk ==String== represents a collection of ==Character==s. It is
sequenceable, indexable, mutable and homogeneous, containing only ==Character==
instances. Like ==Array==s, ==String==s have a dedicated syntax, and are
normally created by directly specifying a ==String== literal within single
quotes, but the usual collection creation methods will work as well.
[[[
'Hello' --> 'Hello'
String with: $A --> 'A'
String with: $h with: $i with: $! --> 'hi!'
String newFrom: #($h $e $l $l $o) --> 'hello'
]]]
In actual fact, ==String== is abstract. When we instantiate a ==String== we
actually get either an 8-bit ==ByteString== or a 32-bit ==WideString==. To keep
things simple, we usually ignore the difference and just talk about instances of
==String==.
Two instances of ==String== can be concatenated with a comma.
[[[
s := 'no', ' ', 'worries'.
s --> 'no worries'
]]]
Since a string is a mutable collection we can also change it using the method
==at:put:==.
[[[
s at: 4 put: $h; at: 5 put: $u.
s --> 'no hurries'
]]]
Note that the comma method is defined by ==Collection==, so it will work for any
kind of collection!
[[[
(1 to: 3), '45' --> #(1 2 3 $4 $5)
]]]
We can also modify an existing string using ==replaceAll:with:== or
==replaceFrom:to:with:== as shown below. Note that the number of characters and
the interval should have the same size.
[[[
s replaceAll: $n with: $N.
s --> 'No hurries'
s replaceFrom: 4 to: 5 with: 'wo'.
s --> 'No worries'
]]]
In contrast to the methods described above, the method ==copyReplaceAll:==
creates a new string. (Curiously, here the arguments are substrings rather than
individual characters, and their sizes do not have to match.)
[[[
s copyReplaceAll: 'rries' with: 'mbats' --> 'No wombats'
]]]
A quick look at the implementation of these methods reveals that they are
defined not only for ==String==s, but for any kind of
==SequenceableCollection==, so the following also works:
[[[
(1 to: 6) copyReplaceAll: (3 to: 5) with: { 'three'. 'etc.' } --> #(1 2 'three' 'etc.' 6)
]]]
!!!!!String matching
It is possible to ask whether a pattern matches a string by sending the
==match:== message. The pattern can specify ==*== to match an arbitrary series
of characters and # to match a single character. Note that ==match:== is sent to
the pattern and not the string to be matched.
[[[
'Linux *' match: 'Linux mag' --> true
'GNU/Linux #ag' match: 'GNU/Linux tag' --> true
]]]
Another useful method is ==findString:==.
[[[
'GNU/Linux mag' findString: 'Linux' --> 5
'GNU/Linux mag' findString: 'linux' startingAt: 1 caseSensitive: false --> 5
]]]
More advanced pattern matching facilities are also available in the ==Regex==
package.
!!!!!Substrings
For substring manipulation we can use methods like ==first==, ==first:==,
==allButFirst:==, ==copyFrom:to:== and others, defined in
==SequenceableCollection==.
[[[
'alphabet' at: 6 --> $b
'alphabet' first --> $a
'alphabet' first: 5 --> 'alpha'
'alphabet' allButFirst: 3 --> 'habet'
'alphabet' copyFrom: 5 to: 7 --> 'abe'
'alphabet' copyFrom: 3 to: 3 --> 'p' (not $p)
]]]
Be aware that result type can be different, depending on the method used.
Most of the substring-related methods return String instances. But the methods
that always return one element of the ==String== collection, return a
==Character== instance (for example, =='alphabet' at: 6== returns the character
==$b==). Methods that just ''happen'' to return a single element, such as
==copyFrom: x to: x==, return a ==String== of size 1, not a ==Character==.
This is because other times (with different parameters), ==copyFrom:to:== can
return multiple elements.
For complete list of substring-related methods, browse the
==SequenceableCollection== class (especially the ==accessing== protocol).
!!!!!Some tests on strings
The following examples illustrate the use of ==isEmpty==, ==includes:== and
==anySatisfy:== which are also messages defined not only on Strings but more
generally on collections.
[[[
'Hello' isEmpty --> false
'Hello' includes: $a --> false
'JOE' anySatisfy: [:c | c isLowercase] --> false
'Joe' anySatisfy: [:c | c isLowercase] --> true
]]]
!!!!!String templating
There are three messages that are useful to manage string templating:
==format:==, ==expandMacros== and ==expandMacrosWith:==.
[[[
'{1} is {2}' format: {'Pharo' . 'cool'} --> 'Pharo is cool'
]]]
The messages of the ==expandMacros== family offer variable substitution, using
==<n>== for carriage return, ==<t>== for tabulation, ==<1s>==, ==<2s>==,
==<3s>== for arguments (==<1p>==, ==<2p>==, surrounds the string with single
quotes), and ==<1?value1:value2>== for conditional.
[[[
'look-<t>-here' expandMacros --> 'look- -here'
'<1s> is <2s>' expandMacrosWith: 'Pharo' with: 'cool' --> 'Pharo is cool'
'<2s> is <1s>' expandMacrosWith: 'Pharo' with: 'cool' --> 'cool is Pharo'
'<1p> or <1s>' expandMacrosWith: 'Pharo' with: 'cool' --> '''Pharo'' or Pharo'
'<1?Quentin:Thibaut> plays' expandMacrosWith: true --> 'Quentin plays'
'<1?Quentin:Thibaut> plays' expandMacrosWith: false --> 'Thibaut plays'
]]]
!!!!!Some other utility methods
The class ==String== offers numerous other utilities including the messages
==asLowercase==, ==asUppercase== and ==capitalized==.
[[[
'XYZ' asLowercase --> 'xyz'
'xyz' asUppercase --> 'XYZ'
'hilaire' capitalized --> 'Hilaire'
'1.54' asNumber --> 1.54
'this sentence is without a doubt far too long' contractTo: 20 --> 'this sent...too long'
]]]
Note that there is generally a difference between asking an object its string
representation by sending the message ==printString== and converting it to a
string by sending the message ==asString==. Here is an example of the
difference.
[[[
#ASymbol printString --> '#ASymbol'
#ASymbol asString --> 'ASymbol'
]]]
A symbol is similar to a string but is guaranteed to be globally unique. For
this reason symbols are preferred to strings as keys for dictionaries, in
particular for instances of ==IdentityDictionary==. See also
Chapter *: Basic Classes>../BasicClasses/BasicClasses.pier@cha:basicClasses* for
more about ==String== and ==Symbol==.
!!Collection iterators
@sec:iterators
In Smalltalk loops and conditionals are simply messages sent to collections or
other objects such as integers or blocks (see also Chapter *: Syntax in a Nutshell>../SyntaxNutshell/SyntaxNutshell.pillar@cha:syntax*).
In addition to low-level messages such as ==to:do:== which evaluates a block with
an argument ranging from an initial to a final number, the collection
hierarchy offers various high-level iterators. Using such iterators will make
your code more robust and compact.
!!!Iterating (==do:)==
The method ==do:== is the basic collection iterator. It applies its argument (a
block taking a single argument) to each element of the receiver. The following
example prints all the strings contained in the receiver to the transcript.
[[[
#('bob' 'joe' 'toto') do: [:each | Transcript show: each; cr].
]]]
!!!!!Variants
There are a lot of variants of ==do:==, such as ==do:without:==,
==doWithIndex:== and ==reverseDo:==.
For the indexed collections (==Array==, ==OrderedCollection==,
==SortedCollection==) the method ==doWithIndex:== also gives access to the
current index. This method is related to ==to:do:== which is defined in class
==Number==.
[[[
#('bob' 'joe' 'toto') doWithIndex: [:each :i | (each = 'joe') ifTrue: [ ^ i ] ] --> 2
]]]
For ordered collections, ==reverseDo:== walks the collection in the reverse
order.
The following code shows an interesting message: ==do:separatedBy:== which
executes the second block only in between two elements.
[[[
res := ''.
#('bob' 'joe' 'toto') do: [:e | res := res, e ] separatedBy: [res := res, '.'].
res --> 'bob.joe.toto'
]]]
Note that this code is not especially efficient since it creates intermediate
strings and it would be better to use a write stream to buffer the result (see
Chapter *: Streams>../Streams/Streams.pillar@cha:streams*):
[[[
String streamContents: [:stream | #('bob' 'joe' 'toto') asStringOn: stream delimiter: '.' ] --> 'bob.joe.toto'
]]]
!!!!Dictionaries
When the message ==do:== is sent to a dictionary, the elements taken into
account are the values, not the associations. The proper methods to use are
==keysDo:==, ==valuesDo:==, and ==associationsDo:==, which iterate respectively
on keys, values or associations.
[[[
colors := Dictionary newFrom: { #yellow -> Color yellow. #blue -> Color blue. #red -> Color red }.
colors keysDo: [:key | Transcript show: key; cr]. "displays the keys"
colors valuesDo: [:value | Transcript show: value; cr]. "displays the values"
colors associationsDo: [:value | Transcript show: value; cr]. "displays the associations"
]]]
!!!Collecting results (==collect:==)
If you want to process the elements of a collection and produce a new collection
as a result, rather than using ==do:==, you are probably better off using
==collect:==, or one of the other iterator methods. Most of these can be found
in the ==enumerating== protocol of ==Collection== and its subclasses.
Imagine that we want a collection containing the doubles of the elements in
another collection. Using the method ==do:== we must write the following:
[[[
double := OrderedCollection new.
#(1 2 3 4 5 6) do: [:e | double add: 2 * e].
double --> an OrderedCollection(2 4 6 8 10 12)
]]]
The method ==collect:== executes its argument block for each element and returns
a new collection containing the results. Using ==collect:== instead, the code is
much simpler:
[[[
#(1 2 3 4 5 6) collect: [:e | 2 * e] --> #(2 4 6 8 10 12)
]]]
The advantages of ==collect:== over ==do:== are even more dramatic in the
following example, where we take a collection of integers and generate as a
result a collection of absolute values of these integers:
[[[
aCol := #( 2 -3 4 -35 4 -11).
result := aCol species new: aCol size.
1 to: aCol size do: [ :each | result at: each put: (aCol at: each) abs].
result --> #(2 3 4 35 4 11)
]]]
Contrast the above with the much simpler following expression:
[[[
#( 2 -3 4 -35 4 -11) collect: [:each | each abs ] --> #(2 3 4 35 4 11)
]]]
A further advantage of the second solution is that it will also work for sets
and bags.
Generally you should avoid using ==do:==, unless you want to send messages to
each of the elements of a collection.
Note that sending the message ==collect:== returns the same kind of collection
as the receiver. For this reason the following code fails. (A ==String== cannot
hold integer values.)
[[[
'abc' collect: [:ea | ea asciiValue ] "error!"
]]]
Instead we must first convert the string to an ==Array== or an
==OrderedCollection==:
[[[
'abc' asArray collect: [:ea | ea asciiValue ] --> #(97 98 99)
]]]
Actually ==collect:== is not guaranteed to return a collection of exactly the
same class as the receiver, but only the same ''species''. In the case of
an ==Interval==, the species is actually an ==Array==!
[[[
(1 to: 5) collect: [ :ea | ea * 2 ] --> #(2 4 6 8 10)
]]]
!!!Selecting and rejecting elements
==select:== returns the elements of the receiver that satisfy a particular
condition:
[[[
(2 to: 20) select: [:each | each isPrime] --> #(2 3 5 7 11 13 17 19)
]]]
==reject:== does the opposite:
[[[
(2 to: 20) reject: [:each | each isPrime] --> #(4 6 8 9 10 12 14 15 16 18 20)
]]]
!!!Identifying an element with ==detect:==
The method ==detect:== returns the first element of the receiver that matches
block argument.
[[[
'through' detect: [:each | each isVowel] --> $o
]]]
The method ==detect:ifNone:== is a variant of the method ==detect:==. Its second
block is evaluated when there is no element matching the block.
[[[
Smalltalk allClasses detect: [:each | '*cobol*' match: each asString] ifNone: [ nil ] --> nil
]]]
!!!Accumulating results with ==inject:into:==
Functional programming languages often provide a higher-order function called
''fold'' or ''reduce'' to accumulate a result by applying some binary operator
iteratively over all elements of a collection. In Pharo this is done by
==Collection>>inject:into:==.
The first argument is an initial value, and the second argument is a
two-argument block which is applied to the result this far, and each element in
turn.
A trivial application of ==inject:into:== is to produce the sum of a collection
of numbers. Following Gauss, in Pharo we could write this expression to sum the
first 100 integers:
[[[
(1 to: 100) inject: 0 into: [:sum :each | sum + each ] --> 5050
]]]
Another example is the following one-argument block which computes factorials:
[[[
factorial := [:n | (1 to: n) inject: 1 into: [:product :each | product * each ] ].
factorial value: 10 --> 3628800
]]]
!!!Other messages
!!!!! ==count:==
The message ==count:== returns the number of elements satisfying a condition.
The condition is represented as a boolean block.
[[[
Smalltalk allClasses count: [:each | 'Collection*' match: each asString ] --> 6
]]]
!!!!! ==includes:==
The message ==includes:== checks whether the argument is contained in the
collection.
[[[
colors := {Color white. Color yellow. Color red. Color blue. Color orange}.
colors includes: Color blue. --> true
]]]
!!!!! ==anySatisfy:==
The message ==anySatisfy:== answers true if at least one element of the
collection satisfies the condition represented by the argument.
[[[
colors anySatisfy: [:c | c red > 0.5] --> true
]]]
!!Some hints for using collections
!!!!!A common mistake with ==add:==
The following error is one of the most frequent Smalltalk mistakes.
[[[
collection := OrderedCollection new add: 1; add: 2.
collection --> 2
]]]
Here the variable ==collection== does not hold the newly created collection but
rather the last number added. This is because the method ==add:== returns the
element added and not the receiver.
The following code yields the expected result:
[[[
collection := OrderedCollection new.
collection add: 1; add: 2.
collection --> an OrderedCollection(1 2)
]]]
You can also use the message ==yourself== to return the receiver of a cascade of
messages:
[[[
collection := OrderedCollection new add: 1; add: 2; yourself --> an OrderedCollection(1 2)
]]]
!!!!!Removing an element of the collection you are iterating on
Another mistake you may make is to remove an element from a collection you are
currently iterating over.
[[[
range := (2 to: 20) asOrderedCollection.
range do: [:aNumber | aNumber isPrime ifFalse: [ range remove: aNumber ] ].
range --> an OrderedCollection(2 3 5 7 9 11 13 15 17 19)
]]]
This result is clearly incorrect since 9 and 15 should have been filtered out!
The solution is to copy the collection before going over it.
[[[
range := (2 to: 20) asOrderedCollection.
range copy do: [:aNumber | aNumber isPrime ifFalse: [ range remove: aNumber ] ].
range --> an OrderedCollection(2 3 5 7 11 13 17 19)