Commit d97754d

Error messages for rename_columns and Vector.duplicates (#9917)
- Improve error message for `rename_columns`.
- Add `length` to `Set` and `Map`.
- Add `duplicates` to `Vector` (and `Array`).

![image](https://github.com/enso-org/enso/assets/4699705/623df253-52e8-4bdc-a69c-ac8dc3ca594e)
1 parent 4af33f0 commit d97754d

9 files changed: +106 -8 lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -660,6 +660,7 @@
 - [Added ability to write to Data Links.][9750]
 - [Added `Vector.build_multiple`, and better for support for errors and warnings
   inside `Vector.build` and `Vector.build_multiple`.][9766]
+- [Added `Vector.duplicates`.][9917]

 [debug-shortcuts]:
   https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@@ -968,6 +969,7 @@
 [9577]: https://github.com/enso-org/enso/pull/9577
 [9750]: https://github.com/enso-org/enso/pull/9750
 [9766]: https://github.com/enso-org/enso/pull/9766
+[9917]: https://github.com/enso-org/enso/pull/9917

 #### Enso Compiler

distribution/lib/Standard/Base/0.0.0-dev/src/Data/Array.enso

Lines changed: 20 additions & 0 deletions
@@ -343,6 +343,26 @@ type Array
     distinct : (Any -> Any) -> Vector Any
     distinct self (on = x->x) = Array_Like_Helpers.distinct self on

+    ## ALIAS duplicates
+       GROUP Selections
+       ICON preparation
+       Returns only non-unique elements within the array.
+
+       Arguments:
+       - on: A projection from the element type to the value of that element
+         which determines the uniqueness criteria.
+
+       The returned duplicate elements are kept in the same order as the
+       first duplicate appeared in the input.
+
+       > Example
+         Removing repeating entries.
+
+             [1, 3, 1, 2, 2, 1].to_array . duplicates == [1, 2].to_array
+    duplicates : (Any -> Any) -> Vector Any
+    duplicates self (on = x->x) =
+        Array_Like_Helpers.duplicates self on
+
     ## ICON dataframe_map_column
        Applies a function to each element of the array, returning the `Vector` of
        results.

distribution/lib/Standard/Base/0.0.0-dev/src/Data/Map.enso

Lines changed: 7 additions & 2 deletions
@@ -118,12 +118,17 @@ type Map key value
     not_empty : Boolean
     not_empty self = self.is_empty.not

-    ## GROUP Metadata
-       ICON metadata
+    ## ICON metadata
        Returns the number of entries in this map.
     size : Integer
     size self = @Builtin_Method "Map.size"

+    ## GROUP Metadata
+       ICON metadata
+       Returns the number of entries in this map.
+    length : Integer
+    length self = self.size
+
     ## GROUP Calculations
        ICON row_add
        Inserts a key-value mapping into this map, overriding any existing

distribution/lib/Standard/Base/0.0.0-dev/src/Data/Set.enso

Lines changed: 7 additions & 2 deletions
@@ -47,12 +47,17 @@ type Set
     to_vector : Vector
     to_vector self = self.underlying_map.keys

-    ## GROUP Metadata
-       ICON metadata
+    ## ICON metadata
        Returns the number of elements in this set.
     size : Integer
     size self = self.underlying_map.size

+    ## GROUP Metadata
+       ICON metadata
+       Returns the number of elements in this set.
+    length : Integer
+    length self = self.size
+
     ## GROUP Logical
        ICON metadata
        Checks if the set is empty.

distribution/lib/Standard/Base/0.0.0-dev/src/Data/Vector.enso

Lines changed: 20 additions & 0 deletions
@@ -1243,6 +1243,26 @@ type Vector a
     distinct self (on = x->x) =
         Array_Like_Helpers.distinct self on

+    ## ALIAS duplicates
+       GROUP Selections
+       ICON preparation
+       Returns only non-unique elements within the vector.
+
+       Arguments:
+       - on: A projection from the element type to the value of that element
+         which determines the uniqueness criteria.
+
+       The returned duplicate elements are kept in the same order as the
+       first duplicate appeared in the input.
+
+       > Example
+         Removing repeating entries.
+
+             [1, 3, 1, 2, 2, 1] . duplicates == [1, 2]
+    duplicates : (Any -> Any) -> Vector Any
+    duplicates self (on = x->x) =
+        Array_Like_Helpers.duplicates self on
+
     ## ICON convert
        Returns the vector as a `Vector`.
     to_vector : Vector

distribution/lib/Standard/Base/0.0.0-dev/src/Internal/Array_Like_Helpers.enso

Lines changed: 7 additions & 0 deletions
@@ -164,6 +164,13 @@ distinct vector on =
             builder.append item
             existing.insert key True

+duplicates vector on = Vector.build builder->
+    vector.fold Map.empty current-> item->
+        key = on item
+        count = current.get key 0
+        if count == 1 then builder.append item
+        current.insert key count+1
+
 take vector range = case range of
     ## We are using a specialized implementation for `take Sample`, because
        the default implementation (which needs to be generic for any
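
The added `duplicates` helper folds over the input while keeping a per-key count, and appends an item exactly when its key is seen for the second time. The following is a minimal Python model of that logic, not the Enso implementation itself; the function name and the `on` parameter mirror the diff, while the type hints and the sample inputs in the usage lines are illustrative assumptions.

```python
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def duplicates(items: Iterable[T], on: Callable[[T], Hashable] = lambda x: x) -> List[T]:
    """Return each repeated item once, in the order the second occurrences appear."""
    counts: Dict[Hashable, int] = {}
    result: List[T] = []
    for item in items:
        key = on(item)
        seen = counts.get(key, 0)
        # Append only on the second sighting of a key, so a value that
        # occurs three or more times is still reported a single time.
        if seen == 1:
            result.append(item)
        counts[key] = seen + 1
    return result


print(duplicates([1, 3, 1, 2, 2, 1]))                 # [1, 2]
print(duplicates(["a", "a", "a"]))                    # ['a']
print(duplicates(["Ab", "cd", "aB"], on=str.lower))   # ['aB']
```

Because the item appended is the second occurrence rather than the first, `[1, 1.0, 2, 2.0]` yields `[1.0, 2.0]`, which agrees with the expectation in `Vector_Spec.enso`.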

distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Table_Helpers.enso

Lines changed: 12 additions & 4 deletions
@@ -311,10 +311,18 @@ rename_columns (naming_helper : Column_Naming_Helper) (internal_columns:Vector)
     is_vec_pairs = mapping.is_a Vector && mapping.length > 0 && (mapping.first.is_a Text . not)
     case is_vec_pairs of
         True ->
-            ## Attempt to treat as Map
-            map = Map.from_vector mapping
-            if map.is_error then Error.throw (Illegal_Argument.Error "A mapping Vector must be either a list of names or a list of pairs (old name to new name).") else
-                rename_columns naming_helper internal_columns map case_sensitivity error_on_missing_columns on_problems
+            ## Check all pairs (Integer | Text | Regex => Text )
+            is_valid_row r = r.is_a Vector || r.is_a Pair
+            is_valid_key k = k.is_a Integer || k.is_a Text || k.is_a Regex
+            all_pairs = mapping.all p-> (is_valid_row p) && p.length == 2 && (is_valid_key p.first) && p.second.is_a Text
+            if all_pairs.not then Error.throw (Illegal_Argument.Error "mapping is not a Vector of old name to new name.") else
+                ## Attempt to treat as Map
+                map = Map.from_vector mapping error_on_duplicates=False
+                if map.length == mapping.length then rename_columns naming_helper internal_columns map case_sensitivity error_on_missing_columns on_problems else
+                    duplicates = mapping.duplicates on=_.first . map p->p.first.to_text
+                    duplicate_text = if duplicates.length < 5 then duplicates.to_vector . join ", " else
+                        duplicates.take 3 . to_vector . join ", " + (", ... " + (duplicates.length - 3).to_text + " others")
+                    Error.throw (Illegal_Argument.Error "duplicate old name mappings ("+duplicate_text+").")
         False ->
             unique = naming_helper.create_unique_name_strategy
             problem_builder = Problem_Builder.new error_on_missing_columns=error_on_missing_columns
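
The new `True ->` branch first validates the shape of every pair, then detects repeated old names by comparing the map size with the input length, and finally builds a message that lists at most three names when five or more are duplicated. A rough Python model of that flow is sketched below. `check_mapping` and `is_valid_pair` are hypothetical names, `ValueError` stands in for `Illegal_Argument.Error`, `re.Pattern` stands in for Enso's `Regex`, and the surrounding `is_vec_pairs` guard is not modelled.

```python
import re
from typing import Any, Dict, List, Sequence


def is_valid_pair(pair: Any) -> bool:
    """Model of `all_pairs`: a 2-element row with an int/str/regex key and a str value."""
    if not isinstance(pair, (list, tuple)) or len(pair) != 2:
        return False
    key, value = pair
    # bool is a subclass of int in Python; exclude it so [[True, "Beta"]] is rejected.
    key_ok = (isinstance(key, int) and not isinstance(key, bool)) or isinstance(key, (str, re.Pattern))
    return key_ok and isinstance(value, str)


def check_mapping(mapping: Sequence[Any]) -> Dict[Any, str]:
    if not all(is_valid_pair(p) for p in mapping):
        raise ValueError("mapping is not a Vector of old name to new name.")
    as_map = {key: value for key, value in mapping}  # repeated keys collapse to one entry
    if len(as_map) == len(mapping):
        return as_map
    # Collect each repeated key once, at the position of its second occurrence.
    counts: Dict[Any, int] = {}
    dups: List[str] = []
    for key, _ in mapping:
        seen = counts.get(key, 0)
        if seen == 1:
            dups.append(str(key))
        counts[key] = seen + 1
    if len(dups) < 5:
        text = ", ".join(dups)
    else:
        text = ", ".join(dups[:3]) + ", ... " + str(len(dups) - 3) + " others"
    raise ValueError("duplicate old name mappings (" + text + ").")
```

Calling `check_mapping([["Alpha", "1"], ["Alpha", "2"]])` in this model raises an error whose message is `duplicate old name mappings (Alpha).`, the same text the new spec in `Select_Columns_Spec.enso` expects.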

test/Base_Tests/src/Data/Vector_Spec.enso

Lines changed: 7 additions & 0 deletions
@@ -857,6 +857,13 @@ type_spec suite_builder name alter = suite_builder.group name group_builder->
         alter [1, 1.0, 2, 2.0] . distinct . should_equal [1, 2]
         alter [] . distinct . should_equal []

+    group_builder.specify "should return a vector containing only duplicate elements" <|
+        alter [1, 3, 1, 2, 2, 1] . duplicates . should_equal [1, 2]
+        alter ["a", "a", "a"] . duplicates . should_equal ["a"]
+        alter ['ś', 's', 's\u0301'] . duplicates . should_equal ['s\u0301']
+        alter [1, 1.0, 2, 2.0] . duplicates . should_equal [1.0, 2.0]
+        alter [] . duplicates . should_equal []
+
     group_builder.specify "should be able to handle distinct on different primitive values" <|
         alter [1, "a"] . distinct . should_equal [1, "a"]
         alter ["a", 1] . distinct . should_equal ["a", 1]
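
The third new assertion expects `'ś'` and `'s\u0301'` to count as the same key, with the later, decomposed form returned. A plausible reading is that the text comparison is insensitive to Unicode normalization form; this commit does not state the mechanism, so the Python sketch below only models the observable behaviour by passing NFC normalization as the `on` projection. The `duplicates` function here is the same hypothetical model of the helper, repeated so the example stands alone.

```python
import unicodedata


def duplicates(items, on=lambda x: x):
    """Return each repeated item once, at the position of its second occurrence."""
    counts = {}
    result = []
    for item in items:
        key = on(item)
        seen = counts.get(key, 0)
        if seen == 1:
            result.append(item)
        counts[key] = seen + 1
    return result


precomposed = "\u015b"   # 'ś' as a single code point
decomposed = "s\u0301"   # 's' followed by a combining acute accent
sample = [precomposed, "s", decomposed]

# Comparing raw code points, the three strings are all different.
print(duplicates(sample))  # []

# Using the NFC form as the key, the first and third entries collide,
# and the second occurrence (the decomposed form) is the one returned.
nfc = lambda s: unicodedata.normalize("NFC", s)
print(duplicates(sample, on=nfc) == [decomposed])  # True
```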

test/Table_Tests/src/Common_Table_Operations/Select_Columns_Spec.enso

Lines changed: 24 additions & 0 deletions
@@ -1,4 +1,5 @@
 from Standard.Base import all
+import Standard.Base.Errors.Illegal_Argument.Illegal_Argument

 from Standard.Table import Position, Value_Type, Bits
 from Standard.Table.Errors import all
@@ -503,6 +504,29 @@ add_specs suite_builder setup =
             expect_column_names ["lpha", "beta", "gamma", "delta"] <|
                 data.table.rename_columns map

+        group_builder.specify "should report invalid input map nicely" <|
+            test_invalid_map map =
+                result = data.table.rename_columns map
+                result.should_fail_with Illegal_Argument
+                result.catch Any . message . should_equal "mapping is not a Vector of old name to new name."
+
+            test_invalid_map [["Alpha"]]
+            test_invalid_map [["Alpha", 1]]
+            test_invalid_map [["Alpha", "Beta", "Delta"]]
+            test_invalid_map [[True, "Beta"]]
+
+        group_builder.specify "should report duplicates in input map nicely" <|
+            test_duplicate_names map message =
+                result = data.table.rename_columns map
+                result.should_fail_with Illegal_Argument
+                result.catch Any . message . should_equal message
+
+            test_duplicate_names [["Alpha", "1"], ["Alpha", "2"]] "duplicate old name mappings (Alpha)."
+            test_duplicate_names [["Alpha", "1"], ["Beta", "2"], ["Gamma", "3"], ["Beta", "4"], ["Alpha", "5"]] "duplicate old name mappings (Beta, Alpha)."
+            test_duplicate_names [["Alpha", "1"], ["Alpha", "2"], ["Alpha", "3"]] "duplicate old name mappings (Alpha)."
+            test_duplicate_names [["Alpha", "1"], ["Beta", "2"], ["Gamma", "3"], ["Beta", "4"], ["Alpha", "5"], ["Gamma","6"], ["Delta","7"], ["Delta","8"]] "duplicate old name mappings (Beta, Alpha, Gamma, Delta)."
+            test_duplicate_names [["Alpha", "1"], ["Beta", "2"], ["Gamma", "3"], ["Beta", "4"], ["Alpha", "5"], ["Gamma","6"], ["Delta","7"], ["Delta","8"], ["Echo","9"], ["Echo","10"]] "duplicate old name mappings (Beta, Alpha, Gamma, ... 2 others)."
+
         group_builder.specify "should correctly handle problems: unmatched names" <|
             weird_name = '.*?-!@#!"'
             map = Map.from_vector [["alpha", "FirstColumn"], ["omicron", "Another"], [weird_name, "Fixed"]]
