Error messages for rename_columns and Vector.duplicates (#9917)
- Improve error message for `rename_columns`.
- Add `length` to `Set` and `Map`.
- Add `duplicates` to `Vector` (and `Array`).

![image](https://github.com/enso-org/enso/assets/4699705/623df253-52e8-4bdc-a69c-ac8dc3ca594e)
jdunkerley authored May 10, 2024
1 parent 4af33f0 commit d97754d
Showing 9 changed files with 106 additions and 8 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -660,6 +660,7 @@
- [Added ability to write to Data Links.][9750]
- [Added `Vector.build_multiple`, and better support for errors and warnings
inside `Vector.build` and `Vector.build_multiple`.][9766]
- [Added `Vector.duplicates`.][9917]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@@ -968,6 +969,7 @@
[9577]: https://github.com/enso-org/enso/pull/9577
[9750]: https://github.com/enso-org/enso/pull/9750
[9766]: https://github.com/enso-org/enso/pull/9766
[9917]: https://github.com/enso-org/enso/pull/9917

#### Enso Compiler

20 changes: 20 additions & 0 deletions distribution/lib/Standard/Base/0.0.0-dev/src/Data/Array.enso
@@ -343,6 +343,26 @@ type Array
distinct : (Any -> Any) -> Vector Any
distinct self (on = x->x) = Array_Like_Helpers.distinct self on

## ALIAS duplicates
GROUP Selections
ICON preparation
Returns only non-unique elements within the array.

Arguments:
- on: A projection from the element type to the value of that element
which determines the uniqueness criteria.

The returned elements are kept in the order in which each one first
repeats in the input (the position of its second occurrence).

> Example
Finding the repeated entries.

[1, 3, 1, 2, 2, 1].to_array . duplicates == [1, 2].to_array
duplicates : (Any -> Any) -> Vector Any
duplicates self (on = x->x) =
Array_Like_Helpers.duplicates self on

## ICON dataframe_map_column
Applies a function to each element of the array, returning the `Vector` of
results.
9 changes: 7 additions & 2 deletions distribution/lib/Standard/Base/0.0.0-dev/src/Data/Map.enso
@@ -118,12 +118,17 @@ type Map key value
not_empty : Boolean
not_empty self = self.is_empty.not

- ## GROUP Metadata
- ICON metadata
+ ## ICON metadata
Returns the number of entries in this map.
size : Integer
size self = @Builtin_Method "Map.size"

+ ## GROUP Metadata
+ ICON metadata
+ Returns the number of entries in this map.
+ length : Integer
+ length self = self.size

## GROUP Calculations
ICON row_add
Inserts a key-value mapping into this map, overriding any existing
9 changes: 7 additions & 2 deletions distribution/lib/Standard/Base/0.0.0-dev/src/Data/Set.enso
@@ -47,12 +47,17 @@ type Set
to_vector : Vector
to_vector self = self.underlying_map.keys

- ## GROUP Metadata
- ICON metadata
+ ## ICON metadata
Returns the number of elements in this set.
size : Integer
size self = self.underlying_map.size

+ ## GROUP Metadata
+ ICON metadata
+ Returns the number of elements in this set.
+ length : Integer
+ length self = self.size

## GROUP Logical
ICON metadata
Checks if the set is empty.
20 changes: 20 additions & 0 deletions distribution/lib/Standard/Base/0.0.0-dev/src/Data/Vector.enso
@@ -1243,6 +1243,26 @@ type Vector a
distinct self (on = x->x) =
Array_Like_Helpers.distinct self on

## ALIAS duplicates
GROUP Selections
ICON preparation
Returns only non-unique elements within the vector.

Arguments:
- on: A projection from the element type to the value of that element
which determines the uniqueness criteria.

The returned elements are kept in the order in which each one first
repeats in the input (the position of its second occurrence).

> Example
Finding the repeated entries.

[1, 3, 1, 2, 2, 1] . duplicates == [1, 2]
duplicates : (Any -> Any) -> Vector Any
duplicates self (on = x->x) =
Array_Like_Helpers.duplicates self on

## ICON convert
Returns the vector as a `Vector`.
to_vector : Vector
@@ -164,6 +164,13 @@ distinct vector on =
builder.append item
existing.insert key True

duplicates vector on = Vector.build builder->
vector.fold Map.empty current-> item->
key = on item
count = current.get key 0
if count == 1 then builder.append item
current.insert key count+1

take vector range = case range of
## We are using a specialized implementation for `take Sample`, because
the default implementation (which needs to be generic for any
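
The fold used by `duplicates` in the hunk above can be restated outside Enso. The following is a minimal Python sketch, not the library's code: it keeps a running count per key and emits an element exactly when its key is seen for the second time, so each repeated key contributes one element, ordered by second occurrence. The Python names simply mirror the Enso ones, and a plain `dict` is assumed to stand in for `Map`; Python's equality rules differ from Enso's (for example, there is no Unicode normalization of text keys), so the Unicode case from the tests in this commit is not reproduced here.

```python
from typing import Any, Callable, Hashable, Iterable, List


def duplicates(
    items: Iterable[Any],
    on: Callable[[Any], Hashable] = lambda x: x,
) -> List[Any]:
    """Return one element per repeated key, ordered by each key's second occurrence."""
    counts = {}  # key -> number of times the key has been seen so far
    result = []
    for item in items:
        key = on(item)
        seen = counts.get(key, 0)
        if seen == 1:
            # Second sighting of this key: record the current item, once only.
            result.append(item)
        counts[key] = seen + 1
    return result


print(duplicates([1, 3, 1, 2, 2, 1]))  # [1, 2]
# Using a projection: group words by their first letter.
print(duplicates(["apple", "avocado", "banana", "blueberry", "cherry"], on=lambda s: s[0]))
# ['avocado', 'blueberry']
```

Because the item appended is the one at the second occurrence, an input such as `[1, 1.0, 2, 2.0]` yields the later values, which matches the `[1.0, 2.0]` expectation in the `Vector_Spec` test added by this commit.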
@@ -311,10 +311,18 @@ rename_columns (naming_helper : Column_Naming_Helper) (internal_columns:Vector)
is_vec_pairs = mapping.is_a Vector && mapping.length > 0 && (mapping.first.is_a Text . not)
case is_vec_pairs of
True ->
- ## Attempt to treat as Map
- map = Map.from_vector mapping
- if map.is_error then Error.throw (Illegal_Argument.Error "A mapping Vector must be either a list of names or a list of pairs (old name to new name).") else
- rename_columns naming_helper internal_columns map case_sensitivity error_on_missing_columns on_problems
+ ## Check all pairs (Integer | Text | Regex => Text )
+ is_valid_row r = r.is_a Vector || r.is_a Pair
+ is_valid_key k = k.is_a Integer || k.is_a Text || k.is_a Regex
+ all_pairs = mapping.all p-> (is_valid_row p) && p.length == 2 && (is_valid_key p.first) && p.second.is_a Text
+ if all_pairs.not then Error.throw (Illegal_Argument.Error "mapping is not a Vector of old name to new name.") else
+ ## Attempt to treat as Map
+ map = Map.from_vector mapping error_on_duplicates=False
+ if map.length == mapping.length then rename_columns naming_helper internal_columns map case_sensitivity error_on_missing_columns on_problems else
+ duplicates = mapping.duplicates on=_.first . map p->p.first.to_text
+ duplicate_text = if duplicates.length < 5 then duplicates.to_vector . join ", " else
+ duplicates.take 3 . to_vector . join ", " + (", ... " + (duplicates.length - 3).to_text + " others")
+ Error.throw (Illegal_Argument.Error "duplicate old name mappings ("+duplicate_text+").")
False ->
unique = naming_helper.create_unique_name_strategy
problem_builder = Problem_Builder.new error_on_missing_columns=error_on_missing_columns
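
The new validation path in `rename_columns` does three things: it checks the shape of every pair, detects duplicate old names by comparing the map size with the input length, and builds a truncated list of duplicates for the error message. The Python sketch below illustrates that flow under stated assumptions; `check_rename_pairs`, `summarize`, and the use of `ValueError` are hypothetical stand-ins for the Enso code and `Illegal_Argument`, and the initial `is_vec_pairs` dispatch is left out.

```python
import re


def duplicates(items, on=lambda x: x):
    """Emit each item whose key is being seen for the second time."""
    counts, result = {}, []
    for item in items:
        key = on(item)
        if counts.get(key, 0) == 1:
            result.append(item)
        counts[key] = counts.get(key, 0) + 1
    return result


def summarize(names):
    """List up to four names in full; otherwise show three and count the rest."""
    if len(names) < 5:
        return ", ".join(names)
    return ", ".join(names[:3]) + ", ... " + str(len(names) - 3) + " others"


def check_rename_pairs(mapping):
    """Validate (old name -> new name) pairs and return them as a dict."""
    def is_valid_key(key):
        # bool is a subclass of int in Python, so it is excluded explicitly.
        return isinstance(key, (int, str, re.Pattern)) and not isinstance(key, bool)

    all_pairs = all(
        isinstance(p, (list, tuple)) and len(p) == 2
        and is_valid_key(p[0]) and isinstance(p[1], str)
        for p in mapping
    )
    if not all_pairs:
        raise ValueError("mapping is not a Vector of old name to new name.")

    as_dict = dict(mapping)  # later pairs silently overwrite earlier ones
    if len(as_dict) == len(mapping):
        return as_dict

    # Sizes differ, so at least one old name was given more than once.
    names = [str(p[0]) for p in duplicates(mapping, on=lambda p: p[0])]
    raise ValueError("duplicate old name mappings (" + summarize(names) + ").")


print(check_rename_pairs([["Alpha", "A"], ["Beta", "B"]]))
```

Comparing sizes avoids a second pass in the common, valid case; the duplicate scan runs only when an error is already certain.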
7 changes: 7 additions & 0 deletions test/Base_Tests/src/Data/Vector_Spec.enso
@@ -857,6 +857,13 @@ type_spec suite_builder name alter = suite_builder.group name group_builder->
alter [1, 1.0, 2, 2.0] . distinct . should_equal [1, 2]
alter [] . distinct . should_equal []

group_builder.specify "should return a vector containing only duplicate elements" <|
alter [1, 3, 1, 2, 2, 1] . duplicates . should_equal [1, 2]
alter ["a", "a", "a"] . duplicates . should_equal ["a"]
alter ['ś', 's', 's\u0301'] . duplicates . should_equal ['s\u0301']
alter [1, 1.0, 2, 2.0] . duplicates . should_equal [1.0, 2.0]
alter [] . duplicates . should_equal []

group_builder.specify "should be able to handle distinct on different primitive values" <|
alter [1, "a"] . distinct . should_equal [1, "a"]
alter ["a", 1] . distinct . should_equal ["a", 1]
@@ -1,4 +1,5 @@
from Standard.Base import all
import Standard.Base.Errors.Illegal_Argument.Illegal_Argument

from Standard.Table import Position, Value_Type, Bits
from Standard.Table.Errors import all
@@ -503,6 +504,29 @@ add_specs suite_builder setup =
expect_column_names ["lpha", "beta", "gamma", "delta"] <|
data.table.rename_columns map

group_builder.specify "should report invalid input map nicely" <|
test_invalid_map map =
result = data.table.rename_columns map
result.should_fail_with Illegal_Argument
result.catch Any . message . should_equal "mapping is not a Vector of old name to new name."

test_invalid_map [["Alpha"]]
test_invalid_map [["Alpha", 1]]
test_invalid_map [["Alpha", "Beta", "Delta"]]
test_invalid_map [[True, "Beta"]]

group_builder.specify "should report duplicates in input map nicely" <|
test_duplicate_names map message =
result = data.table.rename_columns map
result.should_fail_with Illegal_Argument
result.catch Any . message . should_equal message

test_duplicate_names [["Alpha", "1"], ["Alpha", "2"]] "duplicate old name mappings (Alpha)."
test_duplicate_names [["Alpha", "1"], ["Beta", "2"], ["Gamma", "3"], ["Beta", "4"], ["Alpha", "5"]] "duplicate old name mappings (Beta, Alpha)."
test_duplicate_names [["Alpha", "1"], ["Alpha", "2"], ["Alpha", "3"]] "duplicate old name mappings (Alpha)."
test_duplicate_names [["Alpha", "1"], ["Beta", "2"], ["Gamma", "3"], ["Beta", "4"], ["Alpha", "5"], ["Gamma","6"], ["Delta","7"], ["Delta","8"]] "duplicate old name mappings (Beta, Alpha, Gamma, Delta)."
test_duplicate_names [["Alpha", "1"], ["Beta", "2"], ["Gamma", "3"], ["Beta", "4"], ["Alpha", "5"], ["Gamma","6"], ["Delta","7"], ["Delta","8"], ["Echo","9"], ["Echo","10"]] "duplicate old name mappings (Beta, Alpha, Gamma, ... 2 others)."

group_builder.specify "should correctly handle problems: unmatched names" <|
weird_name = '.*?-!@#!"'
map = Map.from_vector [["alpha", "FirstColumn"], ["omicron", "Another"], [weird_name, "Fixed"]]