`row-map` can obliterate data #452

harold · 2025-02-11T00:51:54Z

This seems surprising, at least at first glance:

user> (-> (ds/->dataset {:a [1 2 3]})
          (ds/row-map (fn [{:keys [a]}]
                        (when (= 2 a)
                          {:a 0 :b (inc a)}))))
_unnamed [3 2]:

| :a | :b |
|---:|---:|
|    |    |
|  0 |  3 |
|    |    |

Unsure if it's a bug, or somehow explicable.

harold · 2025-02-11T00:53:04Z

Link to clojurians thread for more context: https://clojurians.slack.com/archives/C0BQDEJ8M/p1739231087522179

harold · 2025-02-12T17:05:05Z

This is explicable, and the docs are actually pretty good: https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.html#var-row-map

> Map a function across the rows of the dataset producing a new dataset that is merged back into the original potentially replacing existing columns.

The new dataset produced by `row-map` has the same number of rows as the input dataset, and a set of columns derived from the keys in the maps returned by the map-fn. The merge then happens at the column level. In the case above, the mapped dataset has `:a` and `:b` columns, with the first and last rows empty (because the map-fn returned nil for them). The merge replaces the original `:a` column with the new one, missing values included, which is why the original 1 and 3 disappear.
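For anyone who wants the untouched rows to survive, here is a minimal sketch. It assumes the goal is to keep the original `:a` wherever the predicate does not match; that goal is my assumption, not something from the docs. The idea is to have the map-fn return a value for `:a` on every row and supply `:b` only where needed. `patched` is just an illustrative name.

```clojure
(require '[tech.v3.dataset :as ds])

;; Return a map for every row so the new :a column has no missing values.
;; :b is supplied only for the matching row, so it stays missing elsewhere.
(def patched
  (-> (ds/->dataset {:a [1 2 3]})
      (ds/row-map (fn [{:keys [a]}]
                    (if (= 2 a)
                      {:a 0 :b (inc a)}
                      {:a a})))))
```

With this, `:a` should come out as 1, 0, 3 instead of being blanked, while `:b` is still missing for the rows the map-fn did not fill.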

There are cases where this behavior is desirable (related to joins and redaction), so I recommend we not change this behavior at this time.

harold closed this as completed Feb 12, 2025
