row-map can obliterate data #452

Closed
harold opened this issue Feb 11, 2025 · 2 comments

harold commented Feb 11, 2025

This seems surprising, at least at first glance:

user> (-> (ds/->dataset {:a [1 2 3]})
          (ds/row-map (fn [{:keys [a]}]
                        (when (= 2 a)
                          {:a 0 :b (inc a)}))))
_unnamed [3 2]:

| :a | :b |
|---:|---:|
|    |    |
|  0 |  3 |
|    |    |

Unsure if it's a bug, or somehow explicable.


harold commented Feb 11, 2025

Link to clojurians thread for more context: https://clojurians.slack.com/archives/C0BQDEJ8M/p1739231087522179


harold commented Feb 12, 2025

This is explicable, and the docs are actually pretty good: https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.html#var-row-map

> Map a function across the rows of the dataset producing a new dataset that is merged back into the original potentially replacing existing columns.

The new dataset produced by row-map has the same number of rows as the input dataset, and a set of columns derived from the keys in the maps returned by the map-fn. The merge then happens at the column level. In the case above, the mapped dataset has :a and :b columns, with the first and last rows empty (because the map-fn returned nil for them). The new :a column replaces the original :a column wholesale, so the original values in those rows are lost.
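For anyone who wants to keep the untouched values, here is a minimal sketch of one way to do it, assuming the column-level merge described above: have the map-fn return a map for every row, so the replacement :a column has no gaps. The helper name `zero-a-where-two` is hypothetical and not part of tech.ml.dataset.

```clojure
(require '[tech.v3.dataset :as ds])

;; Hypothetical helper name; not part of tech.ml.dataset.
(defn zero-a-where-two
  "Returns a map for every row, so the :a column produced by row-map
  has a value in every row."
  [{:keys [a]}]
  (if (= 2 a)
    {:a 0 :b (inc a)}   ; matching row: replace :a and add :b
    {:a a}))            ; other rows: carry :a through unchanged

(def fixed
  (-> (ds/->dataset {:a [1 2 3]})
      (ds/row-map zero-a-where-two)))
```

With this change, :a should read 1, 0, 3, and :b should be missing everywhere except the row where :a was 2, since only that row's map contains a :b key.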

There are cases where this behavior is desirable (related to joins and redaction), so I recommend we not change this behavior at this time.

harold closed this as completed Feb 12, 2025