io/read_metadata: Fix dtype for the index column #1746

Merged (2 commits) on Feb 7, 2025
2 changes: 2 additions & 0 deletions CHANGES.md
@@ -5,8 +5,10 @@
### Bug Fixes

* schema: document node property values support `url`. This feature has been supported in Auspice since v2.25.0. [#1743][] (@joverlee521)
* augur.io.read_metadata: Ensure that the index column's dtype is always "string" so that numeric ids don't get converted to numeric dtypes. [#1746][] (@joverlee521)

[#1743]: https://github.com/nextstrain/augur/pull/1743
[#1746]: https://github.com/nextstrain/augur/pull/1746

## 28.0.0 (30 January 2025)

2 changes: 1 addition & 1 deletion augur/io/metadata.py
@@ -136,7 +136,7 @@ def read_metadata(metadata_file, delimiters=DEFAULT_DELIMITERS, columns=None, id

     if isinstance(dtype, dict):
         # Avoid reading numerical IDs as integers.
-        dtype["index_col"] = "string"
+        dtype[index_col] = "string"

         # Avoid reading year-only dates as integers.
         dtype[METADATA_DATE_COLUMN] = "string"
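The one-line change above replaces a literal dictionary key, `"index_col"`, with the value of the `index_col` variable. A minimal sketch of why that matters follows. The `read_table` and `infer` helpers are hypothetical, stdlib-only stand-ins for a reader that infers column types; they are not augur's `read_metadata` or pandas.

```python
import csv
import io

TSV = "strain\tfield_A\n1.00\tAA\n2.00\tBB\n"


def infer(value):
    """Stand-in for type inference: numeric-looking text becomes a float."""
    try:
        return float(value)
    except ValueError:
        return value


def read_table(text, index_col, dtype):
    """Hypothetical toy reader: returns {id: row}, keeping a column as text
    only when dtype maps that column's name to "string"."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        parsed = {
            col: value if dtype.get(col) == "string" else infer(value)
            for col, value in row.items()
        }
        rows[parsed.pop(index_col)] = parsed
    return rows


index_col = "strain"

# Before the fix: the key is the literal text "index_col", which names no column,
# so the real id column is still type-inferred.
buggy = read_table(TSV, index_col, {"index_col": "string"})

# After the fix: the key is the variable's value, "strain".
fixed = read_table(TSV, index_col, {index_col: "string"})

print(list(buggy))  # ids parsed as floats
print(list(fixed))  # ids kept as the original text
```

In this sketch the misnamed key has no effect, so the ids lose their original spelling; keying on the actual column name keeps `1.00` as text.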
34 changes: 34 additions & 0 deletions tests/functional/export_v2/cram/metadata-with-float-strains.t
@@ -0,0 +1,34 @@
Setup

  $ source "$TESTDIR"/_setup.sh

Create files for testing.

  $ cat >metadata.tsv <<~~
  > strain field_A field_B
  > 1.00 AA AAA
  > 2.00 BB BBB
  > 3.00 CC CCC
  > 4.00 DD DDD
  > 5.00 EE EEE
  > 6.00 FF FFF
  > ~~

  $ cat >tree.nwk <<~~
  > (1.00:1,(2.00:1,3.00:1)internalBC:2,(4.00:3,5.00:4,6.00:1)internalDEF:5)ROOT:0;
  > ~~

Run export with tree and metadata with additional columns.
The metadata should match with the tree even though the names are floats
because we force the index column to be strings.

  $ ${AUGUR} export v2 \
  >   --tree tree.nwk \
  >   --metadata metadata.tsv \
  >   --metadata-columns "field_A" "field_B" \
  >   --maintainers "Nextstrain Team" \
  >   --output dataset.json > /dev/null

  $ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" "$TESTDIR/../data/dataset-with-float-strains.json" dataset.json \
  >   --exclude-paths "root['meta']['updated']" "root['meta']['maintainers']"
  {}
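The test uses strain names such as `1.00` rather than `1`, presumably because a float round trip changes their spelling. The sketch below illustrates that point under simplifying assumptions: the tip names are copied from `tree.nwk`, and the `matched` helper is hypothetical. It does not reproduce how `augur export v2` joins metadata to the tree.

```python
# Tip names as written in the Newick tree above.
tree_tips = ["1.00", "2.00", "3.00", "4.00", "5.00", "6.00"]

# What the metadata ids would look like after being parsed as floats
# and rendered back to text ("1.00" becomes "1.0").
ids_as_floats = [str(float(tip)) for tip in tree_tips]

# What they look like when the id column is kept as strings.
ids_as_strings = list(tree_tips)


def matched(tips, metadata_ids):
    """Hypothetical helper: tree tips that have a metadata row with the same id."""
    ids = set(metadata_ids)
    return [tip for tip in tips if tip in ids]


print(matched(tree_tips, ids_as_floats))   # no tip finds its metadata row
print(matched(tree_tips, ids_as_strings))  # every tip finds its metadata row
```

Under these assumptions, only the string-typed ids line up with the tree, which is the outcome the expected `dataset-with-float-strains.json` encodes through the `field_A` and `field_B` values on each tip.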
121 changes: 121 additions & 0 deletions tests/functional/export_v2/data/dataset-with-float-strains.json
@@ -0,0 +1,121 @@
{
"version": "v2",
"meta": {
"updated": "2025-02-06",
"maintainers": [
{
"name": "Nextstrain Team"
}
],
"colorings": [],
"filters": [],
"panels": [
"tree"
]
},
"tree": {
"name": "ROOT",
"node_attrs": {
"div": 0
},
"branch_attrs": {},
"children": [
{
"name": "1.00",
"node_attrs": {
"div": 1.0,
"field_A": {
"value": "AA"
},
"field_B": {
"value": "AAA"
}
},
"branch_attrs": {}
},
{
"name": "internalBC",
"node_attrs": {
"div": 2.0
},
"branch_attrs": {},
"children": [
{
"name": "2.00",
"node_attrs": {
"div": 3.0,
"field_A": {
"value": "BB"
},
"field_B": {
"value": "BBB"
}
},
"branch_attrs": {}
},
{
"name": "3.00",
"node_attrs": {
"div": 3.0,
"field_A": {
"value": "CC"
},
"field_B": {
"value": "CCC"
}
},
"branch_attrs": {}
}
]
},
{
"name": "internalDEF",
"node_attrs": {
"div": 5.0
},
"branch_attrs": {},
"children": [
{
"name": "4.00",
"node_attrs": {
"div": 8.0,
"field_A": {
"value": "DD"
},
"field_B": {
"value": "DDD"
}
},
"branch_attrs": {}
},
{
"name": "5.00",
"node_attrs": {
"div": 9.0,
"field_A": {
"value": "EE"
},
"field_B": {
"value": "EEE"
}
},
"branch_attrs": {}
},
{
"name": "6.00",
"node_attrs": {
"div": 6.0,
"field_A": {
"value": "FF"
},
"field_B": {
"value": "FFF"
}
},
"branch_attrs": {}
}
]
}
]
}
}