Commit
Merge pull request #1746 from nextstrain/fix-metadata-index-dtype
io/read_metadata: Fix dtype for the index column
joverlee521 authored Feb 7, 2025
2 parents 325b37b + 489fa92 commit 8920730
Showing 4 changed files with 158 additions and 1 deletion.
2 changes: 2 additions & 0 deletions CHANGES.md
@@ -5,8 +5,10 @@
### Bug Fixes

* schema: document node property values support `url`. This feature has been supported in Auspice since v2.25.0. [#1743][] (@joverlee521)
* augur.io.read_metadata: Ensure that the index column's dtype is always "string" so that numeric ids don't get converted to numeric dtypes. [#1746][] (@joverlee521)

[#1743]: https://github.com/nextstrain/augur/pull/1743
[#1746]: https://github.com/nextstrain/augur/pull/1746

## 28.0.0 (30 January 2025)

2 changes: 1 addition & 1 deletion augur/io/metadata.py
@@ -136,7 +136,7 @@ def read_metadata(metadata_file, delimiters=DEFAULT_DELIMITERS, columns=None, id

if isinstance(dtype, dict):
# Avoid reading numerical IDs as integers.
- dtype["index_col"] = "string"
+ dtype[index_col] = "string"

# Avoid reading year-only dates as integers.
dtype[METADATA_DATE_COLUMN] = "string"
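
The one-line change in this hunk swaps the literal key `"index_col"` for the value of the `index_col` variable. The sketch below is a stdlib-only illustration of why that matters; it is not augur's code. `build_dtype` and `parse_cell` are hypothetical helpers, and `parse_cell` only mimics numeric type inference rather than reproducing what the real reader does.

```python
def build_dtype(index_col, buggy=False):
    """Return a dtype mapping like the one built in read_metadata.

    Hypothetical helper for illustration; not part of augur.
    """
    dtype = {}
    if buggy:
        # Old behavior: the literal string "index_col" becomes the key,
        # so no real column is pinned to "string".
        dtype["index_col"] = "string"
    else:
        # Fixed behavior: the key is the variable's value, e.g. "strain".
        dtype[index_col] = "string"
    return dtype


def parse_cell(column, text, dtype):
    """Mimic type inference: keep text for "string" columns, else try float."""
    if dtype.get(column) == "string":
        return text
    try:
        return float(text)
    except ValueError:
        return text


buggy = build_dtype("strain", buggy=True)
fixed = build_dtype("strain")

print(buggy)  # {'index_col': 'string'}
print(fixed)  # {'strain': 'string'}

# With the buggy mapping the id is inferred as a number and loses its
# trailing zero, so it no longer matches the tree tip named "1.00".
print(str(parse_cell("strain", "1.00", buggy)))  # 1.0
print(str(parse_cell("strain", "1.00", fixed)))  # 1.00
```

Under these assumptions, the mismatch between `1.0` and `1.00` is the kind of failure the new functional test with float-like strain names guards against.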
34 changes: 34 additions & 0 deletions tests/functional/export_v2/cram/metadata-with-float-strains.t
@@ -0,0 +1,34 @@
Setup

$ source "$TESTDIR"/_setup.sh

Create files for testing.

$ cat >metadata.tsv <<~~
> strain field_A field_B
> 1.00 AA AAA
> 2.00 BB BBB
> 3.00 CC CCC
> 4.00 DD DDD
> 5.00 EE EEE
> 6.00 FF FFF
> ~~

$ cat >tree.nwk <<~~
> (1.00:1,(2.00:1,3.00:1)internalBC:2,(4.00:3,5.00:4,6.00:1)internalDEF:5)ROOT:0;
> ~~

Run export with a tree and metadata that has additional columns.
The metadata should match the tree even though the strain names look like floats,
because we force the index column to be read as strings.

$ ${AUGUR} export v2 \
> --tree tree.nwk \
> --metadata metadata.tsv \
> --metadata-columns "field_A" "field_B" \
> --maintainers "Nextstrain Team" \
> --output dataset.json > /dev/null

$ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" "$TESTDIR/../data/dataset-with-float-strains.json" dataset.json \
> --exclude-paths "root['meta']['updated']" "root['meta']['maintainers']"
{}
121 changes: 121 additions & 0 deletions tests/functional/export_v2/data/dataset-with-float-strains.json
@@ -0,0 +1,121 @@
{
"version": "v2",
"meta": {
"updated": "2025-02-06",
"maintainers": [
{
"name": "Nextstrain Team"
}
],
"colorings": [],
"filters": [],
"panels": [
"tree"
]
},
"tree": {
"name": "ROOT",
"node_attrs": {
"div": 0
},
"branch_attrs": {},
"children": [
{
"name": "1.00",
"node_attrs": {
"div": 1.0,
"field_A": {
"value": "AA"
},
"field_B": {
"value": "AAA"
}
},
"branch_attrs": {}
},
{
"name": "internalBC",
"node_attrs": {
"div": 2.0
},
"branch_attrs": {},
"children": [
{
"name": "2.00",
"node_attrs": {
"div": 3.0,
"field_A": {
"value": "BB"
},
"field_B": {
"value": "BBB"
}
},
"branch_attrs": {}
},
{
"name": "3.00",
"node_attrs": {
"div": 3.0,
"field_A": {
"value": "CC"
},
"field_B": {
"value": "CCC"
}
},
"branch_attrs": {}
}
]
},
{
"name": "internalDEF",
"node_attrs": {
"div": 5.0
},
"branch_attrs": {},
"children": [
{
"name": "4.00",
"node_attrs": {
"div": 8.0,
"field_A": {
"value": "DD"
},
"field_B": {
"value": "DDD"
}
},
"branch_attrs": {}
},
{
"name": "5.00",
"node_attrs": {
"div": 9.0,
"field_A": {
"value": "EE"
},
"field_B": {
"value": "EEE"
}
},
"branch_attrs": {}
},
{
"name": "6.00",
"node_attrs": {
"div": 6.0,
"field_A": {
"value": "FF"
},
"field_B": {
"value": "FFF"
}
},
"branch_attrs": {}
}
]
}
]
}
}
