Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter: --group-by fails to error properly when used with date-only values #795

Closed
victorlin opened this issue Dec 3, 2021 · 3 comments
Closed
Labels
bug Something isn't working

Comments

@victorlin
Copy link
Member

While writing tests for #794, I noticed a couple edge case-y bugs.

Setup

Create a simple metadata file without a date column:

echo 'strain\tcountry
1\tA
2\tB' > metadata.tsv

Group by year

augur filter \
    --metadata metadata.tsv \
    --group-by year \
    --subsample-max-sequences 10 \
    --output-strains out.txt

Output:

WARNING: The specified group-by categories (['year']) were not found. No sequences-per-group sampling will be done. Note that using 'year' or 'year month' requires a column called 'date'.
Sampling at 10 per group.
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/nextstrain-native/bin/augur", line 33, in <module>
    sys.exit(load_entry_point('nextstrain-augur', 'console_scripts', 'augur')())
  File "/Users/vlin/repos/augur/augur/__main__.py", line 10, in main
    return augur.run( argv[1:] )
  File "/Users/vlin/repos/augur/augur/__init__.py", line 75, in run
    return args.__command__.run(args)
  File "/Users/vlin/repos/augur/augur/filter.py", line 1537, in run
    queues_by_group[group].add(
KeyError: ('_dummy',)

This should fail earlier instead of trying to use _dummy.

Group by year and month

augur filter \
    --metadata metadata.tsv \
    --group-by year month \
    --subsample-max-sequences 10 \
    --output-strains out.txt

Output:

WARNING: A 'date' column could not be found to group-by year or month.
Filtering by group may behave differently than expected!
Sampling at 10 per group.
WARNING: A 'date' column could not be found to group-by year or month.
Filtering by group may behave differently than expected!
0 strains were dropped during filtering
	0 of these were dropped because of subsampling criteria
2 strains passed all filters

This should fail instead of passing.

@huddlej
Copy link
Contributor

huddlej commented Dec 3, 2021

Thank you for calling this out, @victorlin! This might be a subset of #754, unless you think there's enough difference to keep this issue open. I'd love to see better error handling in augur filter (and generally in augur). Some of the current implementation decisions date back to the first version of Augur and could definitely be revisited.

Repository owner moved this from New to Done in Nextstrain planning (archived) Dec 3, 2021
@victorlin
Copy link
Member Author

Didn't mean to close this 🤦

@victorlin victorlin reopened this Dec 3, 2021
Repository owner moved this from Done to New in Nextstrain planning (archived) Dec 3, 2021
@victorlin
Copy link
Member Author

@huddlej

This might be a subset of #754, unless you think there's enough difference to keep this issue open.

When I opened this, the KeyError: ('_dummy',) was a different behavior so keeping this issue separate made sense. However, the behavior is consistent now after merging #794. Closing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
No open projects
Development

No branches or pull requests

2 participants