Inconsistent behavior between --exclude-where and --query for missing columns #853

Closed
victorlin opened this issue Feb 17, 2022 · 2 comments
Labels
bug (Something isn't working), duplicate (This issue or pull request already exists)

Comments

@victorlin
Member

Current Behavior

  • --exclude-where nonexistent_column='value' ignores the missing column silently
  • --query nonexistent_column!='value' raises a pandas UndefinedVariableError

Expected behavior

--exclude-where nonexistent_column='value' and --query nonexistent_column!='value' should produce the same effect.

How to reproduce

Setup:

printf 'strain\tcountry\tdate\nSEQ1\tA\t2018-03-24\nSEQ2\tB\t2018-03-25\n' > metadata.tsv

Run with --exclude-where:

$ augur filter \
  --metadata metadata.tsv \
  --exclude-where nonexistent_column='value' \
  --output-strains out.txt
0 strains were dropped during filtering
2 strains passed all filters

Run with --query:

$ augur filter \
  --metadata metadata.tsv \
  --query "nonexistent_column != 'value'" \
  --output-strains out.txt
...
pandas.core.computation.ops.UndefinedVariableError: name 'nonexistent_column' is not defined
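
The traceback suggests that --query hands the expression to pandas' DataFrame.query, which resolves every name before evaluating anything. A minimal sketch that reproduces the error with pandas alone, assuming that forwarding (it is inferred from the traceback, not confirmed against augur's source); `query_error` is a helper made up for this illustration:

```python
import io

import pandas as pd

# The same three-line metadata as in the Setup step, read from a string.
metadata = pd.read_csv(
    io.StringIO("strain\tcountry\tdate\nSEQ1\tA\t2018-03-24\nSEQ2\tB\t2018-03-25\n"),
    sep="\t",
)


def query_error(frame: pd.DataFrame, expression: str):
    """Return the exception raised by frame.query(expression), or None."""
    try:
        frame.query(expression)
    except Exception as error:
        return error
    return None


error = query_error(metadata, "nonexistent_column != 'value'")
# UndefinedVariableError subclasses NameError, so callers can catch NameError
# without importing anything from pandas' internal modules.
print(type(error).__name__, isinstance(error, NameError))
```

Because the exception is a NameError subclass, a wrapper does not need to depend on the pandas.core.computation.ops import path shown in the traceback.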

Possible solution

Both exclusion options should be consistent, though behavior could be any of the following:

  1. Pass silently (current behavior for --exclude-where)
  2. Pass with warning
  3. Error (current behavior for --query, but raise a custom FilterException to hide internal pandas implementation)
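
The three options above can be sketched as a single missing-column policy shared by both filters. Everything here is hypothetical: `FilterException` is only the name suggested in option 3, and `handle_missing`, `exclude_where`, `apply_query`, and the `on_missing` parameter are illustrative names, not augur's implementation.

```python
import io
import warnings

import pandas as pd


class FilterException(Exception):
    """Hypothetical user-facing error, named after option 3."""


def handle_missing(message: str, on_missing: str) -> None:
    """Apply one policy: 'silent' (option 1), 'warn' (2), or 'error' (3)."""
    if on_missing == "error":
        raise FilterException(message)
    if on_missing == "warn":
        warnings.warn(message)
    elif on_missing != "silent":
        raise ValueError(f"unknown policy: {on_missing!r}")


def exclude_where(metadata, column, value, on_missing="error"):
    """Drop rows where `column` equals `value` (illustrative only)."""
    if column not in metadata.columns:
        handle_missing(f"column {column!r} not found in metadata", on_missing)
        return metadata  # nothing to exclude
    return metadata[metadata[column].astype(str) != value]


def apply_query(metadata, expression, on_missing="error"):
    """Run a pandas query, translating undefined-name errors."""
    try:
        return metadata.query(expression)
    except NameError as error:
        # pandas' UndefinedVariableError is a NameError subclass.
        handle_missing(f"invalid query {expression!r}: {error}", on_missing)
        return metadata  # treat the query as a no-op


metadata = pd.read_csv(
    io.StringIO("strain\tcountry\tdate\nSEQ1\tA\t2018-03-24\nSEQ2\tB\t2018-03-25\n"),
    sep="\t",
)
kept = exclude_where(metadata, "nonexistent_column", "value", on_missing="silent")
print(len(kept))
```

Treating a failed query as a no-op only makes sense for exclusion-style expressions such as the `!=` case in this issue; for an inclusion-style query, the "error" policy is probably the safer default.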

Your environment: if running Nextstrain locally

  • augur 14.0.0
@victorlin added the bug label Feb 17, 2022
@victorlin
Member Author

Related to #754, which is about --group-by behavior with missing columns.

@victorlin
Member Author

Oops, I misread #754. I see it also mentions filter options briefly, so this issue is actually a subset of that one.

@victorlin added the duplicate label Feb 17, 2022