filter: --query fails when the .str accessor is used on a column #1277

Closed
victorlin opened this issue Aug 11, 2023 · 0 comments · Fixed by #1278
victorlin commented Aug 11, 2023

First reported in a discussion post.

Current Behavior

Given a metadata file with a string column named column, any query that uses column.str results in an error.

cat >metadata.tsv <<~~
strain	column
SEQ_1	value1
SEQ_2	value2
SEQ_3	value3
~~

augur filter \
  --metadata metadata.tsv \
  --query "column.str.startswith('value')" \
  --output-strains filtered_strains.txt
# ERROR: Internal Pandas error when applying query:
# 	'NoneType' object has no attribute 'str'
# Ensure the syntax is valid per <https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-query>.

This is because extract_variables, added in 2ead5b3, is unable to extract column as a variable name from the query.

Expected behavior

The filter should run successfully.

augur filter \
  --metadata metadata.tsv \
  --query "column.str.startswith('value')" \
  --output-strains filtered_strains.txt
# 0 strains were dropped during filtering
# 3 strains passed all filters

Possible solutions

  1. Fix extract_variables to support the column.str accessor. This isn't trivial since the error message doesn't provide the column name, which is what the function relies on for extracting variables from the query string.
  2. Remove extract_variables and attempt type conversion of all metadata columns to numeric.

After realizing that extract_variables isn't easy to implement properly, I'm leaning towards option 2.
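
A minimal sketch of what option 2 could look like, assuming metadata is read with every column as a string. The helper name convert_numeric_columns and the extra coverage column are hypothetical and only for illustration; this is not Augur's actual implementation. It tries pd.to_numeric on each column and leaves a column untouched when conversion fails, so no variable names need to be extracted from the query.

```python
import io

import pandas as pd


def convert_numeric_columns(df):
    """Hypothetical helper: try to convert every column to a numeric dtype.

    Columns that cannot be parsed as numbers are left unchanged.
    """
    converted = df.copy()
    for name in converted.columns:
        try:
            converted[name] = pd.to_numeric(converted[name])
        except (ValueError, TypeError):
            # Not a numeric column; keep the original string values.
            pass
    return converted


# Same rows as the example above, plus an assumed numeric column.
tsv = (
    "strain\tcolumn\tcoverage\n"
    "SEQ_1\tvalue1\t10\n"
    "SEQ_2\tvalue2\t20\n"
    "SEQ_3\tvalue3\t30\n"
)
metadata = pd.read_csv(io.StringIO(tsv), sep="\t", dtype=str)
converted = convert_numeric_columns(metadata)

# The .str accessor still works because "column" stayed a string column.
by_prefix = converted.query("column.str.startswith('value')", engine="python")

# Numeric comparisons work because "coverage" was converted.
by_coverage = converted.query("coverage >= 20", engine="python")
```

The trade-off is that every column pays the cost of an attempted conversion, including columns the query never references.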

Affected versions

This bug impacts Augur version 22.2.0.

@victorlin victorlin added the bug Something isn't working label Aug 11, 2023
@victorlin victorlin self-assigned this Aug 11, 2023