Add support for extra features #1
fix column names for extra feature df
More diagnostic logging
more diagnostics log final dataframe show nan rows track df size
row logging log fixes
The current error occurs during plotting.
Incorporated changes from
One problem with the current implementation is file size. For example, some data sets can reach 6 GB in the current TSV format, so loading of this file has to be optimized to limit memory and I/O usage.
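One way to keep memory bounded is to stream the TSV row by row instead of loading it whole. The sketch below only illustrates that idea with Python's standard library; it is not the implementation in this PR. The names `iter_gene_pairs` and `collect_unique_genes` are hypothetical, and the presence of a header row is an assumption.

```python
import csv
import io
from typing import Iterator, List, Set, TextIO, Tuple


def iter_gene_pairs(handle: TextIO) -> Iterator[Tuple[str, str, List[str]]]:
    """Yield (gene_a, gene_b, raw_values) one row at a time.

    Assumes the first line is a header (not confirmed by the PR).
    Only the current row is held in memory.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the assumed header row
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        yield row[0], row[1], row[2:]


def collect_unique_genes(handle: TextIO) -> Set[str]:
    """Collect every gene name seen in either pair column."""
    genes: Set[str] = set()
    for gene_a, gene_b, _ in iter_gene_pairs(handle):
        genes.add(gene_a)
        genes.add(gene_b)
    return genes


# Illustrative input only; real files would be opened with open(path, newline="").
sample = io.StringIO("gene1\tgene2\tfeature1\nA\tB\t0.5\nB\tC\tNA\n")
unique_genes = collect_unique_genes(sample)
```

With this approach the peak memory depends on the number of distinct genes rather than on the number of gene pairs in the file.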
For the extra feature support, I am creating a Rust library to process the large files in a memory-optimized manner. The current flow is described in the diagram below.
graph TD
A["Expression Data"]
B["Extra Feature Files"]
C["Identify All Unique Genes"]
D["Save unified gene order to pkl file"]
E["Realign extra features with unified gene order"]
F["Save each feature as pkl file"]
A --> C
B --> C
C --> D
C --> E
E --> F
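The flow in the diagram can be sketched in plain Python as follows. This is a rough illustration, not the Rust library itself. The function names, the output file names (`gene_order.pkl`, `<feature>.pkl`), the sorted gene order, and the upper-triangle layout of the realigned values are all assumptions made for the example.

```python
import math
import pickle
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]
FeatureTable = Dict[Pair, float]


def unified_gene_order(expression_genes: Iterable[str],
                       feature_tables: Dict[str, FeatureTable]) -> List[str]:
    """Identify all unique genes across expression data and extra features."""
    genes = set(expression_genes)
    for table in feature_tables.values():
        for gene_a, gene_b in table:
            genes.update((gene_a, gene_b))
    return sorted(genes)  # sorting is an assumption; any stable order works


def realign(table: FeatureTable, gene_order: List[str]) -> List[float]:
    """Realign one feature to the unified order (upper triangle, NaN if missing)."""
    aligned = []
    for i, gene_a in enumerate(gene_order):
        for gene_b in gene_order[i + 1:]:
            value = table.get((gene_a, gene_b))
            if value is None:
                # Pairs are treated as unordered, so try the reversed key.
                value = table.get((gene_b, gene_a), math.nan)
            aligned.append(value)
    return aligned


def save_all(expression_genes: Iterable[str],
             feature_tables: Dict[str, FeatureTable],
             out_dir: Path) -> List[str]:
    """Save the unified gene order and each realigned feature as pkl files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    order = unified_gene_order(expression_genes, feature_tables)
    with open(out_dir / "gene_order.pkl", "wb") as fh:
        pickle.dump(order, fh)
    for name, table in feature_tables.items():
        with open(out_dir / f"{name}.pkl", "wb") as fh:
            pickle.dump(realign(table, order), fh)
    return order
```

Saving the gene order once and storing each feature as a vector aligned to it means the pair labels do not have to be repeated in every feature file.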
TODO:
Adds support for additional features to FunMap. Users can use the `extra_feature_file` key in their config to specify a TSV file that contains features for gene pairs on any scale.
New Features
- Set `only_extra_features = true` in your config YAML file to ignore expression data
Status
Implementation notes
- The `all_average` curve in the LLR plot does not use the extra features, only the cohort information
Other Changes
- `pyproject.toml` instead of `setup.py`
- `maturin` for Rust library integration
Data Format
The format for the `extra_feature_file` is as follows. The first column is the first gene in the pair, and the second column is the second gene. The remaining columns are the feature values. If a feature does not have a value for the specified pair, it should have a value of `NA`. Columns are tab-separated.
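As a rough illustration of the format described above, the snippet below parses one tab-separated line and maps `NA` to NaN. The function name `parse_feature_line` and the example row (gene names and values) are made up for the sketch and are not taken from the PR.

```python
import math
from typing import List, Tuple


def parse_feature_line(line: str) -> Tuple[str, str, List[float]]:
    """Split one row into (first gene, second gene, feature values)."""
    fields = line.rstrip("\n").split("\t")
    gene_a, gene_b = fields[0], fields[1]
    # "NA" marks a missing value for this pair; represent it as NaN.
    values = [math.nan if field == "NA" else float(field) for field in fields[2:]]
    return gene_a, gene_b, values


# Hypothetical row: two gene columns followed by two feature columns.
example_line = "TP53\tMDM2\t0.82\tNA\n"
gene_a, gene_b, values = parse_feature_line(example_line)
```

Because the features can be on any scale, the values are kept as plain floats and no normalization is applied here.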