Releases: pachterlab/gget
v0.29.0 - cbio, opentargets, bgee and more
- New modules:
- gget enrichr now also supports species other than human and mouse (fly, yeast, worm, and fish) via modEnrichR.
- gget mutate:
  - Identical sequences in the final file are now merged by default.
  - Mutation creation was vectorized to decrease runtime.
  - Improved the flanking sequence check for non-substitution mutations to make sure no wildtype k-mer is retained in the mutation-containing sequence.
  - Added several new arguments to customize sequence generation and output.
- gget cosmic:
  - Added support for targeted as well as gene screens.
  - The CSV file created for gget mutate now also contains protein mutation info.
- gget ref: Added an out file option.
- gget info and gget seq: Switched to the Ensembl POST API to increase speed (nothing changes in the front end).
- Other "behind the scenes" changes:
  - Unit tests reorganized to increase speed and decrease code.
  - Requirements updated to allow newer mysql-connector versions.
  - Support for NumPy >= 2.0.
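The switch to the Ensembl POST API in gget info and gget seq replaces one request per ID with one request per batch of IDs. A minimal sketch of that batching idea, assuming the public Ensembl REST endpoint POST /lookup/id with a JSON body of the form {"ids": [...]}. The helper names and the batch size of 1000 are assumptions for illustration, not gget's actual implementation; check the Ensembl REST documentation for the endpoint's current limit.

```python
import json
import urllib.request

# Public Ensembl REST endpoint for batched ID lookups (POST variant of /lookup/id).
ENSEMBL_LOOKUP_URL = "https://rest.ensembl.org/lookup/id"


def chunk(ids, size):
    """Split a list of IDs into consecutive batches of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def build_lookup_requests(ens_ids, batch_size=1000):
    """Build one POST request per batch instead of one GET request per ID.

    batch_size=1000 is an assumed per-request limit, not a value taken from gget.
    """
    reqs = []
    for batch in chunk(list(ens_ids), batch_size):
        body = json.dumps({"ids": batch}).encode("utf-8")
        reqs.append(
            urllib.request.Request(
                ENSEMBL_LOOKUP_URL,
                data=body,
                headers={"Content-Type": "application/json", "Accept": "application/json"},
                method="POST",
            )
        )
    return reqs


# Usage (performs a network call, so it is left commented out):
# with urllib.request.urlopen(build_lookup_requests(["ENSG00000034713"])[0]) as resp:
#     results = json.load(resp)  # dict keyed by Ensembl ID
```

With 2,500 IDs this sends three requests instead of 2,500, which is where the speedup of a POST-based lookup comes from.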
v0.28.6 - gget mutate, download_cosmic, fixes for Ensembl v112
- New module: gget mutate
- gget cosmic: You can now download entire COSMIC databases using the download_cosmic argument.
- gget ref: Can now fetch the GRCh37 genome assembly using species='human_grch37'.
- gget search: Adjusted access to human data to the structure of Ensembl release 112 (fixes issue 129).
v0.28.4 - Fix Windows bug in gget elm setup
Fix Windows bug in gget elm setup
v0.28.3 - cosmic, invertebrates for ref and search, elm improvements
- gget search and gget ref now also support fungi 🍄, protists 🌝, and invertebrate metazoa 🐝 🐜 🐌 🐙 (in addition to vertebrates and plants).
- New module: gget cosmic
- gget enrichr: Fixed duplicate scatter dots in the plot when pathway names are duplicated.
- gget elm:
  - Changed ortho results column name 'Ortholog_UniProt_ID' to 'Ortholog_UniProt_Acc' to correctly reflect the column contents, which are UniProt Accessions. 'UniProt ID' was changed to 'UniProt Acc' in the documentation for all gget modules.
  - Changed ortho results column name 'motif_in_query' to 'motif_inside_subject_query_overlap'.
  - Added interaction domain information to results (new columns: "InteractionDomainId", "InteractionDomainDescription", "InteractionDomainName").
  - The regex string for regular expression matches was encapsulated as follows: "(?=(regex))" (instead of directly passing the regex string "regex") to enable capturing all occurrences of a motif when the motif length is variable and there are repeats in the sequence (https://regex101.com/r/HUWLlZ/1).
- gget setup: Use the out argument to specify a directory into which the ELM database will be downloaded. Completes this feature request.
- gget diamond: The DIAMOND command is now run with the --ignore-warnings flag, allowing niche sequences such as amino acid sequences that only contain nucleotide characters and repeated sequences. This is also true for DIAMOND alignments performed within gget elm.
- gget ref and gget search back-end change: The current Ensembl release is fetched from the new release file on the Ensembl FTP site to avoid errors during uploads of new releases.
- gget search:
  - FTP link results (--ftp) are saved in txt file format instead of json.
  - Fixed URL links to the Ensembl gene summary for species with a subspecies name and for invertebrates.
- gget ref:
  - Back-end changes to increase speed.
  - New argument: list_iv_species to list all available invertebrate species (can be combined with the release argument to fetch all species available from a specific Ensembl release).
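The effect of the "(?=(regex))" wrapping in gget elm can be reproduced with Python's standard re module. A small sketch; the function names and the toy motif [KR]{2,3} are illustrative, not taken from gget elm's source.

```python
import re


def find_motif_occurrences(regex, sequence):
    """Report every start position where `regex` matches, including overlapping hits.

    Wrapping the pattern as (?=(regex)) makes each match zero-width, so the scanner
    advances one position at a time instead of jumping past consumed text;
    group(1) still captures the matched motif.
    """
    pattern = re.compile(f"(?=({regex}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


def find_motif_plain(regex, sequence):
    """Naive version for comparison: each match consumes text, so overlaps are lost."""
    return [(m.start(), m.group(0)) for m in re.finditer(regex, sequence)]


print(find_motif_plain("[KR]{2,3}", "KKKK"))        # [(0, 'KKK')]
print(find_motif_occurrences("[KR]{2,3}", "KKKK"))  # [(0, 'KKK'), (1, 'KKK'), (2, 'KK')]
```

Each start position still reports only the greedy (longest) match beginning there; shorter matches from the same index are not listed separately.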
v0.28.2 - NCBI server issues and gget elm expand
- gget info: Return a logging error message when the NCBI server fails for a reason other than a fetch failure (this is an error on the server side rather than an error with gget).
- Replaced the deprecated 'text' argument to find()-type methods wherever the BeautifulSoup dependency is used.
- gget elm: Remove false positive and true negative instances from returned results.
- gget elm: Add expand argument.
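Removing false positive and true negative instances amounts to filtering ELM instance records by their instance-logic annotation. A sketch, assuming each instance is a dict with an "InstanceLogic" field whose values include "true positive", "false positive", "true negative", and "unknown". The field name, labels, and helper name are assumptions based on ELM's instance downloads; gget elm's own filtering may differ.

```python
# Instance-logic labels to drop (assumed to mirror ELM's "InstanceLogic" annotation).
EXCLUDED_LOGIC = {"false positive", "true negative"}


def keep_reliable_instances(instances, logic_key="InstanceLogic"):
    """Return only instances whose logic label is not in EXCLUDED_LOGIC.

    Labels are compared case-insensitively; rows missing the key are kept.
    """
    kept = []
    for row in instances:
        label = str(row.get(logic_key, "")).strip().lower()
        if label not in EXCLUDED_LOGIC:
            kept.append(row)
    return kept
```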
v0.28.0 - gget elm + gget diamond
- Updated documentation of gget muscle to add a tutorial on how to visualize sequences with sequence name lengths, plus a slight change to the returned visualization so it is a bit more robust to varying sequence names.
- gget muscle now also allows a list of sequences as input (as an alternative to providing the path to a FASTA file).
- Allow missing gene filter for gget cellxgene (fixes bug).
- gget seq: Allow missing gene names (fixes #107).
- New arguments for gget enrichr: Use the arguments kegg_out and kegg_rank to create an image of the KEGG pathway with the genes from the enrichment analysis highlighted (thanks to this PR by Noriaki Sato).
- New modules: gget elm and gget diamond
co-authored-by: @anhchi172
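Accepting either a FASTA path or a list of sequences, as gget muscle now does, usually comes down to normalizing the input before the aligner runs. A sketch of one way to do that, assuming the aligner only reads FASTA files; the helper name and the >Seq0, >Seq1, ... header scheme are hypothetical, not gget muscle's actual behavior.

```python
import os
import tempfile


def to_fasta_path(sequences):
    """Return a FASTA file path for either input form.

    A str or path-like object is assumed to already be a FASTA path and is returned
    as-is; any other iterable is treated as raw sequences and written to a temp file.
    """
    if isinstance(sequences, (str, os.PathLike)):
        return os.fspath(sequences)
    fd, path = tempfile.mkstemp(suffix=".fa")
    with os.fdopen(fd, "w") as handle:
        for i, seq in enumerate(sequences):
            # Hypothetical header scheme: Seq0, Seq1, ...
            handle.write(f">Seq{i}\n{seq}\n")
    return path
```

In this sketch the caller is responsible for deleting the temporary file once the alignment has finished.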
v0.27.9 - gget enrichr background genes
v0.27.8 - Fixed bug in gget pdb; add release argument to gget search
- Fixed bug in gget pdb
- Added new release argument to gget search
Also see: https://pachterlab.github.io/gget/updates.html
Co-contributor: @anhchi172
v0.27.7 - Cleaned up requirements; gget alphafold compatibility with Python>=3.10
Moved dependencies for modules gget gpt and gget cellxgene from automatically installed requirements to gget setup.
Updated gget alphafold dependencies for compatibility with Python >= 3.10.
Added census_version argument to gget cellxgene.
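Moving module-specific dependencies out of the default requirements means those modules have to import them lazily and explain how to install them when they are missing. A generic sketch of that pattern; the helper name and error message are hypothetical and gget's own check may look different. The install hint "gget setup cellxgene" follows from the note above that these dependencies are now installed via gget setup.

```python
import importlib


def require_optional(module_name, setup_command):
    """Import an optional dependency, or fail with a hint on how to install it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:  # ModuleNotFoundError is a subclass of ImportError
        raise ImportError(
            f"Optional dependency '{module_name}' is not installed. "
            f"Install it first, e.g. by running '{setup_command}'."
        ) from err
```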
v0.27.5 - Compatibility with pandas v2.0.0
Updated gget search to function correctly with new Pandas version 2.0.0 (released on April 3rd, 2023) as well as older versions of Pandas
Updated gget info with new flags uniprot and ncbi which allow turning off results from these databases independently to save runtime (note: flag ensembl_only was deprecated)
All gget modules now feature a -q / --quiet (Python: verbose=False) flag to turn off progress information
Co-author of this release: @anhchi172
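The -q / --quiet flag and verbose=False are the command-line and Python faces of the same switch. A minimal sketch of how such a pair can be wired with argparse and logging; the tool name "example_tool" and the process() function are hypothetical, not gget's code.

```python
import argparse
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("example_tool")  # hypothetical tool name


def process(items, verbose=True):
    """Double each item; report progress only when verbose is True."""
    if verbose:
        logger.info("Processing %d items...", len(items))
    return [2 * x for x in items]


def build_parser():
    parser = argparse.ArgumentParser(prog="example_tool")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="Do not print progress information.")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # The CLI's --quiet maps onto the Python API's verbose=False.
    return process([1, 2, 3], verbose=not args.quiet)
```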