GeoGrouper

Groups GEO (Gene Expression Omnibus) samples based on the keywords that the sample descriptions share.
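To illustrate what "sharing keywords" can mean, here is a minimal sketch that scores two descriptions by the fraction of keywords they have in common (Jaccard similarity). It is not GeoGrouper's code. The tokenizer, the stopword list, the replicate-tag filter, and the function names `keywords` and `similarity_matrix` are all hypothetical. The package's real keyword logic lives in keywords.py, and its similarity measure may differ.

```python
import re
from typing import List, Set

# Hypothetical stopword list, chosen for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "with", "for", "sample"}


def keywords(description: str) -> Set[str]:
    """Lowercase alphanumeric tokens, minus stopwords and replicate tags like 'rep1'."""
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    return {t for t in tokens if t not in STOPWORDS and not re.fullmatch(r"rep\d*", t)}


def similarity_matrix(descriptions: List[str]) -> List[List[float]]:
    """Pairwise Jaccard similarity (shared keywords / all keywords) of the descriptions."""
    sets = [keywords(d) for d in descriptions]
    n = len(sets)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = sets[i] | sets[j]
            matrix[i][j] = len(sets[i] & sets[j]) / len(union) if union else 0.0
    return matrix


if __name__ == "__main__":
    titles = ["liver control rep1", "liver control rep2", "liver treated rep1"]
    for row in similarity_matrix(titles):
        print([round(x, 2) for x in row])
```

Here the two control replicates score 1.0 against each other, and each scores about 0.33 against the treated sample. A matrix like this is the kind of input a graph-clustering step can consume.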

Installation

To install the geogrouper package, run the following commands. The last four commands (steps 5-8) install python_mcl, the Python implementation of the MCL (Markov Cluster) clustering algorithm that GeoGrouper uses.

cd <path/to/your/working/directory>
git clone https://github.com/mnpatil17/GeoGrouper
cd GeoGrouper
pip install -e .
git clone https://github.com/koteth/python_mcl # do this inside the outer GeoGrouper directory
cd python_mcl
python setup.py install
cd ..
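If you have not used MCL before, the following minimal pure-Python sketch shows the textbook algorithm. It is a generic illustration, not python_mcl's API. The algorithm repeatedly expands (squares) a column-stochastic matrix and then inflates it (raises each entry to a power and re-normalizes columns) until the matrix stops changing. Clusters are then read off the rows that still carry weight. The self-loops, the inflation value of 2, and the tolerance are assumptions.

```python
from typing import List, Set

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column so it sums to 1 (column-stochastic)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency: Matrix, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-6) -> List[Set[int]]:
    """Textbook Markov Cluster algorithm on a symmetric adjacency matrix."""
    n = len(adjacency)
    # Add self-loops (a common MCL preprocessing step), then normalize columns.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two random-walk steps
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:  # converged
            break
    # Each row that keeps weight belongs to an attractor; its nonzero columns form a cluster.
    clusters: List[Set[int]] = []
    for row in m:
        members = {j for j, x in enumerate(row) if x > tol}
        if members and members not in clusters:
            clusters.append(members)
    return clusters


if __name__ == "__main__":
    # Two disconnected components: a triangle (0, 1, 2) and a pair (3, 4).
    graph = [[0, 1, 1, 0, 0], [1, 0, 1, 0, 0], [1, 1, 0, 0, 0],
             [0, 0, 0, 0, 1], [0, 0, 0, 1, 0]]
    print(mcl(graph))
```

Raising `inflation` generally produces more, smaller clusters. Lowering it toward 1 merges them.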

Usage

The geogrouper package is simple to use. Its primary entry point is cluster_descriptions_from_file.

To cluster from a file:

from geogrouper import cluster_descriptions_from_file
clusters_for_each_series = cluster_descriptions_from_file(path_to_data_file)

To cluster from a file AND print to the terminal as you go:

from geogrouper import cluster_descriptions_from_file
clusters_for_each_series = cluster_descriptions_from_file(path_to_data_file, should_print_output=True)

To cluster from a file AND print to the terminal only the series that have at least N samples:

from geogrouper import cluster_descriptions_from_file
clusters_for_each_series = cluster_descriptions_from_file(path_to_data_file, should_print_output=True, print_series_sample_size=N)

To cluster a list of sample descriptions with some additional description text (abstract_text):

from geogrouper import cluster_descriptions
clusters, mcl_matrix = cluster_descriptions(sample_titles_list, abstract_text)

Modules

geo_io.py: handles reading data from a specified data table
keywords.py: contains several methods for finding the keywords of a series (not all of them are used)
geogrouper.py: the primary file, which handles the clustering; cluster_descriptions_from_file is its primary method
utils.py: various utility functions

Potential Future Optimizations

The file keywords.py contains the logic to find keywords from GEO data. Currently there are two methods:

get_acronyms()
get_common_words()
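The source of these two functions is not reproduced here. As a hypothetical illustration of the two kinds of keyword finders, the following sketch defines stand-ins named `find_acronyms` and `find_common_words`. The names are deliberately different from the package's own, because their signatures and behavior are assumptions. The regular expressions and the `min_fraction` threshold are illustrative choices.

```python
import re
from collections import Counter
from typing import List, Set

# Hypothetical stand-ins; NOT the signatures of keywords.get_acronyms / get_common_words.


def find_acronyms(text: str) -> Set[str]:
    """Whole tokens that start with a capital letter followed by capitals or digits, e.g. 'ER', 'MCF7'."""
    return set(re.findall(r"\b[A-Z][A-Z0-9]+\b", text))


def find_common_words(descriptions: List[str], min_fraction: float = 0.5) -> Set[str]:
    """Lowercase words that appear in at least min_fraction of the descriptions."""
    counts: Counter = Counter()
    for d in descriptions:
        # Count each word at most once per description.
        counts.update(set(re.findall(r"[a-z]+", d.lower())))
    threshold = min_fraction * len(descriptions)
    return {word for word, c in counts.items() if c >= threshold}


if __name__ == "__main__":
    print(find_acronyms("MCF7 cells, ER positive, treated with E2"))
    print(find_common_words(["liver control", "liver treated", "kidney control"]))
```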

Because the clustering depends directly on which keywords are extracted, changing how keywords are found is the most direct way to iterate on the algorithm and improve its results.
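To make this concrete, here is a hypothetical sketch in which the keyword extractor is a parameter. Samples are grouped when their extracted keyword sets are identical, a deliberately simple stand-in for GeoGrouper's MCL step. None of these function names come from the package. The only point is that swapping the extractor changes the resulting groups.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List

Extractor = Callable[[str], FrozenSet[str]]


def all_words(text: str) -> FrozenSet[str]:
    """Every lowercase alphanumeric token, including replicate tags like 'rep1'."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def words_without_replicates(text: str) -> FrozenSet[str]:
    """Like all_words, but drops replicate tags such as 'rep1' or 'rep2'."""
    return frozenset(w for w in all_words(text) if not re.fullmatch(r"rep\d+", w))


def group_by_keywords(descriptions: List[str], extract: Extractor) -> List[List[int]]:
    """Group sample indices whose extracted keyword sets are identical."""
    groups: Dict[FrozenSet[str], List[int]] = defaultdict(list)
    for i, d in enumerate(descriptions):
        groups[extract(d)].append(i)
    return list(groups.values())  # dicts keep insertion order, so groups follow input order


if __name__ == "__main__":
    titles = ["liver control rep1", "liver control rep2", "liver treated rep1"]
    print(group_by_keywords(titles, all_words))                 # [[0], [1], [2]]
    print(group_by_keywords(titles, words_without_replicates))  # [[0, 1], [2]]
```

With every word kept, the replicate tags make each sample unique. Filtering them out lets the two control replicates fall into one group.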
