rintychy/Data-Science-and-Python-work

Data-Science-and-Python-work

This repository contains my coursework and projects for the Data Science, Data Mining, Bioinformatics, and Digital Image Processing classes.

Data Science

Post-Storm Imagery Classification

Description: I built a model to classify and analyze post-storm images from NOAA to help researchers study the effects of storms. I wrote Python code to compress large images and to find the center of an image. I also ran statistical analyses on the data, including basic statistics, distribution modeling, hypothesis testing, correlation, and covariance.
Work type: Group project.
Programming language and tools: Python 3, Jupyter Notebook, PyCharm.
Frames and libraries: pandas, numpy, csv, matplotlib, sklearn, seaborn, scipy, math.

Best date-night movie and Titanic survival rate by age, sex, and class

Description: Using joins, groupby, and sorting, I found the best date-night movie and the survival rate of Titanic passengers by age, class, and sex.
Work type: Individual course work.
Programming language and tools: Python 3, Jupyter Notebook.
Frames and libraries: OS, pandas, HTML.

Random team generator

Description: I wrote a random team generator that splits members into a given number of teams.
Work type: Individual course work.
Programming language and tools: Python 3, Jupyter Notebook.
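
As an illustration of the idea, here is a minimal sketch of a random team generator. It is not the notebook's code: the function name `make_teams`, the optional `seed` parameter, and the member names are assumptions made for this example. It shuffles the members and deals them out round-robin, so team sizes differ by at most one.

```python
import random


def make_teams(members, team_count, seed=None):
    """Shuffle members and deal them round-robin into team_count teams."""
    if team_count < 1:
        raise ValueError("team_count must be at least 1")
    rng = random.Random(seed)  # pass a seed for repeatable output
    shuffled = list(members)
    rng.shuffle(shuffled)
    # A strided slice deals members like cards, so sizes differ by at most one.
    return [shuffled[i::team_count] for i in range(team_count)]


# Hypothetical member names, used only for demonstration.
teams = make_teams(["Ana", "Ben", "Cy", "Dee", "Eli"], team_count=2, seed=0)
for number, team in enumerate(teams, start=1):
    print(f"Team {number}: {team}")
```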

Data Mining

Association Rule Mining Algorithm

Description: Mined interesting association rules between movies based on their genres and ratings using the Apriori algorithm.
Work type: Individual course work.
Programming language and tools: Python 3, Jupyter Notebook.
Frames and libraries: pandas, mlxtend (apriori, association_rules, TransactionEncoder).
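
The rules that Apriori produces are usually ranked by support, confidence, and lift. The sketch below computes those three metrics for a single rule in plain Python. It does not use mlxtend and is not the notebook's code; the helper names and the genre baskets are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes both sides occur at least once, otherwise a division by zero occurs.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    confidence = both / support(transactions, antecedent)
    lift = confidence / support(transactions, consequent)
    return both, confidence, lift


# Hypothetical baskets: genres of the movies each user rated highly.
baskets = [
    {"Action", "Sci-Fi"},
    {"Action", "Sci-Fi", "Thriller"},
    {"Drama", "Romance"},
    {"Action", "Thriller"},
]
print(rule_metrics(baskets, {"Action"}, {"Sci-Fi"}))
```

A lift above 1 suggests the two sides occur together more often than they would if they were independent.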

Data preprocessing and correlation analysis

Description: Explored the features of the data, then applied smoothing, normalization, and correlation analysis among the attributes.
Work type: Individual course work.
Programming language and tools: Python 3, Jupyter Notebook.
Frames and libraries: pandas, numpy, scipy, sklearn.
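
Two of these steps can be sketched in a few lines of plain Python. This assumes min-max scaling as the normalization and Pearson's r as the correlation measure; the notebook may use other variants, and the function names are illustrative.

```python
import math


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant attribute: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


print(min_max_normalize([2, 4, 6]))
print(pearson([1, 2, 3], [2, 4, 6]))
```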

LDA Topic Modeling

Description: Found the top 20 topics in the whole dataset and the top 10 words of each topic, then used those words to infer what each topic is about. Also used topic modeling to examine how the topics changed over time.
Work type: Individual course work.
Programming language and tools: Python 3, Jupyter Notebook.
Frames and libraries: pandas, csv, gensim (LdaModel, corpora), nltk (WordNetLemmatizer), pprint.

Machine Learning Classification for Sentiment Analysis

Description: Performed sentiment analysis on tweet data using a Logistic Regression classifier.
Work type: Individual course work.
Programming language and tools: Python 3, Jupyter Notebook.
Frames and libraries: pandas, csv, re, string, nltk, itertools, sklearn (LogisticRegression).

Bioinformatics

Hypothesis testing to compare multiple algorithms

Description: Formulated a null hypothesis, computed the t-statistic and p-value, and then calculated the mean.
Work type: Individual course work.
Programming language and tools: Python 3, Jupyter Notebook.
Frames and libraries: pandas, scipy.
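
As a rough sketch of the t-statistic step, the code below computes a paired t-statistic for two algorithms scored on the same runs. The paired design, the function name, and the scores are assumptions for illustration, not taken from the notebook. The p-value is left out because it needs the t distribution; `scipy.stats.ttest_rel` returns both values.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples.

    Null hypothesis: the mean difference between the two algorithms is zero.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical scores of two algorithms on the same four test runs.
t, df = paired_t_statistic([5, 7, 9, 11], [4, 5, 8, 9])
print(t, df)
```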

Machine Learning model on iris dataset

Description: Applied a KNN classification model to the Iris dataset.
Work type: Individual course work.
Programming language and tools: Python 3, Jupyter Notebook.
Frames and libraries: pandas, numpy, sklearn (KNeighborsClassifier).
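
To show what a KNN classifier does under the hood, here is a from-scratch sketch using Euclidean distance and a majority vote. It stands in for sklearn's `KNeighborsClassifier` only conceptually; the function name and the measurements are made up for this example and are not rows from the Iris dataset.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of query by majority vote among its k nearest points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical (petal length, petal width) measurements in cm.
points = [(1.4, 0.2), (1.3, 0.2), (4.7, 1.4), (4.5, 1.5), (6.0, 2.5)]
labels = ["setosa", "setosa", "versicolor", "versicolor", "virginica"]
print(knn_predict(points, labels, (1.5, 0.3)))
```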

Regression Model

Description: Used a multiple linear regression model to predict semantic similarity from the given input data. Calculated the R-squared value and drew residual plots.
Work type: Individual course work.
Programming language and tools: Python 3, Jupyter Notebook.
Frames and libraries: pandas, statsmodels, seaborn, matplotlib.
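
The residuals and R-squared value follow directly from the actual and predicted values, whichever model produced the predictions. A minimal sketch, with illustrative function names and made-up numbers:

```python
def residuals(actual, predicted):
    """Residual for each observation: actual minus predicted."""
    return [a - p for a, p in zip(actual, predicted)]


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum(r ** 2 for r in residuals(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up values for demonstration only.
print(residuals([3, 5], [2, 7]))
print(r_squared([1, 2, 3, 4], [1, 2, 3, 4]))
```

Predicting the mean for every observation gives an R-squared of 0, and a perfect fit gives 1.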

Semantic Similarity

Description: Calculated semantic similarity between pairs of data using the Jaccard, Resnik, AllPair, and BestPair similarity algorithms.
Work type: Individual course work.
Programming language and tools: Python 3, Jupyter Notebook.
Frames and libraries: pandas, numpy, collections.
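
Of the four measures, Jaccard is the simplest to show: the size of the intersection of two sets divided by the size of their union. The sketch below assumes each item is described by a set of terms; the term labels are hypothetical, and returning 1.0 for two empty sets is a convention chosen for this example.

```python
def jaccard(a, b):
    """Jaccard similarity: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(a | b)


# Hypothetical term sets for two items.
print(jaccard({"t1", "t2", "t3"}, {"t2", "t3", "t4"}))
```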

String Matching using Z and KMP algorithms

Description: I wrote Python scripts for the Z and KMP algorithms to check whether a pattern is present in a given text.
Work type: Individual course work.
Programming language and tools: Python 3, Jupyter Notebook.
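
For reference, both algorithms can be sketched compactly as below. This is an independent sketch, not the scripts from the repository. Each function returns the start index of every match, so a pattern is present when the list is non-empty. The Z version joins pattern and text with a `"\x00"` separator, which is assumed to appear in neither string.

```python
def z_array(s):
    """z[i] = length of the longest prefix of s that also starts at index i."""
    n = len(s)
    z = [0] * n
    left = right = 0  # current rightmost match window [left, right)
    for i in range(1, n):
        if i < right:
            z[i] = min(right - i, z[i - left])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def z_search(pattern, text):
    """Match positions found with the Z array of pattern + separator + text."""
    if not pattern:
        return []
    combined = pattern + "\x00" + text
    m = len(pattern)
    z = z_array(combined)
    return [i - m - 1 for i in range(m + 1, len(combined)) if z[i] == m]


def kmp_search(pattern, text):
    """Match positions found with the Knuth-Morris-Pratt failure table."""
    if not pattern:
        return []
    # failure[i]: longest proper prefix of pattern[:i + 1] that is also its suffix
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    matches = []
    k = 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going to find overlapping matches
    return matches


print(kmp_search("aba", "ababa"), z_search("aba", "ababa"))
```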

Digital Image Processing

Arithmetic Operation, Point Processing, and Geometric Transformation

Description: Using the given input image files, I wrote a Python script that performs subtraction, negation, and translation on the images.
Work type: Individual course work.
Programming language and tools: Python 3, Jupyter Notebook.
Frames and libraries: numpy.
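
The three operations can be sketched on a grayscale image stored as a nested list of 8-bit values. Plain lists are used here only to keep the example dependency-free; the repository lists numpy. Clipping negative differences to 0 and filling vacated pixels with 0 are assumptions of this sketch.

```python
def subtract(image_a, image_b):
    """Pixel-wise a - b, clipped at 0 so results stay valid 8-bit values."""
    return [[max(a - b, 0) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]


def negate(image, max_value=255):
    """Point operation: replace each pixel p with max_value - p."""
    return [[max_value - p for p in row] for row in image]


def translate(image, dx, dy, fill=0):
    """Shift the image dx pixels right and dy pixels down; gaps get fill."""
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_x, new_y = x + dx, y + dy
            if 0 <= new_x < width and 0 <= new_y < height:
                out[new_y][new_x] = image[y][x]
    return out


img = [[10, 20], [30, 40]]  # tiny made-up image
print(negate(img))
print(translate(img, dx=1, dy=0))
```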

Co-occurrence Matrix, Correlation Coefficient, Downsampling, and Upsampling

Description: I wrote Python scripts for the co-occurrence matrix, correlation coefficient, downsampling, and upsampling algorithms, then applied them to the given image file.
Work type: Individual course work.
Programming language and tools: Python 3, Jupyter Notebook.
Frames and libraries: numpy, math.
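
A minimal sketch of three of these algorithms on a nested-list image follows. It assumes a gray-level co-occurrence matrix for one pixel offset, downsampling by keeping every `factor`-th pixel, and nearest-neighbor upsampling; the repository's scripts may differ in all three choices.

```python
def cooccurrence_matrix(image, levels, dx=1, dy=0):
    """Count how often gray level i has gray level j at offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                matrix[image[y][x]][image[ny][nx]] += 1
    return matrix


def downsample(image, factor=2):
    """Keep every factor-th pixel along both axes."""
    return [row[::factor] for row in image[::factor]]


def upsample(image, factor=2):
    """Nearest-neighbor upsampling: repeat each pixel factor times per axis."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


img = [[0, 0, 1], [1, 2, 2], [0, 1, 2]]  # made-up image with 3 gray levels
print(cooccurrence_matrix(img, levels=3))
print(downsample(img), upsample([[1, 2]]))
```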

Histogram Processing and Spatial Filtering Algorithms

Description: I wrote Python scripts for histogram processing, Gaussian filtering, unsharp masking, and highpass filtering, then applied them to the given image file.
Work type: Individual course work.
Programming language and tools: Python 3, Jupyter Notebook.
Frames and libraries: numpy, math, subtract.
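
As one example of histogram processing, the sketch below performs histogram equalization, assuming that is among the operations covered (the description does not say which ones are). Each gray level is mapped through the cumulative histogram scaled to the available range. The function name and test image are illustrative.

```python
def equalize(image, levels=256):
    """Histogram equalization via the scaled cumulative histogram."""
    histogram = [0] * levels
    for row in image:
        for pixel in row:
            histogram[pixel] += 1
    total = sum(histogram)
    mapping, running = [], 0
    for count in histogram:
        running += count  # cumulative count of pixels at or below this level
        mapping.append(round((levels - 1) * running / total))
    return [[mapping[pixel] for pixel in row] for row in image]


# Tiny made-up image with 4 gray levels.
print(equalize([[0, 1], [1, 3]], levels=4))
```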

Morphological Algorithm, Edge Detection, and Boundary Extraction

Description: I wrote Python scripts for dilation, erosion, boundary extraction, and edge detection using gradient operators, then applied them to the given image file.
Work type: Individual course work.
Programming language and tools: Python 3, Jupyter Notebook.
Frames and libraries: numpy, math.
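
Dilation, erosion, and boundary extraction on a binary image can be sketched as follows. The 3x3 square structuring element and the treatment of out-of-image pixels as 0 are assumptions of this example, not details taken from the repository. The boundary is the image minus its erosion.

```python
OFFSETS = [(j, i) for j in (-1, 0, 1) for i in (-1, 0, 1)]  # 3x3 square


def _neighbors(image, y, x):
    """Yield the 3x3 neighborhood of (y, x); pixels outside the image are 0."""
    height, width = len(image), len(image[0])
    for j, i in OFFSETS:
        inside = 0 <= y + j < height and 0 <= x + i < width
        yield image[y + j][x + i] if inside else 0


def erode(image):
    """A pixel stays 1 only if its whole 3x3 neighborhood is 1."""
    return [[int(all(_neighbors(image, y, x))) for x in range(len(image[0]))]
            for y in range(len(image))]


def dilate(image):
    """A pixel becomes 1 if any pixel in its 3x3 neighborhood is 1."""
    return [[int(any(_neighbors(image, y, x))) for x in range(len(image[0]))]
            for y in range(len(image))]


def boundary(image):
    """Boundary extraction: the image minus its erosion."""
    eroded = erode(image)
    return [[p - e for p, e in zip(row, eroded_row)]
            for row, eroded_row in zip(image, eroded)]


square = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
for row in boundary(square):
    print(row)
```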
