gmurage/README.md

Hi there, I'm Gladys Murage 👋

PROFESSIONAL SUMMARY

AI-focused Data Scientist and current PhD Candidate (GPA: 3.89, expected 2027) with 4+ years of experience transforming complex datasets into strategic initiatives that boost business and clinical performance. Experienced in healthcare, sales analytics, crime prediction, forecasting, customer segmentation, and education, with projects that include developing predictive models with over 96% accuracy and boosting student engagement by 70% through data-driven curriculum enhancements. Committed to leveraging AI, advanced analytics, and machine learning techniques to support innovation and drive insightful solutions in dynamic settings.

About Me

🎓 PhD Candidate in Data Science at National University (Graduation expected in 2027)

Data Scientist at Bizmpya.com with nearly 4 years of hands-on experience driving data-driven insights and strategies.

Data Science Intern at Huntershightech.com (6 months) specializing in predictive modeling and data analytics.

💡 Passionate About:

  • Harnessing AI and machine learning to solve complex, real-world problems.

  • Applying advanced predictive analytics to transform industries like Healthcare and Biopharmaceuticals with cutting-edge ML techniques.

📊 Specializations:

  • Predictive Analytics in healthcare, business, real estate, sales, crime prediction, education, and census data: Utilizing machine learning to extract insights and forecast trends.

  • Inventory Forecasting: Expertise in using time series models (ETS, ARIMA, SARIMA) to optimize store inventory management.

  • Classification & Regression: Proficient in ensemble methods (XGBoost, Random Forest, Bagging), decision trees, LDA, logistic regression, and KNN.

  • Linear Regression: Applying regression techniques to real-world scenarios, such as predicting housing prices, college GPA, and used vehicle prices.

  • Dimensionality Reduction: Extensive experience with PCA to reduce data complexity while retaining key information.

  • Regularized Regression: Skilled in using Elastic Net, Lasso, and Ridge regression for model optimization.

  • Advanced Predictive Methods: Strong expertise in SVM, Naive Bayes, and Polynomial Regression for classification and regression tasks.

  • Clustering & Customer Segmentation: Proficient in K-means and Hierarchical Clustering for market segmentation and targeted strategies.

🔧 Technical Proficiencies:

  • Programming: Python, R, MySQL.

  • Data Science Libraries: Pandas, NumPy, Scikit-learn, TensorFlow, Keras, and more.

  • Advanced ML Techniques: Hyperparameter tuning, cross-validation, and feature engineering to maximize model performance.
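
As a rough illustration of the hyperparameter tuning and cross-validation mentioned above, the sketch below runs a small grid search with 5-fold cross-validation. The synthetic dataset, the Random Forest model, and the parameter grid are all assumptions chosen for brevity, not settings taken from any project listed here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hypothetical grid: each combination is scored with 5-fold cross-validation.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)

print("Best parameters:", grid.best_params_)
print("Mean cross-validated accuracy:", round(grid.best_score_, 3))
```

In practice the grid and the scoring metric would be chosen to suit the problem, for example recall for a clinical screening task.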

📈 Business Acumen:

  • Proven expertise in Sales, Marketing, and Entrepreneurship, combining data insights with strategic decision-making.
  • Excellent ability to convey complex scientific knowledge to stakeholders in an easy-to-understand way, using data visualizations to show and tell.

Technical Skills

  • Languages & Tools:
    • Programming: Python (Pandas, NumPy, Scikit-learn, Seaborn, Matplotlib), R, MySQL
    • Analysis & Visualization: Jupyter, Google Colab, Tableau
    • Machine Learning: Supervised/Unsupervised Learning, Neural Networks (PyTorch, TensorFlow), Ensemble Methods (Random Forest, XGBoost), SVM
    • Advanced Modeling: Polynomial Regression, Spline Regression, ETS, ARIMA/SARIMA
    • Platforms: KNIME, GitHub, Cloud (Azure, AWS, GCP)
    • CRM: Veeva, Salesforce.com

Machine Learning & Data Science

Framingham Heart Study Analysis

  • Identifies key risk factors for mortality using machine learning models

  • Implements Logistic Regression, Random Forest, and XGBoost

  • Provides insights to improve preventive healthcare strategies

Time-Series Store Sales Forecasting

  • Predicts future store sales using time-series models

  • Applies ETS and ARIMA forecasting techniques

  • Helps businesses optimize inventory and sales decisions
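
For readers unfamiliar with ETS, the sketch below shows its simplest form, simple exponential smoothing, which produces a flat one-step-ahead forecast. It is a minimal illustration only: the function name, the smoothing weight `alpha`, and the sales figures are made up and are not taken from the project.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast from simple exponential smoothing."""
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    for observation in series[1:]:
        # Blend the newest observation with the previous smoothed level.
        level = alpha * observation + (1 - alpha) * level
    return level


# Made-up weekly sales figures, for illustration only.
weekly_sales = [120.0, 132.0, 128.0, 141.0, 150.0]
forecast = simple_exponential_smoothing(weekly_sales, alpha=0.5)
print("Next-period forecast:", forecast)
```

A larger `alpha` makes the forecast react faster to recent sales. Trend and seasonal components, as in fuller ETS or SARIMA models, are not covered by this sketch.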

Unsupervised Machine Learning for Wine Data

  • Clusters wine samples based on characteristics using unsupervised learning

  • Uses PCA to reduce dimensionality and K-means for clustering

  • Supports sommelier decision-making and product differentiation
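
The PCA-then-K-means approach described above can be sketched as follows. This assumes scikit-learn's bundled wine dataset, two principal components, and three clusters. The project's actual data source and settings may differ.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# scikit-learn's bundled wine data is used as a stand-in (assumption).
X = load_wine().data

# Standardise features so PCA is not dominated by large-scale variables.
X_scaled = StandardScaler().fit_transform(X)

# Project onto two principal components (assumed count, for illustration).
pca = PCA(n_components=2, random_state=0)
X_2d = pca.fit_transform(X_scaled)

# Cluster the reduced data; three clusters is an assumed choice.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_2d)

print("Variance explained by 2 components:", pca.explained_variance_ratio_.sum())
print("Cluster sizes:", [int((labels == k).sum()) for k in range(3)])
```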

Obesity Risk Prediction Models

  • Classifies obesity risk factors using various machine learning models

  • Implements XGBoost, Random Forest, Decision Tree, Elastic Net, Bagging, LDA, SVM, Naive Bayes, and Multinomial Logistic Regression

  • Can assist healthcare professionals in early intervention strategies

Abalone Age Prediction Models

  • Predicts the number of rings in abalones (age indicator)

  • Applies Generalized Additive Models (GAM), Cubic Spline Regression, Principal Component Regression (PCR), and Elastic Net, and compares Random Forest against XGBoost

  • Enhances seafood industry insights for sustainable harvesting

Exploratory & Analytical Work

Horsepower vs. MPG Prediction

  • Analyzes vehicle fuel efficiency trends using non-linear regression

  • Uses Polynomial Regression, Cubic Spline, and GAM models

  • Helps inform automotive engineering decisions for improved fuel economy
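
A minimal sketch of the polynomial regression idea, assuming a quadratic relationship between horsepower and MPG. The data points are synthetic, generated from a made-up curve, and are not the project's dataset or results.

```python
import numpy as np

# Synthetic points from a made-up quadratic curve (illustration only).
horsepower = np.array([50.0, 75.0, 100.0, 150.0, 200.0])
mpg = 60.0 - 0.4 * horsepower + 0.001 * horsepower**2

# Fit a degree-2 polynomial; coefficients come back highest power first.
coefficients = np.polyfit(horsepower, mpg, deg=2)
predict_mpg = np.poly1d(coefficients)

# Estimate fuel efficiency for a hypothetical 120-horsepower vehicle.
estimate = predict_mpg(120.0)
print("Fitted coefficients:", coefficients)
print("Estimated MPG at 120 hp:", estimate)
```

With real, noisy data the polynomial degree would normally be chosen by cross-validation, and spline or GAM fits compared against it.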

Exploratory Data Analysis (EDA)

  • Conducts detailed statistical analysis for data insights

  • Utilizes descriptive statistics and visualization techniques

  • Supports data-driven decision-making across industries

Database & Business Intelligence Project

  • Develops and analyzes a structured database using MySQL Workbench

  • Focuses on database management and optimization

  • Provides business intelligence insights

Portfolio & GitHub Contributions

Gladys Murage Data Science Portfolio

  • A curated collection showcasing various data science projects

  • Organizes work for professional presentation

GitHub Repo Management & Contributions

  • Demonstrates proficiency in version control and project organization

  • Includes cloned repositories and hands-on GitHub exploration

Connect with Me

Thanks for stopping by my profile. Let's build innovative solutions together!

Pinned

  1. Multinomial-Logistic-Regression Public

    A multinomial logistic regression classification model, fit to Kaggle.com data, for multi-class prediction of obesity risk.

    Jupyter Notebook 1

  2. Support-Vector-Machine-SVM- Public

    A Support Vector Machine (SVM) classification model, fit to Kaggle.com data, for multi-class prediction of obesity risk.

    Jupyter Notebook 1

  3. Decision-Tree- Public

    A decision tree model that classifies obesity using the Multi-Class Obesity Classification data provided by Kaggle.com.

    Jupyter Notebook 1

  4. Random-Forest Public

    A Random Forest model that classifies obesity using the Multi-Class Obesity Classification data provided by Kaggle.com.

    Jupyter Notebook 1

  5. XGBOOST Public

    An Extreme Gradient Boosting (XGBoost) model that classifies obesity using the Multi-Class Obesity Classification data provided by Kaggle.com.

    Jupyter Notebook 1