Ade-s_Portfolio

- Missing Values Evaluation & Imputation.

Missing data is a common issue in data engineering; datasets often arrive with nulls.

Challenge: The goal of this task is to fill the nulls in the length and reading columns of the given geospatial drillhole data.
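A minimal sketch of one way to approach this, assuming the data loads into a pandas DataFrame; the length and reading columns come from the task, while the hole_id grouping column and file name are hypothetical:

```python
import pandas as pd

df = pd.read_csv("drillholes.csv")  # hypothetical file name

# Interpolate within each drillhole so values from one hole never leak
# into another; limit_direction="both" also fills leading/trailing nulls.
for col in ["length", "reading"]:
    df[col] = df.groupby("hole_id")[col].transform(  # hypothetical key column
        lambda s: s.interpolate(limit_direction="both")
    )
    df[col] = df[col].fillna(df[col].median())  # fallback for all-null holes

print(df[["length", "reading"]].isna().sum())
```

Evaluating the fill, for example by masking known values and comparing the interpolated estimates against them, is the other half of the task.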

- Web Scraping.

Challenge: Scrape the SEC for insider trading information.

SEC Form 4 is used by officers, directors, and other corporate “insiders” to notify the U.S. Securities and Exchange Commission (SEC) of their personal transactions in their company's securities. For this exercise, please write a small scraper to extract information from Form 4 filings on the SEC site.
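A minimal sketch of a starting point, not a full scraper: it pulls EDGAR's Atom feed of recent Form 4 filings for one company. The CIK shown (Apple) is only an example, and the SEC asks callers to send a User-Agent identifying themselves:

```python
import requests
import xml.etree.ElementTree as ET

# Atom feed of recent Form 4 filings for one company (example CIK: Apple).
URL = ("https://www.sec.gov/cgi-bin/browse-edgar"
       "?action=getcompany&CIK=0000320193&type=4&output=atom")
HEADERS = {"User-Agent": "your-name your-email@example.com"}  # placeholder contact

resp = requests.get(URL, headers=HEADERS, timeout=30)
resp.raise_for_status()

ns = {"atom": "http://www.w3.org/2005/Atom"}
root = ET.fromstring(resp.content)
for entry in root.findall("atom:entry", ns):
    title = entry.findtext("atom:title", namespaces=ns)
    link = entry.find("atom:link", ns).get("href")
    print(title, link)
```

From each filing's index page one would then fetch the Form 4 XML document and parse out the insider's name, role, and transaction details.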

- Web Scraping.

What is the theme of this webpage?

Background: We would like to track word counts to analyse the main focus of the following webpage (including its main page and sub-pages): https://www.verisk.com/.

Challenge: Write a simple web scraper to achieve the above task. Include the details in your Python notebook, and explain how you validated your results and findings.
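One possible starting point, sketched under assumptions (the 20-page cap is an arbitrary choice to keep the crawl small): breadth-first crawl of same-domain links, tallying word frequencies; after removing stopwords, the most common words suggest the site's theme:

```python
import re
from collections import Counter
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START = "https://www.verisk.com/"
seen, queue, counts = set(), [START], Counter()

while queue and len(seen) < 20:              # small page cap for the sketch
    url = queue.pop(0)
    if url in seen:
        continue
    seen.add(url)
    try:
        html = requests.get(url, timeout=15).text
    except requests.RequestException:
        continue
    soup = BeautifulSoup(html, "html.parser")
    counts.update(re.findall(r"[a-z]+", soup.get_text().lower()))
    for a in soup.find_all("a", href=True):
        link = urljoin(url, a["href"]).split("#")[0]
        if urlparse(link).netloc == urlparse(START).netloc:
            queue.append(link)

print(counts.most_common(20))                # candidate theme words
```

Validation could compare the top words against the site's own navigation headings and sitemap.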

- Dimensionality Reduction - PCA and t-SNE

Air pollution is all around us: indoors, outdoors, in cities, and in the countryside. The rise in permissible concentrations of various pollutants observed over the last few decades contributes to rising pollution levels, which in turn affect weather conditions unfavourably and drive the formation of smog. Degraded air quality has a negative impact on individual health and may cause economic imbalance; diseases caused by rising pollution levels are among the major issues confronting urban settlements.

Objective:

- Explore and visualize the Air Pollution dataset, which contains information about air molecules and pollutants found in the air.

- Reduce the number of features by using dimensionality reduction techniques such as PCA and t-SNE, and extract insights about the data.
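Both projections are short with scikit-learn. A minimal sketch, assuming the dataset is a numeric CSV (the file name is hypothetical):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = pd.read_csv("air_pollution.csv").select_dtypes("number").dropna()  # hypothetical file
X_std = StandardScaler().fit_transform(X)   # PCA and t-SNE are scale-sensitive

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)
print("explained variance:", pca.explained_variance_ratio_)

# t-SNE for a nonlinear 2-D embedding; perplexity is a tunable choice.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_std)
```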

- K-Means, K-Medoids, Gaussian Mixture Model, Hierarchical Clustering, DBSCAN

The study of socio-economic factors is foundational to understanding and shaping the future of societies, and hence of great interest to various government and non-government institutions. While GDP is one of the most commonly cited measures in popular economic vernacular, it is not the only measure of the growth and state of an economy. This project aims to dive deep into one such dataset, which contains various socio-economic attributes for countries around the world.

Objective: Identify whether there exist clusters of countries that are more similar to each other than to the rest, in terms of certain socio-economic factors.
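A minimal sketch of the comparison on standardized features (the file name and cluster counts are assumptions; K-Medoids needs the separate scikit-learn-extra package and is omitted here):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.mixture import GaussianMixture

df = pd.read_csv("country_data.csv")                   # hypothetical file
X = StandardScaler().fit_transform(df.select_dtypes("number"))

labels = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
    "gmm": GaussianMixture(n_components=3, random_state=0).fit_predict(X),
    "hierarchical": AgglomerativeClustering(n_clusters=3).fit_predict(X),
    "dbscan": DBSCAN(eps=1.5, min_samples=5).fit_predict(X),  # eps needs tuning
}
for name, lab in labels.items():
    print(name, pd.Series(lab).value_counts().to_dict())
```

Silhouette scores and elbow plots would then guide the choice of cluster count.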

- Eigenvector, Betweenness, and Closeness Centrality measures

This project is based on information from the CAVIAR project and the role of certain individuals arrested following the investigation. The investigation lasted two years, running from 1994 to 1996, and was carried out jointly by investigation units of the Montreal police and the Royal Canadian Mounted Police. During these two years, 11 wiretap warrants, each valid for about two months, were obtained (11 matrices match these phases).

Objective - We will analyze and understand a time-varying criminal network that was repeatedly disrupted by police forces, and how it reoriented in response to police seizures of product.
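The three centrality measures are one-liners in networkx. A minimal sketch per wiretap phase, using a built-in graph as a stand-in because loading the eleven CAVIAR matrices is dataset-specific:

```python
import networkx as nx

# Stand-in graph; in the project each of the 11 phase matrices would be
# loaded with nx.from_numpy_array() instead.
G = nx.les_miserables_graph()

eig = nx.eigenvector_centrality(G, max_iter=1000)
btw = nx.betweenness_centrality(G)
clo = nx.closeness_centrality(G)

# Top actors by eigenvector centrality, with the other two measures alongside.
for node in sorted(eig, key=eig.get, reverse=True)[:5]:
    print(node, round(eig[node], 3), round(btw[node], 3), round(clo[node], 3))
```

Tracking how these rankings shift across the eleven phases is what reveals the network's reorientation.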

- Linear Regression, Ridge Regression, Decision Tree Regressor, Random Forest Regressor, Boosting Regressor Models, Hyperparameter Tuning

In today’s modern world, huge shopping centers such as big malls and marts record data related to the sales of items or products as an important step in predicting sales and anticipating future demand, which helps with inventory management. Understanding what role certain properties of an item or store play, and how they affect sales, is imperative to any retail business.

The data scientists at Superkart have collected data for each product across 4 different types of stores, along with certain attributes of each product and store. Using this data, Superkart is trying to understand the properties of products and stores that play a key role in increasing sales.

Objective - Build a predictive model that can estimate the sales of each product at a particular store and the total sales of a particular store, then provide actionable recommendations to the Superkart sales team on the properties of products and stores that play a key role in increasing sales.
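A minimal sketch of the tuning step with a Random Forest (the file name, feature handling, and target column name are hypothetical):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

df = pd.read_csv("superkart.csv")                     # hypothetical file
X = pd.get_dummies(df.drop(columns="Sales"))          # hypothetical target name
y = df["Sales"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    {"n_estimators": [200, 400], "max_depth": [6, 10, None]},  # small demo grid
    scoring="neg_root_mean_squared_error",
    cv=5,
)
grid.fit(X_tr, y_tr)
print(grid.best_params_, grid.score(X_te, y_te))
```

The same pattern applies to the linear, tree, and boosting regressors; the fitted model's feature importances feed the recommendations.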

- Linear Regression, Ridge Regression, Decision Tree Regressor, Random Forest Regressor, Boosting Regressor Models, Hyperparameter Tuning

The problem at hand is to predict the housing prices of a town or suburb based on the features of the locality provided to us. In the process, we need to identify the most important features affecting the price of a house, employ data preprocessing techniques, and build a linear regression model that predicts prices for unseen data. (Kaggle Knowledge Hackathons)

Goal - Predict the sale price of each house: for each Id in the test set, predict the value of the SalePrice variable.
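A minimal baseline sketch, assuming the standard Kaggle train.csv/test.csv layout and using numeric features only:

```python
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Numeric predictors only for this baseline; Id and the target are excluded.
num_cols = train.select_dtypes("number").columns.drop(["Id", "SalePrice"])
model = make_pipeline(SimpleImputer(strategy="median"),
                      StandardScaler(),
                      Ridge(alpha=10.0))               # alpha would be tuned
model.fit(train[num_cols], train["SalePrice"])

pd.DataFrame({"Id": test["Id"], "SalePrice": model.predict(test[num_cols])}) \
    .to_csv("submission.csv", index=False)
```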

- Logistic Regression, Decision Trees, Random Forest, Boosting Models, Voting Classifier, Hyperparameter Tuning

Travel times of the trains, along with passenger information, are published in a file named ‘Traveldata_train.csv’. These passengers were later asked to provide feedback on various parameters related to the travel, along with their overall experience. The collected details are made available in the survey report labelled ‘Surveydata_train.csv’.

In the survey, each passenger was explicitly asked whether they were satisfied with their overall travel experience or not, and that is captured in the data of the survey report under the variable labeled ‘Overall_Experience’.

- The goal is to predict whether a passenger was satisfied or not considering his/her overall experience of traveling on the Shinkansen Bullet Train.
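A minimal sketch of the setup, assuming the two files share an 'ID' key column:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

travel = pd.read_csv("Traveldata_train.csv")
survey = pd.read_csv("Surveydata_train.csv")
df = travel.merge(survey, on="ID")                    # assumed key column

y = df["Overall_Experience"]
X = pd.get_dummies(df.drop(columns=["ID", "Overall_Experience"]))
X = X.fillna(X.median())                              # crude null handling

clf = GradientBoostingClassifier(random_state=0)      # one of the listed models
print(cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean())
```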

- LDA, QDA, Logistic Regression, Decision Trees, Random Forest, Boosting Models, Hyperparameter Tuning

One of the leading financial institutions in India wants to leverage machine learning techniques to determine clients’ loan repayment abilities and take proactive steps to reduce its exposure to default.

We implement a number of supervised learning classification techniques, such as LDA, QDA, and Logistic Regression, along with boosting classification techniques, to build a predictive model.

Goal: The goal of the problem is to predict whether a client will default on the loan payment or not, given recent data on all the loan transactions. This can help the institution distinguish future applicants who might default. For each ID in the test dataset, you must predict the “Default” label.
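A minimal sketch comparing the discriminant models with logistic regression by cross-validated AUC ('ID' and 'Default' come from the task; the file name and numeric-only feature handling are assumptions):

```python
import pandas as pd
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("loans_train.csv")                   # hypothetical file
y = df["Default"]
X = df.drop(columns=["ID", "Default"]).select_dtypes("number").fillna(0)

for name, model in [("LDA", LinearDiscriminantAnalysis()),
                    ("QDA", QuadraticDiscriminantAnalysis()),
                    ("LogReg", LogisticRegression(max_iter=1000))]:
    score = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: {score:.3f}")
```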

- Logistic Regression, Decision Trees, Random Forest, Boosting Models, Hyperparameter Tuning

McCurr Health Consultancy is an MNC that has thousands of employees spread across the globe. The company believes in hiring the best talent available and retaining them for as long as possible. A huge amount of resources is spent on retaining existing employees through various initiatives. The Head of People Operations wants to bring down the cost of retaining employees. For this, he proposes limiting the incentives to only those employees who are at risk of attrition.

Objective:

- To identify the different factors that drive attrition at McCurr Health Consultancy
- To build a model to predict whether an employee will attrite or not
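A minimal sketch covering both points at once: fit a Random Forest and rank the attrition drivers by impurity-based feature importance (the file and column names are assumptions):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("hr_attrition.csv")                  # hypothetical file
y = df["Attrition"]                                   # assumed target column
X = pd.get_dummies(df.drop(columns="Attrition"))

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
importances = pd.Series(rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))  # top attrition drivers
```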

- Decision Trees, Random Forest, Boosting Models, Hyperparameter Tuning

The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, RMS Titanic, widely considered “unsinkable”, sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 of the 2224 passengers and crew. While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others. (Kaggle Knowledge Hackathons)

Goal - The objective of this problem is to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using the passenger data (i.e., name, age, gender, socio-economic class, etc.).
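A minimal sketch using the standard Kaggle Titanic column names, with imputation and encoding handled inside a pipeline:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

train = pd.read_csv("train.csv")
num, cat = ["Age", "Fare", "SibSp", "Parch"], ["Pclass", "Sex", "Embarked"]

pre = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), num),
    ("cat", Pipeline([("imp", SimpleImputer(strategy="most_frequent")),
                      ("ohe", OneHotEncoder(handle_unknown="ignore"))]), cat),
])
model = Pipeline([("pre", pre),
                  ("rf", RandomForestClassifier(n_estimators=300, random_state=0))])
print(cross_val_score(model, train[num + cat], train["Survived"], cv=5).mean())
```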

- Time Series - AR, MA, ARMA and ARIMA

Crude oil production is considered one of the most important indicators of the global economy. Crude oil production forecasting is an important input into the decision-making process and investment scenario evaluation, which are crucial for oil-producing countries. Governments and businesses spend a lot of time and resources figuring out the production forecast that can help to identify opportunities and decide on the best way forward.

Objective - In this case study, we will analyze and use historical oil production data, from 1992 to 2018, for a country to forecast its future production. We need to build a time series forecasting model using the AR, MA, ARMA, and ARIMA models in order to forecast oil production.

Goal - The objective of this problem is to build a time series model that can forecast the consumer price index (CPI) for the next 5 years from the dataset provided.
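A minimal sketch of the ARIMA step with statsmodels (the (1, 1, 1) order and file name are illustrative, not tuned; in practice ACF/PACF plots and information criteria guide the order):

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical file: one date index column plus one value column.
series = pd.read_csv("oil_production.csv",
                     index_col=0, parse_dates=True).squeeze()

model = ARIMA(series, order=(1, 1, 1)).fit()          # illustrative order
print(model.summary())
print(model.forecast(steps=5))                        # next five periods
```

Dropping the AR or MA term from the order gives the pure MA or AR baselines for comparison.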

- Deep Learning - Artificial (Feed-Forward) Neural Networks

One of the most interesting tasks in deep learning is to recognize objects in natural scenes. The ability to process visual information using machine learning algorithms can be very useful as demonstrated in various applications.

The SVHN dataset contains over 600,000 labeled digits cropped from street-level photos. It has been used in neural networks created by Google to improve the map quality by automatically transcribing the address numbers from a patch of pixels. The transcribed number with a known street address helps pinpoint the location of the building it represents.

Objective: To build a feed-forward neural network model that can recognize the digits in the images.
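A minimal Keras sketch of such a network, assuming the SVHN digits have been preprocessed into 32x32 greyscale arrays x_train/y_train:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(32, 32)),            # assumes 32x32 greyscale inputs
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),  # ten digit classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=10, validation_split=0.1)
```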

- Deep Learning - Convolutional Neural Networks

One of the most interesting tasks in deep learning is to recognize objects in natural scenes. The ability to process visual information using machine learning algorithms can be very useful as demonstrated in various applications.

The SVHN dataset contains over 600,000 labeled digits cropped from street-level photos. It has been used in neural networks created by Google to improve the map quality by automatically transcribing the address numbers from a patch of pixels. The transcribed number with a known street address helps pinpoint the location of the building it represents.

Objective: To build a CNN model that can recognize the digits in the images.
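A minimal CNN sketch for the same setup; the filter counts and depth are illustrative choices:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(32, 32, 1)),          # greyscale SVHN crops
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Convolutions share weights across the image, so the CNN typically beats the dense network here with far fewer parameters.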

- Deep Learning - Convolutional Neural Networks

The facial expression recognition system built in this project is a technology capable of recognising the expression on a human face from a digital image or a database of faces, and this can form the basis on which other actions can be performed.

The aim of this project is to use Deep Learning and Artificial Intelligence techniques to create a computer vision model that can accurately detect facial emotions. The model should be able to perform multi-class classification on images of facial expressions, to classify the expressions according to the associated emotion.

Objective: Detect facial expressions and classify them as ‘happy’, ‘sad’, ‘surprise’, or ‘neutral’.
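A minimal sketch of the four-class setup, assuming the images sit on disk in one folder per emotion (the directory path and 48x48 greyscale size are assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers

train_ds = keras.utils.image_dataset_from_directory(
    "faces/train", image_size=(48, 48), color_mode="grayscale")  # hypothetical path

model = keras.Sequential([
    layers.Input(shape=(48, 48, 1)),
    layers.Rescaling(1.0 / 255),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(4, activation="softmax"),   # happy, sad, surprise, neutral
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, epochs=10)
```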

- Transfer Learning - VGG16, ResNetV2, and EfficientNet.

The facial expression recognition system built in this project is a technology capable of recognising the expression on a human face from a digital image or a database of faces, and this can form the basis on which other actions can be performed.

The model should be able to perform multi-class classification on images of facial expressions, to classify the expressions according to the associated emotion 'happy', 'sad', 'neutral', or 'surprise'.

Objective: Use transfer learning to find the most efficient deep learning architecture that can accurately classify images of facial expressions into 4 classes, namely ‘happy’, ‘sad’, ‘neutral’, and ‘surprise’.
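A minimal sketch of the VGG16 variant: freeze the pretrained convolutional base and train a small head for the four classes; ResNetV2 or EfficientNet drop in through the same pattern (the input size and head layers are illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(48, 48, 3))
base.trainable = False                      # freeze the pretrained weights

model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(4, activation="softmax"),  # happy, sad, neutral, surprise
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Comparing validation accuracy across the three frozen bases is what identifies the most efficient architecture.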
