Clustering is the task of dividing a population or set of data points into groups such that points in the same group are more similar to each other than to points in other groups. In simple words, the aim is to segregate data points with similar traits and assign them to clusters.
Hierarchical clustering involves creating clusters that have a predetermined ordering from top to bottom. For example, all files and folders on a hard disk are organized in a hierarchy. There are two types of hierarchical clustering: divisive (top-down) and agglomerative (bottom-up).
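To make the bottom-up (agglomerative) idea concrete, here is a minimal sketch in plain Python. It assumes single linkage (the distance between two clusters is the distance between their closest members) and Euclidean distance; neither choice comes from the text above, and the function name `single_linkage` and the sample points are only illustrative.

```python
import math

def single_linkage(points):
    """Agglomerative clustering sketch: start with one cluster per point and
    repeatedly merge the two closest clusters until only one remains.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where clusters are lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single linkage (an assumption): closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((sorted(clusters[x]), sorted(clusters[y]), d))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return merges

# Hypothetical 1-D data: two tight pairs and one distant point.
sample = [(0,), (1,), (5,), (7,), (20,)]
history = single_linkage(sample)
for a, b, d in history:
    print(a, "+", b, "at distance", d)
```

Reading the merge history from first to last gives the hierarchy from the bottom up; a divisive method would build the same kind of tree in the opposite direction, starting from one cluster that contains everything.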
K-means is an iterative clustering algorithm that moves closer to a local optimum in each iteration by reducing the distance between data points and their cluster centroids. The algorithm works in these 5 steps:
Step 1: Specify the desired number of clusters K.
Step 2: Randomly assign each data point to a cluster.
Step 3: Compute cluster centroids.
Step 4: Re-assign each point to the closest cluster centroid.
Step 5: Re-compute cluster centroids.
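Repeat steps 4 and 5 until no improvements are possible: we repeat the 4th and 5th steps until the algorithm converges, that is, until no data points switch clusters between two successive iterations. Unless a different stopping rule (such as a maximum number of iterations) is specified explicitly, this marks the termination of the algorithm. Note that the result is a local optimum that depends on the initial random assignment; it is not guaranteed to be the global optimum.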
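The five steps can be sketched in plain Python as follows. This is a minimal illustration rather than a production implementation: the names `k_means` and `compute_centroids`, the squared Euclidean distance, the `max_iter` safety cap, and the rule of re-seeding an empty cluster with a random data point are all assumptions made for the example.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def compute_centroids(points, labels, k, rng):
    """Mean of the points assigned to each cluster (steps 3 and 5)."""
    centroids = []
    for cluster in range(k):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        if members:
            dims = len(members[0])
            centroids.append(tuple(sum(m[d] for m in members) / len(members)
                                   for d in range(dims)))
        else:
            # Assumption: an empty cluster is re-seeded with a random point.
            centroids.append(tuple(rng.choice(points)))
    return centroids

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]             # Step 2
    centroids = compute_centroids(points, labels, k, rng)   # Step 3
    for _ in range(max_iter):
        new_labels = [min(range(k),
                          key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]                      # Step 4
        if new_labels == labels:
            break  # no point switched clusters, so the algorithm has converged
        labels = new_labels
        centroids = compute_centroids(points, labels, k, rng)  # Step 5
    return labels, centroids

# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(points, k=2)  # Step 1: K = 2
print(labels, centroids)
```

Because the starting assignment is random, different seeds can give different cluster numbering and, on less well-separated data, different final clusters. That is why K-means is often run several times and the best result is kept.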
In density-based clustering such as DBSCAN, a cluster includes core points that are neighbors (i.e. reachable from one another) and all the border points of these core points. The required condition to form a cluster is to have at least one core point. Although it is very unlikely, a cluster may consist of only one core point and its border points.
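The sketch below illustrates core and border points with a small DBSCAN-style routine in plain Python. The parameter names `eps` (neighborhood radius) and `min_pts` (minimum number of neighbors, counting the point itself, for a point to be a core point) follow common usage but are assumptions here, as are the function names and the sample data. The data is chosen to show the rare case described above: a cluster with exactly one core point plus its border points.

```python
def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

def dbscan(points, eps, min_pts):
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core point; may become a border point later
            continue
        labels[i] = cluster    # i is a core point, so it starts a new cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # previously noise, now a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster += 1
    return labels

# One core point at the origin, four border points around it, one outlier.
star = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1), (5, 5)]
star_labels = dbscan(star, eps=1.0, min_pts=5)
print(star_labels)  # → [0, 0, 0, 0, 0, -1]
```

Only the origin has five points within distance 1 of it, so it is the single core point. The four surrounding points each have two neighbors, which makes them border points, and the outlier is labelled as noise (-1).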
Customer Personality Analysis is a detailed analysis of a company’s ideal customers. It helps a business better understand its customers and makes it easier to modify products according to the specific needs, behaviors, and concerns of different customer segments. For example, instead of spending money to market a new product to every customer in the company’s database, a company can analyze which customer segment is most likely to buy the product and then market the product only to that particular segment.
- Check for unwanted columns, null values (and replace them), duplicates, etc.
- Perform label encoding
- Uni-variate analysis, without considering relationships with other variables
- Bi-variate analysis, considering relationships with other variables
- Scaling and Normalization
- Feature Engineering
- Basic Transformations
- Data Pre-Processing
- Model Building
- Creating a Python script
- Create front-end: Python
- Model Deploy
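A few of the early checklist items (null and duplicate handling, label encoding, scaling) can be sketched with the standard library alone. This is only an illustration under stated assumptions: the column names `education` and `income` and all values are hypothetical, rows with nulls are dropped rather than having values replaced to keep the example short, and scaling is done as standardization to zero mean and unit variance. In practice a data-frame library would usually handle these steps.

```python
from statistics import mean, pstdev

def drop_nulls_and_duplicates(rows):
    """Keep the first copy of each row and skip rows containing None."""
    seen, cleaned = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if any(v is None for v in row.values()) or key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned

def label_encode(values):
    """Map each distinct category to an integer (alphabetical order)."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def standardize(values):
    """Scale values to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

# Hypothetical customer rows: one duplicate and one row with a null value.
rows = [
    {"education": "Graduation", "income": 100.0},
    {"education": "PhD", "income": 300.0},
    {"education": "PhD", "income": 300.0},
    {"education": "Master", "income": None},
    {"education": "Master", "income": 200.0},
]
cleaned = drop_nulls_and_duplicates(rows)
codes, mapping = label_encode([r["education"] for r in cleaned])
scaled = standardize([r["income"] for r in cleaned])
print(cleaned, codes, mapping, scaled)
```

Scaling matters for the clustering step because distance-based methods such as K-means would otherwise be dominated by whichever feature has the largest numeric range.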