This project analyzes real estate prices in Australia, starting with Exploratory Data Analysis (EDA). EDA covers filling null values, removing duplicates, detecting and handling outliers, extracting date components such as month and year into new variables, assigning data types, and classifying each variable in the dataset as categorical, numerical discrete, or numerical continuous.
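A minimal sketch of the cleaning steps, using only the Python standard library so it runs without pandas. The field names (`Price`, `Date`), the DD/MM/YYYY date format, filling missing prices with the median, and capping outliers at Tukey's 1.5 * IQR fences are all assumptions made for illustration. The project's own choices may differ. The season label follows the project's definition of October to March as winter. The records are made-up toy values, not rows from the dataset.

```python
from datetime import datetime
from statistics import median, quantiles

# Months the project labels "winter" (October to March).
WINTER_MONTHS = {10, 11, 12, 1, 2, 3}


def clean(records):
    """Return de-duplicated, filled, outlier-capped copies of the records."""
    # 1. Drop exact duplicate rows, keeping the first occurrence.
    seen, rows = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            rows.append(dict(rec))

    # 2. Fill missing prices with the median of the observed prices.
    fill = median(r["Price"] for r in rows if r["Price"] is not None)
    for r in rows:
        if r["Price"] is None:
            r["Price"] = fill

    # 3. Cap prices outside Tukey's fences (Q1 - 1.5*IQR, Q3 + 1.5*IQR).
    q1, _, q3 = quantiles([r["Price"] for r in rows], n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    for r in rows:
        r["Price"] = min(max(r["Price"], low), high)

    # 4. Parse the sale date and derive month, year and season variables.
    for r in rows:
        sold = datetime.strptime(r["Date"], "%d/%m/%Y")
        r["Month"], r["Year"] = sold.month, sold.year
        r["Season"] = "winter" if sold.month in WINTER_MONTHS else "summer"
    return rows


# Made-up toy records (not from the dataset).
toy = [
    {"Price": 820000.0, "Date": "3/09/2016"},
    {"Price": 820000.0, "Date": "3/09/2016"},   # exact duplicate
    {"Price": None, "Date": "10/12/2016"},       # missing price
    {"Price": 790000.0, "Date": "4/02/2016"},
    {"Price": 905000.0, "Date": "22/05/2017"},
    {"Price": 760000.0, "Date": "7/07/2016"},
    {"Price": 9500000.0, "Date": "15/10/2016"},  # extreme value
]
cleaned = clean(toy)
for row in cleaned:
    print(row)
```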
Next, inferential statistics are used to test hypotheses about property prices, the number of rooms, and other parameters. The normality of each variable is checked with Q-Q plots and the Shapiro-Wilk test. Hypotheses are formulated from the business requirements so that the statistical testing stays aligned with the research goals.
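Q-Q plot coordinates can be computed with only the standard library. Sort the sample and pair each value with the standard-normal quantile at the same plotting position. This sketch assumes two choices: the (i - 0.5)/n plotting position, and the Pearson correlation of the points as a rough "straightness" score. This score is not the Shapiro-Wilk test itself, which SciPy provides as `scipy.stats.shapiro(x)` (returning the W statistic and the p-value). The samples below are made up.

```python
from math import sqrt
from statistics import NormalDist, fmean


def qq_points(sample):
    """Return (theoretical normal quantiles, sorted sample) for a Q-Q plot."""
    xs = sorted(sample)
    n = len(xs)
    # Plotting position (i - 0.5) / n keeps every probability strictly in (0, 1).
    theo = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theo, xs


def qq_correlation(sample):
    """Pearson correlation of the Q-Q points; values near 1 suggest normality."""
    theo, xs = qq_points(sample)
    mt, mx = fmean(theo), fmean(xs)
    cov = sum((t - mt) * (x - mx) for t, x in zip(theo, xs))
    var_t = sum((t - mt) ** 2 for t in theo)
    var_x = sum((x - mx) ** 2 for x in xs)
    return cov / sqrt(var_t * var_x)


# Made-up samples: exact normal quantiles vs. a strongly right-skewed sample.
normal_like = [NormalDist(800000, 50000).inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
skewed = [2.0 ** k for k in range(12)]
print(round(qq_correlation(normal_like), 4), round(qq_correlation(skewed), 4))
```

Plotting `theo` against `xs` with any plotting library gives the Q-Q plot itself.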
- Setting the hypotheses (H0 and H1): Define the null and alternative hypotheses based on the research question or business objectives.
- Selection of the alpha value (level of significance): Decide on the significance level, alpha (α), commonly 0.05.
- Selection of a suitable statistical test: Choose the statistical test that matches the type of variable under study.
- Calculation of the p-value: Compute the p-value from the test statistic.
- Conclusion: If the p-value is less than or equal to alpha, reject the null hypothesis; otherwise, fail to reject it.
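The five hypothesis-testing steps can be traced in one short function. This sketch uses a one-sided z-test, which assumes the population standard deviation `sigma` is known. The function name, the sample values and `sigma` are hypothetical, chosen only to show where each step happens.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_test_greater(sample, mu0, sigma, alpha=0.05):
    """One-sided z-test of H0: mu = mu0 against H1: mu > mu0."""
    # Step 1: the hypotheses are fixed by the signature (H0: mu = mu0, H1: mu > mu0).
    # Step 2: alpha is the significance level, chosen before looking at the data.
    # Step 3: the z-test fits a mean when the population sd (sigma) is known.
    z = (fmean(sample) - mu0) / (sigma / sqrt(len(sample)))
    # Step 4: p-value = P(Z >= z) under H0.
    p_value = 1.0 - NormalDist().cdf(z)
    # Step 5: reject H0 when the p-value does not exceed alpha.
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return z, p_value, decision


# Hypothetical sample: mean 105, n = 16, tested against mu0 = 100 with sigma = 10.
z, p, decision = z_test_greater([95.0, 115.0] * 8, mu0=100.0, sigma=10.0)
print(f"z = {z:.2f}, p = {p:.4f}, {decision}")
```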
- Shapiro-Wilk test: To assess whether a sample of data comes from a normally distributed population.
- One-sample t-test: To determine whether the mean of a single sample differs significantly from a known or hypothesized population mean.
- Independent-samples t-test: To compare the means of two independent groups and determine whether they differ significantly.
- Binomial distribution: To test hypotheses about a proportion, or to compute probabilities, in a binomial setting where each trial has two possible outcomes (often termed "success" and "failure").
- Normal distribution (z-test): To test hypotheses about population parameters, particularly means, using the properties of the normal distribution; it applies when the population standard deviation is known or the sample is large.
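The two t-tests listed above can be sketched with the standard library alone. The t-distribution tail probability is obtained from the regularized incomplete beta function, evaluated by a modified Lentz continued fraction. The function names are hypothetical. The independent-samples version shown is Student's pooled-variance test, which assumes equal group variances; Welch's test drops that assumption. With SciPy installed, `scipy.stats.ttest_1samp(x, mu0, alternative="greater")` and `scipy.stats.ttest_ind(x, y)` are the library equivalents.

```python
from math import exp, lgamma, log, sqrt
from statistics import fmean, variance


def _nz(v, tiny=1e-300):
    """Keep a continued-fraction term away from zero (modified Lentz guard)."""
    return v if abs(v) >= tiny else tiny


def _betacf(a, b, x, max_iter=1000, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / _nz(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even term
        d = 1.0 / _nz(1.0 + aa * d)
        c = _nz(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd term
        d = 1.0 / _nz(1.0 + aa * d)
        c = _nz(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x)
    if x < (a + 1.0) / (a + b + 2.0):
        return exp(log_bt) * _betacf(a, b, x) / a
    return 1.0 - exp(log_bt) * _betacf(b, a, 1.0 - x) / b


def t_sf(t, df):
    """Upper-tail probability P(T > t) for Student's t with df degrees of freedom."""
    upper = 0.5 * _betai(df / 2.0, 0.5, df / (df + t * t))
    return upper if t >= 0 else 1.0 - upper


def one_sample_t(sample, mu0):
    """One-sided one-sample t-test of H0: mu = mu0 against H1: mu > mu0."""
    n = len(sample)
    t = (fmean(sample) - mu0) / sqrt(variance(sample) / n)
    return t, t_sf(t, n - 1)


def pooled_two_sample_t(x, y):
    """Two-sided independent-samples t-test assuming equal variances."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (fmean(x) - fmean(y)) / sqrt(pooled_var * (1.0 / nx + 1.0 / ny))
    return t, 2.0 * t_sf(abs(t), df)


# Usage with made-up numbers (prices in $'000), not project results.
t1, p1 = one_sample_t([810.0, 790.0, 845.0, 870.0, 805.0], mu0=800.0)
t2, p2 = pooled_two_sample_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"one-sample: t = {t1:.3f}, p = {p1:.4f}")
print(f"two-sample: t = {t2:.3f}, p = {p2:.4f}")
```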
- For the suburb Altona, it is postulated that a typical property sells for $800,000. Use the data at hand to test this assumption: is the typical property price really $800,000, or has it increased? Use a significance level of 5%.
- For the year 2016, is there any difference between the prices of properties sold in the summer months and those sold in the winter months? Consider the months from October through March as winter months and the rest as summer months. Use a significance level of 5%.
- For the suburb Abbotsford, what is the probability that, out of 10 properties sold, 3 will not have car parking? Use the column car in the dataset. Round your answer to 3 decimal places.
- In the suburb Abbotsford, what are the chances of finding a property with 3 rooms? Round your answer to 3 decimal places.
- In the suburb Abbotsford, what are the chances of finding a property with 2 bathrooms? Round your answer to 3 decimal places.
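The three Abbotsford questions reduce to relative frequencies and one binomial probability. In this sketch, the probability that a property has no car parking is estimated as the share of Abbotsford rows with `Car == 0`. "Chances" is read as the relative frequency among Abbotsford rows. The field names `Suburb`, `Rooms`, `Bathroom` and `Car`, the helper names, and the toy rows are all assumptions, not project data.

```python
from math import comb

FIELDS = ("Suburb", "Rooms", "Bathroom", "Car")
# Made-up toy rows (not from the dataset): Suburb, Rooms, Bathroom, Car.
TOY_ROWS = [
    ("Abbotsford", 2, 1, 1), ("Abbotsford", 3, 2, 0), ("Abbotsford", 2, 1, 1),
    ("Abbotsford", 3, 2, 2), ("Abbotsford", 4, 2, 0), ("Abbotsford", 2, 1, 1),
    ("Abbotsford", 1, 1, 1), ("Abbotsford", 3, 1, 1), ("Abbotsford", 2, 2, 1),
    ("Abbotsford", 2, 1, 2), ("Richmond", 3, 2, 0),
]
records = [dict(zip(FIELDS, row)) for row in TOY_ROWS]


def share(rows, suburb, column, value):
    """Relative frequency of column == value among the suburb's non-missing rows."""
    values = [r[column] for r in rows if r["Suburb"] == suburb and r[column] is not None]
    return sum(v == value for v in values) / len(values)


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_no_car = share(records, "Abbotsford", "Car", 0)
answers = {
    "3 of 10 without parking": round(binom_pmf(3, 10, p_no_car), 3),
    "3 rooms": round(share(records, "Abbotsford", "Rooms", 3), 3),
    "2 bathrooms": round(share(records, "Abbotsford", "Bathroom", 2), 3),
}
print(answers)
```

The binomial model treats the 10 sales as independent, with the same no-parking probability for each, which is an idealization of real sales data.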