This is the final capstone project for ALY6040 Data Mining, Fall 2021, CPS.
Its primary purpose is to learn data analytics, data mining, and Python.
This project looks at assessed residential and commercial properties in Boston. The Boston Globe reported in May 2021 that competition in the Boston housing market was driving up costs. As the pandemic continued, people demanded larger homes, and finding a home became more difficult because most property managers and realtors could not show their properties to several people. This project was written to help individuals, realtors, and real estate brokers find a property at a reasonable price. We chose a few basic machine learning concepts to help determine the best selling price for a house based on the number of rooms, location, design, and other characteristics such as those of the bathrooms and kitchen. We focused only on residential property because it was in demand. The goal of this study was to build on our initial EDA work by constructing predictive models that addressed our business questions, and then to optimize the models' performance.
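To make the idea of predicting a selling price from property characteristics concrete, here is a minimal sketch. It is not the project's actual pipeline: the data is synthetic, the three features (rooms, bathrooms, living area) and their price relationship are made up for illustration, and the helper names `make_synthetic_homes`, `fit_ridge`, `predict`, and `r2_score` are hypothetical. It fits a closed-form ridge (L2-regularized) linear regression with NumPy and reports the fit on held-out rows.

```python
import numpy as np


def make_synthetic_homes(n, seed=0):
    """Generate made-up homes (rooms, bathrooms, living area) and prices."""
    rng = np.random.default_rng(seed)
    rooms = rng.integers(2, 10, size=n)      # 2 to 9 rooms
    baths = rng.integers(1, 4, size=n)       # 1 to 3 bathrooms
    area = rng.uniform(500, 4000, size=n)    # living area in square feet
    X = np.column_stack([rooms, baths, area]).astype(float)
    # Arbitrary illustrative relationship plus noise; NOT real Boston figures.
    noise = rng.normal(0, 10_000, size=n)
    price = 50_000 + 20_000 * rooms + 15_000 * baths + 150 * area + noise
    return X, price


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on standardized features."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mean) / std
    y_mean = y.mean()
    # Solve (Xs^T Xs + alpha * I) w = Xs^T (y - mean(y)); the intercept is
    # left unpenalized and equals mean(y) because the features are centered.
    A = Xs.T @ Xs + alpha * np.eye(Xs.shape[1])
    w = np.linalg.solve(A, Xs.T @ (y - y_mean))
    return {"mean": mean, "std": std, "w": w, "b": y_mean}


def predict(model, X):
    """Apply the training-set scaling, then the linear model."""
    return ((X - model["mean"]) / model["std"]) @ model["w"] + model["b"]


def r2_score(y_true, y_pred):
    """Coefficient of determination on a set of predictions."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot


X, y = make_synthetic_homes(500)
# Simple hold-out split: first 400 rows for training, last 100 for testing.
X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

model = fit_ridge(X_train, y_train, alpha=1.0)
test_r2 = r2_score(y_test, predict(model, X_test))
print(f"held-out R^2: {test_r2:.3f}")
```

With real assessment data, the synthetic generator would be replaced by the cleaned dataset, categorical fields such as location would need encoding, and a library estimator would likely take the place of the hand-written solve.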
View Paper Online · Report Bug · Request Feature
See the open issues for a list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the GPL v3 License. See LICENSE for more information.
Project Link: https://mascarenhasneil.github.io/Boston-Property-Assessment/