# CSC 172 - Project 2: Survey Data Analysis
Team Members
- Ying Zhou
- Yuhan Wan (https://github.com/Ywan033)
Overview: This project analyzes survey data from lung cancer patients to extract insights about their quality of life. The data includes both demographic and medical information. We built a custom hash table to store and analyze the responses, and implemented several methods to compute distributions and life quality statistics.
How to Run the Project:
- Make sure the following files are in the same directory: Response.java, CustomHashTable.java, ReadFile.java, SurveyDataAnalyzer.java, Tests.java, and responses.txt
- To run the analysis, execute: SurveyDataAnalyzer.java
- To run the test suite, execute: Tests.java
File Format:
The program expects the responses.txt file to contain tab-separated values (TSV), not comma-separated, with the following fields:
ID, Gender, Age, Residence, Education, IncomeSource, MaritalStatus, Smoker, Year,
Q9, Q10, Q11, Q12, Q13, Q14, Q15, Q16, Q17, Q18, Q19, Q20, Q21, Q22,
Q23, Q24, Q25, Q26, Q27, Q28, Q29, Q30
Each row should have 31 values. The gender field is expected to be a single-character code: "F", "M", "O" (Other), or "-" (Unknown). If your dataset uses numeric codes instead, adjust the data preprocessing accordingly.
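
As a rough illustration of the format described above, the sketch below splits one line on tabs and checks the field count and gender code. The class name TsvRowSketch and the method parseRow are hypothetical and are not taken from ReadFile.java, which may handle parsing and errors differently.

```java
import java.util.Set;

/** Hypothetical helper for illustration; not part of the project's ReadFile.java. */
class TsvRowSketch {
    static final int EXPECTED_FIELDS = 31;
    static final Set<String> GENDER_CODES = Set.of("F", "M", "O", "-");

    /** Splits one tab-separated line and validates the field count and gender code. */
    static String[] parseRow(String line) {
        // A limit of -1 keeps trailing empty fields, so blank answers still count.
        String[] fields = line.split("\t", -1);
        if (fields.length != EXPECTED_FIELDS) {
            throw new IllegalArgumentException(
                "Expected " + EXPECTED_FIELDS + " fields but found " + fields.length);
        }
        // Gender is the second column (index 1) in the field order listed above.
        String gender = fields[1].trim();
        if (!GENDER_CODES.contains(gender)) {
            throw new IllegalArgumentException("Unexpected gender code: " + gender);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up row: an ID, a gender code, then 29 placeholder values.
        StringBuilder row = new StringBuilder("1\tF");
        for (int i = 0; i < 29; i++) {
            row.append("\t0");
        }
        String[] fields = parseRow(row.toString());
        System.out.println(fields.length + " fields, gender=" + fields[1]);
    }
}
```

Rejecting malformed rows early, as this sketch does, is only one option; skipping them with a warning would also be reasonable for survey data.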
Implemented Features:
We implemented the following 20 analysis methods in SurveyDataAnalyzer.java:
- genderDistribution()
- ageGroupDistribution()
- residenceDistribution()
- educationDistribution()
- incomeDistribution()
- maritalDistribution()
- smokerDistribution()
- lifeQualityGeneral()
- lifeQualityGenderBased()
- lifeQualityAgeBased()
- lifeQualityResidenceBased()
- lifeQualityEducationBased()
- lifeQualityIncomeBased()
- lifeQualityMaritalBased()
- lifeQualitySmokerBased()
- mostCommonTreatment()
- mostCommonSymptoms()
- mostCommonLifeAspects()
- lifeQualityMixConditionsBased()
- lifeQualityResponseBased()
Each method has been tested using sample expected outputs in the Tests.java file.
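
To give a sense of what a distribution method computes, here is a minimal sketch that counts how often each category appears. It assumes a distribution is a simple count per category and uses java.util.TreeMap as a stand-in for the project's CustomHashTable. The class name DistributionSketch and the method distribution are hypothetical, and the actual methods in SurveyDataAnalyzer.java may return percentages or use a different signature.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration; the project itself stores responses in CustomHashTable. */
class DistributionSketch {
    /** Counts how many times each category value appears in the list. */
    static Map<String, Integer> distribution(List<String> values) {
        // TreeMap is used here only so the categories come out in sorted order.
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : values) {
            // merge inserts 1 for a new key, otherwise adds 1 to the existing count.
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up gender codes, not taken from responses.txt.
        List<String> genders = List.of("F", "M", "F", "-", "F");
        System.out.println(distribution(genders));
    }
}
```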