
Coding challenge for applications to ACM Research's Fall 2022 semester.


cesar-gamez/ACM-Research-coding-challenge-22F

 
 


ACM Research coding challenge (Fall 2022)

Project Description/Rationale

Following the start of the COVID-19 pandemic, the new and used car market in the United States saw a notable increase in buying and selling volume. The factors driving this increase in the market's volatility were unclear. My ACM Research project therefore attempted to identify the factors that correlate most heavily, both positively and negatively, with price volatility in the car market, specifically using the cars.com CSV dataset loaded as a dataframe. Once the most highly correlated factors were identified, I visualized them in a bar chart.

Project Cleanup

The first step in preparing for this project was to reformat any data in the CSV file that had undesirable formatting. The reformatted columns were "Price", "Used/New", and "Drivetrain".

Price Reformat

The "Price" column was originally stored as a string. To make the data easier to work with, I converted it to an integer by removing the currency symbol and commas, then calling the pandas astype() method on the column.
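The price conversion described above could look roughly like the following. This is a minimal sketch rather than the notebook's actual code: the sample values and the `clean_price` helper are hypothetical, and it assumes prices are formatted like "$25,999".

```python
import pandas as pd

# Hypothetical sample rows; the real dataset's values may be formatted differently.
df = pd.DataFrame({"Price": ["$25,999", "$8,500", "$102,340"]})


def clean_price(series: pd.Series) -> pd.Series:
    """Strip the currency symbol and thousands separators, then cast to int."""
    stripped = (
        series.str.replace("$", "", regex=False)  # literal "$", not a regex anchor
        .str.replace(",", "", regex=False)
    )
    return stripped.astype(int)


df["Price"] = clean_price(df["Price"])
print(df["Price"].tolist())
```

Passing `regex=False` matters here because "$" is a regular-expression anchor and would otherwise not match the literal currency symbol.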

Used/New Reformat

New cars within the dataframe were labeled with the format "{Manufacturer}-Certified". To reduce variability between datapoints, I relabeled every row in this format as "New".
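One way to sketch this relabeling with pandas is shown below. The sample rows and the `normalize_condition` helper are hypothetical, and the sketch assumes the labels end with the literal suffix "-Certified" as quoted above.

```python
import pandas as pd

# Hypothetical sample rows illustrating the "{Manufacturer}-Certified" format.
df = pd.DataFrame(
    {"Used/New": ["Used", "New", "Toyota-Certified", "BMW-Certified"]}
)


def normalize_condition(series: pd.Series) -> pd.Series:
    """Relabel every '{Manufacturer}-Certified' entry as 'New'."""
    # na=False treats missing values as "not certified" so they are left alone.
    is_certified = series.str.endswith("-Certified", na=False)
    return series.mask(is_certified, "New")


df["Used/New"] = normalize_condition(df["Used/New"])
print(df["Used/New"].tolist())
```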

Drivetrain Reformat

Each manufacturer had its own way of describing a car's drivetrain. To reduce variability, I replaced each description with an integer representing the car's "wheel-drive" quantity.
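A drivetrain mapping along these lines might be written as follows. The label spellings, the integer values (2 for front- or rear-wheel drive, 4 for all- or four-wheel drive), and the `WHEEL_DRIVE_COUNT` and `encode_drivetrain` names are all assumptions made for illustration; the actual notebook may use different labels or values.

```python
import pandas as pd

# Assumed mapping from normalized drivetrain labels to driven-wheel counts.
WHEEL_DRIVE_COUNT = {
    "fwd": 2,
    "front-wheel drive": 2,
    "rwd": 2,
    "rear-wheel drive": 2,
    "awd": 4,
    "all-wheel drive": 4,
    "4wd": 4,
    "four-wheel drive": 4,
}


def encode_drivetrain(series: pd.Series) -> pd.Series:
    """Map free-form drivetrain labels to integers; unknown labels become <NA>."""
    normalized = series.str.strip().str.lower()
    # "Int64" is pandas' nullable integer dtype, so unmapped rows stay missing.
    return normalized.map(WHEEL_DRIVE_COUNT).astype("Int64")


# Hypothetical sample rows with inconsistent spellings.
df = pd.DataFrame(
    {"Drivetrain": ["FWD", "All-wheel Drive", " Rear-wheel Drive", "4WD", "Unknown"]}
)
df["Drivetrain"] = encode_drivetrain(df["Drivetrain"])
print(df["Drivetrain"].tolist())
```

Leaving unrecognized labels as missing, instead of guessing a value, keeps them out of the later correlation calculation.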

Correlation/Results

Matplotlib and seaborn were both used to find and visualize the correlation between each of the car's attributes and price. The following bar chart was the result of the fully processed data. In conclusion, mileage had the greatest inverse relationship with price, and year had the greatest direct relationship with price.
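The correlation step can be illustrated with a small sketch. It uses only pandas and matplotlib (seaborn's barplot could be substituted for the plotting call), assumes Pearson correlation, and runs on made-up rows, so the numbers it produces are not the project's results. The column names are also assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration only; not taken from the cars.com dataset.
df = pd.DataFrame(
    {
        "Year": [2012, 2015, 2018, 2020, 2022],
        "Mileage": [120000, 90000, 60000, 30000, 5000],
        "Drivetrain": [2, 4, 2, 4, 2],
        "Price": [7000, 11000, 18000, 26000, 35000],
    }
)

# Pearson correlation of every numeric column with Price, excluding Price itself.
corr = df.corr()["Price"].drop("Price").sort_values()

fig, ax = plt.subplots()
ax.bar(corr.index, corr.values)
ax.axhline(0, color="black", linewidth=0.8)  # separates direct from inverse
ax.set_ylabel("Correlation with Price")
ax.set_title("Correlation of each attribute with price")
fig.tight_layout()

print(corr)
```

Bars below zero indicate an inverse relationship with price and bars above zero a direct one. Calling `fig.savefig(...)` or `plt.show()` afterwards would render the chart.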
