Project Submission for First Project of Udacity Data Science Course - Write a Data Science Project
Weblog here
Weblog project here
This project collects data from the Stack Overflow Survey for the years 2011 and 2020. The main idea is to compare which programming languages different groups of users prefer, how those preferences change over time, and what to expect from the next generation in terms of programming language usage.
Some basic questions about how programmers' preferences have evolved over time may say something about the future of programming languages:
- how the Stack Overflow user base has evolved over time;
- which language is the most preferred;
- how that preference has evolved over time;
- whether programming language usage in the US differs from the rest of the world;
- what the trend is, based on Generation Z preferences.
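Most of these questions come down to counting how often each language is mentioned, overall and per group. The sketch below shows one way to do that with pandas; it is not the code in `udacourse.py`. It assumes the 2020 file has a semicolon-separated `LanguageWorkedWith` column, a `Country` column using the label `"United States"`, and a numeric `Age` column, so check these names against the survey readme. `language_shares`, `compare_us_vs_rest` and `generation_z` are hypothetical helpers. The 1997 birth-year cutoff for Generation Z is one common convention, not necessarily the project's definition. The four toy rows are made up for illustration and are not survey data.

```python
import pandas as pd


def language_shares(df: pd.DataFrame, column: str = "LanguageWorkedWith") -> pd.Series:
    """Fraction of respondents who mention each language, sorted in descending order.

    Assumes `column` holds semicolon-separated answers such as "Python;SQL".
    Respondents who left the question blank are left out of the denominator.
    """
    answered = df[column].dropna()
    # One entry per (respondent, language) pair
    languages = answered.str.split(";").explode().str.strip()
    return languages.value_counts() / len(answered)


def compare_us_vs_rest(df: pd.DataFrame, country_col: str = "Country",
                       column: str = "LanguageWorkedWith") -> pd.DataFrame:
    """Side-by-side language shares for US respondents and everyone else."""
    is_us = df[country_col] == "United States"  # assumed country label
    return pd.DataFrame({
        "US": language_shares(df[is_us], column),
        "Rest of world": language_shares(df[~is_us], column),
    }).fillna(0.0)  # a language nobody in a group mentions gets share 0


def generation_z(df: pd.DataFrame, age_col: str = "Age", survey_year: int = 2020,
                 first_birth_year: int = 1997) -> pd.DataFrame:
    """Approximate Generation Z subset: respondents born in or after `first_birth_year`.

    Ages are whole years, so respondents near the cutoff may be misclassified;
    missing ages compare as False and are dropped.
    """
    return df[df[age_col] <= survey_year - first_birth_year]


# Toy rows for illustration only; not real survey answers
toy = pd.DataFrame({
    "Country": ["United States", "Brazil", "Germany", "United States"],
    "Age": [22, 35, 19, 41],
    "LanguageWorkedWith": ["Python;SQL", "JavaScript;Python", None, "JavaScript"],
})

print(language_shares(toy))                # overall ranking
print(compare_us_vs_rest(toy))             # US vs rest of world
print(language_shares(generation_z(toy)))  # Generation Z only
```

Running `language_shares` on both years and comparing the two results is one way to approach the "how it evolved" question, provided the 2011 answers are first mapped to the same language labels as the 2020 ones.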
- A library of about 20 functions for automatically preprocessing and reshaping dataframes:
  - you can adapt them as you wish;
  - you can fork them and test your changes in Jupyter Notebook or any other Python IDE.
- A fully documented Jupyter Notebook that calls the functions and shows the results:
  - the steps and challenges are documented in this Notebook;
  - graphs illustrate my findings.
- Both .csv datasets needed to run the project:
  - the readme for the 2020 survey is also included in this commit;
  - you can find everything related to these datasets, including the licences for using the data, here
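As an illustration of the kind of reshaping such a library performs, the sketch below turns a multi-answer column into one 0/1 indicator column per answer, which makes group-wise sums and means straightforward. `one_hot_answers` is a hypothetical helper, not a function from `udacourse.py`; it relies on pandas' `Series.str.get_dummies`, and the column name and toy rows are invented for the example.

```python
import pandas as pd


def one_hot_answers(df: pd.DataFrame, column: str, sep: str = ";") -> pd.DataFrame:
    """Replace a multi-answer text column with one 0/1 indicator column per answer.

    Hypothetical helper for illustration; missing answers become rows of zeros.
    """
    # get_dummies splits on `sep` and creates one column per distinct answer (sorted)
    dummies = df[column].str.get_dummies(sep=sep).add_prefix(f"{column}_")
    return pd.concat([df.drop(columns=column), dummies], axis=1)


toy = pd.DataFrame({
    "Respondent": [1, 2, 3],
    "LanguageWorkedWith": ["Python;SQL", "JavaScript", None],
})
wide = one_hot_answers(toy, "LanguageWorkedWith")
print(wide)
```

Respondents with no answer count as zeros here, so a mean over these indicator columns uses all respondents as the denominator, not only those who answered.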
This software is released under the MIT Licence. A complete description of the licence can be read on Wikipedia.
Eduardo Passeto epasseto@gmail.com
Versions:
- 0.1 and 0.2 (pre-alpha) were incomplete versions, made between April and May 2021;
- 1.0, from July 2021, is the current working version of the system.
Files needed to run the project:
- input files:
  - 2011 Stack Overflow Survey Results 2011.csv → for 2011 data
  - survey_results_public.csv → for 2020 data
- Jupyter Notebook:
  - First_Projectn.ipynb → Jupyter Notebook (Python 3.X)
- Python library:
  - udacourse.py → collection of Python 3.X functions
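A minimal loading sketch, assuming both CSV files listed above sit in the notebook's working directory. `load_surveys` is a hypothetical helper, not part of `udacourse.py`; it only checks that the files exist and then reads them with `pandas.read_csv`. The encoding of the 2011 file has not been verified here: if it fails to decode as UTF-8, passing another `encoding` (for example `"latin-1"`) is one thing to try.

```python
from pathlib import Path

import pandas as pd

# File names as listed above, assumed to be in the working directory
FILES = {
    2011: Path("2011 Stack Overflow Survey Results 2011.csv"),
    2020: Path("survey_results_public.csv"),
}


def load_surveys(files=FILES, encoding=None):
    """Read each survey CSV into a DataFrame, keyed by survey year.

    Raises FileNotFoundError naming every missing file, so the notebook fails early.
    """
    missing = [str(path) for path in files.values() if not Path(path).exists()]
    if missing:
        raise FileNotFoundError("Missing input files: " + ", ".join(missing))
    # low_memory=False parses each file in one pass, so column dtypes are inferred consistently
    return {year: pd.read_csv(path, encoding=encoding, low_memory=False)
            for year, path in files.items()}


if all(path.exists() for path in FILES.values()):
    surveys = load_surveys()
    for year, df in surveys.items():
        print(year, df.shape)
```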
Libraries used:
- Pandas
- NumPy
- math (Python standard library)
- Matplotlib
- Seaborn
- time (Python standard library)
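Before opening the notebook, a quick way to confirm the environment is ready is to check that each library can be found by Python's import system. This is a small sketch using the standard-library `importlib.util.find_spec`; `missing_modules` is a hypothetical helper, and the list holds the import names of the libraries above. pandas, NumPy, Matplotlib and Seaborn can be installed with `pip install pandas numpy matplotlib seaborn`.

```python
from importlib.util import find_spec

# Import names of the libraries listed above (math and time ship with Python)
REQUIRED = ["pandas", "numpy", "math", "matplotlib", "seaborn", "time"]


def missing_modules(names=REQUIRED):
    """Return the names in `names` that the import system cannot find."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Install before running the notebook:", ", ".join(missing))
else:
    print("All listed libraries are available.")
```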