This dataset collects information from 100k medical appointments in Brazil and is focused on the question of whether or not patients show up for their appointment. A number of characteristics about the patient are included in each row.
(Image is from a copyright-free website: https://www.pexels.com/royalty-free-images/.)
- ‘ScheduledDay’ tells us on what day the patient set up their appointment;
- ‘Neighborhood’ indicates the location of the hospital;
- ‘Scholarship’ indicates whether or not the patient is enrolled in the Brazilian welfare program Bolsa Família;
- Be careful about the encoding of the last column: it says ‘No’ if the patient showed up to their appointment, and ‘Yes’ if they did not show up.
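Because the last column reads the opposite way from what one might expect, it can help to derive an explicit flag before doing any analysis. The snippet below is a minimal sketch on a tiny made-up sample; the column name `No-show` and the new column `showed_up` are assumptions for illustration, not names confirmed by this README.

```python
import pandas as pd

# Tiny made-up sample; "No-show" is an assumed header for the last column.
sample = pd.DataFrame({
    "Scholarship": [0, 1, 0, 1],
    "No-show": ["No", "Yes", "No", "No"],
})

# 'No' means the patient attended, 'Yes' means they missed the appointment,
# so map the labels onto an explicit boolean flag.
sample["showed_up"] = sample["No-show"].map({"No": True, "Yes": False})

# Share of appointments in the sample that were attended.
attendance_rate = sample["showed_up"].mean()
print(attendance_rate)
```

Working with a flag such as `showed_up` avoids having to remember the inverted meaning of ‘Yes’ and ‘No’ in every later step.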
Table of Contents |
---|
Prerequisites 🔍📜 |
Design 📐 |
Conclusions 📌 |
License 🔖 |
- Python 3.6.3
- Jupyter Notebook
- Anaconda-Navigator
Step One - Choose Data Set
Click this link to download the corresponding data.
Step Two - Get Organized
This project will eventually contain:
- The report communicating any findings;
- Any Python code used during the analysis;
- The data set;
Step Three - Analyze
Brainstorm some questions that could be answered using the data set, then start answering them. The main focus is on the relationships between multiple variables.
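As one possible example of such a question, the sketch below compares the missed-appointment rate between patients with and without the welfare scholarship. The data is made up, and the column name `No-show` and the helper column `missed` are assumptions for illustration.

```python
import pandas as pd

# Made-up rows; the real data set would be loaded from the downloaded CSV.
appointments = pd.DataFrame({
    "Scholarship": [0, 0, 0, 1, 1, 1],
    "No-show": ["No", "No", "Yes", "Yes", "No", "Yes"],
})

# 'Yes' in the assumed "No-show" column means the appointment was missed.
appointments["missed"] = appointments["No-show"] == "Yes"

# Mean of a boolean column per group gives the missed-appointment rate.
missed_rate = appointments.groupby("Scholarship")["missed"].mean()
print(missed_rate)
```

The same pattern could be applied to other patient characteristics by changing the column passed to `groupby`.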
In the current study, a substantial amount of analysis was carried out. Before each step, detailed instructions were given, and interpretations were provided afterwards. The dataset included 110,527 patient records, all from 2016, which is substantial but limited to a single year. Therefore, even with such a large amount of data, the findings may not be representative of other years. A strength of the dataset was that it contained no NaN values or duplicates, which could otherwise have affected the analysis.
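A check for missing values and duplicates could look roughly like the following. This is a sketch on made-up data, not the notebook's actual code; the helper `summarize_quality` and the sample columns are hypothetical.

```python
import pandas as pd

def summarize_quality(df):
    """Count rows, missing cells, and fully duplicated rows in a DataFrame."""
    return {
        "rows": len(df),
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }

# Made-up sample with one missing value and one repeated row.
demo = pd.DataFrame({
    "Age": [30, 45, 45, None],
    "No-show": ["No", "Yes", "Yes", "No"],
})

report = summarize_quality(demo)
print(report)
```

On the real data set, both `missing_values` and `duplicate_rows` would be expected to be zero, in line with the observation above.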