Data science is about turning rich data into actionable insight and making data impactful!
This course aims to familiarize you with various data science pipelines using examples with different data types. It is suitable for students who already have some experience in processing data and who will work (or are currently working) with large amounts of data, with a focus on obtaining insights through prediction or explanation techniques. The course is not intended to cover all topics in data science exhaustively. Instead, it introduces ways of working with structured data (e.g., sensor measurements) and unstructured data (e.g., text and images).
Keep in mind that this course does not aim to teach the details of programming, machine learning, statistics, or visualization. Instead, it teaches you how to integrate various techniques (e.g., data wrangling, statistical analysis, data modeling, data visualization) to perform a data science task. Also note that this course assumes the datasets have already been collected for you and does not teach you how to collect data in the real world. Data collection is a topic that takes a long time to explain and is mostly outside the scope of this course.
By the end of the course, we expect you to be able to:
- Explain and execute the entire data science pipeline (including data pre-processing, wrangling, analysis, modeling, evaluation, and visualization).
- Perform data science tasks with images (e.g., object recognition), text (e.g., topic modeling), and structured data (e.g., data from sensor networks) using the Python programming language.
- Critically reflect on model performance using various metrics and obtain meaningful insights from data analysis.
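To give a rough feel for what "the entire pipeline" means in practice, here is a minimal sketch in Python. It is not course material: the synthetic sensor data, the function names, and the choice of a simple least-squares line with mean absolute error as the metric are all assumptions made for illustration only.

```python
import random
import statistics


def make_sensor_data(n=200, seed=0):
    """Generate hypothetical (temperature, reading) pairs; some readings are missing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        temperature = rng.uniform(0.0, 30.0)
        # Assumed ground truth for the illustration: reading = 2 * temperature + 5 + noise.
        reading = 2.0 * temperature + 5.0 + rng.gauss(0.0, 1.0)
        # Simulate sensor dropouts: roughly 10% of readings are missing.
        if rng.random() < 0.1:
            reading = None
        rows.append((temperature, reading))
    return rows


def preprocess(rows):
    """Pre-processing step: drop rows whose reading is missing."""
    return [(x, y) for x, y in rows if y is not None]


def split(rows, train_fraction=0.8):
    """Hold out the last part of the data for evaluation."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_line(rows):
    """Modeling step: ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(rows, slope, intercept):
    """Evaluation step: average absolute prediction error on the given rows."""
    return statistics.fmean(abs(y - (slope * x + intercept)) for x, y in rows)


def run_pipeline():
    """Chain the steps: generate, pre-process, split, fit, evaluate."""
    rows = preprocess(make_sensor_data())
    train, test = split(rows)
    slope, intercept = fit_line(train)
    mae = mean_absolute_error(test, slope, intercept)
    return slope, intercept, mae


if __name__ == "__main__":
    slope, intercept, mae = run_pipeline()
    print(f"slope={slope:.2f} intercept={intercept:.2f} test MAE={mae:.2f}")
```

Evaluating on held-out rows rather than on the training rows is the kind of choice the third objective asks you to reflect on; a visualization step (e.g., plotting predictions against readings) would normally follow but is omitted here to keep the sketch dependency-free.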
This course expects you to have the following prior knowledge:
- Intermediate level of Python programming (e.g., knowing different data types and data structures, knowing how to set up the Jupyter Notebook programming environment)
- Basic level of machine learning (e.g., knowing what supervised and unsupervised learning means, understanding the differences between classification and regression)
- Basic level of information visualization (e.g., knowing how to draw plots using Python packages, understanding the differences between a bar chart and a histogram)
- Basic level of research methods (e.g., knowing what “research questions” are, understanding basic hypothesis testing methods such as the t-test)
- Lecture
- Seminar
- Self-study
- Jupyter Notebook with Python
There are two partial exams and weekly assignments.
All course information will be available on Canvas.
Lectures, teaching materials, and assessment materials will all be in English. Work sessions will be held in either Dutch or English, depending on the TA’s choice.